Regression Shrinkage and Selection Via the Lasso

Why this mattered

Tibshirani’s lasso turned regression from a choice between prediction accuracy and interpretability into a single optimization problem that could often deliver both. Before it, ridge regression gave stable estimates but kept every variable in the model, while subset selection produced sparse, readable models but was unstable (small changes in the data could select a very different model) and combinatorial to compute. The lasso’s L1 constraint made sparsity emerge directly from convex estimation: coefficients could be shrunk exactly to zero, making variable selection part of the fitting procedure rather than a separate model-search step.
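A minimal sketch of how an L1 penalty can set a coefficient exactly to zero. It makes several assumptions that are not from the paper: it uses the penalized (Lagrangian) form, (1/2) * RSS + lam * sum(|beta_j|), rather than the constrained form; it solves it with cyclic coordinate descent, a later and widely used technique, not the algorithm in the original paper; and the function names and the tiny orthogonal data set are hypothetical, chosen only for illustration.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything in [-gamma, gamma] becomes exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize 0.5 * sum((y - X beta)^2) + lam * sum(|beta_j|).

    Illustrative only: no intercept, no convergence check,
    and it assumes no column of X is all zeros.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            col_sq = sum(c * c for c in col)
            # Inner product of column j with the residual that excludes feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual))
            new_bj = soft_threshold(rho, lam) / col_sq
            # Keep the residual in sync with the updated coefficient.
            delta = new_bj - beta[j]
            residual = [r - c * delta for c, r in zip(col, residual)]
            beta[j] = new_bj
    return beta


# Hypothetical toy data: y = 2 * x1 + 0.1 * x2, with orthogonal columns.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]

unpenalized = lasso_coordinate_descent(X, y, lam=0.0)  # ordinary least squares fit
sparse = lasso_coordinate_descent(X, y, lam=1.0)       # weak predictor is dropped
print(unpenalized, sparse)
```

With lam=1.0 the second coefficient comes out as exactly 0.0 rather than merely small, because the soft-threshold step maps any sufficiently weak signal to zero. A ridge penalty would only scale that coefficient down and leave it nonzero.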

This mattered because it made high-dimensional statistical modeling computationally tractable: a sparse model could be found by solving one convex problem rather than by searching over subsets of predictors. Once sparsity could be imposed through a convex penalty, statisticians and machine-learning researchers had a practical template for estimating models when many candidate predictors were available but only some were expected to matter. That idea became central to genomics, signal processing, text modeling, econometrics, neuroscience, and other fields where the number of measured variables could rival or exceed the number of observations. The paper also clarified a deeper connection between statistical estimation, regularization, and adaptive function estimation, aligning regression practice with the emerging wavelet-thresholding work of Donoho and Johnstone.

The lasso became a foundation for later breakthroughs because its core idea generalized so cleanly. Elastic net, group lasso, fused lasso, graphical lasso, sparse generalized linear models, compressed sensing, and modern regularized empirical-risk minimization all inherit part of its logic: encode structural assumptions as penalties, then solve an optimization problem that produces both prediction and structure. In retrospect, the paper helped shift statistical modeling toward sparsity, convex optimization, and scalable regularization as default tools for learning from complex data.

Abstract

SUMMARY We propose a new method for estimation in linear models. The ‘lasso’ minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
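Written out, the estimator described in the abstract can be stated as follows. The symbols are assumptions made here for illustration: N observations, p standardized predictors x_ij, responses y_i, an intercept alpha, and a tuning constant t >= 0 that bounds the sum of absolute coefficients.

```latex
\[
(\hat{\alpha}, \hat{\beta})
  = \arg\min_{\alpha,\,\beta}
    \sum_{i=1}^{N} \Bigl( y_i - \alpha - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2}
  \quad \text{subject to} \quad
  \sum_{j=1}^{p} \lvert \beta_j \rvert \le t .
\]

% Special case: orthonormal design, with \hat{\beta}^{o}_j the least-squares estimates
% and \gamma \ge 0 chosen so that the constraint holds.
\[
\hat{\beta}_j
  = \operatorname{sign}\bigl(\hat{\beta}^{o}_j\bigr)
    \bigl( \lvert \hat{\beta}^{o}_j \rvert - \gamma \bigr)^{+} .
\]
```

The second expression is the soft-thresholding rule, which is the link to the Donoho and Johnstone work mentioned in the abstract: any least-squares coefficient whose magnitude is at most gamma is set exactly to 0, and the rest are pulled toward 0 by gamma.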

Sources