Regression Shrinkage and Selection Via the Lasso
Why this mattered
Tibshirani’s lasso changed regression from a choice between prediction and interpretability into a single optimization problem that could often deliver both. Before it, ridge regression gave stable estimates but kept every variable in the model, while subset selection produced sparse, readable models but was unstable under small changes in the data and required a combinatorial search. The lasso’s L1 constraint made sparsity emerge directly from convex estimation: coefficients could be shrunk exactly to zero, turning variable selection into part of the fitting procedure rather than a separate model-search step.
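One way to see why the L1 constraint yields exact zeros while ridge does not is the special case of an orthonormal design. The sketch below assumes the penalized (Lagrangian) form of both problems and a design matrix with orthonormal columns, in which case each coefficient can be updated independently. The function names, the coefficient values, and the penalty level `lam` are hypothetical and chosen only for illustration.

```python
# Assumed setting (not from the text above): X has orthonormal columns, so
#   lasso: minimize 0.5*||y - Xb||^2 + lam*sum(|b_j|)
#   ridge: minimize 0.5*||y - Xb||^2 + 0.5*lam*sum(b_j^2)
# both reduce to a separate update of each least-squares coefficient.


def soft_threshold(b, lam):
    """Lasso update: move b toward zero by lam, and set it to zero if |b| <= lam."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def ridge_shrink(b, lam):
    """Ridge update: rescale b by a constant factor, which never gives exactly zero."""
    return b / (1.0 + lam)


# Hypothetical least-squares coefficients and penalty level.
ols = [3.0, -1.5, 0.4, -0.2]
lam = 0.5

lasso = [soft_threshold(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]

print("lasso:", lasso)
print("ridge:", ridge)
```

In this toy case the two small coefficients fall below the threshold and become exactly zero under the lasso update, whereas the ridge update only scales them down. General, non-orthonormal designs need an iterative solver, which this sketch does not attempt.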
This mattered because it made high-dimensional statistical modeling tractable in a new way. Once sparsity could be imposed through a convex penalty, statisticians and machine-learning researchers had a practical template for estimating models when many candidate predictors were available but only some were expected to matter. That idea became central to genomics, signal processing, text modeling, econometrics, neuroscience, and other fields where the number of measured variables could rival or exceed the number of observations. The paper also clarified a deeper connection between statistical estimation, regularization, and adaptive function estimation, aligning regression practice with the emerging wavelet-thresholding work of Donoho and Johnstone.
The lasso became a foundation for later breakthroughs because its core idea generalized so cleanly. Elastic net, group lasso, fused lasso, graphical lasso, sparse generalized linear models, compressed sensing, and modern regularized empirical-risk minimization all inherit part of its logic: encode structural assumptions as penalties, then solve an optimization problem that produces both prediction and structure. In retrospect, the paper helped shift statistical modeling toward sparsity, convex optimization, and scalable regularization as default tools for learning from complex data.
Abstract
SUMMARY We propose a new method for estimation in linear models. The ‘lasso’ minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
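The constrained problem described in the summary can be written out as a sketch. The notation is supplied here for illustration and does not appear in the text above: N observations with responses y_i, p standardized predictors x_ij, an intercept \alpha, coefficients \beta_j, and a bound t \ge 0 playing the role of the "constant".

```latex
(\hat{\alpha}, \hat{\beta})
  = \arg\min_{\alpha,\,\beta}
    \sum_{i=1}^{N} \Bigl( y_i - \alpha - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2}
  \quad \text{subject to} \quad
  \sum_{j=1}^{p} \lvert \beta_j \rvert \le t .
```

Because the objective and the constraint set are both convex, this problem has an equivalent penalized form, in which a multiplier \lambda \ge 0 (again a symbol assumed here) takes the place of the bound t:

```latex
\min_{\alpha,\,\beta}\;
  \sum_{i=1}^{N} \Bigl( y_i - \alpha - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2}
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert .
```

Smaller values of t (equivalently, larger values of \lambda) shrink the coefficients more strongly and set more of them exactly to zero.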
Related
- cite → Classification and Regression Trees. — The lasso contrasts its continuous shrinkage-and-selection procedure with CART's tree-based variable selection and prediction framework.
- cite → Bootstrap Methods: Another Look at the Jackknife — The lasso uses bootstrap resampling ideas from Efron to assess prediction error and stability of fitted regression models.
- cite → Classification and Regression Trees. — The lasso cites CART as an alternative regression and classification method that performs variable selection through recursive partitioning.
- enables → Spatial reconstruction of single-cell gene expression data — The lasso provides the sparse regression that this work uses to select landmark genes for spatial expression reconstruction.
- enables → Regularization and Variable Selection Via the Elastic Net — The elastic net extends the lasso's shrinkage-and-selection penalty by adding ridge regularization for correlated predictors.
- cite ← Spatial reconstruction of single-cell gene expression data — Spatial reconstruction of single-cell expression uses Lasso regression for sparse selection of marker genes predictive of spatial position.
- cite ← Regularization and Variable Selection Via the Elastic Net — Elastic net generalizes lasso by combining lasso's L1 variable selection with an L2 penalty to handle correlated predictors.
- enables ← Classification and Regression Trees. — CART popularized prediction via model selection and regularization tradeoffs, which lasso addressed for linear regression through L1 shrinkage.
- enables ← Bootstrap Methods: Another Look at the Jackknife — Bootstrap resampling enabled empirical assessment of estimator stability, a key concern for evaluating lasso's variable-selection behavior.
- enables ← Classification and Regression Trees. — CART's tree-based variable selection highlighted sparse predictive modeling, a goal lasso pursued with convex L1-penalized regression.