
A statistical perspective on algorithmic leveraging

Published: 01 January 2015
Abstract

    One popular method for dealing with large-scale data sets is sampling. For example, by using the empirical statistical leverage scores as an importance sampling distribution, the method of algorithmic leveraging samples and rescales rows/columns of data matrices to reduce the data size before performing computations on the subproblem. This method has been successful in improving computational efficiency of algorithms for matrix problems such as least-squares approximation, least absolute deviations approximation, and low-rank matrix approximation. Existing work has focused on algorithmic issues such as worst-case running times and numerical issues associated with providing high-quality implementations, but none of it addresses statistical aspects of this method.
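    As a concrete illustration of the sample-and-rescale step described above, here is a minimal numpy sketch (function and variable names are ours, not the paper's): compute exact leverage scores via a thin QR factorization, draw r rows with probability proportional to leverage, rescale each drawn row by 1/sqrt(r*pi_i), and solve the smaller least-squares problem.

```python
# Minimal sketch of leverage-based subsampling for least squares.
# Assumes the paper's setting: y = X @ beta + noise, with n >> p.
import numpy as np

def leverage_scores(X):
    """Exact leverage scores h_ii of the hat matrix X (X^T X)^{-1} X^T,
    computed stably from a thin QR factorization: h_ii = ||Q[i, :]||^2."""
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def leveraging_ls(X, y, r, rng=None):
    """Sample r rows with probability proportional to leverage, rescale by
    1/sqrt(r * pi_i), and solve the smaller (weighted) least-squares problem."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = X.shape[0]
    pi = leverage_scores(X)
    pi = pi / pi.sum()                    # importance-sampling distribution
    idx = rng.choice(n, size=r, replace=True, p=pi)
    w = 1.0 / np.sqrt(r * pi[idx])        # rescaling keeps the subproblem unbiased
    beta, *_ = np.linalg.lstsq(w[:, None] * X[idx], w * y[idx], rcond=None)
    return beta
```

    The rescaling makes the subsampled objective an unbiased estimate of the full least-squares objective, which is what the worst-case approximation guarantees for this approach rely on.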
    In this paper, we provide a simple yet effective framework to evaluate the statistical properties of algorithmic leveraging in the context of estimating parameters in a linear regression model with a fixed number of predictors. In particular, for several versions of leverage-based sampling, we derive results for the bias and variance, both conditional and unconditional on the observed data. We show that from the statistical perspective of bias and variance, neither leverage-based sampling nor uniform sampling dominates the other. This result is particularly striking, given the well-known result that, from the algorithmic perspective of worst-case analysis, leverage-based sampling provides uniformly superior worst-case algorithmic results, when compared with uniform sampling.
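    To fix notation for these bias-variance results, the LaTeX sketch below writes out the subsampled estimator and the first-order Taylor approximations of its moments. The symbols are standard; the exact statements, higher-order terms, and regularity conditions are the paper's, so treat this as orientation rather than the theorems verbatim.

```latex
% Linear model and full-sample OLS estimator:
\[
  y = X\beta_0 + \epsilon, \qquad \epsilon \sim N(0,\sigma^2 I_n), \qquad
  \hat\beta_{\mathrm{ols}} = (X^\top X)^{-1} X^\top y .
\]
% Subsampled, rescaled estimator: draw r rows with probabilities \pi_i,
% reweight, and solve the induced weighted least-squares problem:
\[
  \tilde\beta_W = (X^\top W X)^{-1} X^\top W y , \qquad
  W = \operatorname{diag}\!\big( k_i / (r \pi_i) \big),
\]
% where k_i counts how often row i is drawn. A first-order expansion
% around W = I gives, conditionally on the observed data,
\[
  \mathrm{E}\big[\tilde\beta_W \mid y\big] \approx \hat\beta_{\mathrm{ols}},
  \qquad
  \mathrm{Var}\big[\tilde\beta_W \mid y\big] \approx
  (X^\top X)^{-1} X^\top
  \operatorname{diag}\!\big( \hat e_i^{\,2} / (r \pi_i) \big)
  X (X^\top X)^{-1},
\]
% with residuals \hat e = y - X \hat\beta_{\mathrm{ols}}, and, unconditionally,
\[
  \mathrm{Var}\big[\tilde\beta_W\big] \approx
  \sigma^2 (X^\top X)^{-1}
  + \frac{\sigma^2}{r} (X^\top X)^{-1} X^\top
    \operatorname{diag}\!\big( (1 - h_{ii})^2 / \pi_i \big)
    X (X^\top X)^{-1}.
\]
```

    The 1/pi_i factors inside the sandwich terms are the crux: leverage-based pi_i damps the contribution of high-leverage rows but can blow up the contribution of rows with very small leverage, which is why neither leverage-based nor uniform sampling dominates the other.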
    Based on these theoretical results, we propose and analyze two new leveraging algorithms: one constructs a smaller least-squares problem with "shrinkage" leverage scores (SLEV), and the other solves a smaller and unweighted (or biased) least-squares problem (LEVUNW). A detailed empirical evaluation of existing leverage-based methods as well as these two new methods is carried out on both synthetic and real data sets. The empirical results indicate that our theory is a good predictor of practical performance of existing and new leverage-based algorithms and that the new algorithms achieve improved performance. For example, with the same computation reduction as in the original algorithmic leveraging approach, our proposed SLEV typically leads to improved biases and variances both unconditionally and conditionally (on the observed data), and our proposed LEVUNW typically yields improved unconditional biases and variances.
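    For concreteness, here are hedged numpy sketches of the two variants, reusing leverage_scores() from the earlier block; the shrinkage weight alpha = 0.9 is an illustrative choice, not a prescription from the paper.

```python
# SLEV: shrink the leverage distribution toward uniform before sampling,
# which bounds 1/pi_i and thus the variance-inflation terms above.
def slev_ls(X, y, r, alpha=0.9, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    n = X.shape[0]
    h = leverage_scores(X)
    pi = alpha * h / h.sum() + (1.0 - alpha) / n   # shrunk probabilities
    idx = rng.choice(n, size=r, replace=True, p=pi)
    w = 1.0 / np.sqrt(r * pi[idx])                 # usual rescaling step
    beta, *_ = np.linalg.lstsq(w[:, None] * X[idx], w * y[idx], rcond=None)
    return beta

# LEVUNW: sample with leverage-based probabilities but solve the smaller
# least-squares problem *unweighted*, i.e., skip the 1/sqrt(r * pi_i) rescaling.
def levunw_ls(X, y, r, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    n = X.shape[0]
    pi = leverage_scores(X)
    pi = pi / pi.sum()
    idx = rng.choice(n, size=r, replace=True, p=pi)
    beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return beta
```

    Dropping the reweighting makes the subproblem a biased estimate of the full objective, which is exactly the conditional bias that LEVUNW trades for the improved unconditional behavior reported above.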



Published In

The Journal of Machine Learning Research, Volume 16, Issue 1, January 2015, 3855 pages.
ISSN: 1532-4435; EISSN: 1533-7928.
Publisher: JMLR.org.
Publication History: published 01 January 2015; revised 01 October 2014.

      Author Tags

      1. least squares
      2. leverage scores
      3. linear regression
      4. randomized algorithm
      5. subsampling
