DOI: 10.1145/3626183.3659980
Research article · Open access

Distributed-Memory Randomized Algorithms for Sparse Tensor CP Decomposition

Published: 17 June 2024

Abstract

    CANDECOMP/PARAFAC (CP) decomposition, a generalization of the matrix singular value decomposition to higher-dimensional tensors, is a popular tool for analyzing multidimensional sparse data. On tensors with billions of nonzero entries, computing a CP decomposition is a computationally intensive task. We propose the first distributed-memory implementations of two randomized CP decomposition algorithms, CP-ARLS-LEV and STS-CP, that offer nearly an order-of-magnitude speedup at high decomposition ranks over well-tuned non-randomized decomposition packages. Both algorithms rely on leverage score sampling and enjoy strong theoretical guarantees, each offering a different tradeoff between time and accuracy. We tailor the communication schedule for our random sampling algorithms, eliminating expensive reduction collectives and ensuring that communication costs scale with the random sample count. Finally, we optimize the local storage format for our methods, switching between analogues of the compressed sparse column and compressed sparse row formats. Experiments show that our methods are fast and scalable, achieving an 11x speedup over SPLATT by decomposing the billion-scale Reddit tensor on 512 CPU cores in under two minutes.
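    The primitive behind both CP-ARLS-LEV and STS-CP is leverage score sampling for overdetermined least-squares problems. The sketch below is a minimal, single-node NumPy illustration of that general technique, not the paper's distributed sparse-tensor implementation; the function names `leverage_scores` and `sampled_lstsq` and the problem sizes are illustrative assumptions.

```python
import numpy as np

def leverage_scores(A):
    # Row leverage scores of a tall matrix A: the squared row norms of
    # an orthonormal basis Q for A's column space.
    Q, _ = np.linalg.qr(A)
    return np.sum(Q ** 2, axis=1)

def sampled_lstsq(A, b, num_samples, rng):
    # Draw rows i.i.d. with probability proportional to their leverage
    # scores, rescale the sampled rows so the sketched normal equations
    # are unbiased, then solve the much smaller least-squares problem.
    scores = leverage_scores(A)
    p = scores / scores.sum()
    idx = rng.choice(A.shape[0], size=num_samples, p=p)
    w = 1.0 / np.sqrt(num_samples * p[idx])  # rescaling weights
    x, *_ = np.linalg.lstsq(A[idx] * w[:, None], b[idx] * w, rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((10_000, 20))  # tall design matrix
b = A @ rng.standard_normal(20) + 0.01 * rng.standard_normal(10_000)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_approx = sampled_lstsq(A, b, num_samples=500, rng=rng)
```

    In the paper's setting the design matrix is a Khatri-Rao product of factor matrices and the scores are sampled without materializing an explicit orthonormal basis; here a dense QR suffices to show the sampling-and-rescaling pattern.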


    Published In

    SPAA '24: Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures
    June 2024
    510 pages
    ISBN: 9798400704161
    DOI: 10.1145/3626183
    This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. CP decomposition
    2. leverage score sampling
    3. randomized linear algebra
    4. sparse tensors

    Acceptance Rates

    Overall acceptance rate: 447 of 1,461 submissions (31%)
