DOI: 10.1145/3205289.3205315
On Optimizing Distributed Tucker Decomposition for Sparse Tensors

Published: 12 June 2018

Abstract

The Tucker decomposition generalizes the Singular Value Decomposition (SVD) to tensors, the higher-dimensional analogues of matrices. We study the problem of constructing the Tucker decomposition of sparse tensors on distributed-memory systems via HOOI, a popular iterative procedure. The scheme used for distributing the input tensor among the processors (MPI ranks) critically influences the HOOI execution time. Prior work has proposed different distribution schemes: an offline scheme based on sophisticated hypergraph partitioning, and simple, lightweight alternatives that can be used in real time. While the hypergraph-based scheme typically results in faster HOOI execution, its complexity means that the time taken to determine the distribution is an order of magnitude higher than the execution time of a single HOOI iteration. Our main contribution is a lightweight distribution scheme that achieves the best of both worlds. We show that the scheme is near-optimal on certain fundamental metrics associated with the HOOI procedure and, as a result, near-optimal on the computational load (FLOPs). Though the scheme may incur higher communication volume, computation time is the dominant factor, so the scheme achieves better overall HOOI execution time. Our experimental evaluation on large real-life tensors (with up to 4 billion elements) shows that the scheme outperforms the prior schemes on HOOI execution time by a factor of up to 3x. Meanwhile, its distribution time is comparable to that of the prior lightweight schemes and is typically less than the execution time of a single HOOI iteration.
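For readers unfamiliar with the HOOI procedure the abstract refers to, the following is a minimal illustrative sketch in numpy (dense, single-node, and hypothetical — not the paper's distributed sparse implementation). Each HOOI iteration updates one factor matrix at a time by projecting the tensor onto the other modes' factors and taking leading singular vectors of the resulting unfolding:

```python
import numpy as np

def unfold(T, mode):
    # Mode-n matricization: bring `mode` to the front, flatten the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def ttm_all_but(T, factors, skip):
    # Multiply T by each factor's transpose along every mode except `skip`.
    Y = T
    for m, U in enumerate(factors):
        if m == skip:
            continue
        # Contract U^T (r_m x I_m) with mode m of Y, then restore mode order.
        Y = np.moveaxis(np.tensordot(U.T, Y, axes=(1, m)), 0, m)
    return Y

def hooi(T, ranks, iters=10):
    # Initialize factors via HOSVD: leading left singular vectors per mode.
    factors = [np.linalg.svd(unfold(T, n))[0][:, :r]
               for n, r in enumerate(ranks)]
    for _ in range(iters):
        for n in range(T.ndim):
            # Project onto all other factors, then update mode n's factor.
            Y = ttm_all_but(T, factors, skip=n)
            factors[n] = np.linalg.svd(unfold(Y, n))[0][:, :ranks[n]]
    # Core tensor: project T onto all factors.
    core = ttm_all_but(T, factors, skip=-1)
    return core, factors
```

In the distributed setting the paper studies, the costly steps are the tensor-times-matrix chain (`ttm_all_but`) and the per-mode SVDs, which is why the distribution of the sparse tensor's nonzeros across MPI ranks dominates performance.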




    Published In

    ICS '18: Proceedings of the 2018 International Conference on Supercomputing
    June 2018
    407 pages
    ISBN:9781450357838
    DOI:10.1145/3205289


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Tensor decompositions
    2. tensor distribution schemes

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

ICS '18

    Acceptance Rates

    Overall Acceptance Rate 629 of 2,180 submissions, 29%


    Article Metrics

    • Downloads (Last 12 months)34
    • Downloads (Last 6 weeks)3
    Reflects downloads up to 03 Feb 2025


    Cited By

    • (2023)A Survey of Next-generation Computing Technologies in Space-air-ground Integrated NetworksACM Computing Surveys10.1145/360601856:1(1-40)Online publication date: 28-Aug-2023
    • (2023)A Survey of Accelerating Parallel Sparse Linear AlgebraACM Computing Surveys10.1145/360460656:1(1-38)Online publication date: 28-Aug-2023
    • (2023)Machine Unlearning: A SurveyACM Computing Surveys10.1145/360362056:1(1-36)Online publication date: 28-Aug-2023
    • (2023)The Evolution of Distributed Systems for Graph Neural Networks and Their Origin in Graph Processing and Deep Learning: A SurveyACM Computing Surveys10.1145/359742856:1(1-37)Online publication date: 28-Aug-2023
    • (2023)Security Aspects of Cryptocurrency Wallets—A Systematic Literature ReviewACM Computing Surveys10.1145/359690656:1(1-31)Online publication date: 28-Aug-2023
    • (2023)A Taxonomy and Analysis of Misbehaviour Detection in Cooperative Intelligent Transport Systems: A Systematic ReviewACM Computing Surveys10.1145/359659856:1(1-38)Online publication date: 28-Aug-2023
    • (2023)Performance Implication of Tensor Irregularity and Optimization for Distributed Tensor DecompositionACM Transactions on Parallel Computing10.1145/358031510:2(1-27)Online publication date: 20-Jun-2023
    • (2023)Distributed non-negative RESCAL with automatic model selection for exascale dataJournal of Parallel and Distributed Computing10.1016/j.jpdc.2023.04.010179(104709)Online publication date: Sep-2023
    • (2023)Analysis of mobility patterns for urban taxi ridership: the role of the built environmentTransportation10.1007/s11116-023-10372-651:4(1409-1431)Online publication date: 22-Feb-2023
    • (2022)GSpTC: High-Performance Sparse Tensor Contraction on CPU-GPU Heterogeneous Systems2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys)10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00080(380-387)Online publication date: Dec-2022
