cuFastTucker: A Novel Sparse FastTucker Decomposition For HHLST on Multi-GPUs

Published: 08 June 2024

Abstract

    High-order, high-dimension, and large-scale sparse tensors (HHLST) arise in many real industrial applications, such as social networks, recommender systems, bioinformatics, and traffic information. To handle these complex tensors, sparse tensor decomposition techniques project the HHLST into a low-rank space. In this article, we propose a novel sparse tensor decomposition model, Sparse FastTucker Decomposition (SFTD), a variant of Sparse Tucker Decomposition (STD). SFTD applies a Kruskal approximation to the core tensor, and we present a theorem showing that this reduces the space and computational overhead from exponential to polynomial in the tensor order. Additionally, we reduce the space overhead of intermediate parameters by sampling the intermediate matrices, and the method is guaranteed to converge. To accelerate SFTD, we exploit the compactness of matrix multiplication and parallel memory access through a stochastic strategy, yielding the GPU-accelerated cuFastTucker. Moreover, we propose a data-division and communication strategy that allows cuFastTucker to scale across multi-GPU setups. cuFastTucker achieves faster computation and convergence, together with significantly lower space and computational overhead, than state-of-the-art (SOTA) algorithms such as P-Tucker, Vest, GTA, Bigtensor, and SGD_Tucker.
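
    As a concrete illustration of the central idea (using standard Tucker notation, which may differ from the paper's), an order-N Tucker model writes a tensor as

    \[
    \mathcal{X} \;\approx\; \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)},
    \qquad A^{(n)} \in \mathbb{R}^{I_n \times J_n},\quad
    \mathcal{G} \in \mathbb{R}^{J_1 \times \cdots \times J_N},
    \]

    so the core tensor \(\mathcal{G}\) alone has \(\prod_n J_n\) entries, which grows exponentially with the order \(N\). A Kruskal (CP) approximation of the core with rank \(R\) and factors \(B^{(n)} \in \mathbb{R}^{J_n \times R}\) sets \(g_{j_1 \cdots j_N} \approx \sum_{r=1}^{R} \prod_{n=1}^{N} b^{(n)}_{j_n r}\), under which each tensor entry collapses to

    \[
    x_{i_1 \cdots i_N} \;\approx\; \sum_{r=1}^{R} \prod_{n=1}^{N} \Big( A^{(n)}_{i_n,:}\, B^{(n)} \Big)_r,
    \]

    which costs \(O(R \sum_n J_n)\) per nonzero and never materializes the dense core, consistent with the exponential-to-polynomial reduction claimed above.

    The following minimal NumPy sketch evaluates this approximation at a single nonzero. It only illustrates the model; the function and variable names are our own, and it is not the authors' CUDA implementation.

    ```python
    import numpy as np

    def approx_entry(index, A, B):
        """Approximate x[i_1, ..., i_N] under a Tucker model with a
        Kruskal-factorized core: sum over r of prod over n of
        (A[n][i_n, :] @ B[n])[r]. A[n] is the I_n x J_n factor matrix;
        B[n] is the J_n x R core factor."""
        R = B[0].shape[1]
        acc = np.ones(R)                # running product over modes, per rank r
        for n, i_n in enumerate(index):
            acc *= A[n][i_n, :] @ B[n]  # length-R vector for mode n
        return acc.sum()                # sum over the R Kruskal components

    # Toy order-3 example: dimensions (5, 6, 7), J_n = 4, Kruskal rank R = 3.
    rng = np.random.default_rng(0)
    dims, J, R = (5, 6, 7), 4, 3
    A = [rng.standard_normal((I, J)) for I in dims]
    B = [rng.standard_normal((J, R)) for _ in dims]
    print(approx_entry((2, 4, 1), A, B))
    ```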

    Published In

    cover image ACM Transactions on Parallel Computing
    ACM Transactions on Parallel Computing  Volume 11, Issue 2
    June 2024
    164 pages
    ISSN:2329-4949
    EISSN:2329-4957
    DOI:10.1145/3613599
    • Editor:
    • David A. Bader
    Issue’s Table of Contents

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 08 June 2024
    Online AM: 30 April 2024
    Accepted: 18 April 2024
    Revised: 23 February 2024
    Received: 26 May 2023
    Published in TOPC Volume 11, Issue 2

    Author Tags

    1. GPU CUDA parallelization
    2. Kruskal approximation
    3. sparse Tucker decomposition
    4. stochastic strategy

    Qualifiers

    • Research-article

    Funding Sources

    • National Key R&D Program of China
    • Key Program of National Natural Science Foundation of China
    • National Natural Science Foundation of China
