cuFastTucker: A Novel Sparse FastTucker Decomposition For HHLST on Multi-GPUs

Published: 08 June 2024

Abstract

    High-order, high-dimension, and large-scale sparse tensors (HHLST) arise in many real industrial applications, such as social networks, recommender systems, bioinformatics, and traffic information. To handle these complex tensors, sparse tensor decomposition techniques project the HHLST into a low-rank space. In this article, we propose a novel sparse tensor decomposition model, Sparse FastTucker Decomposition (SFTD), a variant of Sparse Tucker Decomposition (STD). SFTD applies a Kruskal approximation to the core tensor, and we present a theorem showing that this reduces the space and computational overhead from exponential to polynomial in the tensor order. Additionally, we reduce the space overhead of intermediate parameters by sampling the intermediate matrices, and the method is guaranteed to converge. To accelerate SFTD, we exploit the compactness of matrix multiplication and parallel memory access through a stochastic strategy, yielding the GPU-accelerated cuFastTucker. Moreover, we propose a data-division and communication strategy that allows cuFastTucker to scale across multi-GPU setups. cuFastTucker achieves faster computation and convergence, together with significantly lower space and computational overhead, than state-of-the-art (SOTA) algorithms such as P-Tucker, Vest, GTA, Bigtensor, and SGD_Tucker.
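
    As a concrete illustration of the central idea (using standard Tucker notation, which may differ from the paper's), an order-N Tucker model writes a tensor as

    \[
    \mathcal{X} \;\approx\; \mathcal{G} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_N A^{(N)},
    \qquad A^{(n)} \in \mathbb{R}^{I_n \times J_n},\quad
    \mathcal{G} \in \mathbb{R}^{J_1 \times \cdots \times J_N},
    \]

    so the core tensor \(\mathcal{G}\) alone has \(\prod_n J_n\) entries, which grows exponentially with the order \(N\). A Kruskal (CP) approximation of the core with rank \(R\) and factors \(B^{(n)} \in \mathbb{R}^{J_n \times R}\) sets \(g_{j_1 \cdots j_N} \approx \sum_{r=1}^{R} \prod_{n=1}^{N} b^{(n)}_{j_n r}\), under which each tensor entry collapses to

    \[
    x_{i_1 \cdots i_N} \;\approx\; \sum_{r=1}^{R} \prod_{n=1}^{N} \Big( A^{(n)}_{i_n,:}\, B^{(n)} \Big)_r,
    \]

    which costs \(O(R \sum_n J_n)\) per nonzero and never materializes the dense core, consistent with the exponential-to-polynomial reduction claimed above.

    The following minimal NumPy sketch evaluates this approximation at a single nonzero. It only illustrates the model; the function and variable names are our own, and it is not the authors' CUDA implementation.

    ```python
    import numpy as np

    def approx_entry(index, A, B):
        """Approximate x[i_1, ..., i_N] under a Tucker model with a
        Kruskal-factorized core: sum over r of prod over n of
        (A[n][i_n, :] @ B[n])[r]. A[n] is the I_n x J_n factor matrix;
        B[n] is the J_n x R core factor."""
        R = B[0].shape[1]
        acc = np.ones(R)                # running product over modes, per rank r
        for n, i_n in enumerate(index):
            acc *= A[n][i_n, :] @ B[n]  # length-R vector for mode n
        return acc.sum()                # sum over the R Kruskal components

    # Toy order-3 example: dimensions (5, 6, 7), J_n = 4, Kruskal rank R = 3.
    rng = np.random.default_rng(0)
    dims, J, R = (5, 6, 7), 4, 3
    A = [rng.standard_normal((I, J)) for I in dims]
    B = [rng.standard_normal((J, R)) for _ in dims]
    print(approx_entry((2, 4, 1), A, B))
    ```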

    Published In

    cover image ACM Transactions on Parallel Computing
    ACM Transactions on Parallel Computing  Volume 11, Issue 2
    June 2024
    164 pages
    ISSN:2329-4949
    EISSN:2329-4957
    DOI:10.1145/3613599
    • Editor:
    • David A. Bader
    Issue’s Table of Contents

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 08 June 2024
    Online AM: 30 April 2024
    Accepted: 18 April 2024
    Revised: 23 February 2024
    Received: 26 May 2023
    Published in TOPC Volume 11, Issue 2

    Author Tags

    1. GPU CUDA parallelization
    2. Kruskal approximation
    3. sparse Tucker decomposition
    4. stochastic strategy

    Qualifiers

    • Research-article

    Funding Sources

    • National Key R&D Program of China
    • Key Program of National Natural Science Foundation of China
    • National Natural Science Foundation of China
