SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

Published: 04 April 2017

Abstract

Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Applications require LDA to handle both large datasets and large numbers of topics. Though distributed CPU systems have been used, GPU-based systems have emerged as a promising alternative because of the high computational power and memory bandwidth of GPUs. However, existing GPU-based LDA systems cannot support a large number of topics because they use algorithms on dense data structures whose time and space complexity is linear in the number of topics.
In this paper, we propose SaberLDA, a GPU-based LDA system that implements a sparsity-aware algorithm to achieve sublinear time complexity and scales well to learn a large number of topics. To address the challenges introduced by sparsity, we propose a novel data layout, a new warp-based sampling kernel, and an efficient sparse count matrix updating algorithm that together improve locality, make efficient use of GPU warps, and reduce memory consumption. Experiments show that SaberLDA can learn from datasets with billions of tokens with up to 10,000 topics, almost two orders of magnitude more than previous GPU-based systems. With a single GPU card, SaberLDA learns 10,000 topics from a dataset of billions of tokens in a few hours, a task that was previously achievable only with clusters of tens of machines.
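The abstract only names the techniques, but the sublinear-complexity claim is easy to illustrate. Below is a minimal, hypothetical CUDA sketch (not SaberLDA's actual kernel) of warp-based sampling over sparse counts: one warp cooperatively normalizes p(k) ∝ n_dk · φ_wk over only the nonzero topic counts n_dk of a document and then draws one topic, so the per-token cost is O(nnz) rather than O(K). All identifiers and the memory layout are assumptions for illustration; the dense smoothing term from the Dirichlet prior, which real sparsity-aware samplers handle separately, is omitted.

```cuda
// Hypothetical sketch of a warp-based sparse sampling kernel (names and
// layout are illustrative assumptions, not SaberLDA's actual code).
// One warp draws a topic for one token from p(k) ∝ n_dk * phi_wk, touching
// only the document's nonzero topic counts, so cost is O(nnz), not O(K).
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

__global__ void sample_sparse_topic(const int *topic_ids,    // nonzero topics of the document
                                    const float *topic_cnts, // their counts n_dk
                                    int nnz,                 // number of nonzero topics
                                    const float *phi_w,      // phi row for this word (K entries)
                                    unsigned long long seed,
                                    int *out_topic) {
    int lane = threadIdx.x & 31;

    // Each lane accumulates part of the normalizer Z = sum_k n_dk * phi_wk.
    float partial = 0.f;
    for (int i = lane; i < nnz; i += 32)
        partial += topic_cnts[i] * phi_w[topic_ids[i]];

    // Warp-wide tree reduction; lane 0 ends up with the full sum Z.
    for (int off = 16; off > 0; off >>= 1)
        partial += __shfl_down_sync(0xffffffff, partial, off);

    if (lane == 0) {
        float Z = partial;
        curandState st;
        curand_init(seed, blockIdx.x, 0, &st);
        float u = curand_uniform(&st) * Z;   // u ~ Uniform(0, Z]
        int k = topic_ids[nnz - 1];          // fallback for rounding error
        for (int i = 0; i < nnz; ++i) {      // one pass over the nonzeros
            u -= topic_cnts[i] * phi_w[topic_ids[i]];
            if (u <= 0.f) { k = topic_ids[i]; break; }
        }
        *out_topic = k;
    }
}

int main() {
    const int nnz = 4, K = 8;  // toy document: 4 of 8 topics active
    int h_ids[] = {1, 3, 4, 6};
    float h_cnts[] = {2.f, 1.f, 5.f, 1.f};
    float h_phi[] = {.1f, .2f, .1f, .1f, .3f, .05f, .1f, .05f};

    int *d_ids, *d_out;
    float *d_cnts, *d_phi;
    cudaMalloc(&d_ids, nnz * sizeof(int));
    cudaMalloc(&d_cnts, nnz * sizeof(float));
    cudaMalloc(&d_phi, K * sizeof(float));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_ids, h_ids, nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_cnts, h_cnts, nnz * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_phi, h_phi, K * sizeof(float), cudaMemcpyHostToDevice);

    sample_sparse_topic<<<1, 32>>>(d_ids, d_cnts, nnz, d_phi, 42ULL, d_out);
    int k;
    cudaMemcpy(&k, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("sampled topic: %d\n", k);
    return 0;
}
```

Because the kernel never touches zero entries of the count matrix, its cost per token is independent of the total topic count K; this independence is what lets a sparsity-aware sampler scale to 10,000 topics where dense GPU samplers, whose work grows linearly in K, cannot.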



Published In

ACM SIGPLAN Notices, Volume 52, Issue 4 (ASPLOS '17)
April 2017, 811 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3093336

ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
April 2017, 856 pages
ISBN: 9781450344654
DOI: 10.1145/3037697
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 April 2017
Published in SIGPLAN Volume 52, Issue 4


Author Tags

  1. gpu
  2. lda
  3. parallel computing
  4. topic model

Qualifiers

  • Research-article


Article Metrics

  • Downloads (Last 12 months)16
  • Downloads (Last 6 weeks)1
Reflects downloads up to 27 Jan 2025

Cited By
  • (2023) Tree-based data filtering for online user-generated reviews. IISE Transactions 56(8), 824-840. DOI: 10.1080/24725854.2023.2228861. Online publication date: Aug-2023.
  • (2022) Acceptable set topic modeling. European Journal of Operational Research 299(2), 653-673. DOI: 10.1016/j.ejor.2021.11.024. Online publication date: Jun-2022.
  • (2022) Training personalized recommendation systems from (GPU) scratch. Proceedings of the 49th Annual International Symposium on Computer Architecture, 860-873. DOI: 10.1145/3470496.3527386. Online publication date: 18-Jun-2022.
  • (2022) CuWide: Towards Efficient Flow-Based Training for Sparse Wide Models on GPUs. IEEE Transactions on Knowledge and Data Engineering 34(9), 4119-4132. DOI: 10.1109/TKDE.2020.3038109. Online publication date: 1-Sep-2022.
  • (2019) CuLDA. Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, 195-205. DOI: 10.1145/3307681.3325407. Online publication date: 17-Jun-2019.
  • (2019) CuLDA_CGS. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, 435-436. DOI: 10.1145/3293883.3301496. Online publication date: 16-Feb-2019.
  • (2018) Probabilistic machine learning. Proceedings of the 27th International Joint Conference on Artificial Intelligence, 5754-5759. DOI: 10.5555/3304652.3304839. Online publication date: 13-Jul-2018.
  • (2018) Scalable training of hierarchical topic models. Proceedings of the VLDB Endowment 11(7), 826-839. DOI: 10.14778/3192965.3192972. Online publication date: 1-Mar-2018.
  • (2018) Pólya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-1. DOI: 10.1109/TPAMI.2018.2832641. Online publication date: 2018.
  • (2018) Sampled Dense Matrix Multiplication for High-Performance Machine Learning. 2018 IEEE 25th International Conference on High Performance Computing (HiPC), 32-41. DOI: 10.1109/HiPC.2018.00013. Online publication date: Dec-2018.
