SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs

Published: 04 April 2017

Abstract

Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images. Applications require LDA to handle both large datasets and large numbers of topics. Though distributed CPU systems have been used, GPU-based systems have emerged as a promising alternative because of the high computational power and memory bandwidth of GPUs. However, existing GPU-based LDA systems cannot support a large number of topics because they use algorithms on dense data structures whose time and space complexity is linear in the number of topics.
In this paper, we propose SaberLDA, a GPU-based LDA system that implements a sparsity-aware algorithm to achieve sublinear time complexity and scales well to learn a large number of topics. To address the challenges introduced by sparsity, we propose a novel data layout, a new warp-based sampling kernel, and an efficient sparse count matrix updating algorithm that together improve locality, make efficient use of GPU warps, and reduce memory consumption. Experiments show that SaberLDA can learn from datasets with billions of tokens with up to 10,000 topics, almost two orders of magnitude more than previous GPU-based systems. With a single GPU card, SaberLDA learns 10,000 topics from a dataset of billions of tokens in a few hours, a task that was previously achievable only with clusters of tens of machines.
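The abstract only names the techniques, but the sublinear-complexity claim is easy to illustrate. Below is a minimal, hypothetical CUDA sketch (not SaberLDA's actual kernel) of warp-based sampling over sparse counts: one warp cooperatively normalizes p(k) ∝ n_dk · φ_wk over only the nonzero topic counts n_dk of a document and then draws one topic, so the per-token cost is O(nnz) rather than O(K). All identifiers and the memory layout are assumptions for illustration; the dense smoothing term from the Dirichlet prior, which real sparsity-aware samplers handle separately, is omitted.

```cuda
// Hypothetical sketch of a warp-based sparse sampling kernel (names and
// layout are illustrative assumptions, not SaberLDA's actual code).
// One warp draws a topic for one token from p(k) ∝ n_dk * phi_wk, touching
// only the document's nonzero topic counts, so cost is O(nnz), not O(K).
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

__global__ void sample_sparse_topic(const int *topic_ids,    // nonzero topics of the document
                                    const float *topic_cnts, // their counts n_dk
                                    int nnz,                 // number of nonzero topics
                                    const float *phi_w,      // phi row for this word (K entries)
                                    unsigned long long seed,
                                    int *out_topic) {
    int lane = threadIdx.x & 31;

    // Each lane accumulates part of the normalizer Z = sum_k n_dk * phi_wk.
    float partial = 0.f;
    for (int i = lane; i < nnz; i += 32)
        partial += topic_cnts[i] * phi_w[topic_ids[i]];

    // Warp-wide tree reduction; lane 0 ends up with the full sum Z.
    for (int off = 16; off > 0; off >>= 1)
        partial += __shfl_down_sync(0xffffffff, partial, off);

    if (lane == 0) {
        float Z = partial;
        curandState st;
        curand_init(seed, blockIdx.x, 0, &st);
        float u = curand_uniform(&st) * Z;   // u ~ Uniform(0, Z]
        int k = topic_ids[nnz - 1];          // fallback for rounding error
        for (int i = 0; i < nnz; ++i) {      // one pass over the nonzeros
            u -= topic_cnts[i] * phi_w[topic_ids[i]];
            if (u <= 0.f) { k = topic_ids[i]; break; }
        }
        *out_topic = k;
    }
}

int main() {
    const int nnz = 4, K = 8;  // toy document: 4 of 8 topics active
    int h_ids[] = {1, 3, 4, 6};
    float h_cnts[] = {2.f, 1.f, 5.f, 1.f};
    float h_phi[] = {.1f, .2f, .1f, .1f, .3f, .05f, .1f, .05f};

    int *d_ids, *d_out;
    float *d_cnts, *d_phi;
    cudaMalloc(&d_ids, nnz * sizeof(int));
    cudaMalloc(&d_cnts, nnz * sizeof(float));
    cudaMalloc(&d_phi, K * sizeof(float));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_ids, h_ids, nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_cnts, h_cnts, nnz * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_phi, h_phi, K * sizeof(float), cudaMemcpyHostToDevice);

    sample_sparse_topic<<<1, 32>>>(d_ids, d_cnts, nnz, d_phi, 42ULL, d_out);
    int k;
    cudaMemcpy(&k, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("sampled topic: %d\n", k);
    return 0;
}
```

Because the kernel never touches zero entries of the count matrix, its cost per token is independent of the total topic count K; this independence is what lets a sparsity-aware sampler scale to 10,000 topics where dense GPU samplers, whose work grows linearly in K, cannot.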



Published In

ACM SIGPLAN Notices, Volume 52, Issue 4 (ASPLOS '17)
April 2017, 811 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3093336

ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
April 2017, 856 pages
ISBN: 9781450344654
DOI: 10.1145/3037697
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 April 2017
Published in SIGPLAN Volume 52, Issue 4


Author Tags

  1. gpu
  2. lda
  3. parallel computing
  4. topic model

Qualifiers

  • Research-article


Article Metrics

  • Downloads (Last 12 months)16
  • Downloads (Last 6 weeks)1
Reflects downloads up to 27 Jan 2025

Cited By
  • (2023) Tree-based data filtering for online user-generated reviews. IISE Transactions 56(8), 824-840. DOI: 10.1080/24725854.2023.2228861. Online publication date: Aug-2023.
  • (2022) Acceptable set topic modeling. European Journal of Operational Research 299(2), 653-673. DOI: 10.1016/j.ejor.2021.11.024. Online publication date: Jun-2022.
  • (2022) Training personalized recommendation systems from (GPU) scratch. Proceedings of the 49th Annual International Symposium on Computer Architecture, 860-873. DOI: 10.1145/3470496.3527386. Online publication date: 18-Jun-2022.
  • (2022) CuWide: Towards Efficient Flow-Based Training for Sparse Wide Models on GPUs. IEEE Transactions on Knowledge and Data Engineering 34(9), 4119-4132. DOI: 10.1109/TKDE.2020.3038109. Online publication date: 1-Sep-2022.
  • (2019) CuLDA. Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, 195-205. DOI: 10.1145/3307681.3325407. Online publication date: 17-Jun-2019.
  • (2019) CuLDA_CGS. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, 435-436. DOI: 10.1145/3293883.3301496. Online publication date: 16-Feb-2019.
  • (2018) Probabilistic machine learning. Proceedings of the 27th International Joint Conference on Artificial Intelligence, 5754-5759. DOI: 10.5555/3304652.3304839. Online publication date: 13-Jul-2018.
  • (2018) Scalable training of hierarchical topic models. Proceedings of the VLDB Endowment 11(7), 826-839. DOI: 10.14778/3192965.3192972. Online publication date: 1-Mar-2018.
  • (2018) Pólya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-1. DOI: 10.1109/TPAMI.2018.2832641. Online publication date: 2018.
  • (2018) Sampled Dense Matrix Multiplication for High-Performance Machine Learning. 2018 IEEE 25th International Conference on High Performance Computing (HiPC), 32-41. DOI: 10.1109/HiPC.2018.00013. Online publication date: Dec-2018.
