Article
DOI: 10.5555/2832415.2832564

Short and sparse text topic modeling via self-aggregation

Published: 25 July 2015

Abstract

The overwhelming amount of short text data on social media and elsewhere poses great challenges to topic modeling because of the sparsity problem. Most existing attempts to alleviate this problem resort to heuristic strategies that aggregate short texts into pseudo-documents before applying standard topic modeling. Although such strategies do not generalize well to other genres of short text, their success has shed light on how to develop a general solution. In this paper, we present a novel model towards this goal by integrating topic modeling with short text aggregation during topic inference. The aggregation is founded on the general topical affinity of texts rather than on particular heuristics, making the model readily applicable to various short texts. Experimental results on real-world datasets validate the effectiveness of this new model, suggesting that it can distill more meaningful topics from short texts.
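As a rough illustration of the heuristic aggregation that the abstract contrasts with (not the model proposed in the paper), the sketch below pools short texts that share a key into pseudo-documents. The function name `aggregate_by_key`, the choice of author as the pooling key, and the sample texts are all assumptions made for this example.

```python
from collections import defaultdict


def aggregate_by_key(texts, keys):
    """Pool the tokens of short texts that share a key into one pseudo-document.

    This is the heuristic baseline idea (e.g. pooling by author or hashtag),
    not the self-aggregation model described in the abstract.
    """
    groups = defaultdict(list)
    for text, key in zip(texts, keys):
        # Naive whitespace tokenization; a real pipeline would do more.
        groups[key].extend(text.lower().split())
    return dict(groups)


# Hypothetical sample data.
tweets = [
    "Topic models need data",
    "Short texts are sparse",
    "LDA needs word co-occurrence",
]
authors = ["alice", "bob", "alice"]

pseudo_docs = aggregate_by_key(tweets, authors)
```

Each resulting pseudo-document is longer than any single short text, so a standard topic model run on `pseudo_docs` sees more word co-occurrence. The drawback noted in the abstract is that a pooling key such as author is not available for every genre of short text.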


Published In

IJCAI'15: Proceedings of the 24th International Conference on Artificial Intelligence
July 2015
4429 pages
ISBN: 9781577357384

Sponsors

  • The International Joint Conferences on Artificial Intelligence, Inc. (IJCAI)

Publisher

AAAI Press

Qualifiers

  • Article

Cited By

  • (2024) See, caption, cluster. Expert Systems with Applications: An International Journal, 237:PB. DOI: 10.1016/j.eswa.2023.121391. Online publication date: 1-Feb-2024.
  • (2021) BATS: A Spectral Biclustering Approach to Single Document Topic Modeling and Segmentation. ACM Transactions on Intelligent Systems and Technology, 12:5 (1-29). DOI: 10.1145/3468268. Online publication date: 15-Oct-2021.
  • (2021) Topic Modeling Using Latent Dirichlet allocation. ACM Computing Surveys, 54:7 (1-35). DOI: 10.1145/3462478. Online publication date: 17-Sep-2021.
  • (2019) Heterogeneous-Length Text Topic Modeling for Reader-Aware Multi-Document Summarization. ACM Transactions on Knowledge Discovery from Data, 13:4 (1-21). DOI: 10.1145/3333030. Online publication date: 8-Aug-2019.
  • (2019) Quantifying and Visualizing the Demand and Supply Gap from E-commerce Search Data using Topic Models. Companion Proceedings of The 2019 World Wide Web Conference (348-353). DOI: 10.1145/3308560.3316605. Online publication date: 13-May-2019.
  • (2019) Where to Post. Proceedings of the ACM India Joint International Conference on Data Science and Management of Data (136-142). DOI: 10.1145/3297001.3297018. Online publication date: 3-Jan-2019.
  • (2019) Topic Modelling for Identification of Vaccine Reactions in Twitter. Proceedings of the Australasian Computer Science Week Multiconference (1-10). DOI: 10.1145/3290688.3290735. Online publication date: 29-Jan-2019.
  • (2018) A joint model of conversational discourse and latent topics on microblogs. Computational Linguistics, 44:4 (719-754). DOI: 10.1162/coli_a_00335. Online publication date: 1-Dec-2018.
  • (2018) Insights from the Long-Tail. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (297-306). DOI: 10.1145/3269206.3271706. Online publication date: 17-Oct-2018.
  • (2018) Short-Text Topic Modeling via Non-negative Matrix Factorization Enriched with Local Word-Context Correlations. Proceedings of the 2018 World Wide Web Conference (1105-1114). DOI: 10.1145/3178876.3186009. Online publication date: 10-Apr-2018.
