DOI: 10.5555/1699510.1699543

Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora

Published: 06 August 2009

Abstract

A significant portion of the world's text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags and vice versa. This paper introduces Labeled LDA, a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one correspondence between LDA's latent topics and user tags. This allows Labeled LDA to directly learn word-tag correspondences. We demonstrate Labeled LDA's improved expressiveness over traditional LDA with visualizations of a corpus of tagged web pages from del.icio.us. Labeled LDA outperforms SVMs by more than 3 to 1 when extracting tag-specific document snippets. As a multi-label text classifier, our model is competitive with a discriminative baseline on a variety of datasets.
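The core constraint described in the abstract, that each document may only use the topics corresponding to its own tags, can be illustrated with a minimal sketch. The code below assumes a collapsed Gibbs sampler of the kind commonly used for LDA, restricted so that every word is assigned to one of its document's labels. The function name `labeled_lda_gibbs`, the symmetric hyperparameters `alpha` and `beta`, and the toy corpus are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict


def labeled_lda_gibbs(docs, labels, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler with topics restricted to each document's labels.

    docs:   list of token lists
    labels: list of non-empty label lists, one per document
    Returns (z, label_word), where z[d][i] is the label assigned to token i of
    document d and label_word[label][word] is a count table.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    label_word = defaultdict(lambda: defaultdict(int))  # count of word under label
    label_total = defaultdict(int)                      # total tokens under label
    doc_label = [defaultdict(int) for _ in docs]        # tokens of doc under label

    # Random initialisation: each token gets one of its document's own labels.
    z = []
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.choice(labels[d])
            assignments.append(k)
            label_word[k][w] += 1
            label_total[k] += 1
            doc_label[d][k] += 1
        z.append(assignments)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from all counts.
                k = z[d][i]
                label_word[k][w] -= 1
                label_total[k] -= 1
                doc_label[d][k] -= 1

                # Score only the labels attached to this document.
                weights = [
                    (doc_label[d][c] + alpha)
                    * (label_word[c][w] + beta)
                    / (label_total[c] + vocab_size * beta)
                    for c in labels[d]
                ]
                k = rng.choices(labels[d], weights=weights)[0]

                # Add the token back under its newly sampled label.
                z[d][i] = k
                label_word[k][w] += 1
                label_total[k] += 1
                doc_label[d][k] += 1

    return z, label_word


# Toy corpus (made up for illustration): one mixed document, two single-label ones.
docs = [
    ["python", "code", "recipe", "flour"],
    ["code", "python", "compiler"],
    ["flour", "oven", "recipe"],
]
labels = [["programming", "cooking"], ["programming"], ["cooking"]]
z, label_word = labeled_lda_gibbs(docs, labels)
for doc, assignments in zip(docs, z):
    print(list(zip(doc, assignments)))
```

Because sampling is restricted to `labels[d]`, the per-word assignments in `z` can be read directly as a word-to-tag attribution for each document, which is the kind of output the credit attribution problem asks for. Single-label documents are trivially assigned to their one label; the interesting case is the mixed document, whose assignments depend on the counts learned from the rest of the corpus.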

Published In

EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1
August 2009
505 pages
ISBN: 9781932432596

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 73 of 234 submissions, 31%
