DOI: 10.1145/3077136.3080781
research-article
Public Access

Accounting for the Correspondence in Commented Data

Published: 07 August 2017

Abstract

One important way for people to make their voices heard is to comment on the articles they read online, such as news reports and each other's posts. The user-generated comments, together with the commented documents, form a unique correspondence structure. Properly modeling the dependency in such data is thus vital for obtaining accurate insight into people's opinions and attention.
In this work, we develop a Commented Correspondence Topic Model to model correspondence in commented text data. We focus on two levels of correspondence. First, to capture topic-level correspondence, we treat the topic assignments in commented documents as the prior for their comments' topic proportions. This captures the thematic dependency between commented documents and their comments. Second, to capture word-level correspondence, we use the Dirichlet compound multinomial distribution to model topics. This captures the word repetition patterns within the commented data. By integrating these two aspects, our model demonstrates encouraging performance in capturing the correspondence structure, and it improves results in modeling user-generated content, spam comment detection, and sentence-based comment retrieval compared with state-of-the-art topic model solutions for correspondence modeling.
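The two levels of correspondence described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact form of the prior or the inference procedure, so the function names, the additive form of the prior (`alpha + scale * counts`), and all hyperparameter values below are assumptions made for illustration only. The sketch shows (1) building a Dirichlet prior for a comment's topic proportions from the topic assignments of the commented document, and (2) drawing words from a Dirichlet compound multinomial (DCM), where a fresh word distribution is sampled per document so that a word seen once tends to repeat.

```python
import random
from collections import Counter


def sample_dirichlet(params, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


def comment_topic_prior(doc_topic_assignments, num_topics, alpha=0.1, scale=1.0):
    """Assumed form of the topic-level correspondence: a base alpha plus
    scaled counts of the topics assigned to the commented document's words."""
    counts = Counter(doc_topic_assignments)
    return [alpha + scale * counts.get(k, 0) for k in range(num_topics)]


def sample_dcm_words(beta, n_words, rng):
    """Draw word ids from a DCM: sample a document-specific multinomial
    from Dirichlet(beta), then draw every word from that multinomial."""
    phi = sample_dirichlet(beta, rng)
    return rng.choices(range(len(beta)), weights=phi, k=n_words)


rng = random.Random(0)

# Hypothetical topic assignments for the four words of a commented article.
doc_z = [0, 0, 0, 2]
prior = comment_topic_prior(doc_z, num_topics=3)

# The comment's topic proportions are drawn from the document-informed prior.
theta_comment = sample_dirichlet(prior, rng)

# Small beta values concentrate mass on a few words, giving repetition.
comment_words = sample_dcm_words([0.1] * 20, n_words=10, rng=rng)
```

With a small `beta`, most of the sampled word distribution's mass falls on a handful of vocabulary entries, which is the word repetition behavior that the DCM is used to capture; a plain multinomial shared across documents would not adapt per document in this way.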


Cited By

  • (2022) Graph-based Extractive Explainer for Recommendations. Proceedings of the ACM Web Conference 2022, 2163-2171. DOI: 10.1145/3485447.3512168. Online publication date: 25-Apr-2022
  • (2022) Comparative Explanations of Recommendations. Proceedings of the ACM Web Conference 2022, 3113-3123. DOI: 10.1145/3485447.3512031. Online publication date: 25-Apr-2022

Published In

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2017
1476 pages
ISBN:9781450350228
DOI:10.1145/3077136

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. social media
  2. text correspondence modeling
  3. topic models
  4. user comments

Qualifiers

  • Research-article

Conference

SIGIR '17

Acceptance Rates

SIGIR '17 Paper Acceptance Rate 78 of 362 submissions, 22%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%
