Accounting for the Correspondence in Commented Data

Authors:

Hongning Wang

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 365 - 374

https://doi.org/10.1145/3077136.3080781

Published: 07 August 2017

Abstract

One important way for people to make their voices heard is to comment on the articles they have read online, such as news reports and each other's posts. The user-generated comments, together with the commented documents, form a unique correspondence structure. Properly modeling the dependency in such data is thus vital for obtaining accurate insight into people's opinions and attention.

In this work, we develop a Commented Correspondence Topic Model to model correspondence in commented text data. We focus on two levels of correspondence. First, to capture topic-level correspondence, we treat the topic assignments in commented documents as the prior to their comments' topic proportions. This captures the thematic dependency between commented documents and their comments. Second, to capture word-level correspondence, we utilize the Dirichlet compound multinomial distribution to model topics. This captures the word repetition patterns within the commented data. By integrating these two aspects, our model demonstrates encouraging performance in capturing the correspondence structure, which yields improved results in modeling user-generated content, spam comment detection, and sentence-based comment retrieval compared with state-of-the-art topic model solutions for correspondence modeling.
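The two ingredients named above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names `dcm_log_prob` and `comment_topic_prior`, and the parameters `base_alpha` and `weight`, are hypothetical, and the way article topic counts are turned into a comment prior is an assumed parameterization, since the abstract does not give the exact form. The first function is the standard Dirichlet compound multinomial (Polya) log-probability of a word-count vector.

```python
from math import lgamma, exp


def dcm_log_prob(counts, alpha):
    """Log-probability of a word-count vector under a Dirichlet compound
    multinomial (Polya) distribution with parameter vector alpha."""
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: n! / prod(x_w!)
    log_p = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Ratio of Dirichlet normalizers after integrating out the word distribution
    log_p += lgamma(a) - lgamma(a + n)
    log_p += sum(lgamma(x + al) - lgamma(al) for x, al in zip(counts, alpha))
    return log_p


def comment_topic_prior(article_topic_counts, base_alpha=0.1, weight=1.0):
    """Assumed form: a Dirichlet prior over a comment's topic proportions,
    built by adding the article's per-topic assignment counts (scaled by
    `weight`) to a symmetric base prior `base_alpha`."""
    return [base_alpha + weight * c for c in article_topic_counts]


if __name__ == "__main__":
    alpha = [0.5, 0.5]
    # With small alpha, repeating one word is more likely than mixing two words.
    print(exp(dcm_log_prob((2, 0), alpha)), exp(dcm_log_prob((1, 1), alpha)))
    # Article with 3 tokens assigned to topic 0 and 1 token assigned to topic 2.
    print(comment_topic_prior([3, 0, 1]))
```

With small `alpha` values the count vector `(2, 0)` receives more probability than `(1, 1)`, which is the word repetition (burstiness) behavior that a plain multinomial with equal word probabilities does not show.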

Cited By

Wang P, Cai R, Wang H (2022). Graph-based Extractive Explainer for Recommendations. Proceedings of the ACM Web Conference 2022, 2163-2171. Online publication date: 25-Apr-2022.
https://dl.acm.org/doi/10.1145/3485447.3512168
Yang A, Wang N, Cai R, Deng H, Wang H (2022). Comparative Explanations of Recommendations. Proceedings of the ACM Web Conference 2022, 3113-3123. Online publication date: 25-Apr-2022.
https://dl.acm.org/doi/10.1145/3485447.3512031

Index Terms

Accounting for the Correspondence in Commented Data
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Learning in probabilistic graphical models
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Document topic models
  2. World Wide Web
    1. Web mining

Published In

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

August 2017

1476 pages

ISBN:9781450350228

DOI:10.1145/3077136

General Chairs: Noriko Kando (National Institute of Informatics), Tetsuya Sakai (Waseda University), Hideo Joho (University of Tsukuba)

Program Chairs: Hang Li (Huawei Noah's Ark Lab), Arjen P. de Vries (Radboud University), Ryen W. White (Microsoft Cortana)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

SIGIR '17

Sponsor:

SIGIR

SIGIR '17: The 40th International ACM SIGIR conference on research and development in Information Retrieval

August 7 - 11, 2017

Tokyo, Shinjuku, Japan

Acceptance Rates

SIGIR '17 Paper Acceptance Rate 78 of 362 submissions, 22%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%
