DOI: 10.1145/2983323.2983683
research-article

Digesting Multilingual Reader Comments via Latent Discussion Topics with Commonality and Specificity

Published: 24 October 2016

    Abstract

    Many news websites around the world let readers comment on an event in their own languages. Digesting such an enormous number of comments in different languages is difficult. One elegant way to digest and organize these comments is to detect latent discussion topics while taking language attributes into account. Some discussion topics are common, shared between languages, whereas others are specific, dominated by a particular language. To discover discussion topics that exhibit commonality or specificity in news reader comments written in different languages, we propose TDCS, a new model based on graphical models that copes with the language gap and detects language-common and language-specific latent discussion topics simultaneously. TDCS also exploits comment-oriented clues via a scalable Dirichlet Multinomial Regression method. To learn the model parameters, we develop an inference method that alternates between EM and Gibbs sampling. Experimental results show that the proposed TDCS model provides an effective way to digest multilingual news reader comments.
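
The distinction between language-common and language-specific topics can be illustrated with a small post-hoc heuristic. The sketch below is not the TDCS model, which the abstract describes only at a high level. It assumes that some topic model has already assigned a topic to every token, and it labels a topic as specific when one language contributes at least a chosen share of that topic's tokens. The function name `label_topics`, the `dominance` threshold, and the toy data are hypothetical and chosen only for illustration. The heuristic also ignores differences in corpus size between languages.

```python
from collections import Counter


def label_topics(assignments, dominance=0.8):
    """Label each topic as common or language-specific.

    assignments: iterable of (language, topic_id) pairs, one per token.
    dominance: assumed threshold; a topic is "specific" when a single
        language accounts for at least this share of its tokens.
    """
    per_topic = {}
    for language, topic in assignments:
        per_topic.setdefault(topic, Counter())[language] += 1

    labels = {}
    for topic, counts in per_topic.items():
        top_language, top_count = counts.most_common(1)[0]
        share = top_count / sum(counts.values())
        if share >= dominance:
            labels[topic] = f"specific:{top_language}"
        else:
            labels[topic] = "common"
    return labels


# Toy token-level assignments (made up for illustration only).
toy = [("en", 0)] * 5 + [("zh", 0)] * 5 + [("en", 1)] * 9 + [("zh", 1)] * 1
labels = label_topics(toy)
print(labels)
```

In this toy input, topic 0 is split evenly between the two languages and topic 1 is drawn mostly from English comments. A joint model such as the one the abstract proposes would instead infer commonality and specificity during learning, not by thresholding afterwards.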


          Published In

          CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
          October 2016
          2566 pages
          ISBN:9781450340731
          DOI:10.1145/2983323

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. commonality and specificity
          2. latent discussion topics
          3. multilingual news reader comments

          Qualifiers

          • Research-article

          Conference

          CIKM'16
          CIKM'16: ACM Conference on Information and Knowledge Management
          October 24 - 28, 2016
          Indianapolis, Indiana, USA

          Acceptance Rates

          CIKM '16 Paper Acceptance Rate 160 of 701 submissions, 23%;
          Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

          Bibliometrics & Citations

          Article Metrics

          • Total Citations: 0
          • Total Downloads: 107
          • Downloads (Last 12 months): 0
          • Downloads (Last 6 weeks): 0
          Reflects downloads up to 26 Jul 2024
