
Tracking dynamics of topic trends using a finite mixture model

Published: 22 August 2004

Abstract

In a wide range of business areas dealing with text data streams, including CRM, knowledge management, and Web monitoring services, discovering topic trends and analyzing their dynamics in real time is an important problem. Specifically, we consider the following three tasks in topic trend analysis: (1) topic structure identification: identifying what kinds of main topics exist and how important they are; (2) topic emergence detection: detecting the emergence of a new topic and recognizing how it grows; (3) topic characterization: identifying the characteristics of each main topic. For real topic analysis systems, these three tasks may need to be performed on-line rather than retrospectively, and handled in a single framework. This paper proposes a new topic analysis framework that satisfies this requirement from a unifying viewpoint: a topic structure is modeled using a finite mixture model, and any change in a topic trend is tracked by learning the finite mixture model dynamically. In this framework we propose the use of a time-stamp-based discounting learning algorithm to realize real-time topic structure identification. This enables the topic structure to be tracked adaptively by forgetting out-of-date statistics. Further, we apply the theory of dynamic model selection to detect changes in the main components of the finite mixture model, which realizes topic emergence detection. We demonstrate the effectiveness of our framework on real data collected at a help desk, showing that we are able to track the dynamics of topic trends in a timely fashion.
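The idea of tracking a topic structure by forgetting out-of-date statistics can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give; it is a generic exponential-forgetting update for the mixing proportions of a mixture, assuming that each incoming document already comes with a timestamp and a vector of per-topic responsibilities. The class name `DiscountedTopicWeights`, the `discount` parameter, and the decay rule `(1 - discount) ** elapsed` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DiscountedTopicWeights:
    """Time-discounted topic proportions (illustrative sketch, not the paper's method)."""

    n_topics: int
    discount: float = 0.1  # hypothetical forgetting rate per unit of time
    counts: list = field(default_factory=list)
    last_time: float = 0.0

    def __post_init__(self):
        # Start with no accumulated evidence for any topic.
        if not self.counts:
            self.counts = [0.0] * self.n_topics

    def update(self, timestamp, responsibilities):
        """Decay old statistics by the elapsed time, then add the new document."""
        elapsed = max(0.0, timestamp - self.last_time)
        decay = (1.0 - self.discount) ** elapsed
        self.counts = [
            decay * c + r for c, r in zip(self.counts, responsibilities)
        ]
        self.last_time = max(self.last_time, timestamp)

    def proportions(self):
        """Return the current estimate of the mixing proportions."""
        total = sum(self.counts)
        if total == 0.0:
            return [1.0 / self.n_topics] * self.n_topics
        return [c / total for c in self.counts]


# Usage: a document on topic 0 at time 0, then one on topic 1 at time 1.
tracker = DiscountedTopicWeights(n_topics=2, discount=0.5)
tracker.update(0.0, [1.0, 0.0])
tracker.update(1.0, [0.0, 1.0])
print(tracker.proportions())
```

With a larger `discount`, older documents lose influence faster, so the estimated proportions follow recent documents more closely; with `discount` set to 0, the update reduces to ordinary cumulative counting.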

    Information

    Published In

    KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2004
    874 pages
    ISBN:1581138881
    DOI:10.1145/1014052
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. CRM
    2. model selection
    3. text mining
    4. topic analysis

    Qualifiers

    • Article

    Conference

    KDD04

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2023) Iteratively Tracking Hot Topics on Public Opinion Based on Parallel Intelligence. IEEE Journal of Radio Frequency Identification 7, 158-162. DOI: 10.1109/JRFID.2022.3214346. Online publication date: 2023
    • (2021) Emerging Research Topic Detection Using Filtered-LDA. AI 2:4, 578-599. DOI: 10.3390/ai2040035. Online publication date: 31-Oct-2021
    • (2021) Clustering-Based Online News Topic Detection and Tracking Through Hierarchical Bayesian Nonparametric Models. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2126-2130. DOI: 10.1145/3404835.3462982. Online publication date: 11-Jul-2021
    • (2021) Automatic trend detection: Time-biased document clustering. Knowledge-Based Systems, article 106907. DOI: 10.1016/j.knosys.2021.106907. Online publication date: Mar-2021
    • (2020) A decade of Semantic Web research through the lenses of a mixed methods approach. Semantic Web 11:6, 979-1005. DOI: 10.3233/SW-200371. Online publication date: 1-Jan-2020
    • (2019) From Social Network Graphs to Causal Bayes Nets. 2019 22th International Conference on Information Fusion (FUSION), 1-7. DOI: 10.23919/FUSION43075.2019.9011199. Online publication date: Jul-2019
    • (2019) Mashup-based Architecture for Social Trends Analysis System. 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), 30-33. DOI: 10.1109/ITAIC.2019.8785460. Online publication date: May-2019
    • (2019) Temporal Methods to Detect Content-Based Anomalies in Social Media. From Security to Community Detection in Social Networking Platforms, 213-230. DOI: 10.1007/978-3-030-11286-8_10. Online publication date: 10-Apr-2019
    • (2017) A spatial, temporal and sentiment based framework for indexing and clustering in twitter blogosphere. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 32:5, 3619-3632. DOI: 10.3233/JIFS-169297. Online publication date: 1-Jan-2017
    • (2017) Streaming news sequential evolution model based on distributed representations. 2017 36th Chinese Control Conference (CCC), 9647-9650. DOI: 10.23919/ChiCC.2017.8028895. Online publication date: Jul-2017
