
Topics over time: a non-Markov continuous-time model of topical trends

Published: 20 August 2006

Abstract

    This paper presents an LDA-style topic model that captures not only the low-dimensional structure of data, but also how the structure changes over time. Unlike other recent work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps, and for each generated document, the mixture distribution over topics is influenced by both word co-occurrences and the document's timestamp. Thus, the meaning of a particular topic can be relied upon as constant, but the topics' occurrence and correlations change significantly over time. We present results on nine months of personal email, 17 years of NIPS research papers and over 200 years of presidential state-of-the-union addresses, showing improved topics, better timestamp prediction, and interpretable trends.
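As a rough illustration of the generative story the abstract describes, the following Python sketch draws a document's topic mixture and then, for each word token, draws a topic, a word from that topic, and a timestamp from that topic's distribution over time. The abstract only says each topic has "a continuous distribution over timestamps"; the Beta density on timestamps normalized to [0, 1], the choice of one timestamp per word token, and all names and parameter values (`generate_document`, `topic_word`, `topic_time`) are assumptions made for this sketch, not details taken from the text above.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index with probability proportional to probs."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def generate_document(n_words, alpha, topic_word, topic_time, rng):
    """Generate word ids and timestamps for one document.

    topic_word[z] is topic z's distribution over the vocabulary.
    topic_time[z] is an (a, b) pair for an ASSUMED Beta density over
    timestamps normalized to [0, 1].
    """
    theta = sample_dirichlet(alpha, rng)  # document's mixture over topics
    words, stamps = [], []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)            # topic of this token
        words.append(sample_categorical(topic_word[z], rng))
        a, b = topic_time[z]
        stamps.append(rng.betavariate(a, b))          # timestamp from topic z
    return theta, words, stamps


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical two-topic, three-word setup: topic 0 is active early,
    # topic 1 is active late.
    topic_word = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
    topic_time = [(2.0, 8.0), (8.0, 2.0)]
    theta, words, stamps = generate_document(
        10, [1.0, 1.0], topic_word, topic_time, rng
    )
    print(theta, words, stamps)
```

A topic whose Beta parameters concentrate mass near 0 is active early in the collection, and one concentrated near 1 is active late. The word distributions do not depend on time in this sketch, which mirrors the abstract's claim that a topic's meaning stays constant while its occurrence varies.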

        Published In

        KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
        August 2006
        986 pages
        ISBN:1595933395
        DOI:10.1145/1150402
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. graphical models
        2. temporal analysis
        3. topic modeling

        Qualifiers

        • Article

        Conference

        KDD06

        Acceptance Rates

        Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
