DOI: 10.1145/2783258.2783411
research-article
Open access

Dirichlet-Hawkes Processes with Applications to Clustering Continuous-Time Document Streams

Published: 10 August 2015

Abstract

Clusters in document streams, such as online news articles, can be induced by their textual contents as well as by the temporal dynamics of their arrival patterns. Can we leverage both sources of information to obtain a better clustering of the documents, and distill information that cannot be extracted from contents alone? In this paper, we propose a novel random process, referred to as the Dirichlet-Hawkes process, that takes both sources of information into account in a unified framework. A distinctive feature of the proposed model is that the preferential attachment of items to clusters according to cluster sizes, present in Dirichlet processes, is now driven by the intensities of cluster-wise self-exciting temporal point processes, the Hawkes processes. This new model establishes a previously unexplored connection between Bayesian nonparametrics and temporal point processes, which lets the number of clusters grow to accommodate the increasing complexity of online streaming contents while adapting to the ever-changing dynamics of the respective continuous arrival times. We conducted large-scale experiments on both synthetic and real-world news articles, and show that Dirichlet-Hawkes processes can recover both meaningful topics and temporal dynamics, which leads to better predictive performance in terms of content perplexity and arrival time of future documents.
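
The intensity-driven preferential attachment described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes an exponential triggering kernel, a constant base rate for opening a new cluster, and it ignores document content entirely. The names cluster_intensity, assignment_probs, base_rate, and decay are hypothetical and chosen only for this illustration.

```python
import math

def cluster_intensity(event_times, t, decay=1.0):
    """Hawkes-style intensity of one cluster at time t.

    Assumption: exponential triggering kernel exp(-decay * (t - s)),
    summed over the cluster's earlier arrival times s < t.
    """
    return sum(math.exp(-decay * (t - s)) for s in event_times if s < t)

def assignment_probs(clusters, t, base_rate=0.5, decay=1.0):
    """Temporal prior over cluster assignments for a document arriving at t.

    An existing cluster is chosen with probability proportional to its
    current intensity (instead of its size, as in a Dirichlet process);
    a new cluster is opened with probability proportional to base_rate.
    Returns (list of probabilities for existing clusters, new-cluster probability).
    """
    intensities = [cluster_intensity(times, t, decay) for times in clusters]
    total = base_rate + sum(intensities)
    return [lam / total for lam in intensities], base_rate / total

# Cluster 0 is larger but old; cluster 1 is smaller but recently active.
clusters = [[0.0, 0.1, 0.2], [4.8, 4.9]]
probs, p_new = assignment_probs(clusters, t=5.0)
print(probs, p_new)
```

In this toy stream the smaller but recently active cluster receives more probability mass than the larger stale one, which is the behavior a size-based Dirichlet process prior alone would not produce.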

Supplementary Material

MP4 File (p219.mp4)

      Published In

      KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
      August 2015
      2378 pages
      ISBN:9781450336642
      DOI:10.1145/2783258

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Dirichlet process
      2. document modeling
      3. Hawkes process

      Qualifiers

      • Research-article

      Funding Sources

      • NSF
      • NSF/NIH
      • NSF CAREER

      Conference

      KDD '15

      Acceptance Rates

      KDD '15 Paper Acceptance Rate: 160 of 819 submissions (20%)
      Overall Acceptance Rate: 1,133 of 8,635 submissions (13%)

      Cited By

      • (2024) Bayesian estimation of nonlinear Hawkes processes. Bernoulli 30:2. DOI: 10.3150/23-BEJ1631. Online publication date: 1-May-2024
      • (2024) Entity Footprinting: Modeling Contextual User States via Digital Activity Monitoring. ACM Transactions on Interactive Intelligent Systems 14:2 (1-27). DOI: 10.1145/3643893. Online publication date: 5-Feb-2024
      • (2023) Short Text Clustering in Continuous Time Using Stacked Dirichlet-Hawkes Process with Inverse Cluster Frequency Prior. Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD) (118-122). DOI: 10.1145/3570991.3571059. Online publication date: 4-Jan-2023
      • (2023) Curb Your Procrastination: A Study of Academic Procrastination Behaviors vs. A Planning and Time Management App. Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization (124-134). DOI: 10.1145/3565472.3592953. Online publication date: 18-Jun-2023
      • (2023) Hawkes Processes Modeling, Inference, and Control: An Overview. SIAM Review 65:2 (331-374). DOI: 10.1137/21M1396927. Online publication date: 9-May-2023
      • (2023) Spatio-Temporal Point Process for Multiple Object Tracking. IEEE Transactions on Neural Networks and Learning Systems 34:4 (1777-1788). DOI: 10.1109/TNNLS.2020.2997006. Online publication date: Apr-2023
      • (2023) Data Science, Machine learning and big data in Digital Journalism: A survey of state-of-the-art, challenges and opportunities. Expert Systems with Applications 221 (119795). DOI: 10.1016/j.eswa.2023.119795. Online publication date: Jul-2023
      • (2023) Dirichlet-Survival Process: Scalable Inference of Topic-Dependent Diffusion Networks. Advances in Information Retrieval (562-570). DOI: 10.1007/978-3-031-28238-6_47. Online publication date: 2-Apr-2023
      • (2023) Multivariate Powered Dirichlet-Hawkes Process. Advances in Information Retrieval (47-61). DOI: 10.1007/978-3-031-28238-6_4. Online publication date: 2-Apr-2023
      • (2023) Properties of Reddit News Topical Interactions. Complex Networks and Their Applications XI (16-28). DOI: 10.1007/978-3-031-21127-0_2. Online publication date: 4-Jan-2023
