DOI: 10.1145/1276958.1277340
Article

A genetic algorithm for dynamic modelling and prediction of activity in document streams

Published: 07 July 2007

Abstract

This paper presents an evolutionary algorithm for modeling the arrival dates of a document stream, that is, any time-stamped collection of documents, such as newscasts, e-mails, scientific journal archives and weblog postings. The goal is to find a frequency curve that fits the data while circumventing the unavoidable noise. Classical dynamic programming algorithms are limited by memory and efficiency requirements, which can be a problem when dealing with long streams. This suggests exploring alternative search methods which, although they do not guarantee optimality, are far more efficient. Experiments have shown that the designed evolutionary algorithm is able to reach high-quality solutions in a short time. We have also explored different approaches to infer whether new arrivals increase or decrease interest in the topic the document stream is about. In particular, we present a variant of the evolutionary algorithm that can very quickly fit a stream extended with new data by taking advantage of the fit obtained for the original substream. These mechanisms can be used for real-time detection of changes in the trend of interest in a topic, an important application of this kind of model.
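
The abstract does not spell out the encoding, operators or fitness function, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a two-state rate model in the style of Kleinberg's burst automaton: each gap between consecutive arrivals is scored by an exponential likelihood at a low or high rate, and every switch into the high state pays a penalty. Individuals are bit strings (one state per gap), and all names (`cost`, `mutate`, `crossover`, `evolve`, `seed_states`, `up_penalty`) are hypothetical. The `seed_states` argument illustrates the idea of refitting an extended stream starting from the fit of the original substream.

```python
import math
import random


def cost(states, gaps, rates, up_penalty):
    """Negative log-likelihood of the gaps under per-gap exponential rates,
    plus a penalty for every switch from the low to the high state."""
    total = 0.0
    prev = 0  # assumption: the stream starts in the low-rate state
    for state, gap in zip(states, gaps):
        rate = rates[state]
        total += rate * gap - math.log(rate)  # -ln(rate * exp(-rate * gap))
        if state > prev:
            total += up_penalty
        prev = state
    return total


def mutate(states, rng):
    """Flip one random contiguous segment of the state sequence."""
    i = rng.randrange(len(states))
    j = rng.randrange(i, len(states)) + 1
    return states[:i] + [1 - s for s in states[i:j]] + states[j:]


def crossover(a, b, rng):
    """One-point crossover of two equal-length state sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(gaps, rates, up_penalty, seed_states=None,
           pop_size=30, generations=200, rng=None):
    """Elitist evolutionary search over state sequences (needs len(gaps) >= 2)."""
    rng = rng or random.Random(0)
    n = len(gaps)

    def fitness(states):
        return cost(states, gaps, rates, up_penalty)

    # Always include the two constant fits as baselines.
    population = [[0] * n, [1] * n]
    if seed_states:
        # Warm start: reuse an earlier fit and repeat its last state
        # over the newly arrived gaps.
        padding = [seed_states[-1]] * (n - len(seed_states))
        population.append(list(seed_states[:n]) + padding)
    while len(population) < pop_size:
        population.append([rng.randint(0, 1) for _ in range(n)])

    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return min(population, key=fitness)


# Synthetic gaps: quiet period, burst, quiet period.
gaps = [5.0] * 20 + [0.5] * 20 + [5.0] * 20
base_rate = len(gaps) / sum(gaps)
rates = (base_rate, 3 * base_rate)   # assumed scaling factor of 3
penalty = math.log(len(gaps))        # assumed switch penalty

fit = evolve(gaps, rates, penalty)
# New arrivals: refit the longer stream, seeded with the previous fit.
longer = gaps + [0.5] * 10
refit = evolve(longer, rates, penalty, seed_states=fit, generations=50)
```

Because the best half of each generation is carried over unchanged, the returned sequence is never worse than the best individual in the starting population. With a warm start, this means the refit is at least as good as the old fit padded over the new gaps. Like the approach described in the abstract, the search gives no optimality guarantee.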

    Published In

    GECCO '07: Proceedings of the 9th annual conference on Genetic and evolutionary computation
    July 2007
    2313 pages
    ISBN:9781595936974
    DOI:10.1145/1276958
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. buzz detection
    2. event stream modelling
    3. evolutionary algorithms
    4. online text streams

    Qualifiers

    • Article

    Conference

    GECCO07

    Acceptance Rates

    GECCO '07 Paper Acceptance Rate 266 of 577 submissions, 46%;
    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

    Cited By

    • (2013) Dynamic Constrained Optimization with offspring repair based Gravitational Search Algorithm. 2013 IEEE Congress on Evolutionary Computation, pp. 2414-2421. DOI: 10.1109/CEC.2013.6557858. Online publication date: Jun-2013
    • (2013) Evolutionary Optimization on Continuous Dynamic Constrained Problems - An Analysis. Evolutionary Computation for Dynamic Optimization Problems, pp. 193-217. DOI: 10.1007/978-3-642-38416-5_8. Online publication date: 2013
    • (2013) Differential Evolution and Offspring Repair Method Based Dynamic Constrained Optimization. 4th International Conference on Swarm, Evolutionary, and Memetic Computing - Volume 8297, pp. 298-309. DOI: 10.1007/978-3-319-03753-0_27. Online publication date: 19-Dec-2013
    • (2012) Continuous Dynamic Constrained Optimization—The Challenges. IEEE Transactions on Evolutionary Computation 16:6, pp. 769-786. DOI: 10.1109/TEVC.2011.2180533. Online publication date: 1-Dec-2012
