Research article
DOI: 10.1145/1516360.1516397

Indexing density models for incremental learning and anytime classification on data streams

Published: 24 March 2009

Abstract

Classification of streaming data faces three basic challenges: it has to deal with huge amounts of data, the varying time between two stream data items must be used as well as possible (anytime classification), and additional training data must be learned incrementally (anytime learning) so that the classifier can be applied consistently to fast data streams. In this work, we propose a novel index-based technique that handles all three challenges using the established Bayes classifier on effective kernel density estimators. Our novel Bayes tree automatically generates a hierarchy of mixture densities that represent kernel density estimators at successively coarser levels, adapted efficiently to the individual object to be classified. Our probability density queries, together with novel classification improvement strategies, provide the necessary information for very effective classification at any point of interruption. Moreover, we propose a novel evaluation method for anytime classification using Poisson streams and demonstrate the anytime learning performance of the Bayes tree.
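
The ideas named in the abstract can be illustrated with a minimal sketch. It is not the Bayes tree: it stores kernels in a flat list instead of a hierarchy of mixture densities, and it approximates the anytime behaviour simply by capping how many kernels are evaluated per class. The class `KernelBayes`, the function `poisson_budgets`, the bandwidth of 0.5 and the `kernels_per_second` parameter are all assumptions made for illustration. The decision rule is the standard Bayes rule, choosing the class that maximises the prior times the class-conditional kernel density, and the time available per item is drawn from exponentially distributed gaps, as in a Poisson stream.

```python
import math
import random


def gaussian_kernel(x, center, bandwidth):
    """Isotropic Gaussian kernel centred on one training point."""
    dim = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    norm = (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2)) / norm


class KernelBayes:
    """Hypothetical flat stand-in for the paper's index; not the Bayes tree."""

    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth
        self.points = {}  # class label -> list of stored training points

    def learn(self, x, label):
        # Incremental learning: each new labelled item adds one kernel.
        self.points.setdefault(label, []).append(tuple(x))

    def classify(self, x, budget):
        # Anytime flavour: evaluate at most `budget` kernels per class,
        # so a larger budget yields a finer density estimate.
        total = sum(len(pts) for pts in self.points.values())
        best_label, best_score = None, -1.0
        for label, pts in self.points.items():
            used = pts[:max(1, budget)]
            density = sum(gaussian_kernel(x, p, self.bandwidth)
                          for p in used) / len(used)
            score = (len(pts) / total) * density  # prior * class density
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def poisson_budgets(rate, kernels_per_second, n, seed=0):
    # Inter-arrival gaps of a Poisson stream are exponentially distributed;
    # each gap is converted into a kernel-evaluation budget.
    rng = random.Random(seed)
    return [int(rng.expovariate(rate) * kernels_per_second) for _ in range(n)]


model = KernelBayes()
for point, label in [((0.0, 0.0), "a"), ((0.2, -0.1), "a"),
                     ((5.0, 5.0), "b"), ((5.1, 4.9), "b")]:
    model.learn(point, label)

# Classify the same query under three randomly drawn time budgets.
for budget in poisson_budgets(rate=2.0, kernels_per_second=10, n=3):
    print(budget, model.classify((0.1, 0.1), budget))
```

Truncating the kernel list is a crude substitute for the coarse-to-fine refinement the abstract describes; a closer approximation would summarise groups of kernels by a single mixture component and refine the most relevant groups first.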

    Published In

    EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
    March 2009
    1180 pages
    ISBN: 9781605584225
    DOI: 10.1145/1516360
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    EDBT/ICDT '09: EDBT/ICDT '09 joint conference
    March 24 - 26, 2009
    Saint Petersburg, Russia

    Acceptance Rates

    Overall Acceptance Rate 7 of 10 submissions, 70%

    Cited By

    • (2023) Learning framework based on ER Rule for data streams with generalized feature spaces. Information Sciences 649 (119604). DOI: 10.1016/j.ins.2023.119604. Online publication date: Nov-2023
    • (2023) Auxiliary Network: Scalable and Agile Online Learning for Dynamic System with Inconsistently Available Inputs. Neural Information Processing (549-561). DOI: 10.1007/978-3-031-30105-6_46. Online publication date: 13-Apr-2023
    • (2022) Prediction With Unpredictable Feature Evolution. IEEE Transactions on Neural Networks and Learning Systems 33:10 (5706-5715). DOI: 10.1109/TNNLS.2021.3071311. Online publication date: Oct-2022
    • (2021) Learning With Feature Evolvable Streams. IEEE Transactions on Knowledge and Data Engineering 33:6 (2602-2615). DOI: 10.1109/TKDE.2019.2954090. Online publication date: 1-Jun-2021
    • (2021) Anytime clustering of data streams while handling noise and concept drift. Journal of Experimental & Theoretical Artificial Intelligence 34:3 (399-429). DOI: 10.1080/0952813X.2021.1882001. Online publication date: 15-Mar-2021
    • (2021) Classifying Potentially Unbounded Hierarchical Data Streams with Incremental Gaussian Naive Bayes. Intelligent Systems (421-436). DOI: 10.1007/978-3-030-91702-9_28. Online publication date: 28-Nov-2021
    • (2020) Anytime Frequent Itemset Mining of Transactional Data Streams. Big Data Research (100146). DOI: 10.1016/j.bdr.2020.100146. Online publication date: Jul-2020
    • (2019) Mining Data Streams. Sentiment Analysis and Knowledge Discovery in Contemporary Business (251-278). DOI: 10.4018/978-1-5225-4999-4.ch014. Online publication date: 2019
    • (2018) Knowledge Discovery Using Data Stream Mining. Social Network Analytics for Contemporary Business Organizations (231-258). DOI: 10.4018/978-1-5225-5097-6.ch012. Online publication date: 2018
    • (2018) Large-Scale Indexing, Discovery, and Ranking for the Internet of Things (IoT). ACM Computing Surveys 51:2 (1-53). DOI: 10.1145/3154525. Online publication date: 12-Mar-2018
