Adaptive Model Rules From High-Speed Data Streams

Published: 29 January 2016

Abstract

Decision rules are among the most expressive and interpretable models in machine learning. In this article, we present Adaptive Model Rules (AMRules), the first streaming rule learning algorithm for regression problems. In AMRules, the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of the attributes. To keep the regression model consistent with the most recent state of the data-generating process, each rule uses a Page-Hinkley test to detect changes in that process and reacts to them by pruning the rule set. Because online learning can be strongly affected by outliers, AMRules is also equipped with an outlier detection mechanism that avoids adapting the model on anomalous examples. In the experimental section, we report the results of AMRules on benchmark regression problems and compare the performance of our system with that of other streaming regression algorithms.

References

[1]
Ezilda Almeida, Carlos Abreu Ferreira, and João Gama. 2013a. Adaptive model rules from data streams. In Machine Learning and Knowledge Discovery in Databases (Lecture Notes in Computer Science), Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen, and Filip Zelezný (Eds.), Vol. 8188. Springer, 480--492.
[2]
Ezilda Almeida, Petr Kosina, and João Gama. 2013b. Random rules from data streams. In Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC'13, Coimbra, Portugal, March 18-22, 2013, Sung Y. Shin and José Carlos Maldonado (Eds.). ACM, 813--814.
[3]
K. Bache and M. Lichman. 2013. UCI Machine Learning Repository. Retrieved from http://archive.ics.uci.edu/ml.
[4]
B. B. Bhattacharyya. 1987. One sided Chebyshev inequality when the first four moments are known. Commun. Statist.—Theory Methods 16, 9 (1987), 2789--2791.
[5]
Albert Bifet, Geoff Holmes, Richard Kirkby, and Bernhard Pfahringer. 2010. MOA: Massive online analysis. J. Mach. Learn. Res. 11 (2010), 1601--1604.
[6]
Leo Breiman. 1996. Bagging predictors. Mach. Learn. 24, 2 (1996), 123--140.
[7]
Leo Breiman. 2001. Random forests. Mach. Learn. 45, 1 (2001), 5--32.
[8]
L. Breiman, J. Friedman, R. Olshen, and C. Stone. 1984. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA. 238 pages.
[9]
Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM Comput. Surv. 41, 3 (July 2009), Article 15, 58 pages.
[10]
Pedro Domingos and Geoff Hulten. 2000. Mining high-speed data streams. In Proceedings of the ACM 6th International Conference on Knowledge Discovery and Data Mining, Ismail Parsa, Raghu Ramakrishnan, and Sal Stolfo (Eds.). ACM Press, Boston, MA, USA, 71--80.
[11]
Francisco J. Ferrer-Troyano, Jesús S Aguilar-Ruiz, and José Cristóbal Riquelme Santos. 2005. Incremental rule learning and border examples selection from numerical data streams. J. Universal Comput. Sci. 11, 8 (2005), 1426--1439.
[12]
Eibe Frank, Yong Wang, Stuart Inglis, Geoffrey Holmes, and Ian H. Witten. 1998. Using model trees for classification. Mach. Learn. 32, 1 (1998), 63--76.
[13]
Johannes Fürnkranz, Dragan Gamberger, and Nada Lavra. 2012. Foundations of Rule Learning. Springer.
[14]
João Gama. 2010. Knowledge Discovery from Data Streams. CRC Press.
[15]
João Gama, Raquel Sebastião, and Pedro Pereira Rodrigues. 2013. On evaluating stream learning algorithms. Mach. Learn. 90, 3 (2013), 317--346.
[16]
Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. 2009. The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11, 1 (2009), 10--18.
[17]
V. J. Hodge and J. Austin. 2004. A survey of outlier detection methodologies. Artificial Intelligence Rev. 22, 2 (2004), 85--126.
[18]
Wassily Hoeffding. 1963. Probability inequalities for sums of bounded random variables. J. Am. Statist. Assoc. 58, 301 (1963), 13--30.
[19]
Elena Ikonomovska, João Gama, and Saso Dzeroski. 2011. Learning model trees from evolving data streams. Data Min. Knowl. Discov. 23, 1 (2011), 128--168.
[20]
Ron Kohavi. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence—Volume 2 (IJCAI'95). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1137--1143.
[21]
P. Kosina and J. Gama. 2012. Very fast decision rules for multi-class problems. In Proceedings of the 2012 ACM Symposium on Applied Computing. ACM, New York, NY, USA, 795--800.
[22]
A. Liaw and M. Wiener. 2002. Classification and regression by random forest. R News 2, 3 (2002), 18--22.
[23]
H. Mouss, D. Mouss, N. Mouss, and L. Sefouhi. 2004. Test of page-Hinkley, an approach for fault detection in an agro-alimentary production system. In Proceedings of the Asian Control Conference, Vol. 2. INTER-RESEARCH, 815--818.
[24]
ElMoustapha Ould-Ahmed-Vall, James Woodlee, Charles Yount, Kshitij A. Doshi, and Seth Abraham. 2007. Using model trees for computer architecture performance analysis of software applications. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems & Software ISPASS 2007. IEEE, 116--125.
[25]
E. S. Page. 1954. Continuous inspection schemes. Biometrika 41, 1/2 (1954), 100--115.
[26]
Duncan Potts and Claude Sammut. 2005. Incremental learning of linear model trees. Mach. Learn. 61, 1--3 (2005), 5--48.
[27]
J. R. Quinlan. 1992. Learning with continuous classes. In Proceedings of the Australian Joint Conference for Artificial Intelligence. World Scientific, 343--348.
[28]
J. Ross Quinlan. 1993a. Combining instance-based and model-based learning. In Proceedings of the 10th International Conference on Machine Learning, University of Massachusetts, Amherst, MA, USA, June 27--29, 1993. Morgan Kaufmann, 236--243.
[29]
R. Quinlan. 1993b. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, Inc., San Mateo, CA.
[30]
Ammar Shaker and Eyke Hüllermeier. 2012. IBLStreams: A system for instance-based classification and regression on data streams. Evol. Syst. 3, 4 (2012), 235--249.
[31]
Haixun Wang, Wei Fan, Philip S. Yu, and Jiawei Han. 2003. Mining concept-drifting data streams using ensemble classifiers. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, Washington, D.C., 226--235.
[32]
Sholom M. Weiss and Nitin Indurkhya. 1995. Rule-based machine learning methods for functional prediction. J. Artificial Intelligence Res. 3, 1 (1995), 383--403.
[33]
C. J. Willmott and K. Matsuura. 2005. Advantages of the mean absolute error (MAE) over the mean square error (RMSE) in assessing average model performance. Climate Res. 30, 1 (2005), 79--82.

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 10, Issue 3
February 2016, 358 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/2888412
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 January 2016
Accepted: 01 September 2015
Revised: 01 February 2015
Received: 01 October 2014
Published in TKDD Volume 10, Issue 3

Author Tags

  1. Data streams
  2. regression
  3. rule learning

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Fundação para a Ciência e Tecnologia
  • European Commission

Cited By

  • (2024) Learning with Asynchronous Labels. ACM Transactions on Knowledge Discovery from Data 18(8), 1-27. DOI: 10.1145/3662186. Online publication date: 3-May-2024.
  • (2024) Assessing Decision Tree Stability: A Comprehensive Method for Generating a Stable Decision Tree. IEEE Access 12, 90061-90072. DOI: 10.1109/ACCESS.2024.3419228. Online publication date: 2024.
  • (2024) From fault detection to anomaly explanation: A case study on predictive maintenance. Journal of Web Semantics 81, 100821. DOI: 10.1016/j.websem.2024.100821. Online publication date: Jul-2024.
  • (2024) A fast online stacked regressor to handle concept drifts. Engineering Applications of Artificial Intelligence 131(C). DOI: 10.1016/j.engappai.2023.107757. Online publication date: 1-May-2024.
  • (2024) Improving hyper-parameter self-tuning for data streams by adapting an evolutionary approach. Data Mining and Knowledge Discovery 38(3), 1289-1315. DOI: 10.1007/s10618-023-00997-7. Online publication date: 1-May-2024.
  • (2023) Poster: Human-in-the-Loop Anomaly Detection in Industrial Data Streams. Proceedings of the 15th Biannual Conference of the Italian SIGCHI Chapter, 1-2. DOI: 10.1145/3605390.3610830. Online publication date: 20-Sep-2023.
  • (2023) Online Extra Trees Regressor. IEEE Transactions on Neural Networks and Learning Systems 34(10), 6755-6767. DOI: 10.1109/TNNLS.2022.3212859. Online publication date: Oct-2023.
  • (2023) Learning Data Streams With Changing Distributions and Temporal Dependency. IEEE Transactions on Neural Networks and Learning Systems 34(8), 3952-3965. DOI: 10.1109/TNNLS.2021.3122531. Online publication date: Aug-2023.
  • (2023) Correlated Online k-Nearest Neighbors Regressor Chain for Online Multi-output Regression. Neural Information Processing, 28-39. DOI: 10.1007/978-981-99-8067-3_3. Online publication date: 20-Nov-2023.
  • (2023) Predictive Maintenance, Adversarial Autoencoders and Explainability. Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track, 260-275. DOI: 10.1007/978-3-031-43430-3_16. Online publication date: 18-Sep-2023.
