Adaptive Model Rules From High-Speed Data Streams

Published: 29 January 2016

Abstract

Decision rules are among the most expressive and interpretable models in machine learning. In this article, we present Adaptive Model Rules (AMRules), the first streaming rule learning algorithm for regression problems. In AMRules, the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of the attributes. To keep the regression model consistent with the most recent state of the data-generating process, each rule uses a Page-Hinkley test to detect changes in that process and reacts to them by pruning the rule set. Because online learning can be strongly affected by outliers, AMRules is also equipped with an outlier detection mechanism that avoids adapting the model on anomalous examples. In the experimental section, we report the results of AMRules on benchmark regression problems and compare the performance of our system with that of other streaming regression algorithms.

References

[1]
Ezilda Almeida, Carlos Abreu Ferreira, and João Gama. 2013a. Adaptive model rules from data streams. In Machine Learning and Knowledge Discovery in Databases (Lecture Notes in Computer Science), Hendrik Blockeel, Kristian Kersting, Siegfried Nijssen, and Filip Zelezný (Eds.), Vol. 8188. Springer, 480--492.
[2]
Ezilda Almeida, Petr Kosina, and João Gama. 2013b. Random rules from data streams. In Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC'13, Coimbra, Portugal, March 18-22, 2013, Sung Y. Shin and José Carlos Maldonado (Eds.). ACM, 813--814.
[3]
K. Bache and M. Lichman. 2013. UCI Machine Learning Repository. Retrieved from http://archive.ics.uci.edu/ml.
[4]
B. B. Bhattacharyya. 1987. One sided Chebyshev inequality when the first four moments are known. Commun. Statist.—Theory Methods 16, 9 (1987), 2789--2791.
[5]
Albert Bifet, Geoff Holmes, Richard Kirkby, and Bernhard Pfahringer. 2010. MOA: Massive online analysis. J. Mach. Learn. Res. 11 (2010), 1601--1604.
[6]
Leo Breiman. 1996. Bagging predictors. Mach. Learn. 24, 2 (1996), 123--140.
[7]
Leo Breiman. 2001. Random forests. Mach. Learn. 45, 1 (2001), 5--32.
[8]
L. Breiman, J. Friedman, R. Olshen, and C. Stone. 1984. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA. 238 pages.
[9]
Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM Comput. Surv. 41, 3 (July 2009), Article 15, 58 pages.
[10]
Pedro Domingos and Geoff Hulten. 2000. Mining high-speed data streams. In Proceedings of the ACM 6th International Conference on Knowledge Discovery and Data Mining, Ismail Parsa, Raghu Ramakrishnan, and Sal Stolfo (Eds.). ACM Press, Boston, MA, USA, 71--80.
[11]
Francisco J. Ferrer-Troyano, Jesús S Aguilar-Ruiz, and José Cristóbal Riquelme Santos. 2005. Incremental rule learning and border examples selection from numerical data streams. J. Universal Comput. Sci. 11, 8 (2005), 1426--1439.
[12]
Eibe Frank, Yong Wang, Stuart Inglis, Geoffrey Holmes, and Ian H. Witten. 1998. Using model trees for classification. Mach. Learn. 32, 1 (1998), 63--76.
[13]
Johannes Fürnkranz, Dragan Gamberger, and Nada Lavra. 2012. Foundations of Rule Learning. Springer.
[14]
João Gama. 2010. Knowledge Discovery from Data Streams. CRC Press.
[15]
João Gama, Raquel Sebastião, and Pedro Pereira Rodrigues. 2013. On evaluating stream learning algorithms. Mach. Learn. 90, 3 (2013), 317--346.
[16]
Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. 2009. The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11, 1 (2009), 10--18.
[17]
V. J. Hodge and J. Austin. 2004. A survey of outlier detection methodologies. Artificial Intelligence Rev. 22, 2 (2004), 85--126.
[18]
Wassily Hoeffding. 1963. Probability inequalities for sums of bounded random variables. J. Am. Statist. Assoc. 58, 301 (1963), 13--30.
[19]
Elena Ikonomovska, João Gama, and Saso Dzeroski. 2011. Learning model trees from evolving data streams. Data Min. Knowl. Discov. 23, 1 (2011), 128--168.
[20]
Ron Kohavi. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence—Volume 2 (IJCAI'95). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1137--1143.
[21]
P. Kosina and J. Gama. 2012. Very fast decision rules for multi-class problems. In Proceedings of the 2012 ACM Symposium on Applied Computing. ACM, New York, NY, USA, 795--800.
[22]
A. Liaw and M. Wiener. 2002. Classification and regression by random forest. R News 2, 3 (2002), 18--22.
[23]
H. Mouss, D. Mouss, N. Mouss, and L. Sefouhi. 2004. Test of page-Hinkley, an approach for fault detection in an agro-alimentary production system. In Proceedings of the Asian Control Conference, Vol. 2. INTER-RESEARCH, 815--818.
[24]
ElMoustapha Ould-Ahmed-Vall, James Woodlee, Charles Yount, Kshitij A. Doshi, and Seth Abraham. 2007. Using model trees for computer architecture performance analysis of software applications. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems & Software ISPASS 2007. IEEE, 116--125.
[25]
E. S. Page. 1954. Continuous inspection schemes. Biometrika 41, 1/2 (1954), 100--115.
[26]
Duncan Potts and Claude Sammut. 2005. Incremental learning of linear model trees. Mach. Learn. 61, 1--3 (2005), 5--48.
[27]
J. R. Quinlan. 1992. Learning with continuous classes. In Proceedings of the Australian Joint Conference for Artificial Intelligence. World Scientific, 343--348.
[28]
J. Ross Quinlan. 1993a. Combining instance-based and model-based learning. In Proceedings of the 10th International Conference on Machine Learning, University of Massachusetts, Amherst, MA, USA, June 27--29, 1993. Morgan Kaufmann, 236--243.
[29]
R. Quinlan. 1993b. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, Inc., San Mateo, CA.
[30]
Ammar Shaker and Eyke Hüllermeier. 2012. IBLStreams: A system for instance-based classification and regression on data streams. Evol. Syst. 3, 4 (2012), 235--249.
[31]
Haixun Wang, Wei Fan, Philip S. Yu, and Jiawei Han. 2003. Mining concept-drifting data streams using ensemble classifiers. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, Washington, D.C., 226--235.
[32]
Sholom M. Weiss and Nitin Indurkhya. 1995. Rule-based machine learning methods for functional prediction. J. Artificial Intelligence Res. 3, 1 (1995), 383--403.
[33]
C. J. Willmott and K. Matsuura. 2005. Advantages of the mean absolute error (MAE) over the mean square error (RMSE) in assessing average model performance. Climate Res. 30, 1 (2005), 79--82.

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 10, Issue 3
February 2016, 358 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/2888412
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 January 2016
Accepted: 01 September 2015
Revised: 01 February 2015
Received: 01 October 2014
Published in TKDD Volume 10, Issue 3

Author Tags

  1. Data streams
  2. regression
  3. rule learning

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Fundação para a Ciência e Tecnologia
  • European Commission

Cited By

  • (2024) Learning with Asynchronous Labels. ACM Transactions on Knowledge Discovery from Data 18(8), 1-27. DOI: 10.1145/3662186. Online publication date: 3-May-2024.
  • (2024) Assessing Decision Tree Stability: A Comprehensive Method for Generating a Stable Decision Tree. IEEE Access 12, 90061-90072. DOI: 10.1109/ACCESS.2024.3419228. Online publication date: 2024.
  • (2024) From fault detection to anomaly explanation: A case study on predictive maintenance. Journal of Web Semantics 81, 100821. DOI: 10.1016/j.websem.2024.100821. Online publication date: Jul-2024.
  • (2024) A fast online stacked regressor to handle concept drifts. Engineering Applications of Artificial Intelligence 131(C). DOI: 10.1016/j.engappai.2023.107757. Online publication date: 1-May-2024.
  • (2024) Improving hyper-parameter self-tuning for data streams by adapting an evolutionary approach. Data Mining and Knowledge Discovery 38(3), 1289-1315. DOI: 10.1007/s10618-023-00997-7. Online publication date: 1-May-2024.
  • (2023) Poster: Human-in-the-Loop Anomaly Detection in Industrial Data Streams. Proceedings of the 15th Biannual Conference of the Italian SIGCHI Chapter, 1-2. DOI: 10.1145/3605390.3610830. Online publication date: 20-Sep-2023.
  • (2023) Online Extra Trees Regressor. IEEE Transactions on Neural Networks and Learning Systems 34(10), 6755-6767. DOI: 10.1109/TNNLS.2022.3212859. Online publication date: Oct-2023.
  • (2023) Learning Data Streams With Changing Distributions and Temporal Dependency. IEEE Transactions on Neural Networks and Learning Systems 34(8), 3952-3965. DOI: 10.1109/TNNLS.2021.3122531. Online publication date: Aug-2023.
  • (2023) Correlated Online k-Nearest Neighbors Regressor Chain for Online Multi-output Regression. Neural Information Processing, 28-39. DOI: 10.1007/978-981-99-8067-3_3. Online publication date: 20-Nov-2023.
  • (2023) Predictive Maintenance, Adversarial Autoencoders and Explainability. Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track, 260-275. DOI: 10.1007/978-3-031-43430-3_16. Online publication date: 18-Sep-2023.
