DOI: 10.1145/775047.775117

SECRET: a scalable linear regression tree algorithm

Published: 23 July 2002

Abstract

Developing regression models for large datasets that are both accurate and easy to interpret is a very important data mining problem. Regression trees with linear models in the leaves satisfy both these requirements, but thus far, no truly scalable regression tree algorithm is known. This paper proposes a novel regression tree construction algorithm (SECRET) that produces trees of high quality and scales to very large datasets. At every node, SECRET uses the EM algorithm for Gaussian mixtures to find two clusters in the data and to locally transform the regression problem into a classification problem based on closeness to these clusters. Goodness-of-split measures, such as the Gini gain, can then be used to determine the split variable and the split point, much like in classification tree construction. Scalability of the algorithm can be achieved by employing scalable versions of the EM and classification tree construction algorithms. An experimental evaluation on real and artificial data shows that SECRET has accuracy comparable to other linear regression tree algorithms but takes orders of magnitude less computation time for large datasets.
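To make the split-selection idea above concrete, the sketch below shows one way a node-level split could be chosen: fit a two-component Gaussian mixture with EM, label each point by its closer cluster, and pick the threshold on a predictor that maximizes the Gini gain on those labels, as in classification tree construction. This is a minimal illustration, not the authors' implementation: scikit-learn's GaussianMixture stands in for the scalable EM variant the paper relies on, clustering in the joint (x, y) space is an assumption about what "the data" at a node means, and the helpers gini and best_split are hypothetical names.

import numpy as np
from sklearn.mixture import GaussianMixture

def gini(labels):
    # Gini impurity of a binary 0/1 label vector.
    if len(labels) == 0:
        return 0.0
    p = labels.mean()
    return 2.0 * p * (1.0 - p)

def best_split(X, y, feature):
    # Cluster the node's data into two groups with EM on a Gaussian mixture,
    # then treat cluster membership as a class label and score candidate
    # thresholds on `feature` by Gini gain, as in classification trees.
    Z = np.column_stack([X, y])  # assumption: cluster in the joint (x, y) space
    labels = GaussianMixture(n_components=2, random_state=0).fit_predict(Z)

    parent = gini(labels)
    best_gain, best_threshold = 0.0, None
    for threshold in np.unique(X[:, feature]):
        left = X[:, feature] <= threshold
        right = ~left
        if not left.any() or not right.any():
            continue  # skip degenerate splits
        weighted_child = (left.mean() * gini(labels[left]) +
                          right.mean() * gini(labels[right]))
        gain = parent - weighted_child
        if gain > best_gain:
            best_gain, best_threshold = gain, threshold
    return best_threshold, best_gain

Once a split is fixed this way, a linear model would be fitted in each resulting leaf; the scalability claimed in the abstract comes from replacing the mixture fit and the exhaustive threshold scan above with scalable EM and classification tree construction algorithms.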



Published In

KDD '02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
July 2002, 719 pages
ISBN: 158113567X
DOI: 10.1145/775047
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Article

Conference

KDD02

Acceptance Rates

KDD '02 Paper Acceptance Rate: 44 of 307 submissions, 14%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%


Cited By

  • (2024) Fast linear model trees by PILOT. Machine Learning. DOI: 10.1007/s10994-024-06590-3. Online publication date: 8-Jul-2024
  • (2023) Frequent Itemset Mining Algorithm Based on Linear Table. Journal of Database Management 34(1):1-21. DOI: 10.4018/JDM.318450. Online publication date: 24-Feb-2023
  • (2023) Machine Learning Approaches in Brillouin Distributed Fiber Optic Sensors. Sensors 23(13):6187. DOI: 10.3390/s23136187. Online publication date: 6-Jul-2023
  • (2022) The Bigger Picture. Management Science 68(1):189-210. DOI: 10.1287/mnsc.2020.3911. Online publication date: 1-Jan-2022
  • (2022) INN: An Interpretable Neural Network for AI Incubation in Manufacturing. ACM Transactions on Intelligent Systems and Technology 13(5):1-23. DOI: 10.1145/3519313. Online publication date: 21-Jun-2022
  • (2022) A hybrid approach to enhance the lifespan of WSNs in nuclear power plant monitoring system. Scientific Reports 12(1). DOI: 10.1038/s41598-022-08075-6. Online publication date: 14-Mar-2022
  • (2021) Curvature-Oriented Splitting for Multivariate Model Trees. 2021 IEEE Symposium Series on Computational Intelligence (SSCI), 01-09. DOI: 10.1109/SSCI50451.2021.9659858. Online publication date: 5-Dec-2021
  • (2021) Learning with continuous piecewise linear decision trees. Expert Systems with Applications 168:114214. DOI: 10.1016/j.eswa.2020.114214. Online publication date: Apr-2021
  • (2020) Cracking the Black Box. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 3154-3162. DOI: 10.1145/3394486.3403367. Online publication date: 23-Aug-2020
  • (2020) On the Functional Equivalence of TSK Fuzzy Systems to Neural Networks, Mixture of Experts, CART, and Stacking Ensemble Regression. IEEE Transactions on Fuzzy Systems 28(10):2570-2580. DOI: 10.1109/TFUZZ.2019.2941697. Online publication date: Oct-2020
