Abstract
In this paper, we report our success in building efficient, scalable classifiers by exploring the capabilities of modern relational database management systems (RDBMS). In addition to high classification accuracy, the distinguishing features of the approach are its high training speed, linear scalability, and simplicity of implementation. More importantly, the major computation required by the approach can be implemented using standard functions provided by modern relational DBMSs. In addition, with an effective rule pruning strategy, the proposed algorithm produces a compact set of classification rules. The results of experiments conducted for performance evaluation and analysis are presented.
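To make concrete the idea that the main computation can be pushed into standard DBMS functions, the following is a minimal sketch, not the authors' algorithm. It assumes a hypothetical toy table `train`, an illustrative helper `candidate_rules`, and a made-up confidence threshold `min_conf`; none of these come from the paper. It uses plain SQL `GROUP BY` and `COUNT(*)` (run here through Python's built-in `sqlite3`) to count class labels per attribute value and keep the combinations whose confidence reaches the threshold.

```python
import sqlite3

# Hypothetical toy training set: (outlook, windy, label). Not from the paper.
ROWS = [
    ("sunny", "no", "play"),
    ("sunny", "no", "play"),
    ("sunny", "yes", "stay"),
    ("rainy", "yes", "stay"),
    ("rainy", "yes", "stay"),
    ("rainy", "no", "stay"),
]


def candidate_rules(rows, min_conf=0.6):
    """Return (windy_value, label, count, confidence) rows computed in SQL.

    Only the `windy` attribute is grouped here, to keep the sketch short.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE train (outlook TEXT, windy TEXT, label TEXT)")
    conn.executemany("INSERT INTO train VALUES (?, ?, ?)", rows)
    # One GROUP BY counts every (windy, label) combination; joining against
    # the per-value totals gives the confidence of each candidate rule.
    query = """
        SELECT c.windy, c.label, c.n, 1.0 * c.n / t.n AS conf
        FROM (SELECT windy, label, COUNT(*) AS n
              FROM train GROUP BY windy, label) AS c
        JOIN (SELECT windy, COUNT(*) AS n
              FROM train GROUP BY windy) AS t
          ON c.windy = t.windy
        WHERE 1.0 * c.n / t.n >= ?
        ORDER BY c.windy, c.label
    """
    rules = conn.execute(query, (min_conf,)).fetchall()
    conn.close()
    return rules


RULES = candidate_rules(ROWS)
for windy, label, count, conf in RULES:
    print(f"IF windy = {windy} THEN {label} (count={count}, conf={conf:.2f})")
```

The confidence filter here is only a crude stand-in for rule pruning; the paper's actual pruning strategy is not described in the abstract, so this sketch should not be read as reproducing it.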
Additional information
This work is supported by the Tsinghua University 985 Basic Research Project (No.091101004).
LIU Hongyan received her Ph.D. degree in management science and engineering from Tsinghua University in 2000. She is now a lecturer in the School of Economics and Management, Tsinghua University. Her major research interests include databases, neural networks, data warehousing, and data mining.
LU Hongjun received his Ph.D. degree in computer science from the University of Wisconsin in 1985. He is now a professor in the Computer Science Department, Hong Kong University of Science and Technology. His major research interests include data/knowledge base management systems, physical database design and database performance, data warehousing, and data mining.
CHEN Jian received his Ph.D. degree in system engineering from Tsinghua University in 1989. He is now a full professor and Chairman of the Department of Management Science and Engineering, Tsinghua University. His main research interests include supply chain management, e-commerce, decision support systems and information systems, and forecasting and optimization techniques.
Cite this article
Liu, H., Lu, H. & Chen, J. A fast scalable classifier tightly integrated with RDBMS. J. Comput. Sci. & Technol. 17, 152–159 (2002). https://doi.org/10.1007/BF02962207