Abstract
The Explora system supports Discovery in Databases by large-scale search for interesting instances of statistical patterns. In this paper we describe how Explora assesses interestingness and achieves computational efficiency. These problems arise because of the variety of patterns and the immense combinatorial possibilities of generating instances when studying relations between variables in subsets of data. First, the user must be kept from being overwhelmed by a deluge of findings. To restrict the search with respect to the analysis goals, the user can focus each discovery task performed during an interactive and iterative exploration process. Some basic organizational principles of search can further limit the search effort. One principle is to organize the search hierarchically and to evaluate first the statistical or information-theoretic evidence of the general hypotheses. More special hypotheses can then be eliminated from further search if a more general hypothesis has already been verified. But this approach alone has some drawbacks, and even for moderately sized data it does not prevent large sets of findings. Therefore, in a second evaluation phase, further aspects of interestingness are assessed. A refinement strategy selects the most interesting of the statistically significant statements. A second problem for discovery systems is efficiency. Each hypothesis evaluation requires many data accesses. We describe strategies that reduce data accesses and speed up computation.
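The hierarchical search principle can be illustrated with a minimal sketch. It is not Explora's implementation: the function names, the binomial z-statistic used as the evidence measure, the threshold of 2.0, and the toy data are all assumptions made for illustration. Subgroups are enumerated from general (one condition) to special (more conditions), and a special subgroup is skipped without touching the data if one of its generalizations has already been verified.

```python
from itertools import combinations
from math import sqrt


def z_score(rows, subgroup, target):
    """Assumed evidence measure: z-statistic comparing the target share
    in the subgroup with the target share in the whole data set."""
    members = [r for r in rows if all(r[a] == v for a, v in subgroup)]
    n, total = len(members), len(rows)
    if n == 0 or n == total:
        return 0.0
    p0 = sum(r[target] for r in rows) / total
    if p0 in (0.0, 1.0):
        return 0.0
    p = sum(r[target] for r in members) / n
    return (p - p0) / sqrt(p0 * (1 - p0) / n)


def search(rows, attributes, target, threshold=2.0, max_depth=2):
    """General-to-special search over conjunctions of attribute values."""
    conditions = sorted({(a, r[a]) for r in rows for a in attributes}, key=repr)
    verified = []
    for depth in range(1, max_depth + 1):
        for combo in combinations(conditions, depth):
            # Allow at most one value per attribute in a conjunction.
            if len({a for a, _ in combo}) < depth:
                continue
            subgroup = frozenset(combo)
            # Prune: a more general hypothesis was already verified.
            if any(general < subgroup for general in verified):
                continue
            if abs(z_score(rows, subgroup, target)) >= threshold:
                verified.append(subgroup)
    return verified


# Hypothetical toy data: buying depends on region only, not on age.
rows = (
    [{"region": "north", "age": a, "buyer": 1} for a in ("young", "old")] * 5
    + [{"region": "south", "age": a, "buyer": 0} for a in ("young", "old")] * 15
)
found = search(rows, ["region", "age"], "buyer")
```

On this toy data only the two single-condition region subgroups are reported. Every two-condition subgroup contains a verified region condition and is therefore pruned before any data access. As the abstract notes, such pruning alone does not guarantee a small result set, which is why a second evaluation phase is needed.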
Cite this article
Klösgen, W. Efficient discovery of interesting statements in databases. J Intell Inf Syst 4, 53–69 (1995). https://doi.org/10.1007/BF00962822