Efficient discovery of interesting statements in databases

Klösgen, Willi

doi:10.1007/BF00962822

Published: January 1995

Volume 4, pages 53–69 (1995)

Journal of Intelligent Information Systems

Abstract

The Explora system supports Discovery in Databases by large-scale search for interesting instances of statistical patterns. In this paper we describe how Explora assesses interestingness and achieves computational efficiency. These problems arise because of the variety of patterns and the immense combinatorial possibilities of generating instances when studying relations between variables in subsets of data. First, the user must be spared from being overwhelmed by a deluge of findings. To restrict the search with respect to the analysis goals, the user can focus each discovery task performed during an interactive and iterative exploration process. Some basic organization principles of search can further limit the search effort. One principle is to organize the search hierarchically and to evaluate first the statistical or information-theoretic evidence of the general hypotheses. More special hypotheses can then be eliminated from further search if a more general hypothesis has already been verified. But this approach alone has some drawbacks, and even in moderately sized data it does not prevent large sets of findings. Therefore, in a second evaluation phase, further aspects of interestingness are assessed. A refinement strategy selects the most interesting of the statistically significant statements. A second problem for discovery systems is efficiency. Each hypothesis evaluation requires many data accesses. We describe strategies that reduce data accesses and speed up computation.
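
The hierarchical search principle mentioned in the abstract can be illustrated with a minimal sketch. It is not Explora's implementation: the toy data, the one-sided z-score used as evidence measure, the threshold of 2.0, and all function names are assumptions made for illustration only. The sketch evaluates general subgroups first and skips any more special subgroup once one of its generalizations has been verified.

```python
from itertools import combinations
from math import sqrt

def make_rows():
    """Build a hypothetical toy table of (attributes, target) records."""
    # (region, age) -> (number of records, number with target = 1); invented data
    spec = {("north", "young"): (20, 18), ("north", "old"): (20, 16),
            ("south", "young"): (20, 4), ("south", "old"): (20, 2)}
    rows = []
    for (region, age), (n, pos) in spec.items():
        for i in range(n):
            rows.append(({"region": region, "age": age}, 1 if i < pos else 0))
    return rows

def z_score(rows, subgroup, p_all):
    """Assumed evidence measure: z-score of the target share in the subgroup
    against the overall share p_all (positive means over-representation)."""
    members = [t for attrs, t in rows if all(attrs[a] == v for a, v in subgroup)]
    n = len(members)
    if n == 0 or p_all in (0.0, 1.0):
        return 0.0
    p_sub = sum(members) / n
    return (p_sub - p_all) / sqrt(p_all * (1 - p_all) / n)

def search(rows, max_depth=2, threshold=2.0):
    """General-to-specific search over conjunctions of attribute = value.

    A subgroup is not evaluated if a more general subgroup (a proper subset
    of its conditions) has already been verified.
    """
    p_all = sum(t for _, t in rows) / len(rows)
    conditions = sorted({(a, v) for attrs, _ in rows for a, v in attrs.items()})
    verified, evaluated, pruned = [], 0, 0
    for depth in range(1, max_depth + 1):          # general hypotheses first
        for combo in combinations(conditions, depth):
            if len({a for a, _ in combo}) < depth:
                continue                           # same attribute used twice
            subgroup = frozenset(combo)
            if any(g < subgroup for g in verified):
                pruned += 1                        # a generalization already holds
                continue
            evaluated += 1
            if z_score(rows, subgroup, p_all) >= threshold:
                verified.append(subgroup)
    return verified, evaluated, pruned

if __name__ == "__main__":
    verified, evaluated, pruned = search(make_rows())
    for subgroup in verified:
        print(" AND ".join(f"{a}={v}" for a, v in sorted(subgroup)))
    print(f"evaluated {evaluated} hypotheses, pruned {pruned}")
```

On this toy table only the general statement about one region is reported, and its two specializations are never evaluated. As the abstract notes, pruning of this kind alone does not keep the set of findings small on real data, which is why a second evaluation phase is needed.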

Author information

Authors and Affiliations

German National Research Center for Computer Science (GMD), 53757, Sankt Augustin, Germany
Willi Klösgen

About this article

Cite this article

Klösgen, W. Efficient discovery of interesting statements in databases. J Intell Inf Syst 4, 53–69 (1995). https://doi.org/10.1007/BF00962822

Issue Date: January 1995
DOI: https://doi.org/10.1007/BF00962822
