Systems for Knowledge Discovery in Databases

Published: 01 December 1993

Abstract

Knowledge-discovery systems face challenging problems from real-world databases, which tend to be dynamic, incomplete, redundant, noisy, sparse, and very large. These problems are addressed and some techniques for handling them are described. A model of an idealized knowledge-discovery system is presented as a reference for studying and designing new systems. This model is used in the comparison of three systems: CoverStory, EXPLORA, and the Knowledge Discovery Workbench. The deficiencies of existing systems relative to the model reveal several open problems for future research.

Published In

IEEE Transactions on Knowledge and Data Engineering, Volume 5, Issue 6
December 1993
185 pages

Publisher

IEEE Educational Activities Department

United States

Author Tags

  1. CoverStory
  2. EXPLORA
  3. KDD systems
  4. Knowledge Discovery Workbench
  5. deductive databases
  6. future research
  7. idealized knowledge-discovery system
  8. knowledge acquisition
  9. knowledge based systems
  10. knowledge discovery
  11. learning (artificial intelligence)
  12. machine learning
  13. real-world databases

