Abstract
Foreign keys form one of the most fundamental constraints for relational databases. Since they are not always defined in existing databases, discovering them is an important and challenging task. The underlying problem is known as the inclusion dependency (IND) inference problem. In this paper, data-mining algorithms are devised for IND inference in a given database. We propose a two-step approach. In the first step, unary INDs are discovered using a new preprocessing stage, which leads to a new algorithm and an efficient implementation. In the second step, n-ary INDs are inferred. This step fits in the framework of levelwise algorithms used in many data-mining algorithms. Since real-world databases can suffer from data inconsistencies, approximate INDs, i.e. INDs which almost hold, are also considered. We show how they can be safely integrated into our unary and n-ary discovery algorithms. These algorithms have been implemented and tested against both synthetic and real-life databases. To our knowledge, no other algorithm exists to solve this data-mining problem.
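To make the notions of unary and approximate INDs concrete, here is a minimal Python sketch. It is not the algorithm of the paper: it assumes an inverted index from values to attributes as the preprocessing structure, and it assumes an error measure equal to the fraction of distinct left-hand-side values missing from the right-hand side. The function name `unary_inds`, the parameter `max_error`, and the sample column names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def unary_inds(columns, max_error=0.0):
    """Return {(lhs, rhs): error} for attribute pairs with error <= max_error.

    columns maps an attribute name to an iterable of its values.
    Assumed error measure: share of distinct lhs values absent from rhs.
    """
    distinct = {name: set(values) for name, values in columns.items()}

    # Assumed preprocessing: value -> set of attributes containing it.
    index = defaultdict(set)
    for name, values in distinct.items():
        for value in values:
            index[value].add(name)

    # shared[(a, b)] = number of distinct values of a that also occur in b.
    shared = defaultdict(int)
    for attrs in index.values():
        for a, b in permutations(attrs, 2):
            shared[(a, b)] += 1

    result = {}
    for a, b in permutations(distinct, 2):
        if not distinct[a]:
            continue  # an empty column is skipped in this sketch
        error = 1 - shared[(a, b)] / len(distinct[a])
        if error <= max_error:
            result[(a, b)] = error
    return result


columns = {
    "orders.customer_id": [1, 2, 2, 3, 9],  # 9 is an inconsistent value
    "customers.id": [1, 2, 3, 4, 5],
}
print(unary_inds(columns))                  # → {}
print(unary_inds(columns, max_error=0.25))  # → {('orders.customer_id', 'customers.id'): 0.25}
```

With `max_error=0.0` only exact INDs are reported, so the stray value 9 hides the likely foreign key. Allowing a small error recovers it, which is the motivation the abstract gives for approximate INDs.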
Cite this article
De Marchi, F., Lopes, S. & Petit, J.-M. Unary and n-ary inclusion dependency discovery in relational databases. J Intell Inf Syst 32, 53–73 (2009). https://doi.org/10.1007/s10844-007-0048-x
DOI: https://doi.org/10.1007/s10844-007-0048-x