Semantic-integration research in the database community

Published: 01 March 2005
  • Abstract

    Semantic integration has been a long-standing challenge for the database community. It has received steady attention over the past two decades, and has now become a prominent area of database research. In this article, we first review database applications that require semantic integration and discuss the difficulties underlying the integration process. We then describe recent progress and identify open research issues. We focus in particular on schema matching, a topic that has received much attention in the database community, but also discuss data matching (for example, tuple deduplication) and open issues beyond the match discovery context (for example, reasoning with matches, match verification and repair, and reconciling inconsistent data values). For previous surveys of database research on semantic integration, see Rahm and Bernstein (2001); Ouksel and Sheth (1999); and Batini, Lenzerini, and Navathe (1986).


    Aberer, K. 2003. Special Issue on Peer to Peer Data Management. SIGMOD Record 32(3).
    Ananthakrishna, R.; Chaudhuri, S.; and Ganti, V. 2002. Eliminating Fuzzy Duplicates in Data Warehouses. In Proceedings of the Twenty-Eighth International Conference on Very Large Databases (VLDB). San Francisco: Morgan Kaufmann Publishers.
    Andritsos, P.; Miller, R. J.; and Tsaparas, P. 2004. Information-Theoretic Tools for Mining Database Structure from Large Data Sets. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. New York: Association for Computing Machinery.
    Batini, C.; Lenzerini, M.; and Navathe, S. 1986. A Comparative Analysis of Methodologies for Database Schema Integration. ACM Computing Surveys 18(4): 323-364.
    Bergamaschi, S.; Castano, S.; Vincini, M.; and Beneventano, D. 2001. Semantic Integration of Heterogeneous Information Sources. Data and Knowledge Engineering 36(3): 215-249.
    Berlin, J., and Motro, A. 2001. Autoplex: Automated Discovery of Content for Virtual Databases. Paper presented at the Sixth International Conference on Cooperative Information Systems (CoopIS '01), Trento, Italy, September 5-7.
    Berlin, J., and Motro, A. 2002. Database Schema Matching Using Machine Learning with Feature Selection. In Proceedings of the Conference on Advanced Information Systems Engineering (CAiSE). Lecture Notes in Computer Science, volume 2348. Berlin: Springer-Verlag.
    Bernstein, P. 2003. Applying Model Management to Classical Meta Data Problems. Paper presented at the Conference on Innovative Database Research (CIDR), Asilomar, CA, January 5.
    Bernstein, P. A.; Melnik, S.; Petropoulos, M.; and Quix, C. 2004. Industrial-Strength Schema Matching. SIGMOD Record 33(4).
    Bhattacharya, I., and Getoor, L. 2004. Iterative Record Linkage for Cleaning and Integration. In Proceedings of the Ninth ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. New York: Association for Computing Machinery.
    Bilenko, M., and Mooney, R. 2003. Adaptive Duplicate Detection Using Learnable String Similarity Measures. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery.
    Biskup, J., and Convent, B. 1986. A Formal View Integration Method. In Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data. New York: Association for Computing Machinery.
    Castano, S., and Antonellis, V. D. 1999. A Schema Analysis and Reconciliation Tool Environment. Paper presented at the International Database Engineering and Applications Symposium (IDEAS), Montreal, Quebec, August 1-3.
    Clifton, C.; Housman, E.; and Rosenthal, A. 1997. Experience with a Combined Approach to Attribute-Matching Across Heterogeneous Databases. In Proceedings of the Seventh IFIP Working Conference on Data Semantics (DS-7). Amsterdam: Elsevier North-Holland.
    Cohen, W. 1998. Integration of Heterogeneous Databases Without Common Domains Using Queries Based on Textual Similarity. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. New York: Association for Computing Machinery.
    Cohen, W., and Richman, J. 2002. Learning to Match and Cluster Entity Names. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery.
    Dhamankar, R.; Lee, Y.; Doan, A.; Halevy, A.; and Domingos, P. 2004. iMAP: Discovering Complex Matches Between Database Schemas. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. New York: Association for Computing Machinery.
    Do, H., and Rahm, E. 2002. COMA: A System for Flexible Combination of Schema Matching Approaches. In Proceedings of the Twenty-Eighth International Conference on Very Large Databases (VLDB). San Francisco: Morgan Kaufmann Publishers.
    Doan, A.; Domingos, P.; and Halevy, A. 2001. Reconciling Schemas of Disparate Data Sources: A Machine Learning Approach. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. New York: Association for Computing Machinery.
    Doan, A.; Lu, Y.; Lee, Y.; and Han, J. 2003a. Object Matching for Data Integration: A Profile-Based Approach. IEEE Intelligent Systems 18(5): 54-59.
    Doan, A.; Madhavan, J.; Dhamankar, R.; Domingos, P.; and Halevy, A. 2003b. Learning to Match Ontologies on the Semantic Web. VLDB Journal 12(4): 303-319.
    Dong, X.; Halevy, A.; Nemes, E.; Sigurdsson, S.; and Domingos, P. 2004. Semex: Toward On-The-Fly Personal Information Integration. Paper presented at the Workshop in Information Integration on the Web, Toronto, Ontario, August 30 (http://cips.eas.asu.edu/iiweb-proceedings.html).
    Elmagarmid, A., and Pu, C. 1990. Guest Editors' Introduction to the Special Issue on Heterogeneous Databases. ACM Computing Surveys 22(3): 175-178.
    Embley, D.; Jackman, D.; and Xu, L. 2001. Multi-Faceted Exploitation of Metadata for Attribute Match Discovery in Information Integration. Paper presented at the International Workshop on Information Integration on the Web, Rio de Janeiro, Brazil, April 9-11.
    Etzioni, O.; Halevy, A.; Doan, A.; Ives, Z.; Madhavan, J.; McDowell, L.; and Tatarinov, I. 2003. Crossing the Structure Chasm. Paper presented at the Conference for Innovative Database Research, Asilomar, CA, January 6.
    Fang, H.; Sinha, R.; Wu, W.; Doan, A.; and Zhai, C. 2004. Entity Retrieval Over Structured Data. Technical Report UIUC-CS-2414, Dept. of Computer Science, Univ. of Illinois, Urbana-Champaign.
    Freitag, D. 1998. Machine Learning for Information Extraction in Informal Domains. Ph.D. diss. Dept. of Computer Science, Carnegie Mellon University, Pittsburgh, PA.
    Friedman, M., and Weld, D. 1997. Efficiently Executing Information-Gathering Plans. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers.
    Garcia-Molina, H.; Papakonstantinou, Y.; Quass, D.; Rajaraman, A.; Sagiv, Y.; Ullman, J.; and Widom, J. 1997. The TSIMMIS Project: Integration of Heterogeneous Information Sources. Journal of Intelligent Information Systems 8(2): 117-132.
    Gravano, L.; Ipeirotis, P.; Koudas, N.; and Srivastava, D. 2003. Text Join for Data Cleansing and Integration in an RDBMS. In Proceedings of the Nineteenth International IEEE Conference on Data Engineering. Piscataway, NJ: Institute of Electrical and Electronics Engineers.
    He, B., and Chang, K. 2003. Statistical Schema Matching Across Web Query Interfaces. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. New York: Association for Computing Machinery.
    He, B.; Chang, K. C. C.; and Han, J. 2004. Discovering Complex Matchings Across Web Query Interfaces: A Correlation Mining Approach. In Proceedings of the Tenth ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery.
    Hernandez, M., and Stolfo, S. 1995. The Merge/Purge Problem for Large Databases. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, 127-138. New York: Association for Computing Machinery.
    Ives, Z.; Florescu, D.; Friedman, M.; Levy, A.; and Weld, D. 1999. An Adaptive Query Execution System for Data Integration. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data. New York: Association for Computing Machinery.
    Kang, J., and Naughton, J. 2003. On Schema Matching with Opaque Column Names and Data Values. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. New York: Association for Computing Machinery.
    Kashyap, V., and Sheth, A. 1996. Semantic and Schematic Similarities Between Database Objects: A Context-Based Approach. VLDB Journal 5(4): 276-304.
    Keim, G.; Shazeer, N.; Littman, M.; Agarwal, S.; Cheves, C.; Fitzgerald, J.; Grosland, J.; Jiang, F.; Pollard, S.; and Weinmeister, K. 1999. Proverb: The Probabilistic Cruciverbalist. In Proceedings of the Sixteenth National Conference on Artificial Intelligence, 710-717. Menlo Park, CA: AAAI Press.
    Knoblock, C.; Minton, S.; Ambite, J.; Ashish, N.; Modi, P.; Muslea, I.; Philpot, A.; and Tejada, S. 1998. Modeling Web Sources for Information Integration. In Proceedings of the Fifteenth National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press.
    Kushmerick, N. 2000. Wrapper Verification. World Wide Web Journal 3(2): 79-94.
    Kushmerick, N.; Weld, D.; and Doorenbos, R. 1997. Wrapper Induction for Information Extraction. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers.
    Lambrecht, E.; Kambhampati, S.; and Gnanaprakasam, S. 1999. Optimizing Recursive Information Gathering Plans. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers.
    Larson, J. A.; Navathe, S. B.; and Elmasri, R. 1989. A Theory of Attribute Equivalence in Databases with Application to Schema Integration. IEEE Transactions on Software Engineering 15(4): 449-463.
    Lenzerini, M. 2002. Data Integration: A Theoretical Perspective. In Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. New York: Association for Computing Machinery.
    Lerman, K.; Minton, S.; and Knoblock, C. A. 2003. Wrapper Maintenance: A Machine Learning Approach. Journal of Artificial Intelligence Research 18: 149-181.
    Levy, A. Y.; Rajaraman, A.; and Ordille, J. 1996. Querying Heterogeneous Information Sources Using Source Descriptions. In Proceedings of the Twenty-Second International Conference on Very Large Databases (VLDB). San Francisco: Morgan Kaufmann Publishers.
    Li, W., and Clifton, C. 2000. SEMINT: A Tool for Identifying Attribute Correspondence in Heterogeneous Databases Using Neural Networks. Data and Knowledge Engineering 33(1): 49-84.
    Li, W.; Clifton, C.; and Liu, S. 2000. Database Integration Using Neural Network: Implementation and Experience. Knowledge and Information Systems 2(1): 73-96.
    Lin, D. 1998. An Information-Theoretic Definition of Similarity. In Proceedings of the Fifteenth International Conference on Machine Learning. San Francisco: Morgan Kaufmann Publishers.
    Madhavan, J.; Bernstein, P.; Doan, A.; and Halevy, A. 2005. Corpus-Based Schema Matching. In Proceedings of the Eighteenth International Conference on Data Engineering (ICDE). Los Alamitos, CA: IEEE Computer Society.
    Madhavan, J.; Bernstein, P.; and Rahm, E. 2001. Generic Schema Matching With Cupid. In Proceedings of the Twenty-Seventh International Conference on Very Large Databases (VLDB). San Francisco: Morgan Kaufmann Publishers.
    Madhavan, J.; Halevy, A.; Domingos, P.; and Bernstein, P. 2002. Representing and Reasoning About Mappings Between Domain Models. In Proceedings of the Eighteenth National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press.
    Manning, C., and Schütze, H. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
    McCallum, A.; Nigam, K.; and Ungar, L. 2000. Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery.
    McCann, R.; Doan, A.; Kramnik, A.; and Varadarajan, V. 2003. Building Data Integration Systems Via Mass Collaboration. Paper presented at the Sixth International Workshop on the Web and Databases (WebDB-03), San Diego, June 12-13.
    Melnik, S.; Garcia-Molina, H.; and Rahm, E. 2002. Similarity Flooding: A Versatile Graph Matching Algorithm. In Proceedings of the Eighteenth International Conference on Data Engineering (ICDE). Los Alamitos, CA: IEEE Computer Society.
    Miller, R.; Haas, L.; and Hernandez, M. 2000. Schema Mapping As Query Discovery. In Proceedings of the Twenty-Sixth International Conference on Very Large Databases (VLDB). San Francisco: Morgan Kaufmann Publishers.
    Milo, T., and Zohar, S. 1998. Using Schema Matching to Simplify Heterogeneous Data Translation. In Proceedings of the Twenty-Fourth International Conference on Very Large Databases (VLDB). San Francisco: Morgan Kaufmann Publishers.
    Mitra, P.; Wiederhold, G.; and Jannink, J. 1999. Semi-Automatic Integration of Knowledge Sources. In Proceedings of the Second International Conference on Information Fusion (Fusion'99). Los Alamitos, CA: IEEE Computer Society.
    Monge, A., and Elkan, C. 1996. The Field Matching Problem: Algorithms and Applications. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. Menlo Park, CA: AAAI Press.
    Naumann, F.; Ho, C.; Tian, X.; Haas, L.; and Megiddo, N. 2002. Attribute Classification Using Feature Analysis. In Proceedings of the Eighteenth International Conference on Data Engineering (ICDE). Los Alamitos, CA: IEEE Computer Society.
    Noy, N., and Musen, M. 2001. Anchor-Prompt: Using Non-Local Context for Semantic Matching. Paper presented at the IJCAI Workshop on Ontologies and Information Sharing, Seattle, August 4-5.
    Noy, N., and Musen, M. 2000. Prompt: Algorithm and Tool for Automated Ontology Merging and Alignment. In Proceedings of the Seventeenth National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press.
    Ouksel, A., and Sheth, A. P. 1999. Semantic Interoperability in Global Information Systems. SIGMOD Record 28(1): 5-12.
    Palopoli, L.; Sacca, D.; Terracina, G.; and Ursino, D. 1999. A Unified Graph-Based Framework for Deriving Nominal Interscheme Properties, Type Conflicts, and Object Cluster Similarities. Paper presented at the Fourth International Conference on Cooperative Information Systems (CoopIS '99), Edinburgh, Scotland, September 2-4.
    Palopoli, L.; Sacca, D.; and Ursino, D. 1998. Semi-Automatic, Semantic Discovery of Properties from Database Schemes. In Proceedings of the International Database Engineering and Applications Symposium (IDEAS-98), 244-253. Los Alamitos, CA: IEEE Computer Society.
    Palopoli, L.; Terracina, G.; and Ursino, D. 2000. The System DIKE: Towards the Semi-Automatic Synthesis of Cooperative Information Systems and Data Warehouses. In Proceedings of the Symposium on Advances in Databases and Information Systems, Enlarged Fourth East-European Conference on Advances in Databases and Information Systems. New York: Association for Computing Machinery.
    Parag, and Domingos, P. 2004. Multi-Relational Record Linkage. Paper presented at the Third SIGKDD Workshop on Multi-Relational Data Mining, Seattle, August 22.
    Parent, C., and Spaccapietra, S. 1998. Issues and Approaches of Database Integration. Communications of the ACM 41(5): 166-178.
    Perkowitz, M., and Etzioni, O. 1995. Category Translation: Learning to Understand Information on the Internet. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann Publishers.
    Pottinger, R. A., and Bernstein, P. A. 2003. Merging Models Based on Given Correspondences. In Proceedings of the Twenty-Ninth International Conference on Very Large Databases (VLDB). San Francisco: Morgan Kaufmann Publishers.
    Punyakanok, V., and Roth, D. 2000. The Use of Classifiers in Sequential Inference. In Proceedings of the Conference on Neural Information Processing Systems (NIPS-00). Cambridge, MA: MIT Press.
    Rahm, E., and Bernstein, P. 2001. A Survey of Approaches to Automatic Schema Matching. VLDB Journal 10(4): 334-350.
    Rahm, E.; Do, H.; and Massmann, S. 2004. Matching Large XML Schemas. Special Issue on Semantic Integration, SIGMOD Record 33(3).
    Rosenthal, A.; Seligman, L.; and Renner, S. 2004. From Semantic Integration to Semantics Management: Case Studies and A Way Forward. Special Issue on Semantic Integration, SIGMOD Record 33(3).
    Ryutaro, I.; Hideaki, T.; and Shinichi, H. 2001. Rule Induction for Concept Hierarchy Alignment. Paper presented at the Second Workshop on Ontology Learning, Seattle, August 4.
    Sarawagi, S., and Bhamidipaty, A. 2002. Interactive Deduplication Using Active Learning. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery.
    Seligman, L.; Rosenthal, A.; Lehner, P.; and Smith, A. 2002. Data Integration: Where Does the Time Go? IEEE Data Engineering Bulletin 25(3): 3-10.
    Sheth, A., and Larson, J. 1990. Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. ACM Computing Surveys 22(3): 183-236.
    Sheth, A. P., and Kashyap, V. 1992. So Far (Schematically) Yet So Near (Semantically). In Proceedings of the IFIP WG 2.6 Database Semantics Conference on Interoperable Database Systems (DS-5). Amsterdam: Elsevier North-Holland.
    Tejada, S.; Knoblock, C.; and Minton, S. 2002. Learning Domain-Independent String Transformation Weights for High Accuracy Object Identification. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery.
    Velegrakis, Y.; Miller, R. J.; and Popa, L. 2003. Mapping Adaptation Under Evolving Schemas. In Proceedings of the Twenty-Ninth International Conference on Very Large Databases (VLDB). San Francisco: Morgan Kaufmann Publishers.
    Wu, W.; Yu, C.; Doan, A.; and Meng, W. 2004. An Interactive Clustering-Based Approach to Integrating Source Query Interfaces on the Deep Web. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. New York: Association for Computing Machinery.
    Xu, L., and Embley, D. 2003. Using Domain Ontologies to Discover Direct and Indirect Matches for Schema Elements. Paper presented at the Semantic Integration Workshop, Second International Semantic Web Conference (ISWC 2003), Sanibel Island, FL, October 20 (http://Smi.Stanford.Edu/Si2003).
    Yan, L.; Miller, R.; Haas, L.; and Fagin, R. 2001. Data Driven Understanding and Refinement of Schema Mappings. SIGMOD Record 30(2): 485-496.

    Information & Contributors


    Published In

    AI Magazine, Volume 26, Issue 1
    Spring 2005
    216 pages


    American Association for Artificial Intelligence

    United States

    Cited By

    • (2023) How Large Language Models Will Disrupt Data Management. Proceedings of the VLDB Endowment 16:11 (3302-3309). DOI: 10.14778/3611479.3611527. Online publication date: 24-Aug-2023
    • (2022) Element Similarity Calculator in XML Schema Matching. Proceedings of the 27th European Conference on Pattern Languages of Programs (1-10). DOI: 10.1145/3551902.3551970. Online publication date: 6-Jul-2022
    • (2021) Deep transfer learning for multi-source entity linkage via domain adaptation. Proceedings of the VLDB Endowment 15:3 (465-477). DOI: 10.14778/3494124.3494131. Online publication date: 1-Nov-2021
    • (2021) A variational database management system. Proceedings of the 20th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (29-42). DOI: 10.1145/3486609.3487197. Online publication date: 17-Oct-2021
    • (2020) Survey on complex ontology matching. Semantic Web 11:4 (689-727). DOI: 10.3233/SW-190366. Online publication date: 1-Jan-2020
    • (2020) Entity resolution for media metadata based on structural clustering. Multimedia Tools and Applications 79:1-2 (219-242). DOI: 10.1007/s11042-019-08062-6. Online publication date: 1-Jan-2020
    • (2019) Auto-EM: End-to-end Fuzzy Entity-Matching using Pre-trained Deep Models and Transfer Learning. The World Wide Web Conference (2413-2424). DOI: 10.1145/3308558.3313578. Online publication date: 13-May-2019
    • (2019) Towards Interoperable Open Statistical Data. Electronic Government (180-191). DOI: 10.1007/978-3-030-27325-5_14. Online publication date: 2-Sep-2019
    • (2018) Mining Abstract XML Data-Types. ACM Transactions on the Web 13:1 (1-37). DOI: 10.1145/3267467. Online publication date: 4-Dec-2018
    • (2018) Refining Traceability Links Between Vulnerability and Software Component in a Vulnerability Knowledge Graph. Web Engineering (33-49). DOI: 10.1007/978-3-319-91662-0_3. Online publication date: 5-Jun-2018
