DOI: 10.1007/978-3-642-02463-4_12

Clio: Schema Mapping Creation and Data Exchange

Published: 04 July 2009

Abstract

The Clio project provides tools that vastly simplify information integration. Information integration requires data conversions to bring data in different representations into a common form. Key contributions of Clio are the definition of non-procedural schema mappings to describe the relationship between data in heterogeneous schemas, a new paradigm in which we view the mapping creation process as one of query discovery, and algorithms for automatically generating queries for data transformation from the mappings. Clio provides algorithms to address the needs of two major information integration problems, namely, data integration and data exchange. In this chapter, we present our algorithms both for schema mapping creation via query discovery and for query generation for data exchange. These algorithms can be used in pure relational, pure XML, nested relational, or mixed relational and nested contexts.


    Published In

    Conceptual Modeling: Foundations and Applications: Essays in Honor of John Mylopoulos
    July 2009
    503 pages
    ISBN: 9783642024627
    Editors: Alexander T. Borgida, Vinay K. Chaudhri, Paolo Giorgini, Eric S. Yu

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Qualifiers

    • Chapter
