research-article

Mining Abstract XML Data-Types

Published: 04 December 2018
    Abstract

    Schema integration has been a long-standing challenge for the data-engineering community and has received steady attention over the past three decades. General-purpose integration approaches construct unified schemas that encompass all schema elements. Schema integration has been revisited over the past decade in service-oriented computing, since the input/output data-types of service interfaces are heterogeneous XML schemas. However, service integration differs from the traditional integration problem: it should generalize schemas (mining abstract data-types) rather than unify all schema elements. To mine well-formed abstract data-types, one should follow the fundamental Liskov Substitution Principle (LSP), which generally holds between abstract data-types and their subtypes. However, due to the heterogeneity of service data-types, strict application of LSP is usually infeasible. Moreover, XML offers a rich type system in which data-types are defined by combining type patterns (e.g., composition, aggregation). Existing integration approaches have not addressed the challenge of defining a subtyping relation between XML type patterns. To address these challenges, we propose a relaxed version of LSP between XML type patterns and an automated generalization process for mining abstract XML data-types. We evaluate the effectiveness and efficiency of the process on the schemas of two datasets against two representative state-of-the-art approaches.
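    The abstract does not spell out the relaxed LSP or the generalization process, so what follows is only a minimal sketch under simplifying assumptions that are not taken from the paper. It models an XML data-type as a set of root-to-leaf element paths and treats one path as matching another when it is an ordered subsequence with the same root (a path-level stand-in for an embedded, ancestor-preserving subtree). Strict LSP is read as width subtyping: a concrete type can replace an abstract one only if every abstract path has a match. The relaxation (a hypothetical `tolerance` parameter) lets a fraction of abstract paths go unmatched, and generalization keeps only the paths supported by at least a hypothetical `min_support` fraction of the input types, pruning the rest. The names `embeds`, `covers`, `generalize`, and `ElementPath`, and the example schemas, are illustrative only.

```python
"""Hypothetical sketch: mining an abstract XML data-type from element paths.

Not the paper's algorithm; the path model, the relaxation, and the support
threshold are illustrative assumptions.
"""

# Root-to-leaf element names, e.g. ("Order", "Customer", "name").
ElementPath = tuple[str, ...]


def embeds(p: ElementPath, q: ElementPath) -> bool:
    """True if p occurs in q as an ordered subsequence starting at the same root.

    Intermediate wrapper elements in q may be skipped, a path-level analogue
    of an embedded (ancestor-preserving) subtree.
    """
    if not p or not q or p[0] != q[0]:
        return False
    remaining = iter(q)
    # `label in remaining` consumes the iterator up to the first match,
    # so successive labels of p must appear in q in the same order.
    return all(label in remaining for label in p)


def covers(abstract: set[ElementPath], concrete: set[ElementPath],
           tolerance: float = 0.0) -> bool:
    """Assumed substitutability check: `concrete` may stand in for `abstract`
    if at most a `tolerance` fraction of the abstract paths has no embedding
    in it. tolerance=0.0 is the strict, LSP-style width-subtyping check."""
    if not abstract:
        return True
    missing = sum(1 for p in abstract
                  if not any(embeds(p, q) for q in concrete))
    return missing <= tolerance * len(abstract)


def generalize(types: list[set[ElementPath]],
               min_support: float = 1.0) -> set[ElementPath]:
    """Keep each candidate path that embeds into at least a `min_support`
    fraction of the input types and prune the rest. With min_support=1.0,
    every input type strictly covers the result."""
    if not types:
        return set()
    candidates = set().union(*types)
    threshold = min_support * len(types)
    return {
        p for p in candidates
        if sum(1 for t in types if any(embeds(p, q) for q in t)) >= threshold
    }


if __name__ == "__main__":
    purchase = {("Order", "Customer", "name"), ("Order", "Customer", "email"),
                ("Order", "total")}
    invoice = {("Order", "Billing", "Customer", "name"), ("Order", "total"),
               ("Order", "currency")}
    abstract = generalize([purchase, invoice])
    print(sorted(abstract))  # [('Order', 'Customer', 'name'), ('Order', 'total')]
    print(covers(abstract, purchase), covers(abstract, invoice))  # True True
```

    With the default `min_support` of 1.0, the email and currency paths, which occur in only one schema, are pruned, and the invoice's Billing wrapper is generalized away because the shorter Order/Customer/name path embeds into it. Lowering `min_support` or raising `tolerance` trades strictness for coverage, which is one plausible reading of a relaxed LSP, not necessarily the one the paper defines.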


    Cited By

    • (2021) Composite Pattern for Autonomic Switching of Service Back-Ends between the Fog and the Cloud. Proceedings of the 26th European Conference on Pattern Languages of Programs, 1--10. DOI: 10.1145/3489449.3490000. Online publication date: 7-Jul-2021.


    Information

    Published In

    ACM Transactions on the Web  Volume 13, Issue 1
    February 2019
    206 pages
    ISSN:1559-1131
    EISSN:1559-114X
    DOI:10.1145/3297729
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 December 2018
    Accepted: 01 August 2018
    Revised: 01 April 2018
    Received: 01 August 2015
    Published in TWEB Volume 13, Issue 1


    Author Tags

    1. type pattern
    2. embedded subtree
    3. pruning
    4. subtyping relation

    Qualifiers

    • Research-article
    • Research
    • Refereed


    Article Metrics

    • Downloads (last 12 months): 19
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 27 Jul 2024
