Mining Abstract XML Data-Types

Authors:

Dionysis Athanasopoulos,

Apostolos Zarras

ACM Transactions on the Web (TWEB), Volume 13, Issue 1

Article No.: 2, Pages 1 - 37

https://doi.org/10.1145/3267467

Published: 04 December 2018

Abstract

Schema integration has been a long-standing challenge for the data-engineering community and has received steady attention over the past three decades. General-purpose integration approaches construct unified schemas that encompass all schema elements. Schema integration has been revisited in the past decade in service-oriented computing, since the input/output data-types of service interfaces are heterogeneous XML schemas. However, service integration differs from the traditional integration problem, since it should generalize schemas (mining abstract data-types) instead of unifying all schema elements. To mine well-formed abstract data-types, the fundamental Liskov Substitution Principle (LSP), which generally holds between abstract data-types and their subtypes, should be followed. However, due to the heterogeneity of service data-types, the strict employment of LSP is usually not feasible. On top of that, XML offers a rich type system, in which data-types are defined by combining type patterns (e.g., composition, aggregation). The existing integration approaches have not dealt with the challenges of defining a subtyping relation between XML type patterns. To address these challenges, we propose a relaxed version of LSP between XML type patterns and an automated generalization process for mining abstract XML data-types. We evaluate the effectiveness and the efficiency of the process on the schemas of two datasets against two representative state-of-the-art approaches.
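
The contrast the abstract draws between unifying and generalizing schemas can be illustrated with a minimal sketch. It is not the article's process: it assumes a toy model in which a data-type is a flat mapping from element names to primitive type names, and the names `unify`, `generalize`, `is_substitutable`, `order_a`, and `order_b` are all hypothetical. It shows only the strict substitution check, because the abstract does not specify how the relaxed version of LSP is defined.

```python
from typing import Dict, List

# Hypothetical toy model (an assumption, not the article's representation):
# a data-type maps element names to primitive type names.
DataType = Dict[str, str]


def unify(types: List[DataType]) -> DataType:
    """Unified schema: keeps every element that appears in any input type."""
    unified: DataType = {}
    for data_type in types:
        for name, primitive in data_type.items():
            # The first declaration of an element wins in this toy model.
            unified.setdefault(name, primitive)
    return unified


def generalize(types: List[DataType]) -> DataType:
    """Abstract type: keeps only elements that all inputs share with the same primitive type."""
    if not types:
        return {}
    first, rest = types[0], types[1:]
    return {
        name: primitive
        for name, primitive in first.items()
        if all(other.get(name) == primitive for other in rest)
    }


def is_substitutable(abstract: DataType, concrete: DataType) -> bool:
    """Strict check: the concrete type provides every element the abstract type declares."""
    return all(concrete.get(name) == primitive for name, primitive in abstract.items())


# Two made-up service data-types describing an order.
order_a: DataType = {"id": "int", "customer": "string", "total": "decimal"}
order_b: DataType = {"id": "int", "customer": "string", "currency": "string"}

abstract_order = generalize([order_a, order_b])
unified_order = unify([order_a, order_b])

print(abstract_order)
print(unified_order)
print(is_substitutable(abstract_order, order_a), is_substitutable(unified_order, order_a))
```

In this toy setting both concrete types can stand in for the generalized type, whereas neither can stand in for the unified one, which is why generalization rather than unification suits substitution-based service integration.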

Cited By

Athanasopoulos D (2021). Composite Pattern for Autonomic Switching of Service Back-Ends between the Fog and the Cloud. Proceedings of the 26th European Conference on Pattern Languages of Programs, 1-10. Online publication date: 7-Jul-2021.
https://dl.acm.org/doi/10.1145/3489449.3490000

Index Terms

Mining Abstract XML Data-Types
1. Applied computing
  1. Document management and text processing
    1. Document preparation
      1. Markup languages
        Extensible Markup Language (XML)
2. Information systems
  1. World Wide Web
    1. Web services

Published In

ACM Transactions on the Web Volume 13, Issue 1

February 2019

206 pages

ISSN:1559-1131

EISSN:1559-114X

DOI:10.1145/3297729

Editor:
Brian D. Davison
Lehigh University, USA

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 December 2018

Accepted: 01 August 2018

Revised: 01 April 2018

Received: 01 August 2015

Published in TWEB Volume 13, Issue 1

Qualifiers

Research-article
Research
Refereed

Bibliometrics

Total Citations: 1
Total Downloads: 254
Downloads (Last 12 months): 19
Downloads (Last 6 weeks): 0

Reflects downloads up to 27 Jul 2024
