Abstract
XML Schema is becoming an indispensable component in developing web applications. With its widespread adoption and its accessibility over the web, XML Schema reuse is becoming imperative. To support XML Schema reuse, the first step is to develop a mechanism to search for relevant XML Schemas over the web. This paper describes an XML Schema matching system that compares two XML Schemas. Our matching system can find accurate matches and scales to large XML Schemas with hundreds of elements. In this system, XML Schemas are modelled as labeled, unordered, rooted trees, and a new tree matching algorithm is developed. Compared with the tree edit-distance algorithm and other schema matching systems, this algorithm is faster and more suitable for XML Schema matching.
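The abstract does not spell out the new matching algorithm. As a rough illustration only, the sketch below models schema elements as a labeled, unordered, rooted tree and scores two trees by combining label similarity with an optimal one-to-one matching of their children. Because the children are unordered, their order does not affect the score. The string-similarity measure (`difflib.SequenceMatcher`), the weight `alpha`, and the names `Node`, `tree_sim`, and `best_matching` are all assumptions for illustration. This is not the authors' algorithm. The exact bitmask matching used here is exponential in fan-out, so schemas with hundreds of elements would need a polynomial assignment method such as the Hungarian method instead.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher
from functools import lru_cache


@dataclass
class Node:
    """A node of a labeled, unordered, rooted tree (e.g. an XML Schema element)."""
    label: str
    children: list[Node] = field(default_factory=list)


def label_sim(a: str, b: str) -> float:
    # Assumption: plain case-insensitive string similarity in [0, 1];
    # a real matcher might use a lexical or semantic measure instead.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matching(weights: list[list[float]]) -> float:
    """Maximum total weight of a one-to-one matching between rows and columns.

    Exact bitmask DP: exponential in the number of columns, so only
    suitable for small fan-out (illustrative, not scalable)."""
    n_rows, n_cols = len(weights), len(weights[0])

    @lru_cache(maxsize=None)
    def dp(i: int, used: int) -> float:
        if i == n_rows:
            return 0.0
        best = dp(i + 1, used)  # leave row i unmatched
        for j in range(n_cols):
            if not (used & (1 << j)):  # column j still free
                best = max(best, weights[i][j] + dp(i + 1, used | (1 << j)))
        return best

    return dp(0, 0)


def tree_sim(a: Node, b: Node, alpha: float = 0.5) -> float:
    """Similarity in [0, 1]: alpha * label similarity + (1 - alpha) * child similarity."""
    ls = label_sim(a.label, b.label)
    if not a.children and not b.children:
        return ls  # two leaves: compare labels only
    if not a.children or not b.children:
        return alpha * ls  # structural mismatch: only the label part counts
    weights = [[tree_sim(x, y, alpha) for y in b.children] for x in a.children]
    # Normalize by the larger child count so unmatched children lower the score.
    cs = best_matching(weights) / max(len(a.children), len(b.children))
    return alpha * ls + (1 - alpha) * cs


if __name__ == "__main__":
    po1 = Node("PurchaseOrder", [Node("shipTo", [Node("name"), Node("street")]), Node("items")])
    po2 = Node("Order", [Node("items"), Node("ShipTo", [Node("street"), Node("name")])])
    print(round(tree_sim(po1, po2), 3))
```

Swapping the order of `shipTo` and `items`, or of `name` and `street`, leaves the score unchanged, which is the point of treating schema trees as unordered.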
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lu, J., Wang, S., Wang, J. (2005). An Experiment on the Matching and Reuse of XML Schemas. In: Lowe, D., Gaedke, M. (eds) Web Engineering. ICWE 2005. Lecture Notes in Computer Science, vol 3579. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11531371_38
DOI: https://doi.org/10.1007/11531371_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27996-9
Online ISBN: 978-3-540-31484-4
eBook Packages: Computer Science, Computer Science (R0)