Ontology Mapping with Auxiliary Resources
Frederik Christiaan Schadd
Version: 2 October 2015
DISSERTATION
to obtain the degree of Doctor
at Maastricht University,
by authority of the Rector Magnificus,
Prof. dr. L.L.G. Soete,
in accordance with the decision of the Board of Deans,
to be defended in public
on Thursday, 17 December 2015, at 14:00 hours
by
Frederik Christiaan Schadd
Promotor: Prof. dr. ir. J.C. Scholtes
Copromotor: Dr. ir. ing. N. Roos
Members of the assessment committee:
Prof. dr. ir. R.L.M. Peeters (chair)
Prof. dr. C.T.H. Evelo
Prof. dr. ir. F.A.H. van Harmelen (VU University Amsterdam)
Prof. dr. H. Stuckenschmidt (University of Mannheim)
Prof. dr. G.B. Weiss
This research has been funded by the transnational University of Limburg (tUL).
Dissertation Series No. 20XX-XX
The research reported in this thesis has been carried out under the auspices of SIKS,
the Dutch Research School for Information and Knowledge Systems.
Printed by Company name, City
ISBN xxx-xxx-xxx-xxx-x
© 2015 F.C. Schadd, Maastricht, The Netherlands.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the author.
Preface
First and foremost, I’d like to thank my supervisor Nico Roos for his continuous
support, stimulation and encouragement over the past years. It is due to his guidance
that it was possible to channel my efforts into several publications and ultimately this
thesis. I have thoroughly enjoyed our thoughtful, prompt and open discussions about
research, teaching or any current topic. The humorous nature of our discussions made our weekly meetings a joy, never a chore. He gave me the freedom to
pursue my research ideas and actively shape this project. I would also like to thank Jan Scholtes for agreeing to be my promotor and for his insightful feedback, which helped shape this thesis into what it is today.
I’d also like to thank my colleagues and friends at DKE, specifically: Daan Bloembergen, Hendrik Baier, Steven de Jong, Nyree Lemmens, Philippe Uyttendaele, Nela
Lekic, Matthijs Cluitmans, Daniel Claes, Haitham Bou Ammar, Michael Kaisers, Daniel Hennes, Michael Clerx, Pietro Bonizzi, Marc Lanctot, Pim Nijssen, Siqi Chen,
Wei Zhao, Zhenlong Sun, Nasser Davarzani and Bijan Ranjbarsahraei. Sharing stories during lunchtime and creating new ones at conferences, courses or pubs has been a highlight of my time as a PhD student.
Further, I’d like to thank the frequent participants of the PhD-Academy events,
such as drinks, movie-nights or outdoor activities. I’d like to thank the following
people who made my stay in Maastricht particularly fun and exciting: Julie Dela
Cruz, Gabri Marconi, Mare Oehlen, Mehrdad Seirafi, Sanne ten Oever, Joost Mulders, Shuan Ghazi, Tia Ammerlaan, Rina Tsubaki, Howard Hudson, Lisa Bushart,
Anna Zseleva, Annelore Verhagen, Burcu Duygu, Christine Gutekunst, Paola Spataro, Anne-Sophie Warda, Masaoki Ishiguro, Barbara Zarzycka, Paula Nagler, Nordin Hanssen, Roxanne Korthals, Zubin Vincent, Lukasz Wojtulewicz, Peter Barton, Bas Ganzevles, Jo-Anne Murasaki, Dorijn Hertroijs, Eveline van Velthuijsen, Hedda Munthe, Mahdi Al Taher, Ibrahima Sory Kaba, Mueid Ayub and Jessie Lemmens. During my time as a PhD student I also started climbing as
a new hobby, thanks to which I met the following great people: Nico Salamanca, Jan
Feld, Frauke Meyer, Maria Zumbühl and Martijn Moorlag. Thanks to you I always
had something to look forward to, even if things weren’t going well with research,
be it going bouldering every week, climbing outside in France or Belgium, or any
non-sporty activity with the other great PhD-Academy people. Further, I’d like to
thank the members of MaasSAC, the Maastricht student climbing association, for
the many fun evenings of either climbing or other social activities.
As the saying goes: whether north, south, east or west, home is best. I am grateful for my friends back home, especially Marcel Ludwig, Markus Bock, Sebastian Müller, Christine Müller, Alexander Miesen, Christopher Löltgen, Daniel Adenau, Dominik Fischer, Stefan Kolb, Stefan Schiffer, Michael Lichtner, Stefan Bauman and Andreas Bauman, for all the wonderful years we have spent together. I could always rely on you to have fun and to regain my sanity whenever I travelled back to Germany.
In this concluding paragraph, I’d like to thank some very important people in my
life. I’d like to thank my parents Peter Schadd and Anneke Lely for their support
and for helping me realize my own potential. I’d like to thank my brother Maarten
for his helpful advice and my baby niece Mila for being adorable. Lastly, I’d like to
express my gratitude for the support of Brigitte Schadd and Kurt Laetsch.
Frederik Schadd, 2015
Acknowledgments
The research has been carried out under the auspices of SIKS, the Dutch Research
School for Information and Knowledge Systems. This research has been funded by the transnational University of Limburg (tUL).
Table of Contents
Preface  v
Table of Contents  vii
1 Introduction  1
  1.1 Knowledge Systems and Information Exchange  1
  1.2 Applications  3
    1.2.1 Schema Integration  3
    1.2.2 Information Integration  5
    1.2.3 Ontology Engineering  9
    1.2.4 Information Sharing  10
    1.2.5 Web-Service Composition  13
    1.2.6 Querying of Semantic Information with Natural Language  15
    1.2.7 Agent Communication  18
  1.3 Core challenges within Ontology Mapping  19
    1.3.1 Efficiency  19
    1.3.2 Mapping with background knowledge  20
    1.3.3 Automatic Configuration  21
    1.3.4 User involvement  22
    1.3.5 Correspondence justification  22
    1.3.6 Crowdsourcing  23
    1.3.7 Alignment infrastructures  23
  1.4 Problem Statement and Research Questions  24
  1.5 Thesis Overview  26
2 Background  27
  2.1 The Mapping Problem  30
  2.2 Evaluation of Alignments  35
  2.3 Alignment Evaluation with Partial Alignments  44
  2.4 Ontology Alignment Evaluation Initiative  46
    2.4.1 Datasets  47
3 Mapping Techniques  51
  3.1 Basic Techniques  51
    3.1.1 System Composition  53
    3.1.2 Similarity Metrics  56
    3.1.3 Similarity Aggregation  64
    3.1.4 Correspondence Extraction  68
  3.2 Mapping system survey  73
4 Concept-Sense Disambiguation for Lexical Similarities  83
  4.1 Background  84
    4.1.1 Lexical Similarity Measures  85
    4.1.2 Word-Sense Disambiguation  87
    4.1.3 Virtual Documents  89
  4.2 Related Work  90
    4.2.1 Methods of Word-Sense Disambiguation  90
    4.2.2 Word-Sense Disambiguation in Ontology Mapping  91
  4.3 Concept Sense Disambiguation Framework  92
    4.3.1 Concept Disambiguation  94
    4.3.2 Lexical Similarity Metric  94
    4.3.3 Applied Document Model  96
    4.3.4 Term-Frequency Weighting  100
  4.4 Experiments  100
    4.4.1 Concept Disambiguation  101
    4.4.2 Framework Comparison  103
    4.4.3 Weighting Schemes Experiments  107
    4.4.4 Runtime Analysis  110
  4.5 Chapter Conclusions and Future Work  112
    4.5.1 Chapter Conclusions  112
    4.5.2 Future Research  113
5 Anchor Profiles for Partial Alignments  115
  5.1 Related Work  116
  5.2 Anchor Profiles  117
  5.3 Experiments  120
    5.3.1 Evaluation  121
    5.3.2 Performance Track Breakdown  122
    5.3.3 Alternate Profile Creation  124
    5.3.4 Influence of Deteriorating PA Precision  126
    5.3.5 Comparison with other Frameworks  129
  5.4 Chapter Conclusions and Future Work  130
    5.4.1 Chapter Conclusions  130
    5.4.2 Future Research  131
6 Anchor Evaluation using Feature Selection  133
  6.1 Anchor Filtering  134
  6.2 Proposed Approach  135
    6.2.1 Filtering using Feature Selection  138
  6.3 Evaluation  140
    6.3.1 Syntactic Similarity  142
    6.3.2 Structural Similarity  143
    6.3.3 Lexical Similarity  144
  6.4 Chapter Conclusion and Future Research  145
    6.4.1 Chapter Conclusions  145
    6.4.2 Future Research  146
7 Anchor-Based Profile Enrichment  147
  7.1 Related Work  148
    7.1.1 Semantic Enrichment  148
  7.2 Profile Similarities and the Terminological Gap  149
  7.3 Anchor-Based Profile Enrichment  151
  7.4 Experiments  156
    7.4.1 Benchmark  157
    7.4.2 MultiFarm  159
    7.4.3 Influence of Partial Alignment Size  161
    7.4.4 Comparison with Lexical Enrichment Systems  162
  7.5 Chapter Conclusion and Future Work  163
    7.5.1 Chapter Conclusions  163
    7.5.2 Future Research  164
8 Conclusions and Future Research  167
  8.1 Conclusions on the Research Questions  167
    8.1.1 Concept Disambiguation  168
    8.1.2 Exploiting Partial Alignments  169
    8.1.3 Filtering Partial Alignments  170
    8.1.4 Matching Terminologically Heterogeneous Ontologies  171
  8.2 Conclusion to Problem Statement  172
  8.3 Recommendations for Future Research  173
References  175
List of Figures  197
List of Tables  199
List of Algorithms  200
Addendum: Valorization  201
Summary  207
Samenvatting  213
About the Author  219
Chapter 1
Introduction
1.1 Knowledge Systems and Information Exchange
Many technologies that emerged from the dawn of the information age rely on the
accessibility of stored information. To be able to interpret the stored data, it is
required that the data is structured and annotated with regard to its meaning and
relation to other data entries. To formally specify how data in a knowledge system
is structured, one typically creates an ontology. An ontology is defined as an explicit
specification of a conceptualization (Gruber, 1993). In essence, given a domain, an
ontology defines a list of concepts which exist in this domain, what data values
are associated with each concept and how the concepts are related to each other.
Depending on the technology used to express the ontology, an ontology description
can even include logic-based axioms, facilitating logical reasoning over an ontology
and the data that is encoded using that ontology.
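To make these ingredients concrete, the structure of an ontology can be sketched in code. The following Python fragment is a deliberately minimal, hypothetical illustration (the concept and property names are invented); real ontologies are expressed in dedicated languages such as OWL.

```python
# A minimal, hypothetical ontology: concepts, the data values
# (properties) associated with each concept, and a relation
# between concepts (here only subsumption, "is a subclass of").
ontology = {
    "concepts": {
        "Vehicle":    {"properties": ["weight", "maxSpeed"]},
        "Car":        {"properties": ["numberOfDoors"]},
        "Motorcycle": {"properties": []},
    },
    "subclass_of": {"Car": "Vehicle", "Motorcycle": "Vehicle"},
}

def inherited_properties(onto, concept):
    """Collect a concept's properties, including those inherited
    from its ancestors along the subsumption relation."""
    props = []
    while concept is not None:
        props.extend(onto["concepts"][concept]["properties"])
        concept = onto["subclass_of"].get(concept)
    return props

# A Car carries its own property plus those inherited from Vehicle.
print(inherited_properties(ontology, "Car"))
# -> ['numberOfDoors', 'weight', 'maxSpeed']
```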
The implicit intention behind the creation of an ontology is that it represents a
consensus on how a given domain should be modelled with regard to its potential applications in a knowledge system. The term consensus implies that multiple parties
are involved in the creation of the ontology. A party may contribute to the ontology development process by, for instance, suggesting terminology or relations, or by
discussing or criticizing the suggestions of other parties. This process iterates until
all parties are in agreement about the correctness and completeness of the ontology.
After the consensus is reached, all involved parties can implement and populate their
knowledge systems based on a single ontology. A globally shared ontology has the
advantage that it facilitates the smooth access and exchange of information between
different knowledge systems.
Ideally, ontologies should be shared in order to facilitate the interoperability
between knowledge systems. Unfortunately, it is very common that businesses and
developers choose to create their own ontology instead of adopting an already created
ontology. This decision can be justified by multiple motivations. Technical motivations include the particular knowledge system being required for different tasks, the developers following different design principles, or the domain expert developing the ontology having a different perspective on the domain. Another motivation can be
that a fully developed ontology represents a strategic asset to a company. Therefore,
a company might be unwilling to share its ontology with a competitor. One option
for preventing the development of many ontologies modelling the same domain involves the creation of a global standard, created by a consensus reached by domain
experts, which can then be adopted by all relevant knowledge systems. However, it
is possible that the experts tasked with the creation of a global standard do not agree on how the given domain should be modelled, such that no global standard emerges for that domain. The result is a common situation where the exchange of information between two knowledge systems is impeded because the systems deploy differing ontologies.
Instead of trying to create ontology standards, one can take a different approach: facilitating information exchange between knowledge systems that utilize different ontologies through the process of ontology mapping. This approach will
be the main focus of this thesis. Ontology mapping involves the identification of
correspondences between the definitions of two given ontologies. A correspondence
denotes a certain semantic relation, e.g. equivalence or subsumption, which is ascertained to hold between two ontology concepts. It follows that a correspondence
denotes that two given ontology concepts are likely used to encode the same type of information. Given a correspondence between two concepts, one can develop a bidirectional function capable of transforming the encoded information of
one concept such that it conforms to the specification of the other. The collection
of all correspondences between two ontologies is referred to as an alignment or mapping. Thus, the identification of an alignment between two ontologies, which denotes
all concept pairs which encode the same information, forms the foundation for the
facilitation of information exchange between two heterogeneous knowledge systems.
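The notions of correspondence and alignment can be sketched as simple data structures. The concept names, relation symbols and confidence values below are purely illustrative and do not stem from any particular mapping system.

```python
from typing import NamedTuple

class Correspondence(NamedTuple):
    """A semantic relation asserted between two ontology concepts."""
    source: str      # concept of ontology 1
    target: str      # concept of ontology 2
    relation: str    # e.g. "=" (equivalence) or "<" (subsumption)
    confidence: float

# An alignment is the collection of all correspondences
# identified between the two ontologies.
alignment = [
    Correspondence("Car", "Automobile", "=", 0.95),
    Correspondence("Car", "Vehicle", "<", 0.80),
]

def equivalences(align):
    """Extract the equivalence pairs usable for data translation."""
    return {c.source: c.target for c in align if c.relation == "="}

print(equivalences(alignment))  # -> {'Car': 'Automobile'}
```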
Ontology mapping is applicable to various processes which regularly occur in businesses that maintain an information infrastructure or operate web-service-based offerings. Example processes involving ontology-based knowledge systems are knowledge system mergers, ontology evolution, query answering and data translation. Since these processes aid businesses in their daily operations or the realization of long-term goals, they are of strategic importance, making ontology mapping a vital tool for the operation of a business.
An alignment between two ontologies is created by measuring the similarities
between their modelled concepts. Highly similar concept-pairs are extracted which
then form the alignment. To what extent two given ontology concepts are likely to
denote the same information is measured by determining the overlap or similarities
between their respective available meta-data. Exploitable meta-data can for instance be concept names or labels, structural information, or data that has already
been modelled using that ontology. Specialized techniques examine some type of
meta-data and quantify the similarity between two inputs. Many approaches have
been developed for the purpose of determining concept similarities. While many
approaches only utilize the information encoded within the ontology, some novel approaches attempt to utilize external information to aid in the comparison of ontology
concepts.
This thesis aims at extending the available scope of approaches which utilize
external information in order to compute concept correspondences. Specifically, the
presented research focuses on two types of external information: (1) lexical resources
and (2) partial alignments. The first type, being lexical resources, can be described as
an enriched thesaurus. A thesaurus typically consists of a list of word definitions. Each word is commonly defined by a set of synonyms and a written explanation. Next to the word definitions, a lexical resource also contains relations which hold between the different words, allowing a reader to quickly query related definitions in order to gain more information about a certain word. For example, a very popular such resource is Wikipedia, with the English variant containing more than 4 million articles. The
second category of investigated external information is a special type of alignment.
An alignment between two ontologies is a complete list of all concept-pairs which can
be used to model the same information. A partial alignment between two ontologies
however is an incomplete list of concept-pairs. These can, for instance, be the result of a domain expert attempting to create an alignment but being unable to finish it. Such partial alignments can be exploited in order to create complete alignments.
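A toy sketch of how such a lexical resource could support mapping: if two concept names occur in the same synonym set, they can be proposed as equivalent even though their strings differ, and relations between entries can suggest subsumption. The miniature resource below is invented for illustration and is orders of magnitude smaller than a real resource such as WordNet.

```python
# A miniature lexical resource: synonym sets plus one relation
# between entries (a "broader term" pointer).
synsets = {
    "car": {"car", "automobile", "auto"},
    "vehicle": {"vehicle", "conveyance"},
}
broader = {"car": "vehicle"}  # a car is a kind of vehicle

def lexical_relation(name1, name2):
    """Return '=' if both names share a synonym set, '<' if the
    first name's entry lists the second as a broader term."""
    n1, n2 = name1.lower(), name2.lower()
    for syns in synsets.values():
        if n1 in syns and n2 in syns:
            return "="
    for key, syns in synsets.items():
        if n1 in syns and broader.get(key) and n2 in synsets[broader[key]]:
            return "<"
    return None

print(lexical_relation("Car", "Automobile"))  # '=' despite string mismatch
print(lexical_relation("Car", "Vehicle"))     # '<' via the broader-term relation
```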
1.2 Applications
Comparing and mapping meta-data of structural models is a common task occurring
in a variety of fields. This task has its origins in the field of databases, where the
objective is to match database schemas such that one can transfer data between
two databases (Batini, Lenzerini, and Navathe, 1986; Kim and Seo, 1991). In the
past decade, newer technologies for the representation and application of ontologies
became available. With their introduction, the problem of enabling information
exchange became more prevalent. We will provide an overview of a selection of
notable applications where ontology mapping can provide a viable long-term solution
for potential ontology interoperability problems. In order, the discussed applications
are (1) schema integration, (2) information integration, (3) ontology engineering,
(4) information sharing, (5) web-service composition, (6) Natural-language-based
querying of semantic information and (7) agent communication.
1.2.1 Schema Integration
Schema integration (Batini et al., 1986; Sheth and Larson, 1990; Sheth, Gala, and
Navathe, 1993; Batini and Lenzerini, 1984; Spaccapietra and Parent, 1994; Parent
and Spaccapietra, 1998) is the oldest process which involves the task of ontology
mapping. In this category, the general problem is that two parties, each possessing
a knowledge resource with corresponding ontology, wish to establish the exchange
of information between their respective resources. This exchange of information can
be unidirectional or bidirectional.
Figure 1.1 depicts a generalized schema integration task. Here, Party 1 and Party
2 each operate a knowledge resource in which information is stored, possibly using
[Figure 1.1: Example illustration of a schema integration task. Party 1 issues the query "Green Car", which matches Car#1465 in its own resource; the alignment translates the query into "Green Automobile" for Ontology 2, whose matching entries Automobile#1843 and Automobile#2071 are translated back, so that Party 1 receives Car#1465, Car#1843 and Car#2071.]
different technologies such as SQL [1], XML [2], XSD [3], RDF [4] or OWL [5]. Each party has
encoded its information using an ontology which fulfils the party’s own needs, in this
example represented as Ontology 1 and Ontology 2. The general task of information
integration involves the creation of an alignment between the two ontologies, such
that each party can gain access to the data stored in the other party’s information
system. This is illustrated by Party 1 issuing a query concerning a Green Car.
The created mapping allows for the translation of the query into the terminology of
Ontology 2, such that the example query is now reformulated as Green Automobile.
The translated query can then be executed on the ontology belonging to Party 2,
resulting in a list of relevant data entries. These entries however are still expressed
[1] Structured Query Language (SQL). Query language for accessing and managing relational databases. Commercially released in 1979 by Relational Software Inc. (now Oracle Corporation), it became widely used in the following decades as the standard data storage and management technology.
[2] Extensible Markup Language (XML). Markup language for storing information in a structured and machine-readable way. Since it is a text-based storage format, it is widely used to transfer structured data over the internet.
[3] XML Schema Definition (XSD). Schema language for defining how specific types of XML documents should be structured.
[4] Resource Description Framework (RDF). Data model based on making statements about resources. Statements are formulated as triples in a subject-predicate-object format, with a collection of triples essentially forming a directed and labelled multigraph. Data in an RDF storage can link to resources of other storages.
[5] Web Ontology Language (OWL). Language for specifying ontologies based on formal semantics. A variety of syntaxes exist for expressing OWL ontologies, for instance utilizing XML or RDF.
using the terminology of Ontology 2, meaning that these cannot yet be processed by
the knowledge system of Party 1. However, using the alignment between Ontology 1
and Ontology 2, all retrieved data entries can be translated back into the terminology
of Ontology 1, such that these can be presented as query results to Party 1.
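The translation steps of this example can be sketched as lookups over the alignment's equivalence correspondences; the terms and data entries are those of the running example, while the code itself is only an illustrative simplification.

```python
# Equivalence correspondences of the alignment, Ontology 1 -> Ontology 2.
alignment = {"Car": "Automobile"}
reverse = {v: k for k, v in alignment.items()}

def translate(terms, mapping):
    """Rewrite each query term via the alignment; terms without a
    correspondence are passed through unchanged."""
    return [mapping.get(t, t) for t in terms]

# Party 1 issues the query "Green Car" ...
translated = translate(["Green", "Car"], alignment)
print(translated)  # -> ['Green', 'Automobile']

# ... Party 2 answers in its own terminology ...
results = ["Automobile#1843", "Automobile#2071"]

# ... and each result is translated back into Party 1's terms.
back = ["#".join(translate(r.split("#"), reverse)) for r in results]
print(back)  # -> ['Car#1843', 'Car#2071']
```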
The process of schema integration is performed regularly in the corporate field. A
company’s knowledge system is typically the responsibility of the company’s Chief Information Officer (CIO). One of the CIO’s responsibilities is Enterprise Meta-data
Management (EMM), which includes the design and maintenance of the ontology
of the company’s knowledge system. Should a company wish to gain access to an
external knowledge system, it becomes the responsibility of the CIO to oversee the
integration of the new knowledge system.
Suppose that two companies wish to merge their operations, or that one company performs a takeover of another company. Both companies are likely to possess
knowledge systems storing critical information, like customer information or product
data. Here, the critical task is that the two knowledge resources have to be merged
into a single resource. In a takeover scenario this is typically the resource belonging
to the company performing the takeover. As the first step, a mapping needs to be
created between the two ontologies. These are likely to differ, as the two companies had different requirements for their own resources during their creation. These differences can stem from the system designers following different design principles, from the companies operating in different areas, and even from the companies having different business goals, which might have to be adjusted after the merger or acquisition.
Alternatively, two companies may decide to strategically cooperate, requiring the
free exchange of information between their respective knowledge resources. In such a
scenario, each company needs access to the other company’s respective information
without having to alter its own internal infrastructure. A mapping would thus allow
a company to seamlessly access the partner company’s data.
1.2.2 Information Integration
Information integration is another common task which requires a mapping between
ontologies. Processes which we define under information integration are catalogue
integration (Bouquet, Serafini, and Zanobini, 2003; Ding et al., 2002), data integration (Halevy, Rajaraman, and Ordille, 2006; Arens, Knoblock, and Shen, 1996; Calvanese et al., 1998; Subrahmanian et al., 1995; Halevy et al., 2005; Lenzerini, 2002)
and data warehousing (Bernstein and Rahm, 2000). Here, access to heterogeneous
sources is facilitated by the creation of a common ontology, also referred to as a
mediator. In the following paragraphs we will provide a general illustration of an
information integration scenario. In the following subsections we will discuss catalogue integration and data integration in more detail.
Figure 1.2 illustrates an example scenario of an information integration problem.
One is given a series of knowledge resources, with every resource being encoded using
its own specific ontology, denoted here by Local Ontology 1 and Local Ontology 2.
The goal is to provide the user with a single interface with which a multitude of
resources can be queried. The accessible resources can also be implemented using
different technologies such as SQL, XML, RDF or OWL. A Common Ontology is
[Figure 1.2: Example illustration of an information integration task. A common/mediator ontology is connected via a matcher and an alignment to Local Ontology 1 and Local Ontology 2.]
created which models all information that is encoded using the local ontologies.
Creating a mapping between the Common Ontology and all local ontologies would
allow the user to formulate a query by only using the terms of the Common Ontology.
That query can then be translated into the terminologies of each local ontology, thus
enabling access to all local ontologies. After the query is executed on each individual
resource, all retrieved search results are translated back into terms of the common
ontology and merged before being presented to the user. Depending on the domain,
the process of merging these multiple result lists might remove duplicate results or
specifically contrast very similar results so that they can be compared more easily.
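The mediator workflow just described can be sketched as: translate the query into each local terminology, query every resource, and merge the translated results while removing duplicates. All ontologies, alignments and data entries below are invented placeholders.

```python
# One alignment per local resource, mapping common-ontology terms
# to the local terminology (hypothetical examples).
alignments = {
    "local1": {"Film": "Movie"},
    "local2": {"Film": "MotionPicture"},
}
# Toy contents of the local resources, keyed by their own terms.
data = {
    "local1": {"Movie": ["Sherlock", "Dracula"]},
    "local2": {"MotionPicture": ["Sherlock", "Nosferatu"]},
}

def mediated_query(term):
    """Translate a common-ontology term into each local terminology,
    collect the results, and merge them with duplicates removed."""
    merged = []
    for resource, alignment in alignments.items():
        local_term = alignment.get(term, term)
        for hit in data[resource].get(local_term, []):
            if hit not in merged:      # duplicate removal on merge
                merged.append(hit)
    return merged

print(mediated_query("Film"))  # -> ['Sherlock', 'Dracula', 'Nosferatu']
```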
For example, a user might desire the answer to the query “find a DVD with the
title ’Sherlock’ ”. When entered into an information system modelling media, the
query is then translated into the ontology-terminology used by the different connected resources, for example amazon.com or bol.com. The information integration
system would execute the translated queries on each resource, translate the results
back into the terms of the common ontology and present a merger of all results to
the user. Note that at no point was it necessary for the user to directly interact with the individual resources; interaction was limited to the interface provided by the integration system.
The problem presented in Subsection 1.2.1, denoted as schema integration, is closely related to the task of information integration, despite the fact that it typically
does not involve the creation of a common ontology. If one were to interpret one
of the given ontologies as a common ontology, then the entire system can be interpreted as an information integration system where one local ontology is already
interoperable with the common ontology. Also, it may be the case that the two
ontologies do not overlap perfectly with regard to the scope of the modelled data.
For example, given two ontologies A and B modelling the pharmaceutical domain, it may be the case that herbal medicine is only modelled in A and experimental medicine is only modelled in B. In this scenario, designating either A or B as mediator ontology
would require an update to the mediator such that the information of the remaining
ontology is modelled as well. This updated version of A or B can thus be interpreted
as the mediator ontology of an information integration system.
Information integration systems share some similarities with Federated Search
systems. A Federated Search system is a type of meta information-retrieval system,
where user queries are distributed among local collections (Shokouhi and Si, 2011).
A local collection represents a single search engine, such as Google or Yahoo. The
results of the local collections are merged into a single list and presented to the
user. There are however some key differences between Federated Search systems and
information integration systems. First, both local collections and federated search
systems can only process queries that are expressed as simple strings. These strings
typically contain keywords or a natural language phrase. Therefore, any query is
compatible with any search system, since the query does not need to be translated
into a different terminology. The quality of the search results thus depends on
the strength of the algorithms deployed by each local collection. While a federated
search system does not employ a global ontology, it does utilize a management
system referred to as the Broker. The Broker is responsible for determining which
local collections are relevant to a particular query, forwarding the query to each
relevant local collection, retrieving the search results of each local collection and
merging different result lists into a single list. The second key difference is that the
retrieved results are a series of documents rated by relevance instead of a list of data
entries. Thus, if a particular piece of data is required, the user must still inspect
every document until the desired data is found.
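For contrast, the broker of a federated search system can be sketched as well: no terminology translation takes place; the broker merely selects the collections deemed relevant, forwards the query string unchanged, and merges the ranked document lists. Collection contents and relevance scores below are invented.

```python
# Toy local collections: each maps a keyword query to a ranked
# list of (document, relevance) pairs. No ontology is involved.
collections = {
    "engine_a": {"sherlock": [("doc_a1", 0.9), ("doc_a2", 0.4)]},
    "engine_b": {"sherlock": [("doc_b1", 0.7)]},
    "engine_c": {},  # holds nothing relevant to this query
}

def broker(query):
    """Select the collections that can answer the query, forward the
    query string unchanged, and merge the ranked result lists."""
    merged = []
    for name, index in collections.items():
        if query in index:             # collection selection
            merged.extend(index[query])
    # Merge into a single list, ordered by the collections' scores.
    return sorted(merged, key=lambda pair: pair[1], reverse=True)

print(broker("sherlock"))
# -> [('doc_a1', 0.9), ('doc_b1', 0.7), ('doc_a2', 0.4)]
```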
Catalogue Integration
Businesses which produce and sell a vast array of different products are likely to
store information about their products in a catalogue. Here, the ontology used to
model the product data is designed for the specific goal of organizing and classifying
their products. An ontology that is designed for such a goal is also referred to
as a hierarchical classification (HC). In a business-to-business (B2B) application, a
company selling products from its catalogue might wish to sell its products on a
marketplace. Marketplaces, such as eBay or the Amazon Marketplace, have current
offerings organized using their own HC. Thus, if a seller wishes to automatically sell
his products on the different available marketplaces, it becomes necessary to generate
a mapping between the seller’s HC and the HC of each marketplace. This task is
referred to as the catalogue matching problem (Bouquet et al., 2003; Ding et al.,
2002). With such mappings in place, a seller is able to automatically translate
product descriptions and specifications into the terminology of the marketplace HC
and submit products to that marketplace for sale to customers.
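As a toy illustration of the catalogue matching problem, the following Python sketch maps the categories of a seller's HC to those of a marketplace HC purely by label similarity. All category names are invented, and real matchers additionally exploit the hierarchy structure and the product instances themselves.

```python
# Minimal sketch of catalogue matching between two hierarchical classifications
# (HCs), reduced here to label comparison with difflib.
from difflib import SequenceMatcher

seller_hc = ["Notebooks", "Mobile Phones", "Digital Cameras"]
market_hc = ["Laptops & Notebooks", "Cell Phones", "Cameras > Digital"]

def best_match(category, candidates):
    """Return the candidate category with the highest string similarity."""
    scored = [(c, SequenceMatcher(None, category.lower(), c.lower()).ratio())
              for c in candidates]
    return max(scored, key=lambda pair: pair[1])

mapping = {cat: best_match(cat, market_hc) for cat in seller_hc}
for src, (tgt, score) in mapping.items():
    print(f"{src} -> {tgt} ({score:.2f})")
```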
There have been initiatives to standardize product classifications by creating a
global HC. A global HC would then model all types of products such that if one is able
to express product details using the global HC terminology, one would then be able to
transfer information between all catalogues which support the global HC. Examples
of such global HCs are UNSPSC, eCl@ss, eOTD and the RosettaNet Technical
Dictionary. Related standards can be found in trade-item-identifier systems, such
as the International Article Number (EAN) system and the Universal Product Code
(UPC) system. These systems provide a unique numerical value for every individual
product. Products are fitted with a depiction of their respective item code in the
form of a bar-code, such that the trade of these products can be easily tracked
in stores. Contrary to a HC, a trade-item-identifier system does not contain a
hierarchical structure with which products can be classified and organized. If the
knowledge system of a company does not support the global HC adopted by a
certain marketplace, however, then that company has to create a mapping between
its own catalogue and the global HC, so that its product descriptions can be
translated and entered into the marketplace.
Even if a knowledge system has adopted a specific global HC, it might differ
from the HC deployed in the marketplace. This can stem from the marketplace
having adopted a different HC or having developed its own. An additional issue is
that the creation and maintenance of such global HCs presents challenges. Often
they are unevenly balanced, lack in specificity and require a significant amount of
maintenance in order for the HCs to keep up with current developments (Hepp,
Leukel, and Schmitz, 2007). Submitting suggestions, requests or additions to a
central governing entity that manages a global HC can be a time consuming process,
which can cause frustration for companies having adopted a HC and wishing for
changes that suit their personal needs. Lowering the barrier for submitting changes
could potentially alleviate this problem. The OntoWiki (Hepp, Bachlechner, and
Siorpaes, 2006) project demonstrates that a de-centralized system, here comprised
of several Wikipedia communities, is a possibility for the creation of a consensual
global categorization.
Given that a global HC is constantly being managed and updated, the adoption
of a global HC would also mean that an update might eventually induce changes as
a result of which the current terminology is no longer compatible with the updated
version. Here, a mapping between the two HC versions is required in order to restore
compatibility. This type of problem is further discussed in subsection 1.2.3.
Data Integration
Data Integration (Halevy et al., 2006; Arens et al., 1996; Calvanese et al., 1998;
Subrahmanian et al., 1995) is a special type of information integration problem, of
which the key aspect is that the data is not fully loaded into a central information
system before the exchange of information (Halevy et al., 2005). Here, a mapping
for each element of a source ontology is denoted as a query over the target ontology.
One can distinguish these mappings as a Local-as-View (LAV) and Global-as-View
(GAV) approach (Lenzerini, 2002).
In a GAV approach, a mapping between the global ontology and a local ontology
is formulated such that each concept of the global ontology is mapped with a query
over the local ontology. As an example, assume that a global ontology G models the
concept vehicle and that a local ontology L models the concepts car and motorcycle.
In a GAV approach a mapping for the concept vehicle can be expressed as:
vehicle ↔ SELECT * FROM car, motorcycle
This principle tends to be more advantageous if the local ontologies are considered
stable, i.e. experience little to no changes or updates. The LAV approach differs from
GAV with regard to which ontology concepts in a mapping are denoted as a query.
Here, the concepts of the local ontologies are expressed as queries over the global
ontology in the mapping. Returning to the previous example, a LAV mapping for
the concept motorcycle from the local ontology L can be expressed as:
SELECT * FROM vehicle WHERE wheels = 2 ↔ motorcycle
This approach is effective if the global ontology in the information system is stable
and well established, preferably as a global standard for the given domain.
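Both mapping styles can be made concrete with a small in-memory SQLite sketch, in which the schema and data are invented for illustration. The GAV mapping defines the global concept vehicle as a view, i.e. a query, over the local concepts, while the LAV direction expresses the local concept motorcycle as a query over the global concept.

```python
# GAV/LAV sketch with an in-memory SQLite database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE car (name TEXT, wheels INTEGER);
    CREATE TABLE motorcycle (name TEXT, wheels INTEGER);
    INSERT INTO car VALUES ('sedan', 4);
    INSERT INTO motorcycle VALUES ('chopper', 2);

    -- GAV: the global concept is mapped to a query over the local ontology.
    CREATE VIEW vehicle AS
        SELECT name, wheels FROM car
        UNION ALL
        SELECT name, wheels FROM motorcycle;
""")

# A query against the global concept is answered from the local data.
rows = conn.execute("SELECT name FROM vehicle ORDER BY name").fetchall()
print(rows)  # [('chopper',), ('sedan',)]

# LAV direction: the local concept `motorcycle` corresponds to a query
# over the global concept.
two_wheelers = conn.execute("SELECT name FROM vehicle WHERE wheels = 2").fetchall()
print(two_wheelers)  # [('chopper',)]
```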
In both the LAV and GAV approach, the queries are processed using an inference
engine, which allows the query to be expressed using the terminology of each local
ontology. In a Data Integration system the data of each local ontology is not fully
loaded into a central information system. This has the distinct advantage that the
relevant data of each query is retrieved during the execution of each query. For a
user this means that he always has access to the most up-to-date information across
the entire system.
1.2.3 Ontology Engineering
Ontology Engineering is the process of building and maintaining an ontology for a desired knowledge system. Particularly the maintenance aspect of ontology engineering
might require the application of mapping techniques. For knowledge systems that
are deployed over a long period of time, it can occur that certain changes have to be
made. These can be small and incremental for the purpose of improving the ontology,
causing it to evolve over time. More abrupt and significant changes might also occur,
especially when there is a change in the requirements or a shift of corporate strategy
(Plessers and De Troyer, 2005; Rogozan and Paquette, 2005; Redmond et al., 2008).
Furthermore, if the ontology is distributed among multiple users, it can occur that
each user develops different requirements over time. Each user would then wish to
update the ontology such that it suits his needs. This can lead to different versions
of the same original ontology being deployed among multiple internal or external
information systems. Often the individual changes serve the same purpose but are
executed differently, due to the system designers of a single information system not
having access to the change-log of other information systems. This causes the designers to be unaware of any changes that have already been performed across the
different systems. Hence, it can occur that sufficient changes to an ontology cause it
to be incompatible with its original version (Klein and Noy, 2003; Noy and Musen,
2002; Hepp and Roman, 2007; Noy and Musen, 2004). In this case one can perform ontology mapping so that one can transfer the data encoded using the original
version into a system using the updated ontology.
Figure 1.3: Example of an ontology engineering task.
A general mapping task in an ontology engineering scenario is presented in Figure
1.3. An XML-based knowledge system has its data encoded using Ontology Version
X. At some point in time changes are made to that ontology, resulting in the creation
of Ontology Version X+1. This ontology encodes the information of the same domain, but in a slightly different way. For instance, in the new ontology entities could
have been added, removed or renamed or data values could have been altered such
that these use different data-types or model data up to a different accuracy. In this
scenario, an ontology mapping approach needs to find all corresponding concepts
between the old and new ontology. Based on the generated mapping, a transformation needs to be created which dictates how data needs to be processed such that
it conforms to the new ontology. Using the transformation, all data instances can
then be converted such that these conform to the new ontology.
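A minimal sketch of such a transformation step is given below, assuming that the alignment has been reduced to field renames plus data-type converters; all field names are hypothetical.

```python
# Sketch of the generator/transformation step of an ontology engineering task:
# an alignment between ontology versions is reduced here to field renames plus
# data-type converters, applied to each data instance.
RENAMES = {"cost": "price"}                 # entity renamed in version X+1
CONVERTERS = {"price": lambda v: float(v)}  # data-type changed: string -> float

def transform(instance):
    """Convert one record from ontology version X to version X+1."""
    out = {}
    for field, value in instance.items():
        field = RENAMES.get(field, field)
        out[field] = CONVERTERS.get(field, lambda v: v)(value)
    return out

record_v1 = {"name": "widget", "cost": "9.95"}
print(transform(record_v1))  # {'name': 'widget', 'price': 9.95}
```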
1.2.4 Information Sharing
A different type of information-exchange infrastructure is the Peer-To-Peer (P2P) system (Androutsellis-Theotokis and Spinellis, 2004; Ehrig et al., 2004; Pouwelse et al.,
2005). In a P2P system content is spread across different nodes. These nodes can
be internal servers or spread around the globe. The nature of P2P systems provides several distinct advantages over centralized systems, such as easy scalability
and robustness against attacks or defects, resulting in the benefit that information
stays available even if nodes are taken off-line for a given reason. Thus, a distinct
feature of a P2P system is that it is decentralized. One can categorize the degree of
decentralization in three categories (Androutsellis-Theotokis and Spinellis, 2004):
Hybrid Decentralized A set of client systems, with each system storing information that is available over the entire network. The clients are connected to
a central directory server which maintains a list of clients and a list of files
that each client offers. A given client desiring information connects first to
the central server to obtain a list of connected clients. Then, the given client
individually connects with each client in the network in order to obtain the
required information.
Purely Decentralized Each node in the network acts both as a client and a server,
meaning that there is no central dedicated server which coordinates network
activity. Network activity is propagated using a broadcast-like mechanism,
where a node upon receiving a message, for example a file request, will not
only answer the message but also repeat it to its neighbours. The neighbour
responses are then back-propagated through the message trace until they are
received by the original sender.
Partially Centralized A partially centralized system is similar to a purely decentralized system in that it has no central dedicated server. Instead, nodes are
dynamically assigned the task of super-node based on the bandwidth and computing power of the node. A super-node acts as a server for a small cluster of
nodes, indexing its files and possibly caching the files of its assigned nodes. By
acting as a proxy, the super-node initially handles many of the search requests
on behalf of its assigned nodes, thus reducing the computational load of these
nodes.
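The directory lookup of a hybrid decentralized system can be sketched as follows; this is illustrative Python, and all client and file names are invented.

```python
# Sketch of a hybrid decentralized P2P lookup: a central directory server only
# knows which client offers which files; the actual transfer would then take
# place between the clients directly.
directory = {
    "client-1": {"fileA", "fileB"},
    "client-2": {"fileB", "fileC"},
}

def locate(filename):
    """Ask the directory server which connected clients offer `filename`."""
    return sorted(c for c, files in directory.items() if filename in files)

print(locate("fileB"))  # ['client-1', 'client-2']
```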
Some P2P systems, such as BitTorrent, eDonkey or Gnutella, describe their content using a globally adopted ontology, where files are for instance annotated using
attributes such as ’name’, ’title’, ’release date’ or ’author’. Thus, the problems one
encounters when faced with heterogeneous ontologies are circumvented by enforcing
an ontology over all connected information sources. However, for a P2P-based data
exchange on the semantic web it is likely the case that the enforcement of a global
ontology is an undesirable solution due to the autonomy of each node being of importance (Nejdl et al., 2002; Ives et al., 2004). Here, instead of individual people
the nodes of the P2P system would represent the knowledge systems of companies
or organizations, who are likely to have different requirements for their own system
and might be unwilling to convert their system to a different ontology for the sole
purpose of joining a P2P network. If the nodes of a P2P network utilize different
ontologies, then it is necessary that the nodes are able to map their respective local
ontologies in order for queries and information to be successfully transferred across
a P2P network. Such a scenario is illustrated in Figure 1.4 for a hybrid decentralized
system.
In this scenario, a client using Local Ontology 1 would connect to a central
directory server. This server returns a list of known clients. The clients establish
Figure 1.4: Information sharing in a hybrid decentralized P2P system.
communication with each other and the server using a specific P2P protocol, which
is implemented as a wrapper for the knowledge system of each client. In the case
that the information of the retrieved clients are encoded using a different ontology,
in this example Local Ontology 2 and Local Ontology 3, the client would need to
map its ontology with the ontologies of all other clients within the network. Once
the mappings are created, the given client can then send appropriately translated
queries to all clients in the network and interpret their responses. Note that without
a mapping process as described in subsection 1.2.1 it is only possible to exchange
information with clients using the same ontology, meaning that only part of all
information residing in the network is available to the user.
For purely decentralized networks the ability to communicate between clients is
also a vital issue. Here, any given client has only a limited number of direct
connections to other clients within the network. The remaining clients within the
network are reached by forwarding and back-propagating queries and answers. If the
directly connected clients do not utilize the same ontology, then this means that the
given client cannot submit appropriately formatted queries, resulting in the client
essentially being completely isolated from the rest of the network. The analytical
metrics to measure properties of networks, including vulnerabilities and reachability
of nodes, originate from the field of network science (Börner, Sanyal, and Vespignani,
2007). For example, measures of centrality can be used to express the reachability of
one or multiple nodes within the graph, whereas the measure of modularity expresses
how strongly a network is separated into different groups. For networks with a high
modularity it is of high importance that nodes connecting different clusters can
exchange information. Otherwise entire clusters of nodes can be cut off from the
rest of the network.
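As an illustration of such a measure, the following toy sketch computes degree centrality for a small network consisting of two clusters joined by a single bridge edge; the measures in actual network-science toolkits are considerably more elaborate.

```python
# Degree centrality: the fraction of other nodes a node is directly connected
# to. The toy network below has two triangle clusters joined by one bridge.
edges = [("a", "b"), ("b", "c"), ("c", "a"),   # cluster 1
         ("c", "d"),                           # bridge
         ("d", "e"), ("e", "f"), ("f", "d")]   # cluster 2

def degree_centrality(edges):
    nodes = {n for e in edges for n in e}
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return {n: d / (len(nodes) - 1) for n, d in deg.items()}

c = degree_centrality(edges)
top = sorted(n for n, v in c.items() if v == max(c.values()))
print(top)  # ['c', 'd'] -- the bridge endpoints are the most central nodes
```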
Significant issues can also occur in a partially centralized network. If a node
utilizes a different ontology than its assigned super-node, then this would effectively
result in the entire network being unable to access the information of that node. This
occurs due to the super-node being unable to index the information of the given node.
The problem of inaccessible information is exacerbated if the ontology of a super-node differs from the ontologies of the other super-nodes in the network. Such a
situation would result in an entire cluster of clients being isolated from the system,
which could eventually completely fragment the network if a sufficient quantity of
different ontologies are employed by the super-nodes.
1.2.5 Web-Service Composition
The internet was originally designed as a tool for the global access of information by humans. With the emergence of Semantic Web technologies it
also became possible for businesses to offer services directly via the internet (Bussler,
Fensel, and Maedche, 2002). The Semantic Web envisions web-services expressing
their capabilities using precise semantics, such that autonomous agents that roam
the web can automatically discover these services, interpret their functionality, decide which service best suits the agents’ needs and interact with the most appropriate
service. The availability of web-services means that businesses can effectively outsource certain components of their applications, for example querying services or
data processing, to web-services which specialize in specific areas and tasks. Thus,
a service offered by a business can essentially become a composite of interconnected
services.
The process of Web-Service Composition can be described by three core tasks
which need to be performed (Sycara et al., 2003): (1) a planning task, where all
processes in the application are specified, especially with regard to the inputs and
outputs of certain components, (2) a discovery task, where web-services are discovered which can execute the tasks described in the components of the plan, and (3)
the management of interaction between the identified services. Of these three tasks,
two might require the application of mapping techniques in order to resolve
certain heterogeneities. The most obvious application is the task of web-service integration. Once a service is identified, it may become necessary to perform a mapping
step in order to be able to translate queries into the input format specified by the
ontology which is used by the accessed web-service. This task is comparable to the
process of information integration, described in section 1.2.2.
The task of web-service discovery requires more specialized mapping techniques
(Medjahed, Bouguettaya, and Elmagarmid, 2003; Maximilien and Singh, 2004; Klusch,
Fries, and Sycara, 2009). In order for a web-service to be discovered by an agent, a
semantic description needs to be matched with a designed profile of the desired service. Service descriptions are typically expressed using specialized ontologies, such
as DAML-S6 (Coalition et al., 2002) and OWL-S7 (Martin et al., 2004). The first
issue that arises here is that the ontologies used to describe the service capabilities
need to be interoperable, such that all details of a service are expressed using the
same terminology. It can occur that a web service expresses its capabilities using
a different ontology than the one used by the business performing the service composition. Thus, in this case schema integration needs to be performed between the
two service description ontologies.
Mapping the service description ontologies ensures a compatibility with regard to
how a service description is expressed. As a next step, all encountered descriptions
need to be translated and mapped to the required service description so that the
most appropriate service can be found. Here, all instances of the service ontology and
their associated attribute data, referred to as a service profile, need to be examined
and compared to the profile representing the desired service (Medjahed et al., 2003;
Klusch, Fries, and Sycara, 2006). This problem is essentially a soft version of an
instance mapping problem (Ferrara et al., 2008). In instance mapping, one is given
two instances belonging to the same ontology or two different ontologies and one has
to determine whether or not the two given instances denote the same object. While
in instance mapping the desired output is a ’true’ or ’false’ statement, in the case
of mapping service profiles one would need to express the output as a continuous
value, since one would rather choose a service being capable of almost all required
tasks than a service capable of none of them, despite neither being able to perform
all desired tasks. The processes of instance mapping and ontology mapping are very
interrelated, since one can utilize instance mapping to solve an ontology mapping
problem and vice-versa (Wang, Englebienne, and Schlobach, 2008). Additionally,
both problems have theoretical similarities, where in both cases the inputs are two
lists of entities, lists of concepts for ontology mapping and lists of individual entities
for instance mapping, and the mapping system needs to exploit the available metadata in order to determine the overlapping entities between the two lists. A result
of this interrelatedness is that systems which tackle either ontology mapping or
instance mapping problems often exhibit a significant overlap with regards to their
applied techniques.
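The continuous profile comparison can be sketched as a simple coverage score. This is a deliberate oversimplification: the capability names are invented, and real matchers weigh and semantically compare the capabilities rather than testing set membership.

```python
# Sketch of a 'soft' service-profile match: instead of a true/false instance
# mapping, the comparison yields a continuous score, here the fraction of
# required capabilities a candidate service covers.
def profile_score(required, offered):
    """Continuous match value in [0, 1] between two service profiles."""
    if not required:
        return 1.0
    return len(set(required) & set(offered)) / len(set(required))

required = ["query", "translate", "rank"]
service_a = ["query", "translate"]          # covers 2 of 3 required tasks
service_b = []                              # covers none

print(profile_score(required, service_a))
print(profile_score(required, service_b))   # 0.0
```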
Figure 1.5 depicts a web-service composition scenario. A given web-application
wishes to outsource a component of its system to a web-service. To achieve this,
it creates a service description which describes the ideal properties of the desired
system. In this example, the description is formulated using the DAML-S service
ontology. Any web-service on the internet might advertise its capabilities using
a different service ontology, however. In this example, the ideal web-service has
expressed its capabilities using OWL-S. Therefore, in order to be able to determine
whether the service is appropriate, the terminology of OWL-S must be matched
to DAML-S such that the service descriptions can be compared. After a mapping
between the two service ontologies is established, a translator is generated which is
6 DARPA Agent Markup Language for Services (DAML-S). Built using the DAML+OIL ontology language. DAML+OIL combines features from both the DARPA Agent Markup Language (DAML) and the Ontology Inference Layer (OIL).
7 Built using the Web Ontology Language (OWL). OWL is used for specifying ontologies based on formal semantics.
Figure 1.5: Mapping tasks in a web-service composition scenario.
capable of translating service descriptions into the different ontology terminologies.
This translator then reformulates the service description into the DAML-S format.
Using the translated description a matching system specialized in comparing service
descriptions compares the translated description with the description of the desired
service. This comparison entails an estimate as to how appropriate the service is for
the desired task and how the different inputs and output should be mapped such
that the application can interact with the service. Once the ideal web-service has
been determined, a mediator is created which, based on the mapping between the
service descriptions, can translate and transfer the inputs and outputs between the
application and the web-service.
1.2.6 Querying of Semantic Information with Natural Language
In the subsection regarding information integration we described a scenario where
a user would formulate a query using the terminology and semantic properties of
a common ontology. In essence, the effectiveness of such a system relies on the
familiarity of the user with the applied common ontology with which queries are
formulated. For specialized applications intended for businesses a familiarity of the
user with query-formulation can be assumed. However, one cannot assume that
the user is comparably familiar with queries when the application is intended for the
general public. For a user of the general public one can only assume a familiarity with
Information Retrieval (IR) systems, services such as Google, Bing or DuckDuckGo,
which only receive natural language (NL) or keywords as queries. For such a user base to be able to effectively query semantic information sources one must parse the
natural language query into an ontology-based format, similar to queries executed
in information integration systems.
Figure 1.6: Mapping in an information system receiving NL queries.
The process of parsing a natural language query into an ontology-based query
can be interpreted as a mapping task. However, one of the ontologies is limited to
the terminology occurring in the query and the only available structural information is the word order in which the query was written. Thus, in essence this is a
mapping task in which one ontology contains significantly less meta-information for
each concept than professionally engineered ontologies. To complement the techniques used in standard information integration scenarios, special approaches are
developed which individually process each word, e.g. by grammatical and semantic
annotation, and create a mapping between the user input and a query ontology.
From this mapping a semantic query is created using the terminology of the query
ontology (Lopez, Pasin, and Motta, 2005; Tran et al., 2007; Lei, Uren, and Motta,
2006). A decision system then forwards the generated query to accessible knowledge resources which might contain information relevant to the query (Lopez et al.,
2007). Executing a query formulated using the terminology of a query ontology over
different systems is an example of an information integration problem, where one can
utilize the techniques which are often applied in this situation in order to access the
available knowledge resources.
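A drastically simplified sketch of this parsing step is given below, using a hand-made term-to-concept lexicon; the lexicon entries and concept names are invented for illustration.

```python
# Minimal sketch of mapping a natural-language query onto a query ontology:
# each word is looked up in a term-to-concept lexicon, and the matched
# concepts are assembled into a structured query.
LEXICON = {"british": ("hasCountry", "UK"),
           "comedy": ("hasGenre", "Comedy"),
           "series": ("type", "TVSeries")}

def parse_query(text):
    """Map each known word of the NL query to a (property, value) constraint."""
    return [LEXICON[w] for w in text.lower().split() if w in LEXICON]

constraints = parse_query("find british comedy series")
print(constraints)
# [('hasCountry', 'UK'), ('hasGenre', 'Comedy'), ('type', 'TVSeries')]
```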
A noteworthy example of a natural language query system is Watson, developed
by IBM Research (Ferrucci et al., 2010). Watson is a special-purpose knowledge
system designed for the Jeopardy! challenge, intended to be as ambitious as
IBM's previously conquered challenge of defeating a world-champion chess player
using the Deep Blue supercomputer (Hsu, 2002). The goal of the Jeopardy! challenge
was to design a question and answering (QA) system which could outperform human
champions on the TV game-show Jeopardy!. Jeopardy! is essentially a trivia-based
question and answering game-show with certain unique gameplay properties. The
most recognizable feature is that instead of questions the contestants receive a piece
of trivia about an unknown entity. The contestants must guess which entity is being
referred to in the trivia and must respond with that entity, while formulating their
response as a question. For example, a contestant may receive the following trivia:
’In 1996, he wrote the novel ’A Game of Thrones’, which was later adapted into
a TV series by HBO.’. Based on this trivia, the contestant must guess that the
intended entity is the author George R. R. Martin and phrase his answer as ’Who
is George R. R. Martin?’. The game is structured around a board of categories,
with each category containing a series of options ordered in increasing monetary
worth. The categories can be broadly defined, e.g. ’History’ or ’Literature’, or only
cover specific topics, e.g. ’Japan US Relations’ or ’Potent Potables’. Contestants
can choose options from categories that they are familiar with, however the specific
categories are unknown before the show. A player can earn the monetary value of a
question by answering it correctly. However, attempting to answer any question is
always associated with a risk. If a contestant fails to answer a question correctly then
the monetary value of that particular question is subtracted from the contestant’s
earnings. It is thus important that a contestant is certain of his answers.
The difficulty for a query system in this domain is that it needs to produce a
single result with a high degree of certainty. This is significantly more difficult than
the task of information retrieval, where a set of results of which most individuals
are ’relevant’ to some degree can be seen as a good output. The Watson system
parses an input query into a linguistic model which describes the grammatical roles
of words and their relations to other words in the query (McCord, Murdock, and
Boguraev, 2012). Relational-queries which could denote the intended query are created using the information of the parsed user query. When given a specific piece
of trivia, a linguistic analysis system first identifies the grammatical type of each
word and parses the sentence into a linguistic model. Based on this analysis, it
is identified which domains are relevant to the query and candidate semantic relations are gathered which form the basis for potential queries. Examples of such
relations are author-of, appeared-in and produced-by. The author-of relation can be
structured like author-of::[Author][verb][Work], meaning that if the mapped terms
of the input query match with some of the classes of the relational query, then the
missing term could denote the answer to the query. A ranking system ranks all
extracted relational-queries according to their likelihood of denoting the intended
query through relational analysis (Wang et al., 2012).
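The slot-filling idea behind such relational queries can be sketched as follows; this is an extreme simplification of the pipeline described above, and the 'knowledge base' consists of a single hand-written entry.

```python
# Sketch of relation-based candidate generation: a relation template such as
# author-of::[Author][verb][Work] is matched against terms extracted from the
# clue; the unfilled slot denotes the candidate answer.
AUTHOR_OF = {("wrote", "A Game of Thrones"): "George R. R. Martin"}

def answer(verb, work):
    """Fill the [verb][Work] slots; the missing [Author] slot is the answer."""
    entity = AUTHOR_OF.get((verb, work))
    return f"Who is {entity}?" if entity else None

print(answer("wrote", "A Game of Thrones"))  # Who is George R. R. Martin?
```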
1.2.7 Agent Communication
Agents are autonomous software-driven entities designed to independently perform
tasks and solve problems. The domains in which agents are deployed are very divergent, with approaches being developed for negotiation (Chen and Weiss, 2012),
power system restoration (Nagata et al., 2000), resource allocation (Chavez, Moukas,
and Maes, 1997), organ-transplant coordination (Aldea et al., 2001), e-commerce
(Xiao and Benbasat, 2007), cloud computing (Cao, Li, and Xia, 2009) or smart grids
(Pipattanasomporn, Feroze, and Rahman, 2009). Agents communicate with each
other using special communication frameworks, for example KQML8 (Finin et al.,
1994) or FIPA-ACL9 (Labrou, Finin, and Peng, 1999; FIPA, 2008), which allow
messages to be annotated with an interaction-specific context, for instance ’agree’,
’disagree’ or ’request-information’. The actual content of messages is likely expressed
using knowledge-representational languages and using the terminology of a specific
domain ontology. Thus, if two agents interact by exchanging messages using different terminologies, then there is only a very small chance that these agents will be
able to achieve any meaningful interaction or reach an agreement.
Figure 1.7: Mapping in an agent communication scenario.
In the case that two interacting agents have a different ontology in which they
express their information, they must first map their ontologies in order to achieve
a meaningful interaction. For this, they must autonomously communicate their
terminologies and reach a consensus on how each term should be mapped. Typical
approaches here revolve around argumentation techniques, in which agents argue
in what ways mappings can be established or conflict with other mappings (Trojahn et al., 2011), or to what extent their respective data overlaps (Wiesman, Roos,
and Vogt, 2002; Wang and Gasser, 2002; Wiesman and Roos, 2004). The process
of two agents establishing an alignment between their respective ontologies is illustrated in Figure 1.7.
8 Knowledge Query Manipulation Language (KQML).
9 Agent Communication Language (ACL) developed by the Foundation for Intelligent Physical Agents (FIPA).
1.3 Core challenges within Ontology Mapping
As evidenced by the previous section, ontology mapping can be applied as a solution to
interoperability in various scenarios. An obvious challenge for all these scenarios is
the quality of the produced alignments. Incorrect correspondences can either cause
the data exchange between two systems to be erroneous or increase the overhead
caused by verifying the produced alignments. Over the past decade, certain specific
aspects of the process of ontology mapping have emerged which have accumulated
considerable research interest (Shvaiko and Euzenat, 2008; Shvaiko and Euzenat,
2013). We can categorize these aspects into seven challenges, namely (1) efficiency,
(2) mapping with background knowledge, (3) automatic configuration, (4) user involvement, (5) correspondence justification, (6) crowdsourcing and (7) alignment infrastructures. Some of the listed challenges focus on distinct techniques with which
alignment quality can be improved, e.g. mapping with background knowledge and
automatic configuration, while other categories of research aim at improving the ontology mapping process in a non-qualitative way, e.g. efficiency or user involvement.
We will provide an overview of each challenge in the following subsections.
1.3.1
Efficiency
Next to the quality of the produced alignments, the computational efficiency of the
mapping systems is also of importance in many applications. Examples of such
applications are mapping problems where the response-time is fixed in the given
domain and the system must produce a mapping within a given time-frame. For instance, a human issuing a query to a QA system is unlikely to be willing to wait a long time for a response. It is therefore important that ontology mapping solutions are computationally efficient, such that they can be seamlessly integrated into a knowledge application. Some mapping systems are able to resolve runtime issues through the copious use of memory; however, it has been shown that this design choice can lead to memory bottlenecks (Giunchiglia, Yatskevich, and Shvaiko, 2007). Hence, memory consumption should be taken into account when developing approaches aimed at improving the runtime. A related application is the mapping of large-scale ontologies. A large-scale ontology can be defined as an ontology consisting of at least 1,000 concepts, though in certain domains ontologies can reach a size of 100,000 concepts. Computational efficiency is imperative here due to the large problem space; applying inefficient methods can easily result in a significantly increased computation time.
The necessity of computational efficiency has been recognized by the research community. The Ontology Alignment Evaluation Initiative (OAEI), which hosts a yearly competition for the evaluation of mapping systems, added test sets consisting of large-scale mapping tasks, namely Anatomy and Large BioMed, to specifically evaluate the efficiency of mapping systems (Grau et al., 2013; Euzenat et al., 2011b; Euzenat et al., 2010). Some research groups have responded to the challenge by developing light-weight versions of their existing systems, as seen with the LogMapLite (Jiménez-Ruiz, Cuenca Grau, and Horrocks, 2012a) and AgreementMakerLight (Faria et al., 2013) systems. Some systems, such as QOM10 (Ehrig and Staab, 2004), tackle the efficiency problem by applying very efficient mapping strategies, while systems such as GOMMA11 (Gross et al., 2012) also exploit the scalability of available computational resources.
1.3.2
Mapping with background knowledge
An ontology is typically designed with specific background knowledge and context
in mind. However, this type of information is rarely included in the ontology specification, which can cause difficulties in the mapping process. To overcome this issue,
the main challenges are to discover and exploit missing background knowledge. The
most prolific areas of mapping with background knowledge include the following:
Axiom enhancement Declaring missing axioms manually (Do and Rahm, 2002)
or exploiting the axioms of available partial alignments (Lambrix and Liu,
2009).
Alignment re-use Exploiting the alignments of previous mapping efforts of the
given ontologies (Aumueller et al., 2005). Storing and sharing alignments
facilitates the possibility of composing alignments using several pre-existing
alignments. Given alignments linking both given ontologies to a third ontology, one can derive an alignment through logical inference.
Internet-based background knowledge Exploiting internet-based resources and services to aid the mapping process. Specifically, one can utilize web-based
linked-data structures (Jain et al., 2010) or internet search engines (Gligorov et al., 2007). For example, search engines can be utilized by analysing
the probability of two concept names co-occurring in the search results.
Lexical background knowledge Exploiting lexical resources, such as dictionaries
and thesauri, for the enrichment of the ontologies or as basis for a similarity
measure. Ontology enrichment entails that additional information is added
to each concept’s descriptions by searching for relevant information in the resource (Montiel-Ponsoda et al., 2011). The intent behind this approach is that
existing similarity measures are likely to perform better if more information
is available. A resource can also be used as a basis for a similarity measure (Budanitsky and Hirst, 2001; Budanitsky and Hirst, 2006). This can be done by allocating an appropriate entry for each concept within the resource. The similarity between two concepts can then be determined by comparing their corresponding lexical entries, e.g. by computing the textual overlap between their definitions.

10 Quick-Ontology-Mapping (QOM).
11 Generic Ontology Mapping and Mapping Management (GOMMA).
Ontology-based background knowledge Exploiting ontologies as background knowledge. These ontologies can be domain specific (Aleksovski, 2008), upper-level descriptions (Niles and Pease, 2001; Matuszek et al., 2006) or automatically retrieved from the semantic-web (Sabou, d’Aquin, and Motta, 2008).
Similarly to the previous category, an equivalent concept is identified in the
background ontology for each concept in the given ontologies. From here,
one can enrich the given ontologies using the information of the background
ontology, compute semantic similarities between concepts by analysing their
distances within the background ontology or even infer mappings.
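The distance-based use of a background resource described above can be sketched as follows. The toy taxonomy, the breadth-first search and the 1/(1+d) scoring are illustrative assumptions, not the resource or measure prescribed by this thesis.

```python
from collections import deque

# Toy is-a taxonomy standing in for a lexical or background resource.
# Edges are treated as undirected for path-length computation.
taxonomy = {
    "conveyance": ["vehicle"],
    "vehicle": ["conveyance", "car", "bicycle"],
    "car": ["vehicle"],
    "bicycle": ["vehicle"],
}

def shortest_path_length(graph, start, goal):
    """Breadth-first search for the shortest path length between two senses."""
    if start == goal:
        return 0
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for neighbour in graph.get(node, []):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, dist + 1))
    return None  # the two senses are not connected in the resource

def path_similarity(graph, a, b):
    """Map a path distance to a similarity in (0, 1]; unreachable pairs get 0."""
    d = shortest_path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

print(path_similarity(taxonomy, "car", "bicycle"))  # two hops via 'vehicle' → 0.3333333333333333
```

Concepts whose associated senses lie close together in the resource thus receive a high similarity, while senses in unrelated branches score near zero.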
1.3.3
Automatic Configuration
Ontology mapping systems tend to be quite complex, requiring algorithms for the
computation, aggregation and processing of concept similarities and the extraction
of correspondences. Many ontology mapping systems have emerged over the years,
as evidenced by the amount of participating systems in the most recent Ontology
Alignment Evaluation Initiative (OAEI) (Grau et al., 2013). These systems are
quite diverse with regard to their structure and applied techniques, though not one
single system is able to perform exceptionally well on all data sets. Based on this,
it is reasonable to assume that no single set-up of a mapping system will perform
exceptionally in all circumstances. The issue of automatic configuration is also of importance in the field of information retrieval, where the specific configuration of a
retrieval system can have a large impact on the output quality (Oard et al., 2008).
To overcome this limitation, it is necessary for a system to adapt itself to be
better suited to solve the given mapping problem. The challenge here is to develop approaches which tackle three distinct tasks: (1) component selection, (2)
component combination and (3) component tuning. The term component can be
interpreted both as similarity measure and mapping system. Within a single mapping system, the process of configuration entails which similarities are selected for
a given matching task, what approaches are used to combine their results and what
parameters are used for the similarity metrics and combination approaches. For a
meta-system, i.e. a mapping system that utilizes pre-existing mapping systems and combines their results, the process of configuration entails which mapping systems are selected for a given task, how their results are combined and what parameters are used for each individual system. In general, in order to create an appropriate
configuration with the available components it is necessary to analyse certain properties of the given ontologies, for instance by evaluating their sizes or the richness of
available meta-information. Based on this analysis one can choose the most appropriate selection and configuration of the available components (Mochol and Jentzsch,
2008; Cruz et al., 2012; Lee et al., 2007).
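A minimal sketch of such an analysis-driven configuration step is given below. The thresholds, component names and weighting scheme are invented for illustration and do not correspond to any particular system.

```python
def configure_matcher(onto1_size, onto2_size, has_instances, label_richness):
    """Choose similarity components and weights from simple ontology statistics.
    All thresholds and component names are illustrative assumptions."""
    config = {"components": [], "weights": {}}
    large = max(onto1_size, onto2_size) > 10_000
    # For large-scale tasks, prefer cheap string-based measures over structural ones.
    if large:
        config["components"] = ["string"]
        config["weights"] = {"string": 1.0}
        return config
    config["components"] = ["string", "structural"]
    if has_instances:
        config["components"].append("extensional")
    if label_richness < 0.5:          # sparse labels: lean on structure instead
        base = {"string": 0.2, "structural": 0.5}
    else:
        base = {"string": 0.5, "structural": 0.2}
    if "extensional" in config["components"]:
        base["extensional"] = 0.3
    total = sum(base.values())
    config["weights"] = {k: v / total for k, v in base.items()}  # normalize
    return config

print(configure_matcher(500, 800, has_instances=True, label_richness=0.8))
```

Real systems replace these hand-written rules with learned selection and tuning strategies, but the input (ontology statistics) and output (a component configuration) have the same shape.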
1.3.4
User involvement
For corporate-based applications of ontology mapping, notably schema-integration,
the results of a mapping system are typically inspected and repaired or approved by
a domain expert. Depending on the domain the given mapping problem can be very
large, rendering it particularly difficult and time consuming for a domain expert to
thoroughly inspect the entire result alignment. However, in addition to the domain
expert(s) another source of human capital is likely available: the user. The core
challenge of user involvement is to create techniques tailored to the expertise of the user, such that they can participate in the mapping process. This can be a particularly challenging task for dynamic applications where the user issues queries in natural language, and hence cannot be expected to be a mapping specialist or domain expert.
One way to involve the user is to create a graphic visualization of the generated alignment, allowing the user to validate alignments by quickly and intuitively
browsing, inspecting and modifying elements of the alignment (Mocan, Cimpian,
and Kerrigan, 2006; Falconer and Storey, 2007; Raffio et al., 2008). Another approach is to involve the user earlier in the mapping process. The PROMPT tool is
an example of such a solution (Noy and Musen, 2003). The tool repeatedly presents
mapping suggestions to the user and records their responses. Using these responses
the system’s beliefs about the computed concept similarities are updated. Subsequently, the system attempts to find inconsistencies and potential problems, which
can in turn be presented to the user as new prompts until no further issues are
detected.
Alternatively, the user can be involved in a process prior to matching, where the
result of that process can be used as additional background knowledge. An example
of this is the HAMSTER tool (Nandi and Bernstein, 2009), which gathers click-logs
of a search-engine in a database. The information within this database is then used
as a basis for an additional similarity measure.
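The click-log idea can be sketched as a simple set-overlap measure. The log format and the use of the Jaccard coefficient are assumptions for illustration, not the actual HAMSTER algorithm.

```python
def click_similarity(click_log, elem_a, elem_b):
    """Jaccard overlap of the search queries whose results led users to click
    on each element; assumed log format: {element: set of query strings}."""
    queries_a = click_log.get(elem_a, set())
    queries_b = click_log.get(elem_b, set())
    if not queries_a or not queries_b:
        return 0.0
    return len(queries_a & queries_b) / len(queries_a | queries_b)

# Hypothetical click-log: both elements were clicked for two shared queries.
log = {
    "Car":  {"buy car", "car price", "used car"},
    "Auto": {"buy car", "car price", "auto dealer"},
}
print(click_similarity(log, "Car", "Auto"))  # 2 shared of 4 distinct queries → 0.5
```

Elements that users reach through the same queries thus obtain a high similarity, without requiring any explicit mapping effort from those users.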
1.3.5
Correspondence justification
Typically, ontology mapping systems annotate each correspondence with a confidence value, signifying the system’s degree of confidence that the given correspondence is true. This value is typically defined to be in the range of [0, 1]. However,
what a particular value signifies is open to interpretation since there is not always
additional information on how the system derived a particular value. This issue is
further elaborated in Section 3.1.2. In order to encourage the widespread acceptance
of ontology mapping systems, it will become necessary that each correspondence of
an alignment is also annotated with an explanation. Justified mapping results would
enable a user to understand the behaviour of the system better, increasing the user’s
satisfaction. A justification would need to satisfy several criteria. First, it must be
intuitive to a human user, for instance by using a visual representation. Second,
it should also be available in machine-readable form, since systems which utilize
alignments might also exploit the justification behind the correspondences.
One of the first attempts at providing users extended justifications for mappings can be seen in the S-MATCH system (Shvaiko et al., 2005). S-MATCH can
provide concise explanations, which can be extended with the use of background
knowledge or logical reasoning and visually presented to the user. Matchers which
employ agent-based argumentation systems to derive their mappings can also use
the same argumentation for the eventual formulation of justifications aimed at users
(Trojahn et al., 2011).
1.3.6
Crowdsourcing
A new approach to large-scale problem solving, facilitated by the availability of the
internet and social networks, is crowdsourcing (Brabham, 2008). Crowdsourcing
relies on a large group of humans collaborating in the task of solving a particular
problem. A given problem needs to be presented in such a way that a common user can participate in creating a solution. Improving the usability of the crowdsourcing tool improves its performance, since a lower barrier to entry allows a larger userbase to
participate. Thus, the core challenge is to devise methods allowing a general userbase
to participate in the mapping process, generating high quality alignments from the
user data and reconciling inconsistencies which may occur from contradicting user
input.
Crowdsourcing was initially applied as a means of establishing ontology interoperability in the work of Zhdanova and Shvaiko (2006). The produced tool allows users to generate, modify, store, share and annotate ontology mappings. However, the collaborative aspect is only sequential in nature, meaning that collaborative
matching only occurs if a user decides to modify an already existing alignment. The
tool is not moderated and lacks capabilities to resolve user disagreements. Disagreements must be resolved by the users themselves by voicing their opinions in the
mapping annotations. A more user-friendly approach is to present the user with small sub-problems of the mapping task, formulated as user-friendly questions (McCann,
Shen, and Doan, 2008). This allows for a multitude of users to concurrently work on
the same mapping problem. The user responses are gathered such that a mapping
can be derived. For each unique user prompt, the majority answer of the users is
accepted if the gap to the minority answer is bigger than a statistically significant
margin. Instead of crowdsourcing the generation of correspondences, one can involve
the user in the process of verifying correspondences and resolving inconsistencies.
Here, a contemporary system would provide the correspondences such that the reconciliation task is formulated as a crowdsourcing project, with the users providing their insight into how detected inconsistencies should be resolved (Nguyen et al., 2013).
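The majority-vote rule with a significance margin can be sketched with an exact binomial test. The 0.5-coin null hypothesis and the alpha level are illustrative choices; McCann, Shen, and Doan (2008) do not necessarily use this exact test.

```python
from math import comb

def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k votes out of n under H0: both answers
    are equally likely (p = 0.5)."""
    k = max(k, n - k)                       # distance from the 50/50 split
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def accept_majority(yes_votes, no_votes, alpha=0.05):
    """Accept the majority answer only when its lead over the minority answer
    is statistically significant; otherwise signal that more votes are needed."""
    n = yes_votes + no_votes
    if n == 0 or binomial_two_sided_p(yes_votes, n) >= alpha:
        return None                         # gap too small: keep collecting
    return "yes" if yes_votes > no_votes else "no"

print(accept_majority(9, 1))   # clear majority → 'yes'
print(accept_majority(6, 4))   # 6:4 is not a significant gap → None
```

With this rule, a prompt answered 6 to 4 remains undecided, while 9 to 1 is accepted; the required margin grows naturally with the noise one expects from an open userbase.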
1.3.7
Alignment infrastructures
As seen from the previously listed challenges and applications, many aspects in the
area of ontology mapping rely on the availability and management of ontologies.
For instance, to facilitate collaborative matching, as described in subsection 1.3.6,
an entire support system needs to be created which facilitates this collaboration.
The core challenge here is the development of a support system which enables users
and other knowledge systems to perform alignment related tasks (Euzenat, 2005).
The most prominent tasks are: (1) alignment storage, (2) alignment retrieval, (3)
alignment computation, either manually or using a supplied mapping system, (4)
alignment revision and (5) alignment-based information translation.
There has been some work done with regard to this challenge. Some systems focus on a particular task of an infrastructure (Noy and Musen, 2003; Noy, Griffith, and Musen, 2008), though these systems can act as modules of a complete infrastructure. An example of an initial attempt at constructing a complete infrastructure is the alignment server (Euzenat, 2005), where users and systems can interact with the server through the use of the alignment API (Euzenat, 2004a). Another example of such an infrastructure is the Cupboard system (d’Aquin and Lewen, 2009). It facilitates functionalities such as alignment storage and retrieval, though it lacks the ability to compute alignments or translate information using an alignment.
1.4
Problem Statement and Research Questions
In the previous sections we introduced the process of ontology mapping as a solution for resolving the interoperability between heterogeneous knowledge systems,
and discussed numerous applications for which ontology mapping is of importance.
Furthermore, we highlighted several key areas of interest which have been established as future focus points of research. This thesis focuses on one of these key areas, namely ontology mapping with background knowledge. The aim of these techniques is to exploit sources of information other than the available meta-data of the ontologies. The following problem statement will guide the research:
Problem statement How can we improve ontology mapping systems by
exploiting auxiliary information?
We identify two main types of auxiliary information sources which can be used to enrich ontology mapping systems, namely lexical resources and partial alignments. Using this categorization we identify four research questions which guide the research with regard to the problem statement. The questions address the problems
of (1) accurately linking ontology concepts to lexical senses, (2) exploiting partial
alignments to derive concept similarities, (3) evaluating correspondences of partial
alignments and (4) matching ontologies with little to no terminological overlap using
partial alignments.
Research question 1 How can lexical sense definitions be accurately
linked to ontology concepts?
A lexical resource is a corpus containing word definitions in terms of senses,
synonyms and semantic relations that hold between senses. By linking ontology
concepts to word senses of a lexical resource, one can exploit the meta-data of the
lexical resource in order to determine how semantically similar two ontology concepts
are. For example, one can determine the similarity between two senses by inspecting
their locations within the taxonomy of the lexical resource. Their similarity can be
established by computing the distance between the two senses, i.e. the length of the
shortest path between the two senses, using graph-theory-based measures. If there is
a low semantic distance between two given senses, then the implication is that there
is a strong semantic relation between these senses, meaning that ontology concepts which are associated with these senses are likely to encode the same information. However, the correctness of this semantic distance strongly relies on the ability to identify the correct sense for each ontology concept. Words are often ambiguous, leading to the problem that several lexical senses can match an ontology concept label. At this point one needs to discard senses which do not accurately represent the intended meaning of a given concept. Our research aims at extending current methods of determining concept senses.
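A classic baseline for this sense-selection step is Lesk-style gloss overlap, sketched below. The tokenization, the toy senses and the tie-breaking rule are simplifying assumptions and do not represent the method developed later in this thesis.

```python
def token_overlap(text_a, text_b):
    """Number of distinct word tokens shared by two descriptions."""
    return len(set(text_a.lower().split()) & set(text_b.lower().split()))

def link_concept_to_sense(concept_description, senses):
    """Pick the sense whose gloss overlaps most with the concept description.
    `senses` maps a sense id to its gloss; ties resolve to the first maximum."""
    best_sense, best_score = None, -1
    for sense_id, gloss in senses.items():
        score = token_overlap(concept_description, gloss)
        if score > best_score:
            best_sense, best_score = sense_id, score
    return best_sense

# Two hypothetical senses of the ambiguous word 'bank'.
senses = {
    "bank#1": "a financial institution that accepts deposits",
    "bank#2": "sloping land beside a body of water",
}
print(link_concept_to_sense("institution holding deposits and granting loans", senses))  # → bank#1
```

Richer disambiguation methods replace the raw token overlap with weighted or semantic comparisons, but the selection problem retains this shape: score every candidate sense against the concept's description and discard the rest.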
Research question 2 How can we exploit partial alignments in order
to derive concept correspondences?
Businesses will often employ domain experts to handle mapping problems of
big schema integration tasks. The knowledge of the domain expert ensures that
all produced correspondences are indeed correct, ensuring maximum interoperability between two knowledge systems. However, mapping large ontologies is a very laborious task. Ontologies of large systems can contain several thousand concepts,
meaning that a mapping between two large ontologies can consist of several thousand correspondences. It must be ensured that the produced correspondences are
correct and logically consistent. The scale of such a task can easily be too much
for a domain expert. One branch of research which attempts to tackle this issue
involves the creation of tools which reduce the workload of the domain expert (Noy
and Musen, 2003; Cruz, Sunna, and Chaudhry, 2004). Such tools aid the expert by, for instance, intuitively visualizing the ontologies and their mappings, suggesting new correspondences and performing consistency checks and logical inference between the ontologies. However, a domain expert might not be willing to invest time into familiarizing themselves with a mapping tool and might abort the task because it is deemed too daunting. Alternatively, an expert might only be available for a certain amount of time, enabling him to generate only a small number of correspondences.
In these situations, it is often the case that an incomplete alignment is produced
which needs to be completed. This incomplete alignment, also referred to as partial
alignment, can be a valuable source of information when determining the remaining correspondences. Our research aims at developing a novel approach at utilizing
existing partial alignments in order to determine concept similarities.
Research question 3 How can we evaluate whether partial alignment
correspondences are reliable?
Methods which exploit partial alignments for the sake of finding the remaining correspondences rely on the correctness of the exploited correspondences from the partial alignment. The performance of these methods is significantly affected by the amount of incorrect correspondences within the partial alignment. To ensure that these specialized methods perform as designed, one must evaluate the provided correspondences to ascertain whether these are indeed correct. Typically, such evaluations involve similarity-based methods, which measure the overlap
of meta-data of ontology concept definitions. Our research attempts to complement
these methods by determining the consistency of the input correspondences with
regard to a set of reliably generated correspondences.
Research question 4 To what extent can partial alignments be used in
order to bridge a large terminological gap between ontologies?
A particular type of heterogeneity that can exist between two equivalent concepts
x and y is a terminological heterogeneity. This describes a situation in which x and
y are defined using significantly different labels and annotations. This excludes minor character differences, e.g. the differences between the labels ‘Accepted Paper’,
‘Accept-Paper’ and ‘Paper (Accepted)’. The challenging situation here is that many elemental similarity measures cannot derive appropriate similarity values between x and y. Ontology mapping systems can avoid this issue by analysing the similarities between related concepts of x and y, for instance by comparing the labels of the parents
of x and y, though if the related concepts of x and y are also heterogeneous then this
no longer is a feasible approach. We say that there is a terminological gap between
ontologies O1 and O2 if there is little to no terminological overlap between O1 and
O2 . This represents a challenging matching scenario where specialized approaches
are required in order to succeed. A common approach here is to extract additional concept terms from a lexical resource, thus increasing the likelihood that x and y will contain similar labels. This approach requires that an appropriate lexical resource is available for each matching domain, meaning that it is ineffective if no appropriate resource exists for a given matching task. However, it might be that a partial alignment is available for the given task. Our research aims at developing an approach for mapping terminologically heterogeneous ontologies by utilizing partial alignments.
1.5
Thesis Overview
The remainder of this thesis is structured as follows. Chapter 2 provides the reader
with a formal definition of the task of ontology matching, while also introducing
methods that are applicable for the evaluation of alignments. Chapter 3 provides
an overview of existing mapping approaches. Here we provide an introduction to
mapping system architectures, the three core tasks which a mapping system needs to
perform and an overview of approaches for each of the three core tasks. We conclude
this chapter by providing a survey of state-of-the-art mapping systems.
Chapter 4 answers the first research question. We introduce a method utilizing
virtual documents for measuring the similarity between ontology concepts and sense
definitions and define a framework which links ontology concepts to lexical senses.
Chapter 5 addresses the second research question. We propose an approach which
measures the similarities between a given concept and the given correspondences
of the partial alignment, also referred to as anchors, which are compiled into an anchor-profile. Two concepts are matched if their respective anchor-profiles are
similar. The third research question is addressed in Chapter 6. Our approach aims
at reformulating the anchor-evaluation problem as a feature-evaluation task, where
every anchor is represented as a feature. Chapter 7 answers the fourth research
question. Our approach aims to enrich the concept profiles of each ontology with the
terminology of the other ontology by exploring the semantic relations which are
asserted in the partial alignments. In Chapter 8 we summarize the contributions to
each research question and identify promising directions of future work.
Chapter 2
Background
Ontology mapping is the essential process facilitating the exchange of information
between heterogeneous data sources. Here, each source utilizes a different ontology
to model its data, which can lead to differences with respect to the syntax of the
ontology, concept naming, concept structuring and the granularity with which the
knowledge domain is modelled. Euzenat (2001) identified three main heterogeneity categories as terminological, conceptual and semiotic heterogeneities. Given two
ontologies, these heterogeneities need to be resolved, which in turn allows for the exchange of information between any knowledge system which uses any of the two given
ontologies to model its data. This is achieved by mapping concepts of one ontology
to the concepts of the other ontology which model the same data. The mappings
are then compiled into a list of correspondences, referred to as an alignment.
As an example, suppose that the European Commission (EC) would start an
initiative to centralize all vehicle registrations over all member countries. Then, the
EC would have to create a central knowledge system ontology that is designed to
cover the combination of all requirements of the member countries. To fully realize
this integration effort, every country would need to create a mapping between the
ontology of its own registration system and the new ontology of the central EC
system. Furthermore, it might be the case that a given country manages multiple
registration systems. These might handle vehicle types separately, e.g. cars, trucks
and motorcycles, or model public, government and military vehicles separately. As
a result, a country would have to make several mappings in order to transfer the
data of every knowledge system. Figure 2.1 displays two example ontologies which
can be used to model vehicle registrations and an example mapping between the two
ontologies.
One can see that some concepts are straightforward to match since they model the same entity and have similar names, as for instance Car-Car and Bike-Bicycle. However, other corresponding concepts do have more pronounced differences. Vehicle and Conveyance model the same top-level concept, though exhibit no syntactic overlap due to the use of synonyms. Concepts which do not exhibit a significant overlap of meta-information are typically harder to match, requiring the usage of more advanced techniques, e.g. Natural-Language Processing, or the exploitation of a broader range of information.

Figure 2.1: Example mapping between two small ontologies.
The example of Figure 2.1 displays correspondences between concepts which can
be interpreted as equivalent. Identifying correspondences of this type significantly
benefits the facilitation of information exchange. If for a given concept an equivalent concept has been identified, then the only operation that still needs to be performed
for that concept is the generation of a transformation function. This function can
express any instance of one concept using the terminology of the other concept’s
ontology. If for a given concept an equivalent concept cannot be located, it is still
possible to facilitate information exchange by, for instance, using the transformation function of a parent concept for which an equivalent correspondence has been identified. Alternatively, one can identify other semantic relations between concepts, e.g.
generalization or overlapping, in order to help identify possible target concepts for
the transformation functions. For practical purposes, each correspondence is typically annotated with a degree of confidence in the interval of [0, 1]. This measure
expresses the amount of trust one has in the truthfulness of that correspondence.
It is typically based on the results of several algorithms measuring the overlap of
meta-data between the concepts. Note that this measure of trust is not equivalent
to the probability of the given correspondence being correct, where a value of 0.7
would mean that the expected outcome of sampling 10 correspondences with trust
value 0.7 would be 7 correct and 3 incorrect correspondences. This topic will be
further discussed in Section 3.1.2. We have updated the example in Figure 2.1 with
different types of relations and some confidence measures, shown in Figure 2.2.
Figure 2.2: Example mapping between two small ontologies. The mapping models different semantic relation types and includes confidence values for each correspondence.
In Figure 2.2, we see that all correspondences that were also depicted in Figure
2.1 have been annotated using the equivalence (≡) symbol. New correspondences
expressing different types of semantic relations are also depicted. One such correspondence is the connection between SportsBike and Bike. These two concepts are
not equivalent, however a sports-bike is certainly a type of bike. Thus the correspondence is annotated using the generalization (⊒) relation. Another generalization can
be seen in the correspondence between id and registration number. This correspondence notably has a slightly lower confidence value. The correspondence between
Truck and Van is annotated with a overlapping (⊓) relation type, indicating that
while these two concepts can contain common instances, they cannot be associated
using a more precise relation type, such as generalization or equivalence.
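The fallback to a parent concept's transformation target, described above, can be sketched as follows. The dictionary-based representations of the alignment and the class hierarchy are assumptions for illustration.

```python
# Assumed representation: each correspondence maps a source concept to a
# (target concept, relation symbol, confidence) triple, and `parents` encodes
# the subsumption hierarchy of the source ontology.
alignment = {
    "Car": ("Car", "=", 1.0),
    "Vehicle": ("Conveyance", "=", 1.0),
}
parents = {"Van": "Car", "Car": "Vehicle", "Vehicle": None}

def translate_concept(concept, alignment, parents):
    """Return a target concept and confidence for `concept`, walking up the
    hierarchy when the concept itself has no equivalent correspondence."""
    current = concept
    while current is not None:
        if current in alignment:
            target, _relation, confidence = alignment[current]
            return target, confidence
        current = parents.get(current)
    return None, 0.0

print(translate_concept("Van", alignment, parents))  # falls back to the parent: ('Car', 1.0)
```

A Van instance without its own correspondence is thus still exchangeable, at the coarser granularity of its parent concept Car.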
2.1
The Mapping Problem
We define ontology mapping as a process which takes as minimal input two ontologies
O1 and O2 and produces an output alignment A′ . Furthermore, this process can
take as input an already existing alignment A, a set of external resources r and
a set of parameters p. The pre-existing alignment can originate from a different
system, thus allowing the combination of two systems in a cascade arrangement,
or from the same system, allowing the possibility of designing an iterative mapping
process. The set of parameters p incorporates any parameter which influences the
mapping process, such as settings, weights or thresholds. While r is broadly defined,
in practice the most commonly used resources are linguistic or domain thesauri and
auxiliary ontologies.
To formally establish the mapping problem, we start by defining an ontology as
follows:
Definition 1 (Ontology). An ontology O is a tuple O = ⟨C, P, I, T, V, R⟩, such that:
C is a set of classes;
P is a set of properties;
I is a set of instances;
T is a set of data-types;
V is a set of data-values;
R is a specific set of relations modelled by the ontology language.
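Definition 1 translates directly into a data structure, and the mapping process described above into a function signature. The sketch below is a minimal transcription with an intentionally trivial body (exact class-name matching) standing in for a real mapping system.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    """Transcription of Definition 1; each component is a plain set."""
    classes: set = field(default_factory=set)      # C
    properties: set = field(default_factory=set)   # P
    instances: set = field(default_factory=set)    # I
    datatypes: set = field(default_factory=set)    # T
    values: set = field(default_factory=set)       # V
    relations: set = field(default_factory=set)    # R

def map_ontologies(o1, o2, alignment=None, resources=None, params=None):
    """Signature of the mapping process: minimal inputs O1 and O2, optional
    pre-existing alignment A, resources r and parameters p; returns A'.
    The trivial body below only matches identical class names."""
    alignment = list(alignment or [])
    for c in o1.classes & o2.classes:
        # A correspondence: source concept, target concept, relation, confidence.
        alignment.append((c, c, "=", 1.0))
    return alignment

o1 = Ontology(classes={"Car", "Truck"})
o2 = Ontology(classes={"Car", "Bike"})
print(map_ontologies(o1, o2))  # → [('Car', 'Car', '=', 1.0)]
```

The optional `alignment` argument mirrors the cascade and iterative arrangements mentioned above: the output of one invocation can be fed back in as the pre-existing alignment of the next.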
This definition encapsulates the types of entities that are typically modelled in
an ontology. Note that the entities contained in C and P can both be referred to as
concepts in order to conveniently refer to the entities that are matched in a mapping
scenario. Ontology languages such as OWL facilitate the creation of such entities
using specially defined constructs:
Classes The primary concepts of an ontology. These form the conceptualization
of a domain and can be interpreted as sets of instances. For example, the
concepts Car and Bike from Figure 2.1 are classes.
Properties Essential to the storage of data, properties model the relation between instances and specific data-values, or the relation between instances
and other instances. The two types of properties are expressed in OWL as
owl:DataProperty and owl:ObjectProperty. As an example, the property
insuredBy from Figure 2.1 is used to connect instances of the class MotorizedVehicle and the class Insurance, whereas the property licensePlate would
connect individual cars with data-values corresponding to their license plate.
Instances Individual instantiations of classes in the ontology consisting of a series
of associated data values, related instances and references to the corresponding
class that is being instantiated. This is equivalent to a row of data values in a
2.1 — The Mapping Problem
31
table of a database. In OWL, instances are expressed under the owl:Thing construct, where the classes being instantiated are referred to using the rdf:type
construct. While in practice instances are typically stored in a different file
than the corresponding ontology definition, they are still considered to be part
of the ontology (Ehrig, 2006; Euzenat and Shvaiko, 2007).
Data-Types A classification of the various types of data. A data-type specifies all
possible values that can be modelled, which operations can be performed on
this data, the ways values of this data-type can be stored and optionally the
meaning of the data. For example string, integer and xsd:dateTime are
data-types. string models all possible combinations of characters, integer
only models whole numbers and xsd:dateTime models specific time stamps
according to a specified syntax, thus also providing a meaning to the data.
Data-Values Simple values that fall into the domain of a specified data-type. As
an example, the name and contact information of a specific vehicle owner are
typically stored as values.
Relations The set of relations already modelled in the given ontology language.
This set is shared over every ontology that is modelled using the given language
and fundamental in the construction of every individual ontology. An ontology
includes at least the following relations:
• specialization (⊑), defined on (C × C) ∪ (P × P ) ∪ (I × I)
• disjointness (⊥), defined on (C × C) ∪ (P × P ) ∪ (I × I)
• assignment (=), defined on I × P × (I ∪ V )
• instantiation (∈), defined on (I × C) ∪ (V × T )
Examples of additional relations which an ontology language might model are
overlapping (⊓) or part-of (⊂). For non-symmetric relations a language might
also model their inverse relations, such as generalization (⊒), being the inverse
of specialization, and consist-of (⊃), being the inverse of part-of.
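The six-component structure of Definition 1 can also be sketched as a plain data structure. The following Python sketch is purely illustrative; the class and field names are our own and do not correspond to any particular ontology library.

```python
from dataclasses import dataclass, field

# Illustrative sketch of Definition 1: an ontology as a tuple <C, P, I, T, V, R>.
@dataclass
class Ontology:
    classes: set = field(default_factory=set)      # C
    properties: set = field(default_factory=set)   # P
    instances: set = field(default_factory=set)    # I
    datatypes: set = field(default_factory=set)    # T
    datavalues: set = field(default_factory=set)   # V
    relations: set = field(default_factory=set)    # R, e.g. triples (e1, "⊑", e2)

# A fragment of the vehicle ontology of Figure 2.1:
o1 = Ontology(
    classes={"Vehicle", "MotorizedVehicle", "Car", "Bike"},
    properties={"insuredBy", "licensePlate", "owner"},
    relations={("Car", "⊑", "MotorizedVehicle"),
               ("MotorizedVehicle", "⊑", "Vehicle")},
)
```

In practice an ontology language such as OWL provides this structure via its own constructs; the sketch only mirrors the tuple of Definition 1.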
The ultimate goal of ontology mapping is to find a way that allows us to alter instances of one ontology such that they conform to the defined structure and
terminology of the other. To achieve this for a given instance, one must identify to which class of the other ontology the instance should belong, what data can be transferred by allocating matching properties and how the data-values must be
altered such that these conform to other data-types. Any mapping system must
therefore be able to identify corresponding classes and properties such that these
transformation rules can be generated.
An identification of a relation between two entities, e.g. matching classes or
properties, is expressed as a correspondence. We define a correspondence as follows:
Definition 2 (Correspondence). A correspondence between two ontologies O1 and
O2 is a 5-tuple < id, e1 , e2 , t, c >, where:
id is a unique identifier allowing the referral to specific correspondences;
e1 ∈ O1 is a reference to an entity originating from the first ontology;
e2 ∈ O2 is a reference to an entity originating from the second ontology;
t denotes the relation type between e1 and e2 ;
c is a confidence value in the interval [0, 1].
Thus, a given correspondence <id, e1, e2, t, c> asserts that a relation of the type t holds between entities e1 and e2 with a degree of confidence c. The entities e1 and e2 are typically modelled as URIs in order to refer to specific entities. Relation types which
are typically asserted in a correspondence are generalization (⊒), specialization (⊑),
disjointness (⊥), overlapping (⊓) and equivalence (≡).
As an example, we can express one of the correspondences displayed in Figure
2.2 as follows:
< id123, SportBike, Bike, ⊑, 1.0 >
This correspondence asserts that the class SportBike is a subclass of Bike with
a confidence of 1.0, meaning that any instance of SportBike is also an instance of
Bike.
Note that the entities of a correspondence can also refer to properties or instances. Thus, the correspondence between the properties owner and registered to
from Figure 2.2 can be expressed as follows:
< id124, owner, registered to, ≡, 0.9 >
The ultimate goal of the ontology mapping process between two ontologies is to
identify all appropriate correspondences. These correspondences are gathered into
a set, which is referred to as an Alignment or Mapping. Formally, we define an
alignment between two ontologies as follows:
Definition 3 (Alignment). An alignment A between two given ontologies O1 and
O2 is a set of correspondences A = {c1 , c2 , . . . cn }, such that for each correspondence
< id, e1 , e2 , t, c >∈ A, e1 ∈ O1 and e2 ∈ O2 holds.
The example in Figure 2.2 illustrates a possible mapping between two ontologies.
The correspondences in this example can be expressed as an alignment A as follows:
A = {
  <id1, Vehicle, Conveyance, ≡, 1.0>,
  <id2, id, registration number, ⊒, 0.8>,
  <id3, owner, registered-to, ≡, 0.9>,
  <id4, manufacturer, brand, ≡, 0.7>,
  <id5, licensePlate, issued plate, ≡, 1.0>,
  <id6, Insur., Insurer, ≡, 1.0>,
  <id7, Truck, Van, ⊓, 0.6>,
  <id8, Car, Car, ≡, 1.0>,
  <id9, Motorcycle, Motorbike, ≡, 0.9>,
  <id10, Bike, Bicycle, ≡, 1.0>,
  <id11, SportBike, Bicycle, ⊑, 1.0>
}
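Definitions 2 and 3 translate directly into simple data structures. A minimal Python sketch follows; the type names are our own, chosen for illustration only.

```python
from typing import NamedTuple

# Definition 2: a correspondence is a 5-tuple <id, e1, e2, t, c>.
class Correspondence(NamedTuple):
    id: str
    e1: str    # entity of O1 (in practice a URI)
    e2: str    # entity of O2 (in practice a URI)
    t: str     # relation type: "≡", "⊑", "⊒", "⊥" or "⊓"
    c: float   # confidence value in [0, 1]

# Definition 3: an alignment is simply a set of correspondences.
A = {
    Correspondence("id1", "Vehicle", "Conveyance", "≡", 1.0),
    Correspondence("id7", "Truck", "Van", "⊓", 0.6),
    Correspondence("id11", "SportBike", "Bicycle", "⊑", 1.0),
}
```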
2.1 — The Mapping Problem
33
A mapping system typically exploits all available information in order to produce
an alignment. At the very least, this includes the meta-information that is encoded
into the ontologies themselves. This information can take shape in the form of
concept labels, descriptions or annotations, relations between entities and given
instantiations of concepts. However, there exists a range of information which can
be supplied to the mapping system as additional input. This range can be categorized
as follows:
Input Alignments It can be the case that an alignment between the two given
ontologies is available. This alignment may be supplied by another mapping
framework, available from a repository or the result of a domain expert attempting to map the given ontologies. In this case, the correspondences of
that alignment can be used to guide the mapping process.
Parameters Ontology Mapping systems tend to be quite complex, which is necessary in order to deal with all possible types of input ontologies. These systems
typically require a set of parameters that fine-tune all possible aspects of the
systems to maximize performance. Typical forms of parameters are thresholds,
function parameters or system specifications. Note that the core challenge in section 1.3.3 envisions the elimination of this type of input, such that a mapping system is able to derive this parameter set autonomously.
Knowledge Resources Any type of information that is not associated with the given mapping problem or the applied mapping system is categorized as a knowledge resource. Typical resources which can be exploited are domain
ontologies, lexical resources or internet-based resources.
Given the definition of an ontology in Definition 1, the definition of an alignment in Definition 3 and the different types of additional inputs that a mapping system
could exploit, we can formally define the ontology mapping process as follows:
Definition 4 (Ontology Mapping). Ontology Mapping is a function which takes
as input a pair of ontologies O1 and O2 , an alignment A, a set of resources r and a
set of parameters p, and returns an alignment A′ between the ontologies O1 and O2 :
A′ = f (O1 , O2 , A, r, p)
Note that O1 and O2 are mandatory inputs, while A, r and p are optional,
possibly requiring special techniques in order to adequately exploit the additional
information. The process of ontology mapping is visualized in Figure 2.3:
When matching with input alignments, one can distinguish between three variants of this problem:
Alignment Refinement In this variant a (nearly) completed alignment is available as input. The main task here does not involve the discovery of new
correspondences, but rather the refinement of the existing ones. For example,
a typical approach that is applied for this type of problem is consistency checking through the application of reasoning techniques and resolving the resulting
conflicts.
Figure 2.3: Visualization of the ontology mapping process, taking the ontologies O1 and O2, an optional alignment A, resources r and parameters p as input, and producing the alignment A′.
Alignment with Tertiary Ontologies When given two ontologies O1 and O2 ,
it might be possible that existing alignments link either ontology to one or
more ontologies which are not part of the original matching problem. If a
chain of alignments exists which link O1 and O2 through one or more tertiary
ontologies, then techniques such as logical inference can be applied in order to
infer correspondences. Otherwise, the alignments must be exploited differently.
For instance, one can use the information of these ontologies as additional
context, similar to the core challenge described in section 1.3.2. Alternatively,
one can shift the mapping task to one or more of the tertiary ontologies if one
can identify a mapping task which is easier to solve.
Partial Alignments In some situations it may be the case that an attempt to create an alignment was started but left unfinished. A prime example of this is a domain expert being unable to finish an alignment due to time constraints. In this situation the task is to complete the alignment. The challenge here
is to find ways in which the existing correspondences can be exploited in order
to determine the remaining correspondences.
The correspondences of an input alignment are also formulated as 5-tuples, as
defined in Definition 2. Additionally, the correspondences of a partial alignment (PA)
are typically referred to as anchors in order to clearly separate these from other types
of correspondences, such as the correspondences of the result or reference alignment.
Definition 5 (Anchor). Given two ontologies O1 and O2 , and a given partial
alignment PA between O1 and O2 , an anchor a is defined as a correspondence such
that a ∈ PA.
Having formally introduced the ontology mapping process, along with the alignment as desired output, it becomes necessary to be able to analyse a given alignment
with respect to its quality, such that the effectiveness of a given mapping approach
can be quantitatively established. We will introduce the methodology of alignment
evaluation in the next subsection.
2.2 Evaluation of Alignments
As elaborated in section 2.1, the goal of an ontology mapping system is to produce
an alignment which facilitates the transfer of information between two ontologies.
In order to evaluate the quality of a proposed mapping system or approach, there
must be a quantitative way to evaluate alignments between ontologies.
The most common way to evaluate an alignment is through the application of a gold standard. Here, a domain expert creates a reference alignment which represents the ideal outcome when mapping the two given ontologies. We will illustrate this process with a running example based on the ontologies depicted in Figure 2.1.
Let us assume we are given the following reference alignment R:
R = {
  <id1, Vehicle, Conveyance, ≡, 1.0>,
  <id2, id, registration number, ⊒, 1.0>,
  <id3, owner, registered-to, ≡, 1.0>,
  <id4, manufacturer, brand, ≡, 1.0>,
  <id5, licensePlate, issued plate, ≡, 1.0>,
  <id6, Insur., Insurer, ≡, 1.0>,
  <id7, Truck, Van, ⊓, 1.0>,
  <id8, Car, Car, ≡, 1.0>,
  <id9, Motorcycle, Motorbike, ≡, 1.0>,
  <id10, Bike, Bicycle, ≡, 1.0>,
  <id11, SportBike, Bicycle, ⊑, 1.0>
}
This alignment corresponds to the alignment depicted in Figure 2.2, with the alteration that all correspondences have a set confidence of 1.0.
Next, let us assume that a hypothetical ontology mapping system produces the
following output alignment A:
A = {
  <id1, Vehicle, Conveyance, ≡, 0.8>,
  <id2, id, registration number, ⊒, 0.5>,
  <id5, licensePlate, issued plate, ≡, 0.9>,
  <id6, Insur., Insurer, ≡, 1.0>,
  <id8, Car, Car, ≡, 1.0>,
  <id9, Motorcycle, Motorbike, ≡, 1.0>,
  <id10, Bike, Bicycle, ≡, 0.75>,
  <id12, Truck, Car, ⊑, 0.7>,
  <id13, MotorizedVehicle, Four-Wheeled, ≡, 0.7>,
  <id14, chassisNumber, issued plate, ≡, 0.6>
}
The question now is what measures can be applied in order to compare A with R.
For this purpose one can use the measures of Precision and Recall, which stem from
the field of information retrieval (Rijsbergen, 1979). These measure the ‘correctness’
and ‘completeness’ of a set with respect to another set. Given an alignment A and
a reference alignment R, the precision P (A, R) of alignment A can be calculated as
follows:
P(A, R) = |R ∩ A| / |A|    (2.1)
The recall R(A, R) of alignment A can be calculated as follows:
R(A, R) = |R ∩ A| / |R|    (2.2)
The output alignment and reference alignment of our running example are visualized as sets in Figure 2.4, where each correspondence is represented by its identification value:
We can see in Figure 2.4 an emphasized area, which corresponds to the overlapping area between A and R. The implication here is that correspondences in this area have been correctly identified by the given ontology mapping system. Additionally,
some correspondences are located only in R, meaning that the system failed to identify these correspondences. On the other hand, the correspondences which are only
in A are incorrect and erroneously included in the output alignment.
Using the measures of precision and recall, we can now evaluate the quality of
our example alignment as follows:
P(A, R) = |R ∩ A| / |A| = |{id1, id2, id5, id6, id8, id9, id10}| / |{id1, id2, id5, id6, id8, id9, id10, id12, id13, id14}| = 7/10 = 0.7

R(A, R) = |R ∩ A| / |R| = |{id1, id2, id5, id6, id8, id9, id10}| / |{id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, id11}| = 7/11 = 0.6363
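When correspondences are identified by their id values, equations 2.1 and 2.2 reduce to plain set arithmetic. The following sketch reproduces the computation above:

```python
# The alignments of the running example, reduced to correspondence identifiers.
A = {"id1", "id2", "id5", "id6", "id8", "id9", "id10", "id12", "id13", "id14"}
R = {"id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8", "id9", "id10", "id11"}

def precision(a, r):
    return len(r & a) / len(a)   # equation 2.1

def recall(a, r):
    return len(r & a) / len(r)   # equation 2.2

print(precision(A, R))           # 0.7
print(round(recall(A, R), 4))    # 0.6364
```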
Figure 2.4: Visualization of the interaction between the example alignment and the example reference.
We can distinguish all possible correspondences when viewing the mapping problem as a binary classification task, where one must individually classify all possible correspondences as either true or false. The implication here is that all correspondences which are classified as false are simply not included in the output alignment. The error matrix of such a task is as follows:
                  Classified True    Classified False
Actually True           TP                  FN
Actually False          FP                  TN
The set of correspondences is partitioned into: true positive (TP), false positive
(FP), false negative (FN) and true negative (TN) with respect to the desired classification and actual classification. Using these, we can derive the measures of precision
and recall as follows:
P(A, R) = |TP| / (|TP| + |FP|)    (2.3)

R(A, R) = |TP| / (|TP| + |FN|)    (2.4)
When analysing the performance of ontology mapping systems, it is desirable to
be able to formulate the performance of a system using a single value. However,
using either precision or recall alone does not lead to a fruitful comparison. For
instance, one would not regard a system with a recall value of 1 as good if it simply
returned every possible correspondence. Using both precision and recall can be
difficult, since both values represent a trade-off that needs to be managed, where an attempt to increase one measure often comes at the expense of the other.
A solution to this problem is the application of a combination of the two measures,
which considers both precision and recall equally. For this purpose the F-Measure
is typically deployed (Giunchiglia et al., 2009). The F-Measure is defined in the
interval of [0, 1] and represents the harmonic mean between precision and recall.
Given an alignment A, a reference alignment R, the precision P (A, R) and recall
R(A, R), the F-Measure F (A, R) of A with respect to R is defined as:
F(A, R) = (2 × P(A, R) × R(A, R)) / (P(A, R) + R(A, R))    (2.5)
Returning to our running example, we can now express the quality of the example
alignment as a single value using the F-Measure:
F(A, R) = (2 × P(A, R) × R(A, R)) / (P(A, R) + R(A, R)) = (2 × 0.7 × 0.6363) / (0.7 + 0.6363) = 0.89 / 1.3363 = 0.666
In most scenarios, it is desirable to consider both precision and recall equally
when expressing the quality of an alignment as a single value. This may however
not always be the case. For instance, a domain expert may wish to use an ontology
mapping tool to generate a preliminary alignment, such that he can create a final
alignment by verifying and altering the preliminary alignment. Let us assume that
the expert has a choice between a set of mapping systems and he wishes to choose
the system which will result in the least amount of work. One can argue that the
expert would in this case prefer the system which tends to produce alignments with
a significantly higher precision, since this would imply spending less time removing
incorrect correspondences. However, the expert would also like some emphasis on
the typical recall performance, since a nearly empty alignment with correct correspondences implies that he would have to perform most of the mapping duties
manually anyway. Choosing a system using the measures that are introduced so far
as performance indicator would not lead to a satisfactory result for the hypothetical
domain expert. However, in Rijsbergen (1979) a generalized form of the F-Measure
is introduced which allows for the weighting of either precision or recall to a specified
degree.
Given an alignment A, a reference alignment R, the precision P (A, R), the recall
R(A, R) and a weighting factor β, the weighted F-Measure is defined as follows:
Fβ(A, R) = (1 + β²) × (P(A, R) × R(A, R)) / (β² × P(A, R) + R(A, R))    (2.6)
One can see that the weighted F-Measure is balanced when choosing β = 1.
Thus, the F-Measure expressed in equation 2.5 is actually the weighted F1 measure, though in the case of β = 1 the subscript is typically omitted when referring to the F-Measure. Commonly used variants of β are 0.5 when emphasizing
precision and 1.5 when emphasizing recall.
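Equations 2.5 and 2.6 can be captured in a single helper, since the F1 measure is the special case β = 1. A small sketch using the precision and recall of the running example:

```python
def f_beta(p, r, beta=1.0):
    # Weighted F-Measure of equation 2.6; beta = 1 yields the harmonic mean (eq. 2.5).
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

p, r = 0.7, 7 / 11
print(round(f_beta(p, r), 3))      # 0.667, the F1 of the running example
print(f_beta(p, r, beta=0.5))      # > F1 here, since precision (0.7) exceeds recall
print(f_beta(p, r, beta=1.5))      # < F1 here, since recall is weighted more heavily
```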
The introduced evaluation methods so far did not consider the confidence values
of the produced correspondences. These confidence values typically impose a design
challenge on the mapping system designers. After generating a set of correspondences, a system may apply a given threshold in order to dismiss correspondences
which exhibit an insufficiently high degree of confidence. The choice of this threshold
can heavily influence the resulting precision, recall and F-measure. A high threshold
typically results in a high precision and low recall, whereas a low threshold typically
results in a low precision and high recall. Judging the performance of a system by
simply calculating the precision, recall and F-measure may result in an unrepresentative conclusion if a different and possibly higher performance could have been
achieved by simply selecting a different threshold.
In order to circumvent the dilemma of selecting a specific threshold, one can
apply a technique known as thresholding. We will illustrate this technique using
our example alignment A. Table 2.1 lists all correspondences of A, sorted by their confidence values in descending order. A listed threshold implies that all correspondences with a lower confidence value are discarded if that threshold were to be applied, and the F-measure listed next to a particular threshold indicates the F-measure that would result from applying it.
Correspondence                                      threshold   F-Measure
<id6, Insur., Insurer, ≡, 1.0>                        1.0         0.428
<id9, Motorcycle, Motorbike, ≡, 1.0>                  1.0         0.428
<id8, Car, Car, ≡, 1.0>                               1.0         0.428
<id5, licensePlate, issued plate, ≡, 0.9>             0.9         0.533
<id1, Vehicle, Conveyance, ≡, 0.8>                    0.8         0.625
<id10, Bike, Bicycle, ≡, 0.75>                        0.75        0.705
<id12, Truck, Car, ⊑, 0.7>                            0.7         0.631
<id13, MotorizedVehicle, Four-Wheeled, ≡, 0.7>        0.7         0.631
<id14, chassisNumber, issued plate, ≡, 0.6>           0.6         0.6
<id2, id, registration number, ⊒, 0.5>                0.5         0.666

Table 2.1: Sorted example correspondences with their respective thresholds and resulting F-measures.
We can see from Table 2.1 that an F-measure higher than 0.666 can be achieved if one were to apply a threshold of 0.75 before computing the F-measure. This threshold discards three incorrect and one correct correspondence, resulting in an F-measure of 0.705.
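The search for the optimal cut-off of Table 2.1 is easily automated. The sketch below uses the running example's data; the helper names are our own and not taken from any particular mapping framework.

```python
# Output correspondences as (id, confidence), sorted as in Table 2.1,
# and the identifiers of the reference alignment R.
A = [("id6", 1.0), ("id9", 1.0), ("id8", 1.0), ("id5", 0.9), ("id1", 0.8),
     ("id10", 0.75), ("id12", 0.7), ("id13", 0.7), ("id14", 0.6), ("id2", 0.5)]
R_ids = {"id1", "id2", "id3", "id4", "id5", "id6",
         "id7", "id8", "id9", "id10", "id11"}

def f_measure(a_ids, r_ids):
    tp = len(a_ids & r_ids)
    if not a_ids or tp == 0:
        return 0.0
    p, r = tp / len(a_ids), tp / len(r_ids)
    return 2 * p * r / (p + r)

def thresholded_f(alignment, r_ids):
    # Equation 2.7: evaluate every occurring confidence as cut-off, keep the best.
    return max((f_measure({i for i, c in alignment if c >= t}, r_ids), t)
               for t in {c for _, c in alignment})

best_f, best_t = thresholded_f(A, R_ids)
print(best_t)              # 0.75, the optimum of Table 2.1
print(round(best_f, 3))    # 0.706
```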
Using our example alignment, we have demonstrated the technique of thresholding, which is defined as applying the threshold which results in the maximum attainable F-measure (Euzenat et al., 2010).¹ Let us define Ax as the resulting alignment after applying a threshold x to alignment A. We define the thresholded weighted F-Measure FβT as follows:

¹ Presented as part of the conference-dataset evaluation, the original authors compute the optimal thresholds separately, implying that the alignments have been pre-processed by filtering at the given threshold.
FβT(A, R) = max_x Fβ(Ax, R)    (2.7)
The thresholded F-measure FβT eliminates the bias introduced through the application of a threshold. Furthermore, it provides an upper boundary on the possible performance of a particular system.² To inspect the precision and recall at this upper boundary, we define the thresholded precision and thresholded recall as follows:
PβT(A, R) = P(Ax, R), where x = arg max_y Fβ(Ay, R)    (2.8)

RβT(A, R) = R(Ax, R), where x = arg max_y Fβ(Ay, R)    (2.9)
It may be desirable to inspect the relation between precision and recall in more
detail, and observe how this changes with respect to the applied threshold. For this
purpose, the thresholded F-measure, precision and recall are insufficient as they only
measure the quality of an alignment at a single cut-off point.
The comparison in Table 2.1 provided an overview on how the F-measure behaves
at different possible thresholds. A similar and established method exists for the
detailed analysis of precision and recall, known as a precision-recall curve. This plot
is created by sorting all retrieved correspondences according to their confidence and
simply plotting the development of precision and recall for all possible cut-off points
on a curve. An example precision-recall curve is displayed in Figure 2.5.
A distinct pattern that is observable in every precision-recall graph is the zig-zag
shape of the curve. This stems from the fact that, when the cut-off point is lowered,
the added correspondence can only result in both precision and recall increasing, in
the case it is correct, or the precision decreasing and the recall staying constant in
the case it is incorrect.
In order to increase the clarity of a precision-recall graph, it is common to
calculate the interpolated-precision at certain recall-levels. To formally define the
interpolated-precision, we must first define the precision of an alignment at a specified recall value. Given an alignment A, a reference R and a specified recall value
r, the precision P (A, R, r) at specified recall r is defined as follows:
P(A, R, r) = P(Ax, R), where x = max{y | R(Ay, R) = r}    (2.10)
The interpolated precision at recall value r is defined as the highest precision
found for any recall value r′ ≥ r:
Pint(A, R, r) = max_{r′ ≥ r} P(A, R, r′)    (2.11)
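Given a confidence-ranked list of correspondences marked correct or incorrect, the raw precision-recall points and the interpolated precision of equation 2.11 can be sketched as follows. The correctness flags below are those of the running example, with |R| = 11; the function names are our own.

```python
# Correctness of each correspondence of A in descending confidence order
# (cf. Table 2.1): True means the correspondence appears in R.
ranked = [True, True, True, True, True, True, False, False, False, True]
N_RELEVANT = 11  # |R|

def pr_points(ranked, n_relevant):
    # Precision and recall after each possible cut-off point (the zig-zag curve).
    points, tp = [], 0
    for k, correct in enumerate(ranked, start=1):
        tp += correct
        points.append((tp / n_relevant, tp / k))   # (recall, precision)
    return points

def interpolated_precision(points, r):
    # Equation 2.11: the highest precision at any recall r' >= r.
    candidates = [p for rec, p in points if rec >= r]
    return max(candidates) if candidates else 0.0

pts = pr_points(ranked, N_RELEVANT)
print(interpolated_precision(pts, 0.0))   # 1.0
print(interpolated_precision(pts, 0.6))   # 0.7
```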
Note that while the measure of precision is not defined for a recall value of 0, since its computation can result in a division by zero if A is empty, the interpolated
² When aggregating the quality of a series of alignments, it is also possible to compute a single optimal threshold that is applied to all alignments. This method would represent an upper boundary in which a mapping system cannot dynamically configure itself for each matching task, e.g. by applying a static threshold.
Figure 2.5: Precision-Recall graph (precision on the vertical axis, recall on the horizontal axis).
precision is defined for a recall of zero. Furthermore, P(A, R, r) is undefined if {y | R(Ay, R) = r} = ∅, though Pint(A, R, r) is still defined if there exists an r′ such that {y | R(Ay, R) = r′} ≠ ∅.
Plotting the interpolated precisions has the effect that the zig-zag pattern of
the curve is flattened, creating a step-like pattern instead. A common variant of
the precision-recall curve is its computation using a set of 11 standard recall levels.
These span the interval of [0, 1] with increments of 0.1. Figure 2.6 illustrates a
precision-recall curve using interpolated precisions by adding the new curves to the
example seen in Figure 2.5.
In a typical information retrieval scenario, precision-recall curves cover the entire
spectrum of possible recall values. This is because information retrieval techniques
evaluate and rank the entire corpus of documents, meaning that if the cut-off point is set low enough, all relevant documents will be retrieved, though usually at a very
low precision rate. In ontology mapping, however, a produced alignment is rarely
complete with respect to the reference alignment. Thus, the standard precision
and interpolated precision are undefined for recall values higher than the highest
achievable recall of the given alignment. Therefore, when comparing the precision-recall curves of two different alignments, it can occur that the curves have different lengths.
The measures of precision and recall are well understood and widely adopted. However, an inherent weakness is that they do not account for the ‘closeness’ of the found correspondences to the reference alignment. The overlap function R ∩ A only
selects correspondences which are a perfect match with a reference correspondence
with respect to the entity pairings, relation type and confidence value. For example, given the reference correspondence < id8, Car, Car, ≡, 1.0 >, one could argue
that an incorrect correspondence < id15, Car, Four-Wheeled, ≡, 1.0 > is of a better
Figure 2.6: Precision-Recall graph. Includes a curve of interpolated precisions for all possible recall values (red) and a curve of interpolated precisions at the standard 11 recall values (green).
quality than an incorrect correspondence < id16, Car, Bicycle, ⊑, 1.0 >, since the
concepts of id15 can share some common instances, whereas the concepts of id16
do not. However, both correspondences are equally filtered out when computing the
intersection R ∩ A, resulting in no observable difference in precision or recall. The
closeness of the found correspondences to the reference alignment is of particular importance when considering that in a real-world scenario a domain expert typically
inspects and repairs the computed alignment. Here, it would be beneficial to that expert if the repair effort were as small as possible. To account for this, Ehrig
and Euzenat (2005) introduced the measures of relaxed precision and recall. These
replace the intersection component R ∩ A with an overlap proximity function w and
are defined as follows:
Pw(A, R) = w(A, R) / |A|    (2.12)

Rw(A, R) = w(A, R) / |R|    (2.13)
The aim of the relaxed precision and recall is to provide a generalization of
precision and recall. Thus, it is possible to select w in such a way that it replicates the results of precision and recall. Ehrig and Euzenat (2005) introduced a straightforward interpretation of the overlap proximity, consisting of the sum of
correspondence proximities over a set of selected correspondence pairs. Given an
alignment A and a reference alignment R, a pairwise mapping M (A, R) between
the correspondences of A and R and a correspondence overlap function σ(a, r), the
overlap proximity between A and R is defined as follows:
w(A, R) = Σ_{⟨a,r⟩ ∈ M(A,R)} σ(a, r)    (2.14)
Given this definition of w(A, R), the problem of computing the overlap proximity
is decomposed into two sub-problems: (1) computing the correspondence mapping set M and (2) defining a correspondence proximity function.
The set M (A, R) ⊆ A × R contains a series of correspondence pairings between
A and R. In order to preserve the possibility of replicating the standard measures of
precision and recall, M(A, R) should be restricted to a subset of A × R in which any concept may appear at most once (Ehrig and Euzenat, 2005). The set M(A, R)
can be computed using the Best-Match policy, which is defined as follows:
M(A, R) = arg max_{X ∈ K} Σ_{⟨a,r⟩ ∈ X} σ(a, r), where

K = {C ⊆ A × R | ∀{(a, r), (a, r′)} ⊆ C, r = r′ ∧ ∀{(a, r), (a′, r)} ⊆ C, a = a′}    (2.15)
Next, one needs to define the correspondence proximity σ(a, r). This function receives as input two correspondences, being a = <ida, e1,a, e2,a, ta, ca> and r = <idr, e1,r, e2,r, tr, cr>. A pair of correspondences can differ with respect to
the mapped entities e1 and e2 , identified relation type t or confidence value c. For
each of these differences a domain expert would have to perform a repair action in
order to fix the malformed correspondence. Thus, σ(a, r) would need to take each
of these differences into account. This can be done by defining σ(a, r) as a combination of three proximity functions σpair (< e1,a , e1,r >, < e2,a , e2,r >), σrel (ta , tr ) and
σconf (ca , cr ). The correspondence proximity σ(a, r) can then be defined as follows:
σ(<ida, e1,a, e2,a, ta, ca>, <idr, e1,r, e2,r, tr, cr>) = σpair(<e1,a, e1,r>, <e2,a, e2,r>) × σrel(ta, tr) × σconf(ca, cr)    (2.16)
The combination of σpair , σrel and σconf determines how M (A, R) is selected and
ultimately the result of w(A, R). To show that Pw (A, R) and Rw (A, R) are indeed
generalizations, we will provide the three proximity functions which can be used
to replicate the standard measures of precision and recall. These are collectively
referred to as the equality proximity and are computed as follows:
σpair(<e1,a, e1,r>, <e2,a, e2,r>) = 1 if <e1,a, e2,a> = <e1,r, e2,r>, 0 otherwise

σrel(ta, tr) = 1 if ta = tr, 0 otherwise

σconf(ca, cr) = 1 if ca = cr, 0 otherwise
There exist numerous ways in which σpair, σrel and σconf can be defined to measure the correction effort. Typically, σpair returns a non-negative value if the retrieved entities of a are a specialization or generalization of the entities in r. While a detailed survey of all variations of σpair, σrel and σconf is beyond the scope of this work, we suggest the reader consult the work by Ehrig and Euzenat (2005) for example definitions of σpair, σrel and σconf.
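The equality proximity can be sketched in code to confirm that it replicates standard precision and recall. Note that the best-match selection below is a greedy simplification of equation 2.15, not the exact arg max; with the equality proximity and unique correspondences the two coincide. All names are our own.

```python
# Correspondences as (e1, e2, t, c); identifiers omitted for brevity.
def sigma_equal(a, r):
    # The equality proximity: product of sigma_pair, sigma_rel and sigma_conf.
    (e1a, e2a, ta, ca), (e1r, e2r, tr, cr) = a, r
    pair = 1.0 if (e1a, e2a) == (e1r, e2r) else 0.0
    rel = 1.0 if ta == tr else 0.0
    conf = 1.0 if ca == cr else 0.0
    return pair * rel * conf

def relaxed_pr(A, R, sigma):
    # Greedy best-match pairing: each correspondence is used at most once,
    # highest proximities first (a simplification of equation 2.15).
    scored = sorted(((sigma(a, r), i, j) for i, a in enumerate(A)
                     for j, r in enumerate(R)), reverse=True)
    used_a, used_r, w = set(), set(), 0.0
    for s, i, j in scored:
        if s > 0 and i not in used_a and j not in used_r:
            used_a.add(i); used_r.add(j); w += s
    return w / len(A), w / len(R)   # equations 2.12 and 2.13

A = [("Car", "Car", "≡", 1.0), ("Truck", "Car", "⊑", 1.0)]
R = [("Car", "Car", "≡", 1.0), ("Bike", "Bicycle", "≡", 1.0)]
print(relaxed_pr(A, R, sigma_equal))   # (0.5, 0.5), matching standard P and R
```

Swapping sigma_equal for a graded proximity (e.g. one that rewards specializations of the reference entities) yields the relaxed variants proper.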
2.3 Alignment Evaluation with Partial Alignments
As stated in section 1.4, the main focus of the presented research is the mapping
of ontologies while exploiting available background knowledge. For some types of background knowledge, such as lexical resources, ontological resources or parameter sets, the evaluation techniques introduced in section 2.2 can still be applied. However, the evaluation of alignments that have been generated while exploiting a given
partial alignment poses a unique challenge. In a typical mapping scenario involving a partial alignment, it is assumed that the correspondences within the partial
alignment are correct. Therefore, these correspondences will also be included in the
output alignment, since the goal is to create a single complete alignment. Computing
the measures of precision, recall and F-measure would thus create a positive bias,
since the correspondences of the partial alignment will contribute to an increase in
both precision and recall. This bias could obfuscate the true performance of the
given system.
As an example, let us assume that we are given the alignment A and reference R
from the example in section 2.2. Furthermore, let us assume that we are given the
following partial alignment PA which was exploited during the mapping process:
PA = { < id1, Vehicle, Conveyance, ≡, 1.0 >,
< id9, id, registration number, ⊒, 1.0 >,
< id5, licensePlate, issued plate, ≡, 1.0 >,
< id6, Insur., Insurer, ≡, 1.0 >,
< id8, Car, Car, ≡, 1.0 > }
The correspondences of any given partial alignment are commonly also referred
to as anchors. The dynamics of the new evaluation problem are visualized in Figure
2.7:
Figure 2.7: Visualization of the dynamics between output, reference and partial alignments of the example.

Figure 2.7 depicts all example correspondences according to their association with A, R or PA. Note that PA is a subset of A, since it is assumed that all correspondences of PA will be included in A. However, while all correspondences of PA are also part of R in this example, we did not depict PA as being a subset of R. This stems from the fact that, while it is reasonable to assume that all correspondences of PA are correct, this might not be the case in practice. Chapter 6 will deal with this particular situation in more detail.
Computing the precision and recall of A yields values of 0.7 and 0.6363 respectively. These measurements, however, are deceiving, as five of the correct correspondences of A were provided in PA and therefore not generated by the tested mapping
system. In order to measure the true quality of the correspondences contributed
by a given system, one must take the correspondences of PA into account. Ideally,
the measurement should reflect the quality of the correspondences in A which are
not part of PA. It is possible to adapt the measures of precision and recall such
that these take a given partial alignment into account. We refer to this variation of precision and recall as the adapted precision P∗ (A, R, PA) and adapted recall R∗ (A, R, PA)
(Caraciolo et al., 2008; Schadd and Roos, 2013)3 .
Given an alignment A, a reference alignment R and partial alignment PA, the adapted precision P∗ (A, R, PA) with respect to PA can be calculated as follows, where P̅A̅ denotes the complement of PA:

P∗ (A, R, PA) = |A ∩ R ∩ P̅A̅| / |A ∩ P̅A̅|    (2.17)
The adapted recall R∗ (A, R, PA) with respect to PA can be calculated as follows:

R∗ (A, R, PA) = |A ∩ R ∩ P̅A̅| / |R ∩ P̅A̅|    (2.18)
3 P ∗ and R∗ are only informally introduced in (Caraciolo et al., 2008) by textually describing
the adaptations to P and R and referring to the new measures also as precision and recall.
Using the measures of adapted precision and recall we can now express the quality
of the correspondences that were actually contributed by the mapping system. For
our example, this would result in the following measurements:
P∗ (A, R, PA) = |A ∩ R ∩ P̅A̅| / |A ∩ P̅A̅| = |{id2, id10}| / |{id2, id10, id12, id13, id14}| = 2/5 = 0.4

R∗ (A, R, PA) = |A ∩ R ∩ P̅A̅| / |R ∩ P̅A̅| = |{id2, id10}| / |{id2, id3, id4, id7, id10, id11}| = 2/6 = 0.333
Taking the supplied partial alignment into account for our example by calculating
the adapted precision and recall, being 0.4 and 0.333 respectively, reveals that the
quality of the identified correspondences is not as high as the standard measures of
precision and recall, 0.7 and 0.6363 respectively, implied.
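With the alignments represented as sets of correspondence ids (reconstructed here from the worked computation above, so the concrete id sets are an assumption of this sketch), the adapted measures become one-line set operations:

```python
# Adapted precision and recall (Equations 2.17 and 2.18): only the
# correspondences *outside* the partial alignment PA are evaluated.

def adapted_precision(A, R, PA):
    contributed = A - PA              # A intersected with the complement of PA
    return len(contributed & R) / len(contributed)

def adapted_recall(A, R, PA):
    return len((A - PA) & R) / len(R - PA)

# Correspondence ids reconstructed from the running example.
A = {"id1", "id2", "id5", "id6", "id8", "id9", "id10", "id12", "id13", "id14"}
R = {"id%d" % i for i in range(1, 12)}   # id1 .. id11
PA = {"id1", "id5", "id6", "id8", "id9"}

p_star = adapted_precision(A, R, PA)     # 2/5 = 0.4
r_star = adapted_recall(A, R, PA)        # 2/6 = 0.333...
```

Note that the standard measures on these sets yield 7/10 = 0.7 and 7/11 = 0.6363, matching the biased values discussed above.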
Using the measures of adapted precision and recall, we can now define the adapted F-measure, allowing one to express the quality of an alignment using
a single value, while accounting for the context of a supplied partial alignment:
Fβ∗ (A, R, PA) = (1 + β²) × (P∗ (A, R, PA) × R∗ (A, R, PA)) / ((β² × P∗ (A, R, PA)) + R∗ (A, R, PA))    (2.19)

2.4 Ontology Alignment Evaluation Initiative
The need for techniques which can tackle the ontology mapping problem has been
recognized in the scientific community. To stimulate research in this area and to
compare existing approaches, the Ontology Alignment Evaluation Initiative4 (OAEI)
(Euzenat et al., 2011b) was founded. This organization hosts yearly competitions in
order to evaluate current state-of-the-art systems using a series of datasets. Before
being known as OAEI, the contest was initially held twice in 2004, first at the Information Interpretation and Integration Conference (I3CON) at the NIST Performance
Metrics for Intelligent Systems workshop (PerMIS) (Euzenat et al., 2011b), and second at the International Semantic Web Conference (ISWC) during the Evaluation
of Ontology-based Tools (EON) workshop (Euzenat, 2004b). In 2005 the contest
was named ’Ontology Alignment Evaluation Initiative’ for the first time and held
at the workshop on Integrating Ontologies during the International Conference on
Knowledge Capture (K-Cap) (Euzenat et al., 2005). All subsequent workshops were
held at the Ontology Matching workshop, which is collocated with the International
Semantic Web Conference (Grau et al., 2013).
During its existence, the OAEI contest has grown steadily. The initial evaluation
fielded only 7 participants (Euzenat et al., 2005), while the most recent edition saw
23 systems participating (Grau et al., 2013). A valuable addition to the evaluation
campaign was the introduction of the SEALS platform (Trojahn et al., 2010). This
4 http://oaei.ontologymatching.org/
software platform allows mapping system creators to wrap and upload their tools,
such that these can automatically be evaluated and compared to other tools. Furthermore, it promotes the accessibility of the tools for other researchers and supports
the validity of the results, since the evaluations are performed by a neutral party
and can easily be replicated.
The OAEI competition is run using a series of datasets, where each dataset tests
a particular aspect of the ontology mapping problem (e.g. lack of data, ontology
size). These datasets are typically created as a response to the current challenges
facing the field (Shvaiko and Euzenat, 2008; Shvaiko and Euzenat, 2013), with the
intention that each dataset stimulates research in the problem area that this dataset
represents. The initial competition was run on only two datasets, while the most
recent competition was run using 10 different datasets.
We will provide a brief overview of these datasets in the following subsection.
2.4.1 Datasets
Benchmark
The benchmark dataset is one of the oldest datasets used by the OAEI competition.
It is a synthetic dataset which consists of matching tasks where one given ontology
has to be matched to many different systematic variations of itself. These variations
entail the alteration or removal of all possible combinations of meta-information
of one ontology. Examples of this are the distortion, translation or removal of concept labels, translation or removal of comments, removal of properties, removal of
instances and the flattening or expansion of the concept hierarchy. Hence, this
dataset tests the robustness of an approach when faced with a lack of exploitable
meta-information.
Over the years, this dataset has constantly evolved. The base ontology has
been changed numerous times and different types and combinations of alterations
were introduced. Another notable change for more recent competitions has been
the expansion of this dataset using multiple base ontologies (Aguirre et al., 2012;
Grau et al., 2013). These ontologies vary in size and facilitate the observation of the
scalability of the tested matching systems.
Conference
The conference dataset consists of 16 ontologies modelling the domain of organizing
scientific conferences. These ontologies were developed within the OntoFarm project.
A distinguishing feature is that all ontologies in this dataset originate from real-world systems, facilitating an estimate of how well a mapping system might perform in a
real-world application. The ontologies are quite heterogeneous in terms of structure
and naming conventions, providing a challenging environment for the evaluation of
mapping systems.
Anatomy
The anatomy dataset consists of a single mapping task between two biomedical ontologies, one describing the anatomy of an adult mouse and one being a part of the
NCI thesaurus, with this part describing human anatomy. This dataset is noteworthy for several reasons. One is the size of the given ontologies. Whereas the
ontologies of the conference dataset contained at most 100 concepts, the ontologies
of the anatomy dataset are significantly larger, 2744 classes for the mouse ontology
and 3304 classes for the human ontology. This dataset thus presents a complexity
problem, where a mapping system must provide an alignment within an acceptable
time. For the OAEI competition in particular, systems are given one day of computational time to generate an alignment (Grau et al., 2013). Whilst different OAEI
contests have offered a variation of this dataset where a partial alignment is provided
for the mapping task, this variation has unfortunately not been run in recent years due to a lack of participants (Aguirre et al., 2012; Grau et al., 2013).
Another challenging aspect is the use of domain specific terminology for the concept labels and descriptions. Hence, there is little natural language present in the
concept descriptions, making the application of natural-language processing techniques difficult. Approaches which use thesauri or external ontologies also struggle
with this dataset, as external ontologies or thesauri are typically limited to general
language concepts and are thus unlikely to contain the concepts which are modelled
in the given ontologies. Another difference from typical ontologies is the use of
specific annotations and roles, e.g. the widespread use of the partOf relation.
Library
After being introduced in 2007, the library dataset presents a mapping task in which
two large thesauri have to be mapped using the SKOS (Miles et al., 2005) mapping
vocabulary. In its original version (Isaac et al., 2009), the library dataset consisted of
two thesauri used by the National Library of the Netherlands (KB) for the indexation
of two of its collections. The KB uses the GTT thesaurus for the indexation of its
Scientific Collection, while relying on the Brinkman thesaurus for the indexation of
its Deposit Collection, containing all Dutch printed publications. The two thesauri
contain approximately 35.000 and 5.000 concept descriptions. While both thesauri
have a similar domain coverage, they differ greatly with respect to their granularity.
The 2009 variant of this dataset followed the same methodology; however, the original
ontologies were replaced with the Library of Congress Subject Headings list (LCSH),
the French National Library heading list (RAMEAU) and the German National
Library heading list (SWD) (Euzenat et al., 2009a). Here, an additional difficulty is
the multi-lingual aspect of the mapping problems.
Whilst not being run in 2010 and 2011, the library dataset returned in the 2012
OAEI competition (Ritze and Eckert, 2012). This edition no longer features the
multi-lingual aspect of the 2009 edition, with multi-lingual heterogeneity now being
tested separately in the Multi-Farm dataset. Instead, the 2012 version consists of the
STW Thesaurus for Economics and The Thesaurus for Social Sciences (TheSoz).
MultiFarm
Introduced in 2012, the MultiFarm dataset is specifically designed to test a mapping
system’s capability to match ontologies that are formulated using a different natural
language (Meilicke et al., 2012). This dataset has been created by taking a subset
of the conference dataset and translating the ontologies from English to a series of
other languages, being Chinese, Czech, Dutch, French, German, Portuguese, Russian
and Spanish. The dataset contains problems of all possible language combinations,
and for every combination there exists a problem of mapping two originally different
ontologies and a problem of mapping the same ontology that has been translated
into different languages. By comparing the results of these two variants one can
observe to what extent a system exploits non-linguistic features of the ontology for
its results, as the alignment between the same ontology translated into different
languages is likely to be of a much higher quality.
Large Biomedical Ontologies
This dataset consists of mapping tasks where large-scale semantically rich biomedical
ontologies need to be aligned. Three ontologies are provided, namely the Foundational Model of Anatomy (FMA), the SNOMED CT ontology and the National
Cancer Institute Thesaurus (NCI) consisting of 78.989, 306.591 and 66.724 classes
respectively. To create the reference alignments the UMLS Metathesaurus (Lindberg, Humphreys, and McCray, 1993) is exploited. Given the large scale of the
different matching problems, this dataset is very suitable as a stress test for the
efficiency and scalability of a particular mapping system.
Instance Matching
Introduced in 2009 (Euzenat et al., 2009b), the Instance Matching dataset saw a
rapid evolution during its existence. The main goal of this dataset is to measure
instance matching techniques. Here the primary task is not the identification of correspondences between ontology concepts, but instead the identification of instances
which are present in both ontologies but actually model the same real world entity.
In its initial variation, the dataset consisted of three separate benchmark tests. Two
of these were set up using real-world data collections, using the eprints, Rexa and
SWETO-DBLP datasets to form the first benchmark and TAP, SWETO-testbed
and DBPedia 3.2 to form the second benchmark. The third benchmark is a series of synthetic tests where one collection, namely the OKKAM dataset, has to be
matched to different variations of itself.
The 2010 edition saw two main modalities (Euzenat et al., 2010), being a data-interlinking track and an OWL data track. In 2011, the dataset was altered such that
it offered one data-interlinking track and one synthetic benchmark (Euzenat et al.,
2011a). The data-interlinking track consisted of rebuilding links among the New
York Times dataset itself and identifying shared instances between the New York
Times dataset and the external resources DBPedia, Geonames and Freebase. The
synthetic track is created by introducing systematic alterations to the Freebase data.
The 2011 edition of the instance matching dataset has also been used for the 2012
and 2013 campaign. However, to address the problem of finding similar instances,
instead of identical instances, the 2014 edition5 of this dataset was set up to include
an identity recognition track and a similarity recognition track.
Interactive Matching
The interactive matching evaluation was introduced in 2013 in order to address the
need for the evaluation of semi-automatic approaches (Grau et al., 2013). This stems
from the intuition that a user can be a valuable asset for improving the quality of
the generated alignments. The set-up for this dataset differs from a typical partial
alignment mapping problem. Instead of being given a series of correspondences, the
given system may consult a user regarding the correctness of a correspondence. To
perform this evaluation automatically, the given user is simulated through the use
of an Oracle class, which can check suggested correspondences by inspecting the
reference alignment. The ontologies used for this dataset stem from the conference
dataset. By comparing a system’s results of the interactive matching dataset with
the results of the conference dataset, one can observe how much the interactive
component of the system actually contributes to the alignment quality.
Query Answering
This newest addition to the OAEI campaign aims to present an alternative to the typical reference-based evaluation of alignments (Solimando, Jiménez-Ruiz, and Pinkel,
2014). Instead, the goal is to evaluate mapping systems in an ontology-based data
access scenario. This set-up is comparable to an information integration scenario,
as introduced in section 1.2.2. In essence, the systems are required to compute
an alignment between a query ontology and a local ontology. A series of queries
are translated using the alignment produced by the system, such that an answer
set is generated for each query. These answer sets can be evaluated by comparing
them to the desired answer sets by computing the measures of precision, recall and
F-measure.
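Since the answer sets are plain sets, this last comparison reduces to set-based precision, recall and F-measure. A minimal sketch (the answer ids are hypothetical):

```python
def evaluate_answers(returned, expected, beta=1.0):
    """Set-based precision, recall and F-measure over query answer sets."""
    returned, expected = set(returned), set(expected)
    correct = returned & expected
    p = len(correct) / len(returned) if returned else 0.0
    r = len(correct) / len(expected) if expected else 0.0
    f = (1 + beta**2) * p * r / (beta**2 * p + r) if (p + r) else 0.0
    return p, r, f

# Hypothetical answer sets for one translated query.
p, r, f = evaluate_answers({"paper1", "paper2", "paper3"},
                           {"paper2", "paper3", "paper4"})
```

One such triple of measurements is obtained per query; the per-query results can then be averaged over the full query series.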
5 Results paper publication in progress.
Chapter 3
Mapping Techniques
In the previous chapter we have formally introduced the problem of ontology mapping and evaluation techniques. In this chapter, we will introduce the basic techniques that are commonly used to create such mapping functions. In section 3.1 we
will introduce basic system architectures, a categorization of similarity techniques,
and a brief overview of similarity aggregation and correspondence extraction techniques. In section 3.2 we will provide a survey of the contemporary state-of-the-art
mapping systems.
3.1 Basic Techniques
The core task of an ontology mapping system is the identification of correspondences
that exist between the concepts of two given ontologies. In order to achieve this,
the system must be able to determine the likelihood of two concepts being used
to encode the same or similar information. This task is usually achieved through
the usage of similarity measures. Using the results of the similarity measures, the
system must determine the overall similarity between the given concepts and decide
which possible correspondences to include in the output alignment. Thus, the core
tasks which are performed during ontology mapping can be summarized as follows:
Similarity Computation Computation of similarities between the ontology concepts. A similarity measurement exploits the encoded meta-information of
concept definitions in order to produce its measurements. Various types of
meta-information can be used for this purpose, such as concept labels, descriptions, comments, related concepts or provided instantiations of that concept.
While a similarity measurement is typically correlated with the statistical likelihood of two given concepts denoting the same entity, it is not a statistical
estimate, i.e. the computed measurements over the entire problem space are
not normalized and do not take previously observed measurements into account.
Similarity Combination It is common for mapping systems to employ multiple
similarity measurements. The reason for this is that similarity metrics typically
exploit a limited range of available meta-information and determine the similarity between concepts using a specific intuition. If one of these two aspects
fails, then the metric is unlikely to produce appropriate similarity
measurements. An example of the aspect of information availability failing
would be a mapping system employing a comment-based similarity to process
an ontology which does not contain concept comments. An example of the
similarity intuition aspect failing is the use of a string-similarity when the concept names are expressed using synonyms. The usage of multiple similarity
metrics means that for each combination of concepts there will be multiple
measurements of their similarity. At this step, it is necessary to combine these
measurements using a specific similarity combination technique.
Correspondence Extraction After the similarities between all possible concept
combinations have been converted into a single value, it becomes necessary
to determine which correspondences will be included in the output alignment.
Whether a specific correspondence linking the concepts x and y will be included is determined by not only inspecting its own similarity value, but also
by analysing all possible correspondences which link either x or y to alternative concepts. Alternatively, one can analyse correspondences which map
the respectively related concepts of x and y, e.g. the possible correspondence
between the parent concept of x and the parent concept of y.
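The three core tasks can be sketched together as a toy pipeline. All concrete choices here, the two metrics, the averaging, and the greedy thresholded extraction, are illustrative stand-ins for whatever techniques a real system employs:

```python
from difflib import SequenceMatcher

def name_similarity(c1, c2):
    """Similarity computation: a simple string-based metric on concept labels."""
    return SequenceMatcher(None, c1.lower(), c2.lower()).ratio()

def length_similarity(c1, c2):
    """A second (deliberately naive) metric, so there is something to combine."""
    return min(len(c1), len(c2)) / max(len(c1), len(c2))

def map_ontologies(concepts1, concepts2, metrics, threshold=0.6):
    # Similarity combination: average the measurements of all metrics.
    scores = {(a, b): sum(m(a, b) for m in metrics) / len(metrics)
              for a in concepts1 for b in concepts2}
    # Correspondence extraction: greedy one-to-one selection above a threshold.
    alignment, used1, used2 = [], set(), set()
    for (a, b), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if s >= threshold and a not in used1 and b not in used2:
            alignment.append((a, b, s))
            used1.add(a)
            used2.add(b)
    return alignment

result = map_ontologies(["Car", "Person"], ["Automobile", "Car", "Human"],
                        [name_similarity, length_similarity])
```

The aggregated score of each extracted pair doubles as the confidence value of the resulting correspondence.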
Using these core functions, we can now visualize the entire mapping process in
more detail as can be seen in Figure 3.1.
Figure 3.1: Basic architecture of an ontology mapping framework.
In Figure 3.1 we can see the entire ontology mapping process. On the left side
we can see the inputs of the mapping problem, being two ontologies and the set of
additional resources. The first task to be performed involves the parsing and processing of the input ontologies. Here, the ontologies are parsed into a format that
is supported by the system. For example, if the system is designed with OWL-DL
ontologies in mind and receives an RDF-schema as one of the input ontologies, then
the RDF-schema will be parsed into an OWL-DL ontology during the parsing step.
Furthermore, several pre-processing steps can be performed at this step. Examples
of these are word-stemming, stop-word filtering, part-of-speech tagging, synonym
identification or the creation of semantic annotations. After this step, all configured similarity measures will proceed to calculate all pairwise similarities between
the concepts of both ontologies. When two ontologies consist of x and y concepts
respectively, the result of the similarity computation step using n similarity metrics
is an x × y × n cube of similarity measures. It is possible to apply special partitioning
techniques at this stage. Partitioning techniques attempt to divide the mapping problem into smaller sub-problems such that the similarity cube does not have to be computed completely, without impacting the alignment quality (Stuckenschmidt and Klein, 2004), thus increasing the efficiency of the system. Since only a subset of the
x × y × n matrix is actually computed, one can compile the results into a sparse
x × y × n matrix.
Next, the x × y × n cube is reduced to an x × y matrix of similarity measures
through the use of an aggregation technique. Examples of these techniques are
statistical measures or machine learning techniques. These will be introduced in
more detail in Section 3.1.3. In the third core step, the x × y similarity matrix is
used to determine which pairwise combinations of concepts will be included in the
result alignment. Here, the aggregated similarity value of a concept pair is typically
also used as the confidence value of the respective correspondence.
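The reduction of the similarity cube to a matrix and the subsequent extraction can be sketched as follows, with a plain average as a stand-in aggregation and a threshold as a stand-in extraction (the cube values are random placeholders):

```python
import numpy as np

# Dimensions: x source concepts, y target concepts, n similarity metrics.
x, y, n = 3, 4, 2

# A hypothetical similarity cube: cube[i, j, k] holds the similarity between
# source concept i and target concept j according to metric k.
rng = np.random.default_rng(seed=42)
cube = rng.random((x, y, n))

# Aggregation: collapse the metric axis, here with a simple average
# (a stand-in for the aggregation techniques of Section 3.1.3).
matrix = cube.mean(axis=2)

# Extraction: the aggregated value doubles as the correspondence confidence;
# a plain threshold decides which concept pairs enter the result alignment.
threshold = 0.5
alignment = [(i, j, matrix[i, j])
             for i in range(x) for j in range(y) if matrix[i, j] >= threshold]
```

A sparse representation of the cube, as mentioned above for partitioned problems, would only store the computed entries instead of the full array.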
The mapping steps starting from the parsing and preprocessing sub-process and
ending with the mapping extraction, as seen in Figure 3.1, can be interpreted as the mapping function of a system, as defined in Definition 4. The example of Figure 3.1, however, only portrays a straightforward realization of such a function. It is
of course possible to structure a mapping system differently while still conforming
to the definition of an ontology mapping system. We will introduce a selection of
prominent alternative system structures which utilize two ontologies O1 and O2 , an input
alignment A, a set of parameters p and a set of resources r in order to produce an
output alignment A′ .
Figure 3.2 visualizes the basic principle behind an iterative system. Here, the
output alignment A′ serves as input for an additional execution of the mapping
system. Typically, the system produces a small set of high precision correspondences.
These serve as a basis for the discovery of related correspondences which themselves
might not exhibit a high pairwise similarity. The iteration process continues until a certain stopping criterion is met. Such a criterion can be the stability of the mapping, i.e. no significant changes to the alignment after an iteration, a limit on the number of iterations, a maximum runtime, etc.
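The iterative principle can be sketched as a loop around the mapping function, with the stopping criterion realized as alignment stability plus an iteration cap (the toy mapping step below is purely illustrative):

```python
def iterative_mapping(o1, o2, map_once, max_iterations=10):
    """Repeatedly re-run the mapping step, feeding the current alignment
    back in, until it stabilizes or the iteration limit is reached."""
    alignment = set()
    for _ in range(max_iterations):
        new_alignment = map_once(o1, o2, alignment)
        if new_alignment == alignment:   # stability: no change after iteration
            break
        alignment = new_alignment
    return alignment

# Toy mapping step: each round "discovers" one more matching label pair.
def toy_map_once(o1, o2, current):
    candidates = {(a, b) for a in o1 for b in o2 if a.lower() == b.lower()}
    remaining = candidates - current
    return current | ({sorted(remaining)[0]} if remaining else set())

result = iterative_mapping(["Car", "Person"], ["car", "person"], toy_map_once)
```

In a real system the map_once step would exploit the current alignment as anchors when computing similarities, rather than merely accumulating label matches.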
3.1.1 System Composition
When given a selection of mapping systems, it becomes possible to create a meta-solution to the mapping problem by composing the individual systems into a single
system. One way of achieving this is through sequential composition. Given two
Figure 3.2: An iterative mapping system.
systems, being Mapping System 1 and Mapping System 2, a sequential composition
first presents the mapping problem between O1 and O2 to Mapping System 1. This
system produces an output alignment A′ . Mapping System 2 then utilizes A′ as
input alignment for the task of mapping O1 with O2 in order to produce the output
alignment A′′ . This process is illustrated in Figure 3.3.
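The data flow of a sequential composition amounts to simple function composition. A sketch with placeholder systems (both lambdas are illustrative assumptions):

```python
def sequential_composition(system1, system2, o1, o2, a=frozenset()):
    """Run system1 first; its output alignment A' becomes the input
    alignment of system2, which produces the final alignment A''."""
    a_prime = system1(o1, o2, a)
    return system2(o1, o2, a_prime)

# Placeholder systems: system1 finds exact label matches (a high-precision
# seed); system2 keeps its input alignment and adds case-insensitive matches.
s1 = lambda o1, o2, a: a | {(x, y) for x in o1 for y in o2 if x == y}
s2 = lambda o1, o2, a: a | {(x, y) for x in o1 for y in o2
                            if x.lower() == y.lower()}

out = sequential_composition(s1, s2, {"Car", "Person"}, {"Car", "person"})
```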
Figure 3.3: A sequential composition of mapping systems.
Mapping approaches which exploit partial alignments are commonly deployed in
sequential compositions, specifically as part of Mapping System 2. Here, Mapping
System 1 typically serves as a means to generate a partial alignment for the particular approach in the case that no partial alignment is provided. If such an alignment
does exist, Mapping System 1 can either opt to forward that alignment to Mapping
System 2 or attempt to enhance it, for instance through the discovery of additional
correspondences or the verification of the input alignment.
An alternative way of composing mapping systems is through parallel composition. Here, the inputs O1 , O2 and A are forwarded to the given mapping systems
Mapping System 1 and Mapping System 2. In this type of composition, the given
systems are executed independently of each other, resulting in the output alignments
A1 and A2 . The principle behind a parallel composition is illustrated in Figure 3.4.
Figure 3.4: A parallel composition of mapping systems.
After the individual systems have been executed, an aggregation technique merges
the alignments A1 and A2 into a single alignment A′ , representing the output alignment of a parallel composition. There exist a variety of techniques which can be
applied in this situation. For instance, a decision system can simply choose between
A1 and A2 based on some given criteria. Alternatively, the aggregation method can
opt to create A′ by only selecting correspondences which appear in both A1 and A2 .
A selection can also take place based on the provided confidence values of the correspondences. Threshold-based methods are a good example of such an aggregation,
where correspondences are only added to A′ if their confidence values satisfy the
given threshold. Alternatively, one can create a bipartite matching problem using
the supplied alignments. In a bipartite matching problem one is given a graph with
two groups of vertices A and B and a set of weighted edges E which only connect vertices of A with vertices of B. The main task is then to find a subset of E which satisfies a
certain criteria. To merge the alignments A1 and A2 one can define E such that it
only contains an edge e linking the concepts c1 and c2 if either A1 or A2 contain a
correspondence between c1 and c2 . Executing maximum-weight matching or stable-marriage matching, as detailed in sub-section 3.1.4, can then be used to
determine which correspondences are included in the final alignment.
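As an illustration, such a merge can be approximated with a greedy weighted matching: collect the edges of both alignments (keeping the higher confidence on overlap) and repeatedly select the heaviest edge whose concepts are still unmatched. This is a greedy stand-in, not an exact maximum-weight matching:

```python
def merge_by_matching(a1, a2):
    """Greedy weighted bipartite matching over the union of two alignments.
    Each alignment maps (c1, c2) pairs to a confidence value; on overlap
    the higher confidence wins."""
    edges = dict(a1)
    for pair, conf in a2.items():
        edges[pair] = max(conf, edges.get(pair, 0.0))
    merged, used1, used2 = {}, set(), set()
    # Heaviest edges first; skip edges whose endpoints are already matched.
    for (c1, c2), conf in sorted(edges.items(), key=lambda kv: -kv[1]):
        if c1 not in used1 and c2 not in used2:
            merged[(c1, c2)] = conf
            used1.add(c1)
            used2.add(c2)
    return merged

a1 = {("Car", "Automobile"): 0.9, ("Person", "Human"): 0.6}
a2 = {("Car", "Vehicle"): 0.7, ("Person", "Human"): 0.8}
merged = merge_by_matching(a1, a2)
```

Here ("Car", "Vehicle") is discarded because Car is already matched by the heavier edge to Automobile, yielding a one-to-one merged alignment.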
When performing a confidence-based aggregation, it is likely a better option
to perform the aggregation step on the similarity matrices produced by Mapping System 1 and Mapping System 2 instead of on their output alignments A1 and A2 .
The methods which can then be applied are similar to the aggregation techniques
discussed in Section 3.1.3.
3.1.2 Similarity Metrics
An essential component of an ontology mapping system is the set of applied similarity
metrics. These measure how similar two given ontology entities e1 and e2 are with
respect to their provided meta-data. We define a similarity metric as follows:
Definition 6 (Similarity). A similarity metric is any function f (e1 , e2 ) ∈ R which,
given a set of entities E, maps two entities e1 and e2 to a real number and satisfies
the following properties:
∀x ∈ E, ∀y ∈ E, f (x, y) ≥ 0    (positiveness)
∀x ∈ E, ∀y ∈ E, f (x, y) ≤ 1    (normalized)
∀x ∈ E, ∀y, z ∈ E, f (x, x) ≥ f (y, z)    (maximality)
∀x ∈ E, f (x, x) = 1    (self-maximality)
∀x, y ∈ E, f (x, y) = f (y, x)    (symmetry)
Similarity metrics are typically executed on concepts originating from different
ontologies. However, the origins of the inputs are not a requirement for a function
to satisfy the criteria of a similarity metric.
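As a concrete instance, a token-based Jaccard measure satisfies all five properties of Definition 6 (an illustrative choice, not a metric prescribed by any particular system):

```python
def jaccard_similarity(e1, e2):
    """Jaccard similarity over label tokens: |T1 ∩ T2| / |T1 ∪ T2|."""
    t1, t2 = set(e1.lower().split()), set(e2.lower().split())
    if not t1 and not t2:
        return 1.0
    return len(t1 & t2) / len(t1 | t2)

# Spot-check the properties of Definition 6 on a few entity labels.
entities = ["registration number", "license plate", "registration plate"]
for a in entities:
    assert jaccard_similarity(a, a) == 1.0               # self-maximality
    for b in entities:
        s = jaccard_similarity(a, b)
        assert 0.0 <= s <= 1.0                           # positive, normalized
        assert s == jaccard_similarity(b, a)             # symmetry
        assert jaccard_similarity(a, a) >= s             # maximality
```

The spot-check is of course not a proof, but each property also follows directly from elementary set identities.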
Some techniques measure the dissimilarity between entities and convert the resulting value into a similarity value. A dissimilarity is defined as follows:
Definition 7 (Dissimilarity). A dissimilarity metric is any function f (e1 , e2 ) ∈
R which maps two entities e1 and e2 to a real number and satisfies the following
properties:
∀x ∈ E, ∀y ∈ E, f (x, y) ≥ 0    (positiveness)
∀x ∈ E, ∀y ∈ E, f (x, y) ≤ 1    (normalized)
∀x ∈ E, f (x, x) = 0    (self-minimality)
∀x, y ∈ E, f (x, y) = f (y, x)    (symmetry)
A more constrained notion of dissimilarity can be found in a distance measure
or in an ultrametric, which are defined as (Euzenat and Shvaiko, 2007):
Definition 8 (Distance). A distance metric is any dissimilarity function f (e1 , e2 ) ∈
R which additionally satisfies the following properties:
∀x, y ∈ E, f (x, y) = 0 iff x = y    (definiteness)
∀x, y, z ∈ E, f (x, y) + f (y, z) ≥ f (x, z)    (triangular inequality)
Definition 9 (Ultrametric). An ultrametric is any distance function f (e1 , e2 ) ∈ R
which satisfies the following property:
∀x, y, z ∈ E, f (x, y) ≤ max(f (x, z), f (y, z))    (ultrametric inequality)
Some authors define the measures of similarity and dissimilarity without the normalization and self-maximality properties (Euzenat and Shvaiko, 2007), and instead
define a normalized version of either metric separately. While the lack of normalization and self-maximality might not cause issues in other scientific domains, in the
domain of ontology mapping the lack of these properties can be the cause of issues
when combining several metrics into a single mapping system. One would need to
account for the output interval of each metric separately such that these can be
aggregated into a single appropriate confidence value. Additionally, the lack of self-maximality makes it difficult to express whether the system considers two entities to
be identical. It would thus be possible to define multiple similarity metrics which assign different values to identical concept pairings, forcing the system to know which
values are used to express which concept pairs are considered to be identical for each
metric. Certain matching techniques rely on this knowledge in order to generate an
input-set of correspondences with the highest degree of confidence. The property
of self-maximality is important, for instance, for systems which generate anchors
on-the-fly and only wish to exploit correspondences which are considered equal by
the given similarity metric. It is for these reasons that most contemporary mapping
systems do enforce the normalization and self-maximality properties, justifying their
inclusion in Definition 6.
A similarity function as defined in Definition 6 does bear some resemblance to
a probability mass function. One might even be inclined to describe a similarity as
the probability of two concepts being equal. However, a similarity function is not
a probability mass function. While a concept similarity function is likely positively
correlated with the theoretical probability of concept equality, it does not model the
chance of an event occurring. An intrinsic property of a probability mass function
is the normalization over the entire input set. Specifically, a function p_X(x ∈ X, y ∈ X) → [0, 1], with X being the set of all possible discrete events, is only a probability
mass function if the equality ∑_{x,y∈X} p(x, y) = 1 holds. Recall that f is subject to
the self-maximality property, meaning that f (x, y) = 1 for all x and y which describe
a pairwise combination of identical concepts. If X contains n concepts, with n ≥ 2,
then there exist at least n combinations resulting in a value of 1, due to concepts
being compared to themselves. Therefore, we know that ∑_{x,y∈X} f (x, y) ≥ n > 1,
contradicting the requirement that the sum over all possible inputs be 1 if f were to
be a probability mass function.
There exists a variety of techniques that can be applied as a similarity function.
These exploit a variety of algorithms and underlying principles in order to derive
their similarity values. One can classify similarity metrics according to certain distinct properties of each metric (Rahm and Bernstein, 2001; Shvaiko and Euzenat,
2005). These classification properties are referred to as matching dimensions by
Shvaiko and Euzenat (2005). Matching dimensions examine the metrics with respect
to their expected input format, e.g. XML, RDF, OWL, relational-models, attributes
of the formal algorithms and provided output dimensions. While there exist several
ways of selecting matching dimensions in order to classify similarity metrics, we
will illustrate a classification of techniques using dimensions which are relevant to
the performed research. The classification, illustrated in Figure 3.5, is based on the
works by Shvaiko and Euzenat (2005), particularly the granularity/input interpretation of the work. However, the order of the mapping dimensions has been altered in
order to cluster mapping techniques which utilize external resources. This grouping
emphasizes the breadth of existing resource-based techniques. Additionally, the classification has been updated such that it includes categories which more accurately
reflect related work and recent developments.
Our classification utilizes two mapping dimensions, which separate the individual
techniques according to the following criteria:
Syntactic/External/Semantic This mapping dimension was introduced in the
work by Shvaiko and Euzenat (2005) and distinguishes between interpretations of the input. Syntactic techniques interpret their input as a construct
and compute their outputs according to a clearly defined algorithm. External
techniques utilize resources in addition to the two given ontologies. Concepts
are evaluated by using the provided resource as context. External techniques
may exploit resources such as domain specific ontologies or lexicons, partial
alignments or human inputs. Semantic techniques utilize the semantic interpretations of the input. Typically, logic-based reasoners are used to infer
correspondences, ensuring that the alignment is coherent and complete.
Element-level/Structure-level Element-level techniques function by comparing
concepts of different ontologies in isolation, thus omitting information or instances of semantically related concepts. Structure-level techniques also utilize information or instances of related concepts, thus analysing how concepts
appear as part of a structure. This matching dimension was first introduced
by Rahm and Bernstein (2001). For instance-based techniques, this dimension
was first utilized in the work by Kang and Naughton (2003).
We will provide a brief description of the illustrated techniques from Figure 3.5
in the remainder of this subsection. For more detailed discussions of each technique
and demonstrations of particular algorithms, we suggest that the reader consult the
works by Ehrig (2006) or Euzenat and Shvaiko (2007).
String-based
String-based techniques evaluate concept pairings by comparing the names and/or
descriptions of concepts. These interpret the input as a series of characters. The data
these techniques exploit is typically extracted from the rdf:label or rdf:description
properties of the concept descriptions. The main intuition behind string-based techniques is that concepts are more likely to denote the same entity if their names are
more similar. String-based techniques typically evaluate metrics such as common
prefix or suffix length, edit-distances and n-gram overlap (Cohen, Ravikumar, and
Fienberg, 2003).
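As an illustration, the sketch below computes two common string-based measures: a padded character trigram Dice coefficient and a common-prefix ratio. The padding scheme and the choice of trigrams are illustrative, not prescribed by the cited work:

```python
def ngrams(s: str, n: int = 3) -> set:
    """Set of character n-grams; padding lets short names still yield grams."""
    padded = f"##{s.lower()}##"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over the two n-gram sets, in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))

def common_prefix_similarity(a: str, b: str) -> float:
    """Length of the shared prefix relative to the longer name."""
    p = 0
    for ca, cb in zip(a.lower(), b.lower()):
        if ca != cb:
            break
        p += 1
    return p / max(len(a), len(b), 1)
```

Note that both measures satisfy normalization and self-maximality, so they can be combined directly with other similarities.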
Language-based
Language-based techniques treat the provided input as an occurrence of some natural
language. Thus, language-based similarities apply techniques originating from the
[Figure 3.5 appears here: a classification tree rooted in Mapping Techniques, branching into Syntactic, External and Semantic interpretations, each subdivided into Element-level and Structure-level scopes. The leaf categories and their listed examples are: String-based (name similarity, description similarity, global namespace); Language-based (tokenization, lemmatization, morphology, elimination); Constraint-based (type similarity, key properties); Data Analysis and Statistics (frequency distribution); Graph-based (homomorphism, path, children, leaves); Taxonomy-based (taxonomy structure); Information Retrieval (profile similarity); Lexical Resources (WordNet, UMLS, UWN); Upper-level Ontologies (SUMO, MILO, Cyc); Alignment Re-use (alignment path discovery, concept inference); Lexical Resources with Disambiguation (glossary-profile overlap, co-occurrence disambiguation); Partial Alignments (anchor paths, anchor profiles, mapping discovery); Repository of Structures (structure meta-data); Model-based (SAT solvers, DL reasoners).]

Figure 3.5: Classification of concept mapping approaches. The classification is hierarchically structured, with the top level distinguishing the input interpretation and the bottom level featuring the input scope.
field of Natural Language Processing (NLP) in order to extract and process meaningful terms from the concept names or descriptions. Examples of NLP techniques
are tokenization, lemmatization, or stop-word removal. The goal of a tokenization
technique is to split up compound words into meaningful fragments. For example, a
tokenization technique would split up the term SportsCar into the token set {Sports,
Car}, making it easier to establish that the concept of SportsCar is in some way
associated with the concept Car. The process of lemmatization allows for the linguistic reduction of a word into its base form. For example, the two corresponding
properties have and has are unlikely to produce a high result when used as input to a
string similarity. By applying lemmatization, one can reformulate the name has into
its linguistic base form have, thus yielding a perfect match. The aim of stop-word
removal is to exclude from future evaluations terms which by themselves do not
carry any meaning, such as ’a’, ’the’ or ’that’. For example, a comparison between
two properties written and hasWritten would yield a more appropriate result after
the stop-word ’has’ is removed from the name hasWritten.
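A minimal sketch of this preprocessing pipeline is shown below. The stop-word list and the lemma table are toy placeholders; a real system would rely on a full lexicon or an NLP library:

```python
import re

STOP_WORDS = {"a", "the", "that", "of", "has", "is"}
# Toy lemma table; a real system would consult a lexicon such as WordNet.
LEMMAS = {"has": "have", "written": "write", "cars": "car"}

def tokenize(name: str) -> list:
    """Splits camel-case/underscore compound names: 'SportsCar' -> ['sports', 'car']."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name.replace("_", " "))
    return [p.lower() for p in parts]

def preprocess(name: str) -> list:
    """Tokenization, stop-word removal and (toy) lemmatization, in that order."""
    tokens = [t for t in tokenize(name) if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]
```

With this pipeline, both written and hasWritten reduce to the same token list, so a subsequent string comparison yields a perfect match.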
Constraint-based
Constraint-based techniques compare the internal structure of ontology concepts.
The core intuition is that concepts which model the same information are more
likely to be similarly structured. Constraint-based techniques analyse concepts with
respect to the modelled data-type, e.g. string, int or double, and cardinality of
each property. The cardinality of a property describes a set of restrictions for that
property, modelling the minimum and maximum amount of times the property may
be used for any concept instance. Examples of cardinalities are 1..1, 0..1 and 0..*.
The cardinality of 1..1 implies that a property must be present exactly once for each
instance. Properties used as identifier are typically associated with this cardinality.
A property which may be used at most once is associated with the cardinality 0..1,
while a property which may be omitted or occur an arbitrarily large number of times
will be associated with the cardinality 0..*. Datatypes and cardinalities are typically
compared using a compatibility table (Lee et al., 2002).
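The table-based comparison can be sketched as follows. The compatibility scores below are illustrative values, not the ones published by Lee et al. (2002), and the scoring of cardinality bounds is likewise an assumption:

```python
# Hypothetical compatibility scores between datatypes; a mapping system
# would tabulate these along the lines of Lee et al. (2002).
DATATYPE_COMPATIBILITY = {
    ("int", "int"): 1.0, ("int", "double"): 0.8, ("double", "double"): 1.0,
    ("string", "string"): 1.0, ("string", "int"): 0.2, ("string", "double"): 0.2,
}

def datatype_similarity(t1: str, t2: str) -> float:
    """Symmetric lookup in the compatibility table; unknown pairs score 0."""
    return DATATYPE_COMPATIBILITY.get((t1, t2),
                                      DATATYPE_COMPATIBILITY.get((t2, t1), 0.0))

def cardinality_similarity(c1: tuple, c2: tuple) -> float:
    """Cardinalities as (min, max) pairs, with None standing for '*'.
    An exact match scores 1; agreement on only one bound scores 0.5."""
    return (0.5 * (c1[0] == c2[0])) + (0.5 * (c1[1] == c2[1]))
```

For example, the cardinalities 0..1 and 0..* agree only on the lower bound and hence score 0.5.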
Data Analysis and Statistics
Techniques which fall under this category exploit the set of provided instances of
the given ontologies. The intuition behind such techniques is that two identical concepts are likely to model the same or at least similar data. Data analysis techniques
assume that there is a subset of instances which is modelled in both ontologies.
The set of shared instances over the entire ontologies can either be established by
exploiting unique key properties, e.g. registration numbers, or assembled using instance mapping techniques (Elfeky, Verykios, and Elmagarmid, 2002; Wang et al.,
2008). Ontology concepts can then be compared by analysing the overlap with respect to their shared instances. This can be done by for example computing the
Hamming distance (Hamming, 1950) or Jaccard similarity (Jaccard, 1901) between
their respective instance sets.
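For example, the Jaccard similarity over the shared-instance sets of two concepts can be computed directly:

```python
def jaccard_similarity(instances1: set, instances2: set) -> float:
    """|A ∩ B| / |A ∪ B| over the shared-instance sets of two concepts."""
    if not instances1 and not instances2:
        return 1.0
    return len(instances1 & instances2) / len(instances1 | instances2)
```

Two concepts whose instance sets overlap heavily thus receive a value close to 1, while disjoint instance sets yield 0.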
In cases where a shared instance set cannot be computed, it is possible to apply
statistical techniques in order to approximate the similarity between instance sets.
The aim is to extract statistical features from the ontology properties using the
given instance data, for example the minimum value, maximum value, mean, median, variance, existence of null values, number of decimals or number of segments.
The intuition here is that, given a statistically representative set of samples, these
characteristics should be more similar, if not the same, for two classes if these denote
the same entity.
Graph-based
Graph-based techniques interpret the two input ontologies as a pair of graphs G1 =
{V1 , E1 } and G2 = {V2 , E2 }. A graph G is characterized by its set of vertices V
and set of edges E. By parsing the ontologies into a graph structure, the problem of
mapping two ontologies can then be reformulated as a graph homomorphism problem
(Bunke, 2000; Fan et al., 2010). The core problem of graph homomorphism is to
find a mapping from V1 to V2 , such that each node of V1 is mapped to a node of V2
of the same label and each edge in E1 is mapped to an edge in E2 .
The common notion of a graph homomorphism problem is often too restrictive as
it aims to produce a full mapping V1 → V2 and E1 → E2 . In real-world applications
a perfect match between the input structures might not always be possible (Bunke,
2000). In the domain of ontology mapping, this is also the case as ontologies can be
modelled with a different scope or granularity. Therefore, graph-based techniques
often interpret the problem at hand as a sub-problem of graph homomorphism,
referred to as the Maximum common directed subgraph problem (Levi, 1973). Here,
the aim is to find, between two graphs G1 = {V1 , E1 } and G2 = {V2 , E2 }, the
maximum edge sets F1 ⊆ E1 and F2 ⊆ E2 together with a function pair f : V1 → V2
and f −1 : V2 → V1 such that f maps the subgraph induced by F1 onto the subgraph
induced by F2 . This approach to the given graph matching problem allows for a
mapping between graph structures even if these differ with respect to their scope or
granularity. However, while in classic applications the aim is to maximize the size of
the common subgraph, in ontology mapping one typically maximizes a pairwise
similarity function over all concept pairings (Euzenat and Shvaiko, 2007).
The similarity function typically compares two concepts c1 and c2 by comparing the
neighbourhood of c1 within G1 with the neighbourhood of c2 within G2 . The core
intuition is that the more similar the neighbourhoods of c1 and c2 are, the more
likely it is that c1 and c2 denote the same concept.
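A much-simplified version of such a neighbourhood comparison might look as follows. Averaging the best label match per neighbour is one of several possible design choices, and the label similarity is passed in as a parameter:

```python
def neighbourhood(graph: dict, node: str) -> set:
    """Direct neighbours of a node in an adjacency-list graph."""
    return set(graph.get(node, ()))

def neighbourhood_similarity(g1, c1, g2, c2, label_sim) -> float:
    """Averages, for every neighbour of c1, the best label similarity to any
    neighbour of c2 -- a simplified structural comparison."""
    n1, n2 = neighbourhood(g1, c1), neighbourhood(g2, c2)
    if not n1 or not n2:
        return 0.0
    return sum(max(label_sim(a, b) for b in n2) for a in n1) / len(n1)
```

In practice, the label similarity would itself be one of the element-level techniques described earlier, e.g. a string or lexical similarity.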
Taxonomy-based
Instead of mapping entire ontologies using their graph structures, one can apply the
graph matching techniques merely on the taxonomies of the ontologies (Valtchev
and Euzenat, 1997). A taxonomy only considers edges representing subClassOf
relations, which form the backbone of the ontology. When interpreting a class as
a set of instances, the subClassOf relation essentially links two sets where one set
is subsumed by the other. The intuition here is that concepts connected with a
subClassOf relation are semantically very similar. For two concepts c1 ∈ O1 and
c2 ∈ O2 denoting the same entity one can expect that their taxonomic neighbourhoods are also highly similar.
Information Retrieval
Techniques which fall under this category utilize information retrieval (IR) approaches in order to derive concept similarities. Ontology concepts are interpreted as
documents and each document is populated with relevant information about its respective concept (Mao, Peng, and Spring, 2007; Qu, Hu, and Cheng, 2006). This
information can stem from the concept definition, the definitions of related concepts
or provided concept instances. The core intuition here is that concept documents
are more likely to be similar if their respective concepts model the same entity. Document similarities are computed using information retrieval techniques (Manning,
Raghavan, and Schütze, 2008). Typically this includes the weighting of the document terms, for instance using weighting schemes like Term Frequency–Inverse Document
Frequency (TF-IDF) or a weighted model utilizing the origin of each term from their
respective ontologies. An example of such a weighted model will be introduced in
Section 4.3.3.
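A bare-bones sketch of the document approach, using plain TF-IDF weighting and the cosine similarity over token-list "concept documents" (the weighting scheme here is the standard textbook variant, not the weighted model of Section 4.3.3):

```python
import math
from collections import Counter

def tfidf_vectors(documents: list) -> list:
    """TF-IDF weight vectors for a list of token-list documents."""
    df = Counter(t for doc in documents for t in set(doc))  # document frequency
    n = len(documents)
    vectors = []
    for doc in documents:
        tf = Counter(doc)  # term frequency within this document
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(v1: dict, v2: dict) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(v1[t] * v2.get(t, 0.0) for t in v1)
    norm = (math.sqrt(sum(x * x for x in v1.values()))
            * math.sqrt(sum(x * x for x in v2.values())))
    return dot / norm if norm else 0.0
```

Concept documents that share many discriminative terms receive a high cosine value, while documents about unrelated concepts score near 0.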
Upper-level Ontologies
Upper-level ontologies model an abstract domain of concepts using formal semantics.
The concepts of such an upper-level ontology can be used as the basis for the creation
of a domain specific ontology. A technique utilizing an upper-ontology as background
knowledge would hence derive correspondences which are semantically consistent
and complete. While there exist several specifications of upper-level ontologies, e.g.
SUMO (Niles and Pease, 2001), MILO (Niles and Terry, 2004), Cyc (Matuszek et al.,
2006) or DOLCE (Gangemi et al., 2003), research in techniques which utilize these
ontologies has so far been unsuccessful (Noy, 2004; Euzenat and Shvaiko, 2007).
Lexical Resources
Lexical resources provide domain specific background to the given matching task.
These resources typically model the given domain in great detail and include domain
specific terminology. The information of such a resource can be used to enrich
the meta-information of each ontology concept by linking each concept with its
appropriate entity from the lexical resource. A resource-based technique can then
determine the similarity between two concepts by establishing their distance within
the given lexical resource. The core intuition behind this category of techniques is
that the closer two concepts are within the resource, the more likely they are to
denote the same entity. By converting the computed distances into a similarity,
one can essentially derive a measure of semantic relatedness or semantic similarity
between two concepts (Strube and Ponzetto, 2006).
Examples of such lexical resources are WordNet (Miller, 1995), YAGO (Suchanek,
Kasneci, and Weikum, 2007), UMLS (Bodenreider, 2004) and FMA (Rosse and
Mejino Jr, 2003). WordNet and YAGO are examples of broadly defined resources,
modelling a very wide domain. For WordNet and YAGO the modelled domain is
the general English language. UMLS and FMA are examples of narrow resources
which model a specific domain in great detail, being the biomedical domain in the
case of UMLS and FMA. Techniques which utilize these types of resources will be
further discussed in Chapter 4.
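As a toy illustration, the sketch below derives a similarity 1/(1 + d) from the shortest path distance d within a hand-made taxonomy fragment; a real implementation would query a resource such as WordNet or UMLS instead:

```python
from collections import deque

# Toy fragment of a lexical taxonomy (parent -> children);
# a stand-in for a real resource such as WordNet.
TAXONOMY = {
    "vehicle": ["car", "bus"],
    "car": ["sports_car"],
    "bus": [],
    "sports_car": [],
}

def path_distance(resource: dict, a: str, b: str) -> int:
    """Shortest undirected path between two entries, via breadth-first search."""
    edges = {n: set(resource.get(n, ())) for n in resource}
    for parent, children in resource.items():
        for child in children:
            edges.setdefault(child, set()).add(parent)
            edges[parent].add(child)
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, d = queue.popleft()
        if node == b:
            return d
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return -1  # not connected

def path_similarity(resource: dict, a: str, b: str) -> float:
    """Converts the distance into a similarity in (0, 1]."""
    d = path_distance(resource, a, b)
    return 0.0 if d < 0 else 1.0 / (1.0 + d)
```

Here car and bus, which share the direct parent vehicle, obtain a higher similarity than the more distant pair sports_car and bus, matching the intuition stated above.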
Alignment Re-use
Alignment re-use techniques utilize complete alignments in order to derive concept
correspondences using reasoning techniques (Rahm and Bernstein, 2001). In the
most basic form, alignment re-use techniques tackle the problem of mapping O1
with O2 by exploiting two existing alignments A1 and A2 , with A1 specifying a
mapping O1 ↔ O3 and A2 specifying the mapping O2 ↔ O3 . While ideally A1 , A2
and O3 are provided as input to the mapping problem, this might not always be the
case. Without the provided necessary resources, a mapping system might consult a
mapping infrastructure, as described in subsection 1.3.7, in order to automatically
identify an ontology O3 for which appropriate alignments exist. More advanced
techniques exploit alignments spanning a variable path length (Aumueller et al.,
2005), e.g. O1 ↔ O′ ↔ O′′ ↔ · · · ↔ O2 , or merge the results of several alignment
paths (Sabou et al., 2008).
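The basic composition of two alignments through a shared ontology O3 can be sketched as follows. Representing correspondences as (source, target, confidence) triples and multiplying confidences along the path are illustrative conventions, not fixed by the cited work:

```python
def compose_alignments(a1: list, a2: list) -> list:
    """Derives O1<->O2 correspondences from O1<->O3 (a1) and O2<->O3 (a2).
    Correspondences are (source, target, confidence) triples; confidences
    along a composed path are multiplied."""
    by_o3 = {}
    for c2, c3, conf in a2:
        by_o3.setdefault(c3, []).append((c2, conf))
    composed = []
    for c1, c3, conf1 in a1:
        for c2, conf2 in by_o3.get(c3, ()):
            composed.append((c1, c2, conf1 * conf2))
    return composed
```

Longer alignment paths O1 ↔ O′ ↔ O′′ ↔ · · · ↔ O2 can be handled by applying this composition step repeatedly.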
Lexical Resources with Disambiguation
Similarities of this category share some overlap with the previously described lexical
resource-based techniques. They utilize the same type of resources, e.g. WordNet or
UMLS, and use the same techniques to derive the similarities between entries of these
resources. However, the key difference here is that a disambiguation technique is
applied before the execution of the lexical similarity. Such techniques use contextual
information of a given concept, e.g. labels, descriptions or information of related
concepts, in order to identify the correct sense for that given concept. Because
disambiguation techniques typically utilize information of related concepts as context
to aid the disambiguation step, lexical similarities that utilize a disambiguation step
are classified as structure-level approaches instead of element-level. Disambiguation
techniques and their integration into a lexical similarity will be further discussed in
Chapter 4.
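A heavily simplified, Lesk-style disambiguation step might look as follows, selecting the sense whose gloss shares the most tokens with the concept's context; the sense inventory and glosses are toy placeholders for entries of a resource like WordNet:

```python
def disambiguate(context_tokens: set, senses: dict) -> str:
    """Picks the sense whose gloss overlaps most with the concept's context
    (labels/descriptions of related concepts) -- a simplified Lesk approach."""
    def overlap(gloss: str) -> int:
        return len(context_tokens & set(gloss.split()))
    return max(senses, key=lambda s: overlap(senses[s]))
```

The selected sense, rather than every sense of the concept name, is then used as the entry point for the lexical similarity computation.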
Partial Alignments
Partial alignment-based techniques utilize incomplete alignments between the two
given ontologies O1 and O2 . Such an alignment can stem from a domain expert
being unable to finish the complete alignment, or from a mapping system. Such a
system might even be designed for the specific purpose of generating a reliable set
of correspondences to be used as partial alignment. The main goal here is to exploit
the given partial alignment in order to derive additional correspondences, such that
a complete alignment can be created. Specific techniques may for instance use
the partial alignment as a starting point for the discovery of new correspondences
(Seddiqui and Aono, 2009) or explore concept paths between anchors (Noy and
Musen, 2001). This category of techniques will be further discussed in Chapter 5.
Repository of Structures
Previously described alignment re-use techniques either utilized partial alignments
between the given ontologies O1 and O2 or complete alignments between either O1 or
O2 and other ontologies from a repository. Techniques in this category utilize mappings of ontology fragments. Here, both O1 and O2 are decomposed into fragments,
and similar fragment pairs are identified. For each fragment pair a repository
of fragments is consulted for fragments which are similar to both fragments of the
given pair. The core intuition here is that alignments to similar fragments can be
exploited such that the mapping problem can be shifted into a fragment pair that
is easier to match (Rahm, Do, and Massmann, 2004).
Model-based
Model-based techniques process ontology concepts based on their semantic interpretation. These techniques use formal semantic frameworks, e.g. propositional
satisfiability (SAT) (Giunchiglia and Shvaiko, 2003) or description logic techniques
(Bouquet et al., 2006), in order to derive correspondences. These techniques are
typically applied in conjunction with a set of anchors since otherwise their performance is unsatisfactory (Euzenat and Shvaiko, 2007). For example, SAT solvers
build matching axioms for each pairwise combination of concepts c1 and c2 according to a specified relation r ∈ {≡, ⊑⊒, ⊥}. Each mapping axiom is then verified by
assuming the negation of that axiom and deriving whether the negation is satisfiable.
A mapping axiom is considered true if its negation is unsatisfiable.
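The negation-based verification can be illustrated with a brute-force satisfiability check. The one-variable concept definitions below are, of course, toy stand-ins for real ontology axioms, and a practical system would use a SAT solver rather than enumeration:

```python
from itertools import product

def is_satisfiable(formula, variables) -> bool:
    """Brute-force satisfiability check over all truth assignments."""
    return any(formula(dict(zip(variables, values)))
               for values in product([False, True], repeat=len(variables)))

# Hypothetical axioms: both ontologies define their concept via feature p.
variables = ["p"]
c1 = lambda v: v["p"]          # concept 1 holds iff p
c2 = lambda v: v["p"]          # concept 2 holds iff p

# The mapping axiom c1 ≡ c2 is accepted iff its negation is unsatisfiable.
negated_axiom = lambda v: c1(v) != c2(v)
axiom_holds = not is_satisfiable(negated_axiom, variables)
```

Since no assignment makes c1 and c2 disagree, the negation is unsatisfiable and the equivalence axiom is accepted.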
3.1.3 Similarity Aggregation
In the previous subsection we introduced a varied selection of techniques that can
serve as the basis of a similarity measure for an ontology mapping system. These
can be categorized by their different underlying principles and intuitions. However,
a single similarity metric does not always produce a reliable result. For example,
consider the concepts Bus and Coach describing the same entity. A human would
immediately conclude that these two concepts should be matched. However, depending on the used similarity metric the result can be vastly different. A lexical
similarity, as described in Subsection 3.1.2, would quickly identify that these two
words are synonyms and therefore produce a similarity value of 1. On the other
hand, a string-based technique would return a very low value since the words ’Bus’
and ’Coach’ have no character overlap. Based on this example one should not conclude that a string-based technique is a sub-standard technique. Given the concept
pair Bus and Buss, with the second concept containing a spelling error, a string-based approach would yield a more accurate result since a lexical similarity would
be unable to find a concept labelled Buss in its corpus.
To overcome the different types of heterogeneities between ontology concepts, a
varied selection of mapping techniques is typically applied. For any given concept
pair the given mapping system applies every configured similarity metric. The results
of these metrics are combined using an aggregation technique. In this subsection we
will provide a brief overview on techniques that are commonly utilized for the purpose
of similarity aggregation. Note that some aggregation techniques are defined over
distance measures instead of similarities (Euzenat and Shvaiko, 2007), though in
practice this is barely a hindrance since a similarity metric can easily be converted
into a distance measure.
Weighted Product
One way of aggregating similarities is by using a product. The similarities can be
weighted using a parameter ω, allowing certain metrics to be emphasized over others.
For example, one might find a metric based on instance overlap more trustworthy
than a metric comparing the internal structure of concepts. Given a set of n similarity metrics S = {sim1 , sim2 , . . . , simn }, a set of weights {ω1 , ω2 , . . . , ωn } and two
concepts x and y, an aggregation function a(x, y, S) can be defined as a weighted
product as follows:
a(x, y, S) = ∏_{i=1}^{n} sim_i (x, y)^{ω_i}    (3.1)
Due to its use of multiplication, this function has the unfortunate effect that its
result is 0 if only one of the similarities produces the value 0, completely disregarding
the results of the other similarities. Given that it is more likely to find no overlap
between concepts than finding a perfect overlap, it is more reasonable to apply this
function on a set of distances instead.
Minkowski Distance
A generalization of the Euclidean Distance, the Minkowski Distance is defined on a
multidimensional Euclidean space. Given a set of n distance metrics D =
{dist1 , dist2 , . . . , distn }, a parameter p and two concepts x and y, an aggregation
function a(x, y, D) can be defined as a Minkowski Distance as follows:

a(x, y, D) = ( ∑_{i=1}^{n} dist_i (x, y)^p )^{1/p}    (3.2)
Note that choosing p = 1 would be equivalent to the Manhattan distance while
choosing p = 2 would be equivalent to the Euclidean Distance.
Weighted Sum
A more linear aggregation can be achieved through the application of a weighted
sum. Given a set of n similarity metrics S = {sim1 , sim2 , . . . , simn }, a set of weights
{ω1 , ω2 , . . . , ωn } and two concepts x and y, an aggregation function a(x, y, S) can
be defined as a weighted sum as follows:
a(x, y, S) = ∑_{i=1}^{n} sim_i (x, y) × ω_i    (3.3)
Note that by enforcing a normalization amongst the weights, such that ∑_{i=1}^{n} ω_i =
1, one can ensure that the weighted sum is normalized as well. Furthermore, by
selecting ω_x = 1/n, ∀x ∈ [1 . . . n], one can model a simple arithmetic mean.
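The three aggregation functions above (Equations 3.1–3.3) can be sketched directly:

```python
def weighted_product(sims: list, weights: list) -> float:
    """Eq. 3.1: product of similarities raised to their weights."""
    result = 1.0
    for s, w in zip(sims, weights):
        result *= s ** w
    return result

def minkowski(dists: list, p: float) -> float:
    """Eq. 3.2: Minkowski aggregation of distance values."""
    return sum(d ** p for d in dists) ** (1.0 / p)

def weighted_sum(sims: list, weights: list) -> float:
    """Eq. 3.3: linear combination; normalized if the weights sum to 1."""
    return sum(s * w for s, w in zip(sims, weights))
```

The first assertion in the test below reproduces the zero-dominance effect noted for the weighted product: a single similarity of 0 forces the aggregate to 0 regardless of the other values.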
Machine Learning
The previously described aggregation techniques have only a limited capability to
adapt themselves to the given set of similarity techniques. The weights are typically
selected by a domain expert and do not change throughout an entire mapping task.
Such a system can have disadvantages for certain scenarios.
To illustrate this, let us return to the example we described at the beginning of
this subsection. The example entailed the evaluation of two concept pairs, being
< Bus, Coach > and < Bus, Buss >, by combining the results of a string similarity
and a lexical similarity. Let us assume that the two techniques produce the similarity
set S1 = {0, 1} for the pair < Bus, Coach > and the set S2 = {0.9, 0} for the pair
< Bus, Buss >. Additionally, let us consider a concept pair which should not be
matched, being < Car, Caravan > with a resulting similarity set S3 = {0.5, 0.5}.
Here, it becomes easy to see how no selection of the weights ω would result in an
adequate value for all concept pairs. Aggregating both similarities equally would
result in S1 , S2 and S3 being aggregated into a similar value near 0.5. One would
rather desire for the resulting values for S1 and S2 to be higher than the value for
S3 . Emphasizing the string similarity with appropriate weights for ω would result
in S2 being aggregated into a higher value, but would also lower the result for S1 .
The opposite would occur if the lexical similarity were to be emphasized instead.
In order to create a more complex aggregation system capable of taking the intricacies of particular similarity metrics into account, one can apply machine learning
techniques (Duda, Hart, and Stork, 1999; Russell and Norvig, 2003). In contrast
to the previously described aggregation techniques, machine learning approaches require a dataset for each specific application. Such a dataset is typically referred to
as the training-set. A machine learning technique attempts to build a model based
on the provided training-set such that this model can be used for the purpose of prediction or classification. For the evaluation of a trained machine learning approach
one typically uses a separate dataset, referred to as the test-set.
While the term machine learning encompasses a broad range of similar problems,
for ontology mapping the most relevant techniques are those which tackle supervised
learning. Here, the training set consists of a large quantity of instances where each
instance contains a value for each modelled dimension, i.e. no missing values, and
the desired outcome for that instance. This outcome may be a classification label
or a numerical value. The applied machine learning technique must construct a
model based on the provided instances such that it can approximate the desired
result for new instances as closely as possible. For ontology mapping one can model
the aggregation task both as a classification and as a regression task (Ichise, 2008).
In a classification task the given machine learning approach would assign a label,
e.g. ’true’ or ’false’, to every given concept pair, while in a regression task the
technique would assign each concept pair a value in [0, 1]. Thus, in order to apply
a machine learning technique as an aggregation function, one needs to construct a
training-set using the similarity metrics of the mapping system and create a label
for each instance. Here one can use ontologies which have already been mapped, for
example the datasets of the OAEI campaign. An example of a system using such
techniques would be YAM++ (Ngo, Bellahsene, and Coletta, 2012), which utilizes
machine learning techniques to aggregate the results of its element-level similarities.
Another example system is NOM/QOM (Ehrig and Sure, 2004), which uses neural
networks for the combination of its matching rules.
We will briefly describe one technique to serve as an example of a machine learning
approach. Specifically, we will introduce artificial neural networks. For a more
thorough introduction to the field of machine learning, we suggest the reader consult
the work of Russell and Norvig (2003).
An artificial neural network (ANN) is a learning model inspired by the biological
structure of the brain (Russell and Norvig, 2003). A brain consists of a large quantity
of processing cells, called neurons. Each neuron has a series of incoming and outgoing
connections to other neurons. A neuron receives electrical pulses from other neurons
through its incoming connections. Based on the presence or absence of pulses a
neuron decides whether it should itself fire a pulse. This pulse is then forwarded to
other neurons through its outgoing connections.
In an ANN the functionality of a neuron is modelled mathematically. A neuron is
defined by a series of n input parameters x. Each input is weighted with an individual
parameter ω. All weighted inputs are aggregated in a transfer function T , which is
typically modelled as T = ∑_{i=1}^{n} ω_i × x_i + b. The result of the transfer function is
used as input for the activation function A(T, θ), which decides whether the neuron
should ‘fire’ a pulse or not. The activation function may be modelled as a simple
‘step’ function, with the threshold θ deciding if A(T, θ) should produce the output
1 or 0. Alternatively, it can be modelled as a continuous function, e.g. a sigmoid
function.
While a single neuron can be used for simple learning problems, ANNs typically
use a series of interconnected neurons in order to tackle more challenging tasks. The
neurons are typically arranged in layers, where a neuron in one layer only receives
the output of the previous layer. The output of the system is essentially defined
as a series of nested neuron functions. Figure 3.6 displays an example of an ANN.
Unlike when training a decision tree, the structure of an artificial neural network
is already defined before the learning step. Instead, the learning step involves the
tuning of all the weights ω, the bias b and the threshold θ for each neuron in the network. A well known algorithm commonly applied for this task is the backpropagation
algorithm (Rumelhart, Hinton, and Williams, 1986). This algorithm first evaluates
the training-set using the current parameters of the network and calculates an error
score. By backpropagating the desired outputs through the network a delta score is
computed for each neuron. The parameters for each neuron are then updated based
on its corresponding delta value. This backpropagation routine is repeated until all
instances in the training-set are correctly classified or until a stopping criterion is
met.
Once the chosen machine learning technique has a fully trained model using
the given input-value pairs, one can use it to predict the appropriate value of new
inputs. In the case of an ANN, the similarity values are entered in the input layer and
[Figure 3.6 appears here: a feed-forward network with an input layer (input1, input2), a hidden layer, and an output layer producing a single output.]

Figure 3.6: Illustration of a neural network.
forwarded through the activation functions A(T, θ) of each node, using the learned
weights ω and the learned bias b for each respective node. The activation function
of the output node produces the aggregated similarity value.
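A minimal forward pass matching the shape of Figure 3.6 can be sketched as follows. The weights and biases are illustrative values; a trained system would obtain them via backpropagation:

```python
import math

def sigmoid(t: float) -> float:
    """Continuous activation function A(T)."""
    return 1.0 / (1.0 + math.exp(-t))

def neuron(inputs: list, weights: list, bias: float) -> float:
    """Transfer function T = sum(w_i * x_i) + b followed by a sigmoid activation."""
    t = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(t)

def forward(inputs: list, hidden_layer: list, output_neuron: tuple) -> float:
    """A 2-input / hidden-layer / 1-output network as in Figure 3.6.
    Each neuron is given as a (weights, bias) pair."""
    hidden_out = [neuron(inputs, w, b) for w, b in hidden_layer]
    return neuron(hidden_out, *output_neuron)

# Illustrative (untrained) parameters.
hidden = [([4.0, 4.0], -2.0), ([-3.0, -3.0], 1.0)]
output = ([5.0, -5.0], -1.0)
# Two similarity values enter the input layer; one aggregated value comes out.
aggregated = forward([0.9, 0.0], hidden, output)
```

The two similarity values play the role of input1 and input2 in Figure 3.6, and the activation of the output node is the aggregated similarity.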
3.1.4 Correspondence Extraction
The third core task required for each ontology mapping system is the extraction of
correspondences from the aggregated similarities. For each concept c the extraction
method must analyse the similarities in the aggregated similarity matrix M and decide to which other concept c should be mapped in a correspondence. Alternatively,
the system could decide not to map c at all if the aggregated similarities for c all
fall below a given threshold. Formally, we define the task of correspondence
extraction as follows:
Definition 10 (Correspondence Extraction). Given a set of m entities C1, a
set of n entities C2 and the m × n similarity matrix M, the task of correspondence
extraction is defined as a function τ(C1, C2, M) → A which generates an alignment
A ⊆ C1 × C2.
The execution of the correspondence extraction function represents the first moment at which an ontology mapping system generates an alignment between the
two input ontologies. Depending on the system's architecture, the alignment can be refined
in a post-processing step, used as an input alignment for another matcher or simply
returned as the output of the entire mapping system.
In this subsection we will introduce some common techniques that are used as correspondence extraction methods. These range from simple threshold-based methods
(Do and Rahm, 2002; Ehrig and Sure, 2004) to bipartite matching methods (Euzenat
and Shvaiko, 2007).
Hard threshold
In the most straightforward extraction method, a concept pair is only added to A
as a correspondence if its corresponding similarity value in M is greater than or equal to
a specified threshold θ. This technique can be seen in the Anchor-PROMPT (Noy
and Musen, 2001) and QOM (Ehrig and Staab, 2004) systems.
Delta threshold
Here the threshold is represented as a delta value d, specifying an absolute or relative tolerance which is subtracted from the highest retrieved similarity value.
The threshold θ is thus specified as θ = max(M) − d. The delta threshold is also
referred to as a proportional threshold if d is specified as a relative value with respect
to max(M) (Euzenat and Shvaiko, 2007).
Percentage
In this extraction technique all pairwise combinations of concepts are sorted in descending order of similarity. Then, only the top n% of concept pairs are added as correspondences to A.
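The three threshold-based techniques above can be sketched as follows. The dictionary representation of the similarity matrix and the example values are illustrative assumptions.

```python
def hard_threshold(M, theta):
    # Keep every pair whose similarity is at least theta.
    return {pair for pair, s in M.items() if s >= theta}

def delta_threshold(M, d, relative=False):
    # Threshold derived from the best similarity: theta = max(M) - d,
    # or max(M) * (1 - d) when d is interpreted proportionally.
    top = max(M.values())
    theta = top * (1 - d) if relative else top - d
    return hard_threshold(M, theta)

def top_percentage(M, pct):
    # Keep the top pct% of all pairs, ranked by descending similarity.
    ranked = sorted(M, key=M.get, reverse=True)
    keep = round(len(ranked) * pct / 100)
    return set(ranked[:keep])

# Similarity matrix as a dict: (entity of C1, entity of C2) -> similarity.
M = {('a', 'c'): 1.0, ('a', 'd'): 0.8, ('b', 'c'): 0.54, ('b', 'd'): 0.27}
print(sorted(hard_threshold(M, 0.6)))   # [('a', 'c'), ('a', 'd')]
print(sorted(delta_threshold(M, 0.3)))  # [('a', 'c'), ('a', 'd')]
print(sorted(top_percentage(M, 50)))    # [('a', 'c'), ('a', 'd')]
```

On this matrix all three variants happen to agree; they diverge as the distribution of similarity values changes.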
Relative threshold
Introduced by Melnik, Garcia-Molina, and Rahm (2002) for the evaluation of the
Similarity Flooding algorithm, this technique splits the similarity matrix M into
two matrices M1 and M2 . The core idea is that the absolute similarity values are
transformed into relative similarities with respect to the alternatives of either the
concepts of C1 or the concepts of C2 . Thus, M1 is generated by normalizing each
row r of M using the maximum value of r, and M2 is generated by normalizing each
column k of M using the maximum value of k. The relative threshold technique
only selects entity pairs such that their corresponding relative similarities in both
M1 and M2 are above the specified relative threshold θ.
Let us illustrate this technique using a small example. Given the two entity sets
C1 = {a, b} and C2 = {c, d}, let us assume that the similarity aggregation technique
produced the following similarity matrix M:

         c     d
M =  a   1     0.8
     b   0.54  0.27
The similarity matrix M is now converted into two matrices M1 and M2 specifying relative similarities:

          c     d                  c     d
M1 =  a   1     0.8      M2 =  a   1     1
      b   1     0.5            b   0.54  0.33
Selecting the threshold θ = 0.5, for example, would lead to the entity pairs
<a, c>, <a, d> and <b, c> being added as correspondences to the output
alignment A.
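The relative threshold technique can be sketched as follows; the code reproduces the worked example above. The dictionary representation of M is an assumption.

```python
def relative_threshold(M, theta):
    # Normalize each row and each column of M by its maximum, then keep
    # pairs whose relative similarity exceeds theta in BOTH views.
    rows = {r for r, _ in M}
    cols = {c for _, c in M}
    row_max = {r: max(M[r, c] for c in cols) for r in rows}
    col_max = {c: max(M[r, c] for r in rows) for c in cols}
    return {(r, c) for (r, c), s in M.items()
            if s / row_max[r] > theta and s / col_max[c] > theta}

M = {('a', 'c'): 1.0, ('a', 'd'): 0.8, ('b', 'c'): 0.54, ('b', 'd'): 0.27}
print(sorted(relative_threshold(M, 0.5)))
# [('a', 'c'), ('a', 'd'), ('b', 'c')] — the selection from the worked example
```

The pair (b, d) is excluded because its relative similarity in M1 is exactly 0.5, which does not lie strictly above the threshold.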
Maximum weight graph matching
Correspondence extraction is another area where techniques from the field of graph
theory are of use. Specifically, one can formulate the task of correspondence extraction as a bipartite graph matching problem. Here, we are given a graph G = {V, E}
consisting of the set of vertices V and edges E. G is defined as a bipartite graph if
there exist two sets U and W such that U ∪ W = V, U ∩ W = ∅ and all edges in
E only connect vertices from U to W. Additionally, all edges in E are associated
with a weight x, such that a specific edge e is defined as a triplet <v, v′, x>. The
task of bipartite graph matching is defined as the identification of a set E′ such that
E′ ⊆ E, E′ maximizes a certain criterion and no two edges in E′ share a common
vertex.
One can easily see how to formulate the task of correspondence extraction
as a bipartite graph. First, we create U and W such that each vertex in U represents
an entity in C1 and each vertex in W represents an entity of C2. Next, the edges
in E are weighted according to the similarity matrix M. While for
correspondence extraction the vertices in U are fully connected to the vertices in W,
bipartite graph matching also considers graphs in which U is sparsely connected to
W.
Given a bipartite graph, the core task is now to find a set of edges E′ ⊆ E
where no two edges in E′ share a common vertex. If xi is defined as the weight of
the edge ei, then in maximum weight matching one needs to identify an E′ such that for
every possible E′′ ⊆ E, where no two edges in E′′ share a common vertex, the following
inequality holds:

    Σ_{ei ∈ E′} xi  ≥  Σ_{ej ∈ E′′} xj
A well-known technique to compute the maximally weighted matching is the Hungarian method (Kuhn, 1955). This type of extraction technique is utilized in the
AgreementMaker system (Cruz, Antonelli, and Stroe, 2009) and the YAM++ system
(Ngo et al., 2011).
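The following sketch illustrates maximum weight bipartite matching by brute force over all assignments, reusing the earlier example matrix. It assumes |U| ≤ |W| and non-negative weights; a real system would use the Hungarian method, since exhaustive search is only feasible for very small inputs.

```python
from itertools import permutations

def max_weight_matching(M, U, W):
    # Brute force: try every ordered selection of |U| distinct W-vertices
    # and keep the assignment with the highest total weight.
    best, best_weight = [], float('-inf')
    for ws in permutations(W, len(U)):
        pairs = list(zip(U, ws))
        weight = sum(M.get(p, 0.0) for p in pairs)
        if weight > best_weight:
            best, best_weight = pairs, weight
    return best, best_weight

M = {('a', 'c'): 1.0, ('a', 'd'): 0.8, ('b', 'c'): 0.54, ('b', 'd'): 0.27}
matching, total = max_weight_matching(M, ['a', 'b'], ['c', 'd'])
print(sorted(matching), round(total, 2))  # [('a', 'd'), ('b', 'c')] 1.34
```

Note that the globally optimal matching here passes over the single highest-weighted pair (a, c), because combining it with the only remaining pair (b, d) yields a lower total weight.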
Stable marriage graph matching
When using this technique the extraction task is also formulated as a bipartite
graph matching problem. However, the selection of E′ is determined according to a
different criterion. Instead of simply maximizing over the weights x, the edges are
selected such that a stability criterion is met. A selection E′ is considered stable if
the following condition holds:

    ∀ <v1, v1′, x1>, <v2, v2′, x2> ∈ E′ : ∄ <v1, v2′, x> ∈ E such that x ≥ x1 ∧ x ≥ x2
Expressed more informally, there may exist no pair of vertices v and v′ whose
corresponding edge weight in E is higher than the weights of the edges containing
v or v′ in the selection E′.
The common analogy used here is the coupling between a group of men and a
group of women. Each man and each woman prepares a ranked list of the partners
to whom they would rather be married. The goal is to create a list of couples which
are to be married. One assumption is that a woman and a man will drop their
current partners and select each other if their preferences for each other are higher
than their preferences for their currently assigned partners. A selection of marriages
is considered stable if no unmatched man and woman are willing to drop their currently
assigned partners because they prefer each other more.
A well-known algorithm to compute a stable marriage in a bipartite graph is the
Gale-Shapley algorithm (Gale and Shapley, 1962). Stable-marriage graph matching
is applied in the MapSSS system (Cheatham, 2013).
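A minimal sketch of the Gale-Shapley procedure follows. It uses the similarity values in M as the preferences of both sides, which is an assumption fitting the extraction setting; in the general stable marriage problem the two sides hold independent preference lists. The sketch also assumes equally sized sides with complete preferences.

```python
def gale_shapley(M, men, women):
    # Men propose in order of decreasing similarity; women keep the best
    # proposal seen so far. Yields a stable matching for the weights in M.
    prefs = {m: sorted(women, key=lambda w: M[m, w], reverse=True) for m in men}
    next_choice = {m: 0 for m in men}
    engaged = {}                      # woman -> man
    free = list(men)
    while free:
        m = free.pop()
        w = prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in engaged:
            engaged[w] = m
        elif M[m, w] > M[engaged[w], w]:
            free.append(engaged[w])   # the displaced partner becomes free again
            engaged[w] = m
        else:
            free.append(m)            # m will propose to his next choice
    return sorted((m, w) for w, m in engaged.items())

M = {('a', 'c'): 1.0, ('a', 'd'): 0.6, ('b', 'c'): 0.6, ('b', 'd'): 0.0}
print(gale_shapley(M, ['a', 'b'], ['c', 'd']))  # [('a', 'c'), ('b', 'd')]
```

Here the pair (a, c) is matched first because both prefer each other over any alternative, forcing b and d together despite their weight of 0.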
While both maximum weight matching and stable-marriage matching select a
subset of E according to a global criterion, they can produce different results. Let
us illustrate their differences using the following example matrix M:

         c    d
M =  a   1    0.6
     b   0.6  0.0
Since both approaches tackle a bipartite graph matching problem, they can only
select two edges out of E, since no vertex may be matched multiple times. In this
case, two subsets of E are potential outputs for each approach: E1 = {<a, c, 1>, <b, d, 0>}
and E2 = {<a, d, 0.6>, <b, c, 0.6>}. Maximum weight matching would
select E2, since the sum of the edge weights of E2, being 1.2, is higher than the sum
of the edge weights of E1, being 1. Stable-marriage matching however would select E1,
since both a and c prefer each other more than any alternative. Stable-marriage
matching will only select an E′ in which a and c are matched, even if the other
vertices are matched to partners with a low weight.
Generally, we can see that stable-marriage matching is more likely to make a
selection in which a proportion of the edges have a very high weight, with the remaining
edges having lower weights due to being matched with less preferred vertices.
Maximum weight matching is more likely to make a more balanced selection, possibly
omitting edges with a very high weight.
Naive descending algorithm
The Naive descending algorithm presents a greedy approach for the extraction of
a 1:1 mapping from a matrix of similarities (Meilicke and Stuckenschmidt, 2007).
Instead of maximizing over a global criterion as in the previous approaches, this
Algorithm 3.1 Naive descending algorithm pseudo-code
 1: Naive-Descending(M)
 2:   A ← convertToAlignment(M)
 3:   sortDescending(A)
 4:   A′ ← ∅
 5:   while A ≠ ∅ do
 6:     c ← removeFirstElement(A)
 7:     A′ ← A′ ∪ c
 8:     for all c′ ∈ getAlternatives(A, c) do
 9:       removeElement(A, c′)
10:     end for
11:   end while
12:   return A′
algorithm iterates over the elements of M and extracts correspondences based on
a series of local decisions. This extraction technique can be found in the PRIOR
system (Mao and Peng, 2006).
The algorithm receives as input the similarity matrix M. In the preparatory
phase, M is converted into an alignment A such that every entry in M is represented as a correspondence. This operation is denoted using the convertToAlignment statement. A is then sorted in descending order according to the confidences
of the correspondences. Next, the algorithm iterates over A. In each iteration the
correspondence c with the highest confidence value is removed from A and added to
the output alignment A′. A is then inspected for any correspondences which contain
one of the entities of c, denoted by the getAlternatives statement, and these are removed
from A. This ensures that each concept is mapped at most once. The iteration
continues until no more elements are left in A, at which point A′ is returned as the
output.
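The greedy procedure described above can be sketched in Python as follows; the dictionary representation of the similarity matrix is an assumption.

```python
def naive_descending(M):
    # M: dict mapping (entity1, entity2) -> confidence. Correspondences are
    # accepted greedily in order of decreasing confidence; accepting a pair
    # removes every alternative that shares one of its entities.
    A = sorted(M, key=M.get, reverse=True)     # convertToAlignment + sortDescending
    result = []
    while A:
        e1, e2 = A.pop(0)                      # removeFirstElement
        result.append((e1, e2))
        A = [(u, v) for u, v in A if u != e1 and v != e2]  # drop alternatives
    return result

M = {('a', 'c'): 1.0, ('a', 'd'): 0.8, ('b', 'c'): 0.54, ('b', 'd'): 0.27}
print(naive_descending(M))  # [('a', 'c'), ('b', 'd')]
```

After accepting (a, c) with confidence 1.0, the alternatives (a, d) and (b, c) are discarded, leaving (b, d) as the only remaining candidate.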
Naive ascending algorithm
The naive ascending algorithm functions in a similar way to the naive descending
algorithm. However, instead of accepting correspondences and removing alternatives, it dismisses correspondences if there exist alternatives with higher confidence
values. This means that a correspondence c will be dismissed if an alternative c′
exists with a higher confidence value, even if c′ is not actually part of the output alignment A′. This makes the naive ascending algorithm more restrictive than the naive
descending algorithm, such that Naive-Ascending(M) ⊆ Naive-Descending(M) for
all matrices M (Meilicke and Stuckenschmidt, 2007).¹ This extraction technique
can be found in the CroMatcher system (Gulić and Vrdoljak, 2013).

¹ The original authors specify this relation as a strict sub-set relation (⊂). However, the outputs of both algorithms are
equal when given a square M containing non-zero values only along the diagonal.
Algorithm 3.2 Naive ascending algorithm pseudo-code
 1: Naive-Ascending(M)
 2:   A ← convertToAlignment(M)
 3:   sortAscending(A)
 4:   A′ ← ∅
 5:   for all c ∈ A do
 6:     if getHigherAlternatives(A, c) = ∅ then
 7:       A′ ← A′ ∪ c
 8:     end if
 9:   end for
10:   return A′
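Analogously, the naive ascending algorithm can be sketched as follows. Here getHigherAlternatives is realised as a check for any alternative sharing an entity with the candidate pair that has a strictly higher confidence; the dictionary representation of the matrix is an assumption.

```python
def naive_ascending(M):
    # A correspondence is kept only if NO alternative sharing one of its
    # entities has a strictly higher confidence, regardless of whether that
    # alternative itself ends up in the output alignment.
    result = []
    for (e1, e2), s in M.items():
        higher = any(v > s for (u1, u2), v in M.items()
                     if (u1, u2) != (e1, e2) and (u1 == e1 or u2 == e2))
        if not higher:
            result.append((e1, e2))
    return sorted(result)

M = {('a', 'c'): 1.0, ('a', 'd'): 0.8, ('b', 'c'): 0.54, ('b', 'd'): 0.27}
print(naive_ascending(M))  # [('a', 'c')]
```

On this matrix the output is a strict subset of what the greedy descending variant would accept: (b, d) is dismissed because the alternative (a, d) has a higher confidence, even though (a, d) is itself dismissed.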
3.2 Mapping system survey
In the previous sections we have introduced the components which form an ontology
mapping system and various techniques which can make up the components of such
a system. Given the large pool of techniques and the possibility of combining
techniques or entire systems, it is evident that there is a vast number of ways in
which one can set up a mapping system. In this section we will provide an overview
of a selection of ontology mapping systems. For the sake of brevity, the intention is
not to provide an exhaustive list of all systems, but rather to provide an overview of
state-of-the-art systems with an emphasis on systems that are related to the topic of
this thesis. For a more complete overview of existing ontology mapping systems,
we refer the reader to the works by Euzenat and Shvaiko (2007), Rahm and
Bernstein (2001), Wache et al. (2001), Noy (2004) or the technical reports of the
systems participating in the OAEI competition (Aguirre et al., 2012; Grau et al.,
2013). We will provide an overview of the different properties of each system in
Table 3.1 and a brief introduction to the general mapping strategy of each system.
YAM++
As evidenced by the results of the 2012 Ontology Alignment Evaluation Initiative
(Aguirre et al., 2012), one of the current state-of-the-art ontology mapping systems is
YAM++, developed by Ngo et al. (2012). This system combines machine learning
and information retrieval techniques on the element level and similarity propagation
techniques on the structure level to derive a mapping, to which consistency checking
is applied to further increase the quality.
AML
The AgreementMakerLight (AML) framework (Cruz et al., 2009; Faria et al., 2013)
matches ontologies using a layered approach. In the initial layer, similarity matrices are computed using syntactic and lexical similarities based on, among others, WordNet,
which are then used to create a set of mappings. Their applied lexical
similarity is noteworthy because the most recent version of AML includes a
probabilistic WSD approach based on the assumption that a word can be polysemous
(Cruz et al., 2013). Further iterations in subsequent layers refine the existing mappings using structural properties in order to create new mappings. After a sufficient
number of iterations, multiple computed mappings are selected and combined in
order to form the final mapping.
ASMOV
ASMOV (Jean-Mary, Shironoshita, and Kabuka, 2009) is capable of using general
lexical ontologies, such as WordNet, as well as domain-specific ontologies, for instance UMLS, in its matching procedure. After creating a mapping using a set of
similarity measures, a semantic verification process is performed in order to remove
correspondences that lead to inferences which cannot be verified or are unlikely to
be satisfied given the information present in the ontologies.
Anchor-PROMPT
The PROMPT tool is well known in the field of ontology mapping (Noy
and Musen, 2003). Based on the Protégé environment, the tool supports various
ontology editing tasks, such as ontology creation, merging and versioning. It also
features a user-based mapping process in which it iterates between querying the
user for mapping suggestions and using the user input to refine the mapping and
formulate new queries. A notable extension to PROMPT is referred to as Anchor-PROMPT (Noy and Musen, 2001). This is a mapping algorithm which utilizes
partial alignments. The algorithm traverses all possible paths between two anchors
in both ontologies and records which concept pairs are encountered at the same
time. The intuition here is that concept pairs which are encountered more frequently
during the traversal steps are more likely to denote the same entity.
S-Match
The S-Match suite (Giunchiglia, Shvaiko, and Yatskevich, 2004) is notable
for its semantic mapping approach, being one of the first mapping systems to utilize a semantic mapping technique. In particular, it employs the JSAT SAT solver
in order to derive concept correspondences based on a set of initial correspondences.
These initial correspondences are generated using a set of element-level similarities,
e.g. language- and corpus-based techniques.
PRIOR
One of the first formal definitions of a profile similarity was developed by Mao et al.
(2007), the researchers behind the PRIOR system. Their definition of a concept
profile is limited to the information encoded in its definition; information
of related concepts is thus excluded from that definition. However, related information is
added in a method which they refer to as profile propagation. To improve efficiency
for large mapping tasks, the system can omit the profile propagation step and derive
concept similarities using the basic profiles and information retrieval-based indexation methods (Mao and Peng, 2006).
Falcon-AO
Falcon-AO is another system with an early implementation of profile similarities (Jian et al.,
2005; Hu and Qu, 2008). The authors use the term ‘virtual document’ to describe
the idea of a profile, emphasizing the origin of the approach in the field of information retrieval (Qu et al., 2006). Their profile creation model is noteworthy
because it facilitates parametric weighting of the profile terms according to their
original source.
LogMap
LogMap is an example of a mapping system which generates a set of anchors during
the mapping process (Jiménez-Ruiz and Cuenca Grau, 2011; Jiménez-Ruiz et al.,
2012b). This is done by efficiently comparing labels in a pre-computed index of the
given ontologies. The computed anchors are exploited in a mapping discovery and
repair phase. The system alternates between discovering new correspondences and
discarding unsatisfiable correspondences that have been identified in the previous
step. This ensures that new correspondences are logically sound, such that the resulting alignment is consistent. New correspondences are discovered using the
I-SUB string metric (Stoilos, Stamou, and Kollias, 2005).
AUTOMS
The AUTOMS system (Kotis, Valarakos, and Vouros, 2006a) is noteworthy due to
its implementation of the HCONE-Merge approach (Kotis, Vouros, and Stergiou,
2006b). This approach assumes the existence of an intermediate hidden ontology
and attempts to create it using the information of both given ontologies, effectively
merging the input ontologies. To determine whether two concepts should be merged
into one definition, the system establishes whether a linguistic match and/or a lexical
match exists. For the lexical matcher, concepts are disambiguated using latent semantic indexing (LSI). The system also employs structural information if there is
insufficient syntactic meta-information to proceed with the HCONE-Merge approach.
Anchor-FLOOD
Anchor-FLOOD is a partial alignment-based mapping tool inspired by the Anchor-PROMPT system (Seddiqui and Aono, 2009). However, instead of exploring paths
between anchors, it utilizes the anchors as a starting point for the discovery of new
mappings. In an iterative procedure, the system repeatedly selects a single anchor to
explore. The surrounding concepts of the anchor are analysed using terminological
and structural similarities. Concept pairs which satisfy a threshold criterion are added
to the alignment, and the iterative process is repeated until no more correspondences
can be discovered.
NOM/QOM
NOM (Naive Ontology Mapping) (Ehrig and Sure, 2004) is a heuristic mapping
system, which follows a series of codified rules specifying what meta-information of
each entity type can be used to derive a similarity value. The system supports 17
decision rules which have been specified by domain experts. An example of such
a rule would be R9, stating that two properties are similar if their sub-relations
are similar. QOM (Quick-Ontology-Mapping) (Ehrig and Staab, 2004) is a variation
of the NOM system with an added emphasis on computational efficiency. This
allows the system to tackle large-scale problems. To achieve the intended efficiency
QOM restricts several features of the NOM system. For instance, concept trees are
compared in a top-down approach instead of computing the full pairwise similarities
of the trees.
RiMOM
RiMOM is an example of an ontology mapping system which can automatically
adapt itself to better fit the task at hand (Li et al., 2009). Prior to mapping, it
calculates two factors over both ontologies: the label-similarity factor and the structure-similarity factor.
These factors describe the similarity of the entire given ontologies based on their entity-name overlap and structural overlap. The values of these
measures determine the selection of the applied similarity measures, how these are
tuned, and influence the similarity aggregation step. The resulting similarities are
improved by propagating them through a pairwise-connectivity graph (PCG) using
the Similarity Flooding algorithm (Melnik et al., 2002).
AOT
AOT is a recently developed system that participated in the 2014 edition of the OAEI
competition (Khiat and Benaissa, 2014). It uses a combination of multiple string
similarities and a lexical similarity in order to derive its concept correspondences.
WeSeE
WeSeE is an example of how the internet can be used as background knowledge for the
mapping procedure. The system constructs a document for each concept by querying the search engine Bing, using the terms from the concept labels and comments
as queries (Paulheim, 2012). The search results are then processed, merged and
weighted using TF-IDF weighting, such that the concept similarity is defined
as the similarity between the corresponding documents.
MapSSS
MapSSS is another example of a system utilizing search engines. However, in contrast to WeSeE, it does not assemble a document from the search results. Instead,
queries are formulated using the concept labels and specific keywords, such as ‘translation’ or ‘synonym’. The similarity score is then determined by the number and
quality of the retrieved results (Cheatham, 2011; Cheatham, 2013).
COMA++
COMA++ (Aumueller et al., 2005) is a refined successor of the COMA mapping system
(Do and Rahm, 2002). It stands out from other systems due to its application of alignment re-use and fragment-based matching techniques. The system can
be operated by a user through a GUI and completes the mapping task in user-activated
iterations. Before each iteration, the user can provide feedback in the form of confirming or refuting correspondences of previous iterations. The user
can also decide which alignment paths to exploit for the alignment re-use technique,
or let the system decide automatically based on the criterion of expected effort.
CroMatcher
CroMatcher is a recently developed system consisting mainly of syntactic methods
(Gulić and Vrdoljak, 2013). The general matching strategy is arranged sequentially in two components. The first component evaluates concept pairs using
string, profile, instance and internal-structural similarities. Instead of extracting
an alignment and forwarding it to the second component, the system forwards
the entire similarity matrix. The second component uses the matrix to initialize
several structural similarities, comparing entities with respect to their super-entities,
sub-entities, and property domains or ranges. The alignment is extracted using the
Naive-Ascending algorithm.
WikiMatch
WikiMatch (Hertling and Paulheim, 2012a; Hertling and Paulheim, 2012b) is a noteworthy variant of the WeSeE system. However, instead of exploiting search engines
using heuristically generated queries, it exploits the contents of Wikipedia as an external resource. For every term in a concept label, fragment or comment, the set
of corresponding Wikipedia articles is retrieved. The system defines the similarity
between two concepts as the maximum Jaccard similarity that can be attained by
comparing the Wikipedia article sets of two terms.
Table 3.1: Overview of ontology mapping systems. For each surveyed system (YAM++, AML, ASMOV, Anchor-PROMPT, S-Match, PRIOR, Falcon-AO, LogMap, AUTOMS, Anchor-FLOOD, NOM/QOM, RiMOM, AOT, WeSeE, MapSSS, COMA++, CroMatcher and WikiMatch) the table lists the properties Input, User-interaction, Architecture, Syntactic (Ele.), Syntactic (Stru.), Lex. Resources, Lex. Resources (WSD), Alignment Re-use, Partial Alignments, Rep. of Structures and Semantic (Stru.), together with further remarks.
Chapter 4
Concept-Sense Disambiguation for Lexical Similarities
This chapter is an updated version of the following publications:
1. Schadd, Frederik C. and Roos, Nico (2011a). Improving ontology matchers utilizing linguistic ontologies: an information retrieval approach. Proceedings of the 23rd Belgian-Dutch Conference on Artificial Intelligence (BNAIC 2011), pp. 191−198.
2. Schadd, Frederik C. and Roos, Nico (2011b). MaasMatch results for OAEI 2011. Proceedings of the Sixth International Workshop on Ontology Matching (OM-2011) collocated with the 10th International Semantic Web Conference (ISWC-2011), pp. 171−178.
3. Schadd, Frederik C. and Roos, Nico (2012a). Coupling of WordNet Entries for Ontology Mapping using Virtual Documents. Proceedings of the Seventh International Workshop on Ontology Matching (OM-2012) collocated with the 11th International Semantic Web Conference (ISWC-2012), pp. 25−36.
4. Schadd, Frederik C. and Roos, Nico (2014b). Word-Sense Disambiguation for Ontology Mapping: Concept Disambiguation using Virtual Documents and Information Retrieval Techniques. Journal on Data Semantics, Zimányi, Esteban, Ram, Sudha and Stuckenschmidt, Heiner (eds.), pp. 1−20, Springer.
As seen in Section 3.2, there exist various mapping systems which exploit external resources to derive concept correspondences. Of these, lexical and ontological
resource-based techniques are the most popular. For each entity, these techniques
allocate a set of entries within the lexical or ontological resource. If an entity of a resource is encoded as a set of synonyms instead of a named concept, then
it may also be referred to as a synonym set (synset). Lexical similarities then compare two concepts by evaluating the correspondence between their sets of senses within
the resource. The most basic techniques simply compute the overlap of the two
sets (Ehrig and Sure, 2004; Seddiqui and Aono, 2009). While these techniques can
effectively establish the equivalence between two entities based on their synonym relation within the resource, they provide no differentiation between concept pairs
which are closely related and pairs which are not related at all. Consider, for instance,
two ontologies O1 and O2, where O1 contains the concept Car as its only vehicle-related concept and O2 contains Vehicle as its only transport-related concept. In this
situation, one would be inclined to map Car to Vehicle, as there are no better alternatives for either concept. However, since Car and Vehicle are not synonymous,
their resulting lexical similarity would be 0 when applying a sense-overlap method.
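The limitation of sense overlap can be illustrated with a toy sketch; the synonym sets below are hypothetical stand-ins, not actual WordNet data.

```python
# Toy synonym sets per label (illustrative only, not real WordNet synsets).
SYNSETS = {
    'car':     [{'car', 'auto', 'automobile', 'machine'}],
    'auto':    [{'car', 'auto', 'automobile', 'machine'}],
    'vehicle': [{'vehicle', 'conveyance'}],
}

def sense_overlap(label1, label2):
    # 1 if the two labels share at least one synset member, 0 otherwise.
    s1 = SYNSETS.get(label1.lower(), [])
    s2 = SYNSETS.get(label2.lower(), [])
    return 1.0 if any(a & b for a in s1 for b in s2) else 0.0

print(sense_overlap('Car', 'Auto'))     # 1.0 — synonymous labels
print(sense_overlap('Car', 'Vehicle'))  # 0.0 — related, but not synonyms
```

The binary outcome shows why closely related but non-synonymous pairs such as Car and Vehicle receive the same score as entirely unrelated pairs.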
To be able to establish the lexical similarity between non-synonymous concepts,
it is necessary to apply a more sophisticated evaluation method. Examples of such
methods are the comparison of the sets of leaf-nodes of each sense in the taxonomy,
or their relative distance within the resource. However, an inherent property of
natural language is that words can have multiple senses. This means that for an
ontology concept there may be multiple possible entries in the
exploited external resource. The accuracy of a lexical similarity thus depends on
whether the senses which denote the correct meaning are actually exploited.
This chapter answers the first research question by proposing an information-retrieval-based disambiguation method. We extend existing glossary-overlap-based
methods by using a profile-based method of creating concept descriptions, as introduced in subsection 3.1.2. The proposed profile method gathers terms into a virtual
document, facilitating the inclusion of terms of related concepts and the weighting of terms according to their origin. We evaluate the effects of various sense-selection policies and lexical similarity metrics by analysing the alignment quality
when matching the OAEI benchmark and conference datasets. Further, we evaluate
the effect of different weighting policies and quantify the overhead introduced by the
disambiguation policy.
This chapter is structured as follows. Section 4.1 introduces important background information regarding lexical similarities, word-sense disambiguation techniques and virtual documents. Section 4.2 discusses previous work relating to the
content of this chapter. Section 4.3 introduces the framework in which concept terms
are disambiguated and lexical similarities are established. Section 4.4 presents the
performed experiments and discusses their results, and Section 4.5 gives the conclusion of this
chapter and discusses future research.
4.1 Background
In this section we will provide background knowledge which will be necessary for
the remainder of this chapter. In subsection 4.1.1 we will introduce lexical similarity
measures in more detail. Since the problem posed by research question 1 addresses
the task of disambiguating ontology concepts, we will introduce the reader to the
field of word-sense disambiguation in subsection 4.1.2. Finally, we will introduce
virtual documents in subsection 4.1.3, such that the reader is adequately prepared
for the virtual document-based disambiguation method which will be introduced in
Section 4.3.
4.1.1 Lexical Similarity Measures
Lexical similarity measures (LSMs) are commonly applied metrics in ontology mapping systems. These exploit externally available knowledge bases, which can be
modelled in ontological or non-ontological form, for instance as databases.
Such a knowledge base contains a list of concepts describing the particular domain
that is being modelled. Each concept description contains various kinds of information, such as synonyms and written explanations of that concept. If such a
description contains at least a list of synonyms, it is also often referred to as a
synset (synonym set). Another important feature of a knowledge base is that each
concept is also linked to other concepts using various semantic relations, thus creating a large relational structure. An LSM exploits these large structures by linking
ontology concepts to the nodes in the external knowledge base, such that the proximity of the concepts associated with the source and target concepts provides an indication
of their similarity. One can here distinguish between semantic relatedness and semantic similarity (Strube and Ponzetto, 2006), where semantic relatedness denotes
the proximity measured by exploring all given relations, and semantic similarity
expresses the proximity using only is-a relations. Whether an LSM determines relatedness or similarity depends on the utilized metric which expresses the proximity
(Budanitsky and Hirst, 2001; Giunchiglia and Yatskevich, 2004), since the definitions
of these metrics typically also define which relations are exploited. For this research,
as further detailed in subsection 4.3.2, the applied metric utilizes only is-a relations,
rendering the base LSM which our approach intends to improve a measure of
semantic similarity.
There exist several lexical knowledge bases which can be used as a resource for
a LSM. These originate from different research efforts and were all developed with
different capabilities, which can roughly be grouped as follows:
Global/Cross-Domain Knowledge Resources of this category intend to model
a multitude of domains, such that the similarity between concepts can be
identified even if these are generally categorized in different domains. A prime
example of such a resource is WordNet, which models the domain of the entire
English language into approximately 120,000 interrelated synonym sets (Miller,
1995). This resource is regularly used in contemporary ontology mapping
systems. WordNet is also available as an extended version in the form of
YAGO (Suchanek, Kasneci, and Weikum, 2008), which merges WordNet with
the concept descriptions available from Wikipedia.
Domain Knowledge These resources intend to model common knowledge of a
single specified domain. Typically, these domains are not very broadly defined,
however they are usually modelled in great detail. A collaborative effort for
the creation of a knowledge resource is Freebase, which contains both general
knowledge and named entities and is available in both database and ontological
form (Bollacker et al., 2008). UMLS (Bodenreider, 2004) is a good example of how detailed a domain ontology can become. It is a biomedical resource which models 900,000 concepts using 2 million labels by integrating and interlinking
several existing vocabularies.
Abstract Upper Ontology Resources belonging to this group have the singular
focus of creating an abstract ontology using an upper-level list of concept
descriptions. Such an ontology can then serve as a base for domain specific
resources. An example of such a resource is the SUMO ontology, containing
approximately 2,000 abstract concept descriptions (Niles and Pease, 2001).
These concepts can then be used to model more specific domains. MILO for
instance is an extension of SUMO which includes many mid-level concepts
(Niles and Terry, 2004). Cyc is another example of a multi-layered ontology
based on an abstract upper level-ontology (Matuszek et al., 2006), of which a
subset is freely available under the name OpenCyc (Sicilia et al., 2004).
Multi-lingual When mapping ontologies, it can occur that some concept descriptions are formulated in a different language. In these situations mono-lingual resources are insufficient, necessitating the use of multi-lingual
resources, e.g. UWN (De Melo and Weikum, 2009) or BabelNet (Navigli and
Ponzetto, 2010).
LSMs are powerful metrics and are commonly used in contemporary state-of-the-art ontology mapping systems (Shvaiko and Euzenat, 2005; Kalfoglou and Schorlemmer, 2003; Saruladha, Aghila, and Sathiya, 2011), with WordNet being the most widely used base resource. However, a common occurrence in concepts formulated using natural language is word-sense ambiguity. This entails that a word can have multiple and possibly vastly different meanings, such that one must eliminate all meanings which do not adequately represent the intended meaning of the word. This task, while at a glance quite intuitive for a human, can be deceptively
difficult for a computer program. Given the word house for instance, the intended
meaning might be obvious to a human reader, however this word has 14 different
meanings listed in WordNet, such that an accurate identification of the correct sense
is necessary in order to obtain accurate results. The histogram in Figure 4.1 indicates
the extent of such situations occurring within WordNet (Miller, 1995). Here, all
unique words that occur in WordNet have been gathered and binned according to
how many different meanings each word describes.
One can see from Figure 4.1 that while there is a large number of words with only one meaning, a significant proportion of words have more than one meaning and can hence be ambiguous. The general working hypothesis is that a word in a given context has only a single correct sense. The rejection of this hypothesis, i.e. the acknowledgement of polysemous words, is an emerging field of research (Cruz et al., 2013). Ultimately an LSM has to calculate
the similarity between two sets of senses, where the assumption whether these sets
can contain multiple correct senses may influence the choice of specific employed
techniques, including disambiguation methods. LSMs can incorporate polysemous
concepts by for instance calculating an aggregate similarity between these sets of
senses (Euzenat and Shvaiko, 2007; Cruz et al., 2013; Po and Sorrentino, 2011).
However, if a domain expert determines that the concepts in the ontology are not
polysemous, one can adapt the aggregation step by for instance only utilizing the
maximum pairwise similarity (Euzenat and Shvaiko, 2007) between sets of senses or
by selecting the predominant sense as determined by a given corpus (McCarthy et al.,
2004). The inclusion of a word-sense disambiguation technique in an LSM, which this chapter proposes, is likely to improve its accuracy.
4.1.2 Word-Sense Disambiguation
Word-Sense Disambiguation (WSD) can be described as the automatic identification of the correct sense(s) of a given word using the information in the proximity of
that word as context. While in many works only one sense is associated with each
word, we define WSD as a process which selects a set of possible candidate senses.
The resulting set may contain multiple senses if desired by the expert designing the
system, for instance to accommodate polysemous words. In the classical problem of
disambiguating words occurring in natural language, the available context information is a body of text co-occurring with the target word (Navigli, 2009). Depending
on the input document or the applied approach, this body of context information
can be limited to the sentence in which the target word appears or extended over
the entire input document. The available context information originating from an
ontology is different compared to a natural language document. In an ontology, natural language is a rare occurrence and usually limited to brief concept descriptions
in the form of annotations. Hence, context information must be extracted from the
entire concept description, its associated properties and other related concepts.
[Figure 4.1: Histogram showing the number of words in WordNet (y-axis) that have a specific number of senses (x-axis).]

Originally, WSD has been perceived as a fundamental task in order to perform
machine translation (Weaver, 1955; Locke and Booth, 1955). Here, the establishment of accurate word-senses is a requirement for the selection of correct word
translations from a multi-lingual dictionary. While research into WSD halted for a
decade after its acknowledged hardness (Bar-Hillel, 1960), it has been re-instigated
after Wilkes (1975) tackled this problem using formal semantics in order to achieve
a computer understanding of natural language. For a more comprehensive overview
of the history of WSD we suggest the reader consult the work of Ide and Véronis
(1998).
Many different approaches to WSD have been developed over the past decades.
Due to the prevalence of applied machine-learning techniques, three general categories of approaches have emerged:
Supervised Disambiguation One can formulate WSD as a classification problem. Here, a training set is created by tagging sentences with the correct senses of their contained words. Once the training set has reached a sufficient size, one can use it as the basis for a supervised classification method. Examples of such methods are decision lists, decision trees, naive Bayes classifiers, neural networks, instance-based methods such as the kNN approach, and ensemble methods which combine different classifiers (Montoyo et al., 2005; Navigli, 2009).
Unsupervised Disambiguation These methods have the advantage that they do
not rely on the presence of a manually annotated training set, a situation which
is also referred to as the knowledge acquisition bottleneck (Gale, Church, and
Yarowsky, 1992). However, unsupervised methods share the same intuition as supervised methods, namely that words of the same sense co-occur alongside the same set of words (Pedersen, 2006). They rely on clustering methods where each cluster denotes a different word sense.
Knowledge-based Disambiguation Instead of applying classification techniques,
knowledge-based methods exploit available knowledge resources, such as dictionaries, databases or ontologies, in order to determine the sense of a word
(Mihalcea, 2006). These techniques are related to LSMs in that they often
exploit the same knowledge resources. This group of techniques will be further
discussed in subsection 4.2.1.
For a more comprehensive survey of disambiguation techniques, we suggest the reader consult the excellent survey by Navigli (2009).
While originally conceived for the purpose of machine translation, WSD techniques have been applied in a variety of tasks (Ide and Véronis, 1998). In the field
of information retrieval, one can apply WSD in order to eliminate search results in
which at least some of the query keywords occur, but in a different sense than the
given query (Schütze and Pedersen, 1995). This would lead to a reduction of false
positives and hence increase the performance of the retrieval system.
WSD can also aid in the field of content and thematic analysis (Litkowski, 1997).
Here, the aim is to classify a given text into thematic categories, such as traditional
(e.g. judicial, religious), practical (e.g. business), emotional (e.g. leisure, fiction)
and analytical (e.g. science) texts. Given a corpus of training data, one can create
a profile for each defined category consisting of the distributions of types of words
over a text.
In the field of grammatical analysis WSD is required in order to correctly identify
the grammatical type of ambiguous words (Marshall, 1983). WSD can also aid a
speech synthesis system such that ambiguous words are phoneticised more accurately
(Sproat, Hirschberg, and Yarowsky, 1992). Yarowsky (1994) applied WSD techniques for text processing purposes with the aim of automatically identifying spelling errors.
4.1.3 Virtual Documents
The general definition of a virtual document (VD) (Watters, 1999) is any document
for which no persistent state exists, such that some or all instances of the given document are generated at run-time. These stem from an emerging need for documents
to be more interactive and individualized, which is most prominently seen on the
internet. A simple form of a virtual document is a document template. While some of its content is static, the remainder needs to be filled in from a static information source, for instance a database. This is commonly applied by government agencies which send automated letters, in which relevant data such as the recipient's information is added to the letter template. Composite documents are a different type of virtual document. Here, the content of several documents is combined and presented to the user as a single document. A virtual document
can also contain meta-data that has been collected from various documents. Commonly, this can be seen in review-aggregation websites, such as Metacritic, IMDb
and RottenTomatoes, which automatically query many independent review sources
and aggregate their results into an overall consensus.
In the domain of lexical similarity metrics the basic data structure used for the
creation of a virtual document is a linked-data model. It consists of different types of
binary relations that relate concepts, i.e. a graph. RDF (Lassila, Swick, and W3C,
1998) is an example of a linked-data model, which can be used to denote an ontology
according to the OWL specification (McGuinness and Van Harmelen, 2004). The
inherent data model of a thesaurus such as WordNet has similar capabilities; however, it stores its data in a database. A key feature of a linked-data model is that it
not only allows the extraction of literal data for a given concept, but also enables
the exploration of concepts that are related to that particular concept, such that
the information of these neighbouring concepts can then be included in the virtual
document. From the linked-data resource, information is gathered and stored in a
document with the intention that the content of that document can be interpreted
as a semantic representation of the meaning of a specific ontology concept, which in
turn can be exploited for the purpose of ontology mapping. A specific model for the
creation of such a virtual document will be presented in subsection 4.3.3.
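As a first illustration, a minimal virtual document can be assembled from a set of RDF-style triples: the concept's own literals are collected, and the literals of neighbouring concepts are added with a lower weight. This is a simplified sketch; the triple data, the weight of 0.5 and the helper names are hypothetical, and the actual model is the one presented in subsection 4.3.3.

```python
from collections import Counter

# Hypothetical triple store: (subject, predicate, object) statements.
TRIPLES = [
    ("ex:Paper",  "rdfs:label",   "paper"),
    ("ex:Paper",  "rdfs:comment", "a scholarly article"),
    ("ex:Paper",  "ex:hasAuthor", "ex:Author"),
    ("ex:Author", "rdfs:label",   "author"),
]

def literals_of(concept):
    """Collect the literal values (labels, comments) attached to a concept."""
    return [o for s, p, o in TRIPLES if s == concept and not o.startswith("ex:")]

def neighbours_of(concept):
    """Concepts reachable from the given concept via one link."""
    return [o for s, p, o in TRIPLES if s == concept and o.startswith("ex:")]

def create_vd(concept, neighbour_weight=0.5):
    """Build a weighted word collection from a concept's own literals and,
    with a lower (hypothetical) weight, the literals of its neighbours."""
    doc = Counter()
    for lit in literals_of(concept):
        for word in lit.lower().split():
            doc[word] += 1.0
    for nb in neighbours_of(concept):
        for lit in literals_of(nb):
            for word in lit.lower().split():
                doc[word] += neighbour_weight
    return doc

print(create_vd("ex:Paper"))
```

The resulting bag of words contains "paper", "scholarly" and "article" with full weight and "author" with the reduced neighbour weight, illustrating how neighbouring concepts enrich the document.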
4.2 Related Work

4.2.1 Methods of Word-Sense Disambiguation
There exists a notable spectrum of word-sense disambiguation techniques, which
have been used for varying purposes, however certain techniques stand out due to
their applicability to this domain. The method of context clustering (Schütze, 1992)
can be used to exploit large amounts of labelled training data. Here, co-occurrences
with a target word are modelled as a vector in a word space and grouped into clusters
according to their labelled word-sense. Given a new occurrence of the given word,
one can identify its sense by modelling a new context vector from its neighbourhood and classifying it using the created word-sense clusters. This can be done for instance by determining the centroid vector of each cluster and computing the vector distance to each centroid vector.
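This centroid-based classification step can be sketched as follows. The sense labels and toy vectors are hypothetical stand-ins for a word space learned from labelled data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify(context_vector, sense_clusters):
    """Assign the sense whose cluster centroid is closest by cosine similarity."""
    centroids = {sense: centroid(vs) for sense, vs in sense_clusters.items()}
    return max(centroids, key=lambda s: cosine(context_vector, centroids[s]))

# Toy word-space clusters for two senses of "bank".
clusters = {
    "bank/finance": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "bank/river":   [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],
}
print(classify([0.8, 0.1, 0.05], clusters))  # closer to the finance centroid
```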
A more linguistic approach can be achieved through the application of selectional
preferences (Hindle and Rooth, 1993). By determining the grammatical types of words within a sentence, one can limit the number of possible senses by imposing limitations according to the grammatical or semantic context of a particular word. Such a technique can be especially relevant for the mapping of ontology properties, since property names or labels can contain combinations of grammatical types, e.g. nouns, verbs or adjectives, whose proper classification can improve their semantic annotations.
A very effective group of disambiguation methods are those based on glossary overlap, which are knowledge-based methods. These rely on the presence of a detailed corpus of word senses that includes their descriptions in natural language. Determining the overlap between the set of words occurring in the context of a target word and the different sense descriptions of that word within the given corpus can be used to determine its proper sense. This type of method was pioneered by Lesk (1986) and can be improved by incorporating the descriptions of words that are related to the different possible senses (Banerjee and Pedersen, 2003).
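The basic Lesk idea can be sketched in a few lines. The gloss inventory below is a hypothetical, abridged stand-in for a real dictionary.

```python
def lesk(context_words, sense_glosses):
    """Pick the sense whose gloss shares the most words with the context."""
    context = set(w.lower() for w in context_words)
    def overlap(gloss):
        return len(context & set(gloss.lower().split()))
    return max(sense_glosses, key=lambda s: overlap(sense_glosses[s]))

# Toy glosses for two senses of "bank" (hypothetical abridgements).
glosses = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river":   "sloping land beside a body of water",
}
print(lesk(["deposits", "money", "account"], glosses))  # "bank/finance"
```

The extended variant of Banerjee and Pedersen (2003) would additionally merge the glosses of related senses into each candidate's gloss before computing the overlap.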
Cross-lingual word-sense disambiguation is another knowledge-based approach
which exploits multilingual corpora (Resnik and Yarowsky, 1999). A target word is
translated into several distinct languages such that the intended sense is likely the
one whose meaning has been preserved for the majority of the used languages.
Structural methods (Pedersen, Banerjee, and Patwardhan, 2005) exploit the concept structure of a given corpus. This is achieved by applying a similarity metric between word senses, such that the disambiguated sense of a word from a text is the particular sense which maximizes the aggregate similarity between itself and the possible senses of the other words occurring in the text.
Budanitsky and Hirst (2001) evaluated five different sense-similarity measures
which serve as the basis for structural disambiguation methods; however, these are also applicable to lexical similarities between ontology concepts. For a more in-depth survey of word-sense disambiguation methods, especially of the types which do not strongly relate to the techniques applied in this research, we suggest the reader consult the comprehensive survey by Navigli (2009).
4.2.2 Word-Sense Disambiguation in Ontology Mapping
Despite the large set of possible techniques originating from many different research areas that can be applied to the process of ontology mapping, only limited research has been performed on applying word-sense disambiguation techniques. Some of
this research involves the creation of annotation frameworks, which can facilitate a
standardized format of lexical concept annotations and can even provide a more fine-grained annotation. An example of such a framework is the work of Buitelaar et al.
(2009), who proposed a linguistic labelling system for the annotation of ontology
concepts. While the primary intent of this system was the facilitation of ontology
learning and natural language generation from ontologies, the linguistic meta-information of this system can also be used to disambiguate word-senses, for instance by
extracting selectional preferences generated from these annotations.
McCrae, Spohr, and Cimiano (2011) proposed a common model for linking different lexical resources to ontology concepts. This model not only includes constructs
modelling terms and their senses, but also the morphosyntactic properties of terms
which allows for a more fine-grained annotation of ontology concepts.
Some ontology mapping systems apply WSD to aid their lexical similarity measures. The AUTOMS system (Kotis et al., 2006a), which is designed for the task of
ontology merging, employs a technique called HCONE-Merge (Kotis et al., 2006b).
Part of this technique involves the process of latent semantic indexing (LSI), which
is used to associate senses with the given ontology concepts. The approach assumes
that concepts are monosemous. Ontology concepts are associated with the sense
which resulted in the highest score when querying a latent semantic space using a
binary query. This space is created by performing singular value decomposition on
the sense descriptions.
Po and Sorrentino (2011) introduced a probabilistic WSD method which has been
included in the AgreementMaker system (Cruz et al., 2013). Here, each ontology
concept is annotated with a set of possible senses, where each sense is annotated with
a probability value. This probability value is determined by combining the results
of several WSD techniques, i.e. structural disambiguation, domain disambiguation
and first-sense heuristic, using the Dempster-Shafer Theory. This method is related
to our work due to its application of the basic Lesk method as one of the different
WSD techniques. The approach of our paper also relies on the principle behind the
Lesk method, such that substituting our approach for the basic Lesk method could
improve the WSD accuracy of the AgreementMaker system.
A search-engine-based disambiguation method has been proposed by Maree and Belkhatir (2014). They compute the distance between a concept and a synset as the normalized retrieval distance (NRD). This measure utilizes the separate search-engine retrieval rates of two entities and their co-occurrence rate to compute a normalized co-occurrence distance. A synset s is only associated with a concept c if their NRD value NRD(s, c) satisfies a manually specified threshold.
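The exact formulation is given by Maree and Belkhatir (2014); a distance of this family can be sketched along the lines of the normalized Google distance, using hypothetical hit counts. This is explicitly not their NRD formula, only an illustration of the underlying idea that frequent co-occurrence yields a small distance.

```python
import math

def nrd_like(hits_s, hits_c, hits_both, total_docs):
    """Normalized co-occurrence distance in the style of the normalized
    Google distance; NOT the exact NRD formula of Maree and Belkhatir (2014)."""
    f_s, f_c, f_sc = math.log(hits_s), math.log(hits_c), math.log(hits_both)
    n = math.log(total_docs)
    return (max(f_s, f_c) - f_sc) / (n - min(f_s, f_c))

# Hypothetical retrieval counts: frequently co-occurring terms yield a
# small distance, rarely co-occurring ones a large distance.
close = nrd_like(10_000, 8_000, 6_000, 10**9)
far = nrd_like(10_000, 8_000, 5, 10**9)
print(close, far)
```

A threshold on such a distance then decides whether the synset is associated with the concept.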
4.3 Concept Sense Disambiguation Framework
Our proposed approach aims at improving matchers applying lexical similarity metrics. For this research, the applied lexical similarity measure will use WordNet as its knowledge resource. The synsets of WordNet will be used to annotate the meanings
of ontology concepts and express their semantic relatedness.
The goal of our approach is to automatically identify the correct senses for each
concept of an ontology. This will be realized by applying information retrieval techniques on virtual documents that have been created using either ontology concepts or
word sense entries from the knowledge resource. To achieve this, we need to define a
lexical similarity in which we can integrate a disambiguation procedure. First, let us
define the sets E1 and E2 , originating from the ontologies O1 and O2 respectively, as
the sets of entities that need to be matched. These can be classes, properties and/or
instances, though these three categories of entities are typically matched separately.
Also, these may represent the complete sets of classes/properties within the ontologies or just subsets in case a partitioning method has been applied. Next, let us
denote e as an entity and S(e) as the set of senses representing e. Furthermore let
us denote lsm as a lexical similarity that can be invoked after the disambiguation
procedure and φ as a disambiguation policy. We define our lexical similarity as
specified in Algorithm 4.1.¹
The initial step of the approach, denoted as the method findSynsetCandidates,
entails the allocation of synsets that might denote the meaning of a concept. The
name of the concept, meaning the fragment of its URI, and alternate labels, when
provided, are used for this purpose. While ideally one would prefer synsets which
contain an exact match of the concept name or label, precautions must be taken for the eventuality that no exact match can be found. For this research, several
pre-processing methods have been applied such as the removal of special characters,
stop-word removal and tokenization. It is possible to enhance these precautions further, for instance through the application of advanced natural language techniques,
however the investigation of such techniques in this context is beyond the scope of
this research. When faced with ontologies that do not contain concept names using natural language, for instance by using numeric identifiers instead, and that contain no labels, it is unlikely that any pre-processing technique will be able to reliably identify
possible synsets, in which case a lexical similarity is ill-suited for that particular
matching problem.
In the second step, the virtual document model as described in subsection 4.3.3 is applied to each ontology concept and to each synset that has been gathered in the previous step. This procedure is denoted as createVD in the algorithm. The resulting virtual documents are represented using the well-known vector-space model (Salton, Wong, and Yang, 1975). In order to compute the similarities between the synset documents and the concept documents, the established cosine similarity is applied (Pang-Ning, Steinbach, and Kumar, 2005). Using the specified filtering policy φ, those synsets are discarded whose document's cosine similarity to the concept document does not satisfy the criteria specified by φ. This process is denoted as
discardDocuments in the pseudo-code. The different criteria which φ can represent
¹ E1[i] denotes the i-th element of E1.
Algorithm 4.1 Lexical similarity with disambiguation pseudo-code

Lexical-similarity-WSD(E1, E2, φ, lsm)
  for all e ∈ E1 ∪ E2 do
    S(e) ← findSynsetCandidates(e)
    doc(e) ← createVD(e)
    D ← ∅
    for all s ∈ S(e) do
      doc(s) ← createVD(s)
      D ← D ∪ {doc(s)}
    end for
    D ← discardDocuments(D, doc(e), φ)
    for all s ∈ S(e) do
      if doc(s) ∉ D then
        remove(s, S(e))
      end if
    end for
    assignSynsets(e, S(e))
  end for
  M ← initMatrix(|E1|, |E2|)
  for i = 1 to |E1| do
    for j = 1 to |E2| do
      e1 ← E1[i]
      e2 ← E2[j]
      M[i, j] ← lsm(e1, e2)
    end for
  end for
  return M
will be introduced in the following subsection.
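The filtering step can be sketched as follows, assuming simple whitespace-tokenized virtual documents and a threshold-based policy φ. The helper names mirror those in Algorithm 4.1, but the implementations shown are illustrative toy versions, not the actual document model.

```python
import math
from collections import Counter

def cosine(d1, d2):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(d1[w] * d2[w] for w in d1)
    n1 = math.sqrt(sum(v * v for v in d1.values()))
    n2 = math.sqrt(sum(v * v for v in d2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def create_vd(text):
    """Toy stand-in for createVD: a bag of words from a textual description."""
    return Counter(text.lower().split())

def discard_documents(sense_docs, concept_doc, phi):
    """Keep only the sense documents whose similarity satisfies policy phi."""
    sims = {s: cosine(doc, concept_doc) for s, doc in sense_docs.items()}
    return {s: sense_docs[s] for s in sense_docs if phi(sims[s], sims.values())}

concept_doc = create_vd("financial institution that accepts deposits")
sense_docs = {
    "bank/finance": create_vd("institution for deposits and loans"),
    "bank/river":   create_vd("land beside a river"),
}
# Policy: keep senses scoring at least the arithmetic mean (A-MEAN-like).
phi = lambda sim, all_sims: sim >= sum(all_sims) / len(list(all_sims))
kept = discard_documents(sense_docs, concept_doc, phi)
print(sorted(kept))  # ['bank/finance']
```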
In the following subsections we will elaborate on important components of Algorithm 4.1 in more detail. In subsection 4.3.1 we will detail the disambiguation policies which utilize the resulting document similarities, denoted as discardDocuments in
Algorithm 4.1. Subsection 4.3.2 details the task of the lexical similarity function
lsm and the tested variations. In subsection 4.3.3 we will introduce the utilized
document model and finally in subsection 4.3.4 we will briefly introduce the TF-IDF
weighting method that can be applied to the documents.
4.3.1 Concept Disambiguation
Once the similarities between the entity document and the different synset documents are known, a selection method φ is applied in order to disambiguate the
meaning of the given concept. Here, senses are only coupled to the concept if they
resulted in a sufficiently high document similarity, while the remaining senses are
discarded. To determine which similarity score can be considered sufficiently high,
a selection policy needs to be applied. It is possible to tackle this problem from
various angles, ranging from very lenient methods, discarding only the very worst
synsets, to strict methods, associating only the highest scoring synset with the given
concept (Banek, Vrdoljak, and Tjoa, 2008). Several selection methods have been
investigated, such that both strict and lenient methods are tested:
G-MEAN The most lenient method aggregates the document similarities using
the geometric mean and uses this as a threshold to discard senses with a lower
similarity value.
A-MEAN Similar to the previous method, however the arithmetic mean is used as
a threshold instead.
M-STD This more strict method dynamically determines a threshold by subtracting the standard deviation of the document similarities from the highest obtained similarity. It has the interesting property that it is more strict when there is a subset of documents that is significantly more similar than the remaining documents, indicating a strong sense correspondence, and more lenient when it is not as easy to identify the correct correspondences.
MAX The most strict method consists of dismissing all senses from the candidate set except for the single sense that resulted in the highest document similarity.
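The four policies above can be sketched as threshold functions over the set of document similarities. This is an illustrative implementation; it assumes strictly positive similarity values for the geometric mean and uses the population standard deviation for M-STD.

```python
import math
import statistics

def g_mean_keep(sims):
    """G-MEAN: keep senses scoring at least the geometric mean
    (assumes strictly positive similarity values)."""
    t = math.exp(sum(math.log(s) for s in sims) / len(sims))
    return [s for s in sims if s >= t]

def a_mean_keep(sims):
    """A-MEAN: keep senses scoring at least the arithmetic mean."""
    t = statistics.mean(sims)
    return [s for s in sims if s >= t]

def m_std_keep(sims):
    """M-STD: keep senses within one standard deviation of the best score."""
    t = max(sims) - statistics.pstdev(sims)
    return [s for s in sims if s >= t]

def max_keep(sims):
    """MAX: keep only the single highest-scoring sense."""
    return [max(sims)]

sims = [0.9, 0.85, 0.3, 0.1]
print(m_std_keep(sims))  # [0.9, 0.85]
```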
Once all concepts of both input ontologies are disambiguated, one can compute
the lexical similarity between concepts using the processed synset collections.
4.3.2 Lexical Similarity Metric
After selecting the most appropriate synsets using the document similarities, the
similarity between two entities can now be computed using their assigned synsets.
This presents the problem of determining the similarity between two sets of synsets.
To approach this task, we will evaluate three different methods of determining the
lexical similarity between two collections of synsets.
A reasonable assumption is that each collection of synsets only contains one
synset that represents the true meaning of its corresponding entity. Thus, if one
were to compare two sets of synsets, assuming that the originating entities are semantically related, then one can expect that the resulting similarity between the two synsets that both represent the true meaning of their corresponding entities should be high. Inspecting all pairwise similarities between all combinations of synsets from both sets should yield at least one high similarity value.
When comparing two sets originating from semantically unrelated entities, one can
assume that there should be no pairwise similarity of high value present. Thus, in
this scenario a reasonable way of computing the similarity of two sets of synsets is
to compute the maximum similarity over all pairwise combination between the two
sets. This intuition is similar to the principle of Maximum Relatedness Disambiguation (Pedersen et al., 2005) in the sense that two concepts can be considered similar
if a certain amount of their concept information can be considered similar by some
measure.
Formally, given two concepts x and y, their corresponding collections of synsets
S(x) and S(y), a measure of semantic similarity sim(m, n) ∈ [0, 1] where m and n
are two arbitrary synsets, we define the first lexical similarity lsm1 between x and
y as:
$$lsm_1(x, y) = \max_{m \in S(x),\; n \in S(y)} sim(m, n) \qquad (4.1)$$
The work by Gao, Zhang, and Chen (2015) serves as an example of how lsm1 can be used to compute the similarity between sets of senses. A potential weakness of lsm1 is the eventuality where a concept has several appropriate senses. When comparing these senses to other collections, one might prefer a method which values the quantity of high similarities as well. For example, assume that the sense collections S(x), S(y) and S(z) each contain two senses and that we wish to determine whether S(y) or S(z) is a more appropriate match for S(x). Further, assume that each pairwise similarity between S(x) and S(y) results in the value ψ, whereas only one pair of senses from S(x) × S(z) results in the similarity ψ, with the remaining pairs being unrelated and resulting in a similarity of 0. Computing lsm1 (x, y) and lsm1 (x, z) would both result in the value ψ. In this example however, one would be more inclined to match x with y since the comparison with y resulted in more high similarity values. A way
to adapt for this situation is to determine the best target sense for each sense in
both collections and to aggregate these values, which we will denote as lsm2 . Given
two concepts x and y, their corresponding collections of synsets S(x) and S(y), a
measure of semantic similarity sim(m, n) ∈ [0, 1] where m and n are two arbitrary
synsets, we define lsm2 as follows:
$$lsm_2(x, y) = \frac{\sum_{m \in S(x)} \max_{n \in S(y)} sim(m, n) + \sum_{n \in S(y)} \max_{m \in S(x)} sim(n, m)}{|S(x)| + |S(y)|} \qquad (4.2)$$
A more general approach to determine the similarity between two collections
of senses is to aggregate all pairwise similarities between the two collections. This
has the potential benefit that similarity values which have no effect on the result of lsm1 or lsm2 still influence the outcome of the lexical similarity measure. We
will denote this measure as lsm3 . Formally, given two concepts x and y, their
corresponding collections of synsets S(x) and S(y), a measure of semantic similarity
sim(m, n) ∈ [0, 1], we define lsm3 as follows:
$$lsm_3(x, y) = \frac{\sum_{m \in S(x)} \sum_{n \in S(y)} sim(m, n)}{|S(x)| \times |S(y)|} \qquad (4.3)$$
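Under these definitions, the three measures can be sketched directly. The synsets are represented abstractly by identifiers, and the similarity table reproduces the ψ-example from the text with ψ = 0.8; any pairwise synset similarity sim in [0, 1] could be substituted.

```python
def lsm1(Sx, Sy, sim):
    """Maximum pairwise similarity between the two sense collections (Eq. 4.1)."""
    return max(sim(m, n) for m in Sx for n in Sy)

def lsm2(Sx, Sy, sim):
    """Best-match aggregation in both directions (Eq. 4.2)."""
    forward = sum(max(sim(m, n) for n in Sy) for m in Sx)
    backward = sum(max(sim(n, m) for m in Sx) for n in Sy)
    return (forward + backward) / (len(Sx) + len(Sy))

def lsm3(Sx, Sy, sim):
    """Average over all pairwise similarities (Eq. 4.3)."""
    return sum(sim(m, n) for m in Sx for n in Sy) / (len(Sx) * len(Sy))

# Toy similarity table reproducing the psi-example from the text (psi = 0.8):
psi = 0.8
table = {("x1", "y1"): psi, ("x1", "y2"): psi, ("x2", "y1"): psi, ("x2", "y2"): psi,
         ("x1", "z1"): psi, ("x1", "z2"): 0.0, ("x2", "z1"): 0.0, ("x2", "z2"): 0.0}
sim = lambda m, n: table.get((m, n), table.get((n, m), 0.0))

# lsm1 cannot distinguish y from z, while lsm2 prefers y:
print(lsm1(["x1", "x2"], ["y1", "y2"], sim), lsm1(["x1", "x2"], ["z1", "z2"], sim))
print(lsm2(["x1", "x2"], ["y1", "y2"], sim), lsm2(["x1", "x2"], ["z1", "z2"], sim))
```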
There exist various ways to compute the semantic similarity sim within WordNet (Budanitsky and Hirst, 2001) that can be applied; however, finding the optimal measure is beyond the scope of this research since it is not a component of the disambiguation process. Here, a similarity measure with similar properties to the Leacock-Chodorow similarity (Budanitsky and Hirst, 2001) has been applied. The similarity sim(s1, s2) of two synsets s1 and s2 is computed using the distance function dist(s1, s2), which determines the distance of two synsets inside the taxonomy, and the overall depth D of the taxonomy:
$$sim(s_1, s_2) = \begin{cases} \dfrac{D - dist(s_1, s_2)}{D} & \text{if } dist(s_1, s_2) \leq D \\ 0 & \text{otherwise} \end{cases} \qquad (4.4)$$
This measure is similar to the Leacock-Chodorow similarity in that it relates the
taxonomic distance of two synsets to the depth of the taxonomy. In order to ensure
that the resulting similarity values fall within the interval of [0, 1] and thus can be
integrated into larger mapping systems, the log-scaling has been omitted in favor of
a linear scale.
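A sketch of this linearly scaled variant, assuming the caller supplies the taxonomic distance and the overall depth (a WordNet-like depth of 16 appears below purely for illustration):

```python
def taxonomic_sim(dist: int, depth: int) -> float:
    """Eq. 4.4: linearly scaled variant of the Leacock-Chodorow
    similarity. dist is the taxonomy distance between two synsets,
    depth is the overall depth D of the taxonomy."""
    if dist <= depth:
        return (depth - dist) / depth
    return 0.0
```

For example, `taxonomic_sim(4, 16)` yields 0.75, while synsets that are further apart than the taxonomy is deep receive a similarity of 0.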
4.3.3 Applied Document Model
We will provide a generalized description of the creation of a virtual document
based on established research (Qu et al., 2006). The generalization has the purpose
of providing a description that is not only applicable to an OWL/RDF ontology like
the description given in the work by Qu et al. (2006), but also to non-ontological
knowledge sources. While a variety of external resources can be utilized, for this
research we will use the most widely utilized resource, which is WordNet (Miller,
1995). To provide the functions that are used to create a virtual document, the
following terminology is used:
Synset: Basic element within a knowledge source, used to denote a specific sense
using a list of synonyms. Synsets are related to other synsets by different
semantic relations, such as hyponymy and holonymy.
Concept: A named entity in the linked-data model. A concept denotes a named class or property in the case of an ontology, and a synset when referring to WordNet.
Link: A basic component of a linked-data model for relating elements. A link is
directed, originating from a source and pointing towards a target, such that
the type of the link indicates what relation holds between the two elements.
An example of a link is a triplet in an RDF graph.
sou(s), type(s), tar(s): The source element, type and target element of a link s,
respectively. Within the RDF model, these three elements of a link are also
known as the subject, predicate and object of a triplet.
Collection of words: A list of unique words where each word has a corresponding
weight in the form of a rational number.
+: Operator denoting the merging of two collections of words.
×: Operator denoting multiplication.
A concept definition within a linked-data model contains different types of literal data, such as a name, different labels, annotations and comments. The RDF model expresses some of these values using the rdfs:label and rdfs:comment relations. Concept descriptions in WordNet have similar capacities, but the labels of a concept are referred to as its synonyms and the comments of a concept are linked via the glossary relation.
Definition Let ω be a concept of a linked-data model, the description of ω is a collection of words defined by (4.5):

\[
\begin{aligned}
Des(\omega) = {} & \alpha_1 \times \text{collection of words in the name of } \omega \\
{} + {} & \alpha_2 \times \text{collection of words in the labels of } \omega \\
{} + {} & \alpha_3 \times \text{collection of words in the comments of } \omega \\
{} + {} & \alpha_4 \times \text{collection of words in the annotations of } \omega
\end{aligned}
\tag{4.5}
\]
Where each α1 , α2 , α3 and α4 is a rational number in [0, 1], such that words can
be weighed according to their origin.
Next to accumulating information that is directly related to a specific concept,
one can also include the descriptions of neighbouring concepts that are associated
with that concept via a link. Such a link can be a standard relation that is defined in
the linked-data model, for instance the specialization relation, but also an ontology-defined property if the used syntax allows the property to occur as a predicate.
While theoretically the presented model would also allow instances to be included if
these are present in the ontology, it is very unlikely that a given knowledge resource
contains similar specific instance information for which an overlap can be determined.
Hence, given instances are filtered from the ontologies before the creation of the
documents. The OWL language supports the inclusion of blank-node concepts, which allow complex logical expressions to be included in concept definitions. However, since not all knowledge resources, among them WordNet, support this blank-node functionality, meaning anonymous concepts defined using a property restriction, blank nodes are omitted in our generalization. For more information on how to include
blank nodes in the description, consult the work by Qu et al. (2006).
To explore neighbouring concepts, three neighbour operations are defined. SON (ω)
denotes the set of concepts that occur in any link for which ω is the source of that
link. Likewise TYN (ω) denotes the set of concepts that occur in any link for which
ω is the type, or predicate, of that link and TAN (ω) denotes the set of concepts
that occur in any link for which ω is the target. WordNet contains inverse relations,
such as hypernym being the inverse of the hyponym relation. When faced with two
relations with one being the inverse of the other, only one of the two should be used
such that descriptions of neighbours are not included twice in the virtual document.
The formal definition of the neighbour operators is given below.
Definition Let ω be a named concept and s be a variable representing
an arbitrary link. The set of source neighbours SON (ω) is defined by
(4.6), the set of type neighbours TYN (ω) of ω is defined by (4.7) and
the set of target neighbours TAN (ω) of ω is defined by (4.8).
\[
SON(\omega) = \bigcup_{sou(s) = \omega} \{type(s), tar(s)\}
\tag{4.6}
\]
\[
TYN(\omega) = \bigcup_{type(s) = \omega} \{sou(s), tar(s)\}
\tag{4.7}
\]
\[
TAN(\omega) = \bigcup_{tar(s) = \omega} \{sou(s), type(s)\}
\tag{4.8}
\]
Given the previous definitions, the definition of a virtual document of a specific
concept can be formulated as follows.
Definition Let ω be a concept of a linked-data model. The virtual
document of ω, denoted as V D(ω), is defined by (4.9):
\[
VD(\omega) = Des(\omega)
+ \beta_1 \times \sum_{\omega' \in SON(\omega)} Des(\omega')
+ \beta_2 \times \sum_{\omega' \in TYN(\omega)} Des(\omega')
+ \beta_3 \times \sum_{\omega' \in TAN(\omega)} Des(\omega')
\tag{4.9}
\]
Here, β1 , β2 and β3 are rational numbers in [0, 1]. This makes it possible to
allocate a different weight to the descriptions of neighbouring concepts of ω compared
to the description of the concept ω itself.
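The weighted collections of words and the + and × operators map naturally onto term-to-weight dictionaries. The sketch below is an illustrative reading of equations (4.5) and (4.9), not the thesis implementation: concepts are plain dictionaries of word lists, and the neighbour sets SON, TYN and TAN are assumed to be precomputed and passed in.

```python
from collections import Counter

def scale(words: Counter, w: float) -> Counter:
    """The x operator: multiply every term weight by w."""
    return Counter({t: c * w for t, c in words.items()})

def merge(*collections: Counter) -> Counter:
    """The + operator: term-wise sum of weights."""
    out: Counter = Counter()
    for coll in collections:
        for t, w in coll.items():
            out[t] += w
    return out

def description(concept: dict, alpha: list) -> Counter:
    """Eq. 4.5: weighted words from name, labels, comments, annotations."""
    parts = (concept["name"], concept["labels"],
             concept["comments"], concept["annotations"])
    return merge(*(scale(Counter(p), a) for p, a in zip(parts, alpha)))

def virtual_document(concept: dict, alpha: list, beta: list,
                     son: list, tyn: list, tan: list) -> Counter:
    """Eq. 4.9: own description plus weighted neighbour descriptions."""
    vd = description(concept, alpha)
    for b, neighbours in zip(beta, (son, tyn, tan)):
        for n in neighbours:
            vd = merge(vd, scale(description(n, alpha), b))
    return vd
```

With concept dictionaries mirroring the example ontology of Figure 4.2, the weight of the term vehicle in the virtual document of Car evaluates to α3 + β1 × α1 + β3 × α3, in line with Table 4.1.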
We will provide a brief example for the resulting term weights in a virtual document that is created using this model. For this we will use an example ontology
provided in Figure 4.2.
Suppose one would want to construct a virtual document representing the concept
Car. The term weights of this document are determined through the merger of the
description of the concept Car and the weighted descriptions of the concepts Vehicle
and Ambulance. The term weight of the word car would be α1 , since the term only
occurs in the name of the concept Car. The term vehicle would receive the weight
α3 + β1 × α1 + β3 × α3 . This is because the term occurs in three locations in the
neighbourhood of the concept Car, once in a comment of the given concept, once in
[Figure 4.2: an example ontology with three concepts. Vehicle has the rdfs:comment "A conveyance that transports people or objects". Car is an rdfs:subClassOf Vehicle, carries the rdfs:label entries Auto and Automobile and the rdfs:comment "A motor vehicle with four wheels". Ambulance is an rdfs:subClassOf Car and carries the rdfs:comment "A vehicle that takes people to and from hospitals".]

Figure 4.2: Example ontology for the construction of a virtual document.
the name of a source neighbour and once in a comment of a target neighbour. The
sum of these particular occurrences hence forms the final term weight for this word.
The full list of term weights of the document representing the example concept Car
can be viewed in Table 4.1. For the sake of demonstration the list also includes the
weights of stop-words.
Term          Weight
a             α3 + β1 × α3 + β3 × α3
ambulance     β3 × α1
auto          α2
automobile    α2
car           α1
conveyance    β1 × α3
four          α3
from          β3 × α3
hospitals     β3 × α3
motor         α3
objects       β1 × α3
people        β1 × α3 + β3 × α3
takes         β3 × α3
that          β1 × α3 + β3 × α3
transports    β1 × α3
to            β3 × α3
vehicle       α3 + β1 × α1 + β3 × α3
wheels        α3
Table 4.1: Term weights for the document representing the concept Car, according
to the example ontology displayed in Figure 4.2.
4.3.4 Term-Frequency Weighting
Instead of weighting terms in a virtual document according to their origin from
within their respective ontology, it is possible to treat a virtual document as a standard natural language document once all of its dynamic content has been determined.
This allows for the application of well-known weighting techniques originating from
the field of information retrieval.
Information retrieval techniques have been applied in a variety of fields. The most
prominent application is the retrieval of relevant documents from a repository, as
seen in commercial search engines (Croft, Metzler, and Strohman, 2009). Document
vectors can be weighed using different methods (Salton and Buckley, 1988), of which
the most prominent method is the application of TF-IDF weights (Sparck Jones,
1972). This method relates the term frequency (TF) of a word within a document
with the inverse document frequency (IDF), which expresses in how many of the
registered documents a term occurs. Given a collection of documents D and an
arbitrary term t, the inverse document frequency of term t is computed as follows:
\[
idf(t, D) = \log \frac{|D|}{|\{d \in D : t \in d\}|}
\tag{4.10}
\]
Given the term frequency of the term t within document dx as tf (t, dx ), the
TF-IDF weight of the term t within document dx is then specified as follows.
\[
tf\text{-}idf(t, d_x, D) = tf(t, d_x) \times idf(t, D)
\tag{4.11}
\]
This weighting scheme assigns higher weights to terms that occur more frequently in a document dx; however, this effect is diminished if the term also occurs regularly in other documents. The resulting weighted vectors can then be used in a similarity calculation with a query, such that the document that is most similar to the query can be regarded as the most relevant document.
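Equations (4.10) and (4.11) translate directly into code; a minimal sketch, with documents represented as word lists:

```python
import math

def idf(term: str, corpus: list) -> float:
    """Eq. 4.10: log of corpus size over document frequency."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0

def tf_idf(term: str, doc: list, corpus: list) -> float:
    """Eq. 4.11: raw term frequency times inverse document frequency."""
    return doc.count(term) * idf(term, corpus)
```

A term occurring in every document receives an IDF of log 1 = 0 and is effectively ignored, which is exactly the dampening effect described above.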
Given the availability of ontological background knowledge which can aid the
document creation process, it is questionable whether the application of a weighting
scheme which is designed to be applied on texts formulated in natural language
outperforms the weighting functionality supplied by the virtual document model.
For this work, we will empirically evaluate the benefits of the TF-IDF weighting
scheme when applied to virtual documents using the methods described in section
4.3.3.
4.4 Experiments
In this section, the experiments that have been performed to test the effectiveness
of adding a concept disambiguation step, specifically our approach, to a lexical similarity will be presented. These experiments serve to evaluate different aspects of
the proposed approach and to demonstrate the feasibility of word-sense disambiguation techniques for an ontology mapping system. The different experiments can be
divided into the following categories:
• Subsection 4.4.1 describes the performed experiments to evaluate the different
concept disambiguation policies in order to determine whether lenient or strict
policies should be preferred.
• The experiments described in subsection 4.4.2 demonstrate the potential performance a system can achieve when utilizing the proposed techniques.
• Subsection 4.4.3 presents the performed experiments which evaluate the considered virtual document weighting techniques.
• The runtime performance overhead and gains introduced by our approach will
be analysed in subsection 4.4.4.
The tested mapping system used for the performed experiments contains two similarity metrics: a lexical similarity using a configuration which is specified in the experimental set-up, and a syntactic similarity using the Jaro string similarity (Jaro, 1989) applied on concept names and labels. The combined concept similarities are aggregated using the Naive descending extraction algorithm (Meilicke and Stuckenschmidt, 2007). The tested systems in subsections 4.4.1 and 4.4.3 used the parameter schemes obtained from the experiment presented in subsection 4.4.3, while the system in subsection 4.4.2 had a manually tuned parameter set. To quantify the quality of the produced alignments we will evaluate them using the measures of thresholded precision, recall and F-measure, as introduced in Section 2.2, with the exception of subsection 4.4.3, which evaluates the tested approaches using a precision-recall graph and therefore computes the standard measures of precision and recall.
4.4.1 Concept Disambiguation
To investigate to what extent disambiguation techniques can improve a framework
using a lexical similarity, we evaluated different variations of our approach on the conference data set of the 2011 competition (Euzenat et al.,
2011c) from the Ontology Alignment Evaluation Initiative (OAEI) (Euzenat et al.,
2011b). This data set consists of real-world ontologies describing the conference domain and contains a reference alignment for each possible combination of ontologies
from this data set. We performed this evaluation using the three lexical similarity
measures lsm1 , lsm2 and lsm3 , evaluating each measure using the disambiguation
policies G-Mean, A-Mean, M-STD and MAX. We denote None as the omission of
the disambiguation step, such that its results denote the baseline performance of
the respective lexical similarity measure. Figure 4.3 displays the different results
when using lsm1 .
From Figure 4.3 we can make several key observations. First, we can see that
a stricter disambiguation policy clearly benefits the lsm1 metric, evidenced by the
steadily increasing F-measure. The low precision for lenient policies implies that
there are numerous false positives which exhibit a higher semantic similarity than
the true correspondences. When increasing the strictness of the filtering policy, the
precision rises steadily, meaning that an increasing number of false positives is eliminated. We can also observe a slight drop in recall for stricter policies, particularly
Figure 4.3: Evaluation of disambiguation policies using the lexical similarity lsm1
on the OAEI 2011 Conference data set.
when comparing M-STD with MAX, which implies that in a few situations the wrong
senses are filtered out.
The same evaluation has been performed using the lsm2 lexical similarity. The
results of this evaluation can be seen in Figure 4.4.
Figure 4.4: Evaluation of disambiguation policies using the lexical similarity lsm2
on the OAEI 2011 Conference data set.
From Figure 4.4 we can see that the disambiguation policies have a different
effect on lsm2 , as opposed to lsm1 . We can observe an improvement in performance
when applying G-Mean or A-Mean as policies, with F-measures of .517 and .526
respectively compared to the baseline F-measure of .501. This improvement stems
from an increase in precision, which more than compensates for the loss in recall.
However, the F-measure decreases again when applying M-STD and MAX as policies. This implies that preferring to match concepts whose senses have multiple high pairwise similarities can be beneficial, since for M-STD and MAX it is unlikely that multiple senses remain after the disambiguation step. Thus, the main observation of this evaluation is that a disambiguation step is also beneficial for lsm2, though not for all disambiguation policies.
Lastly, the results of the evaluation when applying lsm3 can be observed in
Figure 4.5.
From Figure 4.5 we can see that the precision and recall values obtained by applying lsm3 differ significantly when compared to the values obtained by applying
lsm1 or lsm2 . For the baseline and the policies G-Mean and A-Mean we can observe a very high precision and low recall. The high precision implies that a high
average semantic similarity between collections of synsets is likely to represent a
true correspondence. The low recall implies, though, that this does not occur very frequently. Upon applying the most lenient disambiguation policy G-Mean, we can see a drastic increase in both recall and F-measure. Applying the stricter policy A-Mean, the recall and F-measure increase slightly, though at the cost of a reduced
precision. The performance of M-STD is similar to its performance when applying
lsm1 or lsm2 , implying that it is not a regular occurrence that this policy retains
more than one word sense.
Overall, we can conclude that the application of the proposed disambiguation
method benefited the tested lexical similarity metrics. For lsm1 and lsm3 a strict
disambiguation policy has produced the best results, while for lsm2 the lenient
policies have been shown to be most effective.
4.4.2 Framework Comparison
In this subsection we will compare the performance of a mapping system utilizing our
approach with the performance of established techniques. To do this, we have entered
a configuration of our approach in the OAEI 2011 competition (Euzenat et al.,
2011c), of which the results are reported in section 4.4.2. A comparison with the
performances of additional and revised state-of-the-art systems will be presented in
Figure 4.5: Evaluation of disambiguation policies using the lexical similarity lsm3
on the OAEI 2011 Conference data set.
section 4.4.2.
Preliminary OAEI 2011 evaluation
During the research phase of this approach, we entered the described system in the
2011 OAEI competition under the name MaasMatch in order to evaluate its performance. The configuration used the lexical similarity metric lsm1 with disambiguation policy MAX, since at the time the performance of lsm2 and lsm3 had not yet been evaluated. The results of the competition on the conference data set can be
seen in Figure 4.6.
Figure 4.6: Results of MaasMatch in the OAEI 2011 competition on the conference
data set, compared against the results of the other participants
From Figure 4.6 one can see that MaasMatch achieved a high precision and
moderate recall over the conference data set, resulting in the fifth-highest F-measure
among the participants, which is above average. A noteworthy aspect is that this result has been achieved by only applying lexical similarities, which are better suited to resolving naming conflicts than other types of conflicts. This in
turn also explains the moderate recall value, since it would require a larger, and
more importantly a more varied set of similarity values, to deal with the remaining
types of heterogeneities as well. Hence, it is encouraging to see these good results
when taking into account the moderate complexity of the framework.
A different dataset of the OAEI competition is the benchmark data set. This is
a synthetic data set, where a reference ontology is matched with many systematic
variations of itself. These variations include many aspects, such as introducing errors
or randomizing names, omitting certain types of information or altering the structure
of the ontology. Since a base ontology is compared to variations of itself, this data
set does not contain a large quantity of naming conflicts, which our approach is
targeted at. However, it is interesting to see how our framework performs when
faced with every kind of heterogeneity. Figure 4.7 displays the results of the OAEI
2011 evaluation (Euzenat et al., 2011c) on the benchmark data set.
Figure 4.7: Results of MaasMatch in the OAEI 2011 competition on the benchmark
data set, compared against the results of the other participants
From Figure 4.7 we can see that the overall performance of MaasMatch resulted
in a high precision score and relatively low recall score when compared to the competitors. The low recall score can be explained by the fact that the disambiguation
method relies on collecting candidate synsets using information stored in the names
of the ontology concepts. The data set regularly contains ontologies with altered
or scrambled names, such that it becomes extremely difficult to allocate candidate
senses which can be used for the disambiguation step. These alterations also have
a negative impact on the quality of the constructed virtual documents, especially if
names or annotations are scrambled or completely left out, resulting in MaasMatch
performing poorly in benchmarks that contain such alterations. Despite these drawbacks, it was possible to achieve results similar to established matchers that address
all types of heterogeneities. Given these results, the performance can be improved
if measures are added which tackle other types of heterogeneities, especially if such
measures increase the recall without impacting the precision.
Comparison with OAEI 2013 frameworks
To give a more complete picture of the performance of our approach compared to
other frameworks, we re-evaluated our approach using the 2013 conference data set
(Grau et al., 2013) using the same evaluation methodology as the OAEI competition. This allows for the comparison with newer frameworks. Here, the frameworks
edna and StringEquiv are purely string-based systems which serve as a baseline
comparison. We limit the comparison to systems which performed above the lowest
baseline, StringEquiv, for the sake of brevity. We test three variations of our approach, allowing each lexical similarity metric to be compared. As disambiguation
policies we applied MAX for lsm1 and A-Mean for lsm2 and lsm3 . While A-Mean
is sub-optimal for lsm3 with respect to the F-measure, applying its best-performing
policy MAX would result in a performance similar to the configuration of lsm1.
The comparison of the OAEI 2013 performances with the three lexical similarity
measures can be seen in Table 4.2.
Framework       Precision  Recall  F-Measure
YAM++           .78        .65     .71
AML-bk          .82        .53     .64
LogMap          .76        .54     .63
AML             .82        .51     .63
ODGOMS1 2       .7         .55     .62
StringsAuto     .74        .5      .6
ServOMap v104   .69        .5      .58
MapSSS          .77        .46     .58
ODGOMS1 1       .72        .47     .57
lsm1            .8631      .4436   .5685
lsm2            .7382      .4797   .5643
HerTUDA         .7         .46     .56
WikiMatch       .7         .45     .55
WeSeE-Match     .79        .42     .55
IAMA            .74        .44     .55
HotMatch        .67        .47     .55
CIDER CL        .72        .44     .55
edna            .73        .44     .55
lsm3            .6327      .5041   .5466
OntoK           .72        .43     .54
LogMapLite      .68        .45     .54
XMapSiG1 3      .68        .44     .53
XMapGen1 4      .64        .45     .53
SYNTHESIS       .73        .41     .53
StringEquiv     .76        .39     .52
Table 4.2: Evaluation on the conference 2013 data set and comparison with OAEI
2013 frameworks.
One can observe from Table 4.2 that of the three tested lexical similarity measures, lsm1 and lsm2 scored above the two baseline matchers. The quality of the
alignments produced by the two variants of the tested systems is very similar, especially with respect to the F-measure. Similar to its 2011 performance, the lsm1
variant displayed a strong emphasis on precision, while the precision and recall of
lsm2 resembles the measures obtained by similarly performing systems, most notably ODGOMS1 1 and HerTUDA. The performance of lsm3 is more comparable
to the baseline and the OntoK system.
Overall, we can conclude that a system using our approach can perform competitively with state-of-the-art systems, especially when taking into account the modest complexity of the tested system.
4.4.3 Weighting Schemes Experiments
In this section, we will demonstrate the effect of the parameter system of the used
document model. We will demonstrate this effect when the model is used to calculate
word sense scores, as described in our approach, and the effect when the model is
used in its original context as a profile similarity.
Preliminaries: Parameter Optimization
The applied VD model provides the possibility of parametrized weighting, which
allows the emphasis of words depending on their origin. Recall from subsection
4.3.3 that the model contains a set of parameters, being α1 , α2 , α3 , α4 , β1 , β2 and
β3 , which weight terms according to their place in the ontology. Next to evaluating
the weighting approaches in the proposed WSD method, we will also test a profile
similarity that uses the presented virtual document model for gathering the context
information of a concept, similar to the work by Qu et al. (2006). Here, given two
concepts c and d, originating from different ontologies, and their respective virtual
documents V D(c) and V D(d), a profile similarity can be created by computing the
document similarity between V D(c) and V D(d). For each of the tested approaches
the conference and benchmark datasets were used as separate training sets, resulting
in four different parameter sets. We will use the terms Lex-B and Lex-C to refer
to the parameter sets which have been generated by optimizing the LSM on the
benchmark and conference dataset respectively. For the parameter sets which have
been generated using the profile similarity we will use the terms Prof-B and Prof-C.
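The concrete document-similarity function is left open at this point; a common choice, assumed here purely as an illustrative sketch, is the cosine similarity between the two weighted term vectors VD(c) and VD(d):

```python
import math

def cosine_similarity(vd1: dict, vd2: dict) -> float:
    """Cosine of the angle between two weighted term vectors,
    given as term-to-weight dictionaries."""
    dot = sum(w * vd2.get(t, 0.0) for t, w in vd1.items())
    norm1 = math.sqrt(sum(w * w for w in vd1.values()))
    norm2 = math.sqrt(sum(w * w for w in vd2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0
```

Documents sharing no terms receive similarity 0, while documents whose weight vectors point in the same direction receive 1, regardless of their absolute term weights.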
Tree-Learning-Search (TLS) (Van den Broeck and Driessens, 2011) was applied
in order to optimize the different combinations of similarity metrics and training
sets. TLS combines aspects of Monte-Carlo Tree Search and incremental regression
tree induction in order to selectively discretize the parameter space. This discretized
parameter space is then sampled using the Monte-Carlo method in order to approximate the optimal solution. The results of the performed optimization can be seen
in Table 4.3.
Parameter Set   α1    α2    α3    α4    β1    β2    β3
Lex-C           .51   .68   .58   .42   .32   .07   .06
Lex-B           .52   .99   .08   .65   .01   .09   .16
Prof-C          .71   .02   .01   .58   .09   .04   .01
Prof-B          .85   .13   .54   .32   .90   .32   .99
Table 4.3: Optimized parameter sets for the VD model when applied to a LSM (Lex)
and profile similarity (Prof) using the conference (C) and benchmark (B) data sets
as training sets.
From Table 4.3 some notable differences emerge. The parameter α1 tends to
have a higher value for profile similarities compared to the LSM parameters sets.
This can be explained by the fact that the synset candidate collection step of the
proposed disambiguation method selects candidate synsets using the processed ontology concept names as basis. Hence, all sysnet candidates will contain terms that
are similar to the ontology concept name, diminishing their information value for
the purpose of WSD. Conversely, values for α2 tend to be higher for LSM parameter
sets, indicating that matching alternative concept names is a strong indication of
a concept’s intended meaning.
Preliminaries: Test Setup
We will evaluate six different weighting schemes for virtual documents in order to
investigate what impact these have on the mapping quality. The six weighting
schemes were evaluated on the conference dataset and can be described as follows:
TF As a reference point, we will evaluate the performance of standard term-frequency
weights as a baseline, which is done by setting all VD parameters to 1.
Lex-C/Prof-C This scheme represents the VD model using optimized parameters
that were obtained from the same dataset. This scheme will be referred to by
the name of its corresponding parameters set, which is Lex-C for the WSD
evaluation and Prof-C for the profile similarity evaluation.
Lex-B/Prof-B Similar to the previous scheme, however the parameter sets were
obtained through the optimization on a different training set.
TF-IDF This scheme entails the combination of term-frequency and inverse document-frequency weights, as commonly seen in the field of information retrieval. Similar to TF weighting, all weights of the VD model will be set to 1.
Lex-C/Prof-C * TF-IDF It is possible to combine the VD model with a TF-IDF weighting scheme. This scheme represents such a combination using the
parameter sets that have been obtained from the same data set. In the WSD
experiment this scheme will be referred to as Lex-C * TF-IDF, while in the
profile similarity experiment it will be referred to as Prof-C * TF-IDF.
Lex-B/Prof-B * TF-IDF Similar to the previous scheme, however the parameter
sets that were obtained from the benchmark dataset are used instead.
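One plausible reading of the combined schemes, used here only as an illustrative sketch, is to multiply each parameterised VD term weight (which already encodes term frequency and origin) by the term's IDF computed over the whole collection of virtual documents:

```python
import math

def combine_with_idf(vd: dict, all_vds: list) -> dict:
    """Scale each VD term weight by the term's inverse document
    frequency over the collection of virtual documents."""
    n = len(all_vds)
    def idf(term: str) -> float:
        df = sum(1 for d in all_vds if term in d)
        return math.log(n / df) if df else 0.0
    return {t: w * idf(t) for t, w in vd.items()}
```

A term appearing in every virtual document of the collection is thereby zeroed out, whatever weight the VD parameters assigned to it.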
The evaluation of the TF-IDF method and its combination with the VD model
weighting is especially critical since previous work using this model has included TF-IDF weighting in its approach without evaluating the possible implications of this
technique (Qu et al., 2006). For each weighting method the computed alignments
are ranked according to their similarity. For each ranking the interpolated precision
values will be computed such that these can be compared.
Figure 4.8: Precision versus Recall graph of the created alignments from the conference data set using the lexical similarities with the virtual document
Lexical Similarity with Applied WSD
The different weighting schemes have been separately applied to this approach and
subsequently used to calculate mappings on the conference data set. The precision
vs recall graph of the produced alignments can be seen in Figure 4.8.
From Figure 4.8 we can observe some key points. For lower recall values, Lex-C,
Lex-B and Lex-B * TF-IDF weighting resulted in the highest precision values. When
inspecting higher recall values, one can observe that the Lex-C and Lex-B weighting
outperformed the remaining weighting schemes with differences in precision reaching
values of 10%. However, only the alignments generated with TF, TF-IDF and Lex-B * TF-IDF weighting achieved a recall value of 0.7 or higher, albeit at very low precision values. Another notable observation is the performance of TF-IDF-based schemes. The standard TF-IDF scheme displayed performance similar
to TF, thus being substantially lower than Lex-C or Lex-B. Also, the combination
schemes Lex-C * TF-IDF and Lex-B * TF-IDF performed lower than their respective
counterparts Lex-C and Lex-B. From this we can conclude that when applying VD-based disambiguation for a LSM, it is preferable to weight terms according to their
origin and avoid the use of inverse document frequencies.
Profile Similarity
After having established the impact that different weighting techniques can have
on the VD model when applied as context gathering method for a disambiguation
approach, it would be interesting to see the impact of these techniques when the VD
model is used for its original purpose (Qu et al., 2006). Hence, in this subsection we
will detail the performed experiments with the six investigated weighting schemes
Figure 4.9: Precision versus Recall graph of the created alignments from the conference data set using the document similarities of the virtual documents.
when utilizing the virtual document model as the context gathering method for a
profile similarity. All weighting schemes were used to calculate mappings on the
conference data set. The measures of precision and recall were computed using the
resulting alignments. The precision vs recall graph of these alignments can be seen
in Figure 4.9.
From Figure 4.9 several key observations can be made. First, one can see that the two overall best performing schemes are Prof-C and Prof-C * TF-IDF
weighting. The Prof-C * TF-IDF scheme displays a slightly worse performance than
the Prof-C scheme. This indicates that the combination with TF-IDF weights not
only failed to improve the term weights of the virtual documents, but rather it caused
the representative strength of the VD to decrease, leading to alignments of lesser
quality. The same contrast is visible when comparing Prof-B weighting with Prof-B
* TF-IDF weighting.
Next, another observation can be made when contrasting the results of the TF-IDF weights with TF weights. Both schemes lead to alignments of similar quality, indicating that the combination of the inverse document frequencies with the term frequencies does not lead to the same improvements that one can observe when performing information retrieval on regular documents. Lastly, when comparing TF-IDF weighting to Prof-C and Prof-B weighting, one can see that TF-IDF weighting
can at most match the performance of the other two schemes.
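To make the contrast between the schemes concrete, the sketch below computes plain TF and TF-IDF weights for a few toy virtual documents. The documents and terms are invented for illustration, and the Prof-C and Prof-B schemes are omitted since they depend on the origin weights of the document model.

```python
import math
from collections import Counter

def tf_weights(doc):
    """Term-frequency weights: raw counts of each term in the virtual document."""
    return Counter(doc)

def tf_idf_weights(doc, corpus):
    """TF-IDF weights: term frequency scaled by inverse document frequency."""
    n = len(corpus)
    tf = Counter(doc)
    return {t: tf[t] * math.log(n / sum(1 for d in corpus if t in d))
            for t in tf}

# Hypothetical virtual documents (bags of terms) for three ontology concepts.
docs = [
    ["car", "vehicle", "engine", "engine"],
    ["truck", "vehicle", "cargo"],
    ["bicycle", "pedal", "wheel"],
]

print(tf_weights(docs[0]))             # 'engine' counts twice
print(tf_idf_weights(docs[0], docs))   # 'vehicle' is down-weighted: it occurs in 2 of 3 docs
```

Whether such IDF down-weighting helps depends, as observed above, on how well the corpus-wide term statistics reflect the representative strength of a term for a concept.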
4.4.4 Runtime Analysis
When designing an ontology mapping framework, the issue of runtime can be an
important factor. This becomes increasingly important when attempting to
create a mapping between large ontologies, with both ontologies containing several
hundred up to thousands of concepts. Adding a disambiguation procedure to a lexical similarity might cause a decrease in runtime performance, which, if sufficiently
significant, would make it infeasible to include this approach for a large mapping
task. To establish how much runtime overhead our approach generates, we executed our system on the OAEI 2013 conference data set while recording the total
runtimes for the three general steps of the lexical similarity measure: the retrieval of
candidate senses, the disambiguation procedure and the computation of the lexical
similarity. The disambiguation procedure involves the creation of the virtual
documents, the document similarity computations and the application of the disambiguation policy. In order to accurately establish the overhead added to the runtime of a
standard lexical similarity, no word senses are discarded in the disambiguation step.
As lexical similarity metric, lsm1 was applied, though in terms of runtime there is
likely to be little difference, since lsm1, lsm2 and lsm3 all require the computation
between all pairwise combinations of senses in order to obtain their results. The
recorded runtimes are presented in Table 4.4.
Computation          Runtime (ms)
Sense Retrieval      35,272
Disambiguation       5,632
Lexical Similarity   118,900
Overhead             3.65%

Table 4.4: Runtimes of the different elements of the lexical similarity on the conference dataset.
From Table 4.4 we can see that the most time-consuming step of the entire
similarity measure, consuming 74% of the expended computation time, is the calculation of the actual similarity values after having disambiguated all the word senses.
Determining all candidate word senses for the ontology concepts, which involves
several string-processing techniques such as tokenization, word stemming and stop-word removal, required 22% of the spent computational time. The creation of virtual
documents and disambiguation of senses only required 3% of the computation time,
meaning that the addition of this step increased the runtime by 3.65%. Given the
potential performance increases of our approach, one can conclude that the additional
overhead introduced is negligible.
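The reported percentages follow directly from the runtimes in Table 4.4 and can be verified with a few lines of arithmetic; note that the 3.65% overhead is measured relative to the runtime without the disambiguation step.

```python
# Runtimes from Table 4.4, in milliseconds.
sense_retrieval = 35_272
disambiguation = 5_632
lexical_similarity = 118_900

total = sense_retrieval + disambiguation + lexical_similarity
baseline = sense_retrieval + lexical_similarity  # runtime without disambiguation

print(f"share of lexical similarity: {lexical_similarity / total:.0%}")  # 74%
print(f"share of sense retrieval:    {sense_retrieval / total:.0%}")     # 22%
print(f"disambiguation overhead:     {disambiguation / baseline:.2%}")   # 3.65%
```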
The previous comparison assumed a worst-case scenario where no senses are discarded. However, the filtering of senses can reduce the computational time for the
lexical similarity by requiring fewer evaluations of the semantic similarity between
senses. To see to what extent the different disambiguation policies reduce the runtime of this step, we recorded the runtimes of each policy on the conference dataset
to establish possible performance gains.
From Table 4.5 we can observe that the application of the disambiguation policies
can lead to significant improvements in terms of runtime. Applying the most lenient
G-Mean policy leads to a reduction in runtime of 35.7%, whereas the most strict
policy reduces the overall runtime by 74.1%.
Overall, we can conclude that the application of a disambiguation procedure can
lead to significant improvements in runtime despite the additional computational
Policy    Sense Retrieval    Disambiguation    Lexical Similarity    Runtime Reduction
None      35,272ms           5,632ms           118,900ms             0.0%
G-Mean    35,350ms           5,590ms           61,761ms              35.7%
A-Mean    35,780ms           5,828ms           24,847ms              58.4%
M-STD     34,229ms           5,472ms           7,244ms               70.6%
MAX       33,975ms           5,374ms           2,005ms               74.1%

Table 4.5: Runtimes of the different elements of the lexical similarity for each disambiguation policy.
overhead of the disambiguation method.
4.5 Chapter Conclusions and Future Work
We end this chapter by summarizing the results of the experiments (Section 4.6)
and giving an outlook on future research (Section 4.7) based on the findings
presented.
4.6 Chapter Conclusions
In this chapter we tackled research question 1 by suggesting a concept-sense disambiguation method for ontology concepts. The method extends current disambiguation methods by adapting information-retrieval-based techniques from contemporary profile similarities. We propose a virtual document model based on established
work (Qu et al., 2006) and propose a disambiguated lexical similarity capable of
utilizing different disambiguation policies and methods for the computation of similarities between sets of senses.
First, we establish that the addition of our disambiguation procedure enhances
the performance of lsm1, lsm2 and lsm3, with the most significant improvements
being observed for lsm1 and lsm3 (Subsection 4.4.1). We further observe that the strict MAX
disambiguation policy results in the highest measured performance for lsm1 and
lsm3, while for lsm2 A-Mean was the most effective policy. The comparison with
other mapping systems using the OAEI 2011 competition and the 2013 conference
dataset revealed that an otherwise modest system using our approach can achieve
a competitive performance when compared to established systems, with our system
producing higher F-measures than 50% of the established systems and a higher
precision than most systems (Subsection 4.4.2). Furthermore, we investigated the possible effects on the alignment quality when applying different weighting techniques for the
virtual documents. The outcome is that a weighting technique utilizing the origin
of the terms within the ontology, as specified by the document model, outperforms
the IR-based TF-IDF technique. Lastly, we establish that the addition of our proposed disambiguation approach results in an insignificant amount of computational
overhead while significantly reducing the overall runtime due to the reduction of
computed similarities between individual senses (Subsection 4.4.4).
4.7 Future Research
We propose three directions of future research based on our findings presented in
this chapter.
(1) The proposed disambiguation approach is based on existing profile similarities. While this type of similarity is fairly robust, it is still susceptible to terminological limitations and disturbances. An example would be the virtual
documents of two concepts containing mostly synonymous words, such that there are
many equivalent but non-identical terms. While this is less of an issue for concept
names, since synsets contain all synonyms for the entity they denote, it can still
be an issue when comparing terms of the concept comments or synset annotations.
Another example would be anomalies in the terms themselves, e.g. spelling errors or
non-standard syntax for compound words. In these cases it is difficult for a profile
similarity to determine an appropriate degree of overlap between terms. Future research could tackle these weaknesses through the addition of new techniques to the
proposed lexical similarity. Examples of techniques which could be applied are spell-checking tools, synonym extraction techniques and soft metrics for the computation
of term overlap.
(2) An alternative to tackling the weaknesses of profile similarities can be found
in the combination of several disambiguation techniques. Future research could attempt to adapt other disambiguation techniques, as introduced in Subsection 4.2.1,
for the purpose of concept-sense disambiguation. The results of multiple disambiguation procedures would then need to be combined using a to-be-proposed aggregation
strategy.
(3) The work of this chapter utilizes the lexical information available in the
widely adopted WordNet dictionary. While this resource is rich with respect to
the modelled entities, their relations, grammatical forms and possible labels, its
additional synset annotations are typically limited to a few sentences per synset.
It might be possible to achieve more accurate disambiguation results by acquiring
additional information for the descriptions of each synset. New information might
be gathered by querying the corresponding Wikipedia entries through the links
provided by YAGO, or by querying online search engines such as Google or Bing and
extracting terms from the search results.
Chapter 5
Anchor Profiles for Partial Alignments
This chapter is an updated and expanded version of the following publications:
1. Schadd, Frederik C. and Roos, Nico (2013). Anchor-Profiles for Ontology Mapping with Partial Alignments. Proceedings of the Twelfth Scandinavian Conference on Artificial Intelligence (SCAI 2013), Jaeger, Manfred, Nielsen, Thomas D. and Viappiani, Paolo ed., pp. 235−244, IOS.
2. Schadd, Frederik C. and Roos, Nico (2014a). Anchor-Profiles: Exploiting Profiles of Anchor Similarities for Ontology Mapping. Proceedings of the 26th Belgian-Dutch Conference on Artificial Intelligence (BNAIC 2014), pp. 177−178.
Another type of background knowledge that can be exploited, as discussed in
subsection 2.3, are partial alignments. Given the two input ontologies O1 and O2 ,
a partial alignment PA specifies an alignment that is incomplete with respect to
the entities in O1 and O2 . Essentially, if a domain expert were to be presented
with PA, he would not be satisfied with PA until a series of correspondences are
added to the alignment. The main goal in this scenario is to identify the additional
correspondences which the domain expert would add to PA.
This chapter addresses the second research question by proposing a profile-based
method of utilizing the anchors of a partial alignment. We generalize the notion of
a profile such that it expresses the affinity to certain objects. In a classic profile
approach, the objects denote natural language terms and the affinity to a term is
expressed by how often the term appears in the vicinity of the given concept.
We propose an alteration of that interpretation, such that a profile expresses the
concept's affinity to a series of given anchors. The core intuition behind this approach
is that concepts which denote the same entity are more likely to exhibit the same
levels of affinity to the given anchors. We evaluate our approach on the OAEI
benchmark dataset. In particular, we investigate the effects of partial alignment sizes
and correctness and compare the performance of the approach to contemporary
mapping systems.
The remainder of this chapter is structured as follows. Section 5.1 discusses
work that is related to mapping with partial alignments. Section 5.2 details our
proposed approach. Section 5.3 presents and discusses the results of the performed
experiments while Section 5.4 presents the conclusion of this chapter.
5.1 Related Work
Several works have described approaches which reuse previously generated
alignments. The principle behind these approaches was initially suggested by
Rahm and Bernstein (2001). Here, the focus lies on finding auxiliary ontologies
which are already mapped to the target ontology. The intention is that,
by selecting the auxiliary ontology according to specific criteria, the remaining
mapping problem between the source and auxiliary ontology might be easier to solve
than the original problem. Subsequent works have expanded this idea to deriving
mappings when both input ontologies have an existing alignment to an auxiliary
ontology.
COMA++ employs several strategies with respect to exploiting pre-existing
alignments (Aumueller et al., 2005). Most prominently, the system can explore
alignment paths of variable lengths between multiple ontologies, which are obtained
from a corpus, in order to derive its mappings. It is also possible to explore ontologies from the semantic web for this purpose (Sabou et al., 2008). The resulting
mapping derivations of multiple alignment paths can be combined to form a more
reliable mapping.
While the previously mentioned approaches utilized complete mappings involving auxiliary ontologies, there has been some research into approaches that exploit
partial alignments between the source and target ontologies. These alignments can either be user-generated, for instance using the PROMPT tool (Noy
and Musen, 2000), or automatically generated by a different system.
The most prominent approach is the Anchor-PROMPT (Noy and Musen, 2001)
algorithm. Here, possible paths between anchors are iteratively explored in parallel
in both ontologies, while the encountered concept combinations are registered. The
intuition is that concept pairs which have been encountered regularly during the
exploration phase are more likely to correspond with each other.
The Anchor-Flood algorithm also features a type of iterative exploration by exploiting anchors (Seddiqui and Aono, 2009). This approach selects a main anchor
and iteratively expands the explored neighbourhood of this anchor. At each iteration, a matching step is invoked which compares the concepts in this neighbourhood
and updates the alignment if new correspondences are found. A similar procedure
can be seen in the LogMap system (Jiménez-Ruiz and Cuenca Grau, 2011; Jiménez-Ruiz et al., 2012b). This system alternates between an anchor-based discovery step
and a mapping repair step in order to compute a full mapping.
5.2 Anchor Profiles
A profile similarity gathers context information of ontology concepts and compares
these context collections by parsing them into a vector space and comparing the
resulting vectors. This context information can consist of data from the concept
description and the descriptions of related concepts (Qu et al., 2006; Mao et al.,
2007). The intuition behind this approach is that concepts can be considered similar
if they have similar context information. More generally, a profile can be considered
as a vector generated from data which describes a concept, hence two concepts are
similar if their profiles can be considered similar.
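A minimal sketch of such a classic profile comparison, using invented context terms and a bag-of-words cosine as the vector-space comparison:

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

def profile(context_terms):
    """A profile: a term-frequency vector over the concept's context terms."""
    return Counter(context_terms)

# Hypothetical context collections gathered from two concept descriptions.
p1 = profile(["motor", "vehicle", "road", "transport"])
p2 = profile(["motor", "vehicle", "highway", "transport"])
print(round(cosine(p1, p2), 2))  # high term overlap yields a high similarity
```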
When mapping two ontologies for which a partial alignment is provided by a
domain expert, new opportunities arise when selecting similarity measures for a
mapping system. Instead of using description information as the basis for a profile,
we suggest utilizing the correspondences of the given partial alignment, also referred
to as anchors, as the basis for a new kind of profile similarity. Here, since the anchors
are assumed to be correct, the main intuition is that two concepts can be considered
similar if they exhibit a comparable degree of similarity towards the given anchors.
We will illustrate this intuition with an example, depicted in Figure 5.1.
Figure 5.1: Two equivalent concepts being compared to a series of anchors.
Figure 5.1 depicts two classes, Car and Automobile, being compared to three
given anchors, <Sports Car, Performance Car>, <Truck, Lorry> and <Bike, Fiets>.
When comparing Car and Automobile to a concept of the anchor <Sports Car, Performance Car> it is reasonable to expect that both comparisons would result in
a high value since they are highly related. In contrast, comparing Car and
Automobile to the anchor <Bike, Fiets> is likely to result in a lower value, since a
bike is not closely semantically related to the concept of a car. However, since Car
and Automobile are semantically equivalent, one can expect the resulting values of
both comparisons to be equally low.
In order to compare concepts with anchors we need to define a metric capable
of doing so. Given two ontologies O1 and O2, an anchor Ax[C1, C2] containing a correspondence between the concepts C1 and C2 originating from O1 and
O2 respectively, and a concept similarity sim′(E, F) ∈ [0, 1] which expresses
the similarity between two concepts, we define an anchor similarity simA(C, Ax)
between an arbitrary concept C and Ax as:

\[
sim_A(C, A_x) =
\begin{cases}
sim'(C, C^2) & \text{if } C \in O_1 \\
sim'(C, C^1) & \text{if } C \in O_2
\end{cases}
\tag{5.1}
\]
Note that C is compared to the concept in the anchor which originates from the
other ontology. If one were to compare C to the anchor concept from the same ontology, sim′ would be reduced to a structural similarity, similar to a taxonomy distance,
making it prohibitively difficult to distinguish between classes that are equally
closely related to a given anchor. We will empirically demonstrate the effectiveness of
this anchor similarity, as opposed to comparing C to the anchor concept originating
from the same ontology, in Subsection 5.3.3. From Equation 5.1 it follows that two
concepts C and D can be considered similar if simA(C, Ax) and simA(D, Ax) are
similar. Given that a partial alignment most likely contains multiple correspondences, this intuition needs to be expanded to a series of anchors. This brings us
back to the generalized idea of a profile, such that we can use the anchor similarities
simA between a concept C and all anchors as the basis of a profile, referred to as
an anchor-profile.
Algorithm 5.1 Anchor-Profile Similarity
 1: Anchor-Profile(E1, E2, PA, simA, simP)
 2: for all e ∈ E1 ∪ E2 do
 3:   Profile(e) ← initVector(|PA|)
 4:   for i = 1 to |PA| do
 5:     Ai ← PA[i]
 6:     Profile(e)(i) ← simA(e, Ai)
 7:   end for
 8: end for
 9: M ← initMatrix(|E1|, |E2|)
10: for i = 1 to |E1| do
11:   for j = 1 to |E2| do
12:     e1 ← E1[i]
13:     e2 ← E2[j]
14:     M[i, j] ← simP(Profile(e1), Profile(e2))
15:   end for
16: end for
17: return M
Formally, let us define Profile(e) as the profile of the entity e. Also, let us define
simP as a similarity metric capable of comparing two profiles, as introduced in
Subsection 3.1.2. Given two sets of entities E1 and E2, belonging to the ontologies
O1 and O2 respectively, a partial alignment PA = {A1, A2, ..., An} consisting of n
anchors, an anchor similarity simA and a similarity measure simP, we compute the
matrix M of anchor-profile similarities between E1 and E2 as defined in Algorithm
5.1.
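Algorithm 5.1 translates almost directly into code. The sketch below uses a placeholder sim_prime as a stand-in for the aggregated concept similarity sim′ and the cosine similarity as simP; the concept names and similarity values are invented, loosely following the example of Figure 5.1.

```python
import math

def sim_a(concept, anchor, in_o1, sim_prime):
    """Anchor similarity (Eq. 5.1): compare the concept to the anchor
    concept originating from the *other* ontology."""
    c1, c2 = anchor
    return sim_prime(concept, c2) if in_o1 else sim_prime(concept, c1)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def anchor_profile_similarity(e1, e2, pa, sim_prime, sim_p=cosine):
    """Algorithm 5.1: build an anchor profile per entity, then compare profiles."""
    prof1 = {e: [sim_a(e, a, True, sim_prime) for a in pa] for e in e1}
    prof2 = {e: [sim_a(e, a, False, sim_prime) for a in pa] for e in e2}
    return {(a, b): sim_p(prof1[a], prof2[b]) for a in e1 for b in e2}

# Toy example; the pairwise similarity values below are invented.
scores = {("Car", "Performance Car"): 0.9, ("Car", "Lorry"): 0.7,
          ("Car", "Fiets"): 0.2, ("Automobile", "Sports Car"): 0.9,
          ("Automobile", "Truck"): 0.8, ("Automobile", "Bicycle"): 0.1}
sim_prime = lambda c, d: scores.get((c, d), 0.0)
pa = [("Sports Car", "Performance Car"), ("Truck", "Lorry"), ("Bicycle", "Fiets")]

m = anchor_profile_similarity(["Car"], ["Automobile"], pa, sim_prime)
print(round(m[("Car", "Automobile")], 2))  # similar affinity to all anchors -> high score
```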
Figure 5.2 visualizes our anchor-profile similarity, as defined in Algorithm 5.1.
Figure 5.2: Visualization of an anchor profile similarity.
The example in Figure 5.2 shows two ontologies, O1 and O2, and three anchors
A1, A2 and A3. Two concepts C1 and C2, originating from O1 and O2 respectively,
are compared using their respective anchor-profiles Profile(C1) and Profile(C2). The
profile vectors are compared using the similarity simP. While there exist various
similarity measures for vectors, for this research the well-known cosine similarity
(Pang-Ning et al., 2005) has been applied as simP.
Since the main intuition of this approach is that corresponding concepts should
exhibit a comparable degree of similarity towards the given anchors, it is necessary
to choose sim′ such that this metric is robust under a wide variety of circumstances.
Since every single metric has potential weaknesses (Shvaiko and Euzenat, 2005), it is
preferable to aggregate different metrics in order to overcome these. To realise this,
sim′ utilizes the aggregate of all similarities from the MaasMatch system (Schadd
and Roos, 2012b). Figure 5.3 displays the configuration of the evaluated mapping
system. Here, two distinct similarity matrices are computed: the similarities
of the anchor-profiles and an aggregate of other metrics. This second matrix is
necessary for the eventuality where the system has to differentiate between correspondences
whose anchor-profiles all closely resemble null vectors, which occurs
when a concept displays no similarity to any of the given anchors. This can happen
when a given ontology has a considerable concept diversity and the given anchors do
not adequately cover the concept taxonomy. The aggregation of these two matrices
is then used to extract the output alignment A.
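A sketch of this aggregation step, assuming a simple weighted mean as the combination operator (the operator actually used by the system is not specified here):

```python
def aggregate(m_anchor, m_other, w=0.5):
    """Combine the anchor-profile matrix with the aggregate of the other
    similarities; a weighted mean is assumed here for illustration."""
    return [[w * a + (1 - w) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(m_anchor, m_other)]

# If a concept shows no similarity to any anchor, its anchor-profile row is
# (near) zero and the other similarities decide the correspondence.
m_anchor = [[0.0, 0.0], [0.9, 0.1]]
m_other  = [[0.8, 0.2], [0.7, 0.3]]
print(aggregate(m_anchor, m_other))
```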
Figure 5.3: Overview of the tested mapping system.
5.3 Experiments
To evaluate the performance of our approach, we will use the measures of adapted
Precision P ∗ , adapted Recall R∗ and adapted F-Measure F ∗ , as introduced in Section 2.3. Furthermore, we will also compute the standard measures of Precision,
Recall, and F-Measure when it is necessary to establish the overall quality of the
resulting alignments. This section is structured into the following subsections, which
individually either establish the performance of our approach or investigate interesting properties:
• Subsection 5.3.1 establishes the overall performance of the approach with an
evaluation on the OAEI benchmark dataset.
• Subsection 5.3.2 analyses the benchmark results in more detail by analysing
the performance over the different tasks of the dataset.
• We evaluate the intuition behind simA by comparing its performance to an
alternative anchor-similarity in Subsection 5.3.3.
• In Subsection 5.3.4 we investigate to what extent incorrect anchors influence
the performance of our approach.
• Subsection 5.3.5 compares the quality of the produced correspondences to other
contemporary matching systems.
In order to evaluate the performance of a mapping approach which exploits partial alignments, it is necessary to have access to a dataset which not only contains
appropriate mapping tasks and their reference alignments, but also partial alignments that can be used as input. However, within the boundaries of the OAEI
competition, which allows a comparison with other frameworks, there does not exist
a recently evaluated dataset which also supplies partial alignments as additional
input. When a dataset does not contain partial alignments, it is possible to generate these by drawing correspondences from the reference
alignment at random. In order to account for the random variation introduced by
the generated partial alignments, it becomes necessary to repeatedly evaluate the
dataset using many generated partial alignments for each mapping task. The values of precision, recall and F-measure can then be aggregated using the arithmetic
mean.
Next to establishing the mean performance of a system, it is also interesting to
see how stable its performance is. Traditionally, this is expressed via the standard
deviation. However, given that in this domain the measurements originate from different
tasks of differing complexity, this introduces a problem. Given the presence of tasks
of varying complexity in a dataset, it is to be expected that the
mean performances of the repeated evaluations differ for each task. Thus, in order
to combine the standard deviations of the different tasks, a statistical measure is
needed that takes this into account. For this we propose using the pooled standard
deviation of the different measures (Dodge, 2008; Killeen, 2005).
Given k samples, the different sample sizes n1, n2, ..., nk and sample variances
s1², s2², ..., sk², the pooled standard deviation of the collection of samples can be
calculated as follows:

\[
s' = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2 + \cdots + (n_k - 1) s_k^2}{n_1 + n_2 + \cdots + n_k - k}}
\tag{5.2}
\]
In this domain, the repeated evaluation of a single track using randomly generated partial alignments can be viewed as a sample, such that the pooled standard
deviation expresses how much the results deviate across all tracks. For the remainder of this chapter, we will refer to the pooled standard deviations of P∗, R∗ and F∗
as s′P∗, s′R∗ and s′F∗ respectively.
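Equation 5.2 can be computed directly; the sample sizes and variances below are illustrative only.

```python
import math

def pooled_std(sizes, variances):
    """Pooled standard deviation (Eq. 5.2) of k samples with sizes n_i
    and variances s_i^2."""
    k = len(sizes)
    num = sum((n - 1) * s2 for n, s2 in zip(sizes, variances))
    den = sum(sizes) - k
    return math.sqrt(num / den)

# Three tracks, each evaluated 100 times, with differing per-track variance.
print(round(pooled_std([100, 100, 100], [0.01, 0.04, 0.02]), 4))
```

With equal sample sizes, the result reduces to the square root of the mean variance, as the assertion-style check below confirms.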
5.3.1 Evaluation
To evaluate an anchor-profile approach, an ontology mapping system incorporating
the proposed similarity has been evaluated on the benchmark-biblio dataset originating from the 2012 Ontology Alignment Evaluation Initiative (Aguirre et al., 2012).
This synthetic dataset consists of tasks where each task tests a certain limiting aspect of the mapping process, for instance by distorting or removing certain features
of an ontology like concept names, comments or properties. Since this dataset does
not contain partial alignments that can be used as input, they were randomly generated from the reference alignments. In order to evaluate what impact the size of the
partial alignment can have on the mapping process, we evaluated our approach over
a spectrum of partial alignment recall values [0.1, 0.2, . . . , 0.9]. Thus, as an example,
a partial alignment recall of 0.2 indicates that PA was randomly generated from
the reference R such that PA has a recall of 0.2. In order to mitigate the variance
introduced through the random generation of PA, each recall level has been evaluated 100 times where each evaluation contained a new set of randomly generated
partial alignments. For each evaluation, the adapted measures of precision, recall
and F-measure, P ∗ , R∗ and F ∗ respectively, were computed and aggregated. Table
5.1 displays the aggregated results of the evaluation.
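The generation of a partial alignment with a prescribed recall can be sketched as a random draw from the reference alignment; the correspondences below are placeholders.

```python
import random

def generate_partial_alignment(reference, recall, rng=random):
    """Draw a random subset of the reference alignment such that the
    subset has (approximately) the requested recall w.r.t. the reference."""
    k = round(recall * len(reference))
    return rng.sample(list(reference), k)

# A hypothetical reference alignment of ten correspondences.
reference = [(f"c1_{i}", f"c2_{i}") for i in range(10)]
pa = generate_partial_alignment(reference, 0.2, random.Random(0))
print(len(pa))  # recall 0.2 of 10 correspondences -> 2 anchors
```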
From Table 5.1, several interesting results and trends can be seen. First, we can
see that overall for all PA recall levels the system resulted in an adapted precision
PA Recall   P*      R*      F*      s′P*    s′R*    s′F*
0.1         0.760   0.632   0.668   0.094   0.049   0.038
0.2         0.769   0.641   0.678   0.099   0.068   0.053
0.3         0.779   0.649   0.686   0.107   0.083   0.066
0.4         0.786   0.656   0.693   0.112   0.092   0.074
0.5         0.801   0.663   0.701   0.125   0.102   0.083
0.6         0.817   0.674   0.713   0.139   0.117   0.098
0.7         0.835   0.685   0.726   0.155   0.133   0.115
0.8         0.855   0.702   0.743   0.180   0.158   0.142
0.9         0.866   0.745   0.780   0.219   0.215   0.199

Table 5.1: Results of the evaluations on the benchmark-biblio dataset using different
recall requirements for the randomly generated partial alignments. For each recall
requirement, 100 evaluations were performed and aggregated.
in the interval [0.76, 0.87], adapted recall in the interval [0.63, 0.75] and adapted
F-measure in the interval [0.66, 0.78]. Thus, for every PA recall level the approach
resulted in a high precision and a moderately high recall. Furthermore, we can
observe that as the recall of PA increases, the adapted precision, recall and F-measure
of A increase as well. This increase is fairly consistent over all PA recall levels,
indicating that a larger amount of anchors improves the representative strength of
the computed anchor profiles.
Inspecting s′P∗, s′R∗ and s′F∗ reveals that each measure shows a similar trend. For
each measure, an increase of the recall level of PA also yields an increase of the
pooled standard deviation, with the resulting alignments at a PA recall level of 0.1
being fairly stable, while a moderate variance can be observed at a PA recall level of
0.9. This trend is to be expected, since any variation in A will have a larger impact
on P∗, R∗ and F∗ if PA has a significant size.
5.3.2 Performance Track Breakdown
Having established the overall performance of the proposed approach for different
size levels of PA, it would be interesting to inspect the performance over the different
tasks of the dataset. The task groups reflect different kinds of alterations in the
target ontology and have been grouped as follows:
101 A baseline task where the complete ontology is matched against itself. Allows
for the identification of any fundamental flaws in a mapping system.
201-202 Concept names have been removed. Task 202 has concept descriptions
removed in addition.
221-228 Testing the separate removal or alteration of instances, the class hierarchy,
restrictions or properties.
232-247 Testing all possible combinations of removing or altering the instances,
hierarchy, restrictions or properties.
5.3 — Experiments
123
248-253 Similar to 221-228, this group tests the removal or alteration of instances,
the class hierarchy, restrictions or properties. However, concept names and
descriptions have been removed in addition.
254-266 Similar to 232-247, this group tests all possible combinations of removing or altering the instances, hierarchy, restrictions or properties. However,
concept names and descriptions have been removed in addition.
Figures 5.4, 5.5 and 5.6 show adapted precision, recall and F-measure over several
task groups when using different PA size levels, ranging from 0.1 to 0.9 in intervals
of 0.1.
Figure 5.4: Corrected precision of the proposed approach for the different task groups
of the benchmark dataset. Each group contrasts the performance of different partial
alignment recall levels.
Figure 5.5: Corrected recall of the proposed approach for the different task groups
of the benchmark dataset. Each group contrasts the performance of different partial
alignment recall levels.
Figure 5.6: Corrected F-measure of the proposed approach for the different task
groups of the benchmark dataset. Each group contrasts the performance of different
partial alignment recall levels.
From this evaluation we can observe two clear trends. Firstly, the performance
in terms of precision, recall and F-measure is positively correlated with the size
of PA. Most prominently in track groups 201-202 and 248-253, the performance
improvements surpass linear growth, as we can observe more substantial
improvements for larger PA sizes. Secondly, we can see a divide in performance
when contrasting the groups 101, 221-228 and 232-247 against the groups 201-202,
248-253 and 254-266, with the former being matched with perfect results. This can
be explained by the setup of the mapping system, specifically the similarities being
employed in sim′. These similarities are predominantly syntactic and structural,
such that the track groups where these traits are not altered display a justifiably
high performance. A more varied set of similarities should improve the performance
on the remaining track groups.
5.3.3 Alternate Profile Creation
The basis of the anchor-profile approach is the comparison of ontology concepts
to anchors, which is achieved via simA as described in Equation 5.1. Here, an
ontology concept C is compared to an anchor A by retrieving the concept from
A which does not originate from the same ontology as C. While in Section 5.2 we
elaborated on the intuition behind this, we will empirically demonstrate the correctness
of this approach by comparing simA to an anchor similarity which compares C
to the anchor concept originating from the same ontology as C. To achieve
this, given two ontologies O1 and O2, and given an anchor Ax[C1, C2] containing
a correspondence between the concepts C1 and C2 originating from O1 and O2
respectively, we define a new anchor similarity sim∗A as follows:
sim∗A(C, Ax) = { sim′(C, C1)  if C ∈ O1
              { sim′(C, C2)  if C ∈ O2        (5.3)
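The difference between the two anchor similarities can be sketched as follows. This is an illustrative reconstruction in Python, not the thesis implementation: `sim` stands in for the base similarity sim′, an anchor is a pair `(c1, c2)` with `c1` from O1 and `c2` from O2, and ontology membership is reduced to an integer tag.

```python
def sim_a(sim, concept, origin, anchor):
    """simA: compare `concept` to the anchor concept from the *opposing* ontology."""
    c1, c2 = anchor  # c1 originates from O1, c2 from O2
    return sim(concept, c2) if origin == 1 else sim(concept, c1)

def sim_a_star(sim, concept, origin, anchor):
    """sim*A (equation 5.3): compare to the anchor concept from the *same* ontology."""
    c1, c2 = anchor
    return sim(concept, c1) if origin == 1 else sim(concept, c2)
```

With an exact-match toy similarity, `sim_a` scores an O1 concept against the O2 side of the anchor, whereas `sim_a_star` scores it against the O1 side; the experiments below compare these two choices empirically.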
Having defined sim∗A, an evaluation was performed similar to the one in subsection 5.3.1, however with sim∗A substituted for simA. Figure 5.7 compares the resulting adapted precision values of these evaluations.
Figure 5.7: Adapted precision of the anchor profile approach using simA and sim∗A
as anchor similarities.
From Figure 5.7 we can see that sim∗A produces slightly higher precision values for low PA recall levels, with an adapted precision difference of approximately 0.01 for every PA recall value up to 0.5. However, for higher PA recall values, where the anchor profile has a higher dimensionality, we can observe a strong increase in precision when using simA. The adapted precision when using sim∗A, on the contrary, stagnates and even drops for these higher PA recall values, leading to a maximum adapted precision difference of approximately 0.06. We can conclude that, when using sim∗A, the resulting adapted precision is not positively correlated with the recall of PA, unlike when using simA.
Next, the adapted recall values resulting from applying simA and sim∗A at the evaluated PA recall levels are compared in Figure 5.8.
Unlike the comparison of the adapted precision, in Figure 5.8 we can observe a more straightforward result. While for both simA and sim∗A the adapted recall is positively correlated with the recall of PA, applying simA resulted in significantly higher adapted recall values for all PA recall values, with the difference ranging from 0.06 up to 0.08.
Finally, the resulting adapted F-measures, indicative of the overall performance
of both anchor similarities, are compared in Figure 5.9.
In Figure 5.9 we can see a similar trend as in Figure 5.8, with the adapted F-measure being significantly higher at all PA recall values, though with the difference slightly less pronounced due to simA producing slightly lower adapted precision values, as seen in Figure 5.7. However, we can observe a minimum adapted F-measure difference of 0.032, at a PA recall of 0.1, and a maximum adapted F-measure
Figure 5.8: Adapted recall of the anchor profile approach using simA and sim∗A as
anchor similarities.
Figure 5.9: Adapted F-measure of the anchor profile approach using simA and sim∗A
as anchor similarities.
difference of 0.072 at a PA recall of 0.9.
Overall, we can conclude that comparing ontology concepts to the anchor-concept originating from the opposing ontology results in a superior quality of the computed alignments for all PA recall values.
5.3.4 Influence of Deteriorating PA Precision
As previously stated, the general assumption of an approach which utilizes partial alignments is that the correspondences within the partial alignment can be assumed to be correct. This assumption is based on the fact that partial alignments are generated by a domain expert or by a specialized pre-processing technique. However, it is possible that a domain expert makes an error, or that the specialized pre-processing technique does not produce correct correspondences with 100% certainty, in which case the performance of the Anchor Profile approach, or any other approach which utilizes partial alignments, might suffer. In this subsection we will investigate to what extent the performance of the Anchor Profile approach is influenced in the eventuality that this assumption is wrong and the partial alignment contains incorrect correspondences. Formally, given a partial alignment PA and a reference alignment R, we will investigate the situation in which PA \ R ≠ ∅, from which it follows that P(PA, R) < 1. We will systematically evaluate the OAEI 2012 benchmark dataset with PA recall levels ranging from 0.1 to 0.9, similar to subsection 5.3.1. However, for each PA recall level we will evaluate a series of PA precision levels, ranging from 1.0 down to 0.3. This will provide an indication of performance degradation across the relative spectrum of PA sizes. Tables 5.2, 5.3 and 5.4 display the adapted precision, recall and F-measure, respectively, of this evaluation.
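The sampling of a partial alignment with a prescribed recall and precision can be sketched as follows. This is an illustrative reconstruction of the experimental setup, not the thesis code; the function name and the `distractors` pool (incorrect candidate correspondences to mix in) are our own.

```python
import random

def sample_partial_alignment(reference, distractors, pa_recall, pa_precision, seed=0):
    """Sample a partial alignment PA with recall `pa_recall` w.r.t. the reference
    alignment and precision `pa_precision`, by adding incorrect correspondences
    drawn from `distractors`."""
    rng = random.Random(seed)
    n_correct = round(pa_recall * len(reference))
    correct = rng.sample(list(reference), n_correct)
    # precision = correct / (correct + incorrect)  =>  incorrect = correct * (1 - p) / p
    n_incorrect = round(n_correct * (1 - pa_precision) / pa_precision)
    incorrect = rng.sample(list(distractors), n_incorrect)
    return correct + incorrect
```

Running this for each combination of recall and precision level, and aggregating over several random seeds, reproduces the shape of the evaluation grid reported in the tables below.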
P(PA,R) \ R(PA,R)   0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
1.0                0.760  0.769  0.779  0.786  0.801  0.817  0.835  0.855  0.866
0.9                0.766  0.763  0.771  0.783  0.775  0.787  0.786  0.784  0.592
0.8                0.755  0.755  0.749  0.758  0.752  0.748  0.735  0.498  0.769
0.7                0.759  0.741  0.742  0.727  0.722  0.704  0.510  0.136  0.690
0.6                0.746  0.728  0.713  0.703  0.696  0.456  0.752  0.737  0.553
0.5                0.732  0.714  0.695  0.684  0.324  0.724  0.701  0.614  0.379
0.4                0.726  0.683  0.675  0.345  0.688  0.648  0.554  0.408  0.187
0.3                0.711  0.666  0.335  0.666  0.569  0.433  0.279  0.168  0.054

Table 5.2: Adapted precision P*(A, R) of the Anchor Profile approach for varying recall and precision levels of the input partial alignment.
First, we can see that the performance of the anchor profile approach is negatively affected upon decreasing P(PA, R). While this is an expected result, it is interesting to see to what extent this decrease occurs for the different values of R(PA, R). For the smallest value of R(PA, R), we can observe a gradual decrease in F-measure, mostly caused by a decrease in the recall of the result alignments. However, when increasing R(PA, R) we can observe that the decrease in adapted F-measure occurs more quickly and steeply. For very large values of R(PA, R) we can even observe a non-gradual decline in precision and recall values. This stems from the nature of the input partial alignment. When given a partial alignment which already contains a large portion of the reference alignment, only a few correct correspondences remain to be discovered. The actual number of remaining correspondences also varies due to the random addition of incorrect correspondences, since any concept in PA is not matched again due to the implicit assumption of the correctness of PA.
Next, one would initially presume that, despite the presence of incorrect
P(PA,R) \ R(PA,R)   0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
1.0                0.632  0.641  0.649  0.656  0.663  0.674  0.685  0.702  0.745
0.9                0.634  0.603  0.604  0.608  0.545  0.537  0.456  0.388  0.274
0.8                0.598  0.562  0.517  0.508  0.436  0.353  0.250  0.112  0.493
0.7                0.598  0.523  0.476  0.382  0.300  0.180  0.084  0.068  0.379
0.6                0.562  0.457  0.365  0.271  0.157  0.056  0.306  0.286  0.260
0.5                0.529  0.391  0.270  0.157  0.028  0.233  0.201  0.175  0.156
0.4                0.465  0.282  0.146  0.028  0.168  0.135  0.108  0.087  0.069
0.3                0.404  0.164  0.023  0.130  0.081  0.056  0.038  0.028  0.018

Table 5.3: Adapted recall R*(A, R) of the Anchor Profile approach for varying recall and precision levels of the input partial alignment.
P(PA,R) \ R(PA,R)   0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
1.0                0.668  0.678  0.686  0.693  0.701  0.713  0.726  0.743  0.780
0.9                0.669  0.651  0.654  0.661  0.619  0.617  0.557  0.499  0.363
0.8                0.645  0.622  0.591  0.587  0.532  0.461  0.357  0.178  0.579
0.7                0.645  0.593  0.560  0.482  0.407  0.274  0.140  0.086  0.471
0.6                0.619  0.542  0.465  0.375  0.245  0.097  0.415  0.393  0.342
0.5                0.595  0.487  0.373  0.243  0.052  0.335  0.297  0.261  0.215
0.4                0.547  0.384  0.229  0.051  0.257  0.213  0.173  0.139  0.100
0.3                0.497  0.251  0.042  0.207  0.136  0.096  0.066  0.048  0.027

Table 5.4: Adapted F-measure F*(A, R) of the Anchor Profile approach for varying recall and precision levels of the input partial alignment.
correspondences in PA, the quality of the output alignments would remain constant if the ratio between correct and incorrect correspondences in PA remains constant as well. However, the experiment reveals that this is rarely the case. For P(PA, R) values of 0.9, we can observe that the average F-measure is mostly constant, with the exceptions occurring at high R(PA, R) values. Intriguingly, this constant performance is achieved due to the rising precision compensating for the decreasing recall. For P(PA, R) values of 0.8 and lower we can see that an increase of R(PA, R) actually leads to a detriment in average F-measure. Since the fraction of incorrect correspondences remains constant, we can conclude that the absolute number of incorrect correspondences is also a factor in the observed performance.
From this experiment, we can conclude that the precision of the input partial alignments is an important factor in the performance of the Anchor-Profile approach, and likely also of any other approach which utilizes partial alignments.
It follows that any future deployment of this technique requires a pre-processing step which ensures that the precision of the partial alignment lies as close to 1 as possible. The evaluation shows that, even if such a pre-processing step had the disadvantage of reducing the recall of PA, the overall performance would still be impacted positively.
5.3.5 Comparison with other Frameworks
Next to establishing the overall performance on the benchmark dataset, it is also important to provide some context for that performance. To do this, we compare the performance of the Anchor-Profile approach with the top 8 of the 18 frameworks that participated in the OAEI 2012 competition (Aguirre et al., 2012) in Table 5.5. Unfortunately, none of the OAEI evaluations contained a task which also provided partial alignments; however, a comparison with state-of-the-art systems which tackled the same task without a partial alignment can still be a useful performance indication. For this comparison, both the smallest and largest evaluated PA size levels were used.
System                 Precision        Recall           F-Measure
MapSSS                 0.99             0.77             0.87
YAM++                  0.98             0.72             0.83
Anchor-Profile (0.9)   0.866* (0.998)   0.745* (0.967)   0.78* (0.982)
AROMA                  0.98             0.64             0.77
WeSeE                  0.99             0.53             0.69
AUTOMSv2               0.97             0.54             0.69
Hertuda                0.9              0.54             0.68
Anchor-Profile (0.1)   0.760* (0.88)    0.632* (0.623)   0.668* (0.691)
HotMatch               0.96             0.5              0.66
Optima                 0.89             0.49             0.63
Table 5.5: Comparison of the Anchor-Profile approach, using two different PA thresholds, with the 8 best performing frameworks from the OAEI 2012 competition. An asterisk indicates that the value has been adapted with respect to PA, while the values inside the brackets indicate the respective measure over the entire alignment.
The results of Table 5.5 indicate that the quality of correspondences produced by our approach is in line with the top ontology mapping frameworks in the field. In fact, when including PA in the evaluation metrics, the anchor-profile approach outperforms these frameworks given a large enough recall level of PA; the results of the experiments indicate that a recall level of 0.5 would suffice. Using partial alignments with a recall of 0.1 resulted in an F-measure similar to the HotMatch framework, ranking at 8th place in this comparison. A PA recall level of 0.9 resulted in a sufficiently high F-measure to rank 3rd among the top ranking systems. With regards to precision and recall, our system differentiates itself from other frameworks
by having a comparatively lower precision and higher recall. This indicates that our approach is capable of identifying correspondences which other systems cannot, while further measures must be implemented to differentiate between correspondences that have similar anchor profiles.
5.4 Chapter Conclusions and Future Work
We end this chapter by summarizing the results of the experiments (Subsection 5.4.1)
and giving an outlook on future research (Subsection 5.4.2) based on the findings
presented.
5.4.1 Chapter Conclusions
In this chapter we tackled research question 2 by proposing the Anchor-Profile technique. We generalized the notion of a profile as an expression of affinity towards a
series of objects, such that two equivalent concepts are more likely to exhibit the
same degrees of affinity. Our Anchor-Profile approach uses the anchors of a partial
alignment for the measurement of affinities. We proposed a method capable of using
established similarities in order to create the profiles of each concept.
First, we demonstrated the performance of our approach with an evaluation on
the OAEI benchmark dataset (5.3.1). For this dataset, we generated the partial
alignment by randomly sampling from the reference alignment and aggregated the
results of multiple executions using numerous sampled partial alignments. In this
experiment we established that the matching performance is positively influenced
by the recall of the provided partial alignment PA by performing a full evaluation of
the benchmark dataset for different recall values of the sampled partial alignments,
spanning the interval [0.1, 0.9] in increments of 0.1. We also noted that the pooled standard deviations s′P∗, s′R∗ and s′F∗ increase for higher partial alignment recalls. This is explained by the fact that the variations of P∗, R∗ and F∗ will be more pronounced if PA has a significant size. Second, we analysed the performance of our approach in more detail by inspecting the results of the benchmark evaluation for the different provided task groups (5.3.2). We concluded that the choice of sim′ is reflected in the performance of the Anchor-Profile approach, since in order to adequately compare a concept with an anchor it is necessary that both contain the meta-information that is relevant to sim′.
In a further experiment we validated the intuition behind our choice for simA by comparing its performance against an alternative sim∗A which compares concepts to the anchor-concept of the same ontology (5.3.3). Next, we investigated the possible effect incorrect anchors can have on the alignment quality. For this, we performed a systematic evaluation on the benchmark dataset by sampling the partial alignments according to a series of combinations of specified P(PA, R) and R(PA, R) values (5.3.4). From this evaluation we concluded that the correctness of the anchors can indeed have a significant impact on the alignment quality, particularly for larger partial alignments. Lastly, we provided context for the performance of our approach by comparing the achieved results to the performance of the top 8 systems
that were evaluated on the same dataset in the OAEI competition (5.3.5). We observed that the alignment quality of our approach is comparable with the quality of state-of-the-art systems, surpassing the performance of most systems for larger sizes of PA.
5.4.2 Future Research
We propose two directions of future research based on the findings presented in this chapter.
(1) In Subsection 5.3.2 we observed that the performance over the different categories of matching tasks, categorized by the types and combinations of different kinds of heterogeneities, is influenced by the choice of sim′. To improve the robustness of our approach we selected sim′ as a combination of different types of similarities. This selection has been shown to be susceptible to strong terminological disturbances, as seen in the performance over the matching tasks 248-253 and 254-266. Further research should evaluate the effects of different selections of sim′ and whether a higher performance can be achieved by utilizing techniques which do not exploit terminological meta-information.
(2) Other mapping systems can utilize anchor-based techniques by generating a set of anchors during the matching process. Future research could investigate the performance of our technique when utilized in a similar manner. In particular, the research should focus on the deployed anchor-generation techniques, as we have shown in Subsection 5.3.4 that both the size and quality of the anchors can have significant impacts on the performance.
Chapter 6
Anchor Evaluation using Feature Selection
This chapter is an updated version of the following publication:
1. Schadd, Frederik C. and Roos, Nico (2014c). A Feature Selection Approach for Anchor Evaluation in Ontology Mapping. Knowledge Engineering and the Semantic Web, Klinov, Pavel and Mouromtsev, Dmitry
ed., pp. 160−174, Springer, Top 5 Research Paper Award Winner.
In the previous chapter we have introduced a method for mapping ontologies
using partial alignments. Further, we have established that the performance of this
method depends not only on the size of the supplied partial alignment but also on its
correctness. This implies that the performance of partial-alignment-based matchers
will also be affected by these qualities.
The third research question has been formulated as a response to the obtained
results of answering the second research question, particularly with respect to the
influence of the quality of the provided partial alignments. This chapter aims to
answer the third research question by proposing a novel method facilitating the
evaluation of the provided anchors. The results of this evaluation can be used to
apply a filtering method in order to ensure the quality of the provided anchors. To
achieve this, our proposed method is aimed at exploiting the set of correct correspondences which can be reliably generated with a pairwise similarity metric. We
compare how well a provided anchor aligns with a single generated correspondence
by proposing a dissonance measure. Further, we quantify how well an anchor aligns
with multiple correspondences by formulating this evaluation as a feature selection
task, originating from the field of machine learning. We evaluate our method using
the OAEI conference dataset when applying numerous configurations.
The remainder of this chapter is structured as follows. Section 6.1 formalizes
the task of evaluating and filtering anchors when matching with partial alignments.
Section 6.2 details our approach for the subtask of evaluating anchors. Section 6.3
presents and discusses the results of the performed experiments. The conclusions of
this chapter and a discussion of future research are presented in Section 6.4.
6.1 Anchor Filtering
While the correspondences originating from a partial alignment, referred to as anchors, can generally be assumed to be correct, this is not always the case. In the case of a generated partial alignment, there is no guarantee that the used approach has a precision of 100% for every mapping task. If the partial alignment is made by a domain expert, it can always occur that the expert makes a mistake. The presence of incorrect anchors can degrade the quality of the computed correspondences, with the degradation of quality being correlated with the quantity of incorrect anchors. In order to ensure that a mapping approach that utilizes partial alignments performs as designed, it becomes necessary to perform a pre-processing step that ensures that the provided anchors are of sufficient quality.
The procedure of pre-processing partial alignments can be described by two key steps: anchor evaluation and the application of a filtering policy. Given two ontologies O1 and O2, and a partial alignment PA consisting of n anchors {a1, a2, . . . , an}, the anchor evaluation step produces a set of n scores S = {s1, s2, . . . , sn}, with each score sx indicating the quality of its anchor ax. The filtering step uses these scores to discard any anchor which does not satisfy a given policy, creating a new partial alignment PA′ such that PA′ ⊆ PA. The entire process is illustrated in Figure 6.1.
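The two steps reduce to a small pipeline. The following is an illustrative sketch, not the thesis implementation: `evaluate` maps an anchor to a quality score s_x, and `policy` maps a score to a keep/discard decision (e.g. a simple threshold).

```python
def filter_anchors(anchors, evaluate, policy):
    """Evaluate each anchor and keep only those the filtering policy accepts,
    producing the reduced partial alignment PA' with PA' a subset of PA."""
    scores = [evaluate(a) for a in anchors]
    return [a for a, s in zip(anchors, scores) if policy(s)]
```

A threshold policy would be `policy = lambda s: s >= 0.5`, with the threshold set by a domain expert or learned from a training set.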
[O1, O2, PA] → Anchor Evaluation → S → Filter Policy → PA′ → Mapping Process → A
Figure 6.1: Illustration of the anchor filtering process when mapping with partial
alignments.
Typically, the evaluation and filtering steps are achieved through the application of already existing approaches from the field of ontology mapping. The filtering step can be performed by simply applying a threshold to the score set S, with the threshold value set by a domain expert or learned using a training set. To evaluate the anchors, one can utilize any available concept similarity metric (Shvaiko and Euzenat, 2005). However, such metrics are unfortunately susceptible to concept heterogeneities, where a concept pair for which a human would immediately conclude that it denotes the same information would result in a low similarity value. Such heterogeneities can be mitigated through the combination of multiple similarity metrics, though the aggregation of several similarity values has its disadvantages. For example, given two concept pairs which respectively receive the similarity values {0, 1} and {0.5, 0.5} as determined by two metrics, one would be more inclined to accept the first pair than the second, since it can occur that the feature on which a similarity metric relies might be absent, while at the same time the maximum score of a given metric is only rarely a false positive. Computing the aggregate of two similarities would thus obscure this information. The approach presented in this chapter attempts to tackle this problem by proposing a new way in which a similarity metric can be used to evaluate anchors.
6.2 Proposed Approach
A similarity metric can produce a small set of reliable correspondences, given a sufficiently high similarity threshold. Established matching systems, such as LogMap (Jiménez-Ruiz and Cuenca Grau, 2011) or Anchor-FLOOD (Seddiqui and Aono, 2009), utilize this property to generate a series of anchors on-the-fly to serve as the basis of their general mapping strategy. However, it can be the case that the provided partial alignment originates from a source that is unknown to the mapping system, e.g. a domain expert or a different mapping system in the case where multiple systems are composed in sequential order. In this case, one can generate a set of reliable correspondences in a way similar to LogMap or Anchor-FLOOD, and utilize this set to evaluate the provided anchors. For example, LogMap (Jiménez-Ruiz et al., 2012b) generates this set by applying a terminological similarity with a strict cut-off. Given a partial alignment from an unknown source, generating a separate set of reliable correspondences presents us with the opportunity to evaluate the partial alignment using this generated set.
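A high-threshold sweep over the concept pairs can be sketched as follows. This is an illustrative sketch in the spirit of the anchor generation of LogMap and Anchor-FLOOD, not their actual implementations; the naive pairwise loop is quadratic in the number of concepts, and `sim` is any base similarity.

```python
def reliable_correspondences(concepts1, concepts2, sim, threshold=0.95):
    """Generate a small set of high-confidence correspondences by keeping only
    the concept pairs whose base similarity exceeds a strict cut-off."""
    return [(c1, c2) for c1 in concepts1 for c2 in concepts2
            if sim(c1, c2) >= threshold]
```

The stricter the cut-off, the fewer but more reliable the resulting correspondences, which is exactly the trade-off the evaluation step below depends on.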
Given an anchor ax ∈ {a1, a2, . . . , an} and a set of generated correspondences C = {c1, c2, . . . , cm}, in order to evaluate ax with C we need a metric for comparing ax with every element of C. An aggregation of the results for each element of C could then determine whether ax is filtered from the partial alignment. To compare ax with any correspondence cy, it is preferable to have a metric that produces consistent results independent of the taxonomical distances between ax and cy within the given ontologies O1 and O2. This is to ensure the robustness of the approach in the case that none of the correspondences of C are closely related to ax.
As an example, let us assume we are comparing an anchor a1, denoting the concept car, with two correct correspondences c1 and c2, with c1 denoting the concept vehicle and c2 denoting the concept physical object. Both c1 and c2 are correct correspondences, hence it would be preferable if the two comparisons with a1 would result in the same value, despite car being more closely related to vehicle than to physical object.
One can interpret such a measure as expressing how well an anchor aligns with a correspondence, as opposed to measuring the semantic similarity between the anchor concepts. A correct anchor would thus be expected to be better aligned with respect to a reliably classified correspondence than an incorrect anchor. To minimize the effect of outliers and utilize all available reliably classified correspondences, one should measure the degree of alignment between an anchor and all given correspondences, and measure how well this measure correlates with the expected
result. A way to measure how well an anchor aligns with a given correspondence
would be to compute the concept similarities between the concepts in the anchor
and the concepts of the given correspondence and express how these similarities
differ. To measure this difference in similarity between the concepts of an anchor
and the concepts of a given correspondence, we propose a measure of dissonance.
Given a correspondence {c1, c2}, an anchor {a1, a2} and a base similarity measure sim(a, b) ∈ [0, 1], we define the dissonance d as follows:

d({c1, c2}, {a1, a2}) = |sim(c1, a2) − sim(c2, a1)|    (6.1)
Using the measure of dissonance, the core of the approach consists of comparing the given anchor to a set of reliably generated correspondences, correct and incorrect, and quantifying to what extent the anchor aligns with the given correspondences. Based on this quantification, the set of anchors can then be filtered. For this research, we will investigate three different metrics when used as base similarity sim.
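Equation 6.1 translates directly into code. The sketch below assumes a caller-supplied base similarity `sim(a, b) ∈ [0, 1]`; a correspondence and an anchor are each modelled as a pair whose first element comes from O1 and second from O2 (names are illustrative).

```python
def dissonance(corr, anchor, sim):
    """Dissonance d (equation 6.1): absolute difference of the two cross
    similarities between a correspondence {c1, c2} and an anchor {a1, a2}."""
    c1, c2 = corr
    a1, a2 = anchor
    return abs(sim(c1, a2) - sim(c2, a1))
```

A low dissonance indicates that the correspondence and the anchor align well; a correct anchor should produce low dissonance against reliably correct correspondences regardless of taxonomical distance.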
(a) Correct anchor A contrasted against two correct matches m1 and m2. (b) Correct anchor A contrasted against two incorrect matches m3 and m4.
Figure 6.2: Example scenarios of a correct anchor A being compared to either correct or incorrect matches, illustrating the expected semantic difference between anchors and given correspondences.
To illustrate the principle behind the approach, consider the examples illustrated in Figures 6.2 and 6.3. Each example illustrates two ontologies, an anchor A and two correspondences linking two other concept pairs. Figure 6.2a depicts a correct anchor and two correct correspondences m1 = [b1, b2] and m2 = [d1, d2]. m1 is semantically more related to A than m2, thus it can be expected that calculating sim(a1, b2) and sim(a2, b1) results in higher values than computing sim(a1, d2) and sim(a2, d1). It is reasonable to presume that sim(a1, b2) and sim(a2, b1) will result in equally high values, and sim(a1, d2) and sim(a2, d1) in equally low values, meaning that computing the dissonances d(m1, A) and d(m2, A) will result in equally low values, indicating a high degree of alignment.
(a) An incorrect anchor A contrasted against two correct matches m1 and m2. (b) An incorrect anchor A contrasted against two incorrect matches m3 and m4.
Figure 6.3: Example scenarios of an incorrect anchor A being compared to correct and incorrect matches, illustrating the irregularity in the expected semantic difference between anchors and given correspondences.
Comparing a correct anchor to incorrect correspondences is expected to not exhibit this behaviour. Figure 6.2b illustrates a correct anchor A, consisting of the concepts a1 and a2, and two incorrect matches m3 and m4, which link the concepts b1 with e2 and c1 with d2 respectively. In this situation, a similarity calculation between a2 and b1 is likely to result in a higher value than the similarity between a1 and e2. Similarly, the concept similarities between the concepts of A and m4 are also likely to differ, despite m4 being semantically further from A than m3.
When given an incorrect anchor, the similarity differences between the concepts of A and the concepts of either correct or incorrect matches are less predictable, as illustrated in Figures 6.3a and 6.3b. Figure 6.3a depicts an incorrect anchor A being compared to two correct correspondences. Here, both correspondences contain one concept, b1 and d2 respectively, which is semantically closer to A than their other concept. Thus, computing a similarity measure between the concepts of a correct correspondence and the concepts of an incorrect anchor will likely produce unequal results, regardless of the semantic distance of the correspondence to the anchor. However, to what degree these similarities will differ is not predictable, since this depends on how semantically related the concepts of the incorrect anchor are. If one were to compare an incorrect anchor to an incorrect correspondence, then the expected difference in concept similarities is not predictable at all, as illustrated in Figure 6.3b. The comparison of A with m3 is likely to produce a low difference in similarity, being the result of the comparison of a1 with a2 and b1 with b2. On the other hand, it is also possible that the similarity difference is very large, as illustrated with m4.
6.2.1 Filtering using Feature Selection
Having identified a measurement which leads to predictable behaviour for correct anchors and less predictable behaviour for incorrect anchors, one now needs to find a method for quantifying this predictability. As previously stated, in order for the dissonance to behave in a predictable way one must use correspondences whose truth value is known with a high degree of certainty. The correct and incorrect comparison correspondences need to be generated reliably, such that labelling them as true and false respectively results in only few incorrect labels. Assuming that these generated correspondences are indeed labelled correctly, one can interpret the different dissonance measures as separate samples over a feature space. Given a set of n input anchors A = {a1, a2, . . . , an} and the set of generated correspondences C = {c1, c2, . . . , cm} with their respective labels Y = {y1, y2, . . . , ym}, containing both reliably correct and incorrect correspondences, each correspondence cx would thus consist of n dissonance measurements dx,i (i = 1, . . . , n) and its label yx. If an anchor ai is correct, then evaluating the dissonances over C would lead to discernible differences for correct and incorrect correspondences, making the variable representing ai in the feature space a good predictor of the labels Y.
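The construction of this feature space can be sketched as follows; this is an illustrative reconstruction (the function name is ours), producing the m × n matrix in which row x holds the dissonances of correspondence c_x against all n anchors.

```python
def dissonance_features(anchors, correspondences, sim):
    """Build the m x n feature matrix D with D[x][i] = d(c_x, a_i):
    one row per generated correspondence, one feature column per anchor."""
    def d(corr, anchor):
        (c1, c2), (a1, a2) = corr, anchor
        return abs(sim(c1, a2) - sim(c2, a1))  # equation 6.1
    return [[d(c, a) for a in anchors] for c in correspondences]
```

Each column of this matrix, together with the labels Y, is then handed to a feature selection measure; a column that predicts Y well corresponds to an anchor that is likely correct.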
To determine how well each dimension can serve as a predictor, one can utilize
established feature selection techniques (Guyon and Elisseeff, 2003), which have
become part of a set of important pre-processing techniques facilitating the use of
machine learning and data-mining techniques on high-dimensional datasets. These
techniques quantify how much a feature can contribute to the classification of a given
labelled dataset. Their scores are then used in order to dismiss features which do not
hold information that is relevant for classifying the data, allowing for the reduction
of the feature space and the quicker training and execution of classifiers.
For this research, we will use the computed feature scores as evaluation metrics for their corresponding anchors. Based on these values, a filtering policy can then dismiss anchors which are unlikely to be correct. Feature selection methods can utilize different underlying principles, for instance correlation measures or information-theoretic approaches. In order to not bias our approach towards a single method, we will evaluate six different feature evaluation measures.
Pearson Correlation Coefficient A fundamental method in the field of statistics, the Pearson Correlation Coefficient (Myers, Well, and Lorch Jr., 2010) measures the linear correlation between two variables. Given the sample sets X and Y of two variables, the Pearson Correlation Coefficient is defined as:

r = Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ) / ( √(Σ_{i=1}^n (X_i − X̄)²) · √(Σ_{i=1}^n (Y_i − Ȳ)²) )    (6.2)
Spearman Rank Correlation The Spearman Rank Correlation (Myers et al., 2010)
is a method which utilizes the method of computing the Pearson Correlation
Coefficient. Here, the sample sets X and Y are transformed into the ranking
sets x and y. The correlation between x and y is then computed as:
\rho = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}    (6.3)
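Both correlation measures can be computed directly from their definitions. A minimal Python sketch (not the Java-ML implementation used in this thesis; the rank transform uses average ranks for ties):

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length samples."""
    xbar, ybar = mean(xs), mean(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sqrt(sum((x - xbar) ** 2 for x in xs)) * sqrt(sum((y - ybar) ** 2 for y in ys))
    return num / den

def ranks(values):
    """Rank transform, 1-based; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman rank correlation: Pearson over the rank-transformed samples."""
    return pearson(ranks(xs), ranks(ys))
```

Note that Spearman is insensitive to monotone rescaling: `spearman([1, 2, 3], [1, 4, 9])` is 1.0 even though the relationship is not linear.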
Gain Ratio Information-theoretic approaches have also been employed as measures of feature quality. Information gain techniques compute how much impurity is left in each split after a given attribute has been employed as the root node of a classification tree (Quinlan, 1986). To measure this impurity, the entropy measure is commonly employed. The entropy of a variable X is defined as:

H(X) = -\sum_{x_i} p(x_i) \log_2 p(x_i)    (6.4)

The entropy after observing another variable Y is defined as:

H(X|Y) = -\sum_{y_j} p(y_j) \sum_{x_i} p(x_i|y_j) \log_2 p(x_i|y_j)    (6.5)

The information gain of X is defined as the reduction in entropy after partitioning over all values of Y:

IG(X|Y) = H(X) - H(X|Y)    (6.6)

The Gain Ratio is defined as the normalized information gain:

GainRatio(X|Y) = \frac{IG(X|Y)}{H(X)}    (6.7)
Symmetrical Uncertainty The Symmetrical Uncertainty (Flannery et al., 1992)
is a measure similar to the Gain Ratio. However, it employs a different
normalization principle to counteract the bias towards attributes with larger
value sets. Using Equations 6.4 and 6.6, the Symmetrical Uncertainty SU(X) can
be computed as follows:

SU(X) = 2 \frac{IG(X|Y)}{H(X) + H(Y)}    (6.8)
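The entropy-based measures of Equations 6.4 to 6.8 can be sketched as follows. This is a minimal illustration that estimates the probabilities from sample frequencies; the thesis itself delegates these computations to the Java-ML framework:

```python
from math import log2
from collections import Counter

def entropy(xs):
    """H(X) = -sum p(x) log2 p(x), estimated from sample frequencies (Eq. 6.4)."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def conditional_entropy(xs, ys):
    """H(X|Y), estimated from a paired sample (Eq. 6.5)."""
    n = len(ys)
    h = 0.0
    for y, cy in Counter(ys).items():
        h += (cy / n) * entropy([x for x, yy in zip(xs, ys) if yy == y])
    return h

def information_gain(xs, ys):
    """IG(X|Y) = H(X) - H(X|Y) (Eq. 6.6)."""
    return entropy(xs) - conditional_entropy(xs, ys)

def gain_ratio(xs, ys):
    """Gain Ratio as normalized in Eq. 6.7, i.e. IG(X|Y) / H(X)."""
    return information_gain(xs, ys) / entropy(xs)

def symmetrical_uncertainty(xs, ys):
    """SU(X) = 2 IG(X|Y) / (H(X) + H(Y)) (Eq. 6.8)."""
    return 2 * information_gain(xs, ys) / (entropy(xs) + entropy(ys))
```

For a feature that determines the label exactly, all three derived measures evaluate to 1; for a feature independent of the label, the information gain is 0.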
Thornton’s Separability Index Instead of using a correlation measure, Thornton’s Separability Index (Thornton, 1998) expresses the separability between the classes in a dataset. Specifically, it is defined as the fraction of data points whose nearest neighbour shares the same classification label. It is computed as follows:

TSI = \frac{\sum_{i=1}^{n} \left( (f(x_i) + f(x'_i) + 1) \bmod 2 \right)}{n}    (6.9)

where f is a binary value function returning 0 or 1, depending on which class label is associated with value x_i, and x'_i is defined as the nearest neighbour of x_i.
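A direct reading of Equation 6.9 can be sketched as follows; the Euclidean nearest-neighbour choice and the function name are illustrative assumptions:

```python
def thornton_separability(points, labels):
    """Thornton's Separability Index: the fraction of points whose nearest
    neighbour carries the same binary label (0 or 1). Implements
    TSI = (1/n) * sum((f(x_i) + f(x'_i) + 1) mod 2)."""
    n = len(points)
    agree = 0
    for i in range(n):
        # nearest neighbour of point i among all other points (squared Euclidean)
        j = min((k for k in range(n) if k != i),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(points[i], points[k])))
        # the summand is 1 exactly when labels[i] == labels[j]
        agree += (labels[i] + labels[j] + 1) % 2
    return agree / n
```

Perfectly separated clusters yield a TSI of 1.0, while fully interleaved classes drive the index towards 0.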
Fisher’s Linear Discriminant Fisher’s Linear Discriminant (Fisher, 1936) evaluates the discriminatory quality of a set of features by calculating the difference of the class means and normalizing this distance by a measure of the within-class scatter. The dataset is transformed into a linear space using the projection w which optimizes the output of the value function. The discriminant of the two classes can be computed as follows:

J(w) = \frac{|\mu_{y_1} - \mu_{y_2}|}{s_{y_1}^2 + s_{y_2}^2}    (6.10)

where \mu_y and s_y^2 denote the mean and variance of class y.
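For a single feature with binary labels, as used in this chapter (one dimension per anchor), the criterion reduces to a one-dimensional score and no projection needs to be optimized. A minimal sketch; the function name `fisher_score` is our own:

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Fisher criterion for a single feature over a binary-labelled sample:
    |mu_1 - mu_2| / (s_1^2 + s_2^2), per Eq. 6.10 restricted to one dimension."""
    g1 = [v for v, y in zip(values, labels) if y == labels[0]]
    g2 = [v for v, y in zip(values, labels) if y != labels[0]]
    return abs(mean(g1) - mean(g2)) / (pvariance(g1) + pvariance(g2))
```

A large class-mean gap relative to the within-class variances yields a high score; note the score is undefined when both classes have zero variance.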
Using these feature evaluation methods, one can evaluate the given anchors of a
partial alignment with respect to their discriminatory qualities over the dissonance
feature space. Based on the evaluation values, a filtering policy can then decide
which anchors to discard before continuing the mapping process. The computation
of these measures has been facilitated using the Java-ML framework (Abeel, Van de
Peer, and Saeys, 2009).
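The evaluation-and-filtering step can be sketched end to end. The following simplified illustration scores each anchor's dissonance column by its absolute Pearson correlation with the labels; the thesis delegates the scoring to Java-ML, and all names here (`score_anchors`, `filter_anchors`) are illustrative:

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    # Pearson correlation between two equal-length samples.
    xbar, ybar = mean(xs), mean(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sqrt(sum((x - xbar) ** 2 for x in xs) * sum((y - ybar) ** 2 for y in ys))
    return num / den if den else 0.0

def score_anchors(dissonance, labels):
    """dissonance[i][j]: dissonance of reliably classified correspondence i
    against anchor j; labels[i]: 1 if correspondence i is correct, else 0.
    Each anchor is scored by how well its feature column separates the labels,
    here via the absolute correlation with the labels."""
    n_anchors = len(dissonance[0])
    return [abs(pearson([row[j] for row in dissonance], labels))
            for j in range(n_anchors)]

def filter_anchors(anchors, scores, threshold):
    """One possible filtering policy: keep anchors scoring at least threshold."""
    return [a for a, s in zip(anchors, scores) if s >= threshold]
```

An anchor whose dissonance values clearly separate correct from incorrect correspondences receives a score near 1 and survives the filter; an uninformative column scores near 0 and is discarded.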
6.3 Evaluation
To evaluate the proposed technique of filtering anchors, we utilized the conference dataset originating from the 2013 Ontology Alignment Evaluation Initiative
(Grau et al., 2013). This dataset contains matching tasks, including reference alignments, of real-world ontologies describing the domain of scientific conferences. While
this dataset does not contain predefined partial alignments as additional input, it
is possible to simply generate partial alignments from the supplied reference alignments. For this domain, it is preferable that the partial alignment also contains
incorrect anchors such that the capability of filtering these incorrect anchors can be
adequately tested. For each mapping task, PA is generated randomly such that it
exhibits a precision and recall of 0.5 with respect to the reference alignment. Since
we assume that a similarity metric can produce a limited set of reliable correspondences
given a high threshold, as mentioned in Section 6.2, we limit the set of correct
correspondences in the partial alignment to correspondences which do not exhibit a
high pairwise similarity. The experiments thus provide insight into the extent to which we
can reliably evaluate anchors in situations where a basic similarity-based evaluation
produces unreliable results.
Each task is repeated 100 times and the results aggregated in order to minimize
random fluctuations. For each task, the given approach evaluates the given anchors,
such that an ordered ranking is created from the resulting scores. While in a real-world
application a given filtering approach would discard a series of anchors based on
a given policy, for instance by applying a threshold, for an experimental set-up it is
more appropriate to perform a precision vs. recall analysis. Such an analysis allows
for a comparison of performances without having to limit oneself to a set of filtering
policies.
To evaluate the dissonance between an anchor and a comparison correspondence,
as stated in Section 6.2, a base similarity metric sim is required. We investigate three
different categories of base similarity metrics:
Syntactic A comparison between concept names and labels using a specific algorithm. The Jaro (Jaro, 1989) similarity was applied for this purpose. Subsection 6.3.1 will present the results of this experiment.
Structural A comparison between concepts which also includes information of related concepts in its computation. As an example of a structural similarity,
a profile similarity (Qu et al., 2006) has been evaluated. A profile similarity
gathers syntactical information, e.g. concept names, labels and comments,
from a given concept and its related concepts into a collection, which is referred to as a profile. The similarity of two profiles determines the similarity of
the corresponding concepts. The results of this experiment will be presented
in subsection 6.3.2.
Lexical A similarity of this type aims to identify the senses of concepts
within a lexical resource. The senses of the lexical resource are related to
each other using semantic relations, e.g. ‘is-a-kind-of ’ relations, forming a taxonomy of senses. Concept similarities are determined by identifying the correct
concept senses and determining the distance of these senses within the lexical
taxonomy. This distance is then transformed into a similarity value. For this
evaluation a lexical similarity using WordNet as a lexical resource has been
evaluated (Schadd and Roos, 2014b), as described in Chapter 4. Specifically,
lsm2 has been applied as similarity metric and A-MEAN as disambiguation
policy. The results of this evaluation will be presented in Subsection 6.3.3.
The final score of each anchor is determined by computing the pairwise similarity
of the anchor concepts, also computed using sim, and multiplying this similarity
with the anchor consistency score as determined using the proposed approach, using
one of the tested feature evaluation methods. We will compare the rankings of our
approach with a baseline, which is obtained by computing the pairwise similarities
of the anchor concepts using the base similarity sim, while omitting the evaluation
of the anchors using our approach. The comparison with the baseline allows us to
establish how much our approach contributes to the evaluation of the given anchors.
The presented approach requires a method of generating the set of correspondences C which serve as the individuals of the feature space. In order to apply feature
selection techniques on a dataset, the class labels y of each individual must be known,
and ideally also correct. Since a single similarity metric can produce a reliable set
of correct correspondences, albeit limited in size, one can use this set as the part
of C which represents true correspondences. In order to generate reliably incorrect
correspondences, one can simply select two concepts at random while ensuring that
their pairwise similarity is below a threshold. For the experiments, the quantity of
incorrect correspondences is set to be equal to the quantity of reliably correct correspondences. To generate C, the Jaro similarity with thresholds 0.75 and 0.3 was
utilized to ensure that the correspondences had a sufficiently high or low similarity.
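The generation of C can be sketched as follows. In this illustration, `difflib`'s ratio serves as a stand-in for the Jaro similarity used in the thesis, and the concept lists and function names are our own:

```python
import random
from difflib import SequenceMatcher

def string_sim(a, b):
    # Stand-in for the Jaro similarity applied in the thesis.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def generate_c(concepts1, concepts2, hi=0.75, lo=0.3, seed=0):
    """Generate the labelled correspondence set C: concept pairs scoring at
    least `hi` are taken as reliably correct (label 1), random pairs scoring
    at most `lo` as reliably incorrect (label 0), in equal quantity."""
    rng = random.Random(seed)
    correct = [(a, b, 1) for a in concepts1 for b in concepts2
               if string_sim(a, b) >= hi]
    incorrect = []
    while len(incorrect) < len(correct):
        a, b = rng.choice(concepts1), rng.choice(concepts2)
        if string_sim(a, b) <= lo:
            incorrect.append((a, b, 0))
    return correct + incorrect
```

The two thresholds leave a deliberate gap: pairs of intermediate similarity are excluded from C because their labels would be unreliable.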
6.3.1 Syntactic Similarity
In the first experiment, the Jaro similarity was applied as sim in order to
evaluate a syntactic similarity. The generated anchors are evaluated
and ranked according to their evaluation scores. We evaluate these rankings by
computing their aggregated interpolated precision vs. recall values, displayed in
Figure 6.4.
[Plot omitted: precision vs. recall curves for the baseline and the six feature evaluation methods (GainRatio, SymmUnc, Pearson, Spearman, Thornton, Fisher's).]
Figure 6.4: Precision vs. recall of the rankings created using a syntactic similarity weighted by the evaluated feature selection methods. The un-weighted variant of the syntactic similarity is used as baseline.
From the results depicted in Figure 6.4 several observations can be made. The
most striking observation to be made is that all six tested feature evaluation methods
produced a better ranking than the un-weighted baseline. At low recall levels this
resulted in an increased precision of up to .057. At the higher recall levels we observe
an increase in precision of up to .035.
With regard to the individual feature evaluation metrics, a few trends are observable. First of all, we can see that the information-theoretic approaches, meaning
the GainRatio and the Symmetrical Uncertainty, improve the precision fairly consistently across all recall levels. On average, these measures improve the precision
by approximately 0.03. The Spearman rank correlation and Fisher’s discriminant
only display a marginal improvement for lower recall levels, but show a more
significant improvement for higher recall levels. The most significant improvements
for the lower recall levels are observed when applying Thornton’s separability index
and Pearson’s correlation coefficient.
6.3.2 Structural Similarity
For the second evaluation of our approach, we replaced the Jaro similarity with a
profile similarity for sim. The profile similarity (Qu et al., 2006) compiles meta-information, primarily the name, comments and annotations, of a given concept
and of concepts that are linked to the given concept through relations such as ‘is-a’
and ‘domain-of ’. A profile similarity can be classified as a structural similarity
due to the utilization of information originating from related concepts. The gathered
meta-information is represented as a weighted document-vector, also referred to as a
profile. The similarity between two concepts is determined by computing the cosine
similarity of their corresponding document vectors. The results of evaluating our
approach using a profile similarity as sim can be seen in Figure 6.5.
[Plot omitted: precision vs. recall curves for the baseline and the six feature evaluation methods (GainRatio, SymmUnc, Pearson, Spearman, Thornton, Fisher's).]
Figure 6.5: Precision vs. recall of the rankings created using a structural similarity weighted by the evaluated feature selection methods. The un-weighted variant of the structural similarity is used as baseline.
From Figure 6.5 we can observe a more mixed result compared to the previous
evaluation. The information-theoretic methods, namely Gain Ratio and Symmetrical Uncertainty, outperform the baseline at lower recall levels, maintaining a
near-perfect precision of 0.99 for one additional recall level and outperforming the
baseline by a margin of roughly .022 at a recall of 0.3. However, for higher recall levels this margin drops until both measures perform roughly on par with the baseline
at the highest recall levels. Thornton’s Separability Index outperforms the baseline
only at lower recall levels, while Pearson’s correlation coefficient performs below
the baseline. The most promising measures in this experiment were Fisher’s linear
discriminant and the Spearman rank correlation, which performed better than the
baseline at all recall levels. Contrary to the baseline, both measures produce a near-perfect precision of 0.99 at a recall of 0.2. The Spearman rank correlation produces
rankings which have an increased precision of roughly .025 for most recall levels,
while for the highest recall levels this difference is widened to roughly .045.
6.3.3 Lexical Similarity
In the third evaluation, we evaluated our approach when utilizing a lexical
similarity as sim. A lexical similarity derives a similarity between two concepts
by identifying their intended senses within a corpus and computing the semantic
or taxonomic distance between the senses. The resulting distance value is then
transformed into a similarity measure. For a lexical similarity to function, it is
necessary that the given corpus also models the domains of the two input ontologies.
To ensure this, WordNet (Miller, 1995) has been utilized as corpus, which aims at
modelling the entire English language. The result of utilizing a lexical similarity as
sim can be seen in Figure 6.6.
[Plot omitted: precision vs. recall curves for the baseline and the six feature evaluation methods (GainRatio, SymmUnc, Pearson, Spearman, Thornton, Fisher's).]
Figure 6.6: Precision vs. recall of the rankings created using a lexical similarity weighted by the evaluated feature selection methods. The un-weighted variant of the lexical similarity is used as baseline.
From Figure 6.6 several key observations can be made. First of all, the baseline
displays a distinctively constant precision of .82 up to a recall level of .5. For the lower
recall levels, our approach outperforms the baseline by a significant margin using any
of the tested feature evaluation methods. Most measures produced an interpolated
precision of approximately .9, indicating an improvement of .08. When
increasing the recall levels, the performance of these measures slowly approaches the
performance of the baseline, while still staying above it. The exception is Pearson’s
correlation coefficient, which performs below the baseline at higher recall levels.
The clearly best-performing measure is Thornton’s separability index, which produced a precision higher than both the baseline and the other measures at all recall
levels. At recall levels of .3 and higher, Thornton’s separability index improved upon
the baseline by up to .047. At recall levels of .0 and .1, Thornton’s separability index
produced rankings with a precision of approximately .94, an improvement of .12
compared to the baseline. At a recall level of .2 it still produced rankings with a
commendable precision of .91, which is .09 higher than the baseline.
Improvements of this magnitude are particularly important for the utilization of
partial alignments, since they allow a significantly larger number of anchors to be
utilized while maintaining a degree of certainty that the anchors are correct. An
approach which utilizes partial alignments relies on both the quantity and the quality of the
anchors, but is likely more sensitive to their quality. Thus, in order to
perform well, such an approach is likely to enforce stringent criteria on the given
anchors instead of risking the inclusion of wrong anchors in its computation. In
the case of using a lexical similarity to achieve this, our approach would lead to a
significantly higher number of correct anchors being retained.
6.4 Chapter Conclusion and Future Research
We end this chapter by summarizing the results of the experiments (Subsection 6.4.1)
and giving an outlook on future research (Subsection 6.4.2) based on the findings
presented.
6.4.1 Chapter Conclusions
In this chapter we tackled research question 3 by proposing a feature-selection-based
approach for the evaluation of the given anchors. Our approach is designed to create
a feature space where every feature is representative of a specific anchor. We populate this feature space by generating a set of reliably classified correspondences and
computing a measure of dissonance for every feature, which expresses how well an
instance aligns with a given anchor. The intuition behind this approach is that correct anchors should align better with the reliably classified correspondences, which
is measured by how well the features of the feature space can serve as predictors of a
potential classification task. We apply established feature selection techniques from
the field of machine learning to measure the predictive capability of every feature.
We evaluated our approach on the OAEI conference dataset using three different similarities for the computation of the dissonance measures and six different
feature evaluation techniques. In the first experiment, we evaluated our technique
when applying a syntactic similarity (6.3.1). We concluded from this experiment
that all tested feature selection techniques produced consistently better rankings
than the evaluated baseline ranking, with some techniques producing a significant
improvement. In the next experiment, we evaluated our technique when applying
a structural similarity instead (6.3.2). We observed more varied results, with the
application of Fisher’s Linear Discriminant and Spearman’s Rank Correlation resulting in an improvement in ranking quality compared to the baseline, while the
application of the Pearson Correlation Coefficient and Thornton’s Separability Index
resulted in a decrease in quality. We concluded that the rankings of a similarity
metric which itself can, to an extent, produce reliable rankings can still be improved
through the application of our approach. In the last experiment, we evaluated the
performance when applying a lexical similarity (6.3.3). We observed a significant
improvement in ranking quality at lower to medium recall levels for all evaluated
feature selection techniques. In particular, the application of Thornton’s Separability Index resulted in a ranking quality that is distinctively higher than that of the
other tested techniques and significantly higher than the baseline.
Taking the experimental observations into account, we conclude that the proposed evaluation method is capable of improving the performance of every tested
base similarity. Further, we observe that the measured improvements were significant when utilizing a syntactic or lexical base similarity. Combining this observation
with the performances of the different feature evaluation techniques leads us to conclude that, prior to integration into a matching system, an evaluation of the chosen
configuration is necessary in order to ensure the desired performance improvement.
6.4.2 Future Research
We propose two directions of future research based on our findings presented in this
chapter.
(1) The current evaluation was performed on the conference dataset. This dataset
does not suffer from terminological disturbances or omissions, such that similarities
which utilize terminological information can be applied. Future research could evaluate the approach on the benchmark dataset to investigate its robustness and possibly
propose techniques to improve the overall robustness.
(2) The presented technique is focused on the evaluation of anchors. However,
another part of the overall filtering procedure is the application of a filtering step
which utilizes the evaluation scores. Future research could propose and evaluate
different policies. Furthermore, given one or more filter policies one could establish the difference in alignment quality when applying a partial alignment matcher
with or without a complete anchor filtering procedure (i.e. executing the proposed
evaluation approach and applying a filter policy).
Chapter 7
Anchor-Based Profile Enrichment
This chapter is an updated version of the following publication:
1. Schadd, Frederik C. and Roos, Nico (2015). Matching Terminological
Heterogeneous Ontologies by Exploiting Partial Alignments. 9th International Conference on Advances in Semantic Processing, Accepted
Paper.
The previous chapters have dealt with profile similarities on multiple occasions.
In Chapter 4 we introduced a profile-based method for concept-sense disambiguation
and in Chapter 5 we introduced a profile-based method for matching ontologies with
partial alignments. Both chapters attempt to improve the exploitation of external
resources, being lexical resources and partial alignments respectively, using profile-based methods. In this chapter we approach the combination of profiles and external
resources from a different perspective. Here, we attempt to improve an existing
resource-independent metric, specifically a profile similarity, through the addition of
a provided resource.
Profile similarities are widely used in the field of ontology mapping. Despite being
a type of syntactical similarity, exploiting the terminological information of ontology
concepts, they are fairly robust against terminological heterogeneities due to the
large scope of information that is exploited for each concept. For instance, if two
corresponding concepts A and B have very dissimilar labels, then a profile similarity
can still derive an appropriate similarity score if other information close to A and B
still matches, e.g. their descriptions or their parents’ labels. However, this robustness has
a limit. For ontology pairs between which there is a significant terminological gap, it
is less likely that there is matching information in the vicinities of two corresponding
concepts A and B. Hence, a typical profile similarity is unsuited to deal with this
kind of matching problem. A sophisticated mapping system would then configure
itself such that the profile similarity would not be used at all.
This chapter addresses the fourth research question by exploiting a provided
partial alignment such that a profile similarity can be used to match ontologies
between which there exists a significant terminological gap. Given two ontologies
O1 and O2 , we redefine the neighbourhood of a concept, from which information for
a profile is gathered, by adding the semantic relations of the partial alignment to
the set of exploitable relations. For example, given a concept a1 ∈ O1 , a concept
b1 ∈ O1 which is in the neighbourhood of a1 and a partial alignment PA containing
a correspondence c which matches b1 to a concept in O2 , it is possible to exploit c
in order to add information originating from O2 to the profile of a1 , thus using the
terminology of the other ontology and bridging the terminological gap between O1
and O2 . We evaluate our approach on a subset of the OAEI benchmark dataset and
the OAEI multifarm dataset, representing two datasets consisting solely of matching
problems with significant terminological gaps.
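The enrichment idea described above can be sketched as follows. The dictionary-based ontology representation and the function name `profile` are illustrative assumptions, not the thesis implementation:

```python
def profile(concept, ontology, anchors=None, other_ontology=None):
    """Build a term profile for `concept`: its own terms plus the terms of its
    neighbours. If a neighbour is matched by an anchor of the partial
    alignment, the matched concept's terms from the other ontology are added
    as well, bridging the terminological gap.
    `ontology` maps concept -> {"terms": [...], "neighbours": [...]};
    `anchors` maps concepts of this ontology to concepts of the other one."""
    terms = list(ontology[concept]["terms"])
    for nb in ontology[concept]["neighbours"]:
        terms += ontology[nb]["terms"]
        if anchors and nb in anchors:  # enrichment via the anchor
            terms += other_ontology[anchors[nb]]["terms"]
    return terms
```

For instance, if the English concept House has the neighbour Building, and an anchor matches Building to the German concept Gebäude, the profile of House picks up German terms and can be compared against profiles built from the German ontology.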
The remainder of this chapter is structured as follows. Section 7.1 discusses work
related to profile similarities and partial alignments. Section 7.2 introduces profile
similarities in more detail and illustrates the terminological gap with an example.
Section 7.3 details our approach and Section 7.4 presents the results and discussions
of the performed experiments. Finally, Section 7.5 presents the conclusions of this
chapter and discusses future work.
7.1 Related Work
Profile similarities have seen a rise in use since their inception. Initially developed
for the Falcon-AO system (Qu et al., 2006), this type of similarity has been used
in ontology mapping systems such as AML (Cruz et al., 2009), RiMoM (Li et al.,
2009) and CroMatcher (Gulić and Vrdoljak, 2013). These systems typically apply
the same scope when gathering information for a concept profile, being the parent
concepts and children concepts. Some systems, such as YAM++ (Ngo et al., 2012),
limit the scope to the information contained in the concept annotations and labels.
There exist some works which aim at extending the scope of exploited profile
information in order to improve the effectiveness of the similarity. The deployed
profile similarity in the mapping system PRIOR (Mao et al., 2007) extends the scope
of exploited information to the grand-parent concepts and grand-children concepts,
thus providing a larger amount of exploitable context information.
7.1.1 Semantic Enrichment
One way in which additional information can be exploited is through semantic enrichment. Semantic enrichment describes any process which takes an ontology O
as input and produces as output the enhanced ontology E(O), such that E(O) expresses more semantic information than O. For this purpose, a semantic enrichment
process typically exploits resources such as stop-word lists, which allow words to be identified
as stop-words, or lexical resources, which allow words or concepts to be annotated with their
lexical definitions. The disambiguation procedure of Subsection 4.3 can serve as an
example of an enrichment process which enriches concepts with lexical senses. However, unlike in Subsection 4.3, a formal enrichment process typically separates the
enrichment step from the subsequent similarity computations.
Semantic enrichment has been applied to ontology mapping in a non-profile context. Notable examples are the addition of synonyms to the concept descriptions by
exploiting lexical resources. An example of this is the LogMap system (Jiménez-Ruiz
and Cuenca Grau, 2011), which is capable of adding information from WordNet or
UMLS to the ontologies prior to mapping. Another example is YAM++ (Ngo et al.,
2012) which uses a machine translator to generate English translations of labels prior
to mapping.
A noteworthy application of semantic enrichment for a profile similarity is the
work by Su and Gulla (2004). Here, the semantic enrichment process exploits a
corpus of natural language documents. Using a linguistic classifier and optional user
input, the corpus documents are assigned to the ontology concepts, such that each
assignment asserts that the ontology concept is discussed in its associated document.
The exploitation of the corpus documents results in the concept profiles containing
many more terms, which is particularly beneficial for matching tasks where the
ontologies contain only little terminological meta-information. The concept profiles
are constructed as feature-vectors using the Rocchio algorithm (Aas and Eikvil,
1999), where a feature-vector describing the concept c is specified as the average
feature-vector over all documents di that contain the concept c. The similarities
between concepts are determined by computing the cosine-similarity between their
feature-vectors.
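The Rocchio-style construction of Su and Gulla can be sketched as follows; this is a simplified term-frequency version (the original uses weighted feature-vectors), and the function names are our own:

```python
from collections import Counter
from math import sqrt

def doc_vector(text):
    """Term-frequency vector of a single document."""
    return Counter(text.lower().split())

def concept_vector(documents):
    """Rocchio-style concept vector: the average of the vectors of all
    documents assigned to the concept."""
    total = Counter()
    for d in documents:
        total.update(doc_vector(d))
    n = len(documents)
    return {t: c / n for t, c in total.items()}

def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Concepts whose assigned documents use overlapping vocabulary obtain a high cosine similarity even when the concept labels themselves differ.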
7.2 Profile Similarities and the Terminological Gap
Profile similarities are a robust and effective type of similarity metric and are deployed
in a range of state-of-the-art ontology matching systems (Qu et al., 2006; Cruz et al.,
2009; Li et al., 2009). They rely on techniques pioneered in the field of information
retrieval (Manning et al., 2008), where the core problem is the retrieval of relevant
documents when given an example document or query. Thus, the stored documents
need to be compared to the example document or the query in order to determine
which stored documents are the most similar and should therefore be returned to the user. A
profile similarity adapts these document comparison techniques by constructing a
virtual document for each ontology concept, also referred to as the profile of that
concept, and determines the similarity between two concepts x and y by comparing
their respective profiles. The core intuition of this approach is that x and y can be
considered similar if their corresponding profiles can also be considered similar.
As their origin already implies, profile similarities are language-based techniques
(Euzenat and Shvaiko, 2007). Language-based techniques interpret their input as an
occurrence of some natural language and use appropriate techniques to determine
their overlap based on this interpretation. A language-based technique might for
instance perform an analysis on the labels of the concept in order to determine their
overlap. For instance given the two concepts Plane and Airplane a language-based
analysis of their labels would result in a high score since the label Plane is completely
contained within the label Airplane. Thus, despite the labels being different, a high
similarity score would still be achieved. However, the degree of surmountable label difference has a limit for language-based techniques. The labels of the concepts Car
and Automobile have very little in common with respect to shared characters, tokens
or length. Thus, many language-based techniques are unlikely to result in a high
value.
Profile similarities have the advantage that they draw from a wide range of
information per concept. Thus terminological differences between the labels of two
concepts can still be overcome by comparing additional information. This additional
information typically includes the comments and annotations of the given concept
and the information of semantically related concepts (Qu et al., 2006; Mao et al.,
2007). The range of exploited information of a typical profile similarity is visualized
in Figure 7.1.
[Diagram omitted: the concept House with its label (“House”) and comment (“A building that houses a family”), its is-a parents Building (“A structure with walls and a roof”) and Structure (“An object constructed with parts”), the related concept Hostel (“Cheap supervised lodging”) reached via an ownedBy property and domain relation; the description of House together with the information of these related concepts forms its profile.]
Figure 7.1: Illustration of the typical range of exploited information of a profile
similarity.
Figure 7.1 illustrates the range of exploited information when constructing a
profile of the concept House. The concept itself contains some encoded information in
the form of a label and a comment. The unification of this information is also referred
to as the description of the given concept. The concept House is also associated with
other concepts through either semantic relations or because it is associated with a
property of the other concepts. The information encoded in these related concepts
plus the information in the description of the concept House make up the information
typically found in the profile of House.
In order for two profiles to be similar, they must contain some shared terminology. For example, the concepts House and Home can still be matched if their
parents contain the word Building or if a concept related to Home contains the word
“House”. In essence, in order for profile similarities to be effective it is still required
that the two given ontologies O1 and O2 exhibit some overlap with respect to their
terminologies. However, this is not always the case as two ontologies can model the
same domain using a completely different terminology. This can be the result of one
ontology using synonyms, different naming conventions or the usage of acronyms.
Furthermore, two ontologies might even be modelled in a different language. For example, one might need to match two biomedical ontologies where one is modelled in
English and one in Latin. In the multilingual domain, the terminological difference
depends on the relatedness between the given languages. For example, ontologies
defined using English and French might have some overlap since English has adopted
words from the French language and vice-versa. However, such a limited overlap is
unlikely to occur when for instance comparing French and Romanian ontologies.
Matching tasks with little terminological overlap can still be regarded as difficult,
since the overlapping terms might be concentrated in a few sub-structures of the
ontologies, meaning that the other sub-structures would exhibit no terminological
overlap at all. The terminological gap between two ontologies is illustrated in Figure
7.2.
Figure 7.2 displays the example of Figure 7.1 next to a series of concepts from
a different ontology, Ontology 2, modelling the same entities. The terminological
gap is illustrated through the fact that all information in Ontology 2 is modelled
in German instead of English. As we can see, comparing the concept House with
its equivalent concept Haus using a typical profile similarity is unlikely to produce
a satisfying result, since neither the concepts Haus and House nor their related
concepts contain any overlapping terminology. Therefore, additional measures are
necessary in order to ensure the effectiveness of profile similarities when the given
ontologies have little to no shared terminology.
7.3 Anchor-Based Profile Enrichment
A typical profile similarity is inadequate for ontology matching problems with significant terminological gaps. One way of tackling this issue is through semantic
enrichment by exploiting lexical resources such as WordNet (Miller, 1995), UMLS
(Bodenreider, 2004) or BabelNet (Navigli and Ponzetto, 2010). Techniques which
fall under this category work by looking up each concept in the given resource and
adding synonyms, additional descriptions or translations to the concept definition.
However, these techniques rely on several assumptions: (1) the availability of an
appropriate resource for the given matching problem, (2) the ability to locate appropriate lexical entries given the naming formats of the ontologies, and (3) the
ability to disambiguate concept meanings such that no incorrect labels or comments
[Figure 7.2 shows two ontology fragments side by side. Ontology 1 (English) contains Structure, Building, House and Hostel with English labels and descriptions; Ontology 2 (German) contains the equivalent concepts Aufbau, Gebäude, Haus and Herberge with German labels and descriptions, connected by the same is-a relations.]
Figure 7.2: Illustration of a terminological gap between two ontologies modelling
identical concepts.
are added to the concept definition. We can see that the performance of such techniques is severely impacted if any of these assumptions fails. If (1) or (2) fails, then it is not possible to add additional information to the concept definition, causing the ontology concepts to be compared using only their standard profiles. Examples of assumption (1) failing would be the lack of access to an appropriate resource, for instance due to a lack of connectivity, or the absence of any appropriate resource for the given matching task due to the specific nature of the ontologies. As an example of assumption (2) failing, let us consider a concept EE and its corresponding lexical entry Energy Efficient, referring to a type of engine. It is easy
to see that the labels are very different, making it a difficult task to match concepts
to lexical entries. To ensure the ability of identifying correct lexical entries when
dealing with ambiguous concepts, one needs to apply a disambiguation technique,
as introduced in sub-section 4.1.2. The current state-of-the-art disambiguation systems can achieve an accuracy of roughly 86% (Navigli, 2009), meaning that even if
a state-of-the-art system is applied there is still a significant proportion of concepts
which would be associated with unrepresentative information based on incorrectly
designated lexical entries.
In the case that an appropriate lexical resource is not available, other measures
are necessary to overcome the terminological gap. These typically involve the exploitation of other ontological features, for example the ontology structure. However, it may be the case that instead of a lexical resource a different kind of resource is available to be exploited. For a given mapping problem it is possible that an incomplete alignment, also referred to as a partial alignment, is available as additional input. A
partial alignment can stem from efforts such as a domain expert attempting to create
an alignment, but being unable to complete it due to given circumstances, or from
a high-precision system generating such an alignment. The correspondences of the
given partial alignment can then be exploited in order to determine the unidentified
correspondences.
Our approach aims at adapting profile similarities to be appropriate for matching problems with significant terminological gaps through the exploitation of partial
alignments. It is based on the insight that an ontology will consistently use its own
terminology. For instance, if an ontology uses the term Paper to refer to scientific
articles, it is unlikely to use the equivalent term Article in the descriptions of other
concepts instead, especially if the ontology is designed using a design principle that
enforces this property (Sure, Staab, and Studer, 2002). However, if a partial alignment contains the correspondence Paper-Article, then one can use this insight to
one’s advantage. For instance, given the concept Accept Paper a profile similarity is
more likely to match it to its appropriate counterpart Approve Article if the profile
of Accept Paper contains the term ‘Article’.
A partial alignment PA is a set of correspondences, with each correspondence
asserting a semantic relation between two concepts of different ontologies. The types
of relations modelled in a partial alignment, e.g. ⊒, ⊥, ⊓ and ≡, are typically also
modelled in an ontology and thus exploited in the construction of a profile. Thus, by
semantically annotating the given ontologies O1 and O2 with the correspondences of
PA it becomes possible to exploit these newly asserted relations for the creation of
the concept profiles. This enables us to construct the profiles of O1 using a subset of
the terminology of O2 , increasing the probability of a terminological overlap between
the profiles of two corresponding concepts. This idea is illustrated in Figure 7.3.
Before we introduce our approach, we need to define a series of terms and symbols
that will be used in the following sections:
Collection of words: A list of unique words where each word has a corresponding
weight in the form of a rational number.
+: Operator denoting the merging of two collections of words.
×: Operator denoting element-wise multiplication of term frequencies with a weight.
depth(x): The taxonomy depth of concept x within its ontology.
D: The maximum taxonomical depth of a given ontology.
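The collection-of-words primitives above can be sketched in a few lines. This is an illustrative interpretation, not the thesis implementation; the names merge and scale are invented stand-ins for the + and × operators, and the example descriptions are made up.

```python
from collections import Counter

# Sketch of the definitions above: a collection of words maps each unique
# word to a rational weight; Counter supports numeric values directly.

def merge(a, b):
    """The '+' operator: merge two collections, summing the weights."""
    return a + b

def scale(omega, c):
    """The 'x' operator: multiply every term weight by omega."""
    return Counter({word: omega * w for word, w in c.items()})

des_building = Counter({"building": 1, "structure": 1, "walls": 1, "roof": 1})
des_house = Counter({"house": 1, "building": 1, "family": 1})

profile = merge(des_house, scale(0.5, des_building))
print(profile["building"])  # → 1.5
```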
Next, it is necessary to provide a definition of a basic profile similarity upon
which we can base our approach. For this, we provide a definition similar to the work
[Figure 7.3 shows the two ontology fragments of Figure 7.2, now connected by the anchor correspondence Structure ≡ Aufbau taken from the partial alignment; the enriched profile of a concept in Ontology 1 thereby incorporates German descriptions from Ontology 2 that are reachable via the anchor.]
Figure 7.3: Two equivalent concepts being compared to a series of anchors.
by Mao et al. (2007). Neighbouring concepts are explored using a set of semantic
relations, such as isChildOf or isParentOf.
A basic component of a profile similarity is the description of a concept, which gathers the literal information encoded for that concept. Let x be a concept of an ontology; the description Des(x) of x is a collection of words defined as follows:
Des(x) = collection of words in the name of x
       + collection of words in the labels of x
       + collection of words in the comments of x
       + collection of words in the annotations of x                  (7.1)
We define the profile of x as the merger of the description of x and the descriptions
of concepts that are semantically related to x :
Profile(x) = Des(x) + Σ_{p ∈ P(x)} Des(p) + Σ_{c ∈ C(x)} Des(c) + Σ_{r ∈ R(x)} Des(r)   (7.2)

where

P(x) = {p | x isChildOf p}
C(x) = {c | c isChildOf x}
R(x) = {r | r isRelatedTo x ∧ r ∉ P(x) ∪ C(x)}
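As an illustration of the profile construction, the following toy sketch merges a concept's description with those of its parents, children and otherwise related concepts. All concept names and the dictionary-based relation encoding are invented for this example.

```python
from collections import Counter

# Toy sketch of Equation 7.2: Profile(x) merges Des(x) with the descriptions
# of x's parents, children and otherwise related concepts.

des = {
    "House": Counter({"house": 1, "building": 1, "family": 1}),
    "Building": Counter({"building": 1, "structure": 1, "walls": 1, "roof": 1}),
    "Hostel": Counter({"hostel": 1, "cheap": 1, "lodging": 1}),
}
is_child_of = {"House": ["Building"], "Hostel": ["Building"], "Building": []}
is_related_to = {c: [] for c in des}  # no further relations in this toy example

def profile(x):
    parents = is_child_of[x]
    children = [c for c, ps in is_child_of.items() if x in ps]
    related = [r for r in is_related_to[x] if r not in parents + children]
    result = Counter(des[x])
    for neighbour in parents + children + related:
        result += des[neighbour]
    return result

print(profile("House")["structure"])  # → 1, inherited from the parent Building
```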
In order to compute the similarity between two profiles, they are parsed into
a vector-space model and compared using the cosine similarity (Pang-Ning et al.,
2005). Note that it is possible to weight the descriptions of the related concepts and the different collections within the description of each concept, similar to the model presented in sub-section 4.3.3. However, we opt for a uniform weighting for two reasons: (1) a detailed analysis of the influence of these weights would provide little research value, since this analysis has already been performed in sub-section 4.4.3 for the model of sub-section 4.3.3, and (2) the profile similarity and its variations are easier to understand and replicate when using a uniform weighting scheme.
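The vector-space comparison can be sketched as follows: a minimal cosine similarity over the weighted word collections, under the uniform weighting described above; this is an illustration, not the exact implementation used in the thesis.

```python
import math
from collections import Counter

# Minimal sketch: two profiles compared in a vector-space model using the
# cosine similarity over their (uniformly weighted) term frequencies.

def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

p1 = Counter({"house": 1, "building": 2, "family": 1})
p2 = Counter({"haus": 1, "building": 1})  # only "building" is shared
print(round(cosine(p1, p2), 3))  # → 0.577
```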
To bridge the terminological gap we aim to exploit the semantic relations provided by a given partial alignment PA, such that we can enhance the profile of a
concept x ∈ O1 using the terminology of O2 . We refer to this enlarged profile as the
anchor-enriched-profile. For this, we explore the parents, children and properties of
a concept x (or ranges and domains in case x itself is a property). If during this
exploration a concept y is encountered which is mapped in a correspondence in PA
to a concept e ∈ O2 , then Profile(x) is extended with Des(e).
We will define the merged collection of parentally-anchored-descriptions (PAD) with respect to concept x in three variations. These gather the descriptions of anchored concepts from the ancestors of x. To measure the improvement caused by the addition of these sets, we also define a variation that omits any such description. PADs are defined as follows:
PAD0(x, PA) = ∅

PAD1(x, PA) = Σ_{e ∈ E} Des(e), where
              E = {e | ∃⟨id, y, e, t, c⟩ ∈ PA; y isAncestorOf x}

PAD2(x, PA) = Σ_{e ∈ E} ω × Des(e), where
              E = {e | ∃⟨id, y, e, t, c⟩ ∈ PA; y isAncestorOf x}
              ∧ ω = (D − |depth(x) − depth(y)|) / D
                                                                      (7.3)
An interesting point to note is that PAD2 utilizes the same set of concepts as
PAD1 , but weights each description using its relative distance to x, such that the
descriptions of closer concepts receive a higher weight.
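The PAD construction can be sketched as follows. The toy partial alignment, depths and ancestor sets are invented, and the correspondence tuples ⟨id, y, e, t, c⟩ are reduced to a simple y → e anchor lookup for readability.

```python
from collections import Counter

# Sketch of PAD1/PAD2 (Equation 7.3): gather the O2-descriptions of anchors
# found among the ancestors of x, optionally weighted by taxonomic distance.

pa = {"Building": "Gebaeude"}                       # anchors: y -> e
des_o2 = {"Gebaeude": Counter({"gebaeude": 1, "aufbau": 1})}
ancestors = {"House": ["Building", "Structure"]}
depth = {"House": 2, "Building": 1, "Structure": 0}
D = 2                                               # maximum taxonomical depth

def pad(x, weighted=False):
    result = Counter()
    for y in ancestors[x]:
        if y in pa:                                  # y is anchored to e in O2
            omega = (D - abs(depth[x] - depth[y])) / D if weighted else 1.0
            result += Counter({t: omega * w for t, w in des_o2[pa[y]].items()})
    return result

print(pad("House")["gebaeude"])        # PAD1 → 1
print(pad("House", True)["gebaeude"])  # PAD2 → 0.5, since omega = (2 - 1)/2
```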
Exploring the children of x, we define the merged collection of child-anchored-descriptions (CAD) in a similar way:
CAD0(x, PA) = ∅

CAD1(x, PA) = Σ_{e ∈ E} Des(e), where
              E = {e | ∃⟨id, y, e, t, c⟩ ∈ PA; y isDescendantOf x}

CAD2(x, PA) = Σ_{e ∈ E} ω × Des(e), where
              E = {e | ∃⟨id, y, e, t, c⟩ ∈ PA; y isDescendantOf x}
              ∧ ω = (D − |depth(x) − depth(y)|) / D
                                                                      (7.4)
Lastly, we can explore the relations defined by the properties of the ontology,
being isDomainOf and isRangeOf. Defining Oc as the set of concepts defined in
ontology O and Op as the set of properties of O, we define the merged collection of relation-anchored-descriptions (RAD) in three variations as follows:
RAD0(x, PA) = ∅

RAD1(x, PA) =
    Σ_{e ∈ E} Des(e), where E = {e | ∃⟨id, y, e, t, c⟩ ∈ PA; x isDomainOf y}                    if x ∈ Oc
    Σ_{e ∈ E} Des(e), where E = {e | ∃⟨id, y, e, t, c⟩ ∈ PA; y isDomainOf x ∨ y isRangeOf x}    if x ∈ Op

RAD2(x, PA) =
    Σ_{e ∈ E∪F} Des(e), where E = {e | ∃⟨id, y, e, t, c⟩ ∈ PA; x isDomainOf y}
        and F = {f | ∃⟨id, y, f, t, c⟩ ∈ PA, ∃z ∈ Op; x isDomainOf z ∧ y isRangeOf z}           if x ∈ Oc
    Σ_{e ∈ E} Des(e), where E = {e | ∃⟨id, e, y, t, c⟩ ∈ PA; y isDomainOf x ∨ y isRangeOf x}    if x ∈ Op
                                                                      (7.5)
The noteworthy difference between RAD1 and RAD2, given a property z which has x as its domain, is that RAD2 will include the description of the anchored range of z in addition to the description of z itself. As an example, assume we are given the concepts Car and Driver, linked by the property ownedBy. Constructing the anchor-enriched-profile of Car using the set RAD1 would mean that we only investigate whether ownedBy is mapped in PA. Using RAD2 means we also investigate Driver, which could provide additional useful context.
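This contrast can be sketched with a hypothetical toy property table and anchor lookup; in this invented setup only the range concept Driver is anchored, so the RAD1-style lookup finds nothing while the RAD2-style lookup does.

```python
# Sketch of the RAD1/RAD2 contrast for the Car -- ownedBy --> Driver example.
# The property table and anchor dictionary are invented for illustration.

properties = {"ownedBy": {"domain": "Car", "range": "Driver"}}
anchors = {"Driver": "Fahrer"}  # only the range concept is anchored in PA

def rad_anchors(x, include_ranges):
    hits = []
    for prop, sig in properties.items():
        if sig["domain"] == x:
            if prop in anchors:                     # RAD1: the property itself
                hits.append(anchors[prop])
            if include_ranges and sig["range"] in anchors:
                hits.append(anchors[sig["range"]])  # RAD2: also its range
    return hits

print(rad_anchors("Car", include_ranges=False))  # RAD1 → []
print(rad_anchors("Car", include_ranges=True))   # RAD2 → ['Fahrer']
```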
Given a partial alignment PA between ontologies O1 and O2 , and given a concept
x, we define the anchor-enriched-profile of x as follows:
ProfileAE_{κ,λ,µ}(x, PA) = Profile(x) + PADκ(x, PA) + CADλ(x, PA) + RADµ(x, PA)   (7.6)
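The enrichment itself then reduces to a merge of the four collections. A trivial sketch, assuming the PAD/CAD/RAD sets have already been computed as word collections; the example contents are invented.

```python
from collections import Counter

# Sketch of Equation 7.6: the anchor-enriched profile is the merger of the
# standard profile with the selected anchored-description sets.

def enriched_profile(profile_x, pad_k, cad_l, rad_m):
    return profile_x + pad_k + cad_l + rad_m

profile_x = Counter({"house": 1, "building": 1})
pad_k = Counter({"gebaeude": 1})  # description of an anchored ancestor
cad_l = Counter()                 # no anchored descendants
rad_m = Counter({"haus": 1})      # description of an anchored related concept

enriched = enriched_profile(profile_x, pad_k, cad_l, rad_m)
print(sorted(enriched))  # → ['building', 'gebaeude', 'haus', 'house']
```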
7.4 Experiments
In this section we will detail the experiments performed to test the effectiveness of our approach and discuss the obtained results. To adequately test our approach we need (1) a dataset with matching problems demonstrating terminological gaps and (2) a partial alignment for each matching task. For this we will utilize the benchmark-biblio (Aguirre et al., 2012) and MultiFarm (Meilicke et al., 2012) datasets. For the
benchmark dataset we selected only mapping tasks where the concept names have
been significantly altered in order to represent terminological gaps. The multi-farm
dataset contains only matching problems with terminological gaps by design, thus
requiring no sub-selection of matching tasks. For each task a partial alignment PA
is supplied by randomly sampling the reference alignment R. Each matching task is
repeated 100 times with a newly sampled partial alignment in order to mitigate the
variance introduced by randomly sampling PA from R. To evaluate the produced
alignments we will use the measures P*, R* and F*, as introduced in Section 2.3.
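The sampling protocol can be sketched as follows; the correspondences are placeholders, and the recall-constrained sample is simply a random subset of the reference alignment of the required size.

```python
import random

# Sketch of the evaluation protocol: repeatedly sample a partial alignment
# PA from the reference alignment R such that R(PA, R) = 0.5.

reference = [("House", "Haus"), ("Building", "Gebaeude"),
             ("Hostel", "Herberge"), ("Structure", "Aufbau")]

def sample_partial_alignment(ref, recall, rng):
    """Draw a random subset of R containing recall * |R| correspondences."""
    return rng.sample(ref, round(recall * len(ref)))

rng = random.Random(42)                      # fixed seed for reproducibility
runs = [sample_partial_alignment(reference, 0.5, rng) for _ in range(100)]
print(all(len(pa) == 2 for pa in runs))  # → True
```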
The experiments in the remainder of this section are structured as follows:
• Subsection 7.4.1 details the experiments performed on the benchmark dataset.
• In subsection 7.4.2 the same evaluation is performed on the multi-farm dataset.
• In subsection 7.4.3 we will evaluate some variations of our approach using different values of R(PA, R) to investigate how the different variations cope with varying sizes of PA.
• We compare our approach to complete systems utilizing lexical resources instead of partial alignments when evaluating the multi-farm dataset in subsection 7.4.4. To establish the potential of our approach, this comparison
includes one variation where the ontologies are semantically enriched using a
lexical resource.
7.4.1 Benchmark
For our first experiment, we evaluate our approach on the benchmark-biblio dataset
originating from the 2014 OAEI competition (Dragisic et al., 2014). This dataset
presents a synthetic evaluation where an ontology is matched against systematic alterations of itself. For this experiment, we focus our attention only on matching tasks
with significant terminological gaps, at which our approach is aimed. Specifically,
we evaluate the following matching tasks: 201-202, 248-254, 257-262 and 265-266.
These represent matching tasks in which the concept names and annotations have
been significantly altered. For each task we randomly sample a partial alignment PA
from the reference alignment R with the condition that R(PA, R) = 0.5. From the
output alignment we compute the adapted Precision, Recall and F-Measure (P*, R* and F*) and aggregate the results over the entire dataset. We evaluate our approach with this method using every possible combination of κ, λ and µ, which determine to what extent profiles are enriched using anchored concepts (as specified in Equation 7.6). The results of this evaluation can be seen in Table 7.1.
First, an important aspect to note is that the configuration κ = 0, λ = 0, µ = 0, which can also be denoted as ProfileAE_{0,0,0}, is the baseline performance. This configuration results in the execution of a standard profile similarity, thus not utilizing any relations provided by the partial alignment. The baseline performance of ProfileAE_{0,0,0} is very low, with an F* value of only 0.119, demonstrating how a profile similarity struggles with this type of matching task. By comparing the performance of ProfileAE_{0,0,0} to the performance of any other configuration we can see that our approach improves upon the baseline profile for every tested configuration. Enriching the profiles using only PAD1 or PAD2 resulted in an F-Measure of 0.251 and 0.230 respectively. Increases of a similar amount in F-Measure can also be seen when only using CAD1, CAD2, RAD1 or RAD2. However, we can observe that applying CAD1 or CAD2 has a more significant impact on the Precision, with an increase of up to approximately 0.2. Applying RAD1 or RAD2 has
κ   λ   µ     P*      R*      F*
---------------------------------
0   0   0    0.520   0.093   0.119
0   0   1    0.420   0.208   0.268
0   0   2    0.402   0.215   0.271
0   1   0    0.719   0.199   0.277
0   1   1    0.631   0.307   0.398
0   1   2    0.622   0.314   0.401
0   2   0    0.676   0.187   0.251
0   2   1    0.567   0.298   0.380
0   2   2    0.526   0.294   0.368
1   0   0    0.480   0.197   0.251
1   0   1    0.538   0.310   0.375
1   0   2    0.539   0.318   0.383
1   1   0    0.602   0.296   0.372
1   1   1    0.615   0.404   0.476
1   1   2    0.620   0.409   0.482
1   2   0    0.544   0.287   0.346
1   2   1    0.568   0.391   0.453
1   2   2    0.564   0.393   0.453
2   0   0    0.453   0.186   0.230
2   0   1    0.481   0.298   0.357
2   0   2    0.464   0.292   0.348
2   1   0    0.589   0.278   0.345
2   1   1    0.590   0.386   0.455
2   1   2    0.589   0.384   0.454
2   2   0    0.561   0.272   0.329
2   2   1    0.558   0.377   0.440
2   2   2    0.531   0.363   0.422
Table 7.1: Aggregated adapted Precision, Recall and F-Measure when evaluating all
variations of our approach on a selection of tasks from the Benchmark dataset.
a more significant effect on the resulting Recall. Applying only RAD2 for example
resulted in an increase in Recall of approximately 0.12.
While applying PAD, CAD or RAD separately resulted in an improvement in performance, the resulting alignment quality does not yet resemble the quality one typically sees when tackling easier matching tasks. This changes when enriching the profiles by exploiting multiple relation types. Enriching the profiles with two different sets of anchored descriptions typically results in an F-Measure of approximately 0.35-0.37. Two combinations however exceed this typical performance range. Applying ProfileAE_{0,1,1} and ProfileAE_{0,1,2} resulted in an F-Measure of 0.398 and 0.401 respectively. This increase in F-Measure is the result of a comparatively higher Precision compared to other dual-combinations of description sets.
Finally, we can see that utilizing all description sets, being PAD, CAD and RAD, resulted in the highest measured performance. The best performance of this evaluation was achieved by ProfileAE_{1,1,2}, with a Precision of 0.620, a Recall of 0.409 and an F-Measure of 0.482. Comparing the performance of ProfileAE_{1,1,2} with ProfileAE_{2,2,2} indicates that equally weighting the descriptions of concepts that are anchored via ancestors or descendants results in a better performance than giving a higher weight to descriptions that are more closely related to the given concept. This can also be seen when comparing ProfileAE_{0,1,0} with ProfileAE_{0,2,0}, albeit with a less pronounced difference. For variations that differ only with respect to RAD we can see no clear trend indicating whether RAD1 or RAD2 performs better. RAD1 resulted in a better performance if PAD2 was also applied. However, for combinations which did not utilize PAD2, the application of RAD2 resulted in better performances instead. We will investigate in sub-section 7.4.2 whether these findings are consistent.
Overall, we can conclude that our approach indeed enables profile similarities to
better cope with matching problems that are characterized by a significant terminological gap. However, while our approach certainly improves the performance for
these matching problems, an F-Measure of 0.482 indicates that the approach alone
is not yet sufficient to autonomously tackle these tasks.
7.4.2 MultiFarm
In this section we will present the results of our evaluation on the MultiFarm dataset.
This dataset stems from the OAEI 2014 competition (Dragisic et al., 2014). The
terminologies of the ontologies in this dataset vary greatly since it is designed to
be a cross-lingual dataset1 . The set consists of 8 ontologies that are modelled using 9 languages (including English). For each pair of ontologies a set of mapping
tasks exists consisting of every possible combination of selecting different languages.
As in the previous evaluation, we generate the partial alignments by randomly sampling the reference alignment with the condition that R(PA, R) = 0.5 and aggregate the results of 100 evaluations for each task. This evaluation is repeated for
every possible combination of κ, λ and µ. The result of this evaluation is presented
in Table 7.2.
First, by comparing the performance of the baseline configuration ProfileAE_{0,0,0} to any configuration of our approach we can easily see that our approach improves upon the performance of the baseline. Adding the sets PAD or CAD using either variation typically resulted in an F-Measure of 0.39-0.43, an improvement of 0.07 to 0.11 when compared to the baseline. Curiously, enriching the profiles using RAD alone typically resulted in an F* score of approximately 0.5. This could indicate that for this dataset the concept annotations more often contain terms of related concepts than of ancestors or descendants.
Looking at dual-combinations between PAD, CAD and RAD we can see a consistent increase in performance. Of these combinations, ProfileAE_{1,1,0} resulted in the lowest F-Measure of 0.47, while ProfileAE_{1,0,1} resulted in the highest F-Measure
1 A cross-lingual mapping problem is defined by the given ontologies being mono-lingual, but
modelled using different natural languages.
κ   λ   µ     P*      R*      F*
---------------------------------
0   0   0    0.418   0.278   0.326
0   0   1    0.657   0.433   0.510
0   0   2    0.630   0.405   0.481
0   1   0    0.500   0.324   0.381
0   1   1    0.675   0.469   0.543
0   1   2    0.666   0.453   0.529
0   2   0    0.512   0.333   0.393
0   2   1    0.688   0.475   0.552
0   2   2    0.678   0.457   0.535
1   0   0    0.521   0.376   0.423
1   0   1    0.667   0.529   0.583
1   0   2    0.659   0.518   0.574
1   1   0    0.594   0.409   0.470
1   1   1    0.691   0.559   0.611
1   1   2    0.688   0.555   0.609
1   2   0    0.601   0.417   0.478
1   2   1    0.699   0.565   0.619
1   2   2    0.695   0.562   0.615
2   0   0    0.523   0.385   0.433
2   0   1    0.674   0.538   0.592
2   0   2    0.661   0.522   0.577
2   1   0    0.591   0.411   0.471
2   1   1    0.690   0.562   0.614
2   1   2    0.685   0.554   0.607
2   2   0    0.597   0.421   0.481
2   2   1    0.698   0.570   0.622
2   2   2    0.692   0.562   0.614
Table 7.2: Aggregated adapted Precision, Recall and F-Measure when evaluating all
variations of our approach on the MultiFarm dataset.
of 0.583. We can also observe that combinations which include a variation of the RAD-set in the enriched profiles typically performed better than combinations that did not.
Lastly, we can observe that using all three types of description sets resulted in the highest measured F* scores. We can see that every combination of PAD, CAD and RAD resulted in an F* score higher than 0.6. The best performing combination was ProfileAE_{2,2,1} with an F* score of 0.622. While this contrasts with the results of subsection 7.4.1, where ProfileAE_{1,1,2} resulted in the highest score, we can observe that the difference in performance between ProfileAE_{2,2,1} and ProfileAE_{1,1,2} is not as distinct for this dataset.
Comparing RAD1 with RAD2 reveals a different trend than subsection 7.4.1.
7.4 — Experiments
161
Here, combinations which utilized RAD1 performed slightly better than combinations which used RAD2 instead. Keeping the results of the benchmark dataset in mind, we thus cannot conclude whether RAD1 or RAD2 will result in a better performance in all cases.
RAD2 in the concept profile does increase the performance of a profile similarity.
7.4.3 Influence of Partial Alignment Size
As detailed in section 7.3, the approach exploits the semantic relations of the provided partial alignment to enrich concept profiles with additional information. Hence,
in order for a profile of a given concept x to be enriched, the partial alignment must
contain a relation which specifies a concept which lies in the semantic neighbourhood
of x. If this is not the case, then the profile of x will remain unchanged. It follows
that, in order for this approach to be effective, the partial alignment must contain a sufficient number of correspondences.
In this subsection we will investigate to what extent our approach is effective when supplied with partial alignments of varying size. To do this, we perform an evaluation similar to subsection 7.4.2. We evaluate our approach on the MultiFarm dataset by sampling random partial alignments from the reference 100 times for each task and aggregating the results. However, for a given combination ProfileAE_{κ,λ,µ} we perform this evaluation using different sizes of the partial alignment, with recall values R(PA, R) ranging from 0.1 to 0.9 in increments of 0.1. The configurations ProfileAE_{1,1,2} and ProfileAE_{2,2,2} were utilized for this evaluation. The results of this experiment can be seen in Table 7.3.
            ProfileAE_{1,1,2}           ProfileAE_{2,2,2}
R(PA, R)     P*      R*      F*          P*      R*      F*
------------------------------------------------------------
0.1         .436    .272    .324        .432    .277    .328
0.2         .529    .339    .403        .524    .345    .406
0.3         .587    .406    .471        .587    .411    .475
0.4         .639    .479    .540        .641    .484    .545
0.5         .688    .556    .609        .692    .561    .614
0.6         .737    .628    .673        .742    .638    .681
0.7         .788    .711    .744        .790    .721    .751
0.8         .837    .794    .811        .833    .806    .816
0.9         .883    .891    .883        .859    .904    .878
Table 7.3: Aggregated adapted Precision, Recall and F-Measure on the MultiFarm
dataset when varying the Recall of the supplied partial alignment.
The main trend which we can observe from Table 7.3 is that ProfileAE_{1,1,2} and ProfileAE_{2,2,2} appear to be similarly affected by the size of PA. For every increment of the partial alignment size the F* scores of ProfileAE_{1,1,2} and ProfileAE_{2,2,2} rise by roughly 0.07. For partial alignment recalls of 0.1 up to 0.8 we can observe that ProfileAE_{2,2,2} performs slightly better than ProfileAE_{1,1,2}. While a difference in F* score of 0.005 could be dismissed as the result of the variation introduced by
sampling the partial alignment, it is noteworthy here since it is consistent for all partial alignment recalls up to 0.8. Interestingly, for the highest recall value of PA, ProfileAE_{1,1,2} performed better than ProfileAE_{2,2,2}. This is the result of ProfileAE_{1,1,2} producing alignments with significantly higher precision at this configuration.
Overall, given the performances of ProfileAE_{1,1,2} and ProfileAE_{2,2,2}, we can conclude that weighting the descriptions of the anchored concepts such that the descriptions of closely related concepts are weighted higher results in a slight increase in performance for most recall levels of PA.
7.4.4 Comparison with Lexical Enrichment Systems
The main goal behind this work is to provide an approach that allows the enrichment of concept profiles by exploiting the relations of a provided partial alignment.
The reason behind this is that current enrichment methods exploit primarily lexical
resources, which rely on the presence of an appropriate resource. In the previous
sections we have established the performance of our approach using varying configurations, datasets, and partial alignment sizes. In this section, we will provide
some interesting context for these results. Specifically, we aim to compare the results of our approach with the performances of matching systems tackling the same
dataset while exploiting lexical resources. This allows us to establish whether an
approach exploiting a partial alignment can produce alignments of similar quality to approaches exploiting lexical resources.
To do this, we will compare our approach to the performances of the OAEI
participants on the MultiFarm dataset (Dragisic et al., 2014). Here we will make
the distinction between approaches utilizing no external resources, lexical resources
and partial alignments. This allows us to see the benefit of exploiting a given type
of external resource.
Furthermore, to provide an upper boundary for the potential performance on this
dataset, we will also evaluate a method utilizing both lexical resources and partial
alignments. To achieve this, we will re-evaluate the best performing configuration
from sub-section 7.4.2. However, the profiles of this re-evaluation will be additionally
enriched by translating the concept labels using the Microsoft Bing translator. This
will provide an indication of how well a system may perform when utilizing both
appropriate lexical resources and partial alignments. The comparison can be seen
in Table 7.4.
From Table 7.4 we can make several observations. First, we can observe that
every system utilizing either lexical resources or partial alignments performs significantly better than systems which do not. This is an expected result given the nature
of this dataset. Of the systems which do not exploit resources, AOT has the highest performance with an F-Measure of 0.12.
Comparing the performance of ProfileAE_{2,2,1} to the performance of systems exploiting only lexical resources reveals an interesting observation. Specifically, we can see that the performance of these systems is comparable. While the performances of LogMap and XMap were lower than that of ProfileAE_{2,2,1}, the performance of AML, with an F-Measure of 0.62, is very close to that of ProfileAE_{2,2,1}. However, AML distinguishes itself from our approach by having a notably higher precision
Matcher                    Lex.   P. Align.   Precision   Recall   F-Measure
----------------------------------------------------------------------------
ProfileAE_{2,2,1} + Bing   yes    yes         0.849       0.838    0.843
AML                        yes    no          0.95        0.48     0.62
LogMap                     yes    no          0.94        0.27     0.41
XMap                       yes    no          0.76        0.40     0.50
ProfileAE_{2,2,1}          no     yes         0.698       0.570    0.622
AOT                        no     no          0.11        0.12     0.12
AOTL                       no     no          0.27        0.01     0.02
LogMap-C                   no     no          0.31        0.01     0.02
LogMapLt                   no     no          0.25        0.01     0.02
MaasMatch                  no     no          0.52        0.06     0.10
RSDLWB                     no     no          0.34        0.01     0.02
Table 7.4: Comparison between the performance of our approach and the competitors of the 2014 OAEI competition on the MultiFarm dataset. Performances of
approaches utilizing partial alignments are denoted in adapted precision, recall and
F-Measure.
and a somewhat lower recall. In fact, all systems utilizing only lexical resources are characterized by a high precision, which implies that enriching ontologies using these resources only rarely leads to false-positive matches in terminology.
Lastly, we can observe the performance of our approach when paired with a lexical resource, specifically the Bing translator. Here the produced alignments reached an F* score of 0.843, which is significantly higher than that of the OAEI participants. This implies that the correct correspondences which lexical-based systems identify differ significantly from those identified by a partial-alignment-based system. From this we can conclude that the two types of resources are complementary for matching problems with significant terminological gaps.
7.5 Chapter Conclusion and Future Work
We end this chapter by summarizing the results of the experiments (Subsection 7.5.1)
and giving an outlook on future research (Subsection 7.5.2) based on the findings
presented.
7.5.1 Chapter Conclusions
This chapter is aimed at answering research question 4, which concerns the matching of ontologies with a significant terminological gap between them, by exploiting a given partial alignment. To answer this question, we developed an extension to
an existing profile similarity, which interprets the semantic relations asserted by
the partial alignment as additional exploitable relations. For the extended concept
profile, we define three different sets of concepts, being the ancestors, descendants
and otherwise related concepts. The extended profile is then created for each concept
by identifying the anchors in each set and exploiting the descriptions of the anchored
concepts.
First, we established the performance of our approach on the Benchmark dataset
in sub-section 7.4.1 and on the MultiFarm dataset in sub-section 7.4.2. The evaluations on both datasets were performed with identical experimental set-ups. Every configuration of our approach was evaluated by repeatedly sampling a partial alignment from the reference alignment and aggregating the results of every evaluation.
For both datasets we observed that our proposed extension significantly improves
the performance of the profile similarity. We observed that exploring the relations
to anchored parent-, child- and otherwise related concepts for a given concept all
contribute to the quality of the output alignment. For both datasets, the top performance was measured in a configuration which exploited all three sets of exploitable
concepts. In a further experiment we investigated the influence of weighting the
added concept descriptions based on their distances to the given concept, for partial
alignments of varying sizes. We observed that weighting the descriptions based on
their distance resulted in a slightly better performance for most partial alignment sizes. However, for very large partial alignments, the uniform weighting
scheme performed better. Given that in a normal mapping scenario it is very unlikely that a given partial alignment exhibits such a significant size, we can conclude
that for a real-world mapping scenario a distance-based weighting scheme is likely
to result in a better performance.
In the final experiment we compared the performance of our approach to the
performances of other systems on the MultiFarm dataset. We established that the
performance of our approach is comparable to the performances of systems exploiting
lexical resources. Additionally, in order to give an indication of performance for
a system exploiting both partial alignments and appropriate lexical resources, we
executed our approach with an added enrichment system utilizing Microsoft Bing.
This addition significantly improved the performance of our approach and resulted
in a significantly higher performance than that of the compared OAEI systems. From this
we can conclude that ontologies separated by a significant terminological gap can
be matched with a high quality if both a partial alignment and a lexical resource are
available.
7.5.2
Future Research
We propose two directions of future research based on our findings presented in this
chapter.
(1) In Subsection 7.4.4 we obtained a performance indication for a mapping system utilizing both lexical resources and partial alignments. A future line of research
would be to investigate to what extent this performance is replicable if a partial
alignment is generated on-the-fly, instead of being given. This research should investigate which mapping techniques produce reliable anchors for matching problems
with significant terminological gaps and to what extent each technique impacts the
subsequent matching performance.
(2) The core process of the proposed approach is the addition of anchored concept
descriptions to an already existing profile. This includes terminology which might
not re-occur in any concept description of either ontology. Future work could focus
on the occurrence rates of terms within either ontology. This could take the form
of filtering out terms which only occur once, or of applying term weighting schemes
such as TF-IDF.
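As a rough sketch of the suggested weighting, TF-IDF could be applied by treating each concept description as one document; the function name and the toy data below are assumptions made for illustration:

```python
import math
from collections import Counter

def tfidf_weights(descriptions):
    """Weight each term of each concept description by TF-IDF,
    treating every description as one document."""
    n_docs = len(descriptions)
    # Document frequency: in how many descriptions does each term occur?
    df = Counter()
    for terms in descriptions.values():
        df.update(set(terms))
    weights = {}
    for concept, terms in descriptions.items():
        tf = Counter(terms)
        weights[concept] = {
            term: tf[term] * math.log(n_docs / df[term])
            for term in tf
        }
    return weights

descriptions = {
    "car":   ["car", "vehicle", "wheel"],
    "bike":  ["bike", "vehicle", "wheel"],
    "plane": ["plane", "vehicle", "wing"],
}
w = tfidf_weights(descriptions)
# "vehicle" occurs in every description, so its weight is zero
print(w["car"])
```

Terms that re-occur across most descriptions, and hence carry little discriminative value for matching, receive weights close to zero and are effectively filtered out.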
Chapter 8
Conclusions and Future
Research
This thesis investigated how auxiliary information can be used in order to enhance
the performance of ontology mapping systems. This led to the formulation of our
problem statement in Section 1.4.
Problem statement How can we improve ontology mapping systems by
exploiting auxiliary information?
To tackle the given problem statement, we focused the performed research on
two types of auxiliary resources: lexical resources and partial alignments. Lexical
resources present a research area with good potential due to their prevalent usage
in existing matching systems, implying that a large group of systems would benefit
from research in this area. Partial alignments pose a good target for research due to
the limited amount of existing work regarding their exploitation. This implies that
partial alignments are under-utilized in current matching systems, such that further
research could enable potential performance gains for systems which do not utilize
this resource yet. We have posed four research questions that need to be answered
before addressing the problem statement.
In this chapter we will present the conclusions of this thesis. In Section 8.1 we
will individually answer the four posed research questions. We formulate an answer
to the problem statement in Section 8.2. Finally, we will present promising directions
of future research in Section 8.3.
8.1
Conclusions on the Research Questions
The research questions stated in Chapter 1 concern the exploitation of auxiliary
resources for ontology matching, i.e. (1) the disambiguation of concept senses for
lexical similarities, (2) the exploitation of partial alignments in a general matching
scenario, (3) the verification of partial alignment correspondences and (4) the exploitation of partial alignments for matching problems with significant terminological
heterogeneities. They are dealt with in the following subsections, respectively.
8.1.1
Concept Disambiguation
Lexical resources are commonly utilized in ontology matching systems to derive
similarity scores between individual ontology concepts. This is achieved in two core
steps. First, for every concept the corresponding entries, also known as senses, within
the given lexical resource need to be identified and associated with that concept.
Second, the concepts are evaluated by comparing their associated senses. A metric
which compares senses may utilize principles such as the semantic distance between
senses in the resource or information-theoretic commonalities. However,
in order for such a metric to produce accurate results, the ontology concepts need
to be associated with senses which accurately reflect their intended meaning. This
issue has led us to the following research question.
Research question 1 How can lexical sense definitions be accurately
linked to ontology concepts?
To answer this question, we proposed a virtual-document-based disambiguation
method. This method utilizes an existing document model from an established profile similarity in order to generate virtual documents capable of representing both ontology concepts and lexical senses. This document model allows for of parametrized
weighting of terms according to their respective origins within the ontology or lexical resource. For example, the terms originating from concept labels may receive
higher weights than the concept annotation terms. Utilizing document similarity
scores between the document of a given concept and the documents of potential
lexical senses, a disambiguation policy is executed which associates a sense with
a concept only if its respective document fulfils a given criterion.
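The disambiguation step can be illustrated with the following sketch. The actual document model and policies are considerably more elaborate; the weighted term vectors, the behaviour of the MAX policy and all names below are simplifying assumptions.

```python
import math
from collections import Counter

def cosine(doc_a, doc_b):
    """Cosine similarity between two weighted term vectors."""
    common = set(doc_a) & set(doc_b)
    dot = sum(doc_a[t] * doc_b[t] for t in common)
    norm = lambda d: math.sqrt(sum(v * v for v in d.values()))
    return dot / (norm(doc_a) * norm(doc_b)) if dot else 0.0

def disambiguate(concept_doc, candidate_senses):
    """MAX policy sketch: associate only the single best-scoring sense."""
    scored = {s: cosine(concept_doc, doc) for s, doc in candidate_senses.items()}
    best = max(scored, key=scored.get)
    return best if scored[best] > 0 else None

# Virtual documents: label terms weighted higher than annotation terms
concept = Counter({"bank": 3, "river": 1, "water": 1})
senses = {
    "bank#1": Counter({"bank": 3, "money": 1, "institution": 1}),
    "bank#2": Counter({"bank": 3, "river": 1, "slope": 1}),
}
print(disambiguate(concept, senses))  # -> "bank#2"
```

The shared annotation term "river" tips the score towards the riverbank sense, mirroring how the virtual documents of concept and sense are compared.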
Using several disambiguation policies, we evaluate the effect of our disambiguation framework on the Conference dataset by applying three different lexical similarities. Here we observe that the application of our disambiguation approach is
beneficial for all tested lexical similarities lsm1 , lsm2 and lsm3 . Further, we observe
that for lsm1 and lsm3 , stricter disambiguation policies produced better results, with
the MAX policy resulting in the highest performance. For lsm2 the disambiguation
policy A-Mean resulted in the highest performance. In a further analysis, we compared the performance of our approach to the performances of existing matching
systems. This has been achieved in two ways: (1) by entering a system based on
our approach in the 2011 OAEI competition and (2) by evaluating our approach on
a 2013 OAEI dataset and comparing the results to the performances of that year’s
participants. These comparisons revealed that a system utilizing our approach can
perform competitively with state-of-the-art systems. This is especially noteworthy
when taking into account the otherwise modest complexity of the tested system.
In a further experiment we quantified the potential benefit of utilizing the term
weighting scheme of the document model and compared its performance to an
information-retrieval-based weighting scheme, namely TF-IDF. For this, we generated two parameter sets by optimizing the weights on the Benchmark and Conference datasets using Tree-Learning-Search. We observe that when trained on the
appropriate dataset the weighting scheme of the document model produces better
results at all recall levels. For recall levels in the interval [0.4, 0.6] we observed
the biggest differences in precision. In our last evaluation, we investigated the effect
that the introduction of a disambiguation procedure has on the runtime. We observe
an added runtime overhead of 3.65%. However, when also taking into account the
gained efficiency due to the reduction of required calculations of sense similarities,
we observe that the addition of a disambiguation procedure improved the runtime of
the lexical similarity. While the amount of runtime improvement depends on the
efficiency of the used lexical similarity, we can conclude that the performance impact
of a disambiguation procedure is small at most, whilst potentially being beneficial
in the right circumstances.
8.1.2
Exploiting Partial Alignments
An additional type of available auxiliary resource is the partial alignment. These are
incomplete alignments stemming from previous matching efforts. An example of such
an effort is a domain expert attempting to match the given ontologies, but being
unable to complete this task due to time constraints. In such a scenario, the core
task is to identify the missing correspondences, such that the merger between the
newly found correspondences and the given partial alignment can form a complete
alignment. To do this, one can use the correspondences of the partial alignments,
also referred to as anchors, in order to aid the matching process. This has led us
to the following research question.
Research question 2 How can we exploit partial alignments in order
to derive concept correspondences?
With the intent of answering the second research question, we developed a technique which matches concepts by comparing their respective similarities towards the
set of anchors. Concepts and anchors are compared by using a composite of different
similarity metrics and their similarities are compiled into a profile for each concept,
referred to as an anchor-profile. The core intuition behind this approach is that two
concepts can be considered similar if they exhibit comparable degrees of similarity
towards the given anchors.
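This intuition can be sketched as follows, with a crude character-overlap similarity standing in for the composite of similarity metrics; all names and data are assumptions made for the example.

```python
import math

def string_sim(a, b):
    """Crude stand-in for the composite similarity: shared-character ratio."""
    sa, sb = set(a.lower()), set(b.lower())
    return len(sa & sb) / len(sa | sb)

def anchor_profile(concept, anchor_concepts):
    """Anchor-profile: the concept's similarities towards each anchor concept."""
    return [string_sim(concept, a) for a in anchor_concepts]

def profile_sim(p1, p2):
    """Cosine similarity between two anchor-profiles."""
    dot = sum(x * y for x, y in zip(p1, p2))
    norm = lambda p: math.sqrt(sum(x * x for x in p))
    return dot / (norm(p1) * norm(p2)) if dot else 0.0

anchors = ["publication", "person", "event"]
p_paper = anchor_profile("paper", anchors)
p_article = anchor_profile("article", anchors)
print(profile_sim(p_paper, p_article))
```

Two concepts from different ontologies are then deemed similar when their anchor-profiles point in a similar direction, i.e. when they relate to the given anchors in a comparable way.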
To evaluate this approach, we introduced the measures of adapted Precision, Recall and F-Measure, referred to as P ∗ , R∗ and F ∗ respectively. Unlike the measures
from which these are derived, they take the presence of an input partial alignment
PA into account. Thus, P ∗ , R∗ and F ∗ accurately express the quality of the additionally computed correspondences with respect to their correctness, completeness
and overall quality. We evaluated the approach on the Benchmark dataset by repeatedly randomly sampling the partial alignments from the reference alignment.
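A plausible reading of the adapted measures, under the assumption that they score only the correspondences found in addition to PA (the thesis's exact definitions may differ in detail), is the following:

```python
def adapted_measures(alignment, reference, pa):
    """P*, R*, F*: precision/recall/F-measure computed after
    removing the input partial alignment PA from both sets."""
    found = alignment - pa    # correspondences the matcher added
    wanted = reference - pa   # correspondences still to be found
    correct = found & wanted
    p = len(correct) / len(found) if found else 0.0
    r = len(correct) / len(wanted) if wanted else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
pa = {("a1", "b1")}                            # given partial alignment
alignment = pa | {("a2", "b2"), ("a5", "b5")}  # matcher output
# p = 0.5, r ≈ 0.33, f ≈ 0.4
print(adapted_measures(alignment, reference, pa))
```

Under this reading, the correct anchors in PA neither inflate precision nor recall, so the scores reflect only the quality of the additionally computed correspondences.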
This evaluation gave us a general performance indication, with an F ∗ score ranging
in the interval of [0.66, 0.78], whilst also showing that the matching performance is
positively influenced by the size of the input partial alignment. An analysis on the
performances on the different tracks reveals that the approach requires a composite
similarity which utilizes various types of information from the ontologies in order to
function in all conditions. A subsequent evaluation revealed that comparing a given
ontology concept c with the anchor concept originating from the other ontology is
preferable to comparing c with the anchor concept originating from the same ontology. Next, we systematically evaluated our approach using a spectrum of settings
of the size and quality of the input partial alignment. This analysis revealed that
while both these factors influence the performance of our approach, the quality of
the partial alignment had a more severe impact on the performance than the size.
From this we conclude that matching systems that generate partial alignments on-the-fly should focus their approach towards ensuring the correctness of the anchors.
Lastly, we compared the quality of the computed correspondences to state-of-the-art
systems. This revealed that for larger partial alignments the quality is on par with
the top systems for this dataset.
8.1.3
Filtering Partial Alignments
The results of answering the previous research question revealed that matching approaches utilizing partial alignments are influenced both by the size and correctness
of the given partial alignments, with their correctness being a more influential factor.
Hence, it is important that the partial alignment is evaluated such that incorrect
correspondences can be discarded prior to matching. This led us to formulating the
following research question.
Research question 3 How can we evaluate whether partial alignment
correspondences are reliable?
To tackle the third research question, we created a technique for the evaluation of anchors using feature selection techniques from the field of machine learning.
We create a feature-space where every feature corresponds to a given anchor. We
populate this feature space with labelled instances by generating reliable correspondences on-the-fly. An instance represents a series of consistency evaluations towards
each anchor. The intuition behind this is that correct anchors are expected to
produce predictable consistency evaluations for the instance set. We measure this
predictability by applying a feature evaluation metric on each feature. We evaluate
our approach on the Conference dataset by sampling the partial alignments, utilizing
six different feature evaluation metrics and three different base similarities, which
are used to compute the consistency scores for each instance. For each configuration,
we compute the precision vs recall scores by ranking the anchors according to their
evaluation scores. We compare these to three baseline rankings, which are created
by ranking the anchors using purely the similarity scores of the base similarities. We
observe that when utilizing a given base similarity, our approach can improve upon
the performance of each corresponding base similarity. The most significant improvements were observed when utilizing a syntactic or a lexical similarity. For the lexical
similarity, the observed improvements were more significant at lower recall levels,
with an increase of interpolated precision of up to 0.12. For the syntactic similarity
the observed improvements were fairly consistent for all recall levels, with increases
in interpolated precision typically falling in the interval of [0.035, 0.057] above the
baseline, resulting in a precision of 0.857 for the highest performing metric. For
the lexical similarity, the baseline interpolated precision of approximately 0.821 for
most recall levels was improved on by all tested metrics. The best performing metric
resulted in an interpolated precision of 0.942, an improvement of 0.121. We observe
that for base similarities for which significant improvements were observed, feature-evaluation metrics utilizing class separability scores typically resulted in a better
performance, particularly Thornton’s Separability Index. We conclude that our approach presents a promising way of improving the utilization of existing similarity
metrics for the evaluation of partial alignment correspondences.
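The anchor-evaluation idea can be sketched with a simple class-separability score standing in for the feature evaluation metrics used in the thesis (such as Thornton's Separability Index); the instance data and all names below are invented for illustration.

```python
def feature_score(values, labels):
    """Toy class-separability score for one feature (one anchor):
    squared distance between class means over within-class spread."""
    pos = [v for v, l in zip(values, labels) if l]
    neg = [v for v, l in zip(values, labels) if not l]
    mean = lambda xs: sum(xs) / len(xs)
    var = lambda xs, m: sum((x - m) ** 2 for x in xs) / len(xs)
    mp, mn = mean(pos), mean(neg)
    spread = var(pos, mp) + var(neg, mn) or 1e-9
    return (mp - mn) ** 2 / spread

# Rows: generated labelled instances; columns: consistency towards each anchor
instances = [
    # anchor1, anchor2
    (0.9, 0.4),
    (0.8, 0.9),
    (0.2, 0.5),
    (0.1, 0.8),
]
labels = [True, True, False, False]  # correctness of each instance

# Rank anchors: a high score means the anchor evaluates instances predictably
for i in range(2):
    print(f"anchor{i + 1}:", feature_score([row[i] for row in instances], labels))
```

In this toy data, anchor1 produces consistency evaluations that cleanly separate correct from incorrect instances and thus scores highly, whereas anchor2 behaves unpredictably and would be ranked as a suspect anchor.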
8.1.4
Matching Terminologically Heterogeneous Ontologies
A challenging category of matching tasks are ontology pairs between which there
exists no or an at most small terminological overlap. This causes many similarity
metrics to produce unsatisfactory results. Typically, this issue is mitigated through
the exploitation of lexical resources by enriching the ontologies with additional terminology. However, this approach poses several issues, for example the availability
of an appropriate resource for each problem. A way to circumvent these issues could
be the exploitation of partial alignments instead of exploiting lexical resources. This
led us to the following research question.
Research question 4 To what extent can partial alignments be used in
order to bridge a large terminological gap between ontologies?
With the goal of answering research question 4, we developed an extension to a
typical profile similarity. This extension interprets the semantic relations asserted
in the partial alignment as additional relation types which the profile similarity can
exploit. The core intuition is that natural languages often re-use concept terms when
referring to or defining more specific concepts. Thus, by extending a profile similarity
through the exploitation of the added relations we can identify these re-occurrences
of terminology and produce a better matching result. We refer to this extended
profile as the anchor-enriched profile.
The evaluation of our approach requires a dataset consisting of matching tasks
that are characterized by large terminological gaps. Hence, we evaluate our approach
on a sub-set of the Benchmark dataset and the MultiFarm dataset. We repeatedly
randomly sample the partial alignments from the reference alignment and compute
an aggregate result for each evaluation. Further, the extension of the profile similarity is partitioned into three sets of descriptions, being the descriptions that have been
gathered by exploring anchors that are ancestors, descendants or otherwise related
concepts. We refer to these three sets as PAD, CAD and RAD respectively, with
subscripts 1 and 2 denoting whether the terms in the descriptions are weighted uniformly or proportionally to their semantic distance to the given concept. This allows
us to see their individual effects on the mapping result and determine what configuration best suits the datasets. Our evaluation revealed that the addition of each
description set benefited the performance of the profile similarity for both datasets.
Additionally, for both datasets the highest performance was measured when utilizing
all three types of description sets. However, while for the Benchmark-subset dataset
the uniform weighting of terms produced better results, for the MultiFarm dataset
the semantic-distance-based weighting was the preferable option. Overall, for the
Benchmark-subset dataset we observed an improvement in F-Measure from 0.119
to 0.482 and for the MultiFarm dataset we observed an improvement in F-Measure
from 0.326 to 0.622.
We further investigated the difference in performance between uniform and proportional weighting on the MultiFarm dataset by analysing the performances for
different sizes of partial alignments. This comparison revealed that the difference in
performance is consistent up to partial alignment recall values R(PA, R) of 0.8. For
R(PA, R) values of 0.9 the uniform weighting method performed better. However,
in real-world matching scenarios it is very unlikely that the given partial alignment
exhibits a recall measure of 0.9. From this we conclude that for real-world matching
cases a semantic-distance-based weighting scheme is preferable.
Finally, we compared the performance of our approach with the performances
of other matching systems on the MultiFarm dataset. Here, we make the distinction between systems utilizing no auxiliary resources, partial alignments or lexical
resources. This comparison revealed that our approach performs significantly better than established matching systems utilizing no auxiliary resources and on par
with AML, the top performing system utilizing an appropriate lexical resource. Furthermore, we re-evaluated our approach while also enriching the ontologies using an
appropriate lexical resource, specifically Microsoft Bing translator. This provides
a performance indication for a system utilizing both partial alignments and lexical
resources. The resulting performance was characterized with an F-Measure of 0.843,
a significant improvement compared to the performance of our approach without
using Bing (F-Measure of 0.622) and the performance of AML (F-Measure of 0.62).
From this we conclude that there is a significant performance potential for systems
utilizing both types of auxiliary resources when faced with significant terminological
gaps.
8.2
Conclusion to Problem Statement
After answering the four stated research questions, we are now able to provide an
answer to the problem statement.
Problem statement How can we improve ontology mapping systems by
exploiting auxiliary information?
Taking the answers to the research questions into account we can see that there
are numerous ways in which auxiliary information can be exploited to the benefit
of ontology mapping systems. First, lexical resources can be utilized for the computation of semantic distances between concepts, where the accurate identification
of concept senses can be achieved using a virtual-document-based disambiguation
process. Second, partial alignments can be exploited by creating a profile of similarities to the anchor concepts for each unmatched concept and comparing the resulting
profiles. Third, our feature-evaluation-based approach improves upon the performance of a normal similarity metric with regard to ensuring the correctness of the
correspondences of input partial alignments. Fourth, the performance on mapping
problems with significant terminological gaps can be improved upon by extending
profile similarities such that they also exploit the asserted relations of a provided
partial alignment.
8.3
Recommendations for Future Research
The research presented in this thesis indicates the following areas of interest for
future research:
1. Improving Concept Sense Disambiguation. We identify three directions
in which research can be performed to improve the performance of the disambiguation approach of Chapter 4:
(a) The presented disambiguation method relies on the co-occurrences of exact terms for the determination of candidate senses and the sense similarity computation. Naming anomalies, such as spelling errors or nonstandard syntax of compound words, can lead to significant issues. Next,
after resolving naming anomalies the method needs to determine which
senses best describe a given concept. For a sense to receive a higher similarity score, its annotation needs to contain exact terminology which also
occurs in the annotation of the given concept. If the two annotations refer
to the same entity using synonymous terms or a different syntax, then the
result is that the similarity score of the ideal sense is not increased. Future research could aim to resolve these issues through the application of
synonym extraction, spell-checking or soft document-similarity metrics.
(b) An alternative to complement the shortcomings of our approach is by
combining the results of multiple disambiguation techniques. This can
be achieved by selecting the most appropriate disambiguation approach
based on a heuristic evaluation or learning approach, or by combining the
results of all given disambiguation approaches.
(c) Methods of exploiting multiple lexical sources should be investigated. Instead of only utilizing the definitions provided by WordNet,
one can also query other lexical sources such as Wikipedia or exploit the
results of internet search engines such as Google or Bing.
2. Improving Anchor Profiles. The evaluation on the Benchmark dataset
revealed that the robustness of the approach is influenced by the choice of
(compound-)similarity for the role as anchor similarity. Future research could
be aimed at determining the best choice of similarities with regard to both
matching quality in real-life matching cases and overall robustness. Additionally, the applicability of the approach can be widened by researching
methods of generating reliable anchors during runtime for matching problems
that do not contain partial alignments.
3. Improving Anchor Evaluation Techniques. In Chapter 6 we described
two core steps that are required for the filtering of possibly incorrect anchors:
the anchor-evaluation step and the filter-policy step. The work of Chapter 6
presents an approach for the anchor-evaluation step, the result of which being
a set of scores S. We propose that future work should be aimed at realizing
the filter-policy step. Here, possible approaches need to be investigated which
take as input the two ontologies O1 and O2 , the partial alignment PA and the
set of scores S, such that they produce a filtered partial alignment PA′ . This
would allow us to measure the actual benefit of a filtering approach by comparing the alignment quality of an anchor-based mapping approach before and
after executing the filtering procedure. Additionally, we suggest addressing the
robustness of the approach by testing future improvements on the Benchmark
dataset.
4. Improving Anchor-Enriched Profiles. The performance indication of a
system utilizing both lexical resources and partial alignments was created using
the provided partial alignments of the experimental set-up. Future research
should investigate whether this performance is replicable for systems which
generate partial alignments during runtime. This would indicate whether
the approach is applicable for terminologically heterogeneous ontologies between
which there does not exist a partial alignment.
References
Aas, Kjersti and Eikvil, Line (1999). Text categorisation: A survey. Raport NR,
Vol. 941. [149]
Abeel, Thomas, Peer, Yves Van de, and Saeys, Yvan (2009). Java-ML: A machine
learning library. Journal of Machine Learning Research, Vol. 10, pp. 931–934.
[140]
Aguirre, José Luis, Cuenca Grau, Bernardo, Eckert, Kai, Euzenat, Jérôme, Ferrara,
Alfio, van Hage, Willem Robert, Hollink, Laura, Jimenez-Ruiz, Ernesto, Meilicke, Christian, Nikolov, Andriy, Ritze, Dominique, Scharffe, François, Shvaiko,
Pavel, Sváb-Zamazal, Ondrej, Trojahn, Cássia, and Zapilko, Benjamin (2012).
Results of the Ontology Alignment Evaluation Initiative 2012. Proc. of the 7th
ISWC workshop on ontology matching, pp. 73–115. [47, 48, 73, 121, 129, 156]
Aldea, Arantza, López, Beatriz, Moreno, Antonio, Riaño, David, and Valls, Aı̈da
(2001). A multi-agent system for organ transplant co-ordination. Artificial
intelligence in medicine, pp. 413–416. Springer. [18]
Aleksovski, Zarko (2008). Using background knowledge in ontology matching. Ph.D.
thesis, Vrije Universiteit Amsterdam. [21]
Androutsellis-Theotokis, Stephanos and Spinellis, Diomidis (2004). A survey of peer-to-peer content distribution technologies. ACM Computing Surveys (CSUR),
Vol. 36, No. 4, pp. 335–371. [10, 11]
Arens, Yigal, Knoblock, Craig A, and Shen, Wei-Min (1996). Query reformulation
for dynamic information integration. Springer. [5, 8]
Aumueller, David, Do, Hong-Hai, Massmann, Sabine, and Rahm, Erhard (2005).
Schema and ontology matching with COMA++. Proceedings of the 2005 ACM
SIGMOD international conference on Management of data, pp. 906–908, ACM.
[20, 63, 77, 116]
Banek, Marko, Vrdoljak, Boris, and Tjoa, A Min (2008). Word sense disambiguation
as the primary step of ontology integration. Database and Expert Systems
Applications, pp. 65–72, Springer. [94]
Banerjee, Satanjeev and Pedersen, Ted (2003). Extended gloss overlaps as a measure
of semantic relatedness. Proceedings of the 18th international joint conference
on Artificial intelligence, IJCAI’03, pp. 805–810, San Francisco, CA, USA. [90]
Bar-Hillel, Yehoshua (1960). The present status of automatic translation of languages. Advances in computers, Vol. 1, pp. 91–163. [88]
Batini, Carlo and Lenzerini, Maurizio (1984). A methodology for data schema integration in the entity relationship model. Software Engineering, IEEE Transactions on, Vol. SE-10, No. 6, pp. 650–664. [3]
Batini, Carlo, Lenzerini, Maurizio, and Navathe, Shamkant B. (1986). A comparative
analysis of methodologies for database schema integration. ACM computing
surveys (CSUR), Vol. 18, No. 4, pp. 323–364. [3]
Bernstein, Philip A. and Rahm, Erhard (2000). Data warehouse scenarios for model
management. Conceptual Modeling—ER 2000, pp. 1–15. Springer. [5]
Bodenreider, Olivier (2004). The unified medical language system (UMLS): integrating biomedical terminology. Nucleic acids research, Vol. 32, No. suppl 1,
pp. D267–D270. [62, 86, 151]
Bollacker, Kurt, Evans, Colin, Paritosh, Praveen, Sturge, Tim, and Taylor, Jamie
(2008). Freebase: a collaboratively created graph database for structuring human knowledge. Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pp. 1247–1250, ACM. [86]
Börner, Katy, Sanyal, Soma, and Vespignani, Alessandro (2007). Network science.
Annual review of information science and technology, Vol. 41, No. 1, pp. 537–
607. [12]
Bouquet, Paolo, Serafini, Luciano, and Zanobini, Stefano (2003). Semantic coordination: a new approach and an application. The Semantic Web-ISWC 2003,
pp. 130–145. Springer. [5, 7]
Bouquet, Paolo, Serafini, Luciano, Zanobini, Stefano, and Sceffer, Simone (2006).
Bootstrapping semantics on the web: meaning elicitation from schemas. Proceedings of the 15th international conference on World Wide Web, pp. 505–512,
ACM. [64]
Brabham, Daren C. (2008). Crowdsourcing as a model for problem solving: an introduction and cases. Convergence: the international journal of research into new
media technologies, Vol. 14, No. 1, pp. 75–90. [23]
Broeck, Guy Van den and Driessens, Kurt (2011). Automatic discretization of
actions and states in Monte-Carlo tree search. Proceedings of the International Workshop on Machine Learning and Data Mining in and around Games
(DMLG), pp. 1–12. [107]
Brunnermeier, Smita B. and Martin, Sheila A. (2002). Interoperability costs in the
US automotive supply chain. Supply Chain Management: An International
Journal, Vol. 7, No. 2, pp. 71–82. [203]
Budanitsky, Alexander and Hirst, Graeme (2001). Semantic distance in WordNet:
An experimental, application-oriented evaluation of five measures. Workshop
on WordNet and other lexical resources, second meeting of the North American
Chapter of the Association for Computational Linguistics, pp. 29–34. [21, 85,
90, 96]
Budanitsky, Alexander and Hirst, Graeme (2006). Evaluating wordnet-based measures of lexical semantic relatedness. Computational Linguistics, Vol. 32, No. 1,
pp. 13–47. [21]
Buitelaar, Paul, Cimiano, Philipp, Haase, Peter, and Sintek, Michael (2009). Towards Linguistically Grounded Ontologies. The Semantic Web: Research and
Applications, Vol. 5554 of Lecture Notes in Computer Science, pp. 111–125.
Springer Berlin / Heidelberg. ISBN 978–3–642–02120–6. [91]
Bunke, Horst (2000). Graph matching: Theoretical foundations, algorithms, and
applications. Proc. Vision Interface, Vol. 2000, pp. 82–88. [61]
Bussler, Christoph, Fensel, Dieter, and Maedche, Alexander (2002). A conceptual
architecture for semantic web enabled web services. ACM Sigmod Record,
Vol. 31, No. 4, pp. 24–29. [13]
Calvanese, Diego, De Giacomo, Giuseppe, Lenzerini, Maurizio, Nardi, Daniele, and
Rosati, Riccardo (1998). Information integration: Conceptual modeling and
reasoning support. Cooperative Information Systems, 1998. Proceedings. 3rd
IFCIS International Conference on, pp. 280–289, IEEE. [5, 8]
Cao, Bu-Qing, Li, Bing, and Xia, Qi-Ming (2009). A service-oriented qos-assured
and multi-agent cloud computing architecture. Cloud Computing, pp. 644–649.
Springer. [18]
Caraciolo, Caterina, Euzenat, Jérôme, Hollink, Laura, Ichise, Ryutaro, Isaac, Antoine, Malaisé, Véronique, Meilicke, Christian, Pane, Juan, Shvaiko, Pavel,
Stuckenschmidt, Heiner, et al. (2008). Results of the ontology alignment evaluation initiative 2008. Proc. 3rd ISWC workshop on ontology matching (OM),
pp. 73–119. [45]
Chavez, Anthony, Moukas, Alexandros, and Maes, Pattie (1997). Challenger: A
multi-agent system for distributed resource allocation. Proceedings of the first
international conference on Autonomous agents, pp. 323–331, ACM. [18]
Cheatham, Michelle (2011). MapSSS Results for OAEI 2011. Proceedings of The
Sixth ISWC International Workshop on Ontology Matching (OM). [76]
Cheatham, Michelle (2013). StringsAuto and MapSSS results for OAEI 2013. Proceedings of The Eighth ISWC International Workshop on Ontology Matching (OM). [71, 76]
178
References
Chen, Siqi and Weiss, Gerhard (2012). An Efficient and Adaptive Approach to
Negotiation in Complex Environments. ECAI’2012, pp. 228–233, IOS Press.
[18]
Coalition, DAML-S, Ankolekar, Anupriya, Burstein, Mark, Hobbs, Jerry R., Lassila, Ora, Martin, David, McDermott, Drew, McIlraith, Sheila A., Narayanan,
Srini, and Paolucci, Massimo (2002). DAML-S: Web service description for the
semantic Web. The Semantic Web-ISWC, pp. 348–363, Springer. [14]
Cohen, William, Ravikumar, Pradeep, and Fienberg, Stephen (2003). A comparison
of string metrics for matching names and records. KDD Workshop on Data
Cleaning and Object Consolidation, Vol. 3, pp. 73–78. [58]
Croft, W Bruce, Metzler, Donald, and Strohman, Trevor (2009). Search Engines:
Information Retrieval in Practice. Addison-Wesley Publishing Company, USA,
1st edition. ISBN 0136072240, 9780136072249. [100]
Cruz, Isabel F., Sunna, William, and Chaudhry, Anjli (2004). Semi-automatic ontology alignment for geospatial data integration. Geographic Information Science,
pp. 51–66. Springer. [25]
Cruz, Isabel F, Antonelli, Flavio Palandri, and Stroe, Cosmin (2009). AgreementMaker: efficient matching for large real-world schemas and ontologies. Proc.
VLDB Endow., Vol. 2, No. 2, pp. 1586–1589. ISSN 2150–8097. [70, 73, 148,
149]
Cruz, Isabel F., Fabiani, Alessio, Caimi, Federico, Stroe, Cosmin, and Palmonari,
Matteo (2012). Automatic configuration selection using ontology matching task
profiling. The Semantic Web: Research and Applications, pp. 179–194. Springer.
[21]
Cruz, Isabel F., Palmonari, Matteo, Caimi, Federico, and Stroe, Cosmin (2013).
Building linked ontologies with high precision using subclass mapping discovery.
Artificial Intelligence Review, Vol. 40, No. 2, pp. 127–145. [74, 86, 87, 91]
d’Aquin, Mathieu and Lewen, Holger (2009). Cupboard–a place to expose your
ontologies to applications and the community. The Semantic Web: Research
and Applications, pp. 913–918. Springer. [24]
De Melo, Gerard and Weikum, Gerhard (2009). Towards a universal wordnet by
learning from combined evidence. Proceedings of the 18th ACM conference on
Information and knowledge management, pp. 513–522, ACM. [86]
Ding, Ying, Korotkiy, M, Omelayenko, Borys, Kartseva, V, Zykov, V, Klein, Michel,
Schulten, Ellen, and Fensel, Dieter (2002). Goldenbullet: Automated classification of product data in e-commerce. Proceedings of the 5th International
Conference on Business Information Systems. [5, 7]
Dodge, Yadolah (2008). Pooled Variance. The Concise Encyclopedia of Statistics,
pp. 427–428. Springer New York. ISBN 978–0–387–31742–7. [121]
Do, Hong-Hai and Rahm, Erhard (2002). COMA: a system for flexible combination
of schema matching approaches. Proceedings of the 28th international conference on Very Large Data Bases, pp. 610–621, VLDB Endowment. [20, 69,
77]
Dragisic, Zlatan, Eckert, Kai, Euzenat, Jérôme, Faria, Daniel, Ferrara, Alfio,
Granada, Roger, Ivanova, Valentina, Jimenez-Ruiz, Ernesto, Kempf, Andreas,
Lambrix, Patrick, et al. (2014). Results of the Ontology Alignment Evaluation
Initiative 2014. International Workshop on Ontology Matching, pp. 61–104.
[157, 159, 162]
Duda, Richard O., Hart, Peter E., and Stork, David G. (1999). Pattern classification.
John Wiley & Sons. [66]
Ehrig, Marc (2006). Ontology alignment: bridging the semantic gap, Vol. 4. Springer.
[31, 58]
Ehrig, Marc and Euzenat, Jérôme (2005). Relaxed precision and recall for ontology
matching. Proc. K-Cap 2005 workshop on Integrating ontology, pp. 25–32. [42,
43, 44]
Ehrig, Marc and Staab, Steffen (2004). QOM–quick ontology mapping. The Semantic Web–ISWC 2004, pp. 683–697. Springer. [20, 69, 76]
Ehrig, Marc and Sure, York (2004). Ontology mapping–an integrated approach. The
Semantic Web: Research and Applications, pp. 76–91. Springer. [67, 69, 76, 84]
Ehrig, Marc, Schmitz, Christoph, Staab, Steffen, Tane, Julien, and Tempich,
Christoph (2004). Towards evaluation of peer-to-peer-based distributed knowledge management systems. Agent-Mediated Knowledge Management, pp. 73–
88. Springer. [10]
Elfeky, Mohamed G., Verykios, Vassilios S., and Elmagarmid, Ahmed K. (2002).
TAILOR: A record linkage toolbox. Data Engineering, 2002. Proceedings. 18th
International Conference on, pp. 17–28, IEEE. [60]
Euzenat, Jérôme (2001). Towards a principled approach to semantic interoperability.
Proceedings of the IJCAI-01 Workshop on Ontologies and Information Sharing,
pp. 19–25. [27]
Euzenat, Jérôme (2004a). An API for ontology alignment. The Semantic Web–ISWC
2004, pp. 698–712. Springer. [24]
Euzenat, Jérôme (2004b). Introduction to the EON Ontology alignment contest.
Proc. 3rd ISWC2004 workshop on Evaluation of Ontology-based tools (EON),
pp. 47–50. [46]
Euzenat, Jérôme (2005). Alignment infrastructure for ontology mediation and other
applications. Proc. 1st ICSOC international workshop on Mediation in semantic
web services, pp. 81–95. [23, 24]
Euzenat, Jérôme and Shvaiko, Pavel (2007). Ontology Matching, Vol. 18. Springer
Berlin. [31, 56, 57, 58, 61, 62, 64, 65, 69, 73, 87, 149]
Euzenat, Jérôme, Stuckenschmidt, Heiner, Yatskevich, Mikalai, et al. (2005). Introduction to the ontology alignment evaluation 2005. Proc. K-Cap 2005 workshop
on Integrating ontology, pp. 61–71. [46]
Euzenat, Jérôme, Ferrara, Alfio, Hollink, Laura, Isaac, Antoine, Joslyn, Cliff,
Malaisé, Véronique, Meilicke, Christian, Nikolov, Andriy, Pane, Juan, Sabou,
Marta, et al. (2009a). Results of the ontology alignment evaluation initiative
2009. Proc. 4th ISWC workshop on ontology matching (OM), pp. 73–126. [48]
Euzenat, Jérôme, Ferrara, Alfio, Hollink, Laura, Isaac, Antoine, Joslyn, Cliff,
Malaisé, Véronique, Meilicke, Christian, Nikolov, Andriy, Pane, Juan, Sabou,
Marta, et al. (2009b). Results of the ontology alignment evaluation initiative
2009. Proc. 4th ISWC workshop on ontology matching (OM), pp. 73–126. [49]
Euzenat, J., Ferrara, A., Meilicke, C., Pane, J., Scharffe, F., Shvaiko, P., Stuckenschmidt, H., Svab-Zamazal, O., Svatek, V., and Trojahn, C. (2010). First
Results of the Ontology Alignment Evaluation Initiative 2010. Proceedings of
ISWC Workshop on OM, pp. 85–117. [20, 39, 49]
Euzenat, Jérôme, Ferrara, Alfio, Hage, Willem Robert van, Hollink, Laura, Meilicke, Christian, Nikolov, Andriy, Scharffe, Francois, Shvaiko, Pavel, Stuckenschmidt, Heiner, Svab-Zamazal, Ondrej, and Santos, Cássia Trojahn dos (2011a).
Results of the ontology alignment evaluation initiative 2011. Proc. 6th ISWC
workshop on ontology matching (OM), pp. 85–110. [49]
Euzenat, Jérôme, Meilicke, Christian, Stuckenschmidt, Heiner, Shvaiko, Pavel, and
Trojahn, Cássia (2011b). Ontology alignment evaluation initiative: Six years of
experience. Journal on Data Semantics XV, pp. 158–192. Springer. [20, 46, 101]
Euzenat, J., Ferrara, A., Hage, R.W. van, Hollink, L., Meilicke, C., Nikolov, A.,
Scharffe, F., Shvaiko, P., Stuckenschmidt, H., Svab-Zamazal, O., and Santos, C.
Trojahn dos (2011c). Results of the Ontology Alignment Evaluation Initiative
2011. Proc. 6th ISWC workshop on ontology matching (OM), Bonn (DE), pp.
85–110. [101, 103, 105]
Falconer, Sean M and Storey, Margaret-Anne (2007). A cognitive support framework
for ontology mapping. The Semantic Web, pp. 114–127. Springer. [22]
Fan, Wenfei, Li, Jianzhong, Ma, Shuai, Wang, Hongzhi, and Wu, Yinghui (2010).
Graph homomorphism revisited for graph matching. Proceedings of the VLDB
Endowment, Vol. 3, Nos. 1–2, pp. 1161–1172. [61]
Faria, Daniel, Pesquita, Catia, Santos, Emanuel, Palmonari, Matteo, Cruz, Isabel F,
and Couto, Francisco M (2013). The agreementmakerlight ontology matching
system. On the Move to Meaningful Internet Systems: OTM 2013 Conferences,
pp. 527–541, Springer. [20, 73]
Ferrara, Alfio, Lorusso, Davide, Montanelli, Stefano, and Varese, Gaia (2008). Towards a benchmark for instance matching. The 7th International Semantic Web
Conference, pp. 37–48. [14]
Ferrucci, David, Brown, Eric, Chu-Carroll, Jennifer, Fan, James, Gondek, David,
Kalyanpur, Aditya A, Lally, Adam, Murdock, J William, Nyberg, Eric, Prager,
John, et al. (2010). Building Watson: An overview of the DeepQA project. AI
magazine, Vol. 31, No. 3, pp. 59–79. [17]
Finin, Tim, Fritzson, Richard, McKay, Don, and McEntire, Robin (1994). KQML
as an agent communication language. Proceedings of the third international
conference on Information and knowledge management, pp. 456–463, ACM.
[18]
FIPA, TCC (2008). FIPA Communicative Act Library Specification. Foundation for
Intelligent Physical Agents, http://www.fipa.org/specs/fipa00037/SC00037J.html
(30.6.2004). [18]
Fisher, Ronald A. (1936). The use of multiple measurements in taxonomic problems.
Annals of Eugenics, Vol. 7, No. 2, pp. 179–188. ISSN 2050–1439. [140]
Flannery, Brian P., Press, William H., Teukolsky, Saul A., and Vetterling, William
(1992). Numerical recipes in C. Press Syndicate of the University of Cambridge,
New York. [139]
Gale, David and Shapley, Lloyd S. (1962). College Admissions and the Stability of
Marriage. American Mathematical Monthly, Vol. 69, No. 1, pp. 9–15. [71]
Gale, William A., Church, Kenneth W., and Yarowsky, David (1992). A Method for
Disambiguating Word Senses in a Large Corpus. Computers and the Humanities, Vol. 26, No. 5/6, pp. 415–439. [88]
Gallaher, Michael P., O’Connor, Alan C., Dettbarn, John L., and Gilday, Linda T.
(2004). Cost analysis of inadequate interoperability in the US capital facilities
industry. National Institute of Standards and Technology (NIST). [202]
Gangemi, Aldo, Guarino, Nicola, Masolo, Claudio, and Oltramari, Alessandro
(2003). Sweetening wordnet with dolce. AI magazine, Vol. 24, No. 3, p. 13.
[62]
Gao, Jian-Bo, Zhang, Bao-Wen, and Chen, Xiao-Hua (2015). A WordNet-based semantic similarity measurement combining edge-counting and information content theory. Engineering Applications of Artificial Intelligence, Vol. 39, pp.
80–88. [95]
Giunchiglia, Fausto and Shvaiko, Pavel (2003). Semantic matching. The Knowledge
Engineering Review, Vol. 18, No. 03, pp. 265–280. [64]
Giunchiglia, Fausto and Yatskevich, Mikalai (2004). Element Level Semantic Matching. Meaning Coordination and Negotiation workshop (MCN-04), collocated at
ISWC-2004. [85]
Giunchiglia, Fausto, Shvaiko, Pavel, and Yatskevich, Mikalai (2004). S-Match: an
algorithm and an implementation of semantic matching. ESWS, Vol. 3053, pp.
61–75, Springer. [74]
Giunchiglia, Fausto, Yatskevich, Mikalai, and Shvaiko, Pavel (2007). Semantic
matching: Algorithms and implementation. Journal on Data Semantics IX,
pp. 1–38. Springer. [19]
Giunchiglia, Fausto, Yatskevich, Mikalai, Avesani, Paolo, and Shivaiko, Pavel (2009).
A large dataset for the evaluation of ontology matching. The Knowledge Engineering Review, Vol. 24, No. 02, pp. 137–157. [38]
Gligorov, Risto, Kate, Warner ten, Aleksovski, Zharko, and Harmelen, Frank van
(2007). Using Google distance to weight approximate ontology matches. Proceedings of the 16th international conference on World Wide Web, pp. 767–776,
ACM. [20]
Grau, Bernardo Cuenca, Dragisic, Zlatan, Eckert, Kai, Euzenat, Jérôme, Ferrara,
Alfio, Granada, Roger, Ivanova, Valentina, Jiménez-Ruiz, Ernesto, Kempf, Andreas Oskar, Lambrix, Patrick, et al. (2013). Results of the Ontology Alignment
Evaluation Initiative 2013. Proc. 8th ISWC workshop on ontology matching
(OM), pp. 61–100. [20, 21, 46, 47, 48, 50, 73, 105, 140]
Gross, Anika, Hartung, Michael, Kirsten, Toralf, and Rahm, Erhard (2012).
GOMMA results for OAEI 2012. Ontology Matching Workshop. International
Semantic Web Conference. [20]
Gruber, Thomas R. (1993). A translation approach to portable ontology specifications. Knowledge acquisition, Vol. 5, No. 2, pp. 199–220. [1]
Gulić, Marko and Vrdoljak, Boris (2013). CroMatcher: Results for OAEI 2013. Proceedings of The Eighth ISWC International Workshop on Ontology Matching,
pp. 117–122. [72, 77, 148]
Guyon, Isabelle and Elisseeff, André (2003). An introduction to variable and feature
selection. Journal of Machine Learning Research, Vol. 3, pp. 1157–1182. [138]
Halevy, Alon Y., Ashish, Naveen, Bitton, Dina, Carey, Michael, Draper, Denise, Pollock, Jeff, Rosenthal, Arnon, and Sikka, Vishal (2005). Enterprise information
integration: successes, challenges and controversies. Proceedings of the 2005
ACM SIGMOD international conference on Management of data, pp. 778–787,
ACM. [5, 8]
Halevy, Alon, Rajaraman, Anand, and Ordille, Joann (2006). Data integration: the
teenage years. Proceedings of the 32nd international conference on Very large
data bases, pp. 9–16, VLDB Endowment. [5, 8]
Hamming, Richard W. (1950). Error detecting and error correcting codes. Bell
System technical journal, Vol. 29, No. 2, pp. 147–160. [60]
Hepp, Martin and Roman, Dumitru (2007). An Ontology Framework for Semantic
Business Process Management. Wirtschaftsinformatik (1), pp. 423–440. [10]
Hepp, Martin, Bachlechner, Daniel, and Siorpaes, Katharina (2006). OntoWiki:
community-driven ontology engineering and ontology usage based on Wikis.
Proceedings of the 2006 international symposium on Wikis, pp. 143–144, ACM.
[8]
Hepp, Martin, Leukel, Joerg, and Schmitz, Volker (2007). A quantitative analysis of
product categorization standards: content, coverage, and maintenance of eCl@ss,
UNSPSC, eOTD, and the RosettaNet Technical Dictionary. Knowledge and
Information Systems, Vol. 13, No. 1, pp. 77–114. [8]
Hertling, Sven and Paulheim, Heiko (2012a). WikiMatch–Using Wikipedia for Ontology Matching. Proceedings of The Seventh ISWC International Workshop
on Ontology Matching (OM), p. 37, Citeseer. [77]
Hertling, Sven and Paulheim, Heiko (2012b). WikiMatch results for OAEI 2012.
Proceedings of The Seventh ISWC International Workshop on Ontology Matching (OM), pp. 220–225. [77]
Hindle, Donald and Rooth, Mats (1993). Structural ambiguity and lexical relations.
Computational linguistics, Vol. 19, No. 1, pp. 103–120. [90]
Hsu, Feng-Hsiung (2002). Behind Deep Blue: Building the computer that defeated
the world chess champion. Princeton University Press. [17]
Hu, Wei and Qu, Yuzhong (2008). Falcon-AO: A practical ontology matching system.
Web Semantics: Science, Services and Agents on the World Wide Web, Vol. 6,
No. 3, pp. 237–239. [75]
Ichise, Ryutaro (2008). Machine learning approach for ontology mapping using
multiple concept similarity measures. Proceedings of the Seventh IEEE/ACIS
International Conference on Computer and Information Science (ICIS 08), pp.
340–346, IEEE. [66]
Ide, Nancy and Véronis, Jean (1998). Introduction to the special issue on word sense
disambiguation: the state of the art. Computational linguistics, Vol. 24, No. 1,
pp. 2–40. [88]
Isaac, Antoine, Wang, Shenghui, Zinn, Claus, Matthezing, Henk, Meij, Lourens
van der, and Schlobach, Stefan (2009). Evaluating thesaurus alignments for semantic interoperability in the library domain. IEEE Intelligent Systems, Vol. 24,
No. 2, pp. 76–86. [48]
Ives, Zachary G., Halevy, Alon Y., Mork, Peter, and Tatarinov, Igor (2004). Piazza:
mediation and integration infrastructure for semantic web data. Web Semantics:
Science, Services and Agents on the World Wide Web, Vol. 1, No. 2, pp. 155–
175. [11]
Jaccard, Paul (1901). Étude comparative de la distribution florale dans une portion
des Alpes et du Jura. Bulletin de la Société Vaudoise des Sciences Naturelles,
Vol. 37, pp. 547–579. [60]
Jain, Prateek, Hitzler, Pascal, Sheth, Amit P, Verma, Kunal, and Yeh, Peter Z
(2010). Ontology alignment for linked open data. The Semantic Web–ISWC
2010, pp. 402–417. Springer. [20]
Jaro, Matthew A. (1989). Advances in Record-Linkage Methodology as Applied
to Matching the 1985 Census of Tampa, Florida. Journal of the American
Statistical Association, Vol. 84, No. 406, pp. 414–420. ISSN 01621459. [101,
141]
Jean-Mary, Yves R., Shironoshita, E. Patrick, and Kabuka, Mansur R. (2009). Ontology matching with semantic verification. Web Semant., Vol. 7, pp. 235–251.
ISSN 1570–8268. [74]
Jian, Ningsheng, Hu, Wei, Cheng, Gong, and Qu, Yuzhong (2005). Falcon-ao: Aligning ontologies with falcon. Proceedings of K-CAP Workshop on Integrating
Ontologies, pp. 85–91. [75]
Jiménez-Ruiz, Ernesto and Cuenca Grau, Bernardo (2011). LogMap: logic-based
and scalable ontology matching. The Semantic Web–International Semantic
Web Conference (ISWC), pp. 273–288, Springer Berlin/Heidelberg. [75, 116,
135, 149]
Jiménez-Ruiz, Ernesto, Cuenca Grau, Bernardo, and Horrocks, Ian (2012a). LogMap
and LogMapLt Results for OAEI 2012. 7th International Workshop on Ontology
Matching (OM). [20]
Jiménez-Ruiz, Ernesto, Grau, Bernardo Cuenca, Zhou, Yujiao, and Horrocks, Ian
(2012b). Large-scale Interactive Ontology Matching: Algorithms and Implementation. ECAI, Vol. 242, pp. 444–449. [75, 116, 135]
Kalfoglou, Yannis and Schorlemmer, Marco (2003). Ontology mapping: the state of
the art. The knowledge engineering review, Vol. 18, No. 1, pp. 1–31. [86]
Kang, Jaewoo and Naughton, Jeffrey F (2003). On schema matching with opaque
column names and data values. Proceedings of the 2003 ACM SIGMOD international conference on Management of data, pp. 205–216, ACM. [58]
Khiat, Abderrahmane and Benaissa, Moussa (2014). AOT/AOTL Results for OAEI
2014. Proceedings of The Ninth ISWC International Workshop on Ontology
Matching (OM). [76]
Killeen, Peter R (2005). An alternative to null-hypothesis significance tests. Psychological science, Vol. 16, No. 5, pp. 345–353. [121]
Kim, Won and Seo, Jungyun (1991). Classifying schematic and data heterogeneity
in multidatabase systems. Computer, Vol. 24, No. 12, pp. 12–18. [3]
Klein, Michel and Noy, Natalya F. (2003). A component-based framework for ontology evolution. Proceedings of the IJCAI, Vol. 3, Citeseer. [10]
Klusch, Matthias, Fries, Benedikt, and Sycara, Katia (2006). Automated semantic
web service discovery with OWLS-MX. Proceedings of the fifth international
joint conference on Autonomous agents and multiagent systems, pp. 915–922,
ACM. [14]
Klusch, Matthias, Fries, Benedikt, and Sycara, Katia (2009). OWLS-MX: A hybrid
Semantic Web service matchmaker for OWL-S services. Web Semantics: Science, Services and Agents on the World Wide Web, Vol. 7, No. 2, pp. 121–133.
[13]
Kotis, Konstantinos, Valarakos, Alexandros G., and Vouros, George A. (2006a).
AUTOMS: Automated Ontology Mapping through Synthesis of methods. Proceedings of Ontology Matching (OM), pp. 96–106. [75, 91]
Kotis, Konstantinos, Vouros, George A, and Stergiou, Konstantinos (2006b). Towards automatic merging of domain ontologies: The HCONE-merge approach.
Web Semantics: Science, Services and Agents on the World Wide Web, Vol. 4,
No. 1, pp. 60–79. ISSN 1570–8268. [75, 91]
Kuhn, Harold W. (1955). The Hungarian method for the assignment problem. Naval
research logistics quarterly, Vol. 2, Nos. 1–2, pp. 83–97. [70]
Labrou, Yannis, Finin, Tim, and Peng, Yun (1999). Agent communication languages:
The current landscape. IEEE Intelligent Systems, Vol. 14, No. 2, pp. 45–52. [18]
Lambrix, Patrick and Liu, Qiang (2009). Using partial reference alignments to
align ontologies. The Semantic Web: Research and Applications, pp. 188–202.
Springer. [20]
Lassila, Ora, Swick, Ralph R., and W3C (1998). Resource Description Framework
(RDF) Model and Syntax Specification. [89]
Lee, Mong Li, Yang, Liang Huai, Hsu, Wynne, and Yang, Xia (2002). XClust:
clustering XML schemas for effective integration. Proceedings of the eleventh
international conference on Information and knowledge management, pp. 292–
299, ACM. [60]
Lee, Yoonkyong, Sayyadian, Mayssam, Doan, AnHai, and Rosenthal, Arnon S
(2007). eTuner: tuning schema matching software using synthetic scenarios.
The VLDB Journal—The International Journal on Very Large Data Bases,
Vol. 16, No. 1, pp. 97–122. [21]
Lei, Yuangui, Uren, Victoria, and Motta, Enrico (2006). Semsearch: A search engine
for the semantic web. Managing Knowledge in a World of Networks, pp. 238–
245. Springer. [16]
Lenzerini, Maurizio (2002). Data integration: A theoretical perspective. Proceedings
of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles
of database systems, pp. 233–246, ACM. [5, 9]
Lesk, Michael (1986). Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. Proceedings of the
5th annual international conference on Systems documentation, SIGDOC ’86,
pp. 24–26. ISBN 0–89791–224–1. [90]
Levi, Giorgio (1973). A note on the derivation of maximal common subgraphs of
two directed or undirected graphs. Calcolo, Vol. 9, No. 4, pp. 341–352. [61]
Lindberg, Donald A., Humphreys, Betsy L., and McCray, Alexa T. (1993). The
Unified Medical Language System. Methods of information in medicine, Vol. 32,
No. 4, pp. 281–291. [49]
Litkowski, Kenneth C. (1997). Desiderata for tagging with WordNet synsets or
MCCA categories. fourth meeting of the ACL Special Interest Group on the
Lexicon. Washington, DC: Association for Computational Linguistics. [88]
Li, Juanzi, Tang, Jie, Li, Yi, and Luo, Qiong (2009). Rimom: A dynamic multistrategy ontology alignment framework. Knowledge and Data Engineering,
IEEE Transactions on, Vol. 21, No. 8, pp. 1218–1232. [76, 148, 149]
Locke, William Nash and Booth, Andrew Donald (1955). Machine translation of
languages: fourteen essays. Published jointly by Technology Press of the Massachusetts Institute of Technology and Wiley, New York. [88]
Lopez, Vanessa, Pasin, Michele, and Motta, Enrico (2005). Aqualog: An ontologyportable question answering system for the semantic web. The Semantic Web:
Research and Applications, pp. 546–562. Springer. [16]
Lopez, Vanessa, Uren, Victoria, Motta, Enrico, and Pasin, Michele (2007). AquaLog:
An ontology-driven question answering system for organizational semantic intranets. Web Semantics: Science, Services and Agents on the World Wide Web,
Vol. 5, No. 2, pp. 72–105. [16]
Manning, Christopher D., Raghavan, Prabhakar, and Schütze, Hinrich (2008). Introduction to information retrieval, Vol. 1. Cambridge University Press, Cambridge.
[62, 149]
Mao, Ming and Peng, Yefei (2006). PRIOR System: Results for OAEI 2006. Ontology Matching, p. 173. [72, 75]
Mao, Ming, Peng, Yefei, and Spring, Michael (2007). A profile propagation and
information retrieval based ontology mapping approach. Proceedings of the
Third International Conference on Semantics, Knowledge and Grid, pp. 164–
169, IEEE. [62, 74, 117, 148, 150, 154]
Maree, Mohammed and Belkhatir, Mohammed (2014). Addressing semantic heterogeneity through multiple knowledge base assisted merging of domain-specific
ontologies. Knowledge-Based Systems, Vol. 73, No. 0, pp. 199 – 211. [91]
Marshall, Ian (1983). Choice of Grammatical Word-Class without Global Syntactic
Analysis: Tagging Words in the LOB Corpus. Computers and the Humanities,
Vol. 17, No. 3, pp. 139–150. ISSN 0010–4817. [89]
Martin, David, Burstein, Mark, Hobbs, Jerry, Lassila, Ora, McDermott, Drew, McIlraith, Sheila, Narayanan, Srini, Paolucci, Massimo, Parsia, Bijan, Payne, Terry,
et al. (2004). OWL-S: Semantic markup for web services. W3C member submission, Vol. 22, pp. 2007–04. [14]
Matuszek, Cynthia, Cabral, John, Witbrock, Michael, and DeOliveira, John (2006).
An Introduction to the Syntax and Content of Cyc. Proceedings of the 2006
AAAI Spring Symposium on Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering, pp. 44–49. [21, 62, 86]
Maximilien, E Michael and Singh, Munindar P (2004). Toward autonomic web
services trust and selection. Proceedings of the 2nd international conference on
Service oriented computing, pp. 212–221, ACM. [13]
McCann, Robert, Shen, Warren, and Doan, AnHai (2008). Matching schemas in
online communities: A web 2.0 approach. Data Engineering, 2008. ICDE 2008.
IEEE 24th International Conference on, pp. 110–119, IEEE. [23]
McCarthy, Diana, Koeling, Rob, Weeds, Julie, and Carroll, John (2004). Finding
predominant word senses in untagged text. Proceedings of the 42nd Annual
Meeting on Association for Computational Linguistics, p. 279, Association for
Computational Linguistics. [87]
McCord, Michael C., Murdock, J. William, and Boguraev, Branimir K. (2012). Deep
parsing in Watson. IBM Journal of Research and Development, Vol. 56, No.
3.4, pp. 3–1. [17]
McCrae, John, Spohr, Dennis, and Cimiano, Philipp (2011). Linking Lexical Resources and Ontologies on the Semantic Web with Lemon. The Semantic Web:
Research and Applications, Vol. 6643 of Lecture Notes in Computer Science,
pp. 245–259. Springer. ISBN 978–3–642–21033–4. [91]
McGuinness, Deborah L. and Van Harmelen, Frank (2004). OWL Web Ontology
Language Overview. W3C recommendation, W3C. [89]
Medjahed, Brahim, Bouguettaya, Athman, and Elmagarmid, Ahmed K (2003).
Composing web services on the semantic web. The VLDB Journal, Vol. 12,
No. 4, pp. 333–351. [13, 14]
Meilicke, Christian and Stuckenschmidt, Heiner (2007). Analyzing mapping extraction approaches. Proceedings of the ISWC 2007 Workshop on Ontology
Matching, pp. 25–36. [71, 72, 101]
Meilicke, Christian, Garcı́a-Castro, Raúl, Freitas, Fred, Van Hage, Willem Robert,
Montiel-Ponsoda, Elena, Azevedo, Ryan Ribeiro de, Stuckenschmidt, Heiner,
Šváb-Zamazal, Ondřej, Svátek, Vojtěch, Tamilin, Andrei, et al. (2012). MultiFarm: A benchmark for multilingual ontology matching. Web Semantics:
Science, Services and Agents on the World Wide Web, Vol. 15, pp. 62–68. [49,
156]
Melnik, Sergey, Garcia-Molina, Hector, and Rahm, Erhard (2002). Similarity flooding: A versatile graph matching algorithm and its application to schema matching. Data Engineering, 2002. Proceedings. 18th International Conference on, pp.
117–128, IEEE. [69, 76]
Mihalcea, Rada (2006). Knowledge-based methods for WSD. Word Sense Disambiguation, pp. 107–131. [88]
Miles, Alistair, Matthews, Brian, Wilson, Michael, and Brickley, Dan (2005). SKOS
core: simple knowledge organisation for the web. International Conference on
Dublin Core and Metadata Applications. [48]
Miller, George A. (1995). WordNet: a lexical database for English. Communications
of the ACM, Vol. 38, pp. 39–41. ISSN 0001–0782. [62, 85, 86, 96, 144, 151]
Mocan, Adrian, Cimpian, Emilia, and Kerrigan, Mick (2006). Formal model for ontology mapping creation. The Semantic Web-ISWC 2006, pp. 459–472. Springer.
[22]
Mochol, Malgorzata and Jentzsch, Anja (2008). Towards a rule-based matcher selection. Knowledge Engineering: Practice and Patterns, pp. 109–119. Springer.
[21]
Montiel-Ponsoda, Elena, Cea, G Aguado de, Gómez-Pérez, Asunción, and Peters,
Wim (2011). Enriching ontologies with multilingual information. Natural language engineering, Vol. 17, No. 03, pp. 283–309. [20]
Montoyo, Andres, Suárez, Armando, Rigau, German, and Palomar, Manuel (2005).
Combining knowledge-and corpus-based word-sense-disambiguation methods.
Journal of Artificial Intelligence Research, Vol. 23, No. 1, pp. 299–330. [88]
Myers, Jerome L., Well, Arnold D., and Lorch Jr., Robert F. (2010). Research design
and statistical analysis. Routledge. [138]
Nagata, Takeshi, Watanabe, H, Ohno, M, and Sasaki, H (2000). A multi-agent
approach to power system restoration. Power System Technology, 2000. Proceedings. PowerCon 2000. International Conference on, Vol. 3, pp. 1551–1556,
IEEE. [18]
Nandi, Arnab and Bernstein, Philip A (2009). HAMSTER: using search clicklogs
for schema and taxonomy matching. Proceedings of the VLDB Endowment,
Vol. 2, No. 1, pp. 181–192. [22]
Navigli, Roberto (2009). Word sense disambiguation: A survey. ACM Comput.
Surv., Vol. 41, No. 2, pp. 10:1–10:69. ISSN 0360–0300. [87, 88, 90, 152]
Navigli, Roberto and Ponzetto, Simone Paolo (2010). BabelNet: Building a very
large multilingual semantic network. Proceedings of the 48th Annual Meeting
of the Association for Computational Linguistics, pp. 216–225, Association for
Computational Linguistics. [86, 151]
Nejdl, Wolfgang, Wolf, Boris, Qu, Changtao, Decker, Stefan, Sintek, Michael,
Naeve, Ambjörn, Nilsson, Mikael, Palmér, Matthias, and Risch, Tore (2002).
EDUTELLA: a P2P networking infrastructure based on RDF. Proceedings of
the 11th international conference on World Wide Web, pp. 604–615, ACM.[11]
Ngo, Duy Hoa, Bellahsene, Zohra, Coletta, Remi, et al. (2011). YAM++–Results for
OAEI 2011. ISWC’11: The 6th International Workshop on Ontology Matching,
Vol. 814, pp. 228–235. [70]
Ngo, Duy Hoa, Bellahsene, Zohra, and Coletta, R. (2012). YAM++: A combination
of graph matching and machine learning approach to ontology alignment task.
Journal of Web Semantics. [67, 73, 148, 149]
Nguyen, Hung Quoc Viet, Luong, Xuan Hoai, Miklós, Zoltán, Quan, Tho Thanh,
and Aberer, Karl (2013). Collaborative schema matching reconciliation. On the
Move to Meaningful Internet Systems: OTM 2013 Conferences, pp. 222–240.
Springer Berlin Heidelberg. [23]
Niles, Ian and Pease, Adam (2001). Towards a standard upper ontology. Proceedings
of the international conference on Formal Ontology in Information Systems, Volume 2001, pp. 2–9, ACM. [21, 62, 86]
Niles, Ian and Terry, Allan (2004). The MILO: A general-purpose, mid-level ontology. Proceedings of the International conference on information and knowledge
engineering, pp. 15–19. [62, 86]
Noy, Natalya F. (2004). Semantic integration: a survey of ontology-based approaches. ACM Sigmod Record, Vol. 33, No. 4, pp. 65–70. [62, 73]
Noy, Natalya F. and Musen, Mark A (2000). Algorithm and tool for automated
ontology merging and alignment. Proceedings of AAAI-00. [116]
Noy, Natalya F. and Musen, Mark A. (2001). Anchor-PROMPT: Using non-local
context for semantic matching. Proceedings of the IJCAI workshop on ontologies
and information sharing, pp. 63–70. [63, 69, 74, 116]
Noy, Natalya F. and Musen, Mark A. (2002). Promptdiff: A fixed-point algorithm
for comparing ontology versions. AAAI/IAAI, Vol. 2002, pp. 744–750. [10]
Noy, Natalya F. and Musen, Mark A. (2003). The PROMPT suite: interactive tools
for ontology merging and mapping. International Journal of Human-Computer
Studies, Vol. 59, No. 6, pp. 983–1024. [22, 24, 25, 74]
Noy, Natalya F. and Musen, Mark A. (2004). Ontology versioning in an ontology
management framework. Intelligent Systems, IEEE, Vol. 19, No. 4, pp. 6–13.
[10]
Noy, Natalya F., Griffith, Nicholas, and Musen, Mark A. (2008). Collecting
community-based mappings in an ontology repository. The Semantic WebISWC 2008, pp. 371–386. Springer. [24]
Oard, Douglas W., Hedin, Bruce, Tomlinson, Stephen, and Baron, Jason R. (2008).
Overview of the TREC 2008 legal track. Technical report, DTIC Document.
[21]
Pang-Ning, Tan, Steinbach, Michael, and Kumar, Vipin (2005). Introduction to
Data Mining. Addison Wesley, 1st edition. ISBN 0321321367. [92, 119, 155]
Parent, Christine and Spaccapietra, Stefano (1998). Issues and approaches of
database integration. Communications of the ACM, Vol. 41, No. 5es, pp. 166–
178. [3]
Paulheim, Heiko (2012). WeSeE-Match results for OAEI 2012. Proceedings of The
Seventh ISWC International Workshop on Ontology Matching (OM). [76]
Pedersen, Ted (2006). Unsupervised corpus-based methods for WSD. Word Sense
Disambiguation, pp. 133–166. [88]
Pedersen, Ted, Banerjee, Satanjeev, and Patwardhan, Siddharth (2005). Maximizing semantic relatedness to perform word sense disambiguation. University of
Minnesota Supercomputing Institute Research Report UMSI, Vol. 25, p. 2005.
[90, 95]
Pipattanasomporn, Manisa, Feroze, Hassan, and Rahman, Saifur (2009). Multi-agent systems in a distributed smart grid: Design and implementation. Power Systems Conference and Exposition, 2009. PSCE'09. IEEE/PES, pp. 1–8, IEEE. [18]
Plessers, Peter and De Troyer, Olga (2005). Ontology change detection using a
version log. The Semantic Web–ISWC 2005, pp. 578–592. Springer. [9]
Pouwelse, Johan, Garbacki, Pawel, Epema, Dick, and Sips, Henk (2005). The bittorrent p2p file-sharing system: Measurements and analysis. Peer-to-Peer Systems
IV, pp. 205–216. Springer. [10]
Po, Laura and Sorrentino, Serena (2011). Automatic generation of probabilistic
relationships for improving schema matching. Information Systems, Vol. 36,
No. 2, pp. 192–208. [87, 91]
Quinlan, John Ross (1986). Induction of decision trees. Machine learning, Vol. 1,
No. 1, pp. 81–106. [139]
Qu, Yuzhong, Hu, Wei, and Cheng, Gong (2006). Constructing virtual documents
for ontology matching. Proceedings of the 15th international conference on
World Wide Web, WWW ’06, pp. 23–31, ACM, New York, NY, USA. ISBN
1–59593–323–9. [62, 75, 96, 97, 107, 108, 109, 112, 117, 141, 143, 148, 149, 150]
Raffio, Alessandro, Braga, Daniele, Ceri, Stefano, Papotti, Paolo, and Hernandez,
Mauricio A (2008). Clip: a visual language for explicit schema mappings. Data
Engineering, 2008. ICDE 2008. IEEE 24th International Conference on, pp.
30–39, IEEE. [22]
Rahm, Erhard and Bernstein, Philip A (2001). A survey of approaches to automatic
schema matching. the VLDB Journal, Vol. 10, No. 4, pp. 334–350. [57, 58, 63,
73, 116]
Rahm, Erhard, Do, HongHai, and Massmann, Sabine (2004). Matching large XML
schemas. ACM SIGMOD Record, Vol. 33, No. 4, pp. 26–31. [64]
Redmond, Timothy, Smith, Michael, Drummond, Nick, and Tudorache, Tania
(2008). Managing Change: An Ontology Version Control System. OWLED.
[9]
Resnik, Philip and Yarowsky, David (1999). Distinguishing systems and distinguishing senses: New evaluation methods for word sense disambiguation. Natural
language engineering, Vol. 5, No. 02, pp. 113–133. [90]
Rijsbergen, Cornelis J. Van (1979). Information Retrieval. ISBN 0408709294. [36,
38]
Ritze, Dominique and Eckert, Kai (2012). Thesaurus mapping: a challenge for
ontology alignment? Proceedings of The Seventh International Workshop on
Ontology Matching (OM-2012) collocated with the 11th International Semantic
Web Conference (ISWC-2012), pp. 248–249. [48]
Rogozan, Delia and Paquette, Gilbert (2005). Managing ontology changes on the
semantic web. Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM
International Conference on, pp. 430–433, IEEE. [9]
Rosse, Cornelius and Mejino Jr, José LV (2003). A reference ontology for biomedical informatics: the Foundational Model of Anatomy. Journal of biomedical
informatics, Vol. 36, No. 6, pp. 478–500. [62]
Rumelhart, David E., Hinton, Geoffrey E., and Williams, Ronald J. (1986). Learning representations by back-propagating errors. Nature, Vol. 323, pp. 533–536. [67]
Russell, Stuart J. and Norvig, Peter (2003). Artificial Intelligence: A Modern Approach. Pearson Education, 2 edition. ISBN 0137903952. [66, 67]
Sabou, Marta, d’Aquin, Mathieu, and Motta, Enrico (2008). Exploring the Semantic Web as Background Knowledge for Ontology Matching. Journal on Data
Semantics XI, Vol. 5383 of Lecture Notes in Computer Science, pp. 156–190.
Springer Berlin Heidelberg. ISBN 978–3–540–92147–9. [21, 63, 116]
Salton, Gerard and Buckley, Christopher (1988). Term-weighting approaches in
automatic text retrieval. Information Processing and Management, Vol. 24,
No. 5, pp. 513 – 523. ISSN 0306–4573. [100]
Salton, Gerard, Wong, Anita, and Yang, Chung-Shu (1975). A vector space model
for automatic indexing. Communications of the ACM, Vol. 18, pp. 613–620.
ISSN 0001–0782. [92]
Saruladha, K., Aghila, G., and Sathiya, B. (2011). A Comparative Analysis of
Ontology and Schema Matching Systems. International Journal of Computer
Applications, Vol. 34, No. 8, pp. 14–21. Published by Foundation of Computer
Science, New York, USA. [86]
Schadd, Frederik C. and Roos, N. (2011a). Improving ontology matchers utilizing
linguistic ontologies: an information retrieval approach. Proceedings of the 23rd
Belgian-Dutch Conference on Artificial Intelligence (BNAIC 2011), pp. 191–198.
[83]
Schadd, Frederik C. and Roos, N. (2011b). MaasMatch results for OAEI 2011. Proceedings of The Sixth International Workshop on Ontology Matching (OM-2011) collocated with the 10th International Semantic Web Conference (ISWC-2011), pp. 171–178. [83]
Schadd, Frederik C. and Roos, N. (2012a). Coupling of WordNet Entries for Ontology Mapping using Virtual Documents. Proceedings of The Seventh International Workshop on Ontology Matching (OM-2012) collocated with the 11th
International Semantic Web Conference (ISWC-2012), pp. 25–36. [83]
Schadd, Frederik C. and Roos, N. (2012b). MaasMatch results for OAEI 2012. Proceedings of The Seventh ISWC International Workshop on Ontology Matching, pp. 160–167. [119]
Schadd, Frederik C. and Roos, N. (2013). Anchor-Profiles for Ontology Mapping
with Partial Alignments. Proceedings of the 12th Scandinavian AI Conference
(SCAI 2013), pp. 235–244. [45, 115]
Schadd, Frederik C. and Roos, N. (2014a). Anchor-Profiles: Exploiting Profiles of Anchor Similarities for Ontology Mapping. Proceedings of the 26th Belgian-Dutch Conference on Artificial Intelligence (BNAIC 2014), pp. 177–178. [115]
Schadd, Frederik C. and Roos, Nico (2014b). Word-Sense Disambiguation for Ontology Mapping: Concept Disambiguation using Virtual Documents and Information Retrieval Techniques. Journal on Data Semantics, pp. 1–20. ISSN
1861–2032. http://dx.doi.org/10.1007/s13740-014-0045-5. [83, 141]
Schadd, Frederik C. and Roos, Nico (2014c). A Feature Selection Approach for
Anchor Evaluation in Ontology Mapping. Knowledge Engineering and the Semantic Web, pp. 160–174. Springer International Publishing. [133]
Schadd, Frederik C. and Roos, N. (2015). Matching Terminological Heterogeneous
Ontologies by Exploiting Partial Alignments. Proceedings of the 9th International Conference on Advances in Semantic Processing (SEMAPRO 2015).
Accepted Paper. [147]
Schütze, Hinrich (1992). Dimensions of meaning. Proceedings of the 1992
ACM/IEEE Conference on Supercomputing, pp. 787–796, IEEE. [90]
Schütze, Hinrich and Pedersen, Jan O (1995). Information Retrieval Based on Word
Senses. In Proceedings of the 4th Annual Symposium on Document Analysis
and Information Retrieval. [88]
Seddiqui, Md Hanif and Aono, Masaki (2009). An efficient and scalable algorithm for
segmented alignment of ontologies of arbitrary size. Web Semantics: Science,
Services and Agents on the World Wide Web, Vol. 7, No. 4, pp. 344–356. [63,
75, 84, 116, 135]
Sheth, Amit P. and Larson, James A. (1990). Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing
Surveys (CSUR), Vol. 22, No. 3, pp. 183–236. [3]
Sheth, Amit P., Gala, Sunit K., and Navathe, Shamkant B. (1993). On automatic
reasoning for schema integration. International Journal of Intelligent and Cooperative Information Systems, Vol. 2, No. 01, pp. 23–50. [3]
Shokouhi, Milad and Si, Luo (2011). Federated search. Foundations and Trends in
Information Retrieval, Vol. 5, No. 1, pp. 1–102. [7]
Shvaiko, Pavel and Euzenat, Jérôme (2005). A Survey of Schema-Based Matching
Approaches. Journal on Data Semantics IV, Vol. 3730, pp. 146–171. Springer.
ISBN 978–3–540–31001–3. [57, 58, 86, 119, 134]
Shvaiko, Pavel and Euzenat, Jérôme (2008). Ten Challenges for Ontology Matching.
Proceedings of ODBASE 2008, pp. 1164–1182. [19, 47]
Shvaiko, Pavel and Euzenat, Jérôme (2013). Ontology Matching: State of the Art
and Future Challenges. Knowledge and Data Engineering, IEEE Transactions
on, Vol. 25, No. 1, pp. 158 –176. ISSN 1041–4347. [19, 47]
Shvaiko, Pavel, Giunchiglia, Fausto, Da Silva, Paulo Pinheiro, and McGuinness,
Deborah L. (2005). Web explanations for semantic heterogeneity discovery.
The Semantic Web: Research and Applications, pp. 303–317. Springer. [22]
Sicilia, Miguel A., Garcia, Elena, Sanchez, Salvador, and Rodriguez, Elena (2004).
On integrating learning object metadata inside the OpenCyc knowledge base.
Advanced Learning Technologies, 2004. Proceedings. IEEE International Conference on, pp. 900–901, IEEE. [86]
Solimando, Alessandro, Jiménez-Ruiz, Ernesto, and Pinkel, Christoph (2014). Evaluating Ontology Alignment Systems in Query Answering Tasks. International
Semantic Web Conference (ISWC). Poster track. [50]
Spaccapietra, Stefano and Parent, Christine (1994). View integration: A step forward in solving structural conflicts. Knowledge and Data Engineering, IEEE
Transactions on, Vol. 6, No. 2, pp. 258–274. [3]
Sparck Jones, Karen (1972). A statistical interpretation of term specificity and its
application in retrieval. Journal of documentation, Vol. 28, No. 1, pp. 11–21.
[100]
Sproat, Richard, Hirschberg, Julia, and Yarowsky, David (1992). A corpus-based
synthesizer. Proceedings of the International Conference on Spoken Language
Processing, Vol. 92, pp. 563–566. [89]
Stoilos, Giorgos, Stamou, Giorgos, and Kollias, Stefanos (2005). A string metric
for ontology alignment. The Semantic Web–ISWC 2005, pp. 624–637. Springer.
[75]
Strube, Michael and Ponzetto, Simone Paolo (2006). WikiRelate! Computing semantic relatedness using Wikipedia. AAAI, Vol. 6, pp. 1419–1424. [62, 85]
Stuckenschmidt, Heiner and Klein, Michel (2004). Structure-based partitioning of
large concept hierarchies. The Semantic Web–ISWC 2004, pp. 289–303. [53]
Subrahmanian, V.S., Adali, Sibel, Brink, Anne, Emery, Ross, Lu, James J, Rajput,
Adil, Rogers, Timothy J, Ross, Robert, and Ward, Charles (1995). HERMES:
A heterogeneous reasoning and mediator system. [5, 8]
Suchanek, Fabian M, Kasneci, Gjergji, and Weikum, Gerhard (2007). Yago: a core
of semantic knowledge. Proceedings of the 16th international conference on
World Wide Web, pp. 697–706, ACM. [62]
Suchanek, Fabian M., Kasneci, Gjergji, and Weikum, Gerhard (2008). Yago: A large
ontology from wikipedia and wordnet. Web Semantics: Science, Services and
Agents on the World Wide Web, Vol. 6, No. 3, pp. 203–217. [85]
Sure, York, Staab, Steffen, and Studer, Rudi (2002). Methodology for development
and employment of ontology based knowledge management applications. ACM
SIGMOD Record, Vol. 31, No. 4, pp. 18–23. [153]
Su, Xiaomeng and Gulla, Jon Atle (2004). Semantic Enrichment for Ontology Mapping. Natural Language Processing and Information Systems: 9th International Conference on Applications of Natural Languages to Information Systems, NLDB 2004, Salford, UK, June 23-25, 2004, Proceedings, Vol. 3136, p.
217, Springer. [149]
Sycara, Katia, Paolucci, Massimo, Ankolekar, Anupriya, and Srinivasan, Naveen
(2003). Automated discovery, interaction and composition of semantic web
services. Web Semantics: Science, Services and Agents on the World Wide
Web, Vol. 1, No. 1, pp. 27–46. [13]
Thornton, Chris (1998). Separability is a learner’s best friend. 4th Neural Computation and Psychology Workshop, London, 9–11 April 1997, pp. 40–46, Springer.
[139]
Tran, Thanh, Cimiano, Philipp, Rudolph, Sebastian, and Studer, Rudi (2007).
Ontology-based interpretation of keywords for semantic search. The Semantic Web, pp. 523–536. Springer. [16]
Trojahn, Cássia, Meilicke, Christian, Euzenat, Jérôme, and Stuckenschmidt, Heiner
(2010). Automating oaei campaigns (first report). Proceedings of the International Workshop on Evaluation of Semantic Technologies (IWEST). [46]
Trojahn, Cássia, Euzenat, Jérôme, Tamma, Valentina, and Payne, Terry R. (2011).
Argumentation for reconciling agent ontologies. Semantic Agent Systems, pp.
89–111. Springer. [19, 23]
Valtchev, Petko and Euzenat, Jérôme (1997). Dissimilarity measure for collections
of objects and values. Advances in Intelligent Data Analysis Reasoning about
Data, pp. 259–272. Springer. [61]
Wache, Holger, Voegele, Thomas, Visser, Ubbo, Stuckenschmidt, Heiner, Schuster,
Gerhard, Neumann, Holger, and Hübner, Sebastian (2001). Ontology-based
integration of information-a survey of existing approaches. IJCAI-01 workshop:
ontologies and information sharing, Vol. 2001, pp. 108–117. [73]
Walker, Jan, Pan, Eric, Johnston, Douglas, Adler-Milstein, Julia, Bates, David W, and Middleton, Blackford (2005). The value of health care information exchange and interoperability. Health Affairs, Vol. 24, p. W5. [204, 205]
Wang, Jun and Gasser, Les (2002). Mutual online ontology alignment. Proceedings of the Workshop on Ontologies in Agent Systems, held with AAMAS 2002. [19]
Wang, Shenghui, Englebienne, Gwenn, and Schlobach, Stefan (2008). Learning
concept mappings from instance similarity. The Semantic Web-ISWC 2008, pp.
339–355. Springer. [14, 60]
Wang, Chang, Kalyanpur, Aditya, Fan, James, Boguraev, Branimir K., and Gondek,
DC (2012). Relation extraction and scoring in DeepQA. IBM Journal of Research and Development, Vol. 56, No. 3.4, pp. 9–1. [17]
Watters, Carolyn (1999). Information retrieval and the virtual document. Journal
of the American Society for Information Science, Vol. 50, pp. 1028–1029. ISSN
0002–8231. [89]
Weaver, Warren (1955). Translation. Machine translation of languages, Vol. 14, pp.
15–23. [88]
Wiesman, Floris and Roos, Nico (2004). Domain independent learning of ontology mappings. Proceedings of the Third International Joint Conference on
Autonomous Agents and Multiagent Systems-Volume 2, pp. 846–853, IEEE
Computer Society. [19]
Wiesman, Floris, Roos, Nico, and Vogt, Paul (2002). Automatic ontology mapping for agent communication. Proceedings of the first international joint conference on Autonomous agents and multiagent systems: part 2, pp. 563–564, ACM. [19]
Wilks, Yorick (1975). Preference semantics. Formal Semantics of Natural Language (ed. E.L. Keenan), pp. 329–348, Cambridge University Press. [88]
Xiao, Bo and Benbasat, Izak (2007). E-commerce product recommendation agents:
use, characteristics, and impact. Mis Quarterly, Vol. 31, No. 1, pp. 137–209.
[18]
Yarowsky, David (1994). Decision lists for lexical ambiguity resolution: Application
to accent restoration in Spanish and French. Proceedings of the 32nd annual
meeting on Association for Computational Linguistics, pp. 88–95, Association
for Computational Linguistics. [89]
Zhdanova, Anna V. and Shvaiko, Pavel (2006). Community-driven ontology matching. The Semantic Web: Research and Applications, pp. 34–49. Springer. [23]
List of Figures

1.1  Example illustration of a schema integration task. . . . . . . . . . . .   4
1.2  Example illustration of an information integration task. . . . . . . . .   6
1.3  Example of an ontology engineering task. . . . . . . . . . . . . . . . .  10
1.4  Information sharing in a hybrid decentralized P2P system. . . . . . . .  12
1.5  Mapping tasks in a web-service composition scenario. . . . . . . . . . .  15
1.6  Mapping in an information system receiving NL queries. . . . . . . . . .  16
1.7  Mapping in an agent communication scenario. . . . . . . . . . . . . . .  18
2.1  Example mapping between two small ontologies. . . . . . . . . . . . . .  28
2.2  Example mapping between two small ontologies. The mapping models different semantic relation types and includes confidence values for each correspondence. . . .  29
2.3  Visualization of the ontology mapping process. . . . . . . . . . . . . .  34
2.4  Visualization of the interaction between the example alignment and the example reference. . . .  37
2.5  Precision-Recall graph. . . . . . . . . . . . . . . . . . . . . . . . . .  41
2.6  Precision-Recall graph. Includes a curve of interpolated precisions for all possible recall values (red) and a curve of interpolated precisions at the standard 11 recall values (green). . . .  42
2.7  Visualization of the dynamics between output, reference and partial alignments of the example. . . .  45
3.1  Basic architecture of an ontology mapping framework. . . . . . . . . . .  52
3.2  An iterative mapping system. . . . . . . . . . . . . . . . . . . . . . .  54
3.3  A sequential composition of mapping systems. . . . . . . . . . . . . . .  54
3.4  A parallel composition of mapping systems. . . . . . . . . . . . . . . .  55
3.5  Classification of concept mapping approaches. The classification is hierarchically structured, with the top level distinguishing the input interpretation and the bottom level featuring input scope. . . .  59
3.6  Illustration of a neural network. . . . . . . . . . . . . . . . . . . . .  68
4.1  Histogram showing the number of words in WordNet (y-axis) that have a specific number of senses (x-axis). . . .  87
4.2  Example ontology for the construction of a virtual document. . . . . . .  99
4.3  Evaluation of disambiguation policies using the lexical similarity lsm1 on the OAEI 2011 Conference data set. . . . 102
4.4  Evaluation of disambiguation policies using the lexical similarity lsm2 on the OAEI 2011 Conference data set. . . . 102
4.5  Evaluation of disambiguation policies using the lexical similarity lsm3 on the OAEI 2011 Conference data set. . . . 103
4.6  Results of MaasMatch in the OAEI 2011 competition on the conference data set, compared against the results of the other participants. . . . 104
4.7  Results of MaasMatch in the OAEI 2011 competition on the benchmark data set, compared against the results of the other participants. . . . 105
4.8  Precision versus Recall graph of the created alignments from the conference data set using the lexical similarities with the virtual document. . . . 109
4.9  Precision versus Recall graph of the created alignments from the conference data set using the document similarities of the virtual documents. . . . 110
5.1  Two equivalent concepts being compared to a series of anchors. . . . 117
5.2  Visualization of an anchor profile similarity. . . . 119
5.3  Overview of the tested mapping system. . . . 120
5.4  Corrected precision of the proposed approach for the different task groups of the benchmark dataset. Each group contrasts the performance of different partial alignment recall levels. . . . 123
5.5  Corrected recall of the proposed approach for the different task groups of the benchmark dataset. Each group contrasts the performance of different partial alignment recall levels. . . . 123
5.6  Corrected F-measure of the proposed approach for the different task groups of the benchmark dataset. Each group contrasts the performance of different partial alignment recall levels. . . . 124
5.7  Adapted precision of the anchor profile approach using simA and sim∗A as anchor similarities. . . . 125
5.8  Adapted recall of the anchor profile approach using simA and sim∗A as anchor similarities. . . . 126
5.9  Adapted F-measure of the anchor profile approach using simA and sim∗A as anchor similarities. . . . 126
6.1  Illustration of the anchor filtering process when mapping with partial alignments. . . . 134
6.2  Example scenarios of an anchor A being compared to correct matches, illustrating the expected semantic difference between anchors and given correspondences. . . . 136
6.3  Four example scenarios of an anchor A being compared to incorrect matches, illustrating the irregularity in the expected semantic difference between anchors and given correspondences. . . . 137
6.4  Precision vs. recall of the rankings created using a syntactic similarity weighted by the evaluated feature selection methods. The unweighted variant of the syntactic similarity is used as baseline. . . . 142
6.5  Precision vs. recall of the rankings created using a structural similarity weighted by the evaluated feature selection methods. The unweighted variant of the structural similarity is used as baseline. . . . 143
6.6  Precision vs. recall of the rankings created using a lexical similarity weighted by the evaluated feature selection methods. The unweighted variant of the lexical similarity is used as baseline. . . . 144
7.1  Illustration of the typical range of exploited information of a profile similarity. . . . 150
7.2  Illustration of a terminological gap between two ontologies modelling identical concepts. . . . 152
7.3  Two equivalent concepts being compared to a series of anchors. . . . 154
List of Tables

2.1  Sorted example correspondences with their respective thresholds and resulting F-measures. . . .  39
3.1  Overview of ontology mapping systems. . . .  78
3.1  (Continued) Overview of ontology mapping systems. . . .  79
3.1  (Continued) Overview of ontology mapping systems. . . .  80
3.1  (Continued) Overview of ontology mapping systems. . . .  81
4.1  Term weights for the document representing the concept Car, according to the example ontology displayed in Figure 4.2. . . .  99
4.2  Evaluation on the conference 2013 data set and comparison with OAEI 2013 frameworks. . . . 106
4.3  Optimized parameter sets for the VD model when applied to a LSM (Lex) and profile similarity (Prof) using the conference (C) and benchmark (B) data sets as training sets. . . . 107
4.4  Runtimes of the different elements of the lexical similarity on the conference dataset. . . . 111
4.5  Runtimes of the different elements of the lexical similarity for each disambiguation policy. . . . 112
5.1  Results of the evaluations on the benchmark-biblio dataset using different recall requirements for the randomly generated partial alignments. For each recall requirement, 100 evaluations were performed and aggregated. . . . 122
5.2  Adapted precision P∗(A, R) of the Anchor Profile approach for varying recall and precision levels of the input partial alignment. . . . 127
5.3  Adapted recall R∗(A, R) of the Anchor Profile approach for varying recall and precision levels of the input partial alignment. . . . 128
5.4  Adapted F-measure F∗(A, R) of the Anchor Profile approach for varying recall and precision levels of the input partial alignment. . . . 128
5.5  Comparison of the Anchor-Profile approach, using two different PA thresholds, with the 8 best performing frameworks from the OAEI 2012 competition. An asterisk indicates the value has been adapted with respect to PA, while the values inside the brackets indicate the respective measure over the entire alignment. . . . 129
7.1  Aggregated adapted Precision, Recall and F-Measure when evaluating all variations of our approach on a selection of tasks from the Benchmark dataset. . . . 158
7.2  Aggregated adapted Precision, Recall and F-Measure when evaluating all variations of our approach on the MultiFarm dataset. . . . 160
7.3  Aggregated adapted Precision, Recall and F-Measure on the MultiFarm dataset when varying the Recall of the supplied partial alignment. . . . 161
7.4  Comparison between the performance of our approach and the competitors of the 2014 OAEI competition on the MultiFarm dataset. Performances of approaches utilizing partial alignments are denoted in adapted precision, recall and F-Measure. . . . 163
A    Annual cost of inadequate interoperability in the US capital facility industry by cost category, by stakeholder group (in $Millions) (Gallaher et al., 2004). . . . 203
B    Aggregate of estimated annual interoperability costs of the US automotive industry (Brunnermeier and Martin, 2002). . . . 204
C    Estimated net value of deployment of HIEI systems, according to different levels of sophistication, in the US health care industry (in $Billions) (Walker et al., 2005). . . . 205
List of Algorithms

3.1  Naive descending algorithm pseudo-code . . . . . . . . . . . . . . . . .  72
3.2  Naive ascending algorithm pseudo-code . . . . . . . . . . . . . . . . . .  73
4.1  Lexical similarity with disambiguation pseudo-code . . . . . . . . . . .  93
5.1  Anchor-Profile Similarity . . . . . . . . . . . . . . . . . . . . . . . . 118
Addendum: Valorization
Remark: This addendum is required by the regulation governing the attainment of
doctoral degrees (2013) of Maastricht University. As stated there, the addendum
"does not form part of the dissertation and should not be assessed as part of the dissertation".
The ability to transfer data between information systems, also referred to as interoperability between systems, presents an ever-growing issue in a society which adopts electronic solutions in a growing range of domains. If this data is not standardized, it becomes necessary to apply a transformation to the data, based on a given mapping, in order to make the data transfer possible. In Section 1.2 we introduced a wide range of applications for the research presented in this thesis, where it is required for data to be exchanged between information systems. These applications include schema integration, information integration, ontology engineering, information sharing, web-service composition, querying of semantic information and agent communication.
In the following sections, we will introduce three real-world domains, namely (1) the US Capital Facility Industry, (2) the US Automotive Industry and (3) the US Health Care Industry, which regularly face interoperability issues. These issues are typically resolved via conventional means, resulting in operational inefficiencies and added costs. Examples of such conventional means are transforming and entering data into different systems by hand, redesigning information systems due to incompatibility, or outsourcing information exchange responsibilities to third parties. We present the results of three scientific studies which attempted to quantify the annual costs that these domains incur due to unresolved interoperability issues. Interoperability costs are compiled by estimating factors such as the added labour costs of data transformation and verification, the added labour costs of reworking and redesigning ongoing projects due to unexpected incompatibilities, the purchase costs of new systems and resources, delay costs and lost revenue.
Cost Estimate: US Capital Facility Industry
The so-called Capital Facility industry is a component of the entire US construction
industry. The core activities of this industry encompass the design, construction and
maintenance of large buildings, facilities and plants. These buildings are ordered by
commercial, industrial and institutional sectors. Due to the large scale of typically requested buildings, the capital facility industry has large data requirements. Examples of data exchange in this sector are the sharing of data among all stakeholders, possibly across several information systems, and the integration of multi-vendor equipment and systems. Due to these requirements, the capital facility industry is particularly vulnerable to interoperability issues.
In 2004, the Building and Fire Research Laboratory and the Advanced Technology Program at the National Institute of Standards and Technology (NIST) issued a study to estimate the inefficiencies caused by interoperability issues between
computer-aided design, engineering and software systems. This study has been performed by RTI International and the Logistic Management Institute (Gallaher et al.,
2004). The study identified the following stakeholders, which typically face direct interoperability issues during the execution of a project:
Architects and Engineers covering architects, general and speciality engineers,
and facilities consultancies.
General Contractors covering general contractors tasked with physical construction and project management.
Speciality Fabricators and Suppliers covering speciality constructors and systems suppliers, including elevators, steel, and HVAC systems.
Owners and Operators covering the entities that own and/or operate the facilities.
Participants from each stakeholder group contributed to the study through interviews with the experimenters or by completing surveys. The participants were
tasked to quantify their incurred interoperability costs by listing which activities
they perform in order to resolve these issues. By extrapolating the costs associated
with these activities, a cost estimate could then be established. The activities and
their associated costs were grouped into three categories: (1) avoidance costs, (2)
mitigation costs and (3) delay costs. Examples of avoidance costs are the costs of
outsourcing of translation services to third parties, investing in in-house programs
and the costs of purchasing, maintaining and training for redundant computer-aided
design and engineering systems. Mitigation costs typically involve the costs associated with rework of designs or construction, re-entering data when automated
transfer systems fail and information verification. Examples of delay costs are costs
of idle resources due to delays, lost profits due to delay in revenues and losses to
customers and consumers due to project delays. An overview of the estimated costs
for each stakeholder group can be viewed in Table A.
Stakeholder Group                        Avoidance   Mitigation      Delay      Total
                                             Costs        Costs      Costs

Architects and Engineers                     485.3        684.5          –    1,169.8
General Contractors                        1,095.4        693.3       13.0    1,801.7
Speciality Fabricators and Suppliers       1,908.4        296.1          –    2,204.5
Owners and Operators                       3,120.0      6,028.2    1,499.8   10,648.0

All Stakeholders                           6,609.1      7,702.0    1,512.8   15,824.0

Table A: Annual cost of inadequate interoperability in the US capital facility industry
by cost category, by stakeholder group (in $Millions) (Gallaher et al., 2004).
As we can see in Table A, the capital facility industry incurs substantial interoperability costs. The total costs are estimated at $15.8 billion annually, which corresponds to approximately 3-4% of the entire industry's annual revenue.
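The figures in Table A are internally consistent: each stakeholder's three cost categories sum to its reported total, and the column sums match the All Stakeholders row up to a $0.1 million rounding difference in the mitigation column. A small sketch of this aggregation, treating the "–" entries as zero:

```python
# Cross-check of Table A (Gallaher et al., 2004); all values in $millions.
# Tuples hold (avoidance, mitigation, delay, total); "-" entries taken as 0.
rows = {
    "Architects and Engineers":             (485.3,   684.5,    0.0,  1169.8),
    "General Contractors":                  (1095.4,  693.3,   13.0,  1801.7),
    "Speciality Fabricators and Suppliers": (1908.4,  296.1,    0.0,  2204.5),
    "Owners and Operators":                 (3120.0, 6028.2, 1499.8, 10648.0),
}
all_stakeholders = (6609.1, 7702.0, 1512.8, 15824.0)

# Each stakeholder's cost categories add up to its reported total.
for name, (avoidance, mitigation, delay, total) in rows.items():
    assert abs((avoidance + mitigation + delay) - total) < 0.05, name

# Column sums match the "All Stakeholders" row; the published mitigation
# total (7,702.0) differs from the column sum (7,702.1) only by rounding.
for col in range(4):
    column_sum = sum(values[col] for values in rows.values())
    assert abs(column_sum - all_stakeholders[col]) <= 0.11, col

print(f"Total annual cost: ${all_stakeholders[3] / 1000:.1f} billion")
```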
Cost Estimate: US Automotive Industry
In 2002, the Research Triangle Institute conducted a study for the National Institute of Standards and Technology (NIST) in order to quantify to what degree the
US Automotive supply chain suffers from interoperability issues (Brunnermeier and
Martin, 2002). Similar to the previous study, the experimenters surveyed different stakeholders across the industry about typically faced interoperability problems.
The costs of the provided answers are extrapolated in order to estimate the severity of these costs across the entire automotive industry. This estimate is referred
to as the Cost Component Approach. The experimenters also interviewed several
key industry executives about their viewpoints. The executives provided their own
estimates of the incurred interoperability costs, allowing for the inclusion of costs
which might not have been considered by the experimenters. This method of cost
estimation of referred to as the Aggregate Cost Approach. An additional benefit of
consulting industry executives is that it validates the results of the Cost Component
Approach if their results are similar to a certain degree. The cost estimates of both
approaches can be viewed in Table B.
The results of both estimates show that the automotive industry suffers significant monetary losses due to interoperability issues. According to the Cost Component Approach, $1.05 billion is lost yearly, while the Aggregate Cost Approach resulted in a cost estimate of $1.015 billion.
Source of Cost                  Annual Cost    Percentage
                                ($Millions)
Cost Component Approach
  Avoidance Costs                     52.8          5
  Mitigation Costs                   907.6         86
  Delay Costs                         90.0          9
  Total                            1,050.4        100
Aggregate Cost Approach
  Interoperability Cost              925.6         91
  Delayed Profits                     90.0          9
  Total                            1,015.6        100
Table B: Aggregate of estimated annual interoperability costs of the US automotive
industry (Brunnermeier and Martin, 2002).
Net Value Estimate: US Health Care Industry
The health care domain sees an every increasing use of information technology. Examples are the storage of patient data in electronic medical records, computerized
physician order entry systems and decision support tools. Facilitating easy information exchange between these systems would result in lower transaction costs of
information exchange, increased operating efficiency and a higher quality of service
due to fewer transaction mistakes and easier access to critical medical data. Additionally, most healthcare facilities still store patient information on paper-based
formats. Therefore, every time paper-based data needs to be transferred to a different stakeholder, it is necessary that it is entered into the information system by
hand, resulting in a huge operating inefficiency.
Walker et al. (2005) investigated what the net value of a fully implemented health
care information exchange and interoperability (HIEI) system would be. This study
weighed the estimated interoperability cost savings of the US health care domain
against the estimated project costs of a full roll-out of a HIEI system. The study
defined four levels of interoperability and estimated the net value of achieving each
level. The levels are defined as follows:
Level 1: Non-electronic data. No use of IT to share information. This level represents the operational efficiency of the health care system prior to the introduction of IT and serves as a baseline for determining the benefits of the other levels.
Level 2: Machine transportable data. Transfer of non-standardized data via basic IT channels. Data cannot be manipulated by machines (e.g. exchange of scanned documents via fax or PDF files).
Level 3: Machine organizable data. Transfer of non-standardized data via structured messages. Requires mappings such that data conforming to different standards can be interpreted by each local system. Still requires transferred data to be verified due to the risk of imperfect mappings.
Level 4: Machine interpretable data. Transfer of standardized data via structured messages, allowing data to be transferred to and understood by all local systems.
Level 2 systems are already universally implemented amongst health care institutions, and therefore require no implementation costs. The costs of adopting levels
3 and 4 were estimated by compiling various cost estimates for the sub-components
of each level from different sources, such as established scientific studies, the US
census bureau and expert-panel judgements (Walker et al., 2005). The aggregate
cost estimates of the HIEI implementations and their resulting net values are listed
in Table C.
                        Implementation,            Steady state,
                        cumulative years 1–10      annual starting year 11
Level 2
  Benefit                     141.0                      21.6
  Cost                          0.0                       0.0
  Net Value                   141.0                      21.6
Level 3
  Benefit                     286                        44.0
  Cost                        320                        20.2
  Net Value                   –34.2                      23.9
Level 4
  Benefit                     613                        94.3
  Cost                        276                        16.5
  Net Value                   337                        77.8
Table C: Estimated net value of deployment of HIEI systems, according to different
levels of sophistication, in the US health care industry (in $Billions) (Walker et al.,
2005).
The results of the study indicated that a nationwide adoption of a level 4 HIEI system would result in an annual net gain of $77.8 billion after the initial implementation phase. This corresponds to approximately 5% of the annual US expenditures on health care.
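The net-value figures above follow from a simple subtraction of estimated costs from estimated benefits. The following sketch merely reproduces that arithmetic for the level 4 steady-state figures quoted from Walker et al. (2005); the function name is invented for illustration.

```python
# Hypothetical sketch: the net-value arithmetic behind Table C.
# Figures (in $Billions) are the Walker et al. (2005) estimates quoted above.

def net_value(benefit: float, cost: float) -> float:
    """Net value of an HIEI deployment level: benefits minus costs."""
    return benefit - cost

# Level 4, steady state from year 11 onwards ($Billions per year).
level4_steady = net_value(benefit=94.3, cost=16.5)
print(round(level4_steady, 1))  # 77.8
```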
Summary
The availability of data plays an ever-increasing role in our society. Businesses commonly store information about their customers, transactions and products in large information systems. This allows them to analyse their data to gain more knowledge, such as trends and predictions, in order to improve their business strategy. Furthermore, the core strategy of a business can be built on enabling the user to easily access a certain type of data. Such services play an increasing role in common every-day life. For example, services such as Google and Wikipedia are widely used to find general information, whereas services such as Amazon, bol.com and Yelp are used to find information and reviews about products. Some of these sites also allow the user to purchase the queried products on the same site. To be able to interpret stored data, it is necessary that the data is structured and annotated with meta-information, such that for each data entry it is possible to determine its meaning and relation to other data entries. For example, a data entry ‘555-12345’ has very little use if it is not known that it represents a telephone number and who the owner of the number is. An information system specifies these meanings and their structure using an ontology. An ontology specifies the types of objects, referred to as concepts, about which one intends to store information, what kind of data is stored for each concept and how the concepts are related to each other.
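As an illustration of this view of an ontology, concepts with data properties and inter-concept relations can be sketched as follows. This is not a formalism used in the thesis; all concept and property names are invented.

```python
# Minimal sketch of an ontology: concepts, their data properties, and
# relations between concepts. All names here are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    data_properties: dict = field(default_factory=dict)  # property -> datatype
    relations: dict = field(default_factory=dict)        # relation -> target concept

# A toy ontology with two related concepts.
person = Concept("Person", {"name": "string", "phone": "string"})
company = Concept("Company", {"companyName": "string"})
person.relations["worksFor"] = "Company"

ontology = {c.name: c for c in (person, company)}
print(ontology["Person"].relations["worksFor"])  # Company
```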
A common problem faced by businesses is the desire to be able to exchange
information between different systems. An example scenario would be Company A
deciding to acquire Company B. To continue the operations of Company B, Company
A would need to transfer all the data of the information system of Company B into its
own information system. Here, it can occur that the data in the information systems
of both companies is modelled using different ontologies. This can stem from the
companies having different requirements for their systems or having followed separate
design principles in the creation of their ontologies. In this case, it is not possible
to simply transfer data between the systems since these are incompatible.
A possible solution for enabling the exchange of information between systems
utilizing different ontologies is the process of ontology mapping. Ontology mapping
aims to identify all pairs of concepts between two ontologies which are used to model
the same type of information. A full list of correspondences between two ontologies
is known as an alignment or mapping. Based on such a mapping, it is possible to create a transfer function with which every data entry conforming to one ontology can be re-formulated to conform to the specification of the other ontology. This allows for the transfer of data between two information systems despite the systems
using different ontology structures.
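Conceptually, an alignment is a set of correspondences, and the induced transfer function renames each data entry's fields accordingly. The following is a minimal sketch of this idea, not the thesis's implementation; the concept and field names are invented.

```python
# Sketch: an alignment as concept/field correspondences, and a transfer
# function that rewrites a data entry from ontology A's vocabulary into
# ontology B's. All names are invented for illustration.
alignment = {
    "Customer": "Client",                      # concept correspondence
    "Customer.phoneNo": "Client.telephone",    # field correspondences
    "Customer.fullName": "Client.name",
}

def transfer(entry: dict, concept: str) -> tuple:
    """Re-formulate one data entry of `concept` to conform to ontology B."""
    target_concept = alignment[concept]
    converted = {}
    for fieldname, value in entry.items():
        target = alignment[f"{concept}.{fieldname}"]
        converted[target.split(".", 1)[1]] = value
    return target_concept, converted

print(transfer({"phoneNo": "555-12345", "fullName": "J. Doe"}, "Customer"))
# ('Client', {'telephone': '555-12345', 'name': 'J. Doe'})
```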
Mapping ontologies is a labour intensive task. To create a mapping, a domain
expert has to manually define and verify every correspondence. This approach is
infeasible when having to map large ontologies encompassing thousands of concepts.
Hence, automatic approaches to ontology mapping are required in order to solve
interoperability problems in the corporate domain. A different domain of application is the Semantic Web. This domain envisions the next step in the evolution of
the world-wide-web, where all available information is machine readable and semantically structured. This semantic structure is also specified using an ontology and
allows machines to gather semantic information from the web. However, in order to
retrieve semantic information autonomously, a machine needs to be capable of autonomously matching ontologies. This is necessary so that the machine can query
sources which represent their information using a different semantic structure.
Ontology mapping has been an active field of research in the past decade. Here,
matching systems typically utilize a combination of techniques to determine the
similarity between concepts. From these computations, highly similar concepts are
extracted which then form the alignment between the given ontologies. In some
situations, it is possible that an extra resource of information is available that can
be exploited to aid the matching process. An example of such extra information is a lexical resource, for instance Wikipedia. A lexical resource allows a system to look up word definitions, identify synonyms and retrieve information about related concepts. A different example is a partial alignment. A partial alignment
is an incomplete mapping stemming from an earlier matching effort. It can be the
result of a domain expert attempting to create a mapping, but being unable to finish
it due to time constraints. A core challenge within the field of ontology mapping
thus is to devise techniques which can use these resources for the purpose of creating
a complete mapping. This has led us to the following problem statement:
How can we improve ontology mapping systems by exploiting auxiliary
information?
To tackle this problem statement, we formulated four research questions upon
which we based our research:
1. How can lexical sense definitions be accurately linked to ontology concepts?
2. How can we exploit partial alignments in order to derive concept correspondences?
3. How can we evaluate whether partial alignment correspondences are reliable?
4. To what extent can partial alignments be used in order to bridge a large terminological gap between ontologies?
In Chapter 1 we introduce the reader to the field of ontology mapping. Here,
we introduce the problems that arise when attempting to transfer data between
knowledge systems with different ontologies. Further, we present a series of real-world domains which can benefit from the research of this thesis, such as information
integration, web-service composition and agent communication. We also present a
brief overview of the core research challenges of the field of ontology mapping. In
the final section of the chapter, we introduce and discuss the problem statement and
research questions which guide this thesis.
Chapter 2 provides important background information to the reader. We formally introduce the problem of ontology matching. Further, we detail and illustrate
common techniques that are applied for the purpose of ontology alignment evaluation. Lastly, we introduce a series of datasets which can be used in order to evaluate
ontology matching systems.
Applicable techniques to ontology matching are introduced in Chapter 3. Here,
we introduce the reader to contemporary ontology matching systems and their underlying architectures. We introduce the three core tasks that a matching system has to perform, namely similarity computation, similarity combination and correspondence extraction, and provide an overview of techniques which are applicable for
these respective tasks. Additionally, we provide a brief survey of existing ontology
matching systems with the focus on systems utilizing auxiliary resources.
Chapter 4 answers the first research question. Here, the core problem concerns
the linking of correct lexical definitions to the modelled concepts of an ontology,
referred to as concept disambiguation. An example of such an action is determining
that the concept name ‘Plane’ refers to the type of mathematical surfaces instead
of the type of airborne vehicles. Techniques utilizing lexical resources rely on these
links to determine concept similarities using various techniques. We tackle this research question by proposing an information-retrieval-based method for associating
ontology concepts with lexical senses. Using this method, we define a framework for
the filtering of concept senses based on sense-similarity scores and a given filtering
policy. We evaluate four filtering policies which filter senses if their similarity scores result in an unsatisfactory value. The filtering policies are evaluated using several
lexical similarities in order to investigate the general effects of the filtering policies
on the lexical similarities. Our evaluation revealed that the application of our disambiguation approach improved the performances of all lexical metrics. Additionally,
we investigated the effect of weighting the terms of the sense annotations and concept
annotations. This evaluation revealed that weighting terms according to respective
origins within the ontologies or lexical resource resulted in a superior performance
compared to weighting the document terms using the widely used TF-IDF approach
from the field of information retrieval.
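The general idea of such a disambiguation step can be sketched as follows: build a bag-of-words "virtual document" for the concept and for each candidate sense, score them by a lexical similarity, and filter out senses whose score is unsatisfactory. This is a simplified illustration under invented names, not the exact metrics or filtering policies evaluated in Chapter 4.

```python
# Sketch of IR-style concept disambiguation: compare a concept's virtual
# document against candidate sense glosses and keep the senses scoring above
# a threshold. Simple bag-of-words cosine; a stand-in for the thesis metrics.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def filter_senses(concept_doc, senses, threshold=0.2):
    """Threshold filtering policy: keep senses similar enough to the concept."""
    concept = Counter(concept_doc.split())
    scored = {sid: cosine(concept, Counter(gloss.split())) for sid, gloss in senses.items()}
    return {sid for sid, score in scored.items() if score >= threshold}

senses = {
    "plane#surface": "a flat mathematical surface in geometry",
    "plane#aircraft": "a powered airborne vehicle with wings",
}
print(filter_senses("plane flat surface used in geometry", senses))
# {'plane#surface'}
```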
The research question tackled in Chapter 5 concerns the exploitation of partial
alignments. The core problem here is that a matching system is given an incomplete
alignment, referred to as a partial alignment, and has to compute the remaining correspondences to create a full mapping. For this purpose, one has to create mapping
techniques which utilize the information of the individual partial alignment correspondences, referred to as anchors, in order to improve the mapping quality. To
answer this question, we propose a method which compares concepts by measuring
their similarities with the provided anchor concepts. For each concept, its measurements are compiled into an anchor-profile. Two concepts are then considered
similar if their anchor-profiles are similar, i.e. they exhibit comparable degrees of
similarity towards the anchor concepts. The evaluation revealed that the application of our approach can result in a performance similar to that of top matching systems in the field. However, we observe that the performance depends on the existence of appropriate meta-information which is used to compare concepts with anchors.
From this, we conclude that a combination of similarity metrics, such that all types
of meta-information are exploited, should be used to ensure a high performance for
all types of matching problems. Lastly, we systematically investigate the effect of
the partial alignment size and correctness on the quality of the produced alignments.
We observe that both size and correctness have a positive influence on the alignment
quality. We observe that decreasing the degree of correctness has a more significant
impact on the alignment quality than decreasing the size. From this we conclude
that matching systems exploiting partial alignments need to take measures to ensure
the correctness of a given partial alignment from an unknown source.
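The anchor-profile idea described above can be sketched as follows: each concept is represented by its vector of similarities towards the anchor concepts, and two concepts are compared through these vectors. The token-overlap similarity and the example labels below are invented stand-ins for the metrics actually used.

```python
# Sketch of anchor-profile matching: a concept's profile is its vector of
# similarities towards the anchor concepts; concepts with similar profiles
# are candidate correspondences. The Jaccard similarity is a placeholder.
import math

def sim(a: str, b: str) -> float:
    """Placeholder concept similarity: token overlap (Jaccard)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def anchor_profile(concept: str, anchors: list) -> list:
    return [sim(concept, a) for a in anchors]

def profile_similarity(p: list, q: list) -> float:
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

anchors = ["journal article", "conference paper", "book chapter"]
p1 = anchor_profile("accepted journal article", anchors)
p2 = anchor_profile("published journal paper", anchors)
print(round(profile_similarity(p1, p2), 2))  # 0.71
```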
Chapter 6 addresses research question 3, which presents the problem of ensuring
the correctness of a partial alignment. Some techniques exploit partial alignments
for the purpose of ontology mapping. An example of such a technique is the approach presented in Chapter 5. In order for these techniques to function correctly,
it is necessary that the given partial alignment contains as few errors as possible.
To evaluate the correctness of a given partial alignment, we propose a method utilizing feature evaluation techniques from the field of machine learning. To apply
such techniques, one must first define a feature space. A feature space is a core
mathematical concept describing a space that is spanned by inspecting n different
variables. For example, taking the variables ‘height’, ‘width’ and ‘depth’ would
span a 3-dimensional feature space. Plotting the respective values for each feature of different objects would thus allow us to inspect the differences between the objects with regard to their physical dimensions and perform analytical tasks based on
this data. In the field of machine learning, a feature space is not restricted by the
amount or types of features. Therefore, one can span a feature space using any
amount of features modelling quantities such as position, size, age, cost, type or
duration. A core task in the field of machine learning is classification, where one
must designate a class to an object for which the values for each feature are known.
An example of such a task is determining whether a person is a reliable debtor given
his or her income, employment type, age and marital status. A classification system
does this by first analysing a series of objects for which the class values are already
known. Feature evaluation techniques help the designer of a specific classification
system to determine the quality of a feature with respect to the classification task
by analysing the pre-classified objects. For our approach, we utilize this work-flow
in order to design an evaluation system for a series of anchors. We span a feature
space where every feature represents the result of a consistency evaluation between
a specific anchor and a given correspondence. Using a selection of feature evaluation
techniques, we then measure the quality of each feature and therefore the quality of
its corresponding anchor. To generate the consistency measurements, we define a
metric requiring a base similarity, for which we evaluate three types of similarities:
a syntactical, a profile and a lexical similarity. For each type of similarity, we evaluate our approach against a baseline ranking, which is created by directly applying
the same similarity on each anchor. Our evaluation revealed that our approach was
able to produce better anchor evaluations for each type of similarity metric than
the corresponding baseline. For the syntactic and lexical similarities we observed
significant improvements.
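As an illustration of this work-flow: every anchor contributes one feature (a consistency value per correspondence), and a feature that separates correct from incorrect correspondences well indicates a reliable anchor. The mean-difference score and the data below are invented stand-ins for the feature-evaluation techniques actually used.

```python
# Sketch of anchor evaluation via feature evaluation: a feature's ability to
# separate pre-classified correct from incorrect correspondences is taken as
# a quality score for the corresponding anchor.

def feature_quality(values: list, labels: list) -> float:
    """Score one feature by how far apart the class means lie."""
    pos = [v for v, l in zip(values, labels) if l]
    neg = [v for v, l in zip(values, labels) if not l]
    if not pos or not neg:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

# Rows: correspondences; columns: consistency values with anchors 0 and 1.
features = [
    [0.9, 0.4],   # correspondence known to be correct
    [0.8, 0.5],   # correct
    [0.2, 0.6],   # incorrect
    [0.1, 0.4],   # incorrect
]
labels = [True, True, False, False]

for i in range(2):
    column = [row[i] for row in features]
    print(f"anchor {i}: {feature_quality(column, labels):.2f}")
# anchor 0 separates the classes well (0.70); anchor 1 barely does (0.05)
```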
The research presented in Chapter 7 tackles the fourth research question. This
chapter focuses on a specific kind of matching problem: ontologies which have
very little terminology in common. Many matching techniques rely on the presence of
shared or very similar terminology in order to decide whether two concepts should
be matched. These techniques fail to perform adequately if the given ontologies
use different terminology to model the same concepts. Existing techniques circumvent this problem by adding new terminology to the concept definitions. The new
terms can be acquired by searching a lexical resource such as WordNet, Wikipedia or
Google. However, if an appropriate source of new terminology is not available then it
becomes significantly harder to match these ontologies. We investigate a possible alternative by proposing a method exploiting a given partial alignment. Our approach
is built upon an existing profile similarity. This type of similarity exploits semantic
relations in order to gather context information which is useful for matching. Our
extension allows it to exploit the semantic relations that are specified in the partial
alignment as well. The evaluation reveals that our approach, exploiting only partial alignments, can compute correspondences of similar quality as existing frameworks using appropriate lexical resources. Furthermore, we establish that a higher performance is achievable if both lexical resources and partial alignments are exploited by
a mapping system.
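The extension can be sketched as follows: a concept's profile gathers terms from its own annotations and its ontological neighbours and, where a neighbour occurs as an anchor, also the terms of its counterpart in the other ontology. This is a toy illustration of the idea, not the thesis's profile similarity; all names are invented.

```python
# Sketch of a profile extended with partial-alignment anchors: profile terms
# come from the concept, its neighbours, and (via anchors) the anchored
# counterparts in the other ontology. All identifiers are invented examples.

def profile(concept, neighbours, labels, anchors=None, other_labels=None):
    """Bag of profile terms for `concept`; anchors inject foreign terms."""
    terms = set(labels[concept].split())
    for n in neighbours.get(concept, []):
        terms |= set(labels[n].split())
        if anchors and n in anchors:              # exploit the partial alignment
            terms |= set(other_labels[anchors[n]].split())
    return terms

labels = {"A1": "vessel", "A2": "craft"}
neighbours = {"A1": ["A2"]}
other_labels = {"B2": "boat ship"}
anchors = {"A2": "B2"}                            # known correspondence A2 <-> B2

print(sorted(profile("A1", neighbours, labels)))                         # ['craft', 'vessel']
print(sorted(profile("A1", neighbours, labels, anchors, other_labels)))  # ['boat', 'craft', 'ship', 'vessel']
```

With the anchor exploited, the profile of A1 acquires terminology from the other ontology, which is what allows matching when the two ontologies share little vocabulary.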
Chapter 8 provides the conclusions of this thesis and discusses possibilities of
future research. Taking the answers to the research questions into account, we
conclude that there are a multitude of ways in which auxiliary resources can be
exploited in order to aid ontology matching systems. First, lexical resources can
be effectively exploited when applying a virtual document-based disambiguation
policy. Second, through the creation of anchor-profiles it is possible to exploit partial
alignments to derive similarity scores between concepts. Third, by using a feature-evaluation approach one can evaluate anchors to ensure that approaches utilizing partial
alignments perform as expected. Fourth, by extending profile similarities, such that
these also exploit anchors, one can match ontologies with little to no terminological
overlap.
When discussing future research, we identify several key areas which should be
investigated to improve the applicability of the presented work. First, research efforts should be directed into the robustness of the approaches. For example, the
disambiguation approach of Chapter 4 relies on the presence of terminological information to be able to identify senses. If this information is sparse or lacking altogether, then the effectiveness of this approach can be affected. A solution could
be the combination of multiple disambiguation approaches. Based on the available
meta-information, a decision system could determine which approach is best suited
for each ontology. A different area for future research would be the generation of
reliable partial alignments. If a partial alignment does not exist for a given matching
problem, then generating one during run-time would enable a matching system to
use techniques which require the existence of such alignments. This would allow for
the presented techniques of this thesis to be applicable to a wider range of matching
problems.
Samenvatting
De beschikbaarheid van data speelt een steeds belangrijkere rol in onze samenleving. Bedrijven gebruiken vaak informatiesystemen om informatie op te slaan over
hun klanten, transacties en producten. Daardoor is het mogelijk om deze data te
analyseren en om zo meer kennis te vergaren door bijvoorbeeld voorspellingen
te doen aan de hand van trends, zodanig dat bedrijven hun handelsstrategieën kunnen verbeteren. Sterker nog, een bedrijf kan zich richten op het toegankelijk maken
van databronnen voor consumenten. Dit soort diensten speelt een steeds grotere rol
in het alledaagse leven. Diensten zoals Google en Wikipedia worden doorgaans gebruikt om algemene informatie te vinden. Gespecialiseerde diensten zoals Amazon,
bol.com en Yelp worden gebruikt om informatie en beoordelingen van producten te
vinden en om deze producten zelfs te kopen. Om opgeslagen data te kunnen interpreteren is het noodzakelijk dat deze data een structuur heeft en geannoteerd is met
meta-informatie. Dit maakt het mogelijk om de betekenis van ieder datapunt, en zijn
relatie met andere datapunten, te bepalen. Bijvoorbeeld, het datapunt ‘555-12345’
is van weinig nut als het niet bekend is dat dit een telefoonnummer representeert en
wie de eigenaar van dit nummer is. Een informatiesysteem beschrijft de betekenissen en structuur van de opgeslagen data met behulp van een zogenaamde ontologie.
Deze ontologie specificeert een aantal typen, ook wel concepten genoemd, en hoe deze
concepten aan elkaar zijn gerelateerd.
Bedrijven staan vaak voor het probleem dat ze informatie willen uitwisselen
tussen verschillende systemen. Veronderstel bijvoorbeeld dat Bedrijf A beslist om
Bedrijf B over te nemen. Om de bedrijfsvoering van Bedrijf B voort te kunnen
zetten moet Bedrijf A alle informatie uit het systeem van Bedrijf B in zijn eigen systeem overzetten. Hier kan het gebeuren dat de data van Bedrijf B met een andere
ontologie is gemodelleerd dan de data van Bedrijf A. De oorzaak hiervan kan zijn
dat de twee bedrijven verschillende eisen hebben voor hun systemen of verschillende
ontwerpprincipes hebben gehanteerd bij het definiëren van de ontologieën. In dit
soort gevallen is het door de incompatibiliteit van de systemen niet zomaar mogelijk
om data tussen de twee systemen uit te wisselen.
Een mogelijke oplossing om het overzetten van data tussen twee systemen, welke
verschillende ontologieën gebruiken, mogelijk te maken is het zogenoemde ontologiemapping proces. Het doel van ontologie-mapping is het identificeren van alle conceptparen welke gebruikt kunnen worden om dezelfde soort data te modelleren. Een
volledige lijst van correspondenties tussen twee ontologieën wordt opgeslagen in een
zogenaamde alignment of mapping. Met behulp van deze mapping is het mogelijk
om data uit één ontologie te herschrijven zodat deze conform is aan de specificaties
van de andere ontologie. Hierdoor wordt het mogelijk om data tussen twee systemen
uit te wisselen, ondanks het feit dat de twee systemen verschillende ontologieën gebruiken. Het maken van zo’n mapping vergt veel werk. Om een mapping te maken
moet een domeinexpert handmatig alle correspondenties definiëren en controleren.
Deze aanpak is niet haalbaar als men een mapping tussen twee grote ontologieën
moet maken, waarbij iedere ontologie duizenden concepten modelleert. Het is dus
nodig om het proces van ontologie-mapping te automatiseren. Een ander applicatiedomein is het zogenaamde Semantic Web. Dit domein stelt de volgende stap
in de evolutie van het world-wide-web voor, waar alle beschikbare informatie door
een machine leesbaar en semantisch gestructureerd is. Deze semantische structuur
is ook gedefinieerd door middel van een ontologie, zodanig dat het machines mogelijk is
om semantische informatie uit het web te verzamelen. Om semantische informatie
onafhankelijk te verzamelen, moet het een machine mogelijk zijn om verschillende
ontologieën automatisch te mappen. Met behulp van een mapping is het een machine mogelijk om informatie te verzamelen welke in een verschillende semantische
structuur is gemodelleerd.
Sinds het afgelopen decennium is ontologie-mapping een actief onderzoeksveld.
Er zijn gespecialiseerde mapping-systemen ontwikkeld die gebruik maken van een
combinatie van technieken om de overeenkomsten tussen concepten te bepalen. Met
behulp van deze systemen worden overeenkomende concepten geëxtraheerd, welke
vervolgens de alignment tussen de twee ontologieën vormen. In sommige gevallen is
het mogelijk dat er extra informatie beschikbaar is, welke gebruikt kan worden om
het mapping-proces te verbeteren. Een voorbeeld van dit soort extra informatie is
Wikipedia. Een lexicale bron zoals Wikipedia maakt het mogelijk om definities van
woorden te raadplegen, synonieme woorden te identificeren en informatie van gerelateerde concepten te raadplegen. Een ander voorbeeld van een extra informatiebron
is een partiële mapping. Een partiële mapping is een onvolledige mapping welke
het resultaat is van een eerdere poging om een mapping tussen de ontologieën te
creëren. Deze mapping is onvolledig omdat bijvoorbeeld een domeinexpert niet in
staat was deze te voltooien wegens tijdgebrek. Een belangrijke uitdaging in het veld
van ontologie-mapping is dus het creëren van technieken welke van dit soort informatiebronnen gebruik maken om een mapping te genereren. Dit heeft ons naar de
volgende probleemstelling geleid:
Hoe kunnen we ontologie-mapping systemen verbeteren door gebruik te
maken van externe informatiebronnen?
Om deze probleemstelling aan te pakken hebben we vier onderzoeksvragen geformuleerd welke dit onderzoek gestuurd hebben:
1. Hoe kan men nauwkeurig lexicale betekenissen aan ontologieconcepten koppelen?
2. Hoe kan men partiële mappings gebruiken om de overeenkomsten tussen concepten te bepalen?
3. Hoe kan men beoordelen of correspondenties afkomstig uit partiële mappings
betrouwbaar zijn?
4. In hoeverre is het mogelijk om profielovereenkomsten te verbeteren zodat het
mogelijk wordt om mappings tussen ontologieën te genereren welke weinig gezamenlijke termen handhaven?
In hoofdstuk 1 introduceren we het onderzoeksveld van ontologie-mapping. We
introduceren de problemen die kunnen ontstaan als men data tussen informatiesystemen wil uitwisselen. Verder introduceren wij een reeks van reële domeinen waar het
gepresenteerde werk van toepassing is, zoals bijvoorbeeld informatie integratie, webdienst-compositie en agentcommunicatie. Wij presenteren ook een kort overzicht van
de belangrijkste onderzoeksproblemen met betrekking tot ontologie-mapping. In de
laatste sectie van dit hoofdstuk introduceren en bespreken wij de probleemstelling
en de onderzoeksvragen van dit proefschrift.
Hoofdstuk 2 maakt de lezer bekend met belangrijke achtergrondinformatie. Hier
introduceren wij formeel het probleem van ontologie-mapping. Verder detailleren en
illustreren we de meest gebruikte technieken waarmee mappings geëvalueerd kunnen
worden. Tot slot introduceren wij een aantal datasets dat gebruikt kan worden om
een ontologie-mapping systeem te evalueren.
Technieken die beschikbaar zijn voor ontologie-mapping worden in hoofdstuk
3 geïntroduceerd. Hier maken wij de lezer bekend met de opbouw van huidige
mapping-systemen en de meest gebruikte technieken. Wij introduceren hier de drie
kerntaken welke een mapping-systeem moet uitvoeren, namelijk de overeenkomstberekening, overeenkomstcombinatie en correspondentie-extractie. Voor iedere kerntaak geven wij een overzicht van technieken die voor de gegeven taak van toepassing
zijn. Vervolgens geven wij een overzicht van huidige mapping-systemen met een
focus op systemen die gebruik maken van externe informatiebronnen.
In hoofdstuk 4 beantwoorden wij de eerste onderzoeksvraag. Het kernprobleem hier betreft het nauwkeurig koppelen van lexicale definities aan de gemodelleerde concepten uit de ontologie. Dit proces staat bekend onder de naam disambiguatie. Een voorbeeld van dit proces is het vaststellen dat het concept ‘Bank’
naar het financiële instituut refereert en niet naar het meubelstuk. Technieken
die van lexicale informatiebronnen gebruik maken hebben deze koppelingen nodig
om de overeenkomsten tussen concepten te bepalen met behulp van bepaalde algoritmen. Wij pakken deze onderzoeksvraag aan door het introduceren van een op
Information-Retrieval-gebaseerde techniek waarmee ontologieconcepten aan lexicale
betekenissen gekoppeld kunnen worden. Met behulp van deze techniek zetten wij
een disambiguatie-kader op waarmee lexicale bedoelingen gefilterd worden afhankelijk van hun overeenkomstwaarden en een filterstrategie. Wij evalueren vier verschillende filterstrategieën die een lexicale betekenis filteren als ze de bijbehorende
overeenkomstwaarde onvoldoende vinden. De filterstrategieën worden geëvalueerd
met behulp van drie verschillende lexicale overeenkomstmetrieken. Onze evaluatie
heeft laten zien dat het toepassen van onze disambiguatieaanpak de prestaties van
alle drie overeenkomstmaten heeft verbeterd. Verder hebben wij het effect van het
verzwaren van de termgewichten van de concept-annotaties en betekenis-annotaties
onderzocht. Deze evaluatie heeft laten zien dat het verzwaren van termgewichten
afhankelijk van hun oorsprong in de ontologie of lexicale informatiebron een groter
positief effect heeft op de prestatie dan het toepassen van de veelgebruikte TF-IDF-aanpak, afkomstig uit het veld van Information-Retrieval.
Het onderzoek in hoofdstuk 5 behandelt onderzoeksvraag 2 in relatie tot het gebruiken van partiële mappings. Het kernprobleem hier is dat het mapping-systeem
toegang tot een onvolledige mapping heeft, ook wel een partiële mapping genoemd,
en dus de onbekende correspondenties moet bepalen om een volledige mapping te
creëren. Hier is het doel om de individuele correspondenties van de partiële mapping, ook wel ankers genoemd, te gebruiken om de kwaliteit van de berekende
mappings te verbeteren. Om deze vraag te beantwoorden stellen wij een methode voor die is gebaseerd op het vergelijken van concepten door het meten van de
overeenkomsten tussen een concept en de gegeven ankers. Voor ieder concept worden
de overeenkomstwaarden in een zogenoemd ankerprofiel samengevoegd. Twee concepten worden als overeenkomend beschouwd als hun ankerprofielen overeenkomen,
d.w.z. dat zij vergelijkbare overeenkomsten hebben met de ankerconcepten. In onze
evaluatie hebben wij kunnen vaststellen dat onze aanpak in staat is om vergelijkbare
prestaties te leveren als de beste mapping-systemen in het veld. Onze aanpak is
echter wel afhankelijk van het bestaan van geschikte meta-informatie waarmee concepten met ankers worden vergeleken. Hieruit concluderen wij dat alle soorten van
meta-informatie geraadpleegd moeten worden door een combinatie van overeenkomstmaten toe te passen om zeker te zijn dat deze techniek voor alle soorten problemen
geschikt is. Tot slot voeren wij een systematisch onderzoek uit om vast te stellen
hoe groot de invloed is van de grootte en de correctheid van de partiële mapping op
de kwaliteit van de berekende mapping. Hier stellen wij vast dat zowel de grootte
als ook de correctheid invloed hebben op de mappingkwaliteit. Verder stellen wij
vast dat een vermindering van de correctheid van de partiële mapping een sterkere
invloed heeft dan een vermindering van de grootte. Hieruit concluderen wij dat
mapping-systemen die van partiële mappings gebruik maken maatregelen moeten
nemen om ervoor te zorgen dat partiële mappings uit onbekende bronnen correct
zijn.
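The anchor-profile idea can be sketched as follows. The token-overlap similarity and the toy concepts below are illustrative stand-ins, not the base measures evaluated in the thesis.

```python
# Sketch of anchor profiles: each concept is represented by its vector of
# similarities to the anchor concepts of its own ontology, and two
# concepts are matched by comparing these vectors (here: cosine).
import math

def base_sim(a, b):
    """Toy base similarity: token overlap between two concept labels."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def anchor_profile(concept, anchors):
    """Similarity of `concept` to every anchor concept, in anchor order."""
    return [base_sim(concept, a) for a in anchors]

def cosine(p, q):
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

# Anchors: concept pairs whose correspondence is already known; the i-th
# source anchor corresponds to the i-th target anchor.
anchors_src = ["accepted paper", "conference chair"]
anchors_tgt = ["camera ready paper", "chair"]

p = anchor_profile("paper author", anchors_src)
q = anchor_profile("author of paper", anchors_tgt)
score = cosine(p, q)   # a high score suggests the concepts correspond
```

Because both profiles are indexed by the same anchor correspondences, the vectors are directly comparable even though the two concepts come from different ontologies.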
Research question 3 is the main topic of Chapter 6. The core question here is how to ensure the correctness of a given partial mapping. Some techniques generate a mapping with the help of a partial mapping; an example of such a technique can be seen in Chapter 5. For these techniques it is necessary that the given partial mapping contains as few errors as possible, such that they perform as intended. To evaluate the correctness of a partial mapping, we propose a technique that makes use of feature-evaluation techniques from the field of machine learning. In order to use a feature-evaluation technique, one must first define a feature space. A feature space is a core concept from mathematics describing a space spanned by n different features. For example, using the features 'height', 'width' and 'depth', one can span a three-dimensional feature space. By plotting the corresponding values of different objects in this space, one can see how the objects relate to each other in terms of their physical location, and use this data to perform various analyses. In the field of machine learning there are no restrictions on the kind or the number of features used. It is thus also possible to use features that represent measures such as position, size, age, cost, type or duration. A core task in machine learning is classification, where one must assign a category to an object for which the value of every feature is known. An example of such a task is determining whether a person is a reliable debtor, depending on their income, type of employment, age and family situation. A classification system does this by first analysing a set of objects whose categories are already known. Feature-evaluation techniques help the designer of a classification system determine how useful a particular feature is for the classification task by analysing the already classified objects. For our approach, we exploit this workflow to create an evaluation system for a set of anchors. We construct a feature space in which each feature represents the result of a consistency evaluation between a specific anchor and a given correspondence. Using feature-evaluation techniques, we evaluate the quality of the features, and thus also the quality of the corresponding anchors. To compute the consistency values, we define a measure that makes use of a base similarity metric. We evaluate three kinds of measures as the base similarity metric: a syntactic, a profile and a lexical metric. For each type of measure, we evaluate our approach against a baseline evaluation, which is produced by applying the measure directly to the anchors. Our evaluation has shown that for each kind of measure our approach produces better evaluations than the corresponding baseline. For the syntactic and lexical measures, we were able to establish a significant improvement.
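The workflow described above can be sketched as follows. The mean-difference criterion below is a deliberately simple stand-in for real feature-evaluation techniques (such as information gain), and the consistency values are fabricated for illustration.

```python
# Sketch of anchor evaluation via feature scoring. Each anchor induces one
# feature: the consistency value it assigns to a correspondence. Anchors
# are then ranked by how well their feature separates correct from
# incorrect correspondences.

def feature_scores(vectors, labels):
    """vectors: one consistency value per anchor, per correspondence.
    labels: True for correct correspondences, False for incorrect ones.
    Returns one score per anchor (higher = more useful anchor)."""
    n_features = len(vectors[0])
    scores = []
    for f in range(n_features):
        pos = [v[f] for v, y in zip(vectors, labels) if y]
        neg = [v[f] for v, y in zip(vectors, labels) if not y]
        # Simple separation criterion: distance between the class means.
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores

# Three correspondences scored against two anchors. Anchor 0 agrees with
# the labels; anchor 1 is uninformative (e.g. an incorrect anchor).
vectors = [[0.9, 0.5], [0.8, 0.5], [0.1, 0.5]]
labels = [True, True, False]
scores = feature_scores(vectors, labels)   # anchor 0 scores higher
```

An anchor whose feature carries no information about correctness, as anchor 1 here, is a candidate for removal from the partial mapping.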
Research question 4 is the subject of the research described in Chapter 7. The main topic here is a specific kind of mapping task, namely mapping ontologies that use different terminologies to model the same concepts. Many techniques require the existence of identical or similar terms in order to determine whether two concepts correspond. These techniques perform poorly when the given ontologies use terminologies that are largely different. Existing techniques avoid this problem by adding new terminology to the concept definitions; new terms are found by consulting resources such as WordNet, Wikipedia or Google. When such resources are not available, it becomes considerably more difficult to create a mapping between the two ontologies. We investigate an alternative that makes use of a given partial mapping. Our approach is based on existing profile similarity metrics, which use semantic relations to gather important context information. Our extension of the applied metric makes it possible to also exploit the semantic relations in the partial mapping. Our evaluation has shown that, using only a partial mapping, our approach is able to generate correspondences of a quality comparable to that of existing techniques that use lexical resources. Furthermore, we establish that a better quality is attainable when a technique makes use of both lexical resources and partial mappings.
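The extension can be sketched as follows: a concept's profile normally collects terms from the concept and its semantic neighbours, and the extension additionally pulls in terms from the anchored counterparts of those neighbours, so that ontologies with disjoint vocabularies can still share profile terms. The data structures and the Jaccard comparison are illustrative assumptions, not the metric defined in the thesis.

```python
def profile(concept, neighbours, labels, anchors=None):
    """Bag of label terms describing `concept`: its own terms, its
    neighbours' terms, and (the extension) the terms of each neighbour's
    anchored counterpart in the other ontology."""
    terms = set(labels[concept].split())
    for n in neighbours.get(concept, []):
        terms |= set(labels[n].split())
        if anchors and n in anchors:       # extension: follow the anchor
            terms |= set(labels[anchors[n]].split())
    return terms

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

# Two toy ontologies with disjoint vocabularies, bridged by one anchor.
labels = {"src:Vehicle": "vehicle", "src:Wheel": "wheel",
          "tgt:Automobile": "automobile", "tgt:Tyre": "tyre"}
neighbours = {"src:Vehicle": ["src:Wheel"], "tgt:Automobile": ["tgt:Tyre"]}
anchors = {"src:Wheel": "tgt:Tyre", "tgt:Tyre": "src:Wheel"}

plain = jaccard(profile("src:Vehicle", neighbours, labels),
                profile("tgt:Automobile", neighbours, labels))
extended = jaccard(profile("src:Vehicle", neighbours, labels, anchors),
                   profile("tgt:Automobile", neighbours, labels, anchors))
# Without anchors the profiles share no terms; with anchors they overlap.
```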
Chapter 8 presents the conclusions of this thesis and discusses the possibilities for future research. Taking into account the answers to the posed research questions, we conclude that various possibilities exist for using external information sources to improve ontology-mapping systems. First, lexical resources can be exploited more effectively if a disambiguation method is applied first. Second, by creating anchor profiles it is possible to compute similarity values between concepts. Third, by using a feature-selection-based technique it is possible to evaluate anchors and thereby ensure that techniques relying on partial mappings perform as expected. Fourth, by extending profile similarity metrics such that they make use of given anchors, it is possible to map ontologies that have little terminology in common.
In the discussion of future research, we identify several topics that should be investigated to improve the applicability of the presented research. One topic for further research is the robustness of the techniques. For example, the performance of the disambiguation technique of Chapter 4 depends on the presence of terminological information with which the correct lexical senses can be identified. For ontologies in which little to no terminological information is modelled, the performance of the approach may turn out worse than expected. A possible solution would be to combine several disambiguation techniques: depending on the available meta-information, a decision system could determine which disambiguation technique is most suitable for each ontology. Another area for future research is the generation of partial mappings. For problems where no partial mapping is available, generating a reliable partial mapping would make it possible to apply techniques that rely on such mappings, thus allowing the presented techniques to be applied to a wider range of problems.
About the Author
Frederik C. (Christiaan) Schadd was born on June 16th, 1985 in Hamm, Germany. He received his high-school diploma (Abitur) at the Gymnasium am Turmhof, Mechernich, Germany, in 2004. In the same year, he started his bachelor studies in Knowledge Engineering at Maastricht University. During this study he visited Baylor University in Waco, Texas, USA, to follow the FastTrac Entrepreneurial Training Program. Frederik received his B.Sc. degree in Knowledge Engineering at the end of the 2007-2008 academic year. He continued his studies at Maastricht University by pursuing a Master's degree in Artificial Intelligence in 2008.
During his studies he also attended an Engagement Workshop in Robotics, Animatronics and Artificial Intelligence in Bristol, UK, and worked as a tutor for University College Maastricht. In 2010, Frederik completed his Master's degree with the distinction cum laude. In the same year, Frederik was employed at Maastricht Instruments as a Software Engineer and Mathematician, where he worked on projects involving medical data analysis and image reconstruction. At the end of that year, Frederik started a Ph.D. project at the Department of Knowledge Engineering at Maastricht University, which resulted in several publications at scientific conferences and in journals, and ultimately in this dissertation. Next to his scientific duties, Frederik was also involved in teaching at the Department of Knowledge Engineering and was a member of the WebCie committee of the MaasSAC climbing association.