6 Meilicke TP OM'06

Improving Automatically Created Mappings using Logical Reasoning

Christian Meilicke¹, Heiner Stuckenschmidt¹, Andrei Tamilin²

¹ University of Mannheim, Germany
² ITC-irst and University of Trento, Italy

Abstract. A lot of attention has been devoted to heuristic methods for discov-
ering semantic mappings between ontologies. Despite impressive improvements,
the mappings created by these automatic matching tools are still far from being
perfect. In particular, they often contain wrong and redundant mapping rules. In
this paper we present an approach for improving such mappings using logical
reasoning in the context of Distributed Description Logics (DDL). Our method is
orthogonal to the matching algorithm used and can therefore be used in combina-
tion with any matching tool. We explain the general idea of our approach infor-
mally using a small example and present the results of experiments conducted on
the OntoFarm Benchmark which is part of the Ontology Alignment Evaluation
challenge.

1 Motivation
The problem of semantic heterogeneity is becoming more and more pressing in many
areas of information technologies. The Semantic Web is only one area where the problem
of semantic heterogeneity has led to intensive research on methods for semantic
integration. The specific problem of semantic integration on the Semantic Web is the
need to not only integrate data and schema information, but to also provide means to
integrate ontologies, rich semantic models of a particular domain. There are two lines
of work connected to the problem of a semantic integration of ontologies:

– The (semi-) automatic detection of semantic relations between ontologies (e.g., [9,
6, 11, 12, 7]).
– The representation and use of semantic relations for reasoning and query answering
(e.g., [14, 10, 5, 3, 2]).

So far, work on representation of and reasoning with mappings has focussed on
mechanisms for answering queries and using mappings to compute subsumption relationships
between concepts in the mapped ontologies. These methods always assumed
that the mappings used are manually created and of high quality (in particular consis-
tent). In this paper we investigate logical reasoning about mappings that are not assumed
to be perfect. In particular, our methods can be used to check (automatically created)
mappings for formal and conceptual consistency and determine implied mappings that
have not explicitly been represented. We investigate such mappings in the context of
Distributed Description Logics [1, 13], an extension of traditional description logics
with mappings between concepts in different T-boxes. The functionality described in
this paper will become more important in the future because more and more ontologies
are created and need to be linked. For larger ontologies the process of mapping will not
be done completely by hand, but will rely on or will at least be supported by automatic
mapping approaches. We see our work as a contribution to semi-automatic approaches
for creating mappings between ontologies where possible mappings are computed auto-
matically and then corrected manually making use of methods for checking the formal
and conceptual properties of the mappings.

In previous work we have proposed a number of formal properties of mappings in
Distributed Description Logics that we consider useful for judging the quality of a set
of mappings [16]. In this paper, we refine and extend this work in several directions.

Debugging of mappings We propose a process for (semi-)automatically debugging
automatically created mappings making use of some of the properties mentioned above.
In particular we use the notion of mapping consistency to detect problems caused by
the mappings. For each potential problem, we determine the minimal set of mapping
rules responsible for the problem (minimal conflict set). For each conflict set, we try to
identify which mapping rule is incorrect and remove it from the mapping.

Implementation On top of the DRAGO reasoning system [15] we built a prototype of a
mapping debugger for computing minimal conflict sets with respect to an inconsistency
caused by a mapping, as well as some heuristics for automatically repairing an inconsistent
mapping. We further added a minimization functionality for computing minimal
mapping sets from redundant ones.

Experiments We tested the approach using the OntoFarm data set, a set of several rich
OWL ontologies describing the domain of conference management systems [17]. We
used the CtxMatch matching tool to automatically create mappings between each of the
ontologies. We further automatically determined problems (in particular unsatisfiable
concepts) created by the mapping and tried to fix them automatically using the debug-
ging process proposed in this paper. In the concluding step of the experimental study,
we tried to compute for each mapping its logically-equivalent minimal version.

The structure of the paper is as follows. We start with a brief recall of basic def-
initions of Distributed Description Logics and explanations of the reasoning mecha-
nisms. Then we describe the intuitions of our debugging/minimization approaches us-
ing a small example. Finally, we report on some preliminary experimental evaluation of
the techniques proposed in this paper and summarize the results.

2 Distributed Description Logic


The Distributed Description Logics (DDL) framework is a formal tool for representing and
reasoning with multiple ontologies pairwise linked by semantic mappings. In this section,
we briefly recall some key definitions and properties of DDL relying on the original
studies in [1, 13].

2.1 Syntax and Semantics

Given a set $I$ of indexes, used to enumerate a set of ontologies, a Distributed Description
Logic is a collection $\{DL_i\}_{i \in I}$ of Description Logics. Each ontology $i$ is formalized
by a T-box $\mathcal{T}_i$ of $DL_i$, so that the initial set of ontologies in DDL corresponds
to a family of T-boxes $\mathcal{T} = \{\mathcal{T}_i\}_{i \in I}$. To distinguish descriptions from the various $\mathcal{T}_i$ in
the family, DDL uses a prefix notation that pins each description to the ontology in which
it is considered, e.g., $i : X$, $i : X \sqsubseteq Y$. Semantic relations between pairs of ontologies
are represented in DDL by bridge rules. A bridge rule from $i$ to $j$ is an expression of
one of the following two forms:

$i : X \xrightarrow{\sqsubseteq} j : Y$ – an into-bridge rule

$i : X \xrightarrow{\sqsupseteq} j : Y$ – an onto-bridge rule

where $X$ and $Y$ are concepts of ontologies $\mathcal{T}_i$ and $\mathcal{T}_j$ respectively. The derived bridge
rule $i : X \xrightarrow{\equiv} j : Y$ can be defined as the conjunction of the corresponding into- and
onto-bridge rules.

Intuitively, the into-bridge rule $i : Bachelor \xrightarrow{\sqsubseteq} j : Student$ states that, from
the j-th point of view, the concept Bachelor in i is more specific than its local concept
Student. Similarly, the onto-bridge rule $i : ScientificEvent \xrightarrow{\sqsupseteq} j : Conference$
states that, from the j-th point of view, ScientificEvent in i is more general than the local concept Conference.
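A distributed T-box $\mathfrak{T} = \langle \mathcal{T}, \mathfrak{B} \rangle$ consists of a collection of T-boxes $\mathcal{T} = \{\mathcal{T}_i\}_{i \in I}$
and a collection of bridge rules $\mathfrak{B} = \{\mathfrak{B}_{ij}\}_{i \neq j \in I}$ between them.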
The semantics of DDL is based on the key assumption that each ontology $\mathcal{T}_i$ in
the family is locally interpreted by an interpretation $\mathcal{I}_i$ on its local interpretation domain
$\Delta^{\mathcal{I}_i}$. The semantic correspondences between heterogeneous local domains, e.g., the
representations of a registration fee in US Dollars and in Euro, are modeled in DDL by
a domain relation.
A domain relation $r_{ij}$ represents a possible way of mapping the elements of $\Delta^{\mathcal{I}_i}$
into the domain $\Delta^{\mathcal{I}_j}$: $r_{ij} \subseteq \Delta^{\mathcal{I}_i} \times \Delta^{\mathcal{I}_j}$, such that $r_{ij}(d)$ denotes
$\{d' \in \Delta^{\mathcal{I}_j} \mid \langle d, d' \rangle \in r_{ij}\}$; for any subset $D$ of $\Delta^{\mathcal{I}_i}$, $r_{ij}(D)$ denotes
$\bigcup_{d \in D} r_{ij}(d)$; and for any $R \subseteq \Delta^{\mathcal{I}_i} \times \Delta^{\mathcal{I}_i}$, $r_{ij}(R)$ denotes
$\bigcup_{\langle d, d' \rangle \in R} r_{ij}(d) \times r_{ij}(d')$. For instance, if $\Delta^{\mathcal{I}_1}$ and $\Delta^{\mathcal{I}_2}$ are the
representations of a registration fee in US Dollars and in Euro, then $r_{12}$ could be a rate
of exchange function, or some other approximation relation.
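A distributed interpretation $\mathfrak{I} = \langle \{\mathcal{I}_i\}_{i \in I}, \{r_{ij}\}_{i \neq j \in I} \rangle$ of a distributed T-box $\mathfrak{T} =
\langle \mathcal{T}, \mathfrak{B} \rangle$ consists of a family of local interpretations $\mathcal{I}_i$ on local interpretation domains
$\Delta^{\mathcal{I}_i}$, one for each $\mathcal{T}_i$, and a family of domain relations $r_{ij}$ between these local domains.
A distributed interpretation $\mathfrak{I}$ is said to satisfy a distributed T-box $\mathfrak{T} = \langle \mathcal{T}, \mathfrak{B} \rangle$, written
$\mathfrak{I} \models \mathfrak{T}$, if all T-boxes in $\mathcal{T}$ are satisfied:

$\mathfrak{I} \models \mathcal{T}_i$, if $\mathcal{I}_i \models A \sqsubseteq B$ for all $A \sqsubseteq B \in \mathcal{T}_i$

and all bridge rules in $\mathfrak{B}$ are satisfied:

$\mathfrak{I} \models i : X \xrightarrow{\sqsubseteq} j : Y$, if $r_{ij}(X^{\mathcal{I}_i}) \subseteq Y^{\mathcal{I}_j}$

$\mathfrak{I} \models i : X \xrightarrow{\sqsupseteq} j : Y$, if $r_{ij}(X^{\mathcal{I}_i}) \supseteq Y^{\mathcal{I}_j}$
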
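Given a distributed T-box $\mathfrak{T} = \langle \mathcal{T}, \mathfrak{B} \rangle$, one can perform some basic Distributed DL
inferences. A concept $i : C$ is satisfiable with respect to $\mathfrak{T}$ if there exists a distributed
interpretation $\mathfrak{I}$ of $\mathfrak{T}$ such that $C^{\mathcal{I}_i} \neq \emptyset$. A concept $i : C$ is subsumed by a concept
$i : D$ with respect to $\mathfrak{T}$ ($\mathfrak{T} \models i : C \sqsubseteq D$) if for every distributed interpretation $\mathfrak{I}$ of $\mathfrak{T}$
we have that $C^{\mathcal{I}_i} \subseteq D^{\mathcal{I}_i}$.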
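
The satisfaction conditions for bridge rules can be checked mechanically on small finite interpretations. The following is a minimal sketch, not part of any DDL reasoner: the concept extensions, the domain relation, and the function names `image`, `satisfies_into` and `satisfies_onto` are all invented for illustration.

```python
def image(r, elems):
    """Return r(D): every target element related to some element of elems."""
    return {target for (source, target) in r if source in elems}


def satisfies_into(r, x_ext, y_ext):
    """Into-bridge rule: the image of X under r must be a subset of Y."""
    return image(r, x_ext) <= y_ext


def satisfies_onto(r, x_ext, y_ext):
    """Onto-bridge rule: the image of X under r must be a superset of Y."""
    return image(r, x_ext) >= y_ext


# Hypothetical finite interpretations, invented for this example.
bachelor_i = {"b1", "b2"}            # extension of Bachelor in ontology i
student_j = {"s1", "s2", "s3"}       # extension of Student in ontology j
r_ij = {("b1", "s1"), ("b2", "s2")}  # domain relation from i to j

into_ok = satisfies_into(r_ij, bachelor_i, student_j)
onto_ok = satisfies_onto(r_ij, bachelor_i, student_j)
print(into_ok, onto_ok)
```

In this toy interpretation the into-bridge rule holds, while the onto-bridge rule fails because the element `s3` of Student is not the image of any Bachelor.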

2.2 DDL Inference Mechanisms

Although in both DL and Distributed DL the fundamental reasoning services lie in
verifying concept satisfiability/subsumption within a certain ontology, in DDL,
besides the ontology itself, the reasoning also depends on other ontologies that affect
it through semantic mappings. This influence consists in the ability of bridge rules to
propagate knowledge across ontologies in the form of subsumption axioms.
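The simplest case illustrating the knowledge propagation in DDL is the following:

$$\frac{i : A \sqsubseteq B, \quad i : A \xrightarrow{\sqsupseteq} j : G, \quad i : B \xrightarrow{\sqsubseteq} j : H}{j : G \sqsubseteq H} \qquad (1)$$

In languages that support disjunction, the simplest propagation rule can be generalized
to the propagation of subsumption between a concept and a disjunction of other
concepts in the following way:

$$\frac{i : A \sqsubseteq B_1 \sqcup \ldots \sqcup B_n, \quad i : A \xrightarrow{\sqsupseteq} j : G, \quad i : B_k \xrightarrow{\sqsubseteq} j : H_k \ (1 \leq k \leq n)}{j : G \sqsubseteq H_1 \sqcup \ldots \sqcup H_n} \qquad (2)$$
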

The important property of the described knowledge propagation is that it is directional,
i.e., bridge rules from i to j support knowledge propagation only from i towards
j. It has been shown in [13] that adding the inference pattern (2) to existing DL tableaux
reasoning methods leads to a correct and complete method for reasoning in DDL. This
method has been implemented in the DRAGO DDL reasoner.
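
To make propagation pattern (1) concrete, here is a minimal sketch that applies it to explicitly listed subsumptions and bridge rules. It is not how DRAGO works internally (DRAGO uses a distributed tableau); the tuple encoding, the relation names `"into"`, `"onto"`, `"equiv"` and the function `propagate` are assumptions made for this example, and the caller is assumed to supply an already transitively closed subsumption relation.

```python
def propagate(local_subsumptions, bridge_rules):
    """Apply pattern (1) once.

    local_subsumptions: set of (A, B) pairs meaning i : A is subsumed by B.
    bridge_rules: iterable of (source, relation, target) from i to j, where
    relation is "into", "onto" or "equiv" (equiv counts as both).
    Returns the set of (G, H) pairs meaning j : G is subsumed by H.
    """
    onto = [(s, t) for (s, rel, t) in bridge_rules if rel in ("onto", "equiv")]
    into = [(s, t) for (s, rel, t) in bridge_rules if rel in ("into", "equiv")]
    derived = set()
    for a, g in onto:
        for b, h in into:
            # A is subsumed by B either trivially (A = B) or by a local axiom.
            if (a == b or (a, b) in local_subsumptions) and g != h:
                derived.add((g, h))
    return derived


# Toy input: one axiom in ontology i and two equivalence bridge rules.
local_i = {("Author", "Person")}
rules = [
    ("Person", "equiv", "Person"),
    ("Author", "equiv", "Authorization"),
]
derived = propagate(local_i, rules)
print(derived)
```

With these inputs the only non-trivial subsumption propagated to j is that Authorization is subsumed by Person.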

3 The Debugging Process

In this section we will explain the general idea of our approach for improving automat-
ically created mappings based on reasoning about mappings in Distributed Description
Logics using a simple example. In particular, we consider two ontologies in the domain
of conference management systems, the same domain we did our experiments in. For
each ontology, i and j, we only consider a single axiom, namely:

$i : Author \sqsubseteq Person$ and $j : Person \sqsubseteq \neg Authorization$

These simple axioms that describe the concept of a person in two different ontolo-
gies – one stating that an author is a special kind of person and the other one stating that
the concepts Person and Authorization (to access submitted papers) are disjoint concepts
– are enough to explain the important features of our approach. The approach consists
of the following steps.
3.1 Mapping Creation
In the first step, we use any existing system for matching ontologies to create an initial
set of mapping hypotheses. In particular, we are interested in mappings between class
names, because these are the kinds of mappings that we can reason about using the DDL
framework. In order to support automatic repair of inconsistent mappings later on,
the matching algorithm chosen should ideally not only return a set of mappings, but
also a level of confidence in the correctness of each mapping. For the sake of simplicity,
we assume that we use a simple string matching method that compares the overlap
in concept names and computes a similarity value that denotes the relative size of the
common substring.¹ Mappings are created based on a threshold for this value that we
assume to be 1/3. Applying this method to the example will result in the following two
mappings with corresponding levels of confidence:

$i : Person \xrightarrow{\equiv} j : Person$, 1.00

$i : Author \xrightarrow{\equiv} j : Authorization$, 0.46

We further assume that the mapping method also applies some structural heuristics
to derive additional mappings and propagates the levels of confidence accordingly. For
instance, the fact that i : Person is a superconcept of i : Author, which is assumed to
be equivalent to j : Authorization, may be used to derive the following mapping:

$i : Person \xrightarrow{\sqsupseteq} j : Authorization$, 0.46

In the same way, the fact that i : Author is a subconcept of i : Person and the fact
that i : Person is assumed to be equivalent to j : Person may be used to derive the following
additional mapping:

$i : Author \xrightarrow{\sqsubseteq} j : Person$, 1.00

We can easily see that the process has produced two incorrect mappings, namely
the ones with a confidence of 0.46. It could be argued that it is easy to get rid of these
incorrect mappings by raising the threshold to 0.5, for instance. This, however, is no
sustainable solution to the problem: there might be mappings with a level of
confidence below 0.5 that are correct, and there might still be incorrect
mappings with a confidence of more than 0.5. Instead of relying on artificially set
thresholds, we propose to analyze the impact of created mappings on the connected
ontologies and to eliminate mappings that have a malicious influence.
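
The simple string matcher assumed in this example can be sketched as follows. The text only speaks of the "relative size of the common substring"; the concrete definition used here (length of the longest common substring divided by the length of the longer name, compared case-insensitively) is an assumption chosen because it reproduces the confidence values 1.00 and 0.46 given above. The function names are invented for this sketch.

```python
from itertools import product


def longest_common_substring(a, b):
    """Length of the longest common contiguous substring (case-insensitive)."""
    a, b = a.lower(), b.lower()
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for k, ch_b in enumerate(b, start=1):
            # Extend the common run ending at the previous characters, or reset.
            cur.append(prev[k - 1] + 1 if ch_a == ch_b else 0)
        best = max(best, max(cur))
        prev = cur
    return best


def similarity(a, b):
    """Assumed measure: common substring length relative to the longer name."""
    return longest_common_substring(a, b) / max(len(a), len(b))


def match(concepts_i, concepts_j, threshold=1 / 3):
    """Return (concept_i, concept_j, confidence) for pairs above the threshold."""
    result = []
    for ci, cj in product(concepts_i, concepts_j):
        sim = similarity(ci, cj)
        if sim >= threshold:
            result.append((ci, cj, round(sim, 2)))
    return result


matches = match(["Author", "Person"], ["Person", "Authorization"])
print(matches)
```

Under this definition, Author and Authorization share the six-letter substring "Author" out of thirteen letters, which yields 0.46, while the remaining cross pairs fall below the threshold of 1/3.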

3.2 Diagnosis
The mapping set described in the last step now serves as a basis for analyzing the
effect of mappings and detecting malicious mappings. This process is similar to the
well-known concept of model-based diagnosis, which has already been successfully applied
to the task of detecting wrong axioms in single ontologies. Similar to existing approaches
for diagnosing ontologies, our starting point is the set of unsatisfiable concepts, which
are interpreted as symptoms for which a diagnosis has to be computed. Compared to
the general task of diagnosing ontologies, we are in a lucky position, because we have
to deal with a much smaller set of potential diagnoses. In particular, we claim that the
ontologies connected in the first step do not contain unsatisfiable concepts. If we now
observe unsatisfiable concepts in the target ontology² and assume that the ontologies
themselves are correct, we know that they have to be caused by some mappings in the
mapping set.

¹ Of course, we use more sophisticated methods in the real experiments.

To illustrate this situation, we can have a look at our example again. Using existing
techniques for reasoning in DDL, we can derive that the concept Authorization is globally
unsatisfiable, i.e., $Authorization^{\mathcal{I}_j} = \emptyset$, because we have $Authorization \sqsubseteq \neg Person$
and, at the same time, we can infer $Authorization \sqsubseteq Person$. There are
two reasons for this, namely:

$Authorization^{\mathcal{I}_j} = r_{ij}(Author^{\mathcal{I}_i}) \subseteq r_{ij}(Person^{\mathcal{I}_i}) = Person^{\mathcal{I}_j}$

and

$Authorization^{\mathcal{I}_j} \subseteq r_{ij}(Person^{\mathcal{I}_i}) = Person^{\mathcal{I}_j}$

Interpreting the inconsistency of the concept j : Authorization as a symptom, we
can now try to identify and repair the cause of this inconsistency. For this purpose, we
compute an irreducible conflict set for this symptom. Here an irreducible conflict set is
a set of mappings that makes the concept unsatisfiable and has the additional property
that removing any mapping from the set makes the concept satisfiable again. From the arguments
above it is easy to see that we have the following irreducible conflict sets:

$\{i : Person \xrightarrow{\equiv} j : Person,\ i : Author \xrightarrow{\equiv} j : Authorization\}$

and

$\{i : Person \xrightarrow{\equiv} j : Person,\ i : Person \xrightarrow{\sqsupseteq} j : Authorization\}$

In classical diagnosis, all conflict sets³ are computed and the diagnosis is computed
from these conflict sets using the hitting set algorithm. For the case of diagnosing mappings
this is neither computationally feasible nor does it provide the expected result. In
our example, the hitting set would consist of the mapping $i : Person \xrightarrow{\equiv} j : Person$
which, as we will see later, is the only mapping that actually carries some correct information.
Our solution to the problem is to use an iterative approach that computes an often
not minimal hitting set by determining one conflict set at a time and immediately fixing
it in the way described in the next section. In our example, the algorithm will first
detect the first conflict set and fix it. Afterwards, the method checks whether the concept
j : Authorization is still inconsistent. As this is the case, the second conflict set will
be detected and fixed as well, removing the problem.
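
One common way to obtain a single irreducible conflict set is deletion-based shrinking: try to drop each mapping in turn and keep it only if the symptom disappears without it. The sketch below illustrates this under stated assumptions. The text does not say which procedure the prototype uses, and the oracle `is_unsat` is a hand-written stand-in for a DDL reasoner that simply encodes the two conflict sets listed above; the string labels are invented for readability.

```python
PERSON_EQ = "i:Person ≡ j:Person"
AUTHOR_EQ = "i:Author ≡ j:Authorization"
PERSON_ONTO = "i:Person ⊒ j:Authorization"
AUTHOR_INTO = "i:Author ⊑ j:Person"

# Stand-in for the reasoner: the two conflict sets named in the text.
KNOWN_CONFLICTS = [{PERSON_EQ, AUTHOR_EQ}, {PERSON_EQ, PERSON_ONTO}]


def is_unsat(mappings):
    """Toy oracle: is j:Authorization unsatisfiable under these mappings?"""
    return any(conflict <= set(mappings) for conflict in KNOWN_CONFLICTS)


def irreducible_conflict_set(mappings, is_unsat):
    """Shrink an unsatisfiable mapping set until no mapping can be dropped."""
    if not is_unsat(mappings):
        raise ValueError("the mapping set does not cause the symptom")
    conflict = list(mappings)
    for m in list(conflict):
        rest = [x for x in conflict if x != m]
        if is_unsat(rest):
            # m is not needed to reproduce the symptom, so drop it.
            conflict = rest
    return conflict


mappings = [PERSON_EQ, PERSON_ONTO, AUTHOR_INTO, AUTHOR_EQ]
first_conflict = irreducible_conflict_set(mappings, is_unsat)
print(first_conflict)
```

Which of the irreducible conflict sets is returned depends on the order in which mappings are tried; the order above happens to yield the first set from the text. The procedure needs one oracle call per mapping, which is far cheaper than enumerating all conflict sets.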

² The formal semantics of DDL guarantees that the addition of mappings cannot lead to
unsatisfiable concepts in the source ontology.
³ In classical diagnosis often only minimal conflict sets are considered.

3.3 Heuristic Debugging

As mentioned above, the result of the diagnosis step is an irreducible conflict set, i.e.,
a set of mappings that make a concept unsatisfiable, with the additional
property that removing one mapping from this set solves the problem in the sense that
the concept becomes satisfiable. The underlying idea of our approach is that unsatisfiable
concepts are the result of wrong mappings. This means that each irreducible
conflict set contains at least one mapping rule that does not state a correct semantic relation
between concepts and therefore should not be in the set of mappings. The goal of the
debugging step is to identify this malicious mapping and remove it from the overall
mapping set. If we choose the right mapping for removal, the quality of the overall
mapping set should be improved, because a wrong mapping has been removed. In the
case of our example, the first irreducible conflict set that will be considered consists of
the following two mappings, one of which we have to remove:

$i : Person \xrightarrow{\equiv} j : Person$, 1.00

$i : Author \xrightarrow{\equiv} j : Authorization$, 0.46

There are different ways in which a decision about the mapping to remove
could be made. The easiest way is to use an interactive approach where the conflict
sets are presented to a human user who decides which mapping should be removed.
In our case, the user will easily be able to decide that the mapping
$i : Author \xrightarrow{\equiv} j : Authorization$ is not correct and should be removed. In the second iteration, the
following two mappings will be in the irreducible conflict set:

$i : Person \xrightarrow{\equiv} j : Person$, 1.00

$i : Person \xrightarrow{\sqsupseteq} j : Authorization$, 0.46

For this set the user will be able to see immediately that the second mapping should
be removed, because it is not correct. This approach sounds trivial, but in the presence of
large mapping sets, providing the user with feedback about potential problems in terms
of small conflict sets is of great help and often reveals problems that are hard to see
when looking at the complete mapping set.
We can also try to further automate the debugging process by letting the system
decide which mapping rule to eliminate. In cases where the matching system already
provides a measure of confidence, this is again quite simple, as we can simply remove
the mapping rule with the lowest degree of confidence. In our case this is again the rule
$i : Author \xrightarrow{\equiv} j : Authorization$, and removing it will lead to a better mapping set.
It is not always possible, however, to rely on the confidence provided by the matching
system, either because the system simply does not provide any or because the levels
of confidence provided are not informative. In our experiments, we often had the situation
where all mappings, even though they were conflicting, had a confidence of 100%
attached. In this case, we have to think of a new way of ranking mappings. An approach
that we used in our experiments and that turned out to work quite well is to compute the
semantic distance of the concept names involved using WordNet synsets. For the example
above it is clear that this heuristic will also lead to an exclusion of the second
rule, because the class names in the first rule are identical and therefore have the least
semantic distance possible. In cases where no distinction can be made using this heuristic,
we have to switch back to the interactive mode and ask the user which mapping to
remove. In any case, the debugging step leaves us with a single mapping that does
not create any inconsistencies. In order to get a complete set of correct mappings, we
can now infer all additional mappings that follow from this one, which leads us to the
corrected final set of mappings. In our case this final set is the following:

$i : Person \xrightarrow{\equiv} j : Person$, 1.00

$i : Author \xrightarrow{\sqsubseteq} j : Person$, 1.00

In summary, the process above is a way to improve the quality of automatically gen-
erated mapping sets by means of intelligent post-processing. Using formal properties
of mappings and logical reasoning we are able to detect wrong mappings by analyzing
their impact and tracking unwanted effects back to the mapping rules that caused them.
In this sense, our method is not yet another ontology matching method; it is actually orthogonal
to existing developments in the area of ontology matching, as it can be applied
to any set of mappings. The approach can be extended in several directions. First of all
we can use symptoms other than concept satisfiability as a starting point for diagnosis.
Further, we can use the method on joint sets of competing mappings created by different
matching algorithms. This will help us to get a better coverage of the actual semantic
relations and the trust in the quality of the different matching algorithms provides us
with an additional criterion for selecting mappings to be discarded.
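
The iterative debugging process (find one irreducible conflict set, remove its least trusted mapping, repeat) can be sketched as below. This is an illustration under assumptions, not the prototype's code: `is_unsat` is a hand-written stand-in for the DDL reasoner that encodes the two conflict sets of the running example, the conflict set is found by simple deletion-based shrinking, and only the confidence heuristic is shown. Ties, which the text resolves with WordNet distance or by asking the user, are broken here by list order.

```python
PERSON_EQ = "i:Person ≡ j:Person"
AUTHOR_EQ = "i:Author ≡ j:Authorization"
PERSON_ONTO = "i:Person ⊒ j:Authorization"
AUTHOR_INTO = "i:Author ⊑ j:Person"

CONFIDENCE = {PERSON_EQ: 1.00, AUTHOR_EQ: 0.46,
              PERSON_ONTO: 0.46, AUTHOR_INTO: 1.00}

# Stand-in for the reasoner: conflict sets of the running example.
KNOWN_CONFLICTS = [{PERSON_EQ, AUTHOR_EQ}, {PERSON_EQ, PERSON_ONTO}]


def is_unsat(mappings):
    """Toy oracle: does the mapping set make j:Authorization unsatisfiable?"""
    return any(conflict <= set(mappings) for conflict in KNOWN_CONFLICTS)


def find_conflict(mappings, is_unsat):
    """Deletion-based shrinking to one irreducible conflict set."""
    conflict = list(mappings)
    for m in list(conflict):
        rest = [x for x in conflict if x != m]
        if is_unsat(rest):
            conflict = rest
    return conflict


def repair(mappings, confidence, is_unsat):
    """Remove the lowest-confidence mapping of each detected conflict set."""
    current = list(mappings)
    removed = []
    while is_unsat(current):
        conflict = find_conflict(current, is_unsat)
        worst = min(conflict, key=lambda m: confidence[m])
        current.remove(worst)
        removed.append(worst)
    return current, removed


repaired, removed = repair(
    [PERSON_EQ, PERSON_ONTO, AUTHOR_INTO, AUTHOR_EQ], CONFIDENCE, is_unsat
)
print(repaired)
print(removed)
```

On the running example the loop needs two iterations and removes the two mappings with confidence 0.46, leaving the two mappings with confidence 1.00.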

3.4 Minimization
A further improvement of the debugged mapping can be achieved by removing redundant
mappings, i.e., mappings that logically follow from other mappings. In [16] we
defined the notion of minimality of a mapping that we use in this context to remove redundant
mappings. In the example, for instance, the two mappings derived using structural
heuristics do not really add new information to the system, because they can be
derived from the two equivalence mappings that have been created first. In particular,
$i : Author \xrightarrow{\sqsubseteq} j : Person$ is redundant information, because:

$$\begin{aligned}
& Author^{\mathcal{I}_i} \subseteq Person^{\mathcal{I}_i} && (3) \\
\Longrightarrow\ & r_{ij}(Author^{\mathcal{I}_i}) \subseteq r_{ij}(Person^{\mathcal{I}_i}) && (4) \\
& r_{ij}(Person^{\mathcal{I}_i}) = Person^{\mathcal{I}_j} && (5) \\
\Longrightarrow\ & r_{ij}(Author^{\mathcal{I}_i}) \subseteq Person^{\mathcal{I}_j} && (6)
\end{aligned}$$

This means that for reasoning with automatically created mappings, we only have
to take into account the equivalence mapping between the person concept in the two
ontologies, because it is the basis for inferring the other one. For this reason, we remove
all mappings that can be shown to be redundant in the sense that they can be
derived from other mappings in the set, and only continue with
the resulting minimal mapping set that still carries all the semantics of the complete set.
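
Minimization can be sketched as a single pass that drops every mapping entailed by the remaining ones. The entailment test below is a deliberately narrow stand-in for the reasoner: it only recognizes the pattern of steps (3) to (6), where an into-rule from a subconcept follows from an into- or equivalence rule on its superconcept with the same target. The tuple encoding, `LOCAL_I`, `entails` and `minimize` are names invented for this sketch.

```python
# Local axiom of ontology i used in the example: Author is subsumed by Person.
LOCAL_I = {("Author", "Person")}


def entails(rest, mapping):
    """Toy entailment check covering only the pattern of steps (3)-(6)."""
    source, relation, target = mapping
    if relation != "into":
        return False
    return any(
        other_target == target
        and other_relation in ("into", "equiv")
        and (source, other_source) in LOCAL_I
        for (other_source, other_relation, other_target) in rest
    )


def minimize(mappings, is_entailed):
    """Drop each mapping that is entailed by the mappings still kept."""
    minimal = list(mappings)
    for m in list(minimal):
        rest = [x for x in minimal if x != m]
        if is_entailed(rest, m):
            minimal = rest
    return minimal


debugged = [("Person", "equiv", "Person"), ("Author", "into", "Person")]
minimal = minimize(debugged, entails)
print(minimal)
```

Because each candidate is tested against the mappings that are still kept, two mappings that entail each other are never both removed.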
4 Experiments
In this section we report on some preliminary experimental evaluation of the mapping
debugging/minimization techniques presented in the preceding sections. All the exper-
iments have been conducted on the prototype of the debugger/minimizer implemented
on top of the DRAGO DDL reasoner [15].

4.1 Experimental Setting


To perform experiments, we used a set of ontologies developed in the OntoFarm project
[17], which are used as part of the benchmark in the Ontology Alignment Evaluation
challenge.⁴ In particular, we selected several ontologies modeling the domain of conference
organization:

Ontology    Description Logics expressivity    Number of classes    Number of properties
CMT         ALCIF(D)                           30                   59
CONFTOOL    SIF(D)                             39                   36
CRS         ALCIF(D)                           14                   17
EKAW        SHIN                               73                   33
PCS         ALCIF(D)                           24                   38
SIGKDD      ALCI(D)                            51                   28

Given this ontology test set, we apply the following experimental scenario. Using
the CtxMatch matching tool [4], we automatically compute mappings between pairs of
ontologies in the test set. Among the created mappings, we further identify those ones
which are capable of producing unsatisfiable classes and therefore need to be debugged
first. In the process of debugging, malicious bridge rules in mappings are automatically
diagnosed and removed in accordance with the heuristic debugging discussed in Sec-
tion 3. In the concluding step of the experimental study, we apply the minimization
algorithm to compute for each mapping a logically-equivalent minimal set of bridge
rules. Note that for those mappings which demand the debugging first the minimization
is applied to their repaired descendants.

4.2 Results
The results of applying the heuristic debugging and minimization techniques to the
automatically generated mappings are summarized in Table 1 and Table 2. More information
about the test data and results can be obtained by visiting the applications section
of the DRAGO reasoner web page.⁵
During the debugging process we performed the following measurements: the initial
number of bridge rules in the mapping to be debugged, the number of classes which
become unsatisfiable due to the mapping, and finally the set of bridge rules which are
diagnosed as malicious and are automatically removed by the debugging algorithm. After
the removal of malicious bridge rules, a mapping is repaired in the sense that it
is no longer capable of producing unsatisfiability. As shown in Table 1, the results of
applying the heuristic debugging approach proposed in Section 3 are quite reassuring:
all of the mappings automatically removed by our method are actually incorrect ones.
To estimate the minimization rate we measured the initial number of bridge rules and
the number of logically entailed bridge rules discovered by applying the minimization
technique. As summarized in Table 2, the share of entailed bridge rules in an
automatically generated mapping varies from 55% to 85% of the initial number of bridge
rules in this mapping.

⁴ http://nb.vse.cz/~svabo/oaei2006/
⁵ http://sra.itc.it/projects/drago/applications.html

                 Bridge  Unsatisfiable  Removed
                 rules   classes        bridge rules
Mapping          count   count          count         Set of removed bridge rules
CMT-CONFTOOL     48      3              3             CMT : Conference −→ CONFTOOL : Organization
                                                      CMT : Person −→ CONFTOOL : Poster
                                                      CMT : ProgramCommitteeChair −→ CONFTOOL : Event
CMT-CRS          53      1              1             CMT : Document −→ CRS : program
CMT-EKAW         116     4              5             CMT : Person −→ Flyer
                                                      CMT : Person −→ Multi-author Volume
                                                      CMT : Person −→ Proceedings
                                                      CMT : ProgramCommitteeChair −→ Social Event
                                                      CMT : Person −→ Conference Proceedings
CONFTOOL-CRS     80      10             15            CONFTOOL : University −→ CRS : event
                                                      CONFTOOL : Social event −→ CRS : program
                                                      CONFTOOL : Author −→ CRS : event
                                                      CONFTOOL : Person −→ CRS : event
                                                      CONFTOOL : Participant −→ CRS : event
                                                      CONFTOOL : Event −→ CRS : participant
                                                      ...
CRS-CMT          53      2              3             CRS : document −→ CMT : Acceptance
                                                      CRS : document −→ CMT : ProgramCommitteeChair
                                                      CRS : program −→ CMT : ProgramCommitteeChair
CRS-CONFTOOL     80      27             30            CRS : conference −→ CONFTOOL : Organization
                                                      CRS : person −→ CONFTOOL : Event
                                                      CRS : person −→ CONFTOOL : Poster
                                                      CRS : document −→ CONFTOOL : Event
                                                      CRS : author −→ CONFTOOL : Event
                                                      CRS : participant −→ CONFTOOL : Event
                                                      CRS : event −→ CONFTOOL : Person
                                                      ...
PCS-CONFTOOL     45      5              5             PCS : Conference −→ CONFTOOL : Organization
                                                      PCS : Report −→ CONFTOOL : Event
                                                      PCS : Report −→ CONFTOOL : Organization
                                                      PCS : PERSON −→ CONFTOOL : Poster
                                                      PCS : Accepted paper −→ CONFTOOL : Event
PCS-EKAW         120     4              5             PCS : PERSON −→ EKAW : Flyer
                                                      PCS : PERSON −→ EKAW : Multi-author Volume
                                                      PCS : PERSON −→ EKAW : Proceedings
                                                      PCS : Web site −→ EKAW : Event
                                                      PCS : PERSON −→ EKAW : Conference Proceedings
SIGKDD-CMT       60      1              1             SIGKDD : Program Committee −→ CMT : ProgramCommitteeChair
SIGKDD-CONFTOOL  72      3              3             SIGKDD : Conference −→ CONFTOOL : Organization
                                                      SIGKDD : Person −→ CONFTOOL : Poster
                                                      SIGKDD : Deadline Author notification −→ CONFTOOL : Person
SIGKDD-CRS       57      1              1             SIGKDD : Document −→ CRS : program
SIGKDD-EKAW      127     4              5             SIGKDD : Person −→ EKAW : Flyer
                                                      SIGKDD : Person −→ EKAW : Multi-author Volume
                                                      SIGKDD : Person −→ EKAW : Proceedings
                                                      SIGKDD : Deadline Author notification −→ EKAW : Person
                                                      SIGKDD : Person −→ EKAW : Conference Proceedings

Table 1. Debugging results.

Mapping            Bridge rules count   Entailed bridge rules count   Bridge rules reduction rate
CMT-CONFTOOL*      45                   34                            76%
CMT-CRS*           52                   38                            73%
CMT-SIGKDD         59                   45                            76%
CMT-EKAW*          111                  94                            85%
CONFTOOL-CMT       48                   34                            71%
CONFTOOL-CRS*      65                   40                            62%
CONFTOOL-SIGKDD    75                   43                            57%
CONFTOOL-PCS       45                   27                            60%
CRS-CMT*           50                   34                            68%
CRS-CONFTOOL*      50                   37                            74%
CRS-SIGKDD         57                   34                            60%
CRS-PCS            38                   21                            55%
EKAW-CMT           115                  96                            83%
EKAW-SIGKDD        127                  95                            75%
PCS-CONFTOOL*      40                   25                            63%
PCS-CRS            38                   21                            55%
PCS-SIGKDD         56                   36                            64%
PCS-CMT            73                   58                            79%
PCS-EKAW*          115                  96                            83%
SIGKDD-CMT*        59                   45                            76%
SIGKDD-CONFTOOL*   69                   41                            59%
SIGKDD-CRS*        56                   34                            61%
SIGKDD-PCS         56                   36                            64%
SIGKDD-EKAW*       122                  94                            77%

Table 2. Minimization results (starred mappings were first repaired applying the debugging).

5 Discussion
We have presented a method for automatically improving the result of heuristic match-
ing systems using logical reasoning. The basic idea is similar to existing work on de-
bugging ontologies and uses some non-standard inference methods for reasoning about
mappings introduced in previous work. The method feeds on the fact that most exist-
ing matching algorithms ignore the logical implications of new mappings. This gap is
filled by our method that detects malicious impacts of generated mappings and traces
them back to their source. As we have shown in the experiments, in almost all cases
(in fact in all cases observed in the experiment) the unwanted effects were caused by
wrong mappings and we were able to remove them automatically thus improving the
correctness of the generated mapping. The idea of using logical reasoning in
the matching process is not new and has been proposed by others (e.g., [7, 8]). The way
it is used in our work, however, is unique, as it is the only approach that takes the effects
of mappings into account. We believe that this additional step can significantly improve
the quality of matching methods and should be integrated in existing matching algo-
rithms as far as they are concerned with expressive ontologies that support consistency
checking. In fact, the expressiveness of the language used to encode the ontologies to
be matched seems to be the only limitation of our approach which can only be applied
if the language supports consistency checking. In our experiments, we have seen that
we can improve the correctness of matching results by removing wrong mappings. So
far, we have not quantified this improvement; this has to be done in future work.

Acknowledgements
This work was partially supported by the German Science Foundation in the Emmy-
Noether Program and by the European Union under grant FP6-507482 (KnowledgeWeb)
as part of the T-Rex exchange program.

References
1. A. Borgida and L. Serafini. Distributed description logics: Assimilating information from
peer sources. Journal of Data Semantics, 1:153–184, 2003.
2. P. Bouquet, J. Euzenat, E. Franconi, L. Serafini, G. Stamou, and S. Tessaris. Specification of
a common framework for characterizing alignment. Deliver. 2.2.4, KnowledgeWeb, 2004.
3. P. Bouquet, F. Giunchiglia, F. van Harmelen, L. Serafini, and H. Stuckenschmidt. C-OWL:
Contextualizing ontologies. In Proceedings of the 2nd International Semantic Web Confer-
ence (ISWC-03), volume 2870 of LNCS, pages 164–179. Springer, 2003.
4. P. Bouquet, L. Serafini, and S. Zanobini. Semantic coordination: a new approach and an
application. In Proceedings of the Second International Semantic Web Conference, volume
2870 of Lecture Notes in Computer Science, pages 130–145. Springer Verlag, 2003.
5. D. Calvanese, G. De Giacomo, and M. Lenzerini. A framework for ontology integration. In
Proceedings of the Semantic Web Working Symposium, pages 303–316, Stanford, CA, 2001.
6. M. Ehrig and S. Staab. QOM - Quick Ontology Mapping. In Proceedings of the 3rd Inter-
national Semantic Web Conference (ISWC-04), 2004.
7. F. Giunchiglia, P. Shvaiko, and M. Yatskevich. S-match: an algorithm and an implementation
of semantic matching. In Proceedings of the European Semantic Web Conference (ESWS-
04), pages 61–75, 2004.
8. F. Giunchiglia, M. Yatskevich, and E. Giunchiglia. Efficient semantic matching. In Proceed-
ings of the European Semantic Web Conference (ESWS-05), pages 272–289, 2005.
9. E. Hovy. Combining and standardizing large-scale, practical ontologies for machine translation
and other uses. In Proceedings of the 1st International Conference on Language
Resources and Evaluation (LREC), pages 535–542, 1998.
10. J. Madhavan, P. A. Bernstein, P. Domingos, and A. Halevy. Representing and reasoning
about mappings between domain models. In Proceedings of the 18th National Conference
on Artificial Intelligence (AAAI-02), 2002.
11. S. Melnik, H. Garcia-Molina, and E. Rahm. Similarity flooding: A versatile graph matching
algorithm and its application to schema matching. In Proceedings of the 18th International
Conference on Data Engineering (ICDE-02). IEEE Computer Society, 2002.
12. N. F. Noy and M. A. Musen. The PROMPT suite: Interactive tools for ontology merging and
mapping. International Journal of Human-Computer Studies, 59(6):983–1024, 2003.
13. L. Serafini, A. Borgida, and A. Tamilin. Aspects of distributed and modular ontology rea-
soning. In Proceedings of the 19th International Joint Conference on Artificial Intelligence
(IJCAI-05), 2005.
14. L. Serafini, H. Stuckenschmidt, and H. Wache. A formal investigation of mapping languages
for terminological knowledge. In Proceedings of the 19th International Joint Conference on
Artificial Intelligence (IJCAI-05), 2005.
15. L. Serafini and A. Tamilin. DRAGO: Distributed reasoning architecture for the semantic
web. In Proceedings of the 2nd European Semantic Web Conference (ESWC-05), 2005.
16. H. Stuckenschmidt, H. Wache, and L. Serafini. Reasoning about ontology mappings. In
Proceedings of the ECAI-06 Workshop on Contextual Representation and Reasoning, 2006.
17. O. Svab, V. Svatek, P. Berka, D. Rak, and P. Tomasek. Ontofarm: Towards an experimental
collection of parallel ontologies. In Poster Proceedings of the International Semantic Web
Conference 2005 (ISWC-05), 2005.
