Journal of Machine Learning Research 4 (2003) 493-525
Submitted 5/01; Published 8/03
Learning Semantic Lexicons from a Part-of-Speech and Semantically
Tagged Corpus Using Inductive Logic Programming
Vincent Claveau                                    VINCENT.CLAVEAU@IRISA.FR
Pascale Sébillot                                   PASCALE.SEBILLOT@IRISA.FR
IRISA
Campus de Beaulieu
35042 Rennes cedex, France

Cécile Fabre                                       CECILE.FABRE@UNIV-TLSE2.FR
ERSS
University of Toulouse II
5 allées A. Machado
31058 Toulouse cedex, France

Pierrette Bouillon                                 PIERRETTE.BOUILLON@ISSCO.UNIGE.CH
TIM/ISSCO - ETI
University of Geneva
40 Bvd du Pont-d'Arve
CH-1205 Geneva, Switzerland
Editors: James Cussens and Alan M. Frisch
Abstract
This paper describes an inductive logic programming learning method designed to acquire from a
corpus specific Noun-Verb (N-V) pairs—relevant in information retrieval applications to perform
index expansion—in order to build up semantic lexicons based on Pustejovsky’s generative lexicon
(GL) principles (Pustejovsky, 1995). In one of the components of this lexical model, called the
qualia structure, words are described in terms of semantic roles. For example, the telic role indicates the purpose or function of an item (cut for knife), the agentive role its creation mode (build for house), etc. The qualia structure of a noun is mainly made up of verbal associations, encoding
relational information. The learning method enables us to automatically extract, from a morphosyntactically and semantically tagged corpus, N-V pairs whose elements are linked by one of the
semantic relations defined in the qualia structure in GL. It also infers rules explaining what in the
surrounding context distinguishes such pairs from others also found in sentences of the corpus but
which are not relevant. Stress is put here on the learning efficiency that is required to be able to
deal with all the available contextual information, and to produce linguistically meaningful rules.
Keywords: corpus-based acquisition, lexicon learning, generative lexicon, inductive logic programming, subsumption under object identity, private properties
1. Introduction
The aim of information retrieval (IR) is to develop systems able to provide a user who questions
a document database with the most relevant texts. In order to achieve this goal, a representation
of the contents of the documents and/or the query is needed, and one commonly used technique
is to associate those elements with a collection of some of the words that they contain, called index terms. For example, the most frequent (simple or compound) common nouns (N), verbs (V)
and/or adjectives (A) can be chosen as indexing terms. See Salton (1989), Spärck Jones (1999) and
Strzalkowski (1995) for other possibilities. The texts proposed to the user are those whose indexes best match the query index. The quality of IR systems therefore depends heavily on the chosen indexing language. Their performance can be improved by offering extended possibilities of matching between indexes. This can be achieved through index expansion, that is, the extension of index words with other words that are close to them in order to increase the chances of a match. Morpho-syntactic expansion is quite usual: for example, the same index words
in plural and singular forms can be matched. Systems with linguistic knowledge databases at their
disposal can also deal with one type of semantic similarity, usually limited to specific intra-category
reformulations (especially N-to-N ones), following synonymy or hyperonymy links: for example,
the index word car can be expanded into vehicle.
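As a toy illustration of this matching principle (the expansion table and index sets below are invented for the example, not drawn from any actual IR system), index expansion can be sketched as:

```python
# Toy sketch of index expansion: a query index matches a document index
# if they share a term, either directly or through an expansion table.
# All data below is invented for illustration.

EXPANSIONS = {
    "car": {"vehicle"},   # N-to-N hyperonymy link
    "store": {"sell"},    # N-V qualia (telic) link
}

def expand(terms):
    """Return the index terms plus every term they expand into."""
    expanded = set(terms)
    for t in terms:
        expanded |= EXPANSIONS.get(t, set())
    return expanded

def matches(query_index, doc_index):
    """A document is retrieved if its expanded index meets the expanded query."""
    return bool(expand(query_index) & expand(doc_index))

# A query indexed by "store" vs. a text indexed by "sell disks": no direct
# overlap, but the telic expansion store -> sell creates a match.
print(matches({"store"}, {"sell", "disk"}))  # True
```

Without the N-V expansion the query and document indexes above would share no term and the text would be missed.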
Here we deal with a new kind of expansion that has been proven to be particularly useful
(Grefenstette, 1997; Fabre and Sébillot, 1999) for document database questioning. It concerns N-V
links and aims at allowing matching between nominal and verbal formulations that are semantically
close. Our objective is to permit a matching, for example, between a query index disk store and
the text formulation to sell disks, related by the semantic affinity between an entity (store) and its
typical function (sell). N-V index expansion however has to be controlled in order to ensure that the
same concept is involved in the two formulations. We have chosen Pustejovsky’s generative lexicon
(GL) framework (Pustejovsky, 1995; Bouillon and Busa, 2001) to define what a relevant N-V link
is, that is, an N-V pair in which the N and the V are related by a semantic link that is prominent
enough to be used to expand index terms.
In the GL formalism, lexical entries consist of structured sets of predicates that define a word.
In one of the components of this lexical model, called the qualia structure, words are described in
terms of semantic roles. The telic role indicates the purpose or function of an item (for example, cut for knife), the agentive role its creation mode (build for house), the constitutive role its constitutive parts (handle for cup) and the formal role its semantic category (contain (information) for book). The qualia structure of a noun is mainly made up of verbal associations, encoding relational
information. Such N-V links are especially relevant for index expansion in IR systems (Fabre and
Sébillot, 1999; Bouillon et al., 2000b). In what follows, we will thus consider as a relevant N-V pair
a pair composed of an N and a V related by one of the four semantic relations defined in the qualia
structure in GL.
However, GL is currently no more than a formalism; no generative lexicons exist that are precise
enough for every domain and application (for example IR), and the manual construction cost of a
lexicon based on GL principles is prohibitive. Moreover, the real N-V links that are the key point of the GL formalism vary from one corpus to another and cannot therefore be defined a priori. A way
of building such lexicons—that is, such N-V pairs in which V plays one of the qualia roles of N—is
required. The aim of this paper is to present a machine learning method, developed in the inductive
logic programming framework, that enables us to automatically extract from a corpus N-V pairs
whose elements are linked by one of the semantic relations defined in the GL qualia structure (called
qualia pairs hereafter), and to distinguish them, in terms of surrounding categorial (Part-of-Speech,
POS) and semantic context, from N-V pairs also found in sentences of the corpus but not relevant.
Our method must respect two kinds of properties: firstly it must be robust, that is, it must infer
rules explaining the concept of qualia pair that can be used on a corpus to automatically acquire GL
semantic lexicons. Secondly it has to be efficient in producing generalizations from a large amount
of possible contextual information found in very large corpora. This work also has a linguistic motivation: linguists do not currently know all the patterns that are likely to convey qualia relations in texts and therefore cannot verbalize rules that describe them; the generalizations inferred by our learning method are thus of linguistic interest. The paper is divided into four parts. Section 2
briefly presents further details of GL and motivates the use of N-V index expansion based
on this formalism in information retrieval applications. Section 3 describes the corpus that we have
used in order to build and test our learning method, and the POS and semantic tagging that we
have associated with its words to be able to characterize the context of N-V qualia pairs. Section
4 explains the machine learning method that we have developed and in particular raises questions
of expressiveness and efficiency. Section 5 is dedicated to its theoretical and empirical validation,
when applied to our technical corpus, and ends with a discussion about the linguistic relevance of
the generalized clauses that we have learnt in order to explain the concept of qualia pairs.
2. The Generative Lexicon and Information Retrieval
In this section, we first describe the structure of a lexical entry in the GL formalism. We then argue
for the use of N-V index expansion based on GL qualia structure in information retrieval.
2.1 Lexical Entries in the Generative Lexicon
As mentioned above, lexical entries in GL consist of structured sets of typed predicates that define
a word. Lexical representations can thus be considered as reserves of types on which different
interpretative strategies operate; these representations are responsible for word meaning in context.
This generative theory of the lexicon includes three levels of representation for a lexical entry:
the argument structure (argstr), the event structure (eventstr), and the qualia structure (qs) as
illustrated in Figure 1 for word W.
W
  ARGSTR   = [ ARG1 = ... , D-ARG1 = ... ]
  EVENTSTR = [ E1 = ... , E2 = ... , RESTR = temporal relation between events ,
               HEAD = relation of prominence ]
  QS       = [ W-lcp , FORMAL = ... , CONST = ... , TELIC = ... , AGENTIVE = ... ]

Figure 1: Lexical entry in GL
All the syntactic categories receive the same levels of description. Argument and event structures contain the arguments and the events that occur in the definitions of the words. These elements may or may not have to be expressed syntactically; in the latter case, they are called default arguments (D-ARG) or default events (D-E). The qualia structure links these arguments and events and defines the way they take part in the semantics of the word.
In the qualia structure, the four semantic roles correspond to interpreted features that form the
basic vocabulary for the lexical description of a word, and determine the structure of the information
associated with it (that is, its lexical conceptual paradigm (lcp)). Their meanings have already been given in Section 1. Figure 2 presents the lexical representation of book as given by Pustejovsky (1995), in which the item appears both as a physical object and as an object that contains information (denoted by info.physobj-lcp).
book
  ARGSTR   = [ ARG1 = y : info , ARG2 = x : physobj ]
  EVENTSTR = [ D-E1 = e1 , D-E2 = e2 ]
  QS       = [ info.physobj-lcp , FORMAL = contain(x, y) ,
               CONST = part-of(x.y, z : cover, pages, ...) ,
               TELIC = read(e1, w, x.y) , AGENTIVE = write(e2, v, x.y) ]

Figure 2: Lexical representation of book
This representation can be interpreted as: λx.y[book(x : physobj.y : info) ∧ contain(x, y) ∧ λwλe1[read(e1, w, x.y)] ∧ ∃e2∃v[write(e2, v, x.y)]].
A network of relations is thus defined for each noun, for example book-write, book-read, book-contain for book. These relations are not empirically defined but are linguistically motivated: they
are the relations that are necessary to explain the semantic behaviour of the noun. These are the
kinds of relations we want to use in information retrieval (IR) applications to expand index terms
and deal with intercategorial semantic paraphrases of users’ requests.
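The qualia structure of Figure 2 can be mirrored in a simple data structure. The sketch below is a hypothetical encoding for illustration only: the field names follow the figure, not any established GL toolkit, and the predicates are kept as plain strings.

```python
from dataclasses import dataclass, field

@dataclass
class QualiaStructure:
    """The four qualia roles of a GL lexical entry (predicates as strings)."""
    formal: list = field(default_factory=list)        # semantic category
    constitutive: list = field(default_factory=list)  # constitutive parts
    telic: list = field(default_factory=list)         # purpose or function
    agentive: list = field(default_factory=list)      # creation mode

# The entry for "book" from Figure 2: a physical object containing
# information, read (telic) and written (agentive).
book = QualiaStructure(
    formal=["contain(x, y)"],
    constitutive=["part-of(x.y, z: cover, pages, ...)"],
    telic=["read(e1, w, x.y)"],
    agentive=["write(e2, v, x.y)"],
)

# The N-V network usable for index expansion follows from these roles:
qualia_verbs = {p.split("(")[0] for p in book.telic + book.agentive}
print(sorted(qualia_verbs))  # ['read', 'write']
```

The last line extracts exactly the book-read and book-write associations mentioned above.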
2.2 N-V Qualia Relations for Information Retrieval
Arguments for using GL theory to define N-V pairs relevant for index reformulation have already
been reported by Bouillon et al. (2000b). We only point out here the main reasons for that option.
Many authors agree on the fact that index reformulation must not be limited to N-N relations.
For example, Grefenstette (1997) points out the importance of syntagmatic N-V links for making explicit and disambiguating nouns contained in short requests in an IR application. One way to semantically
characterize research is to extract verbs that co-occur with it to know what research can do (research
shows, research reveals, etc.), or what is done for research (do research, support research, etc.). Our
work within the GL framework is a way to systematize such a proposition.
From the theoretical point of view, GL is a theory of words in context: it defines under-specified
lexical representations that acquire their specifications within corpora. For example (see Figure
2), book in a given corpus can receive the agentive predicate publish, the telic predicate teach,
etc. Those representations can be considered as a way to structure information in a corpus and, in
that sense, the relations that are defined in GL are privileged information for IR. Moreover, in this
perspective, GL has been preferred to existing lexical resources such as WordNet (Fellbaum, 1998)
for two main reasons: the lexical relations we want to exhibit—namely N-V links—are unavailable
in WordNet, which focuses on paradigmatic lexical relations; WordNet is a domain-independent,
static resource, which, as such, cannot be used to describe lexical associations in specific texts,
considering the great variability of semantic associations from one domain to another (Voorhees,
1994; Smeaton, 1999).
Concerning practical issues, the validity of using GL theory to define N-V couples relevant
for reformulation has already been partly tested. First, Fabre (1996) has shown that N-V qualia
pairs can be used to calculate the semantic representations of binominal sequences (NN compounds
in English and N preposition N sequences in French), and thus offer extended possibilities for
reformulations of compound index terms. Fabre and Sébillot (1999) have then used those relations
in an experiment conducted on a French telematic service system. They have shown that the context
of binominal sequences can be used to disambiguate nouns, provided that syntagmatic links exist
or are developed within the thesaurus of the retrieval system, and that these syntagmatic relations
can be used to discover semantic paraphrase links between a user’s question and the texts of an
indexed database. A second test has also been carried out in the documentation service of a Belgian
bank (Vandenbroucke, 2000). Its documentalists traditionally use Boolean queries with nominal terms. They were asked to evaluate the relevance of proposed qualia verbs associated with nouns of
their questions to specify their requests or to access documents they had not thought of. Those first
results were quite promising.
However, in order to make the most of N-V qualia pairs and thoroughly evaluate their relevance for information retrieval applications, a method to automatically acquire these pairs from
a corpus is necessary. Our goal is thus to learn GL-based semantic lexicons from corpora (more
precisely N-V qualia pairs). Before describing the learning method we have developed to achieve
this goal, we first present the corpus we have used, and the information we have associated with its
words to be able to characterize the context of N-V qualia pairs.
3. The MATRA-CCR Corpus and its Tagging
In this section, the technical corpus that we have used to learn semantic lexicons based on GL
principles is described. This corpus has first undergone part-of-speech (POS) tagging which aims
at providing each word of the text with an unambiguous categorial tag (singular common noun,
infinitive verb, etc.); categorial tagging is presented in Section 3.2. Secondly, in order to permit
learning of what distinguishes qualia pairs from non-qualia ones that appear in exactly the same
syntactic structures, semantic tags have been added. For example in structures like Verbinf det N1
prep N2, the pair N2 Verbinf is sometimes non-qualia (for example (corrosion, vérifier) (corrosion, check) in vérifier l'absence de corrosion (check the absence of corrosion)) but sometimes qualia (for example (réservoir, vider) (tank, empty) in vider le fond du réservoir (empty the bottom of the tank)) when N1 indicates, for example, a part of an object. A simple POS tagging of those two
sentences does not display any difference between them. Section 3.3 is dedicated to the description
of the semantic tagging of the corpus, that is to the addition of tags unambiguously describing the
semantic class of each of its words.
3.1 The MATRA-CCR Corpus
The French corpus used in this project is a 700-KByte handbook of helicopter maintenance, provided by MATRA-CCR Aérospatiale, which contains more than 104,000 word occurrences. The MATRA-CCR corpus has some special characteristics that make it especially well suited to our task: it
is coherent, that is, its vocabulary and syntactic structures are homogeneous; it contains many concrete terms (screw, door, etc.) that are frequently used in sentences together with verbs indicating
their telic (screws must be tightened, etc.) or agentive roles (execute a setting, etc.).
3.2 Part-of-Speech Tagging
This corpus has been POS-tagged with the help of annotation tools developed in the MULTEXT
project (Ide and Véronis, 1994; Armstrong, 1996): sentences and words are first segmented with MTSEG; words are analyzed and lemmatized with MMORPH (Petitpierre and Russell, 1998; Bouillon et al., 1998), and finally disambiguated by the TATOO tool, a hidden Markov model tagger (Armstrong et al., 1995). Each word therefore receives only one POS tag, which indicates its morpho-syntactic category (and its gender, number, etc.) with high precision: fewer than 2% of errors have been detected when compared to a manually tagged 4,000-word test sample of the corpus. Those
POS tags are one of the elements used by our learning method to characterize the context in which
qualia pairs can be found.
3.3 Semantic Tagging
The semantic tagging is done on the already POS-tagged MATRA-CCR corpus; we therefore benefit from the disambiguation of polyfunctional words, that is, words that have different syntactic categories, such as règle in French, which can be an indicative form of the verb régler (to regulate) or the common noun règle (rule) (Wilks and Stevenson, 1996). We first built the semantic classification which we
used as tagset for the semantic tagging. This tagging process is then carried out with the help of
the same probabilistic tagger as for POS-tagging and, as shown here, the majority of the semantic
ambiguities are solved.
More precisely, a lexicon whose entries are all the words of the MATRA-CCR corpus is created; it associates with each word all its possible semantic tags. The most relevant tagset for
each category must be chosen. We only describe here the semantic classification of the main POS
categories of the MATRA - CCR corpus. We also give the results of its semantic tagging using the
hidden Markov model tagger. A more detailed presentation can be found in (Bouillon et al., 2001).
WordNet’s (Fellbaum, 1998) most generic classes have initially been selected to systematically
classify the nouns. However, irrelevant classes (for our corpus) have been withdrawn and, for large
classes, a more precise granularity has been chosen (for example the class artefact has been split into
more precise categories). This has led to 33 classes, hierarchically organized as shown in Figure 3
(WordNet classes not used for tagging are in italics and semantic tags are bracketed). Only 8.7%
of the entries of the common noun lexicon are ambiguous. Most of those ambiguities correspond
to complementary polysemy (for example, enfoncement can indicate either a process (pushing in) or its result (hollow); it is therefore classified as both pro and sta).
Concerning verbs, the WordNet classification was judged too specific. A minimal partition into 7 classes has been selected. Only 7 verbs (among about 570) are ambiguous. Adjectives, prepositions, etc. have also been classified, leading to the creation of lexicons in which very few
entries are ambiguous.
Those various lexicons are then used to carry out the semantic tagging of the POS-tagged
MATRA - CCR corpus by projecting the semantic tags on the corresponding words. Ambiguities are
solved with the help of the probabilistic tagger, following principles described in (Bouillon et al.,
2000a).

[Figure 3: Hierarchy of classes for the semantic tagging of common nouns. The 33 semantic tags (bracketed codes) are: frm (form), atr (attribute), pty (property), tme (time unit), unt (unit), mea (measure), qud (definite quantity), rel (relation), com (communication), hap (natural event), acy (human activity), act (act), phm (phenomenon), grp (group), pro (process), grs (social group), psy (psychological feature), sta (state), prt (body part), ent (entity), agt (causal agent), hum (human), art (artefact), pho (object), ins (instrument), cnt (container), por (part), sub (substance), chm (chemical compound), stu (stuff), loc (location), pnt (point), pos (position); WordNet classes not used for tagging (noun, abstraction, event, social relation) appear only as internal nodes of the hierarchy.]

A 6,000-word sample of the corpus has been chosen to evaluate the semantic tagging precision. It contains 7.78% of ambiguous words; 85% of them have been correctly disambiguated
(1.18% of semantic tagging errors).
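The projection step can be sketched as follows. This is a simplified illustration: the lexicon fragment and tag assignments are modelled loosely on the paper's examples, and the probabilistic disambiguation itself is left to a later stage.

```python
# Sketch of semantic-tag projection: each POS-tagged token receives the
# set of semantic tags its lemma has in the lexicon; unambiguous tokens
# are tagged directly, ambiguous ones are left for the HMM tagger.
# The lexicon fragment below is invented for illustration.

SEM_LEXICON = {
    "enfoncement": {"pro", "sta"},  # complementary polysemy (process/state)
    "roue": {"ins"},                # illustrative tag for "wheel"
    "gonfler": {"acp"},
}

def project(lemmas):
    """Tag unambiguous lemmas; collect ambiguous ones for the tagger."""
    tagged, ambiguous = [], []
    for lemma in lemmas:
        tags = SEM_LEXICON.get(lemma, {"unknown"})
        if len(tags) == 1:
            tagged.append((lemma, next(iter(tags))))
        else:
            tagged.append((lemma, None))  # to be solved probabilistically
            ambiguous.append(lemma)
    return tagged, ambiguous

tagged, ambiguous = project(["gonfler", "roue", "enfoncement"])
print(ambiguous)  # ['enfoncement']
```

Only the tokens left untagged by the projection need to be handled by the probabilistic tagger, which is why so few semantic tagging errors remain.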
All those POS and semantic tags in the MATRA - CCR corpus are the contextual key information
used by the learning method that we have developed in order to automatically extract N-V qualia
pairs. The next section explains its realization.
4. The Machine Learning Method
We aim at learning a special kind of semantic relations from our POS and semantically tagged
MATRA - CCR corpus, that is, verbs playing a specific role in the semantic representation of common
nouns, as defined in the qualia structure in GL formalism. Trying to infer lexical semantic information from corpora is not new: a lot of work has already been conducted on this subject, especially in
the statistical learning domain. See Grefenstette (1994b), for example, or Habert et al. (1997) and
Pichon and Sébillot (1997) for surveys of this field. Following Harris’s framework (Harris et al.,
1989), such research tries to extract both syntagmatic and paradigmatic information, respectively
studying the words that appear in the same window-based or syntactic contexts as a considered lexical unit (first-order word affinities; Grefenstette, 1994a), or the words that generate the same contexts as the key word (second-order word affinities). For example, Briscoe and Carroll (1997) and Faure
and Nédellec (1999) try to automatically learn verbal argument structures and selectional restrictions; Agarwal (1995) and Bouaud et al. (1997) build semantic classes; Hearst (1992) and Morin
(1999) focus on particular lexical relations, like hyperonymy. Some of this research is concerned
with automatically obtaining more complete lexical semantic representations (Grefenstette, 1994b;
Pichon and Sébillot, 2000). Among these studies, mention must be made of the research described
by Pustejovsky et al. (1993) which gives some principles for acquiring GL qualia structures from a
corpus; this work is however quite different from ours because it is based upon the assumption that
the extraction of the qualia structure of a noun can be performed by spotting a set of syntactic structures related to qualia roles; we propose to go one step further as we have no a priori assumptions
concerning the structures that are likely to convey these roles in a given corpus.
In order to automatically acquire N-V pairs whose elements are linked by one of the semantic
relations defined in the qualia structure in GL, we have decided to use a symbolic machine learning
method. Moreover, symbolic learning has led to several studies on the automatic acquisition of
semantic lexical elements from corpora (Wermter et al., 1996) during the last few years. This
section is devoted to the explanation of our choice and to the description of the method that we have
developed.
Our selection of a learning method is guided by the fact that this method must not only provide
a predictor (this N-V pair is relevant, this one is not) but also infer general rules able to explain
the examples, that is, bring linguistically interpretable elements about the predicted qualia relations.
This essential explanatory characteristic has motivated our choice of the inductive logic programming (ILP) framework (Muggleton and De Raedt, 1994) in which programs, that are inferred from
a set of facts and a background knowledge, are logic programs, that is, sets of Horn clauses. Contrary to some statistical methods, it does not just give raw results but explains the concept that is
learnt, that is, here, what characterizes a qualia pair (versus a non-qualia one). This choice is also
especially justified by the fact that, up to now, linguists do not know what all the textual patterns
that express qualia relations are; they cannot thus verbalize rules describing them. Therefore, ILP
seems to be an appropriate option since its relational nature can provide a powerful expressiveness
for these linguistic patterns. Moreover, as linguistic theories provide no clues concerning elements
that indicate qualia relations, ILP’s adaptable framework is particularly suitable for us. Lastly, the
errors inherent in the automatic POS and semantic tagging process previously described make the
choice of an error-tolerant learning method essential. The possibility of handling data noise in ILP
guarantees this robustness.
For our experiments, we provide a set of N-V pairs related by one of the qualia relations (positive example set, E+) within a POS and semantic context (elements from sentences containing those N-V pairs in the corpus), and a set of N-V pairs that are not semantically linked (negative example set, E−). Generalizing rules from semantic and POS information about words that occur in the
context of N-V qualia pairs in the corpus and from distances between N and V in the sentences from
which examples are built is a particularly hard task. The difficulty is mainly due to the amount of
information that has to be handled by the ILP algorithm. We must therefore focus on the efficiency
of this learning step to be certain to obtain linguistically meaningful clauses in a relatively small
amount of time. Most ILP systems provide a way to deal more or less with the problem of the form
of the rules but only some of them enable a total control of this form and of the rule search efficiency.
Moreover, the particular structure of our POS and semantic information makes it essential to use
a system capable of processing relational background knowledge. For our project, we have thus
chosen ALEPH,1 Srinivasan's ILP implementation, which has already been proven well suited to dealing with large amounts of data in multiple domains (mutagenesis, drug structure, ...) and permits
complete and precise customization of all the settings of the learning task. For research use, ALEPH
is also very attractive since it is entirely written in Prolog and thus allows the user to easily have
a comprehensive view of the learning process, and in particular to write his/her own refinement
operator to adequately perform rule search. However, this is certainly not the fastest choice: other
ILP programs could be used that would perform in shorter time, but it would be to the detriment of
a complete user control on the learning task. A few experiments have indeed been carried out with
Quinlan’s FOIL; the computing time was better (about half of the ALEPH time, see Section 5.1), but
some of the produced rules did not match the linguistically motivated form requirements we defined
in Section 4.2. These results are certainly due to the greedy search strategy used by FOIL.
In this section we first explain the construction of E+ and E− for ALEPH. We then define the space in which the rules that we want to learn are searched for (that is, what the rules we learn are
and how they are related to each other). We finally describe how we improve the efficiency of the
search by pruning some irrelevant hypotheses. The clauses that are obtained and their evaluation
are detailed in Section 5.
4.1 Example Construction
Our first task consists in building up E+ and E− for ALEPH, in order for it to infer generalized clauses that explain what, in the POS and semantic contexts of N-V pairs, distinguishes relevant pairs from non-relevant ones. Our methodology for their construction is as follows.
First, every common noun in the MATRA-CCR corpus is considered. More precisely, we only deal with an 81,314-word-occurrence subcorpus of the MATRA-CCR corpus, formed by all the
sentences that contain at least one N and one V. This subcorpus contains 1,489 different N (29,633
noun occurrences) and 567 different V (9,522 verb occurrences). For each N, the 10 most strongly
associated V, in terms of χ2 (a statistical correlation measure based upon the relative frequencies of
1. http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/aleph/aleph_toc.html
words), are selected. This first step produces at the same time pairs whose components are correctly
bound by one qualia role ((roue, gonfler) (wheel, inflate)) and pairs that are fully irrelevant ((roue,
prescrire) (wheel, prescribe)).
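The association step can be sketched as follows: for each noun, every co-occurring verb is scored with the χ2 statistic of the 2×2 contingency table of their sentence co-occurrence, and the 10 best verbs are kept. The counts below are invented for illustration; the paper does not specify its exact contingency counts.

```python
# Sketch of the chi-squared association step used to pre-select, for each
# noun N, the 10 verbs most strongly associated with it in the corpus.

def chi2(n_nv, n_n, n_v, n_total):
    """chi2 for the 2x2 table of sentences containing N and/or V."""
    a = n_nv                        # sentences with both N and V
    b = n_n - n_nv                  # N without V
    c = n_v - n_nv                  # V without N
    d = n_total - n_n - n_v + n_nv  # neither
    num = n_total * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0

def top_verbs(noun_counts, k=10):
    """noun_counts: {verb: (n_nv, n_n, n_v, n_total)} -> k best verbs."""
    ranked = sorted(noun_counts, key=lambda v: chi2(*noun_counts[v]),
                    reverse=True)
    return ranked[:k]

# "roue" (wheel): "gonfler" (inflate) co-occurs far above chance,
# "prescrire" (prescribe) only incidentally. Counts are invented.
counts = {"gonfler": (40, 120, 60, 5000), "prescrire": (3, 120, 90, 5000)}
print(top_verbs(counts, k=2))  # ['gonfler', 'prescrire']
```

Both qualia and fully irrelevant pairs survive this statistical filter, which is why the manual annotation step described next is needed.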
Each pair is manually annotated as relevant or irrelevant according to Pustejovsky’s qualia structure principles. A Perl program is then used to find the occurrences of these N-V pairs in the sentences of the corpus.
For each occurrence of each pair that is supposed to be used to build one positive example (that
is, pairs that have been globally annotated as relevant), a manual control has to be done to ensure that
the N and the V really are in the expected relation within the studied sentence. After this control,
a second Perl program automatically produces the positive example by adding a clause of the form
is_qualia(noun_identifier,verb_identifier). to the set E+. Information is also added to the background
knowledge that describes each word of the sentence and the position in the sentence of the N-V
pair. For example, for a five-word-long sentence whose word identifiers are w_1 ... w_5, and the N-V pair w_4-w_2, the following clauses are added:

tags(w_1,POS-tag,semantic-tag).
tags(w_2,POS-tag,semantic-tag).
pred(w_2,w_1).
tags(w_3,POS-tag,semantic-tag).
pred(w_3,w_2).
tags(w_4,POS-tag,semantic-tag).
pred(w_4,w_3).
tags(w_5,POS-tag,semantic-tag).
pred(w_5,w_4).
distances(w_4,w_2,distance_in_words,distance_in_verbs).
where pred(x,y) indicates that word y occurs just before word x in the sentence, the predicate tags/3
gives the POS and semantic tags of a word, and distances/4 specifies the number of words and the
number of verbs between N and V in the sentence (a negative distance indicates that N occurs before
V, a positive one indicates that V occurs before N in the studied sentence; distances are shifted by
one in order to distinguish a positive null distance from a negative null one).
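The fact generation above can be sketched as follows (in Python rather than the Perl actually used; word identifiers and tags are invented, and the sketch works on the already-filtered token sequence):

```python
# Sketch of the example-encoding step: from a tagged sentence and an N-V
# pair, emit the tags/3, pred/2 and distances/4 facts described above.

def encode(sentence, n_id, v_id):
    """sentence: list of (word_id, pos_tag, sem_tag), in sentence order."""
    facts, prev = [], None
    for wid, pos, sem in sentence:
        facts.append(f"tags({wid},{pos},{sem}).")
        if prev is not None:
            facts.append(f"pred({wid},{prev}).")
        prev = wid
    ids = [wid for wid, _, _ in sentence]
    i_n, i_v = ids.index(n_id), ids.index(v_id)
    lo, hi = sorted((i_n, i_v))
    sign = 1 if i_v < i_n else -1   # positive when V precedes N
    d_words = sign * (hi - lo)      # words between N and V, shifted by one
    verbs = sum(1 for _, pos, _ in sentence[lo + 1:hi]
                if pos.startswith("tc_verb"))
    d_verbs = sign * (verbs + 1)    # verbs between N and V, shifted by one
    facts.append(f"distances({n_id},{v_id},{d_words},{d_verbs}).")
    return facts

# Minimal two-word sentence: V immediately followed by N.
sent = [("w_1", "tc_verb_inf", "ts_acp"), ("w_2", "tc_noun_sg", "ts_ins")]
print(encode(sent, "w_2", "w_1")[-1])  # distances(w_2,w_1,1,1).
```

With this shift-by-one convention, an adjacent V-N pair yields distance +1 and an adjacent N-V pair yields -1, so the sign always records which element comes first.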
For example, the N-V qualia pair in boldface in the sentence “ L’installation se compose : de
deux atterrisseurs protégés par des carénages, fixés et articulés. . . ” (the system is composed: of
two landing devices protected by streamlined bodies, fixed and articulated. . . ) is transformed into
is_qualia(m11124_52,m11124_35). and
tags(m11123_3_deb,tc_vide,ts_vide).
tags(m11123_3,tc_noun_sg,ts_pro).
pred(m11123_3,m11123_3_deb).
tags(m11123_16,tc_pron,ts_ppers).
pred(m11123_16,m11123_3).
tags(m11123_19,tc_verb_sg,ts_posv).
pred(m11123_19,m11123_16).
tags(m11123_27,tc_wpunct_pf,ts_ponct).
pred(m11123_27,m11123_19).
tags(m11124_1,tc_prep,ts_rde).
pred(m11124_1,m11123_27).
tags(m11124_4,tc_num,ts_quant).
pred(m11124_4,m11124_1).
tags(m11124_9,tc_noun_pl,ts_art).
pred(m11124_9,m11124_4).
tags(m11124_35,tc_verb_adj,ts_acp).
pred(m11124_35,m11124_9).
tags(m11124_44,tc_prep,ts_rman).
pred(m11124_44,m11124_35).
tags(m11124_52,tc_noun_pl,ts_art).
pred(m11124_52,m11124_44).
tags(m11124_62,tc_wpunct,ts_virg).
pred(m11124_62,m11124_52).
tags(m11125_1,tc_verb_adj,ts_acp).
pred(m11125_1,m11124_62).
tags(m11125_7,tc_conj_coord,ts_rconj).
pred(m11125_7,m11125_1).
tags(m11125_10,tc_verb_adj,ts_acp).
pred(m11125_10,m11125_7).
...
distances(m11124_52,m11124_35,2,1).
where the special tags tc_vide and ts_vide describe the empty word used to mark the beginning
and the end of the sentence.
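The construction of these facts can be sketched in Python (our re-imagined illustration, not the authors' Perl program; identifiers and tag names follow the example above, with underscores in place of spaces):

```python
# Illustrative sketch of the encoding of a tagged sentence fragment into
# tags/3, pred/2 and distances/4 facts (not the original Perl program).

def encode(words, n_index, v_index):
    """words: (identifier, pos_tag, sem_tag) triples in sentence order.
    n_index / v_index: positions of the N and the V of the studied pair."""
    facts = []
    prev = None
    for ident, pos, sem in words:
        facts.append(f"tags({ident},{pos},{sem}).")
        if prev is not None:
            facts.append(f"pred({ident},{prev}).")  # prev occurs just before ident
        prev = ident
    # Signed, shifted distances: positive iff V occurs before N; shifting by
    # one distinguishes a "positive null" distance (+1) from a "negative
    # null" one (-1).
    between = words[min(n_index, v_index) + 1:max(n_index, v_index)]
    n_words = len(between)
    n_verbs = sum(1 for _, pos, _ in between if pos.startswith("tc_verb"))
    sign = 1 if v_index < n_index else -1
    facts.append(f"distances({words[n_index][0]},{words[v_index][0]},"
                 f"{sign * (n_words + 1)},{sign * (n_verbs + 1)}).")
    return facts
```

On the three-word fragment protégés/par/atterrisseurs of the positive example above, this produces the pred/2 chain and the distances fact with the shifted values 2 and 1.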
The negative examples are elaborated in the same way as the positive ones, with the same Perl
program. They are automatically built from the above-mentioned highly correlated N-V pairs that
have been manually annotated as irrelevant, and from the occurrences in the corpus of potentially
relevant N-V pairs rejected during E + construction (see above). For example, the non-qualia pair
in boldface in the following sentence: “Au montage : gonfler la roue à la pression prescrite, . . . ”
(When assembling: inflate the wheel to the prescribed pressure, . . . ) is added to the set E − as
is_qualia(m7978_15,m7978_31). and the following clauses are stored into the background knowledge:
tags(m7977_1_deb,tc_vide,ts_vide).
tags(m7977_1,tc_prep,ts_ra).
pred(m7977_1,m7977_1_deb).
tags(m7977_3,tc_noun_sg,ts_acy).
pred(m7977_3,m7977_1).
tags(m7977_11,tc_wpunct_pf,ts_ponct).
pred(m7977_11,m7977_3).
tags(m7978_7,tc_verb_inf,ts_acp).
pred(m7978_7,m7977_11).
tags(m7978_15,tc_noun_sg,ts_ins).
pred(m7978_15,m7978_7).
tags(m7978_20,tc_prep,ts_ra).
pred(m7978_20,m7978_9).
tags(m7978_22,tc_noun_sg,ts_phm).
pred(m7978_22,m7978_20).
tags(m7978_31,tc_verb_adj,ts_acc).
pred(m7978_31,m7978_22).
tags(m7978_41,tc_wpunct,ts_virg).
pred(m7978_41,m7978_31).
...
distances(m7978_15,m7978_31,-3,-1).
During this step, as shown in the encoding of the previous positive and negative examples, some
categories of words are not taken into account: the determiners, and some adjectives, which are not
considered relevant for providing information about the context of qualia or non-qualia pairs.
3,099 positive examples and 3,176 negative ones are automatically produced this way
from the MATRA-CCR corpus. ALEPH's background knowledge is also provided with other information that describes special relationships among POS and semantic tags. Those relationships encode,
for example, the fact that a tag tc_verb_pl indicates a conjugated verb in the plural (conjugated_plural),
which can be considered as a conjugated verb (conjugated) or simply a verb (verb). Here is an example
of those literals describing the words from a linguistic point of view:
verb( W ) :- conjugated( W ).
verb( W ) :- infinitive( W ).
...
conjugated( W ) :- conjugated_plural( W ).
conjugated( W ) :- conjugated_singular( W ).
conjugated_plural( W ) :- tagcat( W, tc_verb_pl ).
...
The background knowledge file describing all these relations and all the predicate definitions is
given in appendix A. All the datasets (examples and background knowledge) used for the experiments are available from the authors on demand.
Let us define some terms we use later in this paper:
– “most general literals” are literals describing words that do not appear in the body of any clause
in the background knowledge (for example, common_noun/1, verb/1). Note that every word
can be described by one and only one “most general literal”;
– “POS literals” (resp. “semantic literals”) are literals describing the morpho-syntactic (resp. semantic) aspect of a word (the most general literals are considered as neither semantic nor POS literals);
– “most general POS literals” (resp. “most general semantic literals”) are POS (resp. semantic) literals that appear in the body of most general literals (for example, infinitive/1, entity/1);
– for two literals l1 and l2 such that a rule l1 :- l2. exists in the background knowledge, l1 is
called the immediate generalization of l2 and l2 the immediate specialization of l1. The
immediate generalizations of literals are unique with respect to the background knowledge.
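Since immediate generalizations are unique, climbing the hierarchy from any literal yields a single chain. A small Python sketch of this lookup (the dictionary below encodes only the verb fragment of the background knowledge shown earlier; it is our illustration, not the actual ALEPH encoding):

```python
# Toy fragment of the literal hierarchy: each literal has at most one
# immediate generalization, so the chain up to the most general literal
# is unique.
IMMEDIATE_GEN = {
    "conjugated_plural": "conjugated",
    "conjugated_singular": "conjugated",
    "conjugated": "verb",
    "infinitive": "verb",
}

def generalizations(literal):
    """All literals strictly more general than `literal`, most specific first."""
    chain = []
    while literal in IMMEDIATE_GEN:
        literal = IMMEDIATE_GEN[literal]
        chain.append(literal)
    return chain
```

For instance, conjugated_plural generalizes to conjugated and then to the most general literal verb.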
For any word W of our corpus, our background knowledge is such that all the literals describing
W can be ordered in a tree whose particular structure is used in the learning process. Indeed, the
root of the tree is the most general literal describing W and it has two branches, one for the POS
literals and the other for the semantic literals. Any node (literal) of these two branches has only one
upper node (its immediate generalization) and at most one lower (its immediate specialization if it
exists). Other useful predicates are also stored in the background knowledge, for example tagcat/2
and tagsem/2, which are used as an interface between the examples and the POS and semantic literals,
and the predicate suc/2, defined as suc(X,Y) :- pred(Y,X). suc/2 is only used for reading convenience
and is considered, especially during hypothesis construction, as equivalent to pred/2 (that is,
is_qualia(A,B) :- suc(A,B). and is_qualia(A,B) :- pred(B,A). are treated as a single hypothesis).
Given E + , E − and the background knowledge B, ALEPH tries to deal with that large amount
of information and discover rules that explain (most of) the positive examples and reject (most of)
the negative ones. To infer those rules, it uses examples to generate and test various hypotheses,
and keeps those that seem relevant with respect to what we want to learn. To sum up, the ALEPH algorithm
follows a very simple procedure that can be described in four steps, as stated in ALEPH's manual:
1. select one example to be generalized; if none exists, stop;
2. build ⊥, that is, the most specific clause that explains the example;
3. search the space of solutions bounded below by ⊥ for the hypothesis that maximizes a score
function, with the help of a refinement operator;
4. remove the examples that are “covered” (“explained”) by the hypothesis found, and
return to step 1.
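This covering procedure can be sketched as a generic Python skeleton; the names build_bottom, search and covers are placeholders of ours for ALEPH's actual bottom-clause construction, lattice search and coverage test, so this is only an abstract illustration:

```python
def covering_loop(positives, negatives, build_bottom, search):
    """Greedy covering sketch of the 4-step loop described above."""
    remaining = set(positives)            # positive examples not yet explained
    theory = []
    while remaining:
        example = next(iter(remaining))                    # step 1: select
        bottom = build_bottom(example)                     # step 2: build the MSC
        hypothesis = search(bottom, remaining, negatives)  # step 3: lattice search
        covered = {e for e in remaining if hypothesis.covers(e)}
        if not covered:  # nothing generalized: keep the example's own clause
            covered, hypothesis = {example}, bottom
        theory.append(hypothesis)
        remaining -= covered                               # step 4, then loop
    return theory
```

The loop terminates because each iteration removes at least one positive example; the returned theory mixes generalized clauses with ungeneralized examples, as described at the end of Section 4.3.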
The search for hypotheses (step 3) is the most complex task of this algorithm, and also the longest
one. To improve learning efficiency and control the expressiveness of the solutions, this
search space must be characterized.
4.2 Hypothesis Search Lattice
Many machine learning tasks can be considered as a search problem. In ILP, the hypothesis H that
has to be learnt must satisfy:
∀e+ ∈ E + : B ∪ H ⊨ e+ (completeness)
∀e− ∈ E − : B ∪ H ⊭ e− (correctness)
Such a hypothesis is searched for in the space of all Horn clauses in order to find one that is complete and correct. Unfortunately, the tests required on the training data are costly and preclude an
exhaustive search throughout the entire hypothesis space. Several kinds of biases are therefore used
to limit that search space (see Nédellec et al., 1996). One of the most natural ones is the hypothesis
language bias, which defines syntactic constraints on the hypotheses to be found. This restriction
on the search space considerably limits the number of potential solutions, prevents overfitting and
ensures that only well-formed hypotheses are obtained.
For us, a well-formed hypothesis is defined as a clause that gives (semantic and/or POS) information about words (N, V or words occurring in their context) and/or information about the respective
positions of N and V in the sentence. For example, is_qualia(A,B) :- artefact(A), pred(B,C), suc(A,C), auxiliary(C).—which means that an N-V pair is qualia if N is an artefact, V is immediately preceded by an auxiliary
and N is immediately followed by that same auxiliary—is a well-formed hypothesis. We therefore have to indicate
in ALEPH's settings that the predicates artefact/1, pred/2, suc/2, auxiliary/1. . . can be used to construct
a hypothesis. Another constraint on the hypothesis language is that there can be at most one item
of POS information and one item of semantic information about a given word. This means that
the hypothesis is_qualia(A,B) :- pred(B,C), participle(C), past_participle(C). is not considered legal, since
there are two items of POS information about the word represented by C. Conversely, the hypotheses is_qualia(A,B) :- pred(B,C), participle(C), action_verb(C). or is_qualia(A,B) :- pred(B,C), past_participle(C),
physical_action_verb(C). or even is_qualia(A,B) :- pred(B,C), suc(A,C). are well-formed with respect to our
task. Redundant information on one word is indeed superfluous, since all our POS and
semantic information is hierarchically organized: one of the literals is more specific than the
others and describes the word precisely enough; the other literals are therefore useless.
In our example, there is no need to say that C is a participle (participle(C)) if it is known to be a
past participle (past_participle(C)). This redundancy is handled by our refinement operator.
Several other predicates, in particular those dealing with the distances between N and V and their
relative positions, are used in the hypothesis language. More than 100 different predicates can thus
occur in a hypothesis.
Even with this language bias, our learning search space remains huge. Fortunately, the hypotheses can be organized by a generality relation (with the help of a quasi-order on hypotheses), which
allows the algorithm to traverse the space of solutions intelligently. Several quasi-orderings
have been studied in the ILP framework. Logical implication would ideally be the preferred generality relation, but undecidability results led to its rejection (Nienhuys-Cheng and de Wolf, 1996).
Another order, commonly used by ILP systems, is θ-subsumption (Plotkin, 1970), defined below.
Definition 1 A clause C1 θ-subsumes a clause C2 (C1 ⪰θ C2 ) if and only if (iff) there is a substitution
θ such that C1 θ ⊆ C2 (considering the clauses as sets of literals).
This order is weaker than implication (C1 ⪰θ C2 ⇒ C1 ⊨ C2 , but the converse is not true) but allows an easier handling of the clauses. θ-subsumption remains however too strong for our application. Indeed, let us consider H1 ≡ is_qualia(X1,Z1 ) :- suc(X1 ,Y1 ), pred(Z1,W1 ), verb(Y1 ), verb(W1 ).
and H2 ≡ is_qualia(X2,Z2 ) :- suc(X2 ,Y2 ), pred(Z2 ,Y2 ), verb(Y2 ).. Then, we have H1 ⪰θ H2 with θ =
[X1 /X2 ,Y1 /Y2 , Z1 /Z2 ,W1 /Y2 ], and since in our application variables represent words, this means that
θ-subsumption makes it possible to treat one word as two different ones in a clause, as is the case
with Y1 /W1 in H1 . This property is not relevant for our learning task; we
thus focus our attention on a more restrictive form of θ-subsumption: θ-subsumption under object identity
(henceforth θOI -subsumption) (Esposito et al., 1996), defined below.
Definition 2 (after Badea and Stanciu (1999)) A clause C1 θOI -subsumes a clause C2 (C1 ⪰OI C2 )
iff there is a substitution θ such that C1 θ ⊆ C2 and θ is injective (that is, θ does not unify variables
of C1 ).
θOI -subsumption is obviously weaker than θ-subsumption (C1 ⪰OI C2 ⇒ C1 ⪰θ C2 , but the converse
is false) but preserves the expected property H1 ⋡OI H2 (with H1 and H2 as defined above). This is
handled in ALEPH by generating hypotheses with sets of inequalities stating that variables with two
different names cannot be unified. For example, H1 is internally represented in ALEPH by
is_qualia(X,Z) :- suc(X,Y), pred(Z,W), verb(Y), verb(W), X≠Z, X≠Y, Z≠Y, X≠W, Y≠W, Z≠W.
For reading convenience, in the remainder of this paper we do not write these sets of inequalities
and we assume that two differently named variables are distinct.
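The difference between the two orders can be checked mechanically. The following Python sketch (ours, not part of ALEPH) treats clauses as sets of literals whose arguments are all variables and searches for a substitution θ, injective or not; on the H1 and H2 above it finds that H1 θ-subsumes H2 but does not θOI-subsume it:

```python
from itertools import permutations, product

def theta_subsumes(c1, c2, injective=False):
    """Does clause c1 θ-subsume clause c2?  Clauses are sets of
    (predicate, var, ...) tuples whose arguments are all variables.
    injective=True gives θ-subsumption under object identity (θ must not
    unify distinct variables of c1)."""
    vars1 = sorted({v for lit in c1 for v in lit[1:]})
    vars2 = sorted({v for lit in c2 for v in lit[1:]})
    candidates = (permutations(vars2, len(vars1)) if injective
                  else product(vars2, repeat=len(vars1)))
    for image_vars in candidates:
        theta = dict(zip(vars1, image_vars))
        if {(lit[0], *(theta[v] for v in lit[1:])) for lit in c1} <= set(c2):
            return True
    return False
```

The non-injective search finds the substitution [X1/X2, Y1/Y2, Z1/Z2, W1/Y2]; under object identity no injective mapping of H1's four variables into H2's three variables exists.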
The notion of generality that we use (we call it θNV -subsumption) is derived from θOI -subsumption and adapted to fit the needs of our application. Indeed, θOI -subsumption, as defined
above, does not totally capture the generality notion we want to use in our hypothesis space. First,
we wish to take into account the hierarchical organization of our POS and semantic information, that
is, we want our generality notion to make the most of the domain theory described in the background
knowledge, following ideas developed in the generalized subsumption framework (Buntine, 1988).
For example, we want the hypothesis is_qualia(A,B) :- object(A). to be considered as more general
than is_qualia(A,B) :- artefact(A)., which must itself be considered as more general than is_qualia(A,B) :- instrument(A). (see Figure 3).
Moreover, we want to avoid clauses in which no constraint is set on a variable. For example, the hypothesis is_qualia(A,B) :- infinitive(B), pred(A,C). could simply be expressed as is_qualia(A,B) :- infinitive(B).
since pred(A,C) does not bring any linguistically interesting information. However, is_qualia(A,B) :- suc(A,C), suc(C,D), object(D). is considered well-formed since there is a semantic constraint on the
word D, and C is constrained by the two suc/2 literals. This condition is very similar to the well-known linkedness: according to Helft (1987), a clause is said to be linked if all its variables are linked; a variable
V is linked in a clause C if and only if V occurs in the head of C, or there is a literal l in C that
contains the variables V and W (V ≠ W) and W is linked in C. It also corresponds to the connection
constraint (Quinlan, 1990), to i1-determinate clauses in the ij-determinacy framework (Muggleton and
Feng, 1990) and to the chain-clause concept (Rieger, 1996); but in our case, every variable must not only be
connected to head variables by a path of variables (with the help of pred/2 and suc/2), it
must in addition be “used” elsewhere in the hypothesis body. A hypothesis meeting all these conditions is said
to be well-formed with respect to our learning task.
Therefore, we say that, with respect to the background knowledge B, C ⪰NV D if there exist
an injective substitution θ and a function fD such that fD (C)θ ⊆ D, where fD ({l1 , l2 , ..., lm }) means
{ fD (l1 ), fD (l2 ), ..., fD (lm )} and fD is such that ∀l ∈ C : B, fD (l) ⊨ l.
Intuitively, this means that a clause D can be more specific than C if
1. D has literals in addition to the literals of C;
2. D contains literals, on the same variables as C, that are more specific with respect to the POS
and semantic information hierarchy.
As for θ-subsumption and θOI -subsumption, θNV -subsumption induces a quasi-ordering upon
the space of hypotheses with respect to our particular background knowledge and our definition of
well-formed hypothesis, as stated by the three following results:
– C ⪰NV C (reflexivity)
– C1 ⪰NV C2 and C2 ⪰NV C1 ⇒ C1 and C2 are equivalent (written C1 ∼NV C2 ); in our case (as
well as for θOI -subsumption), C1 ∼NV C2 means C1 = C2 up to variable renaming (antisymmetry)
– C1 ⪰NV C2 and C2 ⪰NV C3 ⇒ C1 ⪰NV C3 (transitivity)
Proof
1 - Reflexivity: trivial.
2 - Antisymmetry: C1 ⪰NV C2 and C2 ⪰NV C1 , thus there exist f1 , f2 , θ1 and θ2 such that f1 (C1 )θ1 ⊆
C2 and f2 (C2 )θ2 ⊆ C1 , with ∀l ∈ C1 : B, f1 (l) ⊨ l and ∀l ∈ C2 : B, f2 (l) ⊨ l. Therefore, ∀l ∈ C1 : B, f2 ( f1 (l)) ⊨
f1 (l) and thus ∀l ∈ C1 : B, f2 ( f1 (l)) ⊨ l with f2 ( f1 (l)) ∈ C1 . Since C1 is well-formed
and given our background knowledge, we have ∀l ∈ C1 : f2 ( f1 (l)) = l and f1 (l) = l; similarly, ∀l ∈ C2 : f2 (l) = l. This means that C1 θ1 ⊆ C2 and C2 θ2 ⊆ C1 , and since θ1 and θ2 are injective,
C1 and C2 are only alphabetic variants.
3 - Transitivity: C1 ⪰NV C2 and C2 ⪰NV C3 , thus there exist f1 , f2 , θ1 and θ2 such that f1 (C1 )θ1 ⊆ C2
and f2 (C2 )θ2 ⊆ C3 . We have f2 ( f1 (C1 ))θ1 θ2 ⊆ C3 , and f1 ◦ f2 (the composition of f1 and f2 ) and θ1 ◦ θ2
are injective; therefore C1 ⪰NV C3 .
Thanks to our example representation and the background knowledge used, all the literals that
can occur in hypotheses are deterministic; such hypotheses are said to be determinate clauses. With
these linked determinate clauses, the θNV -subsumption quasi-ordering implies that the hypothesis
space is structured as a lattice (a detailed proof is given in appendix B for θOI -subsumption and θNV -subsumption). At the top of this lattice, we find the most general clause (⊤) and below, a most
[Figure 4, a diagram, shows a fragment of the hypothesis lattice for θNV -subsumption. From the top
clause is_qualia(A,B)., edges labeled 1 (addition of a literal) or 2 (specialization of a literal) lead through
clauses such as is_qualia(A,B) :- common_noun(A)., is_qualia(A,B) :- entity(A)., is_qualia(A,B) :- pred(B,C),
preposition(C)., is_qualia(A,B) :- singular_common_noun(A)., is_qualia(A,B) :- object(A)., is_qualia(A,B) :- pred(B,C),
goal_preposition(C). and is_qualia(A,B) :- artefact(A), singular_common_noun(A)., down to
is_qualia(A,B) :- object(A), singular_common_noun(A), pred(B,C), goal_preposition(C).]

Figure 4: Hypothesis lattice for θNV -subsumption
specific clause (called MSC or bottom, henceforth written ⊥). In our case, ⊤ is the clause
is_qualia(A,B)., stating that all N-V pairs are qualia pairs, and ⊥ is a constant-free clause containing all
the literals that can be found to describe the example to be generalized (see Muggleton, 1995, for
details about ⊥ construction), minus superfluous literals (literals giving more general information
about a word than other literals in ⊥). Figure 4 shows a simple example of our lattice; numbers on
the edges refer to the first or the second condition of the given definition of θNV -subsumption.
The way the search is performed in this lattice is really important to find the best hypothesis
(with respect to the chosen score function) in the shortest possible time. As our background knowledge has the structure of a forest (a set of trees) and the relation introducing variables (the sequence
relation pred/suc) is determinate, it is quite easy to build a perfect refinement operator (Badea and
Stanciu, 1999) allowing an effective traversal of this hypothesis space ordered by θNV -subsumption
using the methods described there. However, in order to save computation time, we avoid exploring
parts of the hypothesis space (that is, refinements of hypotheses) when we know that no good
solution can lie in those parts.
4.3 Pruning and Private Properties
Pruning the search is a delicate task and must be controlled so as not to “miss” a potential solution.
The problem is that if a hypothesis violates some property P, one of its refinements can perhaps be
correct with respect to P. Let us see how we manage pruning in our lattice with the guarantee of not
leaving a valid solution out.
Some properties, called private properties (Torre and Rouveirol, 1997a,b,c), allow safe pruning
with respect to a given refinement operator. They enable us to avoid refining a given hypothesis
that does not satisfy the expected properties without taking the risk of missing a solution since no
descendant of the hypothesis will satisfy those properties.
Definition 3 (from Torre and Rouveirol 1997c) A property P is said to be private with respect to
the refinement operator ρ into the search space S iff:
∀H, H ′ ∈ S : ∀(H ′ ∈ ρ∗ (H) ∧ ¬P(H) ⇒ ¬P(H ′ ))
where ¬X indicates the negation of X and ∀F, with F a formula, denotes the universal closure of F,
which is the closed formula obtained by adding a universal quantifier for every variable having a
free occurrence in F.
Let us examine a very simple and well-known private property (used as an example by Torre
and Rouveirol, 1997c) that allows us to prune the search safely: the length of a clause. Formally,
the property that bounds the length of a clause to k literals can be expressed as |H| ≤ k (|C| denotes
the number of literals in clause C). This property is private with respect to the operator ρnv in the
search space S iff ∀H, H ′ ∈ S : ∀k ∈ N : (H ′ ∈ ρ∗nv (H) ∧ |H| > k ⇒ |H ′ | > k). Our operator basically
consists in adding literals (H ′ ∈ ρnv (H) ∧ |H ′ | > |H|) or in replacing a literal by a more specific one
(then H ′ ∈ ρnv (H) ∧ |H ′ | = |H|). The clause-length property is thus private with respect to ρnv and
allows safe pruning as soon as a hypothesis has too many literals.
Several other private properties are used to prune the search in a safe way. For example, we use
the minimal number of positive examples to be covered: if a clause does not explain at least a
given number of positive examples, this hypothesis is not considered relevant. That property is
obviously private with respect to ρnv since the numbers of covered positive and negative examples
decrease through specialization.
In ILP systems, properties of the score function are often used to prune the search. This function
permits us to decide which hypothesis is the best one for the learning task. The one we have chosen
is s(H) = (P − N, |H|), where P is the number of positive examples and N the number of negative
examples covered by hypothesis H. H1 is said to be a better hypothesis than H2 (with s(H1 ) =
(P1 − N1 , |H1 |) and s(H2 ) = (P2 − N2 , |H2 |)) iff P1 − N1 > P2 − N2 , or P1 − N1 = P2 − N2 ∧ |H1 | < |H2 |.
Unfortunately, since P − N is not monotonic, we cannot say anything in general about the score of
the refinements of a given hypothesis H that satisfies a score criterion such as s(H) < k, where
k can be the best score found so far in the search. This property would permit an optimal
pruning, but since it is not private in our case, we cannot use it. The private property of this
score function that we exploit to prune the search is weaker: sopt (H) ≥ Sbest , where Sbest is the
greatest difference P − N found during the search and sopt (H) = Pcurrent − N⊥ ; Pcurrent is the number
of positive examples covered by the current hypothesis, and N⊥ is the number of negative examples
covered by ⊥ (evaluated at its construction time). This property is private since ∀H, H ′ ∈ S : ∀Sbest ∈ N : (H ′ ∈ ρ∗ (H) ∧ sopt (H) <
Sbest ⇒ sopt (H ′ ) < Sbest ), because P decreases through the search and N⊥ is constant.
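As a compact illustration (hypothetical helper names of ours, not ALEPH code), the score comparison and the safe pruning test can be written as:

```python
def better(score1, score2):
    """s(H) = (P - N, |H|): H1 beats H2 iff P1-N1 > P2-N2, or the
    differences are equal and H1 is shorter."""
    (d1, len1), (d2, len2) = score1, score2   # each score is (P - N, |H|)
    return d1 > d2 or (d1 == d2 and len1 < len2)

def can_prune(p_current, n_bottom, s_best):
    """The optimistic score s_opt(H) = P_current - N_bottom can only decrease
    under refinement (P decreases, N_bottom is constant), so pruning whenever
    it drops below the best P - N found so far is safe."""
    return p_current - n_bottom < s_best
```
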
All those (safe) prunings ensure finding the best solution in a minimal amount of time. Two
kinds of output are produced by this learning process: some clauses that have not been generalized
(that is, some of the positive examples), and a set of generalized clauses, called G hereafter, and
on which we shall focus our attention. Before using the clauses in G on the MATRA - CCR corpus to
acquire N-V qualia pairs and automatically produce a GL-based semantic lexicon, we must validate
our learning process in different ways and examine what kind of rules have been learnt.
5. Learning Validation and Results
This section is dedicated to three aspects of the validation of the machine learning method we have
described. First we focus on the theoretical results of the learning, that is, we take an interest in
the quality of G with respect to the training data (E + and E − ). The second step of the validation
concerns its empirical aspect. We have applied the generalized clauses that have been inferred to
the MATRA - CCR corpus and have evaluated the relevance of the decision made on the classification
of N-V pairs as qualia or not. The last step concerns the linguistic relevance evaluation of the learnt
rules, that is, from a linguistic point of view, what information do we learn about the semantic and
syntactic context in which qualia pairs appear?
5.1 Theoretical Validation
This first point concerns the determination of a learning quality measure with the chosen parameter setting. We are particularly interested in the proportion of positive examples that are covered
by the generalized clauses, and if we accept some noise in ALEPH parameter adjustment to allow
more generalizations, by the proportion of negative examples that are rejected by those generalized
clauses. The measure of the recall and precision rates of the learning method can be summed up
using the Pearson coefficient, which is used to compare the results of different experiments:
Pearson = (TP ∗ TN − FP ∗ FN) / √(PrP ∗ PrN ∗ AP ∗ AN)
where A = actual, Pr = predicted, P = positive, N = negative, T = true, F = false; a value close to 1
indicates a good learning.
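In terms of the confusion-matrix counts, this coefficient (also known as the Matthews correlation coefficient) can be computed as in the following short Python sketch:

```python
from math import sqrt

def pearson(tp, fp, fn, tn):
    """Pearson coefficient of a binary confusion matrix:
    (TP*TN - FP*FN) / sqrt(PrP * PrN * AP * AN)."""
    prp, prn = tp + fp, fn + tn     # predicted positives / negatives
    ap, an = tp + fn, fp + tn       # actual positives / negatives
    den = sqrt(prp * prn * ap * an)
    return (tp * tn - fp * fn) / den if den else 0.0
```

For instance, the confusion counts reported in Tables 2 and 3 below yield the Pearson values 0.666 and 0.337 given there.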
In order to obtain good approximations of the main characteristic numbers of the learning
method, we perform a 10-fold cross-validation (Kohavi, 1995) on the initial sets of 3,099 positive examples and 3,176 negative ones. Thus, the set of examples (E+ and E− ) is divided into ten
subsets, each of which is in turn used as a testing set while the nine others are used as a training
set; ten learning processes are then performed with these training sets and evaluated on the corresponding testing sets. Table 1 summarizes the time,2 precision, recall and Pearson coefficient averages
and standard deviations obtained through this 10-fold cross-validation.
                     Time (seconds)   Precision   Recall   Pearson
Average                       10285       0.813    0.890     0.693
Standard deviation             1440       0.028    0.024     0.047

Table 1: Cross-validation results
2. Experiments were conducted on a 966MHz PC running Linux.
The entire set of examples is then used as training set by ALEPH; 9 generalized clauses (see
Section 5.3) are found in less than 3 hours. We now try to estimate the performance of these rules
by comparing their results on an unknown dataset with those obtained by 4 experts.
5.2 Empirical Validation
In order to evaluate the empirical validity of our learning method, we have applied the 9 generalized clauses to the MATRA-CCR corpus and have studied the appropriateness of their decisions
concerning the classification of each pair as relevant or not. Since it is impossible to test all the N-V
combinations found in the corpus, our evaluation has focused on 7 significant common nouns of the
domain which were not used as examples (vis, écrou, porte, voyant, prise, capot, bouchon) (screw,
nut, door, indicator signal, plug, cowl, cap).
The evaluation has been carried out in two steps as follows. First, a Perl program retrieves all
N-V pairs that appear in the same sentence in a part of the corpus and include one of the studied
common nouns, and forwards them to 4 GL experts. The experts manually tag each pair as relevant
or not. Divergences are discussed until complete agreement is reached.
In a second stage, this reference corpus is compared to the answers obtained for the N-V pairs
of the same part of the corpus by applying the clauses learnt with ALEPH. The results
obtained for the seven selected common nouns are presented in Table 2. An N-V pair is considered
as tagged “relevant” by the clauses if at least one of them covers this pair.
qualia pairs detected qualia                 62
non-qualia pairs detected qualia             40
qualia pairs detected non-qualia              4
non-qualia pairs detected non-qualia        180
Pearson                                   0.666

Table 2: Empirical validation on the MATRA-CCR corpus
These results are quite promising, especially if we compare them to those obtained by the χ2 correlation (see Table 3), which was the first step of our selection of N-V pairs in the corpus (see
Section 4.1).
qualia pairs detected qualia                 33
non-qualia pairs detected qualia             35
qualia pairs detected non-qualia             33
non-qualia pairs detected non-qualia        185
Pearson                                   0.337

Table 3: χ2 results on the MATRA-CCR corpus
On the one hand, our ILP method detects most of the qualia N-V pairs, like porte-ouvrir (door-open) or voyant-signaler (warning light-warn). The four non-detected pairs appear in very rare
constructions in our corpus, like prise-relier (plug-connect) in la citerne est reliée à l’appareil par
des prises (the tank is connected to the machine by plugs), where a prepositional phrase (PP) à
l’appareil (to the machine) is inserted between the verb and the par-PP (by-PP). On the other hand,
only 8 of the 40 non-qualia pairs detected as qualia by our learning method cannot be linked
syntactically. This means that the ILP algorithm can already reliably distinguish between syntactically linked and syntactically unlinked pairs. In comparison, 25 of the 35 non-qualia pairs detected as qualia
by the χ2 method are not even syntactically related.
The main problem for the ILP algorithm is therefore to correctly identify N-V pairs related by a
telic or agentive relation—the most common qualia links in our corpus—among the pairs that could
be syntactically related. But here we should carefully distinguish two types of errors. The first type
is caused by constructions that are ambiguous, in which the N and the V may or may not be syntactically
related, as enlever-prises (remove-plugs) in enlever les shunts sur les prises (remove the shunts
from the plugs). These cannot be disambiguated by superficial clues about the context in which the
V and the N occur, and they show the limitation of using a tagged corpus for the learning process. However,
they are very rare in our corpus (8 pairs). On the contrary, all the remaining errors seem more related
to the parameterization of the learning method. For example, taking into consideration the number of
nouns between the V and the N could avoid many wrong pairs like poser-capot (put up-cover) in
poser les obturateurs capots (put up cover stopcocks) or assurer-voyant (make sure-warning light)
in s’assurer de l’allumage du voyant (make sure that the warning light is switched on).
The empirical validation can therefore be considered positive, and we can now focus on the
last step of the evaluation, which consists in assessing the linguistic validity of the generalized clauses.
5.3 Linguistic Validation
For the linguist, the issue is not only to find good examples of qualia relations but also to identify
in texts the linguistic patterns that are used to express them. Consequently, the question is: what do
these clauses tell us about the linguistic structures that are likely to convey qualia relations between
a noun and a verb? We know from previous research (Morin, 1999) dealing with other types of
semantic relations that a given relation can be instantiated by a wide variety of linguistic patterns,
and that this set of structures may greatly vary from one corpus to another. Such research generally
focuses on hyperonymy (is-a) and meronymy (part-of) relations, which provide the basic structure
of ontologies. Our aim is thus similar, with the additional difficulty that some of the relations
we focus on—such as the telic or agentive ones—have never been extensively studied on corpora,
and are more difficult to identify than more conventional semantic relations. Previous research
concerned with the acquisition of elements of GL (Pustejovsky et al., 1993) has looked at some
solutions for identifying words linked by prespecified syntactic relations in texts, such as object
relations between verbs and nouns, or certain types of N-N compounds. That research has not been
thoroughly evaluated and is, in any case, quite different from ours: contrary to this approach, we have no a
priori assumptions about the kinds of structures in which telic, agentive or formal N-V pairs may be
found.
We are thus faced with a set of nine clauses that we now try to interpret in terms of linguistic
rules:
(1) is_qualia(A,B) :- precedes(B,A), near_verb(A,B), infinitive(B), action_verb(B).
(2) is_qualia(A,B) :- contiguous(A,B).
(3) is_qualia(A,B) :- precedes(B,A), near_word(A,B), near_verb(A,B), suc(B,C), preposition(C).
(4) is_qualia(A,B) :- near_word(A,B), pred(A,C), void(C).
(5) is_qualia(A,B) :- precedes(B,A), suc(B,C), pred(A,D), punctuation(D), singular_common_noun(A), colon(C).
(6) is_qualia(A,B) :- near_word(A,B), suc(B,C), suc(C,D), action_verb(D).
(7) is_qualia(A,B) :- precedes(A,B), near_word(A,B), pred(A,C), punctuation(C).
(8) is_qualia(A,B) :- near_verb(A,B), pred(B,C), pred(C,D), pred(D,E), preposition(E), pred(A,F), void(F).
(9) is_qualia(A,B) :- precedes(A,B), near_verb(A,B), pred(A,C), subordinating_conjunction(C).
where near_word(X,Y) means that X and Y are separated by at least one word and at most two words,
and near_verb(X,Y) that there is no verb between X and Y.
What is most striking is the fact that, at this level of generalization, few linguistic features are
retained. Previous learning on the same corpus with no semantic tagging, using PROGOL and
poorer contextual information (Sébillot et al., 2000), had led to less generalized rules containing
more linguistic elements; these rules were however less relevant for acquiring correct qualia pairs.
The 9 clauses learnt here seem to provide very general indications and tell us very little about the verb
types (action_verb is the only information we get), nouns (common_noun) or prepositions that are
likely to fit into such structures. But the clauses contain other information, related to several aspects
of linguistic description, such as:
of linguistic descriptions, like:
- proximity: this is a major criterion. Most clauses indicate that the noun and the verb must be
either contiguous (Clause 2) or separated by at most one element (Clauses 3, 4, 6 and 7) and that no
verb must appear between N and V (Clauses 1, 3, 8 and 9).
- position: Clauses 4 and 7 indicate that one of the two elements is found at the beginning of a
sentence or right after a punctuation mark, whereas the relative position of N and V (precedes/2) is
given in Clauses 1, 3, 5, 7 and 9.
- punctuation: punctuation marks, more specifically colons, are mentioned in Clauses 5 and 7.
- morpho-syntactic categorization: the first clause detects a very important structure in the text,
corresponding to action verbs in the infinitive form.
These features shed light on linguistic patterns that are very specific to the corpus, a text falling
within the instructional genre. We find in this text many examples in which a verb in the infinitive
form occurs at the beginning of a proposition and is followed by a noun phrase. Such lists of
instructions are very typical of the corpus:
- débrancher la prise (disconnect the plug)
- enclencher le disjoncteur (engage the circuit breaker)
- déposer les obturateurs (remove the stopcocks)
To further evaluate these findings, we have compared the results of the learning process with
linguistic observations obtained manually on the same corpus (Galy, 2000). Galy has
listed a set of canonical verbal structures that convey telic information:
infinitive verb + det + noun (visser le bouchon) (to tighten the cap)
verb + det + noun (ferment le circuit) (close the circuit)
noun + past participle (bouchon maintenu) (held cap)
noun + be + past participle (circuits sont raccordés) (circuits are connected)
noun + verb (un bouchon obture) (a cap blocks up)
be + past participle + par + det + noun (sont obturées par les bouchons) (are blocked up by caps)
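Canonical structures such as the first one can be located by simple POS-sequence matching over a tagged corpus. The following Python sketch is purely illustrative: the tag names 'VINF', 'DET' and 'NOUN' are placeholders of ours, not the actual tagset used in the experiments:

```python
def match_inf_det_noun(tagged):
    """Return (verb, noun) pairs matching: infinitive verb + determiner + noun.

    `tagged` is a list of (word, pos) pairs; the pos labels used here
    ('VINF', 'DET', 'NOUN') are illustrative placeholders.
    """
    hits = []
    # Slide a window of three consecutive tokens over the sentence.
    for (w1, t1), (w2, t2), (w3, t3) in zip(tagged, tagged[1:], tagged[2:]):
        if t1 == 'VINF' and t2 == 'DET' and t3 == 'NOUN':
            hits.append((w1, w3))
    return hits

sent = [('visser', 'VINF'), ('le', 'DET'), ('bouchon', 'NOUN')]
```

Running the matcher on `sent` yields the candidate telic pair (visser, bouchon).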
The two types of results show some overlap: both experiments demonstrate the significance of
infinitive structures and highlight patterns in which the verb and noun are very close to each other.
Yet the results are quite different since the learning method proposes a generalization of the structures discovered by Galy. In particular, the opposition between passive and active constructions is
merged in Clause 2 by the indication of mere contiguity (V can occur before or after N). Conversely,
some clues, like punctuation marks and position in the sentence, have not been observed by manual
analysis because they are related to levels of linguistic information that are usually neglected by
linguistic observation, even if they are known to be good pattern markers (Jones, 1994).
Consequently, when we look at the results of the learning process from a linguistic point of
view, it appears that the clauses give very general surface clues about the structures that are favored
in the corpus for the expression of qualia relations. Yet, these clues are sufficient to give access to
some corpus-specific patterns, which is a very interesting result.
6. Conclusions and Future Work
The acquisition method for N-V qualia pairs (as defined in Pustejovsky's generative lexicon formalism)
that we have developed leads to very promising results. Concerning the ILP learning system itself,
we have defined and made the most of a well-suited generality notion extending object identity
subsumption, which has led to obtaining only well-formed hypotheses that can be linguistically interpreted. The speed of the learning step is improved by safely pruning the search for the best rules
under certain conditions expressed as private properties. The rules that are learnt lead to very good
results for the N-V qualia pair acquisition task: 94% of all relevant pairs are detected for seven
significant common nouns; these results must be compared with the 50% obtained with the χ2 baseline. Moreover,
from a practical point of view, the linguistic validation of the inferred rules confirms the ability of
our method to help a linguist detect linguistic patterns dedicated to the expression of qualia roles.
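As a reminder of how a χ2 baseline scores an N-V pair, here is a minimal sketch computing Pearson's χ2 from a 2×2 contingency table of sentence counts; the exact counting scheme of the baseline experiment is not reproduced here, so this is an illustration of the statistic only:

```python
def chi2_pair(a, b, c, d):
    """Pearson chi-square for a 2x2 contingency table:
    a = sentences containing both N and V, b = N without V,
    c = V without N, d = neither.
    The larger the score, the stronger the N-V association."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0
```

Under exact independence (e.g. all four cells equal) the score is 0; qualia pairs are expected to score high, but as reported above such a purely statistical criterion recovers only about half of the relevant pairs.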
A next step of our research will consist in repeating the experiment on new textual data, in
order to see what types of specific structures are detected in a less technical corpus; we
will also focus on N-N pairs, which very frequently exhibit telic relations in texts (as in bouchon
de protection, protective cap). Another potential avenue is to learn each qualia
semantic relation (telic, agentive, formal) separately instead of all together, as has been done up to now. Even if
such a distinction is perhaps not useful for an information retrieval application, it could yield
linguistically interesting rules.
Other future studies should also be undertaken to improve the portability of the full method.
In particular, the semantic tagging of a corpus requires an expert's supervision to build the semantic
classification of all the words. Even if the determination of the relevant classes for one domain
can be partly automated (see Agarwal, 1995; Grefenstette, 1994b, for example), it remains
too costly to be carried out on any new corpus. The last phase of the project will deal with the
actual use of the N-V (and possibly N-N) pairs obtained with the machine learning method within an
information retrieval system (such as a textual search engine), and with the evaluation of the resulting
improvement of its performance from both a theoretical (recall and precision rates) and an empirical
(with the help of real human users) point of view.
Acknowledgments
The authors wish to thank Céline Rouveirol for helpful discussions and for insightful comments on
an earlier version of this paper. They would also like to thank James Cussens and the anonymous
reviewers for their excellent advice.
Appendix A. Background Knowledge
Here is the listing of the part of the background knowledge describing the linguistic relations used in
the experiments of Section 5.
%%%%%%%%%%%%%%%%%%%%%%%%%
% background knowledge
% common noun %%%%%%%%%%%%%
common_noun(W) :- plural_common_noun(W).
common_noun(W) :- singular_common_noun(W).
common_noun(W) :- abstraction(W).
common_noun(W) :- event(W).
common_noun(W) :- group(W).
common_noun(W) :- psychological_feature(W).
common_noun(W) :- state(W).
common_noun(W) :- entity(W).
common_noun(W) :- location(W).
plural_common_noun(W) :- tagcat(W, tc_noun_pl).
singular_common_noun(W) :- tagcat(W, tc_noun_sg).
abstraction(W) :- attribute(W).
abstraction(W) :- measure(W).
abstraction(W) :- relation(W).
event(W) :- natural_event(W).
event(W) :- act(W).
event(W) :- phenomenon(W).
natural_event(W) :- tagsem(W, ts_hap).
phenomenon(W) :- tagsem(W, ts_phm).
phenomenon(W) :- process(W).
process(W) :- tagsem(W, ts_pro).
act(W) :- tagsem(W, ts_act).
act(W) :- human_activity(W).
human_activity(W) :- tagsem(W, ts_acy).
group(W) :- tagsem(W, ts_grp).
group(W) :- social_group(W).
social_group(W) :- tagsem(W, ts_grs).
psychological_feature(W) :- tagsem(W, ts_psy).
state(W) :- tagsem(W, ts_sta).
entity(W) :- tagsem(W, ts_ent).
entity(W) :- body_part(W).
entity(W) :- causal_agent(W).
entity(W) :- object(W).
body_part(W) :- tagsem(W, ts_prt).
object(W) :- tagsem(W, ts_pho).
object(W) :- artefact(W).
object(W) :- part(W).
object(W) :- substance(W).
part(W) :- tagsem(W, ts_por).
location(W) :- tagsem(W, ts_loc).
location(W) :- point(W).
point(W) :- tagsem(W, ts_pnt).
point(W) :- position(W).
position(W) :- tagsem(W, ts_pos).
attribute(W) :- tagsem(W, ts_atr).
attribute(W) :- form(W).
attribute(W) :- property(W).
form(W) :- tagsem(W, ts_frm).
property(W) :- tagsem(W, ts_pty).
measure(W) :- tagsem(W, ts_mea).
measure(W) :- definite_quantity(W).
measure(W) :- unit(W).
time_unit(W) :- tagsem(W, ts_tme).
definite_quantity(W) :- tagsem(W, ts_qud).
unit(W) :- tagsem(W, ts_unt).
unit(W) :- time_unit(W).
relation(W) :- tagsem(W, ts_rel).
relation(W) :- communication(W).
communication(W) :- tagsem(W, ts_com).
causal_agent(W) :- tagsem(W, ts_agt).
causal_agent(W) :- human(W).
human(W) :- tagsem(W, ts_hum).
artefact(W) :- tagsem(W, ts_art).
artefact(W) :- instrument(W).
instrument(W) :- tagsem(W, ts_ins).
instrument(W) :- container(W).
container(W) :- tagsem(W, ts_cnt).
substance(W) :- tagsem(W, ts_sub).
substance(W) :- chemical_compound(W).
substance(W) :- stuff(W).
chemical_compound(W) :- tagsem(W, ts_chm).
stuff(W) :- tagsem(W, ts_stu).
% verb %%%%%%%%%%%%%%%%%
verb(W) :- infinitive(W).
verb(W) :- participle(W).
verb(W) :- conjugated(W).
verb(W) :- action_verb(W).
verb(W) :- state_verb(W).
verb(W) :- modal_verb(W).
verb(W) :- temporality_verb(W).
verb(W) :- possession_verb(W).
verb(W) :- auxiliary(W).
infinitive(W) :- tagcat(W, tc_verb_inf).
participle(W) :- present_participle(W).
participle(W) :- past_participle(W).
present_participle(W) :- tagcat(W, tc_verb_prp).
past_participle(W) :- tagcat(W, tc_verb_pap).
conjugated(W) :- conjugated_plural(W).
conjugated(W) :- conjugated_singular(W).
conjugated_plural(W) :- tagcat(W, tc_verb_pl).
conjugated_singular(W) :- tagcat(W, tc_verb_sg).
action_verb(W) :- cognitive_action_verb(W).
action_verb(W) :- physical_action_verb(W).
cognitive_action_verb(W) :- tagsem(W, ts_acc).
physical_action_verb(W) :- tagsem(W, ts_acp).
state_verb(W) :- tagsem(W, ts_eta).
modal_verb(W) :- tagsem(W, ts_mod).
temporality_verb(W) :- tagsem(W, ts_tem).
possession_verb(W) :- tagsem(W, ts_posv).
auxiliary(W) :- tagsem(W, ts_aux).
% preposition %%%%%%%%
preposition(W) :- tagcat(W, tc_prep).
preposition(W) :- spat_preposition(W).
preposition(W) :- goal_preposition(W).
preposition(W) :- temp_preposition(W).
preposition(W) :- manner_preposition(W).
preposition(W) :- rel_preposition(W).
preposition(W) :- caus_preposition(W).
preposition(W) :- neg_preposition(W).
preposition(W) :- en_preposition(W).
preposition(W) :- sous_preposition(W).
preposition(W) :- a_preposition(W).
preposition(W) :- de_preposition(W).
spat_preposition(W) :- tagsem(W, ts_rspat).
goal_preposition(W) :- tagsem(W, ts_rpour).
temp_preposition(W) :- tagsem(W, ts_rtemp).
manner_preposition(W) :- tagsem(W, ts_rman).
rel_preposition(W) :- tagsem(W, ts_rrel).
caus_preposition(W) :- tagsem(W, ts_rcaus).
neg_preposition(W) :- tagsem(W, ts_rneg).
en_preposition(W) :- tagsem(W, ts_ren).
sous_preposition(W) :- tagsem(W, ts_rsous).
a_preposition(W) :- tagsem(W, ts_ra).
de_preposition(W) :- tagsem(W, ts_rde).
% adjective %%%%%%%%%%%
adjective(W) :- singular_adjective(W).
adjective(W) :- plural_adjective(W).
adjective(W) :- verbal_adjective(W).
adjective(W) :- comparison_adjective(W).
adjective(W) :- concrete_prop_adjective(W).
adjective(W) :- abstract_prop_adjective(W).
adjective(W) :- nominal_adjective(W).
singular_adjective(W) :- tagcat(W, tc_adj_sg).
plural_adjective(W) :- tagcat(W, tc_adj_pl).
verbal_adjective(W) :- tagcat(W, tc_verb_adj).
comparison_adjective(W) :- tagsem(W, ts_acomp).
concrete_prop_adjective(W) :- tagsem(W, ts_apty).
abstract_prop_adjective(W) :- tagsem(W, ts_apa).
nominal_adjective(W) :- tagsem(W, ts_anom).
% pronoun %%%%%%%%%%%%%
pronoun(W) :- rel_pronoun(W).
pronoun(W) :- non_rel_pronoun(W).
pronoun(W) :- tagsem(W, ts_pron).
rel_pronoun(W) :- tagcat(W, tc_pron_rel).
non_rel_pronoun(W) :- tagcat(W, tc_pron).
% others %%%%%%%%%%%%%
proper_noun(W) :- tagsem(W, ts_nompropre).
proper_noun(W) :- tagsem(W, ts_numero).
coordinating_conjunction(W) :- tagsem(W, ts_rconj).
subordinating_conjunction(W) :- tagsem(W, ts_subconj).
bracket(W) :- tagsem(W, ts_paro).
bracket(W) :- tagsem(W, ts_parf).
punctuation(W) :- comma(W).
punctuation(W) :- colon(W).
punctuation(W) :- dot(W).
punctuation(W) :- tagcat(W, tc_wpunct).
comma(W) :- tagsem(W, ts_virg).
colon(W) :- tagsem(W, ts_ponct).
dot(W) :- tagsem(W, ts_punct).
void(W) :- tagcat(W, tc_vide).
figures(W) :- tagsem(W, ts_quant).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% order
%
precedes(V,N) :- distances(N,V,X,_), 0 < X.
precedes(N,V) :- distances(N,V,X,_), 0 > X.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% distances in verbs
%
near_verb(N,V) :- distances(N,V,_,1).
near_verb(N,V) :- distances(N,V,_,-1).
far_verb(N,V) :- distances(N,V,_,X), -1 > X, -3 < X.
far_verb(N,V) :- distances(N,V,_,X), 1 < X, X < 3.
very_far_verb(N,V) :- distances(N,V,_,X), -2 > X.
very_far_verb(N,V) :- distances(N,V,_,X), X > 2.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% distances in words
%
contiguous(N,V) :- distances(N,V,1,_).
contiguous(N,V) :- distances(N,V,-1,_).
near_word(N,V) :- distances(N,V,X,_), -1 > X, -4 < X.
near_word(N,V) :- distances(N,V,X,_), 1 < X, X < 4.
far_word(N,V) :- distances(N,V,X,_), -3 > X, -8 < X.
far_word(N,V) :- distances(N,V,X,_), X > 3, X < 8.
very_far_word(N,V) :- distances(N,V,X,_), -7 > X.
very_far_word(N,V) :- distances(N,V,X,_), X > 7.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% other predicates
suc(X,Y) :- pred(Y,X).
tagcat(Word, POStag) :- tags(Word, POStag, _).
tagsem(Word, Semtag) :- tags(Word, _, Semtag).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% information about examples
tags(m15278_1_deb, tc_vide, ts_vide).
tags(m15278_1, tc_verb_inf, ts_tem).
pred(m15278_1, m15278_1_deb).
...
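The noun hierarchy above amounts to a transitive is-a closure over semantic tags: a word tagged ts_ins is an instrument, hence an artefact, an object, an entity and finally a common_noun. For readers without a Prolog system at hand, here is a Python sketch of that lookup, reproducing only a small fragment of the hierarchy:

```python
# Fragment of the is-a links of the listing above: child class -> parent class.
PARENT = {
    'container': 'instrument',
    'instrument': 'artefact',
    'artefact': 'object',
    'substance': 'object',
    'object': 'entity',
    'entity': 'common_noun',
}

# Fragment of the leaf rules mapping semantic tags to their class.
TAG_CLASS = {'ts_ins': 'instrument', 'ts_cnt': 'container', 'ts_sub': 'substance'}

def classes_of(semtag):
    """All predicates (from most to least specific) holding for a word
    carrying `semtag`, obtained by walking up the is-a chain."""
    cls = TAG_CLASS.get(semtag)
    out = []
    while cls is not None:
        out.append(cls)
        cls = PARENT.get(cls)
    return out
```

This mirrors how the Prolog clauses let a single tagsem/2 fact license every more general unary predicate along the chain.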
Appendix B. Hypothesis Search Space
A clause space ordered by θOI-subsumption (see Definition 2, page 506) is in general not a lattice,
whereas this is the case under θ-subsumption (Semeraro et al., 1994). However, we show in the
first section of this appendix that such a clause space can be a lattice when particular assumptions
are made concerning the clauses that it contains. A similar proof for our application framework, that
is, for the hypothesis search space presented in Section 4.2 with the θNV-subsumption quasi-ordering,
is given in the second section.
B.1 Hypothesis Lattice under θOI-subsumption
A quasi-ordered set under θOI-subsumption is in general not a lattice since the infimum and supremum are generally not unique in these sets. However, let us consider determinate linked clauses
(see Section 4.2) and a space bounded below by a bottom clause (⊥). These conditions ensure
that the infimum and supremum of any two clauses in our hypothesis space are unique. In this first section, C ⪰ D (respectively C ∼ D) means that C is more general than (respectively equivalent to) D with respect to the
θOI-subsumption order.
Proposition 4 For any C and D in the space of linked determinate clauses ordered by θOI-subsumption,
if C ⪰ D then the injective substitution θ such that Cθ ⊆ D is unique.
Proof Reductio ad absurdum. Suppose there exist two different injective substitutions
θ1 and θ2 such that Cθ1 ⊆ D and Cθ2 ⊆ D. Since θ1 and θ2 are injective, Cθ1 and Cθ2 only differ
in variable naming. As C and D are linked clauses, there exists a literal l ∈ C such that
lθ1 ∈ D, lθ2 ∈ D and lθ1 ≠ lθ2, where the input variables of l are identical in lθ1 and lθ2 and the output
variables are different. This contradicts the fact that all literals are determinate.
Proposition 5 In the space of linked determinate clauses ordered by θOI-subsumption and bounded
below by a bottom clause ⊥, the supremum of any two clauses is unique.
Proof Reductio ad absurdum. Let A1 and A2 be two different suprema of C1 and C2. A1, A2, C1 and C2 are more general than ⊥, so there exists a unique θ⊥A1 such that A1θ⊥A1 ⊆ ⊥
(Proposition 4). In the same way, we have unique θ⊥A2, θ⊥C1 and θ⊥C2 such that A2θ⊥A2 ⊆ ⊥, C1θ⊥C1 ⊆ ⊥
and C2θ⊥C2 ⊆ ⊥.
A1 is a supremum of C1, so A1 ⪰ C1θ⊥C1 since C1 ∼ C1θ⊥C1. Thus, there exists θ1 such that A1θ1 ⊆
C1θ⊥C1. Now, C1θ⊥C1 ⊆ ⊥, therefore A1θ1 ⊆ ⊥, which means that θ1 = θ⊥A1 (Proposition 4). Therefore,
we have A1θ⊥A1 ⊆ C1θ⊥C1 and, in a similar way, A1θ⊥A1 ⊆ C2θ⊥C2, A2θ⊥A2 ⊆ C1θ⊥C1 and A2θ⊥A2 ⊆ C2θ⊥C2.
Let S = A1θ⊥A1 ∪ A2θ⊥A2. Then S ⊆ C1θ⊥C1 and S ⊆ C2θ⊥C2, which means that S ⪰ C1 and
S ⪰ C2 since C1θ⊥C1 ∼ C1 and C2θ⊥C2 ∼ C2. Besides, A1 ⪰ S, A2 ⪰ S, and S ≁ A1, S ≁ A2 because
A1 ≁ A2. This contradicts the fact that A1 and A2 are suprema of C1 and C2.
Proposition 6 In the space of linked determinate clauses ordered by θOI-subsumption and bounded
below by a bottom clause ⊥, the infimum of any two clauses is unique.
Proof The same as for the supremum, with C1 and C2 two infima of A1 and A2; consider
I = C1θ⊥C1 ∩ C2θ⊥C2.
From Propositions 5 and 6, we can conclude that the space of linked determinate clauses ordered
by θOI-subsumption and bounded below by a bottom clause ⊥ is a lattice.
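The key property exploited in these proofs, θOI-subsumption (substitutions must be injective, so distinct variables denote distinct terms), can be checked by brute force on small clauses. The sketch below uses a toy clause representation of ours (literals as tuples, variables as capitalized strings), not the representation of the learning system:

```python
from itertools import permutations

def is_var(t):
    """Toy convention: a term is a variable iff it starts with an uppercase letter."""
    return isinstance(t, str) and t[:1].isupper()

def oi_subsumes(c, d):
    """True if clause c (a set of literal tuples) theta_OI-subsumes d:
    some substitution, injective on c's variables, maps c into a subset of d."""
    vars_c = sorted({t for lit in c for t in lit[1:] if is_var(t)})
    terms_d = sorted({t for lit in d for t in lit[1:]})
    if len(vars_c) > len(terms_d):
        return False  # injectivity needs enough distinct target terms
    # permutations enumerates exactly the injective variable-to-term mappings
    for image in permutations(terms_d, len(vars_c)):
        theta = dict(zip(vars_c, image))
        mapped = {(lit[0],) + tuple(theta.get(t, t) for t in lit[1:]) for lit in c}
        if mapped <= set(d):
            return True
    return False
```

Under plain θ-subsumption, {p(X,Y)} would subsume {p(a,a)} via X = Y = a; under object identity it does not, which is the restriction that makes suprema and infima unique in the spaces considered above.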
B.2 Hypothesis Lattice under θNV-subsumption
As for θOI-subsumption, we show that, in our application framework, the hypothesis search space
ordered by θNV-subsumption is a lattice. In the remainder of this appendix, B represents the
background knowledge used for our learning task, and ⪰ and ∼ denote the θNV-subsumption order
and equivalence as defined in Section 4.2.
Proposition 7 In the space of well-formed clauses ordered by θNV-subsumption, for any clauses
C and D, if C ⪰ D then the injective substitution θ such that f(C)θ ⊆ D (with f such that ∀l ∈
C, B, f(l) |= l) is unique.
Proof The same proof as for Proposition 4, considering Cchain (the subset of C containing the head literal
and all the pred/2 and suc/2 literals of C) and noting that Cchain contains all the variables of C
and that, with respect to our particular background knowledge, for any f such that f(C)θ ⊆ D with
∀l ∈ C, B, f(l) |= l, necessarily ∀l ∈ Cchain, f(l) = l.
Proposition 8 In the space of well-formed clauses ordered by θNV-subsumption, the supremum of
any two clauses is unique.
Proof Reductio ad absurdum. Let A1 and A2 be two different suprema of C1 and C2.
A1 is more general than ⊥, so there exist an injective θ⊥A1 and f⊥A1 such that f⊥A1(A1)θ⊥A1 ⊆ ⊥, and θ⊥A1 is unique
(Proposition 7). In the same way, we have unique θ⊥A2, θ⊥C1 and θ⊥C2.
A1 is a supremum of C1, so there exist θ1 and f1 such that f1(A1)θ1 ⊆ C1θ⊥C1. Now, with A1chain
defined as above, A1chainθ1 ⊆ C1chainθ⊥C1 since f(A1chain) = A1chain. In the same way, C1chainθ⊥C1 ⊆ ⊥.
Therefore, we have A1chainθ1 ⊆ C1chainθ⊥C1 ⊆ ⊥ and then, from Proposition 7, θ1 = θ⊥A1. Finally, we
have f1(A1)θ⊥A1 ⊆ C1θ⊥C1 and, in a similar way, there exist f2, f3 and f4 such that f2(A1)θ⊥A1 ⊆ C2θ⊥C2,
f3(A2)θ⊥A2 ⊆ C1θ⊥C1 and f4(A2)θ⊥A2 ⊆ C2θ⊥C2.
Let S = (A1θ⊥A1 ∪ A2θ⊥A2) \ {l1 | l1, l2 ∈ (A1θ⊥A1 ∪ A2θ⊥A2), l1 ≠ l2, and B, l2 |= l1}. S is
a well-formed hypothesis and, by construction, A1θ⊥A1 ⪰ S and A2θ⊥A2 ⪰ S; since A1θ⊥A1 ∼ A1 and
A2θ⊥A2 ∼ A2, then A1 ⪰ S and A2 ⪰ S. We define f5 such that ∀lSi ∈ S, f5(lSi) = f1(lSi) if lSi ∈ A1θ⊥A1
and f5(lSi) = f3(lSi) otherwise. Similarly, we define f6 such that ∀lSi ∈ S, f6(lSi) = f2(lSi) if lSi ∈ A1θ⊥A1
and f6(lSi) = f4(lSi) otherwise. Thus, f5(S) ⊆ C1θ⊥C1 and f6(S) ⊆ C2θ⊥C2, which means that S ⪰ C1θ⊥C1
and S ⪰ C2θ⊥C2. Therefore, S ⪰ C1 and S ⪰ C2. This contradicts the fact that A1 and A2 are suprema
of C1 and C2.
Proposition 9 In the space of well-formed clauses ordered by θNV-subsumption and with respect
to our background knowledge, the infimum of any two clauses is unique.
Proof The same as for the supremum, with C1 and C2 two infima of A1 and A2; consider
I = (C1θ⊥C1 ∩ C2θ⊥C2) ∪ {l1 | l1, l2 ∈ C1θ⊥C1 ∪ C2θ⊥C2, l1 ≠ l2 and B, l2 |= l1}.
From Propositions 8 and 9, we can conclude that our hypothesis space ordered by θNV-subsumption
is a lattice.
References
Rajeev Agarwal. Semantic Feature Extraction from Technical Texts with Limited Human Intervention. PhD thesis, Mississippi State University, USA, 1995.
Susan Armstrong. MULTEXT: Multilingual text tools and corpora. In H. Feldweg and W. Hinrichs,
editors, Lexikon und Text, pages 107–119. Max Niemeyer Verlag, Tübingen, Germany, 1996.
Susan Armstrong, Pierrette Bouillon, and Gilbert Robert. Tagger overview. Technical report, ISSCO, University of Geneva, Switzerland, 1995. URL http://issco-www.unige.ch/staff/robert/tatoo/tagger.html.
Liviu Badea and Monica Stanciu. Refinement operators can be (weakly) perfect. In Sašo Džeroski
and Peter Flach, editors, Proceedings of the 9th International Conference on Inductive Logic Programming, ILP-99, volume 1634 of LNAI, pages 21–32, Bled, Slovenia, 1999. Springer-Verlag.
Jacques Bouaud, Benoît Habert, Adeline Nazarenko, and Pierre Zweigenbaum. Regroupements issus de dépendances syntaxiques en corpus: Catégorisation et confrontation avec deux
modélisations conceptuelles. In Manuel Zacklad, editor, Proceedings of Ingénierie des Connaissances, pages 207–223, Roscoff, France, 1997. AFIA - Éditions INRIA Rennes.
Pierrette Bouillon, Robert H. Baud, Gilbert Robert, and Patrick Ruch. Indexing by statistical tagging. In Martin Rajman and Jean-Cédric Chappelier, editors, Proceedings of Journées d’Analyse
statistique des Données Textuelles, JADT’2000, pages 35–42, Lausanne, Switzerland, 2000a.
Pierrette Bouillon and Federica Busa. Generativity in the Lexicon. Cambridge University Press,
Cambridge, UK, 2001.
Pierrette Bouillon, Vincent Claveau, Cécile Fabre, and Pascale Sébillot. Using part-of-speech and
semantic tagging for the corpus-based learning of qualia structure elements. In Pierrette Bouillon and Kyoko Kanzaki, editors, Proceedings of First International Workshop on Generative
Approaches to the Lexicon, GL’2001, Geneva, Switzerland, 2001. Geneva University Press.
Pierrette Bouillon, Cécile Fabre, Pascale Sébillot, and Laurence Jacqmin. Apprentissage de
ressources lexicales pour l’extension de requêtes. Traitement Automatique des Langues, special issue: Traitement automatique des langues pour la recherche d’information, 41(2):367–393,
2000b.
Pierrette Bouillon, Sabine Lehmann, Sandra Manzi, and Dominique Petitpierre. Développement de
lexiques à grande échelle. In André Clas, Salah Mejri, and Taïeb Baccouche, editors, Proceedings
of Colloque de Tunis 1997 “La mémoire des mots”, pages 71–80, Tunis, Tunisia, 1998. Serviced.
Ted Briscoe and John Carroll. Automatic extraction of subcategorisation from corpora. In Paul
Jacobs, editor, Proceedings of 5th ACL conference on Applied Natural Language Processing,
pages 356–363, Washington, USA, 1997. Morgan Kaufmann.
Wray Lindsay Buntine. Generalized subsumption and its application to induction and redundancy.
Artificial Intelligence, 36(2):149–176, 1988.
Floriana Esposito, Angela Laterza, Donato Malerba, and Giovanni Semeraro. Refinement of Datalog programs. In B. Pfahringer and J. Fürnkranz, editors, Proceedings of the MLnet Familiarization Workshop on Data Mining with Inductive Logic Programming, pages 73–94, Bari, Italy,
1996.
Cécile Fabre and Pascale Sébillot. Semantic interpretation of binominal sequences and information
retrieval. In Proceedings of International ICSC Congress on Computational Intelligence: Methods and Applications,CIMA’99, Symposium on Advances in Intelligent Data Analysis AIDA’99,
Rochester, N.Y., USA, 1999.
Cécile Fabre. Interprétation automatique des séquences binominales en anglais et en français.
Application à la recherche d’informations. PhD thesis, University of Rennes 1, France, 1996.
David Faure and Claire Nédellec. Knowledge acquisition of predicate argument structures from
technical texts using machine learning: The system ASIUM. In Dieter Fensel and Rudi Studer,
editors, Proceedings of 11th European Workshop EKAW’99, pages 329–334, Dagstuhl, Germany,
1999. Springer-Verlag.
Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database. MIT Press, Cambridge,
MA, USA, 1998.
Edith Galy. Repérer en corpus les associations sémantiques privilégiées entre le nom et le verbe:
Le cas de la fonction dénotée par le nom. Master’s thesis, Université de Toulouse - Le Mirail,
France, 2000.
Gregory Grefenstette. Corpus-derived first, second and third-order word affinities. In W. Martin,
W. Meijs, M. Moerland, E. ten Pas, P. van Sterkenburg, and P. Vossen, editors, Proceedings of
EURALEX’94, Amsterdam, The Netherlands, 1994a.
Gregory Grefenstette. Explorations in Automatic Thesaurus Discovery. Kluwer Academic Publishers, Dordrecht, 1994b.
Gregory Grefenstette. SQLET: Short query linguistic expansion techniques, palliating one-word
queries by providing intermediate structure to text. In Luc Devroye and Claude Chrisment, editors, Proceedings of Recherche d’Informations Assistée par Ordinateur, RIAO’97, pages 500–
509, Montréal, Québec, Canada, 1997.
Benoît Habert, Adeline Nazarenko, and André Salem. Les linguistiques de corpus. Armand
Collin/Masson, Paris, 1997.
Zelig Harris, Michael Gottfried, Thomas Ryckman, Paul Mattick (Jr), Anne Daladier, Tzvee N.
Harris, and Suzanna Harris. The Form of Information in Science, Analysis of Immunology Sublanguage. Kluwer Academic Publisher, Dordrecht, 1989.
Marti A. Hearst. Automatic acquisition of hyponyms from large text corpora. In Christian Boitet,
editor, Proceedings of 14th International Conference on Computational Linguistics, COLING-92,
pages 539–545, Nantes, France, 1992.
Nicolas Helft. Inductive Generalization: A logical framework. In Ivan Bratko and Nada Lavrac,
editors, Proceedings of the 2nd European Working Session on Learning, EWSL, pages 149–157,
Bled, Yugoslavia, 1987. Sigma Press.
Nancy Ide and Jean Véronis. MULTEXT (multilingual tools and corpora). In Proceedings of
15th International Conference on Computational Linguistics, COLING-94, pages 90–96, Kyoto,
Japan, 1994. Morgan Kaufmann.
Bernard Jones. Can punctuation help parsing? Technical Report 29, Centre for Cognitive Science,
University of Edinburgh, UK, 1994.
Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection.
In Chris S. Mellish, editor, Proceedings of the 14th International Joint Conference on Artificial
Intelligence, IJCAI 95, pages 1137–1145, Montréal, Québec, Canada, 1995. Morgan Kaufmann.
Emmanuel Morin. Extraction de liens sémantiques entre termes à partir de corpus de textes techniques. PhD thesis, Université de Nantes, France, 1999.
Stephen Muggleton. Inverse entailment and Progol. New Generation Computing, 13(3-4):245–286,
1995.
Stephen Muggleton and Luc De Raedt. Inductive logic programming: Theory and methods. Journal
of Logic Programming, 19-20:629–679, 1994.
Stephen Muggleton and Cao Feng. Efficient induction of logic programs. In Setsuo Arikawa,
S. Goto, Setsuo Ohsuga, and Takashi Yokomori, editors, Proceedings of the 1st Conference on
Algorithmic Learning Theory, pages 368–381, Tokyo, Japan, 1990. Springer-Verlag - Ohmsha.
Claire Nédellec, Céline Rouveirol, Hilde Adé, Francesco Bergadano, and Birgit Tausend. Declarative bias in inductive logic programming. In Luc De Raedt, editor, Advances in Inductive Logic
Programming, pages 82–103. IOS Press, Amsterdam, 1996.
Shan-Hwei Nienhuys-Cheng and Ronald de Wolf. Least generalizations and greatest specializations
of sets of clauses. Journal of Artificial Intelligence Research, 4:341–363, 1996.
Dominique Petitpierre and Graham Russell. MMORPH - the multext morphology program. Technical report, ISSCO, University of Geneva, Switzerland, 1998.
Ronan Pichon and Pascale Sébillot. Acquisition automatique d’informations lexicales à partir de
corpus: Un bilan. Research report 3321, INRIA, Rennes, France, 1997.
Ronan Pichon and Pascale Sébillot. From corpus to lexicon: From contexts to semantic features. In
Barbara Lewandowska-Tomaszczyk and Patrick James Melia, editors, Proceedings of Practical
Applications in Language Corpora, PALC’99, volume 1 of Lodz studies in Language, pages 375–
389. Peter Lang, 2000.
Gordon D. Plotkin. A note on inductive generalization. In B. Meltzer and D. Michie, editors,
Machine Intelligence 5, pages 153–163, Edinburgh, 1970. Edinburgh University Press.
James Pustejovsky. The Generative Lexicon. MIT Press, Cambridge, MA, USA, 1995.
James Pustejovsky, Peter Anick, and Sabine Bergler. Lexical semantic techniques for corpus analysis. Computational Linguistics, 19(2):331–358, 1993.
John Ross Quinlan. Learning logical definitions from relations. Machine Learning, 5(3):239–266,
1990.
Anke Rieger. Optimizing chain Datalog programs and their inference procedures. LS-8 Report 20,
University of Dortmund, Lehrstuhl Informatik VIII, Dortmund, Germany, 1996.
Gerard Salton. Automatic Text Processing. Addison-Wesley, 1989.
Pascale Sébillot, Pierrette Bouillon, and Cécile Fabre. Inductive logic programming for corpus-based acquisition of semantic lexicons. In Claire Cardie, Walter Daelemans, Claire Nédellec, and
Erik Tjong Kim Sang, editors, Proceedings of the Fourth Conference on Computational Natural
Language Learning (CoNLL-2000) and of the Second Learning Language in Logic Workshop
(LLL-2000), pages 199–208, Lisbon, Portugal, September 2000.
Giovanni Semeraro, Floriana Esposito, Donato Malerba, Clifford Brunk, and Michael J. Pazzani.
Avoiding non-termination when learning logic programs: A case study with FOIL and FOCL. In
L. Fribourg and F. Turini, editors, Proceedings of Logic Program Synthesis and Transformation
- Meta-Programming in Logic, LOPSTR 1994, volume 883 of LNCS, pages 183–198. Springer-Verlag, 1994.
Alan F. Smeaton. Using NLP or NLP resources for information retrieval tasks. In Tomek Strzalkowski, editor, Natural Language Information Retrieval, pages 99–111. Kluwer Academic Publishers, Dordrecht, 1999.
Karen Spärck Jones. What is the role of NLP in text retrieval? In Tomek Strzalkowski, editor,
Natural Language Information Retrieval, pages 1–24. Kluwer Academic Publishers, Dordrecht,
1999.
Tomek Strzalkowski. Natural language information retrieval. Information Processing and Management, 31(3):397–417, 1995.
Fabien Torre and Céline Rouveirol. Natural ideal operators in inductive logic programming. In
M. van Someren and G. Widmer, editors, Proceedings of 9th European Conference on Machine
Learning (ECML’97), volume 1224 of LNAI, pages 274–289, Prague, Czech Republic, April
1997a. Springer-Verlag.
Fabien Torre and Céline Rouveirol. Opérateurs naturels en programmation logique inductive. In
Henri Soldano, editor, 12èmes Journées Françaises d’Apprentissage (JFA’97), pages 53–64,
Roscoff, France, 1997b. AFIA - Éditions INRIA Rennes.
Fabien Torre and Céline Rouveirol. Private properties and natural relations in inductive logic programming. Technical Report 1118, Laboratoire de Recherche en Informatique d’Orsay (LRI),
France, July 1997c.
Laurence Vandenbroucke. Indexation automatique par couples nom-verbe pertinents. DES information and documentation report, Faculté de Philosophie et Lettres, Université Libre de Bruxelles,
Belgium, 2000.
Ellen M. Voorhees. Query expansion using lexical-semantic relations. In W. Bruce Croft and C. J.
van Rijsbergen, editors, Proceedings of ACM SIGIR'94, Dublin, Ireland, 1994. ACM - Springer-Verlag.
Stefan Wermter, Ellen Riloff, and Gabriele Scheler, editors. Connectionist, Statistical and Symbolic
Approaches to Learning for Natural Language Processing, volume 1040 of LNCS. Springer-Verlag, 1996.
Yorick Wilks and Mark Stevenson. The grammar of sense: Is word-sense tagging much more than
part-of-speech tagging? Technical report, University of Sheffield, UK, 1996.