Pattern Anal Applic
DOI 10.1007/s10044-015-0516-9
SHORT PAPER
Acquisition of Turkish meronym based on classification
of patterns
Tuǧba Yıldız1 • Banu Diri2 • Savaş Yıldırım1
Received: 26 December 2014 / Accepted: 14 August 2015
Springer-Verlag London 2015
Abstract The identification of semantic relations from raw text is an important problem in Natural Language Processing. This paper provides semi-automatic pattern-based extraction of part–whole relations. We utilized and adapted lexico-syntactic patterns to disclose the meronymy relation from a Turkish corpus. We applied two different approaches to prepare patterns: the first is based on pre-defined patterns taken from the literature; the second automatically produces patterns by means of a bootstrapping method. While the pre-defined patterns are directly applied to the corpus, the bootstrapped patterns must first be discovered from manually prepared unambiguous seeds. Word pairs are then extracted by their occurrence in those patterns. In addition, we used statistical selection on global data obtained from the results of all patterns: a whole-by-part matrix to which several association metrics, such as information gain and T-score, are applied. We examined how all these approaches improve system accuracy, especially within a corpus-based approach using the distributional features of words. Finally, we conducted a variety of experiments with a comparative analysis and showed the advantages and disadvantages of the approaches, with promising results.

Keywords Corpus-based method · Lexico-syntactic pattern · Meronym · Part–whole

Tuǧba Yıldız
tdalyan@bilgi.edu.tr

Banu Diri
banu@ce.yildiz.edu.tr

Savaş Yıldırım
savas.yildirim@bilgi.edu.tr

1 Department of Computer Engineering, Faculty of Engineering, İstanbul Bilgi University, Santral Campus, Eski Silahtaraǧa Elektrik Santralı Kazım Karabekir Cad. No: 2/13, 34060 Eyüp, İstanbul, Turkey

2 Department of Computer Engineering, Faculty of Electric and Electronic, Yıldız Technical University, D Blok Davutpaşa Mah., Davutpaşa Caddesi, 34220 Esenler, İstanbul, Turkey
1 Introduction
Semantic relations hold between words, phrases, sentences, and documents. One of the important semantic relations is meronymy, which represents the relationship between a part and its corresponding whole. The
meronym is also mentioned in the literature under other names such as part–whole, mereological parthood relation, or partonomy [1–3]. The meronymic relationship has been a subject of several disciplines such as cognitive linguistics [1, 4], logic [5], psycholinguistics [6–8], linguistics [9, 10], and so forth. Having many aspects, the meronym relation turns out to be quite difficult and complex, because it is hard to differentiate it from other semantic relations. Moreover, there is no agreement on how to distinguish the various kinds of meronymic relations. For example, the concept of the "part of" relation is used in many studies to denote a whole family of meronymic relations, because "part of" does not always refer to a specific meronymy; it covers a variety of meronym relations under a test frame such as X is a part of Y. Hence, studies in the literature often provide insights about several different types of meronymic relations [2, 6–9, 11].
In this study, we present a model for semi-automatically extracting part–whole relations from Turkish raw text. For this purpose, three different clusters of patterns
were analyzed on a Turkish corpus: General, Dictionary-based, and Bootstrapped patterns. The first cluster is based on the general patterns most widely used in the literature. These patterns were collected from pioneering studies [2, 8, 12] and analyzed for Turkish; 240K cases were obtained from the general patterns. The second cluster is based on dictionary patterns extracted from TDK1 and Wiktionary2; the number of cases is 509K for the dictionary-based patterns. We adapted both types of patterns to extract sentences that include part–whole relations from a Turkish corpus. Patterns that are not suitable and applicable for the Turkish language were eliminated. The most frequent wholes were selected for each lexico-syntactic pattern (LSP), and each whole and its potential parts were ranked according to their frequencies. The third cluster is based on bootstrapping from unambiguous seeds: manually prepared seeds were used to induce and score LSPs. Six reliable patterns were extracted, and some were eliminated according to the experiments. We compared the strength of several association measures with respect to their precision. A variety of statistical methods were applied to the global data obtained from all patterns to improve system performance, especially recall. For the evaluation, we selected the first 10, 20, and 30 candidates ranked by association measures such as Dice, T-score, etc. The proposed parts of a given whole were manually evaluated by looking at their semantic roles.
The rest of this paper is organized as follows: Sect. 2 presents related work in computational linguistics. The methodology is presented in Sect. 3. The statistical measurements used in this study are described in Sect. 4. Challenges are listed in Sect. 5. The evaluation of the study is explained in Sect. 6. Production capacity and success rate are given in Sect. 7.
2 Meronym studies in computational linguistics
In computational linguistics, a comprehensive body of studies has addressed the automatic discovery of semantic relations, driven by needs such as enriching ontologies or building lexical databases. Although manually built lexical resources such as WordNet and FrameNet are very valuable for Natural Language Processing (NLP) problems, they have limited capacity and might not keep up with evolving language use, e.g., social media language or so-called texting language. Recent studies emphasized the importance of automatic construction of such lexical databases, especially for domain-specific text and open-vocabulary systems. For example, the study [13] designed an architecture to capture synonym, is-a, and meronym relations for a gene ontology. Other studies on meronym extraction are reported for the domains of college biology [14], biomedical text [15], and product development and customer services [16].

1 Türk Dil Kurumu (The Turkish Language Association).
2 Vikisözlük: Özgür Sözlük.
Recently, various studies have employed hand-crafted LSPs, a useful technique especially for semantic relation extraction. Although manually crafting patterns is the most preferred method due to its simplicity and success, it can be time consuming. To cope with that drawback, a bootstrapping approach using seeds was proposed to construct patterns [17]. In addition, machine learning techniques using contextual information, as well as hybrid methods, were offered as alternatives for meronym extraction [12, 18–20].
The most precise and well-known method relying on LSPs was applied by [21]: hand-crafted patterns were identified and suggested for hyponym (is-a) relations in raw text. Although the same technique was applied to extract meronym relations in [21], the results were reported to be much less successful.
In [22], a statistical method was proposed to find parts in a very large corpus. Following Hearst's method, five lexical patterns and six seed wholes (book, building, car, hospital, plant, and school) were identified. The part–whole relations extracted by the patterns were ranked according to statistical criteria, with an accuracy of 70 % for the top 20 words and 55 % for the top 50 words.
A semi-automatic method was presented in [23] for learning semantic constraints to detect part–whole relations. The method picked up pairs from WordNet and searched for them in text collections: SemCor and the LA Times collection from TREC-9. Sentences containing the pairs were extracted and manually inspected to obtain a list of LSPs. A training corpus was generated by manually annotating positive and negative examples, and a decision tree algorithm was used as the learning procedure. The model's accuracy was 83 %. An extended version of this study was proposed in [12], obtaining an average precision of 80.95 %.
The study [24] developed a method to discover part–whole relations from vocabularies and text. The method followed two main phases: learning part–whole patterns and learning wholes by applying the patterns. An average precision of 74 % was achieved. In a similar vein, Espresso [17] used patterns to find several semantic relations, including meronymic relations, via a bootstrapping algorithm. The method started by applying seed pairs to automatically detect generic patterns; Espresso then ranked and filtered patterns/instances with a reliability score. System performance for part-of relations on TREC was 80 % precision.
Another approach similar to Espresso was proposed in [16, 25]. A set of seeds was defined for each type of part–whole relation, and Espresso was successfully used to retrieve part–whole relations from a corpus. For English corpora, precision was 80 % for general seeds and 82 % for structural "part of" seeds.
Another attempt at automatic extraction of part–whole relations targeted a Chinese corpus [26]. Sentences containing part–whole relations were manually picked and then annotated to obtain LSPs. The patterns were employed on a training corpus to find pairs of concepts, and a set of heuristic rules was proposed to confirm part–whole relations. The model was evaluated with a precision of 86 %. Another important study for Chinese was done by [27], focusing on Named Entity components and their relations; the overall average precision was 63.92 %.
In Turkish, recent studies to harvest meronym relations were based on dictionary definitions (TDK) and Wiktionary. The studies [28, 29] manually extracted various patterns from dictionary definitions (TDK) using several features (morphological structures, noun clauses, clue words, and the order of the words in the sentence) to develop a semantic network for Turkish. In the first step, they defined phrasal patterns observed in dictionary definitions to represent specific semantic relations; second, the reliable patterns were applied to the dictionary to find the relations. They considered seven different semantic relations in [29]: hyponym, synonym, antonym, amount-of, group-of, member-of, and has-a. Only the last four can be subsumed by meronym relations; their accuracies were 81, 87, 96, and 82 %, respectively.

In another study for Turkish [30], a similar pattern-based approach was applied to TDK and Wiktionary. The authors listed ten different semantic relations, five of which can be counted as meronymic: made-of, part–whole, created-by, location-of, and purpose. Accuracy rates were 48, 55, 36, 34, and 55 %, respectively.
All these Turkish studies are based on dictionary definitions (TDK) and Wiktionary. The first major attempt [31] modeled the semi-automatic extraction of part–whole relations from a Turkish corpus. The model takes a list of manually prepared seeds to induce syntactic patterns and estimates their reliabilities; it then captures the variations of part–whole candidates from the corpus with 67 % precision. Another study based on LSPs [32] extracted meronym relations by exploiting and comparing a list of patterns, examining how these patterns improve system performance for a Turkish corpus.

The main objective of all these studies is to elaborate language resources. Less-studied languages obviously lack such resources; Turkish is among the less-studied languages and highly needs such works, so it is worth elaborating these resources by means of an automatic architecture. Our corpus-driven meronym extraction architecture is considered the first comprehensive study and a major attempt for Turkish meronymy.
3 Methodology
The methodology here is to acquire part–whole pairs from a Turkish corpus of 500M tokens. A morphological parser based on two-level morphology, with an accuracy of 98 %, was used [33]. A web corpus containing four sub-corpora was used as raw text: three of them come from major Turkish news portals, and the fourth is a general sampling of web pages in the Turkish language.
In this study, the meronym relation was considered a noun-to-noun relation rather than one involving other POS tags. We evaluate three different clusters of patterns in different respects: General Patterns (GP), Dictionary-based Patterns (TDK-P), and Bootstrapped Patterns (BP). While general patterns are widely used and well known, especially within a huge corpus, the dictionary-based patterns are suitable and applicable to dictionary-like resources (TDK, Wikipedia, etc.). Although the latter are tailored to dictionaries, we discuss whether they have the productive capacity to disclose semantic relations from a corpus. The last approach bootstraps patterns from a set of part–whole seeds. In addition, we conducted experiments with different statistical measures of association, such as Information Gain (IG), χ², etc., to evaluate their performance. They are compared in terms of precision and recall scores within a variety of experiments.
3.1 General patterns (GP)
The most precise acquisition methodology, applied earlier by [21], relies on LSPs. We start with the same idea of using widely used patterns to acquire part–whole relations. GP are widely used and well-known patterns from several studies [2, 8, 12]. One of these studies [8] used the frames part of, partly, and made of for six different types of meronymic relations. Another study [12] showed that some patterns always refer to a part–whole relation in English text, while most of them are ambiguous. The study [2] developed a formal taxonomy, distinguishing transitive mereological (1) part–whole relations from intransitive meronymic (2) ones. All general patterns used in these three studies are listed in Table 1; in the tables, NPx represents the "part" and NPy the "whole." Various other studies have also used these patterns, and most of them are subsumed by the patterns listed here.
Table 1 Patterns that are used in three different studies

Winston et al. [8]    Girju et al. [12]             Keet and Artale [2]
NPx part of NPy       Parts of NPy include NPx      NPx member of NPy (1)
NPx partly NPy        NPy consist of NPx            NPx constituted of NPy (1)
NPy made of NPx       NPy made of NPx               NPx subquantity of NPy (1)
                      NPx member of NPy             NPx participates in NPy (1)
                      One of NPy constituents NPx   NPx involved in NPy (2)
                                                    NPx located in NPy (2)
                                                    NPx contained in NPy (2)
                                                    NPx structural part of NPy (2)
Table 2 A summary for general patterns

GP                       #ofCases   #ofWhole   The most frequent wholes
NPx part of NPy          19K        2.5K       Life, Culture, Turkey
NPx member of NPy        23K        2K         Commission, Turkey, Group
NPy constituted of NPx   598        293        System, Program, Project
NPy made of NPx          6.3K       1.7K       Questionnaire, Public opinion
NPy consist of NPx       9.2K       2K         Report, Material, Product
NPy has/have NPx         120K       8.2K       Turkey, Person, Job
NPy with NPx             68.8K      8.7K       Person, Government, Turkey
All the patterns were manually adapted into Turkish equivalents, where syntactic and morphological difficulties were handled by suitable LSPs with regular expressions. The patterns are equivalent to the English patterns in terms of translation and meaning. The process was carried out by accessing and utilizing each morpheme to extract the sentences bearing a part–whole relation. As expected, patterns that were not suitable and applicable for the Turkish language were eliminated. The remaining patterns were evaluated in terms of capacity and reliability. A summary of the general patterns is given in Table 2. The Turkish equivalents of these patterns were constructed in regular expression form and are listed in Table 12 in Sect. 9.
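As an illustration of such a regular-expression form, a made-of-style Turkish pattern can be approximated on surface text by the ablative suffix (-dan/-den/-tan/-ten) followed by yapılmış/yapılan. This is our own simplified sketch: the patterns in the study operate on morphological parses rather than raw strings, and the example sentence is ours.

```python
import re

# Surface-level sketch of a made-of LSP: "NPx+Abl yapılmış NPy" (NPy made of NPx).
# A hypothetical approximation; the actual patterns match parsed morphemes.
MADE_OF = re.compile(r"(\w+)(?:dan|den|tan|ten)\s+yapıl(?:mış|an)\s+(\w+)")

def extract_made_of(sentence):
    """Return (part, whole) candidate pairs matched by the made-of pattern."""
    return [(m.group(1), m.group(2)) for m in MADE_OF.finditer(sentence)]

pairs = extract_made_of("tahtadan yapılmış masa")  # "table made of wood"
print(pairs)  # [('tahta', 'masa')] -> part = tahta (wood), whole = masa (table)
```

A real implementation would also have to handle vowel harmony and consonant alternations of the ablative suffix, which is exactly why the study relies on a morphological parser.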
In order to evaluate the approach, we picked the most frequent wholes for each LSP. For each whole, its potential parts are ranked according to their frequencies. To reward distinctiveness, we normalized the frequency by dividing the number of times a part occurs with the given whole by the number of times the part is retrieved by all patterns. We selected the first 30 candidates ranked by this score for evaluation. The proposed parts were manually evaluated by looking at their semantic roles.
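The normalization and ranking step can be sketched as follows; `pair_counts` and `part_totals` are hypothetical inputs standing in for the counts harvested by the patterns, and the toy numbers are ours.

```python
def rank_parts(pair_counts, part_totals, whole, top_n=30):
    """Rank candidate parts of `whole` by normalized frequency:
    count(part with whole) / count(part across all patterns)."""
    scores = {}
    for (p, w), c in pair_counts.items():
        if w == whole and part_totals.get(p, 0) > 0:
            scores[p] = c / part_totals[p]
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy counts: 'motor' occurs almost only with 'araba' (car),
# while 'yan' (side) occurs with many wholes and is down-weighted.
pair_counts = {("motor", "araba"): 50, ("yan", "araba"): 80}
part_totals = {"motor": 60, "yan": 900}
print(rank_parts(pair_counts, part_totals, "araba"))  # ['motor', 'yan']
```

The division by the part's total count is what demotes frequent but uninformative heads, even when their raw co-occurrence with the whole is higher.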
3.2 Dictionary-based patterns (TDK-P)
An efficient and reliable way of applying LSPs is to extract information from Machine Readable Dictionaries (MRDs). The use of language in a dictionary is generally simple and informative, and it typically includes a set of syntactic patterns. Thus, many studies have recently exploited dictionary definitions. For Turkish, the studies [28–30] exploited dictionary definitions (TDK) and Wiktionary, applying structural patterns in regular expressions to harvest semantic relations. We examined all meronym-related patterns from these studies and carried them over to our study. A summary report for the dictionary-based patterns is given in Table 3.
Member-of, made-of, consist-of, and has/have can be confused with their counterparts among the general patterns, but the pattern specifications differ from each other (see Table 13 in Sect. 9). All patterns were applied to the Turkish corpus as with the general patterns, and a similar process was carried out. Even though these patterns are useful especially in a dictionary, they need to be checked for redundant and incorrect results.
3.3 Bootstrapped patterns (BP)
Bootstrapped patterns are obtained quite differently from the others described above. The approach is implemented in two phases: pattern identification and part–whole pair detection. For pattern identification, we begin by manually preparing a set of unambiguous seed pairs that definitely convey a part–whole relation; for instance, the pair (engine, car) would be a member of that set. The seed set is further divided into two subsets: an extraction set and an assessment set. Each pair in the extraction set is used as a query for retrieving sentences containing that pair. Then, we generalize many LSPs by replacing the part
Table 3 A summary for TDK-P

TDK-P                                        #ofC    #ofW   The most frequent wholes
Group-of (whole|group|all|set|flock|union)   22.7K   3.6K   Game, Human, Woman
Member-of (class|member|team)                20K     3.8K   Turkey, Team, Newspaper
Member-of (from the family of NPy)           184     47     Legumes, Rosacea, Citrus fruit
Amount-of (amount|measure|unit)              3.4K    1.4K   Bank, Dollar, Euro
Has/Have (NPy has the suffix of -l(H))       445K    3.7K   Game, Human, Woman
Consist-of                                   12.4K   2.7K   Group, Committee, Team
Made-of                                      4.9K    1.4K   Payment, Interruption, Import

#ofC number of cases, #ofW number of wholes, H high vowel (ı, i, u, ü)
and whole tokens with a wild-card or any meta-character. The second set, the assessment set, is then used to compute the usefulness or reliability score of each captured pattern. Patterns with low reliability were eliminated; the remaining patterns were kept along with their reliability scores. A classic way to estimate the reliability of an extraction pattern is to measure how correctly it identifies the parts of a given whole: the success rate is obtained by dividing the number of correctly extracted pairs by the number of all extracted pairs. The outcome of this phase is a list of reliable LSPs along with their reliability scores.
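The two steps above, generalizing seed-bearing sentences into candidate patterns and scoring them against the assessment set, can be sketched as follows. The function names and the toy sentence are ours; real generalization would operate on morphological parses rather than plain string replacement.

```python
def generalize(sentence, part, whole):
    """Turn a seed-bearing sentence into a candidate LSP by replacing
    the part and whole tokens with wild-cards (a crude string sketch)."""
    return sentence.replace(part, "NPx").replace(whole, "NPy")

def pattern_reliability(extracted_pairs, gold_pairs):
    """Reliability = correctly extracted pairs / all extracted pairs."""
    if not extracted_pairs:
        return 0.0
    correct = sum(1 for pair in extracted_pairs if pair in gold_pairs)
    return correct / len(extracted_pairs)

# Seed (kapı, ev) = (door, house) turns a sentence into a pattern candidate.
print(generalize("evin kapısı kırıldı", "kapı", "ev"))  # 'NPyin NPxsı kırıldı'

# A pattern that returned one correct and one wrong pair scores 0.5.
print(pattern_reliability([("kapı", "ev"), ("renk", "ev")], {("kapı", "ev")}))  # 0.5
```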
The remaining phase is the same as in the previous pattern methods. The instantiated instances (part–whole pairs) are assessed and ranked according to their reliability scores. There are several ways to compute a reliability score for both patterns and instances; we experimented with three different measures of association (Pmi, Dice, T-score) to evaluate their performance as scoring functions. We also utilized idf to cover more specific parts; the motivation for idf is to differentiate distinctive features from common ones. The differences between distinctive and general parts are discussed in Sect. 6.3.
All findings have already been reported in our previous study [31]. Based on the reliability scores, we decided to filter out some generated patterns and finally obtained six significant patterns. The list of the patterns and examples can be found in Table 4.
The quality of each pattern is checked against a given assessment set. Initially, the instance reliability of all pairs in the assessment set is set to 1, and the reliability score of the patterns is computed. P1 was found to be the most reliable pattern in all respects; P1 is based on the genitive case, which many studies have utilized for this problem. We roughly order the patterns as P1, P2, P3, P6, P4, and P5 by their normalized average scores. To calculate the reliability of instances, the following association measures are used: Pmi, Pmi-idf, Dice, Dice-idf, T-score, and T-score-idf. For a particular whole noun, all possible parts instantiated by the patterns are selected as a candidate set. For each association measure, the reliability scores of both patterns and instances were calculated and sorted, and the first K candidate parts were checked against the expected parts. For the evaluation phase, we manually and randomly selected five whole words: book, computer, ship, gun, and building. For a better evaluation, we selected the first 10, 20, and 30 candidates ranked by the association measures mentioned above.
Table 4 Bootstrapped patterns and examples

Patterns                       Examples
P1. NPy+Gen NPx+Pos            Evin kapısı (door of the house)
P2. NPy+Nom NPx+Pos            Ev kapısı (house door)
P3. NPy+Gen (N-ADJ)+ NPx+Pos   Evin arka bahçe kapısı (back garden gate of the house)
P4. NPy of one-of NPxs         Evlerden birinin kapısı (the door of one of the houses)
P5. NPx whose NPy              Kapısı kilitli olan ev (the house whose door is locked)
P6. NPxs with NPy              Bahçeli ve havuzlu ev (the house with garden and pool)
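The three association measures mentioned above can be written down from raw co-occurrence counts roughly as follows. These are the standard formulations; the exact smoothing and normalization used in the study may differ, and the toy counts are ours.

```python
import math

def pmi(c_xy, c_x, c_y, n):
    """Pointwise mutual information of a (part, whole) pair:
    log2 of observed joint probability over independence."""
    return math.log2((c_xy * n) / (c_x * c_y))

def dice(c_xy, c_x, c_y):
    """Dice coefficient: 2 * joint count over the sum of marginals."""
    return 2 * c_xy / (c_x + c_y)

def t_score(c_xy, c_x, c_y, n):
    """T-score: (observed - expected) / sqrt(observed)."""
    return (c_xy - c_x * c_y / n) / math.sqrt(c_xy)

# Toy counts: pair seen 50 times, part 60 times, whole 200 times, 10,000 pairs total.
print(round(dice(50, 60, 200), 3))  # 0.385
```

All three reward pairs that co-occur more often than their marginal frequencies predict; multiplying any of them by idf(part), as in the Pmi-idf, Dice-idf, and T-score-idf variants, further boosts parts that occur with few wholes.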
4 Statistical selection
So far, we have selected the first N most frequent parts for a given whole by running a specific pattern in GP or TDK-P, taking each pattern individually and evaluating the results. In this part, instead, we retrieved all candidate part–whole pairs obtained from all patterns (in GP and TDK-P) and built a big whole-by-part matrix, the global matrix, whose cell (i, j) represents how many times whole i and part j co-occur, no matter which patterns produce them. In order to compare the clusters GP and TDK-P, we also used the two separate bunches as well as the big integrated one. The global whole-by-part matrix gives us the chance to apply statistical metrics such as IG, the χ² test, etc. If a part occurs particularly with a specific whole, this indicates a meaningful link between them; conversely, if a common part appears with many wholes, its global importance is lower, as formulated in idf. By applying formulas such as the χ² value or IG, the raw counts of the global matrix can be converted so that each cell represents a weighted value.
4.1 Baseline algorithm

Each approach must have its own baseline algorithm, because each setting might have particular advantages or disadvantages due to many factors. To designate a baseline algorithm for the bootstrapped patterns, for a given whole, its possible parts are retrieved from a list ranked by the association measure between whole and part as instantiated by a reliable pattern, formulated in Eq. (1):

assoc(whole, part) = |whole, pattern, part| / (|*, pattern, part| × |whole, pattern, *|)   (1)

We intuitively designated a baseline algorithm to compare the results against expectations; a proposed model should outperform the baseline. The baseline function is based on the most reliable and productive pattern, the genitive pattern, whose capacity is about 2M part–whole pairs.

For a given whole, all its candidate parts in the genitive pattern are extracted. Taking the raw co-occurrence frequency between the whole and a part could be misleading, because some nouns are frequently placed in the part/head position, such as side, front, behind, outside. To overcome this problem, the individual distributions of both whole and part must be taken into account, as shown in Eq. (1). The final scores are ranked and the first K parts are selected as the output of the baseline algorithm.

For the evaluation of GP and TDK-P, we applied a different baseline algorithm. The matrix shows how many times a given whole and a given part appear together; with this, we can retrieve the most frequent N parts for a given whole. This score adds up the frequency counts from all patterns and hence gives a baseline with which we can compare the models.

5 Challenges

We have faced many problems so far. Here, we discuss the problems most frequently encountered in this kind of study, along with some solutions.

• Almost all studies suffer from the very basic problem of NLP: ambiguity of sense. For a given whole, proposed parts could be incorrect due to polysemous words. The study [12] showed that some patterns always refer to a part–whole relation in English text, while most of them are ambiguous: the "part of" pattern, the genitive construction, the verb to have, noun compounds, and prepositional constructions are classified as ambiguous meronymic expressions. For the Turkish domain, we could not easily make such a classification, nor find even one unambiguous pattern for extracting part–whole relations. Additional methods are needed to cope with this problem and to find more accurate results.
• Adoption of the general patterns from other studies into the Turkish domain is difficult due to the free word order of the language: noun phrases can easily change their position in a sentence without changing its meaning.

• Determining a window is crucial for the potential parts. Keeping the window size small can lose real parts, whereas a larger window extracts many irrelevant NPs from the wider context and deteriorates system performance. We observed that a window size of 15 allows us to capture the more reliable parts and sentences.

• The patterns can also encode other semantic relations such as hyponymy or relatedness. Although the genitive case is very popular for detecting part–whole relations, its characteristics are ambiguous. The morphological feature of the genitive is a good indicator for disclosing a semantic relation between a head and its modifier, and we found that the genitive has good indicative capacity, although it can encode various semantic interpretations. Taking the example "Ali's team," the first interpretation could be that the team belongs to Ali; the second is that it is Ali's favorite team, the team he supports. The genitive also covers relations such as "Ali's pencil" (Possession), "Ali's father" (Kinship), and "Ali's handsomeness" (Attribute). The same difficulties are valid for other patterns. To overcome the problem, statistical evidence has been utilized.

• Even the best patterns are not safe all the time. The sentence "door is a part of car" strongly represents a part–whole relation, whereas "he is part of the game" conveys only an ambiguous relation. "Part of" has nine different meanings in TDK, which makes the relation that much more difficult to disclose.

• Some patterns tend to disclose particular relations such as Possession, Kinship, Ownership, Attribute, Attachment, and Property, which are considered part–whole relations in this study. Others can retrieve different types of semantic relations such as hyponymy, synonymy, relatedness, etc.

• The model often needs background knowledge, especially for domain-specific problems. For instance, when running models on the football domain, the model needs an ontology covering facts such as "Manchester United is a football team."

• Some expressions can be more informal than the written language or its grammar. Indeed, in any language, different kinds of expression are appropriate in different situations: from formal to informal, from written to spoken, from jargon to slang, all types of expression are part of the corpus. This variety can be another bottleneck for applying regular expressions or patterns.

• Some words are not suitable for meronymy relations. Even in WordNet, many synsets have no meronym relation; e.g., how many parts can the words "result" or "point" have? In particular, abstract words are harder to evaluate than concrete ones, so evaluation must be done depending on word characteristics.

• The rich morphology of Turkish is a barrier, requiring complicated computational syntax and morphology. For instance, an English phrase of more than ten words can be translated into a single Turkish word by means of morphological suffixes.

• Some patterns have very limited capacity. For example, içeren parçaları (parts of NPy include NPx) and kısmen (partly) produce very poor results; both were excluded because of the number of returned cases (the first returns only 2 cases and the latter only 10).

• Some wholes have limited parts, for example, ithalat (import), baklagiller (legume family), ödeme (payment), başvuru (application), dosya (file), etc.
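The window-size constraint discussed above can be sketched as follows: after a pattern anchor is located, only candidates within a fixed number of tokens are kept. This is a simplified token-based illustration of our own; the study itself filters parsed NPs, not raw tokens.

```python
WINDOW = 15  # the window size the study found to capture reliable parts

def candidates_in_window(tokens, anchor_index, window=WINDOW):
    """Return the tokens within `window` positions of the pattern anchor,
    excluding the anchor itself."""
    lo = max(0, anchor_index - window)
    hi = min(len(tokens), anchor_index + window + 1)
    return [t for i, t in enumerate(tokens[lo:hi], start=lo) if i != anchor_index]

tokens = "the big house near the river has a red door".split()
print(candidates_in_window(tokens, tokens.index("has"), window=3))
# ['near', 'the', 'river', 'a', 'red', 'door']
```

Shrinking `window` drops distant but real parts, while enlarging it admits irrelevant NPs, which is exactly the trade-off described above.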
6 Evaluation
Three types of patterns have been taken into consideration. In the evaluation phase, GP and TDK-P were compared to each other because of their similar approach, while the bootstrapped method was analyzed individually. Furthermore, the results pooled from all patterns were evaluated by means of statistical measurements such as the χ² and IG metrics.
6.1 Analysis of general patterns (GP) vs. dictionary-based patterns (TDK-P)
For each category, we selected the top 30 words from the ranked list and presented them in random order to users for evaluation. Each category was judged by three people, each rating every word 0/1 for the part–whole relation. The results in Table 5 show the precision scores of the patterns for the first 10, 20, and 30 selections. They indicate that GP are slightly more successful and robust than TDK-P on average: GP achieve 64.2, 61.8, and 56.6 % precision, while TDK-P achieve 67.8, 48.9, and 40.7 % for the first 10, 20, and 30 part selections, respectively. Moreover, GP are more productive than TDK-P; the results in Table 9 show a production capacity of 12.5 for GP and 11.9 for TDK-P on average. We discuss production capacity in Sect. 7.

At first glance, the most successful results seem to be produced by with (from GP) and by has/have and consist-of (from TDK-P), as shown in Table 11. However, evaluation on precision alone could be misleading in some cases. Although we are not able to measure a recall value, we propose that recall can be evaluated over production
Table 5 The performance of GP and TDK-P for the first N selection

GP              N:10   N:20   N:30
Part-of         52     52     54
Member-of       57.5   53.8   53.3
Constitute-of   50     46.3   21.7
Consist-of      83.3   80     74.1
Made-of         50     52.5   50
Has/have        70     67     62
With            86.7   80.8   81.1
AVG-GP          64.2   61.8   56.6

TDK-P           N:10   N:20   N:30
Group-of        42     44     41.3
Member-of       80     73     62.7
Amount-of       60     52.5   41.5
Family-of       38.2   0.0    0.0
Made-of         77.2   0.0    0.0
Consist-of      97.1   91.4   61.6
Has/have        80     81.7   77.8
AVG-TDK-P       67.8   48.9   40.7
capacity, which is the "number of cases per whole," denoted by #ofCpW (freq[whole] > 1). The most productive pattern is has/have (TDK-P), with a production ratio of 42.6.
This pattern also has a good precision score of 77.8 %. The has/have pattern (GP) has a production ratio of 22.7 and a precision of 62 %, and the member-of pattern (GP) has a production ratio of 21.1 with a precision of 53.3 %. Table 9 suggests that the has/have pattern (TDK-P) gives a promising result in terms of both precision and recall (production capacity), and therefore F-measure. The highest precision of 81.1 % is achieved by the pattern with (GP); however, its capacity of 11.7 is relatively lower than that of the patterns discussed above. The worst patterns are made-of (GP), made-of (TDK-P), constitute-of (GP), and family-of (TDK-P), with production capacities of 6.5, 6.6, 4.1, and 5.6 and precision rates of 28.3, 25.7, 21.7, and 13 %, respectively. They showed very poor performance in many respects.
In this study, part is categorized and evaluated within two
groups; distinctive and general parts. If a part is inheritable
from hypernyms of its whole, it may be defined as ‘‘general’’ or ‘‘inheritable.’’ Otherwise, it is defined as ‘‘specific’’? or ‘‘distinctive’’? part. Here, distinctive part means
that part of a whole is not hierarchically inherited. E.g., a
desk has Has–Part relationship with drawer and segment as
in WordNet. While drawer is distinctive part of desk,
segment is a general part that inherits from its hypernym
artifact. First, the parts seem to be general like point, side,
segment, etc. These can be inherited from upper physical
entity. Second, the parts seem to be distinctive like kitchen
of the house. Thus, to distinguish such parts, we utilized
idf. We observed that the most frequent part instances are,
for example, top, inside, segment, side, front, head, etc., all
of which has resemblance.
We evaluate the distinction problem through bootstrapped patterns due to its production capacity, simplicity,
and quick evaluation. Similar results can be obtained
through other pre-defined patterns as well. Table 7 shows
performance of Pmi, Dice, T-score, and their idf-weighted
counterparts and baseline metrics in terms of distinctiveness. There are two clear observation here: (1) idf-weighted
metrics are better than others as expected. Idf eventually
can discriminate particular parts by definition, because
low-frequent terms have higher idf value. Thus, they can
represent distinctive part. (2) Dice-based formulas outperform other two metrics, Pmi and T-score. Additionally,
Table 7 also indicates that only metric which can surpass
baseline algorithm is Dice and its idf counterpart.
6.2 Analysis of bootstrapped patterns
Table 6 shows an evaluation of patterns based on three
metrics, their idf-weighted counterparts, and baselines. The
most successful formula is Dice-idf and Dice. Dice-idf has
precision value of 72, 67, and 64 % for selection 10, 20,
and 30, respectively. While baseline algorithm achieves the
same performance with Dice metric for selection 10, further selections, Dice outperforms. Pmi is the second most
successful metric that can surpass baseline algorithm after
selection 20 and 30.
Comparing the data in Table 11, we can observe that
bootstrapped patterns are comparable with pre-defined
pattern clusters even though bootstrapping system does not
take any patterns but only a list of correct pairs. This
characteristic gives two important aspects: One is domain
independent property where the system can be applied to
any corpus or domain (domain can be in different language
or particular field like medicine, biology, etc.). Second
advantage of bootstrapping is to run the model for any
arbitrary whole. The wholes are not selected by means of
analyzing the potential output as in pre-defined patterns.
For GP and TDK-P, on the contrary, the wholes must be
selected first and then evaluated depending on capacity of
patterns.
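The idf weighting used for distinctive parts can be sketched as follows. This is a minimal illustration, not the authors' implementation: the exact smoothing and normalization are not given in the paper, so standard textbook definitions of Dice and idf are assumed.

```python
import math
from collections import Counter

def dice(pair_f, whole_f, part_f):
    """Dice association between a whole and a candidate part."""
    return 2.0 * pair_f / (whole_f + part_f)

def idf(part, doc_freq, n_docs):
    """Low-frequency (hence distinctive) parts receive higher idf."""
    return math.log(n_docs / (1 + doc_freq[part]))

def dice_idf(pair_f, whole_f, part_f, part, doc_freq, n_docs):
    # idf weighting demotes general parts such as "top" or "side"
    # that co-occur with almost every whole
    return dice(pair_f, whole_f, part_f) * idf(part, doc_freq, n_docs)
```

Under this weighting, a rare part such as "drawer" outscores an equally associated but ubiquitous part such as "side", which is the intended effect.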
Table 6 The results of three metrics

#ofP   Pmi   Pmi-idf   Dice   Dice-idf   T-score   T-score-idf   Baseline   Avg
10     64    50        68     72         54        52            68         61.1
20     63    50        67     67         49        47            57         57.1
30     56    44        64     64         44        47.3          50.6       52.9

#ofP number of parts

6.4 Statistical measurements

Table 8 shows the performance of a list of statistical metrics on the whole-by-part global data. The resulting table has three bunches: the first gives the results for data obtained from GP, the second concerns TDK-P, and the third is an integrated bunch. Under each bunch, scores from IG, χ², T-score, Dice, and the Frequency (baseline) approach are represented. For the GP bunch, the ranking is T-score > IG > Dice > Freq > χ². For TDK-P, it is T-score > Dice > IG > Freq > χ², which is akin to GP, where IG and Dice are
Table 7 The results for distinctive parts

#ofP   Pmi    Pmi-idf   Dice   Dice-idf   T-score   T-score-idf   Baseline   Avg
10     50     50        58     64         34        44            60         51.4
20     48     50        48     53         34        40            51         46.3
30     40.7   42.7      47.3   49.3       31.3      38.7          40.7       41.5

#ofP number of parts
Table 8 Statistical measurements for GP vs. TDK-P

Patterns   SM            10     20     30
GP         IG            66.7   65     58.9
           χ²            44.4   36.1   35.6
           T-score       74.4   70.6   66.3
           Dice          66.7   59.4   54.8
           Freq          48.9   43.3   41.5
TDK-P      IG            70     62.8   58.1
           χ²            55.6   45     43
           T-score       72.2   68.3   61.5
           Dice          70     65.6   61.1
           Freq          63.3   57.8   55.9
AVG        AVG-IG        68.3   63.9   58.5
           AVG-χ²        50     40.6   39.3
           AVG-T-score   73.3   69.4   63.9
           AVG-Dice      68.3   62.5   58
           AVG-Freq      56.1   50.6   48.7

Freq frequency
swapped. The most important observation is that T-score showed the best performance, whereas χ² does not even outperform the baseline algorithm within any bunch. T-score, IG, and the Dice formula are the most successful metrics. The main advantage of statistical selection is that it integrates all results coming from heterogeneous patterns, where each pattern has a different success rate, production capacity, and tendency toward a meronymy subtype, e.g., attachment or possession. Merging the output of all patterns can increase the recall of the model and cover many more wholes, because each single pattern has its own potential wholes and tendencies, and some cannot take the whole as a parameter. We evaluated the pre-defined patterns on whole terms that had already been produced in advance; therefore, the differences in success ratio between the patterns could be compared from various aspects. Looking at Table 8, the model proposed here gives promising results in terms of precision and recall.
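The statistical selection step can be sketched as follows: merge the (whole, part) matches of all patterns into one global matrix, then score each pair. The T-score variant below is the common corpus-statistics form; the paper does not spell out its exact formula, so treat this as an assumption.

```python
import math
from collections import defaultdict

def t_score(pair_f, whole_f, part_f, n):
    """T-score: observed co-occurrence vs. expectation under independence."""
    expected = whole_f * part_f / n
    return (pair_f - expected) / math.sqrt(pair_f)

def rank_parts(matrix):
    """matrix[whole][part] = frequency merged over all patterns.
    Returns, per whole, part candidates sorted by descending T-score."""
    whole_tot = {w: sum(ps.values()) for w, ps in matrix.items()}
    part_tot = defaultdict(int)
    for ps in matrix.values():
        for p, f in ps.items():
            part_tot[p] += f
    n = sum(whole_tot.values())
    return {
        w: sorted(((p, t_score(f, whole_tot[w], part_tot[p], n))
                   for p, f in ps.items()), key=lambda x: -x[1])
        for w, ps in matrix.items()
    }
```

The same matrix can be fed to IG, χ², or Dice in place of `t_score`, which is how the bunches of Table 8 differ.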
7 Production capacity and recall estimation

Table 9 shows the number of cases, the number of wholes proposed by each pattern, and their success rates. We select only those wholes whose frequency is higher than 1 in order to decrease the error rate coming from false matches. At first glance, the most successful pattern is with (GP) when ranking by precision over the first 30 selections. Production capacity, denoted by #ofCpW > 1, and success ratio can be combined to evaluate the patterns from different aspects; production capacity does not refer to how many cases are matched by the corresponding pattern, but to the number of cases matched per whole on average.
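Reading the definition above, production capacity can be computed roughly as below. This is a sketch; the exact counting the authors use may differ.

```python
from collections import Counter

def production_capacity(cases):
    """cases: (whole, part) matches produced by one pattern.
    #ofCpW>1: cases per whole, restricted to wholes occurring more than once."""
    freq = Counter(w for w, _ in cases)
    wholes_gt1 = [w for w, c in freq.items() if c > 1]
    if not wholes_gt1:
        return 0.0
    return sum(freq[w] for w in wholes_gt1) / len(wholes_gt1)
```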
By multiplying the success rate by the normalized value of #ofCpW > 1 (the number of cases per whole whose frequency is greater than 1), we obtain another ranking factor combining both precision and production capacity. The resulting priority of patterns is has/have (TDK-P), has/have (GP), member-of (GP), with (GP), and part-of (GP). The pattern has/have (TDK-P) has both a good production rate of 42.6 and a precision of 77.8 %, and therefore appears in first place in the new combined ranking. The poorest patterns according to the new ranking factor are family-of and amount-of (TDK-P) and constitute-of (GP).
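The combined ranking factor can be sketched as below. The paper says the capacity value is normalized but not how; max-normalization is assumed here, and with the precision and capacity figures from Table 9 it reproduces the reported priority order.

```python
def combined_rank(patterns):
    """patterns: {name: (precision, capacity)} with capacity = #ofCpW>1.
    Max-normalize capacity (an assumption) and weight precision by it."""
    max_cap = max(cap for _, cap in patterns.values())
    score = {name: prec * cap / max_cap
             for name, (prec, cap) in patterns.items()}
    return sorted(score, key=score.get, reverse=True)

# Precision (S30) and #ofCpW>1 values taken from Table 9
table9 = {
    "has/have (TDK-P)": (77.8, 42.6),
    "has/have (GP)": (62, 22.7),
    "member-of (GP)": (53.3, 21.1),
    "with (GP)": (81.1, 11.7),
    "part-of (GP)": (54, 13.8),
}
```

Note how with (GP), despite the best precision, drops behind the higher-capacity has/have and member-of patterns once capacity is factored in.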
Another evaluation can be done over the correlation between success (precision) and factors such as the number of cases, wholes, cases per whole, and others. Looking at correlation Table 10, the success of a pattern depends most strongly on the number of unique wholes it produces; second come #ofW > 1 and #ofCpW > 1. This finding is worth analyzing deeply. The number of cases matched by a given pattern is of secondary importance; the essential point is the number of unique wholes, and of cases per whole, that a pattern can extract. As shown in Table 9, some patterns, e.g., made-of (GP, TDK-P) and amount-of (TDK-P), have a good capacity for matching but a poor #ofCpW > 1 score. Thus, such scattered patterns do not show significant performance.
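Table 10 reports correlations between pattern attributes and precision; the paper does not name the coefficient, so a plain Pearson correlation is sketched here as one plausible choice.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Applied column-wise to Table 9 (e.g., `pearson(num_wholes, s30)`), this yields values like those in Table 10.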
In brief, the most successful pattern is with (GP), with a precision of 81.1 % as shown in Table 11; the number of relations extracted by that pattern is about 68.8K. For comparison, there is no other corpus-based study for the Turkish language. There are a few dictionary-based studies, as already mentioned in Sect. 2. [30] achieved 50 % precision on average with about 1K relations. [29] applied four different meronym-related patterns and achieved 87 % precision with a production size of about 3K. [28] achieved 79.6 % precision with a total relation size of about 1.7K. As it was
Table 9 Ranked by success rate of each pattern

C       P              #ofC    #ofW    #ofW>1   #ofC>1   #ofCpW>1   S30
GP      With           68.8K   8.7K    5.6K     65.7K    11.7       81.1
TDK-P   Has/have       445K    13.7K   10.3K    442K     42.6       77.8
GP      Consist-of     9.2K    2K      1K       8.2K     8.1        74.1
TDK-P   Member-of      20K     3.9K    2K       18.2K    8.6        62.7
GP      Has/have       12K     8.2K    5.1K     117K     22.7       62
TDK-P   Consist-of     12.4K   2.7K    1.4K     11K      7.8        61.6
GP      Part-of        19.3K   2.4K    1.3K     18.1K    13.8       54
GP      Member-of      23K     2K      1K       22K      21.1       53.3
TDK-P   Amount-of      3.4K    1.4K    5.5K     7.5K     1.4        41.5
TDK-P   Group-of       22.7K   3.6K    2K       21K      10.9       41.3
GP      Made-of        6.3K    1.7K    836      5.4K     6.5        28.3
TDK-P   Made-of        4.9K    1.4K    612      4K       6.6        25.7
GP      Constitute-of  598     293     97       402      4.1        21.7
TDK-P   Family-of      184     47      30       167      5.6        13

C cluster, P pattern, #ofC number of cases, #ofW number of wholes, #ofW>1 number of wholes whose frequency is greater than 1, #ofC>1 number of cases whose wholes are seen more than once, #ofCpW>1 number of cases per whole whose frequency is greater than 1, S30 success rate for 30 candidates
Table 10 Correlation table

Correlation   Success rate
#ofCases      0.49
#ofWhole      0.71
#ofW>1        0.60
#ofC>1        0.48
#ofCpW>1      0.54
noticed before, the main advantage of our approach is its production capacity: the model proposed here can capture over 68K relations from a given corpus of 500M tokens. For a fair comparison, all inputs and conditions would have to be balanced in terms of size and quality. Considering that studies in other languages can utilize WordNet and larger lexical resources, they have a resource advantage; the Turkish language does not have such rich language resources, which makes our study harder and puts it at a disadvantage. Checking the performance scores of studies done in all languages so far, we find that our approach shows sufficient performance in spite of this disadvantage. For a gold-standard comparison, we would need to apply our approach to similar data in a similar environment; a language-independent model might be designed for such a comparison. However, we can briefly look at other studies' success rates to put the work proposed here in context without directly comparing performance results. As one of the first important studies, [22] obtained 70 % precision with a limited example set. [23] conducted experiments on over 100K sentences, used semantic relations from WordNet, and achieved an 83 % success rate. [24] and [17] achieved scores of 74 and 80 %, respectively.
8 Conclusions

We utilized and adopted LSPs to disclose the meronymy relation from a Turkish corpus. Two different approaches were considered to prepare patterns: one is based on pre-defined patterns taken from the literature; the second automatically produces patterns by means of a bootstrapping method. Pre-defined patterns fall into two clusters, General and Dictionary-based patterns; bootstrapped patterns are categorized as a third cluster. We also applied a statistical method to the global data obtained from the results of all patterns in the three clusters.

After morphologically parsing a huge corpus, all patterns were realized as specific regular expressions in accordance with the parsed corpus. Each pattern is designed so
Table 11 Best of patterns GP vs. TDK-P

#ofParts   GP with   TDK consist-of   TDK has/have   GP T-score   TDK T-score   Bootstrap Dice-idf
10         86.7      97.1             80.5           74.4         72.2          72
20         80.8      91.4             81.7           70.6         68.3          67
30         81.1      61.6             77.8           66.3         61.5          64
that we can separately pick up a whole and its potential candidate parts. With a variety of experiments, we addressed some problems, concluded a list of facts, and achieved successful results for the Turkish meronymy problem. Analysis of the General and Dictionary-based patterns shows that an appropriate pattern design is capable of addressing the problem of meronymy.

Several significant findings of the study are reported in the corresponding sections. Some of them can be listed as follows. Even though dictionary-based patterns are, by definition, best suited to a dictionary-like corpus, they have good and comparable potential to extract part–whole pairs from a corpus; General patterns are slightly better. Some particular patterns from both clusters, GP and TDK-P, have good indicative capacity in terms of production and precision, as shown in the paper. The approach utilizing bootstrapping first retrieves reliable patterns, then extracts and proposes part candidates for a given whole. The results of that approach are comparable to those of the pre-defined patterns. It also has a domain-independent characteristic and good production capacity, and can therefore easily be applied to other relation problems.

Instead of applying each pattern one by one, all results from the entire set of patterns are merged as input for statistical methods. The global whole-by-part matrix is measured by means of several statistics such as IG, χ², etc. The results indicate that this approach behaves very similarly to the bootstrapped patterns, with results comparable to the pre-defined list. Moreover, statistical selection and bootstrapping have large scale and good production capacity. The production capacity of a pattern, denoted by #ofCpW > 1, refers to how many cases are matched per whole on average. That capacity and the success ratio can be combined to evaluate the proposed patterns. Even though some patterns seem to have good accuracy, they have low production capacities; thus, the output of such patterns covers a limited number of wholes. We evaluated the success of patterns not only by precision but also by a combined ranking factor taking #ofCpW > 1 and the success rate as parameters. No matter which cluster a pattern belongs to, if it can produce a higher number of unique wholes, it shows better performance. We checked which pattern characteristics correlate highly with success rate (precision). The correlation table indicates that the success of a pattern depends most on the number of unique wholes it produces; the second most important attribute of a pattern is the average number of cases per whole, as indicated in the table.

As a final remark, all experiments show that the proposed methods have good indicative capacity for solving the meronymy problem, because each method outperforms its corresponding baseline algorithm, as shown in the corresponding tables. To the best of our knowledge, there is no comparably comprehensive corpus-based experiment on building a semantic lexicon for the Turkish language.

Appendix

See Tables 12 and 13.
Table 12 General patterns (GP) and their Turkish equivalents

NPx is (a|-) part of NPy
  ...NPx...NPy+gen...(bir)? parçasıdır/kısmıdır
  ...NPy+gen...parça/kısım(ları|sı|ı)...NPx...
  ...NPx...NPy+gen...parça/kısım(larından|sından|ından) biridir
  NPy+gen...parça/kısım(larından|sından|ından) biri olan ...NPx

NPx member of NPy
  ...NPx...NPy+gen...(bir)? üyesidir
  ...NPx...(bir)? NPy+nom...üyesidir
  ...NPx...NPy+gen...üye(lerinden|sinden) biridir
  NPy+gen...üye(lerinden|sinden) biri olan ...NPx

NPy constituted of NPx
  ...NPx...NPy+gen...bileşen(lerinden|inden) biridir
  NPy+gen...bileşen(lerinden|inden) biri olan ...NPx
  ...NPx...NPy+gen...(bir)? bileşenidir
  ...NPy+gen...bileşen(leri|i)...NPx...

NPy made of NPx
  NPy,...NPx+abl yapıl(mıştır|maktadır)
  NPy,...NPx+abl yapılmış olup
  ...NPx+abl yapılan NPy

NPy consist of NPx
  NPy,...NPx...içerir

Has/have
  NPy+gen...NPx+p3sg+nom...(vardır|var)
  NPx+p3sg+nom var olan NPy
  NPy+pnon+loc NPx (var|vardır)

With
  NPx+p3sg+nom olan NPy
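As stated in the conclusions, each pattern was realized as a regular expression over the morphologically parsed corpus. The sketch below shows how the has/have pattern from Table 12 might be matched; the tagged-line format (surface/lemma+tags) is an assumption for illustration, not the actual output format of the morphological parser used.

```python
import re

# Hypothetical token format "surface/lemma+Tag+Tag..." (assumption; the real
# parsed-corpus representation may differ).
SENT = "evin/ev+Noun+Gen kapısı/kapı+Noun+P3sg+Nom vardır/var+Verb"

# Sketch of the GP has/have pattern: NPy+gen ... NPx+p3sg+nom ... (vardır|var)
HAS_HAVE = re.compile(
    r"(\w+)/(\w+)\+Noun\+Gen\s+"        # NPy: whole, genitive-marked noun
    r"(\w+)/(\w+)\+Noun\+P3sg\+Nom\s+"  # NPx: part, possessive-marked noun
    r"(?:vardır|var)/"                  # existential "var(dır)"
)

m = HAS_HAVE.search(SENT)
pair = (m.group(2), m.group(4)) if m else None  # lemmas of (whole, part)
```

For the sentence above ("evin kapısı vardır", roughly "the house has a door"), the match yields the candidate pair ("ev", "kapı").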
Table 13 Dictionary-based patterns (TDK-P) and their Turkish equivalents

NPy,...(whole|group|all|set|flock|union) of NPx
  NPy,...NPx+(gen|nom) bütünü(dür|-)
  NPy,...NPx+(gen|nom) topluluğu(dur|-)
  NPy,...NPx+(gen|nom) tümü(dür|-)
  NPy,...NPx+(gen|nom) birliği(dir|-)
  NPy,...NPx+(gen|nom) kümesi(dir|-)
  NPy,...NPx+(gen|nom) sürüsü(dür|-)

NPy,...(class|member|team) of NPx
  NPy,...NPx+(gen|nom) sınıfı(dır|-)
  NPy,...NPx+(gen|nom) üyesi(dir|-)
  NPy,...NPx+(gen|nom) takımı(dır|-)

NPx,...family of NPy
  NPx, NPy+gillerden
  NPy+gillerden...NPx

NPy,...(amount|measure|unit) of NPx
  NPy,...NPx+(gen|nom) miktarı(dır|-)
  NPy,...NPx+(gen|nom) ölçüsü(dür|-)
  NPy,...NPx+(gen|nom) birimi(dir|-)

NPy consist of NPx
  NPx+abl oluş(an|muş) NPy
  NPy,...NPx+abl oluşmuştur

NPy made of NPx
  NPx+abl...yapıl(an|mış) NPy

Has/have
  NPx+nom-adj-with NPy
References
1. Cruse AD (2003) The lexicon. In: Aronoff M, Ress-Miller J (eds)
The handbook of linguistics. Blackwell Publisher Ltd., Oxford,
pp 238–264
2. Keet CM, Artale A (2008) Representing and reasoning over a
taxonomy of part–whole relations. Appl Ontol 3(1–2):91–110
3. Pribbenow S (2002) Meronymic relationships: from classical
mereology to complex part–whole relations. In: Green R, Bean
CA, Myaeng SH (eds) The semantics of relationships. Springer,
Netherlands, pp 35–50
4. Croft W, Cruse D (2004) Cognitive linguistics. Cambridge
University Press, Cambridge
5. Simons P (1987) Parts: a study in ontology. Oxford University
Press, UK
6. Gerstl P, Pribbenow S (1995) Midwinters, end games, and body
parts: a classification of part–whole relations. Int J Hum–Comput
Stud 43(5–6):865–889
7. Iris MA, Litowitz BE, Evens M (1988) Problems of the part–
whole relation. In: Evens M (ed) Relational models of the lexicon. Cambridge University Press, Cambridge, pp 261–288
8. Winston ME, Chaffin R, Herrmann D (1987) A Taxonomy of
part–whole relations. Cogn Sci 11(4):417–444
9. Miller GA et al (1990) Introduction to WordNet: an on-line
lexical database. Int J Lexicogr 3(4):235–244
10. Murphy ML (2003) Semantic relations and the lexicon: antonymy, synonymy, and other paradigms. Cambridge University
Press, UK
11. Artale A, Franconi E, Guarino N, Pazzi L (1996) Part–whole
relations in object-centered systems: an overview. Data Knowl
Eng 20(3):347–383
12. Girju R, Badulescu A, Moldovan D (2006) Automatic discovery
of part–whole relations. Comput Linguist 32(1):83–135
13. Hamon T, Natalia G (2008) How can the term compositionality
be useful for acquiring elementary semantic relations? In:
Nordström B, Ranta A (eds) Advances in natural language
processing, LNCS 5221. Springer, Berlin, Heidelberg,
pp 181–192
14. Roberts A (2005) Learning meronyms from biomedical text. In:
Proceedings of the ACL student research workshop (ACLstudent
’05). Association for Computational Linguistics, Stroudsburg,
PA, USA, pp 49–54
15. Ling X, Clark P, Weld DS (2013) Extracting meronyms for a
biology knowledge base. In: Proceedings of the 2013 workshop
on automated knowledge base construction (AKBC ’13). ACM,
USA, pp 7–12
16. Ittoo A, Bouma G, Maruster L, Wortmann H (2010) Extracting
meronymy relationships from domain specific, textual corporate
databases. In: Hopfe CJ, Rezgui Y, Metais E, Preece AD, Li H
(eds) Natural language processing and information system, LNCS
6177. Springer, Berlin, pp 48–59
17. Pantel P, Pennacchiotti M (2006) Espresso: leveraging generic
patterns for automatically harvesting semantic relations. In:
Proceedings of the 21st international conference on computational
linguistics and 44th annual meeting of the Association for
Computational Linguistics. Sydney, Australia, pp 113–120
18. Vor der Bruck T, Helbig H (2010) Meronymy extraction using an
automated theorem prover. J Lang Technol Comput Linguist
25(1):57–82
19. Vor der Bruck T, Helbig H (2010) Validating meronymy
hypotheses with support vector machines and graph kernels. In:
Proceedings of the 2010 Ninth International Conference on
Machine Learning and Applications (ICMLA ’10). IEEE Computer Society, Washington, DC, USA, pp 243–250
20. Xia F, Cungen C (2014) Extracting part–whole relations from
online encyclopedia. In: Shi Z, Wu Z, Leake D, Sattler U (eds)
Intelligent information processing VII. IFIP advances in information and communication technology, vol 432. Springer, Berlin,
Heidelberg, pp 57–66
21. Hearst MA (1992) Automatic acquisition of hyponyms from large
text corpora. In: Proceedings of the 14th international conference
on computational linguistics, COLING 1992. Nantes, France,
pp 539–545
22. Berland M, Charniak E (1999) Finding parts in very large corpora. In: Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational
Linguistics, USA, pp 57–64
Pattern Anal Applic
23. Girju R, Badulescu A, Moldovan D (2003) Learning semantic
constraints for the automatic discovery of part–whole relations.
In: Proceedings of the human language technology conference of
the North American Chapter of the Association for Computational Linguistics. Edmonton, Canada, pp 1–8
24. van Hage WR, Kolb H, Schreiber G (2006) A method for learning
part–whole relations. In: Cruz IF, Decker S, Allemang D, Preist
C, Schwabe D, Mika P, Uschold M, Aroyo L (eds) International
semantic web, LNCS 4273. Springer, Berlin, pp 723–735
25. Ittoo A, Bouma G (2010) On learning subtypes of the part–whole
relation: do not mix your seeds. In: Proceedings of the 48th
annual meeting of the Association for Computational Linguistics,
ACL’10. Association for Computational Linguistics, Uppsala,
Sweden, pp 1328–1336
26. Cao X, Cao C, Wang S, Lu H (2008) Extracting part–whole
relations from unstructured Chinese Corpus. In: Proceedings of
the 2008 5th international conference on fuzzy systems and
knowledge discovery, pp 175–179
27. Yao T, Uszkoreit H (2005) Identifying semantic relations
between named entities from Chinese texts. In: Lu R, Siekmann
JH, Ullrich C (eds) Proceedings of the 2005 joint Chinese-German conference on cognitive systems, LNCS 4429. Springer-Verlag, Berlin, Heidelberg, pp 70–83
28. Orhan Z, Pehlivan I, Uslan V, Onder P (2011) Automated
extraction of semantic word relations in Turkish lexicon. Math
Comput Appl 16(1):13–22
29. Serbetçi A, Orhan Z, Pehlivan I (2011) Extraction of semantic
word relations in Turkish from dictionary definitions. In: Proceedings of the ACL 2011 workshop on relational models of
semantics, RELMS 2011. Portland, Oregon, USA, pp 11–18
30. Yazıcı E, Amasyalı MF (2011) Automatic extraction of semantic
relationships using Turkish dictionary definitions. EMO Bilimsel
Dergi, İstanbul
31. Yıldız T, Yıldırım S, Diri B (2013) Extraction of part–whole
relations from Turkish corpora. In: Gelbukh A (ed) Computational linguistics and intelligent text processing, LNCS 7816.
Springer, Berlin, Heidelberg, pp 126–138
32. Yıldız T, Diri B, Yıldırım S (2014) Analysis of lexico-syntactic
patterns for meronym extraction from a Turkish corpus. 6th
Language and technology conference. Human language technologies as a challenge for computer science and linguistics,
LTC, Poland, pp 429–433
33. Sak H, Güngör T, Saraçlar M (2008) Turkish language resources:
morphological parser, morphological disambiguator and web
corpus. In: Nordström B, Ranta A (eds) Advances in natural
language processing, LNCS 5221. Springer-Verlag, Berlin, Heidelberg, pp 417–427