Computer Science and Information Systems 2015 Volume 12, Issue 2, Pages: 635-654
https://doi.org/10.2298/CSIS140905020L
Tuning a semantic relatedness algorithm using a multiscale approach
Leal José Paulo (University of Porto, Faculty of Sciences, CRACS & INESC-Porto LA, Porto, Portugal)
Costa Teresa (University of Porto, Faculty of Sciences, CRACS & INESC-Porto LA, Porto, Portugal)
The research presented in this paper builds on previous work that led to the
definition of a family of semantic relatedness algorithms. These algorithms
depend on a semantic graph and on a set of weights assigned to each type of
arc in the graph. The current objective of this research is to automatically
tune the weights for a given graph in order to improve the quality of the
computed proximity. The quality of a semantic relatedness method is usually measured
against a benchmark data set. The results produced by a method are compared
with those on the benchmark using a nonparametric measure of statistical
dependence, such as Spearman’s rank correlation coefficient. The
presented methodology works the other way round and uses this correlation
coefficient to tune the proximity weights. The tuning process is controlled
by a genetic algorithm using Spearman’s rank correlation coefficient as
its fitness function. This algorithm has its own set of parameters which also
need to be tuned. Bootstrapping is a statistical method for generating
samples that is used in this methodology to enable a large number of
repetitions of a genetic algorithm, exploring the results of alternative
parameter settings. This approach raises several technical challenges due to
its computational complexity. This paper provides details on techniques used
to speed up the process. The proposed approach was validated with WordNet
2.1 and the WordSim-353 data set. Several ranges of parameter values were
tested and the obtained results are better than the state-of-the-art methods
for computing semantic relatedness using WordNet 2.1, with the advantage
of not requiring any domain knowledge of the semantic graph.
Keywords: semantic similarity, linked data, genetic algorithms, bootstrapping, WordNet
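
As an illustration of the idea summarized in the abstract, the following minimal sketch shows how Spearman's rank correlation coefficient could serve as the fitness of a candidate weight vector, and how a bootstrap sample could be drawn. It is not the authors' implementation. The names `fitness`, `relatedness` and `bootstrap_sample` are hypothetical, the benchmark is assumed to be a list of (word, word, human score) triples, and resampling the benchmark pairs is only one possible reading of how bootstrapping is applied.

```python
import random


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    denom = (sxx * syy) ** 0.5
    return cov / denom if denom else 0.0


def fitness(weights, pairs, relatedness):
    """Fitness of a weight vector: correlation with the benchmark scores.

    `relatedness(a, b, weights)` is a hypothetical callable standing in for
    the weighted-graph relatedness algorithm described in the paper.
    """
    predicted = [relatedness(a, b, weights) for a, b, _ in pairs]
    human = [score for _, _, score in pairs]
    return spearman(predicted, human)


def bootstrap_sample(pairs, rng):
    """Draw len(pairs) items with replacement (assumed resampling unit)."""
    return [rng.choice(pairs) for _ in pairs]
```

A genetic algorithm would call `fitness` on every individual of each generation and keep the weight vectors with the highest correlation. Repeating whole runs over samples drawn with `bootstrap_sample(pairs, random.Random(seed))` is one way to compare alternative parameter settings of the genetic algorithm.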