The current state-of-the-art Entity Linking (EL) systems are geared towards corpora as heterogeneous as the Web, and therefore perform sub-optimally on domain-specific corpora. A key open problem is how to construct effective EL systems for specific domains, since knowledge of the local context should in principle increase, rather than decrease, effectiveness. In this paper we propose the hybrid use of simple specialist linkers in combination with an existing generalist system to address this problem. Our main contributions are the following. First, we construct a new reusable benchmark for EL on a corpus of domain-specific conversations. Second, we test the performance of a range of approaches under the same conditions, and show that specialist linkers obtain high precision in isolation, and high recall when combined with generalist linkers. Hence, we can effectively exploit local context and get the best of both worlds.
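A hybrid of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The specialist linker is assumed to be an exact-match lexicon of domain surface forms. The generalist output is assumed to arrive as character-offset annotations. The merge rule is one plausible way to combine the two: keep every specialist link, and add a generalist link only where it does not overlap a specialist link. `Annotation`, `specialist_link`, `overlaps`, and `combine` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the mention begins
    end: int      # character offset just past the mention
    entity: str   # knowledge-base identifier (hypothetical format)


def specialist_link(text: str, lexicon: dict[str, str]) -> list[Annotation]:
    """Hypothetical specialist linker: exact, case-sensitive matches of a small domain lexicon."""
    annotations = []
    for surface, entity in lexicon.items():
        # \b anchors stop a surface form from matching inside a longer word
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            annotations.append(Annotation(m.start(), m.end(), entity))
    return annotations


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if the two character spans share at least one character."""
    return a.start < b.end and b.start < a.end


def combine(specialist: list[Annotation], generalist: list[Annotation]) -> list[Annotation]:
    """Keep every specialist link; add a generalist link only where no specialist link overlaps it."""
    merged = list(specialist)
    for g in generalist:
        if not any(overlaps(g, s) for s in specialist):
            merged.append(g)
    return sorted(merged, key=lambda a: a.start)
```

Precedence goes to the specialist because it is the high-precision component. The generalist fills in mentions the small lexicon misses, which is where the recall gain would come from. For example, suppose a hypothetical lexicon maps "Ajax" to a football club. A generalist link of "Ajax" to the mythological figure is then suppressed, while the generalist's link for "Amsterdam" is kept.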
In this paper we present LocLinkVis (Locate-Link-Visualize), a system that supports exploratory information access to a document collection based on geo-referencing and visualization. It uses a gazetteer containing representations of places ranging from countries to buildings. The gazetteer is used to recognize toponyms, disambiguate them to places, and visualize the resulting spatial footprints.
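A minimal sketch of the locate, link, and visualize-input steps, assuming a toy gazetteer keyed by lowercased names. Two stand-in heuristics are used for illustration: a greedy longest-match recognizer, and a "pick the most populous candidate" disambiguation rule. Neither is necessarily what LocLinkVis does. `Place`, `locate_and_link`, `footprint`, and all population and coordinate values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    place_id: str
    kind: str         # e.g. "country", "city", "building"
    population: int   # used here only as a disambiguation prior (illustrative values)
    lat: float
    lon: float


def locate_and_link(tokens: list[str], gazetteer: dict[str, list[Place]],
                    max_len: int = 3) -> list[tuple[str, Place]]:
    """Locate toponyms by greedy longest match, then link each to its most populous candidate."""
    links, i = [], 0
    while i < len(tokens):
        # Try the longest n-gram first so that "New York" wins over "York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            name = " ".join(tokens[i:i + n])
            candidates = gazetteer.get(name.lower())
            if candidates:
                links.append((name, max(candidates, key=lambda p: p.population)))
                i += n
                break
        else:  # no toponym starts at token i
            i += 1
    return links


def footprint(links: list[tuple[str, Place]]) -> list[tuple[float, float]]:
    """Input for the visualize step: the (lat, lon) points of the linked places."""
    return [(place.lat, place.lon) for _, place in links]
```

The population prior is the simplest baseline that resolves a name like "Paris" without any context. A context-aware system would presumably also use nearby toponyms or the place type.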
Proceedings of the first international workshop on Entity recognition & disambiguation - ERD '14, 2014
Recently, Entity Linking and Retrieval has become one of the most interesting tasks in Information Extraction due to its various applications. Entity Linking (EL) is the task of detecting entities mentioned in a text and linking them to the corresponding entries of a Knowledge Base. EL is traditionally composed of three major parts: (i) spotting, (ii) candidate generation, and (iii) candidate disambiguation. The performance of an EL system is highly dependent on the accuracy of each individual part. In this paper, we focus on these three main building blocks of EL systems and try to improve on the results of an open-source EL system, DBpedia Spotlight. We propose to use text pre-processing and parameter tuning to "focus" a general-purpose EL system so that it performs better on different kinds of input text. In addition, one of the main drawbacks of EL systems is their difficulty in identifying when a name does not refer to any known entity. To improve this so-called NIL detection, we define various features using a set of texts and their known entities. We then design a classifier that automatically labels DBpedia Spotlight's output entities as "NIL" or "Not NIL". The proposed system participated in the SIGIR ERD Challenge 2014. A performance analysis on the challenge's datasets shows that the proposed approaches improve the accuracy of the baseline system.
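The three stages and the NIL decision can be illustrated with a toy pipeline. It does not use DBpedia Spotlight and is not the paper's system; it is a self-contained sketch with several assumptions:

- a surface-form dictionary carrying prior probabilities;
- a keyword-overlap disambiguation score;
- a linear NIL classifier whose features are the best score, the margin over the runner-up, and the prior.

The features and hand-set weights are hypothetical, whereas the paper derives its features from texts with known entities. `Candidate`, `spot`, `generate_candidates`, `disambiguate`, `is_nil`, and `link` are illustrative names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    entity: str               # knowledge-base identifier (hypothetical)
    prior: float              # P(entity | surface form); illustrative values
    keywords: frozenset[str]  # hypothetical context words describing the entity


def spot(tokens: list[str], surface_forms: dict[str, list[Candidate]], max_len: int = 3) -> list[tuple[int, int]]:
    """Stage (i), spotting: greedy longest match of token n-grams against known surface forms."""
    spans, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]).lower() in surface_forms:
                spans.append((i, i + n))
                i += n
                break
        else:  # no n-gram starting at i matched
            i += 1
    return spans


def generate_candidates(mention: str, surface_forms: dict[str, list[Candidate]]) -> list[Candidate]:
    """Stage (ii), candidate generation: every entry registered for the surface form."""
    return surface_forms.get(mention.lower(), [])


def disambiguate(candidates: list[Candidate], context: set[str]) -> tuple[Candidate, float, float]:
    """Stage (iii), disambiguation: rank by prior * (1 + keyword overlap with the context)."""
    scored = sorted(
        ((c.prior * (1 + len(c.keywords & context)), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    best_score, best = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    return best, best_score, best_score - runner_up


# Hypothetical, hand-set weights for illustration; a real classifier would be fit on annotated data.
NIL_WEIGHTS = {"score": 2.0, "margin": 1.5, "prior": 1.0}
NIL_BIAS = -1.5


def is_nil(best: Candidate, score: float, margin: float) -> bool:
    """Linear NIL classifier: z < 0 is equivalent to sigmoid(z) < 0.5, i.e. predict NIL."""
    z = (NIL_WEIGHTS["score"] * score
         + NIL_WEIGHTS["margin"] * margin
         + NIL_WEIGHTS["prior"] * best.prior
         + NIL_BIAS)
    return z < 0


def link(text: str, surface_forms: dict[str, list[Candidate]]) -> list[tuple[str, str]]:
    """Run spotting, candidate generation, disambiguation, then NIL detection."""
    tokens = re.findall(r"\w+", text)
    context = {t.lower() for t in tokens}
    results = []
    for start, end in spot(tokens, surface_forms):
        mention = " ".join(tokens[start:end])
        candidates = generate_candidates(mention, surface_forms)
        if not candidates:
            results.append((mention, "NIL"))
            continue
        best, score, margin = disambiguate(candidates, context)
        results.append((mention, "NIL" if is_nil(best, score, margin) else best.entity))
    return results
```

In this sketch, a mention whose two candidates tie with low scores (zero margin) falls below the decision boundary and is labeled NIL. A mention with strong contextual support is linked.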
Papers by Jaap Kamps