  • Los Angeles, California, United States
... Willard McCarty offers background to the deeper scientific-philosophical movements behind this development. ... The paper by Berzak, Richter, Ehrler and Shore describes a system for searching, browsing, and visualising a large collection of speeches by Fidel Castro. ...
ABSTRACT Motivated by cognitive lexical models, network-based distributional semantic models (DSMs) were proposed in [Iosif and Potamianos (2013)] and were shown to achieve state-of-the-art performance on semantic similarity tasks. Based on evidence for cognitive organization of concepts based on degree of concreteness, we investigate the performance and organization of network DSMs for abstract vs. concrete nouns. Results show a “concreteness effect” for semantic similarity estimation. Network DSMs that implement the maximum sense similarity assumption perform best for concrete nouns, while attributional network DSMs perform best for abstract nouns. The performance of metrics is evaluated against human similarity ratings on an English and a Greek corpus.
AMTEx is a medical document indexing method, specifically designed for the automatic indexing of documents in large medical collections, such as MEDLINE, the premier bibliographic database of the U.S. National Library of Medicine (NLM). AMTEx combines MeSH, the terminological thesaurus resource of NLM, with a well-established method for term extraction, the C/NC-value method. The performance evaluation of two AMTEx
The identification of relevant historical sources such as newspapers and letters and the extraction of information from them is an essential part of historical research. In this work, our aim is the detection of relevant primary sources with the goal to support researchers working on a specific historical event. We focus on the historical daily Dutch newspaper archive of the National Library of the Netherlands and strike events that happened in the Netherlands during the 1980s. Using a manually compiled database of strikes in the Netherlands, we first attempt to find reports on those strikes in historical daily newspapers by automatically associating database records to the daily press of the time covering the same strike. Then, we generalise our methodology to detect strike events in the press not currently covered by the strikes database, and support in this way the extension of secondary historical resources. Our methods are evaluated against the manually constructed database of strikes.
The increasing amount of information stored in texts in electronic form has made imperative the need to distinguish information salient to our purposes. Research in Information Extraction (IE) focuses on this problem. IE aims at the detection and extraction of pre-specified ...
The UvT system is based on a hybrid, linguistic and statistical approach, originally proposed for the recognition of multi-word terminological phrases, the C-value method (Frantzi et al., 2000). In the UvT implementation, we use an extended noun phrase rule set and take into ...
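For readers unfamiliar with the C-value method named above, the following is a minimal sketch of the scoring formula as it is commonly stated for Frantzi et al. (2000), not the UvT implementation. It assumes candidate terms have already been extracted by some linguistic filter and counted; the function names `contains` and `c_value` and the toy frequencies are hypothetical. A candidate that is nested inside longer candidates has the average frequency of those longer candidates subtracted from its own frequency before weighting by the log of its length in words.

```python
import math


def contains(longer, shorter):
    """True if the word tuple `shorter` occurs contiguously inside the strictly longer `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freq):
    """Score each candidate term (a tuple of words) given a dict of raw frequencies.

    Simplified C-value:
      not nested: log2(|a|) * f(a)
      nested:     log2(|a|) * (f(a) - mean frequency of the longer candidates containing a)
    """
    scores = {}
    for a, f_a in freq.items():
        # Frequencies of all longer candidates in which `a` appears.
        nesting = [f_b for b, f_b in freq.items() if contains(b, a)]
        if nesting:
            adjusted = f_a - sum(nesting) / len(nesting)
        else:
            adjusted = f_a
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Toy, made-up counts for illustration only.
freq = {
    ("soft", "contact", "lens"): 4,
    ("contact", "lens"): 10,
}
scores = c_value(freq)
```

Because the weight is log2 of the term length, single-word candidates score zero in this form, which matches the method's focus on multi-word terms.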
Cultural heritage institutions are making their digital content available and searchable online. Digital metadata descriptions play an important role in this endeavour. This metadata is mostly manually created and often lacks detailed annotation, consistency and, most importantly, explicit semantic content descriptors which would facilitate online browsing and exploration of available information. This paper proposes the enrichment of existing cultural heritage metadata with automatically generated semantic content ...
The digital age has had a profound effect on our cultural heritage and the academic research that studies it. Staggering amounts of objects, many of them of a textual nature, are being digitised to make them more readily accessible to both experts and laypersons. Besides a vast potential for more effective and efficient preservation, management, and presentation, digitisation offers opportunities to work with cultural heritage data in ways that were never feasible or even imagined. To explore and exploit these possibilities, an ...
We present a phrase-based extension to memory-based machine translation. This form of example-based machine translation employs lazy-learning classifiers to translate fragments of the source sentence to fragments of the target sentence. Source-side fragments consist of variable-length phrases in a local context of neighboring words, translated by the classifier to a target-language phrase. We compare three methods of phrase extraction, and present a new decoder that reassembles the translated fragments into one final translation ...
The CAFETIERE formalism is a rule-based system for temporal text mining. The term “text mining” conjures up a range of activities in which large-scale repositories of textual documents are filtered for content, using various techniques that go beyond mere indexing.
Abstract. We describe the recent enhancement of the CAFETIERE formalism (Conceptual Annotation of Facts, Events, Terms, Individual Entities and RElations) with the ability to link natural language words and phrases in textual documents with instances and classes from a language-enabled ontology. The language-enabled ontology is one with an index from one or more natural language expressions to each concept (as in WordNet). In an information extraction application.