Text Similarity
Recent papers in Text Similarity
Abstract. Soft-cardinality spectra (SC spectra) is a new method of approximation for text strings in linear time, which divides text strings into character q-grams of different sizes. The method allows simultaneous use of weighting at... more
Geocoding is a method used to convert address information into geographical coordinates. It plays a vital role in displaying the relationship between geographic features and semantic information expressed in texts. The objective of this... more
Location, usually defined by postal address information or geographic coordinate values, is one of the leading themes in geography. Famous global mapping services such as ArcGIS Online, Bing Maps, Google Maps, or Yandex Maps can provide... more
The Monge-Elkan method is a general text string comparison method based on an internal character-based similarity measure (e.g. edit distance) combined with a token-level (i.e. word-level) similarity measure. We propose a generalization... more
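As a rough illustration of the combination described above, the following Python sketch pairs a character-level measure (a normalized edit-distance similarity) with a token-level average of best matches, which is the usual textbook form of Monge-Elkan. The function names, whitespace tokenization, and the normalization are assumptions made for this example; the generalization proposed in the paper is not shown because the abstract is truncated.

```python
def levenshtein(s, t):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(s, t):
    """Internal character-based similarity in [0, 1] (assumed normalization)."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


def monge_elkan(a, b, inner=edit_similarity):
    """Average, over tokens of a, of the best inner similarity to any token of b."""
    tokens_a, tokens_b = a.split(), b.split()
    if not tokens_a or not tokens_b:
        return 0.0  # convention chosen for this sketch
    best = (max(inner(x, y) for y in tokens_b) for x in tokens_a)
    return sum(best) / len(tokens_a)
```

Note that the measure is not symmetric in general: monge_elkan(a, b) averages over the tokens of a, so swapping the arguments can change the score.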
The slowness of legal proceedings in the common law legal system is a widely known fact. Any tool which could help reduce the time taken for the resolution of a case is invaluable. Common legal systems place great importance on... more
Text similarity assessment plays a very important role in the field of NLP (Natural Language Processing). In this article, a word vector model based on artificial neural networks is built and trained on a Chinese-language corpus... more
This paper presents a novel approach for building adaptive similarity functions based on cardinality using machine learning. Unlike current approaches that build feature sets using similarity scores, we have developed these feature sets... more
Describing, comparing and evaluating corpora are key issues in corpus-based translation and corpus linguistics for which there is still a notable lack of standards. Bearing this in mind, this paper aims at investigating the use of textual... more
One of the main efforts of recent computational linguistics is to formalize the process of identifying and evaluating similarity between narratives, which is argued to be a key concept for all human behavior. Analyses of the data of 52... more
Soft cardinality is a softened version of the classical cardinality of set theory. However, given its high cost of computing (exponential order), an approximation quadratic in the number of terms in the text has been proposed in the past.... more
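For orientation, the quadratic approximation usually given in the soft-cardinality literature sums, for each element, the reciprocal of its total similarity to all elements of the collection. The Python sketch below assumes that form; the names soft_cardinality, sim, and the softness exponent p are choices made for this example, and the truncated abstract does not confirm which variant the paper uses.

```python
def soft_cardinality(elements, sim, p=1.0):
    """Approximate soft cardinality with a pairwise (quadratic-time) sum.

    elements: list of distinct items (e.g. the words of a text).
    sim:      similarity function returning values in [0, 1];
              assumed to satisfy sim(x, x) == 1 so no denominator is zero.
    p:        softness exponent (assumed parameter name).
    """
    return sum(
        1.0 / sum(sim(a, b) ** p for b in elements)
        for a in elements
    )


def exact_match(a, b):
    """Crisp similarity: with it, soft cardinality equals the classical count."""
    return 1.0 if a == b else 0.0
```

With exact_match, three distinct words give a value of 3, matching classical cardinality; with a similarity that treats all elements as identical, the value collapses to 1.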
Abstract. Soft cardinality (SC) is a softened version of the classical cardinality of set theory. However, given its prohibitive cost of computing (exponential order), an approximation quadratic in the number of terms in the text has been... more
With the vast amount of data on the web, there is a need to extract relevant information from it. This information retrieval is performed efficiently by search engines, which are used by millions of people regularly. Meta Search... more
The ability to identify similarities between narratives has been argued to be central in human interactions. Previous work that sought to formalize this task has hypothesized that narrative similarity can be equated to the existence of a... more
The classical set theory provides a method for comparing objects using cardinality and intersection, in combination with well-known resemblance coefficients such as Dice, Jaccard, and cosine. However, set operations are intrinsically... more
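The three resemblance coefficients named above can be written directly in terms of set cardinality and intersection. The Python sketch below uses their standard set-based definitions; the function names and the choice to return 0.0 for empty inputs are assumptions made for this example.

```python
import math


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|"""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """2 |A ∩ B| / (|A| + |B|)"""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """|A ∩ B| / sqrt(|A| * |B|)"""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))
```

For example, with the word sets {"text", "string", "similarity"} and {"string", "similarity", "measure"}, the intersection has two elements and the union four, so the Jaccard coefficient is 0.5.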
Building a robust MT system requires a sufficiently large parallel corpus to be available as training data. In this paper, we propose to automatically extract parallel sentences from comparable corpora without using any MT system... more
The World Wide Web is the largest repository of public data, and it continues to expand in size and complexity with the increasing use of the internet, but retrieving relevant documents is still a big challenge in the field of... more