Text Similarity Research Papers

Abstract. Soft-cardinality spectra (SC spectra) is a new method of approximation for text strings in linear time, which divides text strings into character q-grams of different sizes. The method allows simultaneous use of weighting at... more
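
As a rough illustration of what "character q-grams of different sizes" means, the sketch below splits a string into overlapping q-grams for several values of q. The function names and the default sizes are assumptions made for this example only; the sketch is not the SC spectra method itself, whose details are truncated in the abstract above.

```python
def char_qgrams(text: str, q: int) -> list[str]:
    """Return all overlapping character q-grams of `text`.

    A string shorter than q yields an empty list.
    """
    if q < 1:
        raise ValueError("q must be a positive integer")
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def qgram_spectrum(text: str, sizes=(2, 3, 4)) -> dict[int, list[str]]:
    """Map each q in `sizes` (hypothetical default) to the q-grams of `text`."""
    return {q: char_qgrams(text, q) for q in sizes}


if __name__ == "__main__":
    for q, grams in qgram_spectrum("similarity").items():
        print(q, grams)
```

Each q-gram list is produced in a single pass over the string, so the cost grows linearly with the text length for a fixed set of sizes.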

Location, usually defined by postal address information or geographic coordinate values, is one of the leading themes in geography. Famous global mapping services such as ArcGIS Online, Bing Maps, Google Maps, or Yandex Maps can provide... more

- by Batuhan Kilic
  Text Similarity, Poi, Binary Logistic Regression, Reverse Geocoding

The Monge-Elkan method is a general text string comparison method that combines an internal character-based similarity measure (e.g. edit distance) with a token-level (i.e. word-level) similarity measure. We propose a generalization... more

- by Sergio Jimenez
  Text Similarity, Monge-Elkan, Generalized Monge-Elkan
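
The combination of a character-level measure with a token-level measure described in the Monge-Elkan abstract above can be sketched as follows. This is a minimal sketch of the standard (non-generalized) Monge-Elkan score, assuming a normalized Levenshtein similarity as the internal measure; the function names are chosen for this example, and the generalization proposed in the paper is not reproduced here because the abstract is truncated.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Internal character-based similarity in [0, 1] (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def monge_elkan(tokens_a, tokens_b, sim=edit_similarity) -> float:
    """Average, over tokens of A, of the best internal similarity against B."""
    if not tokens_a or not tokens_b:
        return 0.0
    best_matches = (max(sim(a, b) for b in tokens_b) for a in tokens_a)
    return sum(best_matches) / len(tokens_a)


if __name__ == "__main__":
    print(monge_elkan("paul johnson".split(), "johson paule".split()))
```

Note that the score is asymmetric: `monge_elkan(a, b)` and `monge_elkan(b, a)` can differ because the average runs over the tokens of the first argument only.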

Most research on the automatic assessment of free-text answers written by students addresses the English language. This paper handles the assessment task in the Arabic language. This research focuses on applying multiple similarity measures... more

- by Wael Gomaa and Aly Fahmy
  Natural Language Processing, Semantic similarity, Short Answer Questions Grading, Text Similarity

The slowness of legal proceedings in the common law legal system is a widely known fact. Any tool that could help reduce the time taken to resolve a case is invaluable. Common legal systems place great importance on... more

EXPERT (EXPloiting Empirical appRoaches to Translation): http://expert-itn.eu

Text similarity assessment plays a very important role in NLP (Natural Language Processing). In this article, a Word Vector Model based on an artificial neural network (ANN) is built and trained on a Chinese-language corpus drawn from Sohu News, World News, and other sources. The article proposes a method for computing semantic text similarity using the Word Vector Model; the proposed method is then compared with traditional methods such as TF-IDF, and the experimental results show that it is quite effective. Text similarity computation serves as a metric for comparing two or more articles. In general it can be divided into:
- Computing semantic similarity
- Computing non-semantic dissimilarity
This research has very broad applications in Information Retrieval, Automatic Question Answering, and Machine Learning [1-2]. Many mature studies already exist on methods for computing non-semantic text similarity; however, much remains to be done for semantic text.
Pan Qianhong proposed a way to compute text similarity based on Attribute Theory [3] and built a Properties Gravity Splitting model of a text, which computes the correlation between keywords with the help of distances between coordinate points.
Zhang Huanjiong proposed a method for computing text similarity based on Hamming Distance, also using the Hamming Concept [4]. This method offers a new way of computing that is very convenient and highly accurate. It represents text information using codewords, which makes it possible to describe text information as nodes/combinations. This differs from the traditional use of the vector space concept.
Huo Hua proposed a way to compute text similarity based on Compressed Sparse Vector Multiplying, which effectively reduces the computing cost and the storage (hard disk) requirements. Only non-zero elements are stored and represented in this method.
Yu Gang proposed a method for computing text similarity based on lexical semantics [7] with the Maximum Matching algorithm. This method computes the correlation between the vector bases of two articles using the lexical semantics of How-Net and from it obtains the similarity of the two articles.
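
To make the idea of storing only non-zero elements concrete, the sketch below keeps each document vector as a dictionary of term counts and computes cosine similarity by touching only the stored entries. It is an illustrative sketch, not the Compressed Sparse Vector Multiplying method cited in the passage above; the raw-count weighting and all function names are assumptions of this example.

```python
import math
from collections import Counter


def sparse_vector(tokens) -> dict:
    """Term -> count mapping; absent terms are implicitly zero."""
    return dict(Counter(tokens))


def sparse_dot(u: dict, v: dict) -> float:
    """Dot product that iterates over the smaller vector's non-zero entries."""
    if len(u) > len(v):
        u, v = v, u
    return sum(weight * v[term] for term, weight in u.items() if term in v)


def sparse_cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    norm_u = math.sqrt(sparse_dot(u, u))
    norm_v = math.sqrt(sparse_dot(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sparse_dot(u, v) / (norm_u * norm_v)


if __name__ == "__main__":
    doc_a = sparse_vector("text similarity is a text metric".split())
    doc_b = sparse_vector("semantic similarity of text".split())
    print(sparse_cosine(doc_a, doc_b))
```

Both memory use and the cost of the product depend on the number of distinct terms that actually occur in the documents rather than on the size of the whole vocabulary.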

- by Syahroni Wahyu
  Text Mining, Text Classification, Text Analysis, Text Similarity

With the large amount of written data available online and offline, plagiarism detection has become a pressing need in various fields of science and knowledge. Various context-based plagiarism detection methods have been published in the literature.... more

- by Hussein Soori and Jan Platoš
  Plagiarism Detection, Data Compression, Compression Algorithms, Textual Data Compression

This paper presents a novel approach for building adaptive similarity functions based on cardinality using machine learning. Unlike current approaches that build feature sets using similarity scores, we have developed these feature sets... more

- by Sergio Jimenez
  Machine Learning, Textual Entailment, Text Similarity, Soft Cardinality

Describing, comparing and evaluating corpora are key issues in corpus-based translation and corpus linguistics for which there is still a notable lack of standards. Bearing this in mind, this paper aims at investigating the use of textual... more

Geocoding is a method used to convert address information into geographical coordinates. It plays a vital role in displaying the relationship between geographic features and semantic information expressed in texts. The objective of this... more

- by Batuhan Kilic
  Text Similarity, Google Maps, Bing Maps, Geocoding

One of the main efforts of recent computational linguistics is to formalize the process of identifying and evaluating similarity between narratives, which is argued to be a key concept for all human behavior. Analyses of the data of 52... more

Soft cardinality is a softened version of the classical cardinality of set theory. However, given its high cost of computing (exponential order), an approximation quadratic in the number of terms in the text has been proposed in the past.... more

- by Sergio Jimenez
  Text Similarity, Soft Cardinality, Text Comparison, Baselines for NLP
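
The quadratic approximation mentioned in the soft cardinality abstract above is commonly written as the sum, over the elements of a collection, of the reciprocal of each element's total similarity to all elements. The sketch below follows that form as an assumption; the similarity function (Python's `difflib` ratio), the exponent `p` and the function names are choices made for this example, not details taken from the paper.

```python
from difflib import SequenceMatcher


def ratio_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; identical strings score 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def soft_cardinality(elements, sim=ratio_similarity, p: float = 1.0) -> float:
    """Quadratic-time soft cardinality: sum_i 1 / sum_j sim(e_i, e_j) ** p.

    Assumes sim(x, x) == 1 so that every denominator is at least 1.
    """
    total = 0.0
    for a in elements:
        denominator = sum(sim(a, b) ** p for b in elements)
        total += 1.0 / denominator
    return total


def crisp(a, b) -> float:
    """Classical equality: recovers the ordinary set cardinality."""
    return 1.0 if a == b else 0.0


if __name__ == "__main__":
    words = ["sunday", "sundays", "monday"]
    print(soft_cardinality(words, crisp))  # behaves like a plain count here
    print(soft_cardinality(words))         # similar words count as less than one each
```

With a crisp equality function the result equals the number of distinct elements, and as elements become more similar to one another the value shrinks toward 1.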

The classical set theory provides a method for comparing objects using cardinality and intersection, in combination with well-known resemblance coefficients such as Dice, Jaccard, and cosine. However, set operations are intrinsically... more

- by Sergio Jimenez
  Text Similarity, Soft Cardinality
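
The three resemblance coefficients named in the abstract above can each be computed from set cardinalities and the size of the intersection. The sketch below shows the classical set-based forms on word sets; the handling of empty sets is a convention assumed for this example.

```python
import math


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dice(a: set, b: set) -> float:
    """2 |A ∩ B| / (|A| + |B|); two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def cosine(a: set, b: set) -> float:
    """|A ∩ B| / sqrt(|A| |B|); 0.0 when exactly one set is empty."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


if __name__ == "__main__":
    x = set("the cat sat".split())
    y = set("the cat ran".split())
    print(jaccard(x, y), dice(x, y), cosine(x, y))
```

All three depend on the two sets only through their cardinalities and the cardinality of their intersection.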

Abstract. Soft cardinality (SC) is a softened version of the classical cardinality of set theory. However, given its prohibitive cost of computing (exponential order), an approximation quadratic in the number of terms in the text has been... more

- by Sergio Jimenez
  N-Grams, Text Similarity, Soft Cardinality, SC Spectra

Given the huge volume of data on the web, there is a need to extract information from it. This information retrieval is done efficiently by search engines, which are used regularly by millions of people. Meta Search... more

- by Jaswinder Singh
  Web Mining, Search Engine Optimization, Text Similarity

The ability to identify similarities between narratives has been argued to be central in human interactions. Previous work that sought to formalize this task has hypothesized that narrative similarity can be equated to the existence of a... more

Building a robust MT system requires a sufficiently large parallel corpus to be available as training data. In this paper, we propose to automatically extract parallel sentences from comparable corpora without using any MT system... more

The World Wide Web is the largest repository of public data, and it is continuously expanding in size and complexity with the increasing use of the internet, but retrieving the relevant documents is still a big challenge in the field of... more

- by Jaswinder Singh
  Information Retrieval, Web Mining, Text Analysis, Similarity Measures

Measures of text similarity have been used for a long time in applications in natural language processing, including Information Retrieval, Document Clustering, Word Sense Disambiguation, Machine Translation, Text Summarization and Short... more

- by Wael Gomaa and Aly Fahmy
- Text Similarity