Measuring Global Similarity Between Texts

Fahrenberg, Uli; Biondi, Fabrizio; Corre, Kevin; Jegourel, Cyrille; Kongshøj, Simon; Legay, Axel

doi:10.1007/978-3-319-11397-5_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8791)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

We propose a new similarity measure between texts which, contrary to current state-of-the-art approaches, takes a global view of the texts to be compared. We have implemented a tool to compute our textual distance and conducted experiments on several text corpora. The experiments show that our methods can reliably identify different global types of texts.

Author information

Authors and Affiliations

Inria/IRISA Rennes, Rennes, France
Uli Fahrenberg, Fabrizio Biondi, Kevin Corre, Cyrille Jegourel & Axel Legay
University College of Northern Denmark, Aalborg, Denmark
Simon Kongshøj

Corresponding author

Correspondence to Uli Fahrenberg.

Editor information

Editors and Affiliations

University Joseph Fourier, Grenoble, France
Laurent Besacier
Rovira i Virgili University, Tarragona, Spain
Adrian-Horia Dediu
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide

Cite this paper

Fahrenberg, U., Biondi, F., Corre, K., Jegourel, C., Kongshøj, S., Legay, A. (2014). Measuring Global Similarity Between Texts. In: Besacier, L., Dediu, AH., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2014. Lecture Notes in Computer Science, vol 8791. Springer, Cham. https://doi.org/10.1007/978-3-319-11397-5_17

DOI: https://doi.org/10.1007/978-3-319-11397-5_17
Published: 03 September 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11396-8
Online ISBN: 978-3-319-11397-5
eBook Packages: Computer Science, Computer Science (R0)
