Automatic Identification of European Languages

Zhdanova, Anna V.

doi:10.1007/3-540-36271-1_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2553)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

We describe our word-based implementation of a language identification system for text messages written in European languages. Specifically, we use and compare linguistic (based on function words) and statistical (based on word frequency) approaches to constructing the identifying vocabularies. Our version of the statistical approach copes with the differing degrees of word overlap among languages and with the problem of short messages. In addition, it allows a user to choose the accuracy of language identification. At present, our system identifies 8 languages (Bulgarian, English, French, German, Italian, Russian, Spanish and Swedish) in various encodings. With identifying vocabularies of limited size (fewer than 1500 keys per language), the accuracy of identification reaches 99% even for messages containing only one sentence.
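
As a rough illustration of the word-based idea described in the abstract, the following minimal Python sketch scores a message against small per-language vocabularies and picks the best match. Everything in it is an assumption made for illustration: the toy function-word lists, the names `VOCAB`, `tokenize` and `identify`, and the `min_margin` parameter (a hypothetical stand-in for a user-chosen strictness setting) are not taken from the paper, whose actual vocabularies and decision rule are not given here.

```python
import re
from collections import Counter

# Hypothetical toy vocabularies of function words, for illustration only.
# The paper's identifying vocabularies are not reproduced here.
VOCAB = {
    "english": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "french": {"le", "la", "les", "et", "est", "des", "un", "une", "dans"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "mit"},
    "spanish": {"el", "los", "las", "y", "es", "un", "una", "que", "con"},
}


def tokenize(text):
    """Lowercase the text and split it into runs of Unicode letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def identify(text, min_margin=1):
    """Return the best-matching language, or None if the evidence is too weak.

    min_margin (an assumed knob) is the minimum lead, in vocabulary hits,
    that the top language must have over the runner-up.
    """
    scores = Counter({lang: 0 for lang in VOCAB})
    for word in tokenize(text):
        for lang, vocab in VOCAB.items():
            if word in vocab:
                scores[lang] += 1
    (best, best_score), (_, second_score) = scores.most_common(2)
    if best_score == 0 or best_score - second_score < min_margin:
        return None  # no hits, or overlapping words make the call ambiguous
    return best


if __name__ == "__main__":
    print(identify("The cat is in the garden and it is happy"))
    print(identify("Le chat est dans la maison"))
    print(identify("un"))  # "un" is in both the French and Spanish lists
```

In this sketch, a word shared between vocabularies (such as "un") contributes to several languages at once, so a very short message can end in a tie and be rejected. Raising `min_margin` trades coverage for confidence, which is one simple way to expose an accuracy setting to the user.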

Author information

Authors and Affiliations

A.P. Ershov Institute of Informatics Systems, 630090, Novosibirsk, Russia
Anna V. Zhdanova
Novosibirsk State University, 630090, Novosibirsk, Russia
Anna V. Zhdanova

Editor information

Editors and Affiliations

Department of Computer and Systems Sciences, Royal Institute of Technology, Forum 100, 16440, Kista, Sweden
Birger Andersson, Maria Bergholtz & Paul Johannesson

About this paper

Cite this paper

Zhdanova, A.V. (2002). Automatic Identification of European Languages. In: Andersson, B., Bergholtz, M., Johannesson, P. (eds) Natural Language Processing and Information Systems. NLDB 2002. Lecture Notes in Computer Science, vol 2553. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36271-1_7

DOI: https://doi.org/10.1007/3-540-36271-1_7
Published: 28 February 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00307-6
Online ISBN: 978-3-540-36271-5
eBook Packages: Springer Book Archive