Name: Automatic Language Identification in Texts
ISBN: 978-3-031-45822-4

Overview

Authors:

Tommi Jauhiainen,
Marcos Zampieri,
Timothy Baldwin,
Krister Lindén

Tommi Jauhiainen
1. University of Helsinki, Helsinki, Finland
Marcos Zampieri
1. George Mason University, Fairfax, USA
Timothy Baldwin
1. MBZUAI, Abu Dhabi, United Arab Emirates
Krister Lindén
1. University of Helsinki, Helsinki, Finland

Reviews the history of LI research, including the challenges that have renewed interest in researching the topic
Compares and contrasts the features and methods commonly used for LI, as well as LI performance evaluation methods
Highlights the applications of language identification and identifies areas for future research in LI

Part of the book series: Synthesis Lectures on Human Language Technologies (SLHLT)

About this book

This book provides readers with a brief account of the history of Language Identification (LI) research and a survey of the features and methods most commonly used in the LI literature. LI is the problem of determining the language in which a document is written and is a crucial part of many text processing pipelines. The authors use a unified notation to clarify the relationships between common LI methods. The book introduces LI performance evaluation methods and takes a detailed look at LI-related shared tasks. The authors identify open issues, discuss the applications of LI and related tasks, and propose future directions for research in LI.

Table of contents (7 chapters)

Front Matter

Pages i-xiv

Introduction to Language Identification
- Tommi Jauhiainen, Marcos Zampieri, Timothy Baldwin, Krister Lindén
Pages 1-17
Features and Methods
- Tommi Jauhiainen, Marcos Zampieri, Timothy Baldwin, Krister Lindén
Pages 19-63
Evaluation and Measurement
- Tommi Jauhiainen, Marcos Zampieri, Timothy Baldwin, Krister Lindén
Pages 65-97
Specific Challenges of Variation and Text Types
- Tommi Jauhiainen, Marcos Zampieri, Timothy Baldwin, Krister Lindén
Pages 99-115
Large Scale, Multi-domain Language Identification
- Tommi Jauhiainen, Marcos Zampieri, Timothy Baldwin, Krister Lindén
Pages 117-135
Applications and Related Tasks
- Tommi Jauhiainen, Marcos Zampieri, Timothy Baldwin, Krister Lindén
Pages 137-145
Conclusion and Future Directions
- Tommi Jauhiainen, Marcos Zampieri, Timothy Baldwin, Krister Lindén
Pages 147-148

Authors and Affiliations

University of Helsinki, Helsinki, Finland
Tommi Jauhiainen, Krister Lindén

George Mason University, Fairfax, USA
Marcos Zampieri

MBZUAI, Abu Dhabi, United Arab Emirates
Timothy Baldwin

About the authors

Tommi Jauhiainen, Ph.D., is a Post-doctoral Researcher at The University of Helsinki. He wrote his master’s thesis on automatic language identification and continued his research on the same subject as a doctoral student. Dr. Jauhiainen organized the first shared task in Cuneiform Language Identification (CLI) in 2019 as well as the Uralic Language Identification (ULI) shared tasks in 2020 and 2021. He is the first author of approximately 20 peer-reviewed publications on language identification.

Marcos Zampieri, Ph.D., is an Assistant Professor at George Mason University. He received his PhD from Saarland University with a thesis on computational modelling of language variation. He has published over 100 peer-reviewed papers on various topics in computational linguistics and NLP such as language and dialect identification, native language identification, machine translation, lexical complexity prediction, and social media mining.

Timothy Baldwin, Ph.D., is the Acting Provost and Chair of the Department of Natural Language Processing at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) in addition to being a Melbourne Laureate Professor in the School of Computing and Information Systems at The University of Melbourne. Prior to joining The University of Melbourne, he was a Senior Research Engineer at the Center for the Study of Language and Information at Stanford University. He is the author of over 450 peer-reviewed publications across diverse topics in natural language processing and AI, in addition to being an ARC Future Fellow, and the recipient of a number of prestigious awards at top conferences.

Krister Lindén, Ph.D., is the Research Director of Language Technology at the University of Helsinki and the National Coordinator of FIN-CLARIN, the Finnish Node of CLARIN ERIC, which is a European research infrastructure for Social Sciences and the Humanities. He is the Chair of the CLARIN National Coordinators Forum and a member of CLIC (Committee for Legal and Ethical Issues in CLARIN). He holds a doctoral degree in Language Technology from the University of Helsinki. He is the co-author of more than 160 publications related to language technology and its utilization in digital humanities and language resource processing. He is currently also a deputy team leader in the Centre of Excellence of Ancient Near Eastern Empires.

Bibliographic Information

Book Title: Automatic Language Identification in Texts
Authors: Tommi Jauhiainen, Marcos Zampieri, Timothy Baldwin, Krister Lindén
Series Title: Synthesis Lectures on Human Language Technologies
DOI: https://doi.org/10.1007/978-3-031-45822-4
Publisher: Springer Cham
eBook Packages: Synthesis Collection of Technology (R0)
Copyright Information: The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
Hardcover ISBN: 978-3-031-45821-7
Published: 03 January 2024
Softcover ISBN: 978-3-031-45824-8
Due: 16 January 2025
eBook ISBN: 978-3-031-45822-4
Published: 01 January 2024
Series ISSN: 1947-4040
Series E-ISSN: 1947-4059
Edition Number: 1
Number of Pages: XIV, 148
Number of Illustrations: 2 b/w illustrations, 8 illustrations in colour
Topics: Natural Language Processing (NLP), Computational Linguistics, Computer Applications, Statistics, general, Artificial Intelligence, Computer Science, general
