research-article

Time for More Languages: Temporal Tagging of Arabic, Italian, Spanish, and Vietnamese

Authors: Jannik Strötgen, Michael Gertz

ACM Transactions on Asian Language Information Processing (TALIP), Volume 13, Issue 1

Article No.: 1, Pages 1 - 21

https://doi.org/10.1145/2540989

Published: 01 February 2014

Abstract

Most research on temporal tagging so far has been done on English text documents. There are hardly any multilingual temporal taggers that support more than two languages. Recently, the temporal tagger HeidelTime was made publicly available; it supports the integration of new languages through the development of language-dependent resources, without modifying the source code.
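The central design idea here is that language knowledge lives in resources, not in the tagging engine. The Python sketch below illustrates that separation under stated assumptions: the RESOURCES dictionary, its "months" and "date_pattern" keys, and the functions compile_date_rule and tag are hypothetical names invented for illustration. They are not HeidelTime's API or resource format, and the single explicit-date rule is far simpler than a real rule set. Adding a language means adding a RESOURCES entry; tag itself stays unchanged.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified language resources. NOT HeidelTime's actual resource
# format: they only illustrate keeping language-specific knowledge (month names,
# surface patterns) outside the language-independent tagging engine.
SPANISH_MONTHS = ["enero", "febrero", "marzo", "abril", "mayo", "junio", "julio",
                  "agosto", "septiembre", "octubre", "noviembre", "diciembre"]
ITALIAN_MONTHS = ["gennaio", "febbraio", "marzo", "aprile", "maggio", "giugno",
                  "luglio", "agosto", "settembre", "ottobre", "novembre", "dicembre"]

RESOURCES = {
    "spanish": {
        "months": {name: i for i, name in enumerate(SPANISH_MONTHS, start=1)},
        # e.g. "5 de mayo de 2013"; {MONTH} is filled in by the engine
        "date_pattern": r"\b(?P<day>\d{1,2}) de (?P<month>{MONTH}) de (?P<year>\d{4})\b",
    },
    "italian": {
        "months": {name: i for i, name in enumerate(ITALIAN_MONTHS, start=1)},
        # e.g. "12 marzo 2013"
        "date_pattern": r"\b(?P<day>\d{1,2}) (?P<month>{MONTH}) (?P<year>\d{4})\b",
    },
}


@dataclass
class Timex:
    text: str   # matched surface string
    start: int  # character offset (inclusive)
    end: int    # character offset (exclusive)
    value: str  # normalized TIMEX3-style value, e.g. "2013-05-05"


def compile_date_rule(resources: dict) -> re.Pattern:
    """Language-independent step: insert the month alternation and compile."""
    names = sorted(resources["months"], key=len, reverse=True)
    alternation = "|".join(re.escape(n) for n in names)
    # str.replace (not str.format) so the regex braces in \d{1,2} stay intact
    return re.compile(resources["date_pattern"].replace("{MONTH}", alternation),
                      re.IGNORECASE)


def tag(text: str, language: str) -> list:
    """Extract and normalize explicit dates using only the given resources."""
    resources = RESOURCES[language]
    found = []
    for m in compile_date_rule(resources).finditer(text):
        month = resources["months"][m.group("month").lower()]
        value = f"{int(m.group('year')):04d}-{month:02d}-{int(m.group('day')):02d}"
        found.append(Timex(m.group(0), m.start(), m.end(), value))
    return found
```

For example, tag("El acuerdo se firmó el 5 de mayo de 2013.", "spanish") returns one Timex whose value is "2013-05-05", and the same engine handles the Italian "12 marzo 2013" with no code changes.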

In this article, we describe our work on developing such resources for two Asian and two Romance languages: Arabic, Vietnamese, Spanish, and Italian. While temporal tagging of the two Romance languages has been addressed before, there has been almost no research on Arabic and Vietnamese temporal tagging so far. Furthermore, we analyze language-dependent challenges for temporal tagging and explain the strategies we followed to address them. Our evaluation results on publicly available and newly annotated corpora demonstrate the high quality of our new resources for the four languages, which we make publicly available to the research community.
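The abstract does not state which evaluation measures were used. As an assumption, the sketch below follows a common convention in temporal tagging evaluations such as TempEval-3: precision, recall, and F1 over temporal expression extents, under strict (identical extents) and relaxed (overlapping extents) matching. The function names exact, overlap, and prf are hypothetical; normalized-value accuracy, also commonly reported, is not shown.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # character offsets: [start, end)


def exact(a: Span, b: Span) -> bool:
    """Strict match: identical extents."""
    return a == b


def overlap(a: Span, b: Span) -> bool:
    """Relaxed match: extents share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def prf(gold: List[Span], system: List[Span],
        match: Callable[[Span, Span], bool]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of system extents against gold extents."""
    # Precision: system annotations that match at least one gold annotation.
    correct_sys = sum(any(match(g, s) for g in gold) for s in system)
    # Recall: gold annotations matched by at least one system annotation.
    found_gold = sum(any(match(g, s) for s in system) for g in gold)
    p = correct_sys / len(system) if system else 0.0
    r = found_gold / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

With gold = [(10, 27), (40, 50)] and system = [(10, 27), (42, 50), (60, 65)], strict matching gives P = 1/3, R = 0.5, F1 = 0.4, while relaxed matching credits the partially overlapping (42, 50) and gives P = 2/3, R = 1.0, F1 = 0.8. This sketch does not enforce a one-to-one alignment between gold and system spans, which a stricter scorer would.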


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection



Published In

ACM Transactions on Asian Language Information Processing, Volume 13, Issue 1

February 2014

93 pages

ISSN:1530-0226

EISSN:1558-3430

DOI:10.1145/2590408

Editor:
Richard Sproat
Google, Inc., USA


Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 February 2014

Accepted: 01 October 2013

Revised: 01 September 2013

Received: 01 May 2013

Published in TALIP Volume 13, Issue 1


Qualifiers

Research-article
Research
Refereed

Bibliometrics & Citations

Article Metrics

Total Citations: 15
Total Downloads: 389
Downloads (Last 12 months): 11
Downloads (Last 6 weeks): 2

Reflects downloads up to 30 Aug 2024

Cited By

Zhong X, Cambria E. 2023. Time expression recognition and normalization: a survey. Artificial Intelligence Review 56, 9, 9115-9140. DOI: 10.1007/s10462-023-10400-y. Online publication date: 1-Sep-2023.
https://dl.acm.org/doi/10.1007/s10462-023-10400-y

CAMCI H, ERYİĞİT G. 2021. Türkçe Zamansal İfadelerin Yakalanması ve Tanımlanması (Turkish Temporal Expression Extraction and Identification). Bilişim Teknolojileri Dergisi 14, 3, 337-343. DOI: 10.17671/gazibtd.853145. Online publication date: 31-Jul-2021.
https://doi.org/10.17671/gazibtd.853145

Zhong X, Cambria E. 2021. Literature Review. In Time Expression and Named Entity Recognition, 15-34. DOI: 10.1007/978-3-030-78961-9_2. Online publication date: 7-Jun-2021.
https://doi.org/10.1007/978-3-030-78961-9_2

Haffar N, Hkiri E, Zrigui M. 2019. TimeML Annotation of Events and Temporal Expressions in Arabic Texts. In Computational Collective Intelligence, 207-218. DOI: 10.1007/978-3-030-28377-3_17. Online publication date: 4-Sep-2019.
https://dl.acm.org/doi/10.1007/978-3-030-28377-3_17

Hakak S, Kamsin A, Palaiahnakote S, Tayan O, Idna Idris M, Abukhir K. 2018. Residual-based approach for authenticating pattern of multi-style diacritical Arabic texts. PLOS ONE 13, 6, e0198284. DOI: 10.1371/journal.pone.0198284. Online publication date: 20-Jun-2018.
https://doi.org/10.1371/journal.pone.0198284

Mansouri B, Zahedi M, Campos R, Farhoodi M, Yari A, Champin P, Gandon F, Médini L, Lalmas M, Ipeirotis P. 2018. Understanding the use of Temporal Expressions on Persian Web Search. In Companion Proceedings of The Web Conference 2018, 1743-1748. DOI: 10.1145/3184558.3191635. Online publication date: 23-Apr-2018.
https://dl.acm.org/doi/10.1145/3184558.3191635

Zhao X, Jin P, Yue L. 2018. Discovering topic time from web news. Information Processing and Management: an International Journal 51, 6, 869-890. DOI: 10.1016/j.ipm.2015.04.001. Online publication date: 29-Dec-2018.
https://dl.acm.org/doi/10.1016/j.ipm.2015.04.001

Boudaa T, El Marouani M, Enneya N. 2018. Arabic Temporal Expression Tagging and Normalization. In Big Data, Cloud and Applications, 546-557. DOI: 10.1007/978-3-319-96292-4_43. Online publication date: 14-Aug-2018.
https://doi.org/10.1007/978-3-319-96292-4_43

Mansouri B, Zahedi M, Campos R, Farhoodi M, Rahgozar M. 2018. ParsTime: Rule-Based Extraction and Normalization of Persian Temporal Expressions. In Advances in Information Retrieval, 715-721. DOI: 10.1007/978-3-319-76941-7_67. Online publication date: 1-Mar-2018.
https://doi.org/10.1007/978-3-319-76941-7_67

Van Canh T, Markert K, Nejdl W. 2017. RussianFlu-DE: A German Corpus for a Historical Epidemic with Temporal Annotation. In Research and Advanced Technology for Digital Libraries, 61-73. DOI: 10.1007/978-3-319-67008-9_6. Online publication date: 2-Sep-2017.
https://doi.org/10.1007/978-3-319-67008-9_6
