Time for More Languages: Temporal Tagging of Arabic, Italian, Spanish, and Vietnamese

Published: 01 February 2014

Abstract

Most research on temporal tagging to date has focused on English text documents, and hardly any multilingual temporal taggers support more than two languages. The temporal tagger HeidelTime, recently made publicly available, allows new languages to be integrated by developing language-dependent resources, without modifying the source code.
In this article, we describe our work on developing such resources for two Asian and two Romance languages: Arabic, Vietnamese, Spanish, and Italian. While temporal tagging of the two Romance languages has been addressed before, there has been almost no research on Arabic and Vietnamese temporal tagging so far. Furthermore, we analyze language-dependent challenges for temporal tagging and explain the strategies we followed to address them. Our evaluation results on publicly available and newly annotated corpora demonstrate the high quality of our new resources for the four languages, which we make publicly available to the research community.


Published In

ACM Transactions on Asian Language Information Processing, Volume 13, Issue 1
February 2014
93 pages
ISSN:1530-0226
EISSN:1558-3430
DOI:10.1145/2590408

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 February 2014
Accepted: 01 October 2013
Revised: 01 September 2013
Received: 01 May 2013
Published in TALIP Volume 13, Issue 1

Author Tags

  1. Arabic NLP
  2. HeidelTime
  3. TIMEX3
  4. Temporal tagging
  5. Vietnamese NLP

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2023) Time expression recognition and normalization: a survey. Artificial Intelligence Review 56:9 (9115-9140). DOI: 10.1007/s10462-023-10400-y. Online publication date: 1-Sep-2023
  • (2021) Türkçe Zamansal İfadelerin Yakalanması ve Tanımlanması (Turkish Temporal Expression Extraction and Identification). Bilişim Teknolojileri Dergisi 14:3 (337-343). DOI: 10.17671/gazibtd.853145. Online publication date: 31-Jul-2021
  • (2021) Literature Review. Time Expression and Named Entity Recognition (15-34). DOI: 10.1007/978-3-030-78961-9_2. Online publication date: 7-Jun-2021
  • (2019) TimeML Annotation of Events and Temporal Expressions in Arabic Texts. Computational Collective Intelligence (207-218). DOI: 10.1007/978-3-030-28377-3_17. Online publication date: 4-Sep-2019
  • (2018) Residual-based approach for authenticating pattern of multi-style diacritical Arabic texts. PLOS ONE 13:6 (e0198284). DOI: 10.1371/journal.pone.0198284. Online publication date: 20-Jun-2018
  • (2018) Understanding the use of Temporal Expressions on Persian Web Search. Companion Proceedings of the The Web Conference 2018 (1743-1748). DOI: 10.1145/3184558.3191635. Online publication date: 23-Apr-2018
  • (2018) Discovering topic time from web news. Information Processing and Management: an International Journal 51:6 (869-890). DOI: 10.1016/j.ipm.2015.04.001. Online publication date: 29-Dec-2018
  • (2018) Arabic Temporal Expression Tagging and Normalization. Big Data, Cloud and Applications (546-557). DOI: 10.1007/978-3-319-96292-4_43. Online publication date: 14-Aug-2018
  • (2018) ParsTime: Rule-Based Extraction and Normalization of Persian Temporal Expressions. Advances in Information Retrieval (715-721). DOI: 10.1007/978-3-319-76941-7_67. Online publication date: 1-Mar-2018
  • (2017) RussianFlu-DE: A German Corpus for a Historical Epidemic with Temporal Annotation. Research and Advanced Technology for Digital Libraries (61-73). DOI: 10.1007/978-3-319-67008-9_6. Online publication date: 2-Sep-2017