research-article

A hybrid system for code switch point detection in informal Arabic text

Published: 14 October 2014

Abstract

How to detect the switch between the standard and a dialectal form of a language in written text, and why this matters for natural language processing tasks.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in XRDS Volume 21, Issue 1

Qualifiers

  • Research-article
  • Popular
  • Refereed
