research-article

A hybrid system for code switch point detection in informal Arabic text

Published: 14 October 2014
    Abstract

    How to detect switches between the standard and dialectal forms of a language in written text, and why this matters for natural language processing tasks.
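    As an illustration only, and not the authors' hybrid system: once each token of a sentence has been labeled as standard or dialectal, the switch points are simply the positions where that label changes. The sketch assumes such per-token labels already exist. The label names `MSA` (Modern Standard Arabic) and `DA` (dialectal Arabic) and the function `switch_points` are hypothetical.

```python
from typing import List


def switch_points(labels: List[str]) -> List[int]:
    """Return each index i where labels[i] differs from labels[i - 1].

    `labels` is a hypothetical per-token tagging, e.g. "MSA" for Modern
    Standard Arabic and "DA" for dialectal Arabic. Producing these labels
    is the hard part of the task and is not shown here.
    """
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


if __name__ == "__main__":
    # Hypothetical labels for a five-token sentence.
    labels = ["MSA", "MSA", "DA", "DA", "MSA"]
    print(switch_points(labels))  # [2, 4]
```

    Here index 2 marks the first dialectal token after a run of standard Arabic, and index 4 marks the switch back.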

    References

    [1]
    Ferguson, C. A. Diglossia. Word 15 (1959), 325--340.
    [2]
    Solorio, T. and Liu, Y. Part-of-speech Tagging for English-Spanish Code Switched Text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2008.
    [3]
    Manandise, E. and Gdaniec, C. Morphology to the Rescue Redux: Resolving borrowings and code-mixing in machine translation. In SFCM'11. 2011.
    [4]
    Biadsy, F., Hirschberg, J., and Habash, N. Spoken Arabic Dialect Identification Using Phonotactic Modeling. In Proceedings of the Workshop on Computational Approaches to Semitic Languages at the meeting of the European Association for Computational Linguistics (EACL). (Athens, Greece). 2009.
    [5]
    Zaidan, O. and Callison-Burch, C. The Arabic Online Commentary Dataset: An annotated dataset of informal Arabic with high dialectal content. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). 2011.
    [6]
    Cotterell, R. and Callison-Burch, C. A Multi-Dialect, Multi-Genre Corpus of Informal Written Arabic. In Proceedings of the Language Resources and Evaluation Conference (LREC). Reykjavik, Iceland. 2014.
    [7]
    Salloum, W., Elfardy, H., Alamir-Salloum, L., Habash, N., and Diab, M. Sentence Level Dialect Identification for Machine Translation System Selection. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). 2014.
    [8]
    Habash, N., Diab, M., and Rambow, O. Conventional Orthography for Dialectal Arabic. In Proceedings of the Language Resources and Evaluation Conference (LREC). Istanbul, Turkey. 2012.
    [9]
    Eskander, R., Habash, N., Rambow, O., and Tomeh, N. Processing Spontaneous Orthography. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Atlanta, GA. 2013.
    [10]
    Stolcke, A. SRILM: An Extensible Language Modeling Toolkit. In Proceedings of the International Conference on Spoken Language Processing. 2002.
    [11]
    Elfardy, H., Al-Badrashiny, M., and Diab, M. Code Switch Point Detection in Arabic. In Proceedings of the 18th International Conference on Application of Natural Language to Information Systems (NLDB2013). MediaCity, UK. 2013.
    [12]
    Elfardy, H. and Diab, M. T. Sentence Level Dialect Identification in Arabic. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). 2013.
    [13]
    Habash, N., Eskander, R., and Hawwari, A. A Morphological Analyzer for Egyptian Arabic. In Proceedings of the NAACL-HLT 2012 Workshop on Computational Morphology and Phonology (SIGMORPHON2012). 2012.
    [14]
    Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. The WEKA Data Mining Software: An Update. ACM SIGKDD Explorations Newsletter 11, 1 (2009), 10--18.

    Cited By

    • (2020) Compression versus traditional machine learning classifiers to detect code-switching in varieties and dialects: Arabic as a case study. Natural Language Engineering, 1--14. DOI: 10.1017/S135132492000011X. Online publication date: 5 May 2020.

    Information

    Published In

    XRDS: Crossroads, The ACM Magazine for Students, Volume 21, Issue 1
    Issue theme: Natural Language
    Fall 2014
    65 pages
    ISSN:1528-4972
    EISSN:1528-4980
    DOI:10.1145/2677339
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published in XRDS Volume 21, Issue 1

    Qualifiers

    • Research-article
    • Popular
    • Refereed

    Bibliometrics

    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 12 Aug 2024
