AMRITA_CEN@FIRE-2014: Named Entity Recognition for Indian Languages using Rich Features

Research article
Published: 05 December 2014
DOI: 10.1145/2824864.2824882

    Abstract

    This paper implements Named Entity Recognition (NER) for four languages: English, Tamil, Hindi and Malayalam. The results of this work were submitted to the Forum for Information Retrieval Evaluation (FIRE 2014), a research evaluation workshop. The system detects three levels of named entity tags, referred to as nested named entities; this is a multi-label problem, which is solved using the chain classifier method. Conditional Random Fields (CRF) and Support Vector Machines (SVM) are used to implement the NER system. For FIRE 2014, we submitted an English NER system based on CRF, while the NER systems for Tamil, Hindi and Malayalam were based on SVM. FIRE reported an average precision over all four languages of 41.93 at the outermost level and 33.25 at the inner level. To improve performance on the Indian languages, we then implemented a CRF-based NER system for Tamil, Hindi and Malayalam on the same corpus. The average precision for these three languages is 42.87 at the outer level and 36.31 at the inner level, so the overall performance of the NER system improved by 2.24% (relative) at the outer level and 9.20% at the inner level.
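The abstract does not spell out how the chain classifier handles the three tag levels. The following is only a minimal sketch of the general classifier-chain idea, assuming that the classifier for each level receives the token features plus the tags already assigned at the earlier (outer) levels, so that an inner tag can depend on the outer one. The names `MajorityLookup` and `ClassifierChain`, the (word, previous word) features and the tag names are all hypothetical; the toy per-token lookup learner merely stands in for the CRF or SVM used in the paper and models no sequence dependencies.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Features = Tuple[Hashable, ...]


class MajorityLookup:
    """Hypothetical toy base learner standing in for a CRF/SVM.

    Predicts the most frequent label seen with an exact feature tuple and
    falls back to a default label ("O", outside any entity) otherwise.
    """

    def __init__(self, default: str = "O") -> None:
        self.default = default
        self.table: Dict[Features, str] = {}

    def fit(self, X: Sequence[Features], y: Sequence[str]) -> "MajorityLookup":
        counts: Dict[Features, Counter] = defaultdict(Counter)
        for x, label in zip(X, y):
            counts[x][label] += 1
        self.table = {x: c.most_common(1)[0][0] for x, c in counts.items()}
        return self

    def predict(self, X: Sequence[Features]) -> List[str]:
        return [self.table.get(x, self.default) for x in X]


class ClassifierChain:
    """One classifier per nesting level; level k also sees levels 0..k-1."""

    def __init__(self, n_levels: int) -> None:
        self.models = [MajorityLookup() for _ in range(n_levels)]

    def fit(self, X: Sequence[Features],
            Y: Sequence[Tuple[str, ...]]) -> "ClassifierChain":
        # Y[i] holds the tags of token i, from the outermost level inwards.
        for k, model in enumerate(self.models):
            # Training: append the gold tags of the earlier levels as features.
            X_k = [tuple(x) + tuple(y[:k]) for x, y in zip(X, Y)]
            model.fit(X_k, [y[k] for y in Y])
        return self

    def predict(self, X: Sequence[Features]) -> List[Tuple[str, ...]]:
        preds: List[List[str]] = [[] for _ in X]
        for model in self.models:
            # Prediction: append the chain's own earlier predictions instead.
            X_k = [tuple(x) + tuple(p) for x, p in zip(X, preds)]
            for p, label in zip(preds, model.predict(X_k)):
                p.append(label)
        return [tuple(p) for p in preds]


if __name__ == "__main__":
    # Hypothetical toy data: features are (word, previous word); tags are
    # (outer, inner). "india" inside an organisation name carries an inner
    # LOC tag, while a standalone "india" is tagged LOC at the outer level.
    X = [("state", "<s>"), ("bank", "state"), ("of", "bank"),
         ("india", "of"), ("branch", "india"), ("india", "<s>"), ("won", "india")]
    Y = [("B-ORG", "O"), ("I-ORG", "O"), ("I-ORG", "O"), ("I-ORG", "B-LOC"),
         ("O", "O"), ("B-LOC", "O"), ("O", "O")]
    chain = ClassifierChain(n_levels=2).fit(X, Y)
    print(chain.predict([("of", "bank"), ("india", "of")]))
    # prints [('I-ORG', 'O'), ('I-ORG', 'B-LOC')]
```

Training each level on the gold outer tags while predicting from the chain's own earlier outputs is the usual classifier-chain trade-off: it keeps each level a plain single-label problem, but an error at an outer level can propagate to the inner levels.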

      Published In

      ACM Other conferences
      FIRE '14: Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation
      December 2014
      151 pages
      ISBN:9781450337557
      DOI:10.1145/2824864
      Editors: Prasenjit Majumder, Mandar Mitra, Sukomal Pal, Madhulika Agrawal, Parth Mehta
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Conditional Random Fields (CRF)
      2. Named Entity Recognition (NER)
      3. Natural Language Processing (NLP)
      4. Support Vector Machine (SVM)

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      FIRE '14
      FIRE '14: Forum for Information Retrieval Evaluation
      December 5 - 7, 2014
      Bangalore, India

      Acceptance Rates

      Overall Acceptance Rate 19 of 64 submissions, 30%

      Cited By

      • (2024) Hindi MWE Detection by Learning Phraseology from Corpora. SN Computer Science 5:6. DOI: 10.1007/s42979-024-03088-6. Online publication date: 10-Aug-2024.
      • (2023) AGRONER. Expert Systems with Applications: An International Journal 229:PA. DOI: 10.1016/j.eswa.2023.120440. Online publication date: 13-Jul-2023.
      • (2023) Tamil NLP Technologies: Challenges, State of the Art, Trends and Future Scope. Speech and Language Technologies for Low-Resource Languages, pp. 73-98. DOI: 10.1007/978-3-031-33231-9_6. Online publication date: 29-May-2023.
      • (2022) Towards Malay named entity recognition: an open-source dataset and a multi-task framework. Connection Science 35:1. DOI: 10.1080/09540091.2022.2159014. Online publication date: 28-Dec-2022.
      • (2022) Review Based on Named Entity Recognition for Hindi Language Using Machine Learning Approach. Proceedings of Second International Conference in Mechanical and Energy Technology, pp. 333-340. DOI: 10.1007/978-981-19-0108-9_35. Online publication date: 27-Jun-2022.
      • (2019) Extraction of Named Entities from Social Media Text in Tamil Language Using N-Gram Embedding for Disaster Management. Nature-Inspired Computation in Data Mining and Machine Learning, pp. 207-223. DOI: 10.1007/978-3-030-28553-1_10. Online publication date: 4-Sep-2019.
      • (2018) Towards Generalizable Place Name Recognition Systems. Proceedings of the 12th Workshop on Geographic Information Retrieval, pp. 1-10. DOI: 10.1145/3281354.3281363. Online publication date: 6-Nov-2018.
      • (2018) Entity Extraction of Hindi-English and Tamil-English Code-Mixed Social Media Text. Text Processing, pp. 206-218. DOI: 10.1007/978-3-319-73606-8_16. Online publication date: 4-Jan-2018.
