AMRITA_CEN@FIRE-2014: Named Entity Recognition for Indian Languages using Rich Features

Authors:

Barathi H. B. Ganesh,

Anand M. Kumar,

K. P. Soman

FIRE '14: Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation

Pages 103 - 111

https://doi.org/10.1145/2824864.2824882

Published: 05 December 2014

Abstract

This paper presents a Named Entity Recognition (NER) system for four languages: English, Tamil, Hindi and Malayalam. The results of this work were submitted to the Forum for Information Retrieval Evaluation (FIRE 2014) research evaluation workshop. The system detects three levels of named entity tags, referred to as nested named entities. This is a multi-label problem, which is solved using the chain classifier method. Conditional Random Fields (CRF) and Support Vector Machines (SVM) are used to implement the NER system. For FIRE 2014, we developed an English NER system using CRF, while the NER systems for Tamil, Hindi and Malayalam were based on SVM. FIRE reported an average precision across all four languages of 41.93 for the outermost level and 33.25 for the inner level. To improve performance on the Indian languages, we implemented a CRF-based NER system on the same corpus for Tamil, Hindi and Malayalam. The average precision for these languages is 42.87 for the outer level and 36.31 for the inner level. The overall performance of the NER system improved by 2.24% at the outer level and 9.20% at the inner level.
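
The improvement figures in the abstract appear to be relative gains over the FIRE 2014 scores rather than differences in precision points. The short check below recomputes them from the four average precision values quoted above. Reading the percentages as relative gains is an assumption, and the function name `relative_improvement` is hypothetical; it is not taken from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the change from baseline to improved as a percentage of baseline."""
    return (improved - baseline) / baseline * 100.0


# Average precision values quoted in the abstract:
# (FIRE 2014 submission, later CRF-based system).
scores = {
    "outer": (41.93, 42.87),
    "inner": (33.25, 36.31),
}

for level, (fire_run, crf_run) in scores.items():
    gain = relative_improvement(fire_run, crf_run)
    print(f"{level} level: {gain:.2f}% relative improvement")
```

Under this reading, the computed values round to the 2.24% and 9.20% stated in the abstract, whereas the absolute differences are 0.94 and 3.06 precision points.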

Index Terms

AMRITA_CEN@FIRE-2014: Named Entity Recognition for Indian Languages using Rich Features
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published In

FIRE '14: Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation

December 2014

151 pages

ISBN:9781450337557

DOI:10.1145/2824864

Editors:
Prasenjit Majumder, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India
Mandar Mitra, Indian Statistical Institute, Kolkata, India
Sukomal Pal, Indian School of Mines, Dhanbad
Madhulika Agrawal, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India
Parth Mehta, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

FIRE '14

FIRE '14: Forum for Information Retrieval Evaluation

December 5 - 7, 2014

Bangalore, India

Acceptance Rates

Overall Acceptance Rate 19 of 64 submissions, 30%

Cited By

Mishra A, Shaikh S (2024). Hindi MWE Detection by Learning Phraseology from Corpora. SN Computer Science, 5:6. Online publication date: 10-Aug-2024.
https://doi.org/10.1007/s42979-024-03088-6
G. V, Kanjirangat V, Gupta D (2023). AGRONER. Expert Systems with Applications: An International Journal, 229:PA. Online publication date: 13-Jul-2023.
https://dl.acm.org/doi/10.1016/j.eswa.2023.120440
Rajendran S, Anand Kumar M, Rajalakshmi R, Dhanalakshmi V, Balasubramanian P, Soman K (2023). Tamil NLP Technologies: Challenges, State of the Art, Trends and Future Scope. Speech and Language Technologies for Low-Resource Languages, pages 73-98. Online publication date: 29-May-2023.
https://doi.org/10.1007/978-3-031-33231-9_6
Fu Y, Lin N, Yang Z, Jiang S (2022). Towards Malay named entity recognition: an open-source dataset and a multi-task framework. Connection Science, 35:1. Online publication date: 28-Dec-2022.
https://doi.org/10.1080/09540091.2022.2159014
Shelke R, Vanjale S (2022). Review Based on Named Entity Recognition for Hindi Language Using Machine Learning Approach. Proceedings of Second International Conference in Mechanical and Energy Technology, pages 333-340. Online publication date: 27-Jun-2022.
https://doi.org/10.1007/978-981-19-0108-9_35
Devi G, Kumar M, Soman K (2019). Extraction of Named Entities from Social Media Text in Tamil Language Using N-Gram Embedding for Disaster Management. Nature-Inspired Computation in Data Mining and Machine Learning, pages 207-223. Online publication date: 4-Sep-2019.
https://doi.org/10.1007/978-3-030-28553-1_10
Akdemir A, Hürriyetoğlu A, Yörük E, Gürel B, Yoltar Ç, Yüret D (2018). Towards Generalizable Place Name Recognition Systems. Proceedings of the 12th Workshop on Geographic Information Retrieval, pages 1-10. Online publication date: 6-Nov-2018.
https://dl.acm.org/doi/10.1145/3281354.3281363
Remmiya Devi G, Veena P, Anand Kumar M, Soman K (2018). Entity Extraction of Hindi-English and Tamil-English Code-Mixed Social Media Text. Text Processing, pages 206-218. Online publication date: 4-Jan-2018.
https://doi.org/10.1007/978-3-319-73606-8_16
