research-article

Exploring Bilingual Word Vectors for Hindi-English Cross-Language Information Retrieval

Authors: Vijay Kumar Sharma, Namita Mittal

ICIA-16: Proceedings of the International Conference on Informatics and Analytics

Article No.: 28, Pages 1 - 4

https://doi.org/10.1145/2980258.2980310

Published: 25 August 2016

Editorial Notes

NOTICE OF CONCERN: ACM has received evidence that casts doubt on the integrity of the peer review process for the ICIA 2016 Conference. As a result, ACM is issuing a Notice of Concern for all papers published and strongly suggests that the papers from this Conference not be cited in the literature until ACM's investigation has concluded and final decisions have been made regarding the integrity of the peer review process for this Conference.

Abstract

Today, the internet has become a source of multilingual content. Most users do not know multiple languages, so language diversity is a major barrier to worldwide communication. Cross-Language Information Retrieval (CLIR) addresses this barrier by letting a user search in his or her native language and retrieve relevant information in the required language. Distributed word vector representations are currently popular in various Natural Language Processing (NLP) tasks, where they are used to identify contextually similar words. In this paper, we analyze the effectiveness of word vectors across languages in Hindi-English CLIR. The Skip-Gram Model (SGM) is used to learn bilingual word vectors from a sentence-aligned comparable corpus. An IBM model is used to align source-language and target-language words from the sentence-aligned comparable corpus. The best target-language translation is selected with the help of the top-n word alignments and the word vectors.
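The abstract does not spell out how the top-n word alignments and the word vectors are combined, so the following is only a minimal sketch under stated assumptions. It assumes that an alignment model has already produced, for each source word, candidate translations with alignment probabilities, and that source and target words share one vector space. The function `select_translation`, the toy tables `ALIGNMENTS` and `VECTORS`, and the cosine-based re-ranking rule are all hypothetical illustrations, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_translation(source_word, alignments, vectors, top_n=3):
    """Pick one translation from the top-n aligned candidates.

    Assumed rule (not taken from the paper): keep the n candidates with the
    highest alignment probability, then choose the one whose vector is most
    similar to the source word's vector.
    """
    ranked = sorted(
        alignments.get(source_word, {}).items(),
        key=lambda item: item[1],
        reverse=True,
    )[:top_n]
    if not ranked:
        return None  # no aligned candidates for this word
    source_vec = vectors.get(source_word)
    if source_vec is None:
        return ranked[0][0]  # fall back to the most probable alignment

    def similarity(item):
        candidate_vec = vectors.get(item[0])
        return cosine(source_vec, candidate_vec) if candidate_vec is not None else -1.0

    return max(ranked, key=similarity)[0]


# Toy data, invented for illustration: "nadi" is a transliterated Hindi word.
ALIGNMENTS = {"nadi": {"the": 0.5, "river": 0.4, "stream": 0.1}}
VECTORS = {
    "nadi": [0.9, 0.1],
    "the": [0.5, 0.5],
    "river": [0.8, 0.2],
    "stream": [0.7, 0.4],
}

best = select_translation("nadi", ALIGNMENTS, VECTORS, top_n=3)
```

In this toy setup the alignment table alone would rank "the" first, while the vector similarity step prefers "river"; with `top_n=1` the function simply returns the most probable alignment.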

Cited By

Sharma V, Mittal N, and Vidyarthi A. (2021). Semantic morphological variant selection and translation disambiguation for cross-lingual information retrieval. Multimedia Tools and Applications, 82:6, 8197-8212. Online publication date: 11-Jun-2021.
https://dl.acm.org/doi/10.1007/s11042-021-11074-w

Sharma V, Mittal N, and Vidyarthi A. (2020). Context-based Translation for the Out of Vocabulary Words Applied to Hindi-English Cross-Lingual Information Retrieval. IETE Technical Review, 39:2, 276-285. Online publication date: 26-Nov-2020.
https://doi.org/10.1080/02564602.2020.1843553

Sharma V, and Mittal N. (2018). A Comparative Study of Online Resources for Extracting Target Language Translation. Recent Findings in Intelligent Computing Techniques, 95-101. Online publication date: 4-Nov-2018.
https://doi.org/10.1007/978-981-10-8633-5_10

Information

Published In

ICIA-16: Proceedings of the International Conference on Informatics and Analytics

August 2016

868 pages

ISBN: 9781450347563

DOI: 10.1145/2980258

Copyright © 2016 ACM.

© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICIA-16

ICIA-16: International Conference on Informatics and Analytics

August 25 - 26, 2016

Pondicherry, India
