DOI: 10.1145/2980258.2980310
research-article

Exploring Bilingual Word Vectors for Hindi-English Cross-Language Information Retrieval

Published: 25 August 2016

Editorial Notes

NOTICE OF CONCERN: ACM has received evidence that casts doubt on the integrity of the peer review process for the ICIA 2016 Conference. As a result, ACM is issuing a Notice of Concern for all papers published and strongly suggests that the papers from this Conference not be cited in the literature until ACM's investigation has concluded and final decisions have been made regarding the integrity of the peer review process for this Conference.

Abstract

Today, the internet is a source of multilingual content. Many users do not know more than one language, so language diversity is a major barrier to worldwide communication. Cross-Language Information Retrieval (CLIR) addresses this barrier: a user can search in their native language and retrieve relevant information in the required language. Distributed word vector representations are currently widely used across Natural Language Processing (NLP) tasks, where they help identify words that occur in similar contexts. In this paper, we analyze the effectiveness of cross-language word vectors for Hindi-English CLIR. The Skip-Gram Model (SGM) is used to learn bilingual word vectors from a sentence-aligned comparable corpus, and an IBM model is used to align source-language and target-language words in the same corpus. The best target-language translation is then selected using the top-n word alignments together with the word vectors.
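The abstract names three components: Skip-Gram training, IBM-model word alignment, and choosing a translation from the top-n alignments with the help of word vectors. The Python sketch below is a minimal, hedged illustration of how such pieces can fit together; it is not the authors' implementation. Assumptions: the abstract does not say which IBM model was used, so IBM Model 1 (estimated with EM, without a NULL source word) is assumed; the rule for combining alignments and vectors is not given, so the hypothetical `best_translation` simply re-ranks the top-n candidates by cosine similarity; the romanized sentence pairs `PAIRS` and the 3-dimensional `VECTORS` are invented toy data. In practice the vectors would be learned with a Skip-Gram model (for example gensim's `Word2Vec` with `sg=1`); `skipgram_pairs` only shows the (centre, context) pairs that Skip-Gram trains on. Merging an aligned Hindi sentence with its English counterpart into one token sequence is one common way to obtain a shared bilingual space, and is likewise only an assumption here.

```python
import math
from collections import defaultdict

# Hypothetical toy data: romanized Hindi-English sentence pairs (not from the paper).
PAIRS = [
    (["bada", "ghar"], ["big", "house"]),
    (["ghar"], ["house"]),
    (["bada", "aadmi"], ["big", "man"]),
]

# Hypothetical vectors in a shared bilingual space; in practice these would be
# learned with a Skip-Gram model rather than written by hand.
VECTORS = {
    "ghar": [0.9, 0.1, 0.0], "house": [0.8, 0.2, 0.1],
    "bada": [0.1, 0.9, 0.0], "big": [0.2, 0.8, 0.1],
    "aadmi": [0.0, 0.1, 0.9], "man": [0.1, 0.0, 0.8],
}


def skipgram_pairs(tokens, window=2):
    """(centre, context) pairs that a Skip-Gram model learns to predict."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((centre, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def ibm_model1(pairs, iterations=10):
    """Estimate t(e | h) with IBM Model 1 EM (assumed variant; no NULL word)."""
    e_vocab = {e for _, es in pairs for e in es}
    t = defaultdict(lambda: 1.0 / len(e_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, h)
        total = defaultdict(float)  # expected counts c(h)
        for hs, es in pairs:
            for e in es:
                # E-step: share e's count among the source words that could produce it.
                z = sum(t[(e, h)] for h in hs)
                for h in hs:
                    frac = t[(e, h)] / z
                    count[(e, h)] += frac
                    total[h] += frac
        # M-step: t(e | h) = c(e, h) / c(h)
        t = defaultdict(float, {(e, h): c / total[h] for (e, h), c in count.items()})
    return t


def top_n_translations(t, h, n):
    """Top-n English candidates for Hindi word h, ranked by t(e | h)."""
    cands = [(e, p) for (e, src), p in t.items() if src == h]
    return sorted(cands, key=lambda c: (-c[1], c[0]))[:n]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_translation(h, t, vectors, n=3):
    """Hypothetical selection rule: re-rank top-n alignments by cosine similarity."""
    cands = top_n_translations(t, h, n)
    return max(cands, key=lambda c: cosine(vectors[h], vectors[c[0]]))[0]


if __name__ == "__main__":
    t = ibm_model1(PAIRS)
    print(best_translation("ghar", t, VECTORS, n=2))  # prints house
```

Restricting the vector comparison to the top-n alignment candidates keeps the alignment model as a filter, so the vectors only decide among words the corpus already links to the source term; other weightings (for example multiplying t(e | h) by the cosine score) are equally plausible readings of the abstract.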


Cited By

  • (2021) Semantic morphological variant selection and translation disambiguation for cross-lingual information retrieval. Multimedia Tools and Applications 82(6), 8197-8212. DOI: 10.1007/s11042-021-11074-w. Online publication date: 11 June 2021.
  • (2020) Context-based Translation for the Out of Vocabulary Words Applied to Hindi-English Cross-Lingual Information Retrieval. IETE Technical Review 39(2), 276-285. DOI: 10.1080/02564602.2020.1843553. Online publication date: 26 November 2020.
  • (2018) A Comparative Study of Online Resources for Extracting Target Language Translation. Recent Findings in Intelligent Computing Techniques, 95-101. DOI: 10.1007/978-981-10-8633-5_10. Online publication date: 4 November 2018.


Information

Published In

ACM Other conferences
ICIA-16: Proceedings of the International Conference on Informatics and Analytics
August 2016
868 pages
ISBN:9781450347563
DOI:10.1145/2980258
© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 August 2016


Author Tags

  1. CLIR
  2. Comparable Corpus
  3. Contextual learning
  4. Skip-Gram Model
  5. Word-Embedding

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICIA-16

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 25 January 2025.
