DOI: 10.5555/2145432.2145526
research-article

Analyzing methods for improving precision of pivot based bilingual dictionaries

Published: 27 July 2011

Abstract

An A-C bilingual dictionary can be inferred by merging A-B and B-C dictionaries using B as the pivot. However, polysemous pivot words often produce wrong translation candidates. This paper analyzes two methods for pruning wrong candidates: one based on exploiting the structure of the source dictionaries, and the other based on distributional similarity computed from comparable corpora. As both methods depend exclusively on easily available resources, they are well suited to less-resourced languages. We studied whether these two techniques complement each other, given that they are based on different paradigms. We also investigated how to combine them, looking for the combination best suited to each of several application scenarios.
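
To make the pivot problem concrete, the following is a minimal sketch of merging two dictionaries through a pivot language and then pruning candidates by how many pivot words support them. The function names, the `min_pivots` threshold, and the Spanish-English-French toy entries are assumptions made for illustration; counting shared pivot words is only one possible structure-based criterion and is not necessarily the method evaluated in the paper.

```python
from collections import defaultdict


def merge_via_pivot(a_to_b, b_to_c):
    """Return {a_word: {c_word: set of pivot words linking the pair}}."""
    candidates = defaultdict(lambda: defaultdict(set))
    for a_word, pivots in a_to_b.items():
        for pivot in pivots:
            # Every C translation of the pivot becomes a candidate for a_word.
            for c_word in b_to_c.get(pivot, ()):
                candidates[a_word][c_word].add(pivot)
    return candidates


def prune_by_shared_pivots(candidates, min_pivots=2):
    """Keep only candidates reached through at least min_pivots pivot words."""
    kept = {}
    for a_word, c_words in candidates.items():
        survivors = {c for c, pivots in c_words.items() if len(pivots) >= min_pivots}
        if survivors:
            kept[a_word] = survivors
    return kept


# Invented toy dictionaries: Spanish -> English (pivot) -> French.
# "bank" is polysemous (financial institution / river bank).
es_en = {
    "banco": {"bank", "bench"},
    "orilla": {"bank", "shore"},
}
en_fr = {
    "bank": {"banque", "rive"},
    "bench": {"banc"},
    "shore": {"rive", "rivage"},
}

candidates = merge_via_pivot(es_en, en_fr)
pruned = prune_by_shared_pivots(candidates, min_pivots=2)

for a_word, c_words in sorted(candidates.items()):
    for c_word, pivots in sorted(c_words.items()):
        print(a_word, "->", c_word, "via", sorted(pivots))
print("after pruning:", pruned)
```

In this toy data the polysemous pivot "bank" yields the wrong pairs banco-rive and orilla-banque. The threshold removes both, but it also discards plausible pairs supported by a single pivot word (banco-banque, orilla-rivage), which illustrates the precision/recall trade-off that motivates comparing and combining pruning methods.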

Cited By

  • (2017) A Generalized Constraint Approach to Bilingual Dictionary Induction for Low-Resource Language Families. ACM Transactions on Asian and Low-Resource Language Information Processing 17:2 (1-29). DOI: 10.1145/3138815. Online publication date: 13-Nov-2017
  • (2015) A Constraint Approach to Pivot-Based Bilingual Dictionary Induction. ACM Transactions on Asian and Low-Resource Language Information Processing 15:1 (1-26). DOI: 10.1145/2723144. Online publication date: 21-Nov-2015
  • (2012) Regularized interlingual projections. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (12-23). DOI: 10.5555/2390948.2390951. Online publication date: 12-Jul-2012

Published In

EMNLP '11: Proceedings of the Conference on Empirical Methods in Natural Language Processing
July 2011
1647 pages
ISBN: 9781937284114

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 73 of 234 submissions, 31%
