A geometric view on bilingual lexicon extraction from comparable corpora

Published: 21 July 2004

Abstract

We present a geometric view on bilingual lexicon extraction from comparable corpora, which allows us to reinterpret the methods proposed so far and to identify unresolved problems. This view motivates three new methods that aim to solve these problems. Empirical evaluation shows the strengths and weaknesses of these methods, as well as a significant gain in the accuracy of the extracted lexicons.

Published In

ACL '04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
July 2004
729 pages

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%
