DOI: 10.3115/976909.979680
Article

A word-to-word model of translational equivalence

Published: 07 July 1997

Abstract

Many multilingual NLP applications need to translate words between different languages but cannot afford the computational expense of inducing or applying a full translation model. For these applications, we have designed a fast algorithm for estimating a partial translation model, which accounts for translational equivalence only at the word level. The model's precision/recall trade-off can be directly controlled via one threshold parameter. This feature makes the model more suitable for applications that are not fully statistical. The model's hidden parameters can be easily conditioned on information extrinsic to the model, providing a simple way to integrate pre-existing knowledge such as part of speech, dictionaries, word order, etc. Our model can link word tokens in parallel texts as well as other translation models in the literature can. Unlike other translation models, it can automatically produce dictionary-sized translation lexicons, and it can do so with over 99% accuracy.
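The abstract does not spell out the estimation procedure. As a rough illustration only, the Python sketch below shows how a word-level model with a single threshold could both link tokens in aligned segments and emit a translation lexicon. It seeds word-pair scores with the Dice coefficient, links tokens greedily and one-to-one within each aligned segment, and then re-scores each pair by the fraction of its co-occurrences that were linked. The function names (`dice_scores`, `link_segment`, `induce_lexicon`), the Dice initialization, and the re-scoring rule are all assumptions made for this example. They are not the paper's published algorithm or its likelihood model.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical illustration; not the paper's estimation procedure.
Pair = Tuple[str, str]
Bitext = Sequence[Tuple[List[str], List[str]]]


def dice_scores(bitext: Bitext) -> Tuple[Dict[Pair, float], Counter]:
    """Initial score per word-type pair: Dice coefficient over aligned segments."""
    src_freq, tgt_freq, cooc = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        src_types, tgt_types = set(src), set(tgt)
        src_freq.update(src_types)  # segments containing each source type
        tgt_freq.update(tgt_types)  # segments containing each target type
        cooc.update((s, t) for s in src_types for t in tgt_types)
    scores = {(s, t): 2.0 * n / (src_freq[s] + tgt_freq[t]) for (s, t), n in cooc.items()}
    return scores, cooc


def link_segment(src: Sequence[str], tgt: Sequence[str],
                 score: Dict[Pair, float], threshold: float) -> List[Tuple[int, int]]:
    """Greedy one-to-one token linking: best-scoring pairs above `threshold` first."""
    candidates = sorted(
        (
            (score.get((s, t), 0.0), i, j)
            for i, s in enumerate(src)
            for j, t in enumerate(tgt)
            if score.get((s, t), 0.0) > threshold
        ),
        key=lambda c: (-c[0], c[1], c[2]),
    )
    used_src, used_tgt, links = set(), set(), []
    for _, i, j in candidates:
        if i not in used_src and j not in used_tgt:  # each token linked at most once
            used_src.add(i)
            used_tgt.add(j)
            links.append((i, j))
    return sorted(links)


def induce_lexicon(bitext: Bitext, threshold: float, iterations: int = 3) -> Dict[Pair, float]:
    """Alternate token linking and type re-scoring; keep pairs scoring above `threshold`."""
    score, cooc = dice_scores(bitext)
    for _ in range(iterations):
        linked = Counter()
        for src, tgt in bitext:
            for i, j in link_segment(src, tgt, score, threshold=0.0):
                linked[(src[i], tgt[j])] += 1
        # Assumed re-scoring rule: fraction of a pair's co-occurrences that were linked.
        score = {pair: linked[pair] / n for pair, n in cooc.items()}
    return {pair: s for pair, s in score.items() if s > threshold}


if __name__ == "__main__":
    bitext = [
        (["the", "house"], ["la", "maison"]),
        (["the", "car"], ["la", "voiture"]),
        (["a", "house"], ["une", "maison"]),
    ]
    lexicon = induce_lexicon(bitext, threshold=0.5)
    print(sorted(lexicon))
    # [('a', 'une'), ('car', 'voiture'), ('house', 'maison'), ('the', 'la')]
    print(link_segment(["the", "house"], ["la", "maison"], lexicon, threshold=0.5))
    # [(0, 0), (1, 1)]
```

Raising `threshold` removes weaker pairs from both the lexicon and the token links, which trades recall for precision. In this toy example, a threshold of 1.0 (with the strict comparison used here) returns an empty lexicon.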




Published In

ACL '97/EACL '97: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
July 1997
543 pages

Sponsors

  • Directorate General XIII (European Commission)
  • Universidad Complutense de Madrid
  • Universidad Autónoma de Madrid
  • Universidad Nacional de Educación a Distancia
  • Universidad Politécnica de Madrid

Publisher

Association for Computational Linguistics

United States


Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%


Article Metrics

  • Downloads (last 12 months): 40
  • Downloads (last 6 weeks): 10
Reflects downloads up to 25 Dec 2024

Cited By

  • (2015) A Constraint Approach to Pivot-Based Bilingual Dictionary Induction. ACM Transactions on Asian and Low-Resource Language Information Processing 15:1 (1-26). DOI: 10.1145/2723144. Online publication date: 21-Nov-2015
  • (2010) Improving corpus comparability for bilingual lexicon extraction from comparable corpora. Proceedings of the 23rd International Conference on Computational Linguistics (644-652). DOI: 10.5555/1873781.1873854. Online publication date: 23-Aug-2010
  • (2009) Compiling a massive, multilingual dictionary via probabilistic inference. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 (262-270). DOI: 10.5555/1687878.1687917. Online publication date: 2-Aug-2009
  • (2009) A new objective function for word alignment. Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing (28-35). DOI: 10.5555/1611638.1611642. Online publication date: 4-Jun-2009
  • (2009) Improving the extraction of bilingual terminology from Wikipedia. ACM Transactions on Multimedia Computing, Communications, and Applications 5:4 (1-17). DOI: 10.1145/1596990.1596995. Online publication date: 6-Nov-2009
  • (2008) Brains, not brawn. ACM Transactions on Speech and Language Processing 7:1 (1-23). DOI: 10.1145/1839478.1839479. Online publication date: 4-Oct-2008
  • (2006) Constraining the phrase-based, joint probability statistical translation model. Proceedings of the Workshop on Statistical Machine Translation (154-157). DOI: 10.5555/1654650.1654675. Online publication date: 8-Jun-2006
  • (2006) Sub-sentential alignment using substring co-occurrence counts. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop (13-18). DOI: 10.5555/1557856.1557860. Online publication date: 20-Jul-2006
  • (2006) Using natural alignment to extract translation equivalents. Proceedings of the 7th International Conference on Computational Processing of the Portuguese Language (41-49). DOI: 10.1007/11751984_5. Online publication date: 13-May-2006
  • (2005) Data driven approaches to speech and language processing. Nonlinear Speech Modeling and Applications (164-198). DOI: 10.5555/2167540.2167551. Online publication date: 1-Jan-2005
