Article (DOI: 10.5555/1802514.1802552)

An approach for extracting bilingual terminology from Wikipedia

Published: 19 March 2008

Abstract

With growing demand for bilingual dictionaries that cover domain-specific terminology, automatic dictionary extraction has become an active research area. However, dictionaries built from bilingual text corpora often lack sufficient accuracy and coverage for domain-specific terms. We therefore present an approach to extracting bilingual dictionaries from the link structure of Wikipedia, a large-scale encyclopedia that contains a vast number of links between articles in different languages. Our methods analyze these interlanguage links and also extract additional translation candidates from redirect pages and link text. In an experiment, we demonstrated the advantages of our methods over a traditional approach that extracts bilingual terminology from parallel corpora.
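
The three link sources named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name, the `min_count` threshold, the in-memory dictionaries, and the toy English-to-German data are all assumptions made for illustration, and the abstract does not say how candidates are filtered or ranked.

```python
from collections import Counter

# Toy data (hypothetical, for illustration only).
# Interlanguage links: source-language title -> target-language title.
LANGLINKS = {"Car": "Automobil"}
# Redirect pages in the target language: article title -> titles redirecting to it.
REDIRECTS_TO = {"Automobil": {"Kraftwagen", "Auto"}}
# Link texts in the target language: article title -> counts of anchor texts used.
ANCHOR_TEXTS = {"Automobil": Counter({"Automobil": 10, "Autos": 3, "PKW": 1})}


def translation_candidates(term, langlinks, redirects_to, anchor_texts, min_count=2):
    """Collect target-language translation candidates for a source article title.

    min_count is an assumed noise filter on anchor texts, not a value from the paper.
    """
    target = langlinks.get(term)
    if target is None:
        return []  # no interlanguage link, so no candidates in this sketch

    # 1. The interlanguage link target is the primary candidate.
    candidates = [target]

    # 2. Titles of redirect pages pointing at the target article.
    for alias in sorted(redirects_to.get(target, ())):
        if alias not in candidates:
            candidates.append(alias)

    # 3. Anchor texts used for links to the target article, most frequent first.
    for text, count in anchor_texts.get(target, Counter()).most_common():
        if count >= min_count and text not in candidates:
            candidates.append(text)

    return candidates


print(translation_candidates("Car", LANGLINKS, REDIRECTS_TO, ANCHOR_TEXTS))
```

In this toy setup the rare anchor text "PKW" is dropped by the assumed frequency threshold, while redirect titles are accepted unconditionally; a real system would need its own filtering and scoring choices.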

Published In

DASFAA'08: Proceedings of the 13th international conference on Database systems for advanced applications
March 2008
713 pages
ISBN: 3540785671
Editors: Jayant R. Haritsa, Ramamohanarao Kotagiri, Vikram Pudi

Sponsors

  • Google Inc.
  • Tata Consultancy Services
  • Persistent Systems
  • Yahoo!
  • Great Software Laboratory

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

  • Article

Cited By

  • (2019) Matching Graph, a Method for Extracting Parallel Information from Comparable Corpora. ACM Transactions on Asian and Low-Resource Language Information Processing 19:1 (1-29). DOI: 10.1145/3329713. Online publication date: 25-Jul-2019
  • (2016) Not at Home on the Range. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (13-25). DOI: 10.1145/2858036.2858123. Online publication date: 7-May-2016
  • (2014) WikiBrain. Proceedings of The International Symposium on Open Collaboration (1-10). DOI: 10.1145/2641580.2641615. Online publication date: 27-Aug-2014
  • (2013) Chinese terminology extraction using EM-Based transfer learning method. Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I (139-152). DOI: 10.5555/2468221.2468235. Online publication date: 24-Mar-2013
  • (2013) An open-source toolkit for mining Wikipedia. Artificial Intelligence 194 (222-239). DOI: 10.1016/j.artint.2012.06.007. Online publication date: 1-Jan-2013
  • (2012) Using domain-specific and collaborative resources for term translation. Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation (86-94). DOI: 10.5555/2392936.2392950. Online publication date: 12-Jul-2012
  • (2012) Towards building a multilingual semantic network. Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (30-37). DOI: 10.5555/2387636.2387641. Online publication date: 7-Jun-2012
  • (2011) Analyzing methods for improving precision of pivot based bilingual dictionaries. Proceedings of the Conference on Empirical Methods in Natural Language Processing (846-856). DOI: 10.5555/2145432.2145526. Online publication date: 27-Jul-2011
  • (2011) Language-independent context aware query translation using Wikipedia. Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web (145-150). DOI: 10.5555/2024236.2024260. Online publication date: 24-Jun-2011
  • (2009) Cross-lingual alignment and completion of Wikipedia templates. Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (21-29). DOI: 10.5555/1572433.1572437. Online publication date: 4-Jun-2009
