DOI: 10.1145/860435.860499
Article

Automatic transliteration for Japanese-to-English text retrieval

Published: 28 July 2003

    Abstract

    For cross-language information retrieval (CLIR) based on bilingual translation dictionaries, good performance depends on the lexical coverage of the dictionary. This is especially true for language pairs that share few cognates, such as Japanese and English. In this paper, we describe a method for automatically creating and validating candidate Japanese transliterations of English words. A phonetic English dictionary and a set of probabilistic mapping rules are used to generate transliteration candidates automatically. A monolingual Japanese corpus is then used to validate the transliterated terms automatically. We evaluate the extracted English-Japanese transliteration pairs in Japanese-to-English retrieval experiments over the CLEF bilingual test collections. Using our automatically derived extension to a bilingual translation dictionary improves average precision, both before and after pseudo-relevance feedback, with gains ranging from 2.5% to 64.8%.
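
The generate-then-validate idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the rule table `MAPPING_RULES`, its probabilities, the pre-chunked phoneme units, and the corpus counts in `CORPUS_COUNTS` are hypothetical toy values, not the paper's actual rules or data, and scoring a candidate by the product of its rule probabilities is one simple choice rather than the authors' stated model.

```python
from itertools import product
from math import prod

# Hypothetical mapping rules: English phoneme chunk -> [(katakana, probability)].
# Toy values for illustration only; not taken from the paper.
MAPPING_RULES = {
    "K AH M": [("コン", 0.7), ("カン", 0.3)],
    "P Y UW": [("ピュー", 0.9), ("ピュ", 0.1)],
    "T ER": [("ター", 0.6), ("タ", 0.4)],
}

# Hypothetical term frequencies from a monolingual Japanese corpus.
CORPUS_COUNTS = {"コンピュータ": 120, "コンピューター": 80}


def generate_candidates(units, rules):
    """Expand every combination of rule outputs; score by product of probabilities."""
    options = [rules[unit] for unit in units]
    candidates = []
    for combo in product(*options):
        text = "".join(fragment for fragment, _ in combo)
        score = prod(p for _, p in combo)
        candidates.append((text, score))
    # Most probable candidates first.
    return sorted(candidates, key=lambda c: c[1], reverse=True)


def validate(candidates, corpus_counts, min_count=1):
    """Keep only candidates attested at least min_count times in the corpus."""
    return [(t, s) for t, s in candidates if corpus_counts.get(t, 0) >= min_count]


# Pronunciation of "computer", pre-chunked into units (an assumed segmentation).
units = ["K AH M", "P Y UW", "T ER"]
candidates = generate_candidates(units, MAPPING_RULES)
validated = validate(candidates, CORPUS_COUNTS)

for text, score in validated:
    print(f"{text}\t{score:.3f}\tcount={CORPUS_COUNTS[text]}")
```

In this toy setup the rules over-generate (eight candidates), and the corpus check discards the six spellings that never occur in Japanese text, which is the role the abstract assigns to the monolingual corpus.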



    Published In

    SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
    July 2003
    490 pages
    ISBN: 1581136463
    DOI: 10.1145/860435

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. automatic transliteration
    2. cross language information retrieval

    Qualifiers

    • Article

    Conference

    SIGIR03

    Acceptance Rates

    SIGIR '03 Paper Acceptance Rate 46 of 266 submissions, 17%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%


    Cited By

    • (2018) Machine transliteration and transliterated text retrieval: a survey. Sādhanā 43:6. DOI: 10.1007/s12046-018-0828-8. Online publication date: 7-Jun-2018
    • (2014) Query expansion for mixed-script information retrieval. Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, 677-686. DOI: 10.1145/2600428.2609622. Online publication date: 3-Jul-2014
    • (2012) Transliteration mining using large training and test sets. Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 243-252. DOI: 10.5555/2382029.2382061. Online publication date: 3-Jun-2012
    • (2012) Translation of Biomedical Terms by Inferring Rewriting Rules. Machine Learning, 1417-1433. DOI: 10.4018/978-1-60960-818-7.ch514. Online publication date: 2012
    • (2012) Translation techniques in cross-language information retrieval. ACM Computing Surveys 45:1, 1-44. DOI: 10.1145/2379776.2379777. Online publication date: 7-Dec-2012
    • (2012) Analysis of discussion contributions in translated Wikipedia articles. Proceedings of the 4th international conference on Intercultural Collaboration, 57-66. DOI: 10.1145/2160881.2160891. Online publication date: 21-Mar-2012
    • (2012) Translingual Mining from Text Data. Mining Text Data, 323-359. DOI: 10.1007/978-1-4614-3223-4_10. Online publication date: 7-Jan-2012
    • (2011) Improved transliteration mining using graph reinforcement. Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1384-1393. DOI: 10.5555/2145432.2145578. Online publication date: 27-Jul-2011
    • (2010) Finite-state scriptural translation. Proceedings of the 23rd International Conference on Computational Linguistics: Posters, 791-800. DOI: 10.5555/1944566.1944657. Online publication date: 23-Aug-2010
    • (2009) A hybrid model for Urdu Hindi transliteration. Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration, 177-185. DOI: 10.5555/1699705.1699746. Online publication date: 7-Aug-2009
