Comparing cross-language query expansion techniques by degrading translation resources

Published: 11 August 2002

Abstract

The quality of translation resources is arguably the most important factor affecting the performance of a cross-language information retrieval system. While many investigations have explored the use of query expansion techniques to combat errors induced by translation, no study has yet examined the effectiveness of these techniques across resources of varying quality. This paper presents results using parallel corpora and bilingual wordlists that have been deliberately degraded prior to query translation. Across different languages, translingual resources, and degrees of resource degradation, pre-translation query expansion is tremendously effective. In several instances, pre-translation expansion with no translations available performs better than an uncompromised resource used without pre-translation expansion. We also demonstrate that post-translation expansion using relevance feedback can confer modest performance gains. Measuring the efficacy of these techniques with resources of different quality suggests an explanation for the conflicting reports that have appeared in the literature.
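
The idea of degrading a bilingual wordlist before query translation can be illustrated with a minimal Python sketch. The abstract does not say how the resources were degraded, so the random sampling of entries below is an assumption, and the names `degrade_wordlist`, `translate_query`, and `keep_fraction` are hypothetical, chosen only for illustration.

```python
import random


def degrade_wordlist(wordlist, keep_fraction, seed=0):
    """Return a copy of the wordlist that keeps a random fraction of entries.

    ASSUMPTION: random entry sampling is an illustrative stand-in; the
    abstract does not describe the actual degradation procedure.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    rng = random.Random(seed)          # fixed seed for repeatable degradation
    sources = sorted(wordlist)         # stable order before sampling
    n_keep = round(len(sources) * keep_fraction)
    kept = rng.sample(sources, n_keep)
    return {term: wordlist[term] for term in kept}


def translate_query(terms, wordlist):
    """Translate term by term; terms missing from the wordlist are dropped."""
    translated = []
    for term in terms:
        translated.extend(wordlist.get(term, []))
    return translated


if __name__ == "__main__":
    # Toy English-to-French wordlist, invented for this example.
    full = {
        "dog": ["chien"],
        "cat": ["chat"],
        "house": ["maison"],
        "tree": ["arbre"],
    }
    half = degrade_wordlist(full, keep_fraction=0.5)
    print(translate_query(["dog", "house", "tree"], half))
```

Setting `keep_fraction` to 0.0 corresponds to the case where no translations are available, and 1.0 to the uncompromised resource; running the same queries at several values in between gives the kind of comparison across resource quality that the abstract describes.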

Published In

SIGIR '02: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2002
478 pages
ISBN: 1581135610
DOI: 10.1145/564376

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. cross-language information retrieval
2. query expansion
3. query translation
4. translation resources

Acceptance Rates

SIGIR '02 Paper Acceptance Rate: 44 of 219 submissions, 20%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%