DOI: 10.1145/2064975.2064984
research-article

Automatic construction of a bilingual thesaurus using citation analysis

Published: 24 October 2011

Abstract

We propose a method for automatically constructing a bilingual thesaurus from patents. First, we extract hypernym-hyponym relations from Japanese and US patents using the pattern "A such as B". Second, we align terms between the resulting Japanese and English thesauri by combining statistical machine translation and citation analysis techniques. In experiments evaluating the method, our best method achieved a recall of 79.4%, a precision of 77.5%, and an F-measure of 78.3%.
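The abstract outlines three stages: pattern-based extraction of hypernym-hyponym pairs, term alignment that combines statistical machine translation (SMT) with citation analysis, and evaluation by recall, precision, and F-measure. The Python sketch below illustrates each stage under stated assumptions; it is not the authors' implementation. Only a simplified English regex for "A such as B" (the lexico-syntactic pattern family introduced by Hearst, 1992) is shown, because the Japanese-side pattern is not given in the abstract. The abstract also does not say how citation evidence is computed or merged with SMT output, so `citation_score`, `combined_score`, and the weight `lam` are hypothetical; bibliographic coupling (Kessler, 1963), which counts the references two documents share, stands in as one plausible citation link between Japanese and US patents. The SMT translation probability is treated as an input computed elsewhere, and the patent IDs and terms in the usage example are invented.

```python
import re

# --- Step 1: "A such as B" extraction (English side only) ---------------------
# Hypothetical simplification: the hypernym A is the one or two words right
# before "such as"; B is a list separated by commas, "and", or "or" that ends
# at "." or ";". Real systems would use noun-phrase chunking instead.
SUCH_AS = re.compile(r"\b((?:\w+\s+)?\w+),?\s+such\s+as\s+([^.;]+)")
LIST_SEP = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


def extract_hypernyms(text):
    """Return (hypernym, hyponym) pairs matched by the 'A such as B' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text.lower()):
        hypernym = match.group(1).strip()
        for hyponym in LIST_SEP.split(match.group(2).strip()):
            if hyponym:
                pairs.append((hypernym, hyponym))
    return pairs


# --- Step 2: citation evidence for aligning Japanese and US terms -------------
def coupling_strength(refs_a, refs_b):
    """Bibliographic coupling (Kessler, 1963): number of references shared."""
    return len(set(refs_a) & set(refs_b))


def citation_score(jp_term, us_term, jp_docs, us_docs, min_shared=1):
    """Hypothetical score: over coupled (JP, US) patent pairs whose JP side
    contains jp_term, the fraction whose US side contains us_term.
    jp_docs / us_docs map a patent ID to (set of terms, set of cited IDs)."""
    linked = matched = 0
    for jp_terms, jp_refs in jp_docs.values():
        if jp_term not in jp_terms:
            continue
        for us_terms, us_refs in us_docs.values():
            if coupling_strength(jp_refs, us_refs) >= min_shared:
                linked += 1
                if us_term in us_terms:
                    matched += 1
    return matched / linked if linked else 0.0


def combined_score(p_smt, cit_score, lam=0.5):
    """Hypothetical linear interpolation of an SMT translation probability
    (computed elsewhere) with the citation score; lam is illustrative only."""
    return lam * p_smt + (1 - lam) * cit_score


# --- Step 3: evaluation against a gold-standard set of aligned pairs ----------
def precision_recall_f(predicted, gold):
    """Set-based precision, recall, and balanced F-measure."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f


if __name__ == "__main__":
    print(extract_hypernyms(
        "The system uses storage devices such as hard disks, "
        "flash memories, and optical discs."))
    # Invented toy data: one JP patent and two US patents with their citations.
    jp_docs = {"JP-A": ({"記憶装置"}, {"US-100", "US-200"})}
    us_docs = {"US-B": ({"storage device"}, {"US-100"}),
               "US-C": ({"display"}, {"US-300"})}
    cit = citation_score("記憶装置", "storage device", jp_docs, us_docs)
    print(cit, combined_score(0.4, cit))
```

Linear interpolation is only one way to merge the two evidence sources; a product of scores or a trained classifier would be equally reasonable stand-ins for the unspecified combination step.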




Published In

PaIR '11: Proceedings of the 4th workshop on Patent information retrieval
October 2011
46 pages
ISBN: 9781450309554
DOI: 10.1145/2064975

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. bilingual thesaurus
  2. citation analysis
  3. cross-lingual patent retrieval
  4. machine translation


Conference

CIKM '11

Acceptance Rates

Overall Acceptance Rate 7 of 13 submissions, 54%


Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 22 September 2024.


Cited By

  • (2023) Automatic Multilingual Hypernym-Hyponym Relation Extraction Using a Link Prediction Model. 2023 14th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), pp. 94-99. DOI: 10.1109/IIAI-AAI59060.2023.00028. Online publication date: 8 July 2023.
  • (2017) The Portability of Three Types of Text Mining Techniques into the Patent Text Genre. Current Challenges in Patent Information Retrieval, pp. 241-280. DOI: 10.1007/978-3-662-53817-3_9. Online publication date: 26 March 2017.
  • (2015) An approach to automated thesaurus construction using clusterization-based dictionary analysis. Proceedings of the 17th Conference of Open Innovations Association FRUCT, pp. 104-109. DOI: 10.1109/FRUCT.2015.7117979. Online publication date: 27 April 2015.
  • (2014) Limitations of Automatic Patent IR. Datenbank-Spektrum, Vol. 14, No. 1, pp. 5-17. DOI: 10.1007/s13222-014-0149-y. Online publication date: 6 February 2014.
  • (2013) Patent Retrieval. Foundations and Trends in Information Retrieval, Vol. 7, No. 1, pp. 1-97. DOI: 10.1561/1500000027. Online publication date: 20 February 2013.
