Effective query generation and postprocessing strategies for prior art patent search

Published: 01 March 2012

Abstract

The rapid increase in global competition demands stronger protection of intellectual property rights and underlines the importance of patents as major intellectual property documents. Prior art patent search is the task of identifying patents related to a given patent file, and it is an essential step in judging the validity of a patent application. This article proposes an automated query generation and postprocessing method for prior art patent search. The proposed approach first constructs structured queries by combining terms extracted from different fields of a query patent. It then reranks the retrieved patents by combining the retrieval score with the International Patent Classification (IPC) code similarities between the query patent and the retrieved patents. An extensive set of experiments on a large-scale, real-world dataset shows that using 20 or 30 query terms, extracted from all fields of the original query patent according to their log(tf)idf values, forms a representative search query and is more effective than using any number of query terms from any single field. Combining terms from different fields while giving higher importance to terms from the abstract, claims, and description fields than to terms from the title field is shown to be more effective than treating all extracted terms equally when forming the search query. Finally, utilizing the similarities between the IPC codes of the query patent and the retrieved patents is shown to improve the effectiveness of prior art search. © 2012 Wiley Periodicals, Inc.
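The three steps named in the abstract (term selection by log(tf)idf, field-weighted query construction, and IPC-based reranking) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: the `1 + log(tf)` damping, the smoothed idf, the example field weights, the prefix-based IPC similarity, and the linear interpolation parameter `lam`. All function names are hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def select_query_terms(fields, doc_freq, num_docs, k=20):
    """Pool tokens from all fields and keep the k terms with the highest
    log(tf) * idf score. Assumed variant: (1 + ln tf) * ln((N + 1) / (df + 1))."""
    pooled = Counter(tok for tokens in fields.values() for tok in tokens)

    def score(term):
        idf = math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))
        return (1.0 + math.log(pooled[term])) * idf

    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(pooled, key=lambda t: (-score(t), t))[:k]


def field_weighted_query(terms, fields, field_weights):
    """Give each term the summed weight of the fields it occurs in.
    The weights are hypothetical; the abstract only says that title terms
    should count less than abstract, claims, and description terms."""
    return {
        term: sum(w for name, w in field_weights.items()
                  if term in fields.get(name, ()))
        for term in terms
    }


def ipc_levels(code):
    """Split an IPC symbol such as 'G06F 17/30' into hierarchy prefixes:
    section, class, subclass, main group, full symbol."""
    code = code.replace(" ", "")
    return [code[:1], code[:3], code[:4], code.split("/")[0], code]


def ipc_similarity(code_a, code_b):
    """Assumed similarity: fraction of leading hierarchy levels that match."""
    matched = 0
    for a, b in zip(ipc_levels(code_a), ipc_levels(code_b)):
        if a != b:
            break
        matched += 1
    return matched / 5.0


def rerank(query_ipcs, results, lam=0.3):
    """Rerank (patent_id, retrieval_score, ipc_codes) tuples by a linear mix
    of the retrieval score (assumed to lie in [0, 1]) and the best IPC match."""
    def combined(item):
        _, retrieval_score, ipcs = item
        sim = max((ipc_similarity(q, d) for q in query_ipcs for d in ipcs),
                  default=0.0)
        return (1 - lam) * retrieval_score + lam * sim

    return sorted(results, key=combined, reverse=True)
```

With a query patent classified under `G06F 17/30`, a retrieved patent sharing that code can overtake one with a slightly higher retrieval score but an unrelated classification, which is the intuition behind the reranking step. In practice the weights and `lam` would need tuning on held-out queries.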

    Published In

    Journal of the American Society for Information Science and Technology, Volume 63, Issue 3
    March 2012
    201 pages

    Publisher

    John Wiley & Sons, Inc.

    United States

    Author Tags

    1. International Patent Classification
    2. patents
    3. query formulation
    4. search strategies
    5. weighting

    Qualifiers

    • Article

    Cited By

    • (2019) Patent expanded retrieval via word embedding under composite-domain perspectives. Frontiers of Computer Science: Selected Publications from Chinese Universities, 13:5 (1048-1061). DOI: 10.1007/s11704-018-7056-6. Online publication date: 1-Oct-2019
    • (2017) Reply With. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (327-336). DOI: 10.1145/3132847.3132979. Online publication date: 6-Nov-2017
    • (2016) When is the Time Ripe for Natural Language Processing for Patent Passage Retrieval? Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (1453-1462). DOI: 10.1145/2983323.2983858. Online publication date: 24-Oct-2016
    • (2015) Patent Mining. ACM SIGKDD Explorations Newsletter, 16:2 (1-19). DOI: 10.1145/2783702.2783704. Online publication date: 21-May-2015
    • (2015) Multilayer source selection as a tool for supporting patent search and classification. Information Retrieval, 18:6 (559-585). DOI: 10.1007/s10791-015-9270-2. Online publication date: 1-Dec-2015
    • (2014) Patent Query Formulation by Synthesizing Multiple Sources of Relevance Evidence. ACM Transactions on Information Systems, 32:4 (1-30). DOI: 10.1145/2651363. Online publication date: 28-Oct-2014
    • (2013) Leveraging conceptual lexicon. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval (113-122). DOI: 10.1145/2484028.2484056. Online publication date: 28-Jul-2013
