Effective query generation and postprocessing strategies for prior art patent search

Authors:

Suleyman Cetintas,

Luo Si

Journal of the American Society for Information Science and Technology, Volume 63, Issue 3

Pages 512 - 527

https://doi.org/10.1002/asi.21708

Published: 01 March 2012

Abstract

The rapid increase in global competition demands stronger protection of intellectual property rights and underlines the importance of patents as major intellectual property documents. Prior art patent search is the task of identifying related patents for a given patent file, and is an essential step in judging the validity of a patent application. This article proposes an automated query generation and postprocessing method for prior art patent search. The proposed approach first constructs structured queries by combining terms extracted from different fields of a query patent and then reranks the retrieved patents by combining the retrieval score with the International Patent Classification (IPC) code similarities between the query patent and the retrieved patents. An extensive set of experiments on a large-scale, real-world dataset shows that selecting 20 or 30 query terms from all fields of the original query patent according to their log(tf)idf values yields a representative search query and is more effective than using any number of query terms from any single field. Weighting terms extracted from the abstract, claims, and description fields more heavily than terms extracted from the title field is also shown to be more effective than treating all extracted terms equally when forming the search query. Finally, using the similarities between the IPC codes of the query patent and the retrieved patents is shown to improve the effectiveness of prior art search. © 2012 Wiley Periodicals, Inc.
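
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract names log(tf)idf term selection, heavier weighting of non-title fields, and a combination of retrieval score with IPC code similarity, but it gives no formulas, so everything concrete here is an assumption: the `log(1 + tf)` smoothing, the `FIELD_WEIGHTS` values, the prefix-based `ipc_similarity` measure, the min-max score normalization, and the mixing parameter `lam`. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical weights: the abstract only says title terms should count less.
FIELD_WEIGHTS = {"title": 0.5, "abstract": 1.0, "claims": 1.0, "description": 1.0}


def top_terms(tokens, doc_freq, n_docs, k):
    """Rank the terms of one field by log(1 + tf) * idf and keep the best k.

    The +1 inside the log is an assumption so that tf == 1 does not score zero.
    Terms missing from doc_freq are treated as occurring in one document.
    """
    tf = Counter(tokens)
    scored = [
        (term, math.log(1 + count) * math.log(n_docs / doc_freq.get(term, 1)))
        for term, count in tf.items()
    ]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


def build_query(patent_fields, doc_freq, n_docs, k=20):
    """Return {(field, term): weight} for the top-k terms of every field."""
    query = {}
    for field, tokens in patent_fields.items():
        for term, _ in top_terms(tokens, doc_freq, n_docs, k):
            query[(field, term)] = FIELD_WEIGHTS.get(field, 1.0)
    return query


def ipc_similarity(code_a, code_b):
    """Assumed measure: share of IPC levels (section, class, subclass,
    full code) on which two codes agree, compared as string prefixes."""
    a, b = code_a.replace(" ", ""), code_b.replace(" ", "")
    prefix_lengths = [1, 3, 4]  # section, class, subclass
    matched = 0
    for n in prefix_lengths:
        if a[:n] != b[:n]:
            break
        matched += 1
    if matched == len(prefix_lengths) and a == b:
        matched += 1  # identical down to the group level
    return matched / (len(prefix_lengths) + 1)


def rerank(results, query_ipcs, lam=0.3):
    """Mix min-max normalized retrieval scores with the best IPC similarity.

    results is a list of (doc_id, retrieval_score, ipc_codes) tuples.
    """
    if not results:
        return []
    scores = [score for _, score, _ in results]
    low, high = min(scores), max(scores)
    span = (high - low) or 1.0
    combined = []
    for doc_id, score, doc_ipcs in results:
        sim = max(
            (ipc_similarity(q, d) for q in query_ipcs for d in doc_ipcs),
            default=0.0,
        )
        combined.append((doc_id, (1 - lam) * (score - low) / span + lam * sim))
    combined.sort(key=lambda pair: (-pair[1], pair[0]))
    return combined
```

In this sketch a retrieved patent that shares the query patent's full IPC code can overtake one with a slightly higher retrieval score, which is the kind of effect the reranking step is meant to produce. In practice `lam` and the field weights would need tuning on held-out queries.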

Published In

Journal of the American Society for Information Science and Technology Volume 63, Issue 3

March 2012

201 pages

ISSN:1532-2882

Publisher

John Wiley & Sons, Inc.

United States

Cited By

Wang F, Qian T, Liu B, Peng Z (2019). Patent expanded retrieval via word embedding under composite-domain perspectives. Frontiers of Computer Science: Selected Publications from Chinese Universities, 13:5 (1048-1061). doi:10.1007/s11704-018-7056-6. Online publication date: 1-Oct-2019.
https://dl.acm.org/doi/10.1007/s11704-018-7056-6
Van Gysel C, Mitra B, Venanzi M, Rosemarin R, Kukla G, Grudzien P, Cancedda N, Lim E, Winslett M, Sanderson M, Fu A, Sun J, Culpepper S, Lo E, Ho J, Donato D, Agrawal R, Zheng Y, Castillo C, Sun A, Tseng V, Li C (2017). Reply With. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (327-336). doi:10.1145/3132847.3132979. Online publication date: 6-Nov-2017.
https://dl.acm.org/doi/10.1145/3132847.3132979
Andersson L, Lupu M, Palotti J, Hanbury A, Rauber A, Mukhopadhyay S, Zhai C, Bertino E, Crestani F, Mostafa J, Tang J, Si L, Zhou X, Chang Y, Li Y, Sondhi P (2016). When is the Time Ripe for Natural Language Processing for Patent Passage Retrieval? Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (1453-1462). doi:10.1145/2983323.2983858. Online publication date: 24-Oct-2016.
https://dl.acm.org/doi/10.1145/2983323.2983858
Zhang L, Li L, Li T (2015). Patent Mining. ACM SIGKDD Explorations Newsletter, 16:2 (1-19). doi:10.1145/2783702.2783704. Online publication date: 21-May-2015.
https://dl.acm.org/doi/10.1145/2783702.2783704
Giachanou A, Salampasis M, Paltoglou G (2015). Multilayer source selection as a tool for supporting patent search and classification. Information Retrieval, 18:6 (559-585). doi:10.1007/s10791-015-9270-2. Online publication date: 1-Dec-2015.
https://dl.acm.org/doi/10.1007/s10791-015-9270-2
Mahdabi P, Crestani F (2014). Patent Query Formulation by Synthesizing Multiple Sources of Relevance Evidence. ACM Transactions on Information Systems, 32:4 (1-30). doi:10.1145/2651363. Online publication date: 28-Oct-2014.
https://dl.acm.org/doi/10.1145/2651363
Mahdabi P, Gerani S, Huang J, Crestani F, Jones G, Sheridan P, Kelly D, de Rijke M, Sakai T (2013). Leveraging conceptual lexicon. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval (113-122). doi:10.1145/2484028.2484056. Online publication date: 28-Jul-2013.
https://dl.acm.org/doi/10.1145/2484028.2484056
