
Using genre-specific features for patent summaries

Published: 01 January 2017

Abstract

  • Targeted summarization technique for patent material.
  • Segment as intra-sentence summarization unit.
  • Exploitation of lexical chains across the whole patent document.
  • Full-fledged text generation techniques for summarization.

Patent search is recall-driven, which goes hand in hand with at least a partial sacrifice of precision. As a consequence, patent analysts regularly have to view and examine a large number of patents, which implies a very high workload. Interactive analysis aids that help to minimize this workload are thus in high demand. Still, these aids do not reduce the amount of material to be examined; they only facilitate its examination. The amount can be reduced by working with patent summaries instead of full patent documents. So far, high-quality patent summaries are produced mainly by hand, and only a few research works address the problem of automatic patent summarization. Most often, these works either replicate the summarization metrics known from general discourse summarization or focus on the claims of a patent. However, neither strategy is adequate: state-of-the-art general discourse summarization techniques are of limited use due to the idiosyncrasies of the patent genre, and techniques that focus on the claims only omit from their summaries important details that the other sections provide about the components of the invention introduced in the claims. We propose a patent summarization technique that takes the idiosyncrasies of the patent genre (such as the unbalanced distribution of the content across the different sections of a patent, the excessive length of the sentences in the claims, abstract vocabulary, etc.) into account to obtain a comprehensive summary of the invention.
In particular, we make use of lexical chains in the claims and in the description of the invention, and of aligned claim-description segments at the subsentential level, to assess the relevance of the individual fragments of the document for the summary. The most relevant fragments are selected and merged using full-fledged natural language generation techniques.
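
The idea of scoring fragments by the lexical chains they take part in can be illustrated with a minimal sketch. This is not the authors' system: it assumes that a "chain" is simply a content word repeated across two or more segments (the abstract does not say how chains are built, and real lexical chains typically also link synonyms and coreferent mentions), and it leaves out the claim-description alignment and the generation step entirely. All function names and the stopword list are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "is", "in",
             "for", "with", "that", "said", "by", "on"}


def content_words(segment):
    """Return lowercased alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", segment.lower())
            if w not in STOPWORDS]


def build_chains(segments):
    """Approximate lexical chains by exact word repetition.

    Maps each content word to the set of segment indices containing it
    and keeps only words that recur in at least two segments.
    """
    occurrences = defaultdict(set)
    for i, seg in enumerate(segments):
        for word in content_words(seg):
            occurrences[word].add(i)
    return {w: idxs for w, idxs in occurrences.items() if len(idxs) > 1}


def score_segments(segments, chains):
    """Score a segment by summing the lengths of the chains it belongs to."""
    scores = [0] * len(segments)
    for idxs in chains.values():
        for i in idxs:
            scores[i] += len(idxs)
    return scores


def select_top(segments, k):
    """Pick the k highest-scoring segments, returned in document order."""
    scores = score_segments(segments, build_chains(segments))
    ranked = sorted(range(len(segments)), key=lambda i: (-scores[i], i))[:k]
    return [segments[i] for i in sorted(ranked)]
```

Calling `select_top(segments, k)` on a list of claim or description fragments returns the `k` fragments that share the most recurring vocabulary with the rest of the text. A fragment with no recurring content words scores zero and is selected last.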

Information & Contributors

Information

Published In

Information Processing and Management: an International Journal, Volume 53, Issue 1
January 2017
168 pages

Publisher

Pergamon Press, Inc.

United States

Publication History

Published: 01 January 2017

Author Tags

  1. Lexical chains
  2. Patents
  3. Segment-based summarization
  4. Segmentation
  5. Sentence aggregation
  6. Summarization

Qualifiers

  • Research-article

Cited By

  • (2022) Summarization, simplification, and generation. Expert Systems with Applications: An International Journal, 205:C. DOI: 10.1016/j.eswa.2022.117627. Online publication date: 1-Nov-2022
  • (2021) A comparative study of abstractive and extractive summarization techniques to label subgroups on patent dataset. Scientometrics, 126:1 (135-156). DOI: 10.1007/s11192-020-03732-x. Online publication date: 1-Jan-2021
  • (2019) Capturing information on technology convergence, international collaboration, and knowledge flow from patent documents. Information Processing and Management: an International Journal, 56:4 (1576-1591). DOI: 10.1016/j.ipm.2018.09.007. Online publication date: 1-Jul-2019
  • (2019) Using Summarization Techniques on Patent Database Through Computational Intelligence. Progress in Artificial Intelligence (508-519). DOI: 10.1007/978-3-030-30244-3_42. Online publication date: 3-Sep-2019
  • (2017) How far we can go with extractive text summarization? Heuristic methods to obtain near upper bounds. Expert Systems with Applications: An International Journal, 90:C (439-463). DOI: 10.1016/j.eswa.2017.08.040. Online publication date: 30-Dec-2017
