research-article

Unsupervised technical phrase extraction by incorporating structure and position information

Published: 02 July 2024
    Abstract

    The rapid growth of patent applications in recent years offers an opportunity to uncover the underlying laws of innovation, but it also places higher demands on patent mining technology. An essential step in patent text mining is to build a technology portrait for each patent, that is, to identify the technical phrases that summarize and represent the patent from a technical point of view. A large body of existing work focuses on keyword extraction. However, technical phrase extraction differs from keyword extraction because of the unique properties of technical phrases: they must carry rich technical information and be essential to the entire patent text from a technical perspective. Finding potential relationships between phrases with different technical meanings is a further challenge. From an analysis of the characteristics of technical phrases, we found that the position of technical phrases in the patent text and the structural relationships between them are crucial, and that making good use of these two kinds of information is itself a challenge. Motivated by this, we propose UTESP, a new Unsupervised Technical phrase Extraction model built from the Structure and Position information perspective. UTESP comprises four key steps: candidate generation, graph construction, candidate scoring, and candidate selection. For structure information, UTESP adjusts the incoming edge weights of candidate phrases according to the distance relations between them and applies a graph ranking algorithm to obtain a structure score for each candidate. For position information, it combines the position and frequency of each candidate phrase in the patent text to compute a position score.
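The four steps named above can be illustrated with a minimal sketch. It is not the UTESP implementation, and every concrete choice in it is an assumption made for illustration: the stopword-based candidate generator, the 1/distance edge weight, the PageRank-style ranking, the 1/(index + 1) position weight, and the linear combination controlled by the hypothetical parameter `alpha`. The abstract does not specify any of these formulas.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "is", "are", "in", "on",
             "for", "with", "by", "that", "which"}

def generate_candidates(text):
    """Step 1 (stand-in): maximal runs of non-stopword tokens, in reading order."""
    occurrences, run = [], []
    for token in re.findall(r"[a-z0-9-]+|[.,;:]", text.lower()):
        # Stopwords, punctuation, and tokens starting with '-' end the current run.
        if token in STOPWORDS or not token[0].isalnum():
            if run:
                occurrences.append(" ".join(run))
                run = []
        else:
            run.append(token)
    if run:
        occurrences.append(" ".join(run))
    return occurrences

def build_graph(occurrences, window=3):
    """Step 2: edge u -> v weighted by summed 1/distance of nearby occurrences."""
    weights = defaultdict(float)
    for i, u in enumerate(occurrences):
        for j in range(max(0, i - window), min(len(occurrences), i + window + 1)):
            v = occurrences[j]
            if u != v:  # also skips j == i, so the distance is never zero
                weights[(u, v)] += 1.0 / abs(i - j)
    return weights

def structure_scores(weights, damping=0.85, iterations=50):
    """Step 3a: weighted PageRank by power iteration (assumed ranking algorithm)."""
    nodes = sorted({n for edge in weights for n in edge})
    out_sum = defaultdict(float)
    for (u, _), w in weights.items():
        out_sum[u] += w
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for (u, v), w in weights.items():
            new[v] += damping * score[u] * w / out_sum[u]
        score = new
    return score

def position_scores(occurrences):
    """Step 3b: earlier and more frequent phrases accumulate a larger score."""
    raw = defaultdict(float)
    for index, phrase in enumerate(occurrences):
        raw[phrase] += 1.0 / (index + 1)
    total = sum(raw.values())
    return {p: s / total for p, s in raw.items()}

def extract_phrases(text, top_k=3, alpha=0.5):
    """Step 4: combine both scores and keep the top_k candidates."""
    occurrences = generate_candidates(text)
    structure = structure_scores(build_graph(occurrences))
    position = position_scores(occurrences)
    combined = {p: alpha * structure.get(p, 0.0) + (1 - alpha) * position[p]
                for p in position}
    return sorted(combined, key=lambda p: (-combined[p], p))[:top_k]
```

Calling `extract_phrases(patent_text, top_k=5)` returns the five highest-scoring candidates. Because the position weight is summed over all occurrences, a phrase that appears both early and often is rewarded twice, which is one simple way to incorporate position and frequency at the same time.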
    The effectiveness of our framework is demonstrated by comparison with seven competitive algorithms on patent datasets in terms of three evaluation metrics: Precision, Recall, and F1 score. In addition, a comparison of Information Retrieval Efficiency (IRE) with the competitive algorithms shows that our framework significantly improves the representation ability of the extracted technical phrases.
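As a reminder of how the three metrics relate, the sketch below computes Precision, Recall, and F1 for one document by exact set matching between extracted and reference phrases. The matching protocol is an assumption: the abstract does not say whether the evaluation uses exact matching, stemming, or a fixed number of extracted phrases, and the function name `precision_recall_f1` is hypothetical. IRE is not sketched because the abstract does not define it.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match Precision, Recall, and F1 over phrase sets (assumed protocol)."""
    # Case and surrounding whitespace are normalised before comparison.
    predicted_set = {p.lower().strip() for p in predicted}
    gold_set = {g.lower().strip() for g in gold}
    hits = len(predicted_set & gold_set)
    precision = hits / len(predicted_set) if predicted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    # F1 is the harmonic mean of precision and recall; zero when nothing matches.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

For example, with four extracted phrases of which two appear among three reference phrases, precision is 2/4, recall is 2/3, and F1 is 4/7.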

    Highlights

    Take advantage of structure and position information for technical phrases.
    Propose a new unsupervised technical phrase extraction framework.
    Extensive experiments indicate improved performance over competitive algorithms.



    Published In

    Expert Systems with Applications: An International Journal, Volume 245, Issue C
    Jul 2024
    1580 pages

    Publisher

    Pergamon Press, Inc.

    United States

    Author Tags

    1. Patent text mining
    2. Technical phrase extraction
    3. Graph construction
    4. Structure and position information
