liuyeah/TechPat

TechPat

Source code for our paper: "TechPat: Technical Phrase Extraction for Patent Mining" [1].

Datasets

The patent data we use is provided by the United States Patent and Trademark Office (USPTO, https://www.uspto.gov). You can download the full USPTO data from PatentsView (https://patentsview.org/download/data-download-tables).

Data Format

Organize the downloaded data into the format our code expects; see ./example_data for an example.

Processed Datasets

The processed datasets used in the paper are available at https://drive.google.com/file/d/1G45OFG-j285bRYsBAtxHePqw-8mle0A5/view?usp=share_link.

Model

Run "candidate_run.sh" and "extract_run.sh"; the final results are in the "result" folder.
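
If you prefer to drive both steps from Python, the following is a minimal sketch rather than part of the released code. It assumes that the two scripts sit in the repository root, that they can be run with bash, and that they should run in the order listed above. The helper name `run_pipeline` and the `PIPELINE` constant are hypothetical.

```python
import subprocess
from pathlib import Path

# Script names come from this README; running them in this order is an assumption.
PIPELINE = ("candidate_run.sh", "extract_run.sh")


def run_pipeline(repo_dir=".", scripts=PIPELINE):
    """Run each script with bash from repo_dir, stopping at the first failure.

    Returns the path of the "result" folder inside repo_dir.
    """
    repo = Path(repo_dir).resolve()
    for name in scripts:
        script = repo / name
        if not script.is_file():
            raise FileNotFoundError(f"missing script: {script}")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the second script never runs if the first one fails.
        subprocess.run(["bash", str(script)], cwd=repo, check=True)
    return repo / "result"
```

Calling `run_pipeline(".")` from the repository root would run both scripts in turn and return the path of the "result" folder.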

If you use our code, please cite our papers [1, 2]. The candidate generation code is partially adapted from ECON [3].

Citation

[1] Ye Liu, Han Wu, Zhenya Huang, Hao Wang, Yuting Ning, Jianhui Ma, Qi Liu, and Enhong Chen. TechPat: Technical phrase extraction for patent mining. ACM Transactions on Knowledge Discovery from Data, 2023.
[2] Ye Liu, Han Wu, Zhenya Huang, Hao Wang, Jianhui Ma, Qi Liu, Enhong Chen, Hanqing Tao, and Ke Rui. Technical phrase extraction for patent mining: A multi-level approach. In 2020 IEEE International Conference on Data Mining (ICDM), pages 1142–1147. IEEE, 2020.
[3] Keqian Li, Hanwen Zha, Yu Su, and Xifeng Yan. Concept mining via embedding. In 2018 IEEE International Conference on Data Mining (ICDM), pages 267–276. IEEE, 2018.

Acknowledgements

We thank Yanghai Zhang, Feihu Yin and Zhuofan Chen for helping us with this work.
