Source code for our paper: "TechPat: Technical Phrase Extraction for Patent Mining" [1].
The patent data we use is provided by the United States Patent and Trademark Office (USPTO, https://www.uspto.gov). You can download the full USPTO data from PatentsView (https://patentsview.org/download/data-download-tables).
After downloading, organize the data into the format used by our code; see ./example_data for an example.
You can obtain the datasets utilized in this paper via https://drive.google.com/file/d/1G45OFG-j285bRYsBAtxHePqw-8mle0A5/view?usp=share_link.
Run "candidate_run.sh" and "extract_run.sh"; the final results are written to the "result" folder.
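If you prefer to launch both steps from Python, the following is a minimal sketch and not part of the repository. It assumes the two scripts sit in the repository root, can be run with bash, and are meant to run in the order listed (candidate generation first). The helper name run_pipeline is hypothetical.

```python
import subprocess
from pathlib import Path

# Script names are taken from this README; running the candidate step
# first is an assumption based on the order in which they are listed.
SCRIPTS = ["candidate_run.sh", "extract_run.sh"]


def run_pipeline(repo_dir: str = ".") -> Path:
    """Run both shell scripts inside repo_dir and return the result folder path.

    Hypothetical helper: it only wraps the two scripts and does not
    inspect or validate what they write.
    """
    repo = Path(repo_dir).resolve()
    for script in SCRIPTS:
        # check=True raises CalledProcessError if a script exits with a
        # non-zero status, so the second step never runs on a failed first step.
        subprocess.run(["bash", script], cwd=repo, check=True)
    return repo / "result"
```

Calling run_pipeline() from the repository root runs both scripts and returns the path of the "result" folder. The function does not check that the folder was actually created.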
If you use our code, please cite our papers [1, 2]. The candidate generation code is partially adapted from ECON [3].
[1] Ye Liu, Han Wu, Zhenya Huang, Hao Wang, Yuting Ning, Jianhui Ma, Qi Liu, and Enhong Chen. TechPat: Technical phrase extraction for patent mining. ACM Transactions on Knowledge Discovery from Data, 2023.
[2] Ye Liu, Han Wu, Zhenya Huang, Hao Wang, Jianhui Ma, Qi Liu, Enhong Chen, Hanqing Tao, and Ke Rui. Technical phrase extraction for patent mining: A multi-level approach. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), pages 1142–1147. IEEE, 2020.
[3] Keqian Li, Hanwen Zha, Yu Su, and Xifeng Yan. Concept mining via embedding. In 2018 IEEE International Conference on Data Mining (ICDM), pages 267–276. IEEE, 2018.
We thank Yanghai Zhang, Feihu Yin and Zhuofan Chen for helping us with this work.