TechPat

Source code for our paper: "TechPat: Technical Phrase Extraction for Patent Mining" [1].

Datasets

The patent data we use comes from the United States Patent and Trademark Office (USPTO, https://www.uspto.gov). You can download the full USPTO data from PatentsView (https://patentsview.org/download/data-download-tables).

Data Format

Organize the downloaded data into the format our code expects. See ./example_data for an example.

Processed Datasets

The processed datasets used in the paper are available at https://drive.google.com/file/d/1G45OFG-j285bRYsBAtxHePqw-8mle0A5/view?usp=share_link.

Model

Run "candidate_run.sh" and "extract_run.sh"; the final results are written to the "result" folder.
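As a rough sketch, the two scripts named above could be chained in one small wrapper. The function name `run_pipeline` is hypothetical, and the sketch assumes that both scripts sit in the repository root, that they can be run with bash, that candidate generation must finish before extraction, and that "result" is created relative to that directory. None of this is confirmed by the repository, so adjust as needed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: runs both stages in order and lists the outputs.
# $1 is the directory holding the scripts (assumed: the repository root).
run_pipeline() {
  local repo_dir="${1:-.}"
  (
    cd "$repo_dir"
    bash candidate_run.sh   # assumed to be the candidate generation stage
    bash extract_run.sh     # assumed to be the extraction stage
    ls result               # fails if no "result" folder was produced
  )
}
```

Call it as `run_pipeline /path/to/TechPat`, or with no argument from inside the repository. Because of `set -e`, a failure in the first script stops the run before the second one starts.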

If you use our code, please cite our papers [1, 2]. The candidate generation code is partially adapted from ECON [3].

Citation

[1] Ye Liu, Han Wu, Zhenya Huang, Hao Wang, Yuting Ning, Jianhui Ma, Qi Liu, and Enhong Chen. TechPat: Technical phrase extraction for patent mining. ACM Transactions on Knowledge Discovery from Data, 2023.
[2] Ye Liu, Han Wu, Zhenya Huang, Hao Wang, Jianhui Ma, Qi Liu, Enhong Chen, Hanqing Tao, and Ke Rui. Technical phrase extraction for patent mining: A multi-level approach. In 2020 IEEE International Conference on Data Mining (ICDM), pages 1142–1147. IEEE, 2020.
[3] Keqian Li, Hanwen Zha, Yu Su, and Xifeng Yan. Concept mining via embedding. In 2018 IEEE International Conference on Data Mining (ICDM), pages 267–276. IEEE, 2018.

Acknowledgements

We thank Yanghai Zhang, Feihu Yin and Zhuofan Chen for helping us with this work.
