🎨 IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents
We introduce IMPACT (Integrated Multimodal Patent Analysis and CreaTion Dataset) for Design Patents.
Check out our paper here: OpenReview 🔥
✒️ It is a large-scale multimodal patent dataset with detailed captions for design patent figures.
💥 Our dataset includes half a million design patents comprising 3.61 million figures with captions, drawn from patents granted by the United States Patent and Trademark Office (USPTO) over a 16-year period from 2007 to 2022.
📗 The dataset can be viewed and downloaded here.
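import os
import shutil
from huggingface_hub import hf_hub_download
CSV_FILE = '2022.csv'
TARGET_DIR = 'data'  # local folder that will hold the downloaded CSV
os.makedirs(TARGET_DIR, exist_ok=True)
# Download the file into the Hugging Face cache and get its cached path
path = hf_hub_download(repo_id='AI4Patents/IMPACT', filename=CSV_FILE, repo_type="dataset")
destination = os.path.join(TARGET_DIR, CSV_FILE)
# Copy instead of rename: the cached path may be a symlink into the cache
shutil.copy(path, destination)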
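Once the CSV is in place, it can help to look at its header before running anything else. The snippet below is a minimal sketch using only the standard library; `peek_csv` is a hypothetical helper name, not part of this repo, and it makes no assumption about which columns the file contains.

```python
import csv


def peek_csv(path, n_rows=3):
    """Return the header and the first n_rows data rows of a CSV file.

    Hypothetical helper for illustration; it does not assume any
    particular column names in the IMPACT CSV files.
    """
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file is empty
        # zip stops as soon as range(n_rows) is exhausted
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

For example, `header, rows = peek_csv('data/2022.csv')` would return the column names and the first three records, assuming the file was saved to `data/` as in the download step.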
python classification.py
🔥 PatentCLIP is based on CLIP, and we use the open-source open_clip implementation for finetuning and inference.
Please download the train and val sets.
🤗 PatentCLIP-ViT-B checkpoint
🤗 PatentCLIP-Title-ViT-B checkpoint
Load a PatentCLIP model:
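import torch
import open_clip
device = 'cuda' if torch.cuda.is_available() else 'cpu'  # use the GPU when one is available
model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:ellen625/PatentCLIP_ViT_B', device=device)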
Text-image retrieval with PatentCLIP
| Setting | Dataset | Backbone | Text-Image R@1 | Text-Image R@5 | Text-Image R@10 | Image-Text R@1 | Image-Text R@5 | Image-Text R@10 |
|---|---|---|---|---|---|---|---|---|
| Zero-shot | Image-Title | ResNet50 | 0.52 | 2.10 | 3.32 | 0.20 | 0.72 | 1.64 |
| Zero-shot | Image-Title | ResNet101 | 1.02 | 3.20 | 4.72 | 0.30 | 0.82 | 1.28 |
| Zero-shot | Image-Title | ViT-B-32 | 1.06 | 3.54 | 5.56 | 0.38 | 1.62 | 2.60 |
| Zero-shot | Image-Title | ViT-L-14 | 2.78 | 7.38 | 10.40 | 1.16 | 4.30 | 7.32 |
| Zero-shot | Image-Caption | ResNet50 | 0.82 | 2.52 | 4.08 | 0.78 | 2.32 | 3.48 |
| Zero-shot | Image-Caption | ResNet101 | 1.44 | 4.52 | 6.48 | 0.98 | 2.98 | 4.96 |
| Zero-shot | Image-Caption | ViT-B-32 | 1.98 | 5.24 | 7.42 | 1.06 | 4.26 | 6.32 |
| Zero-shot | Image-Caption | ViT-L-14 | 4.46 | 10.74 | 15.16 | 3.42 | 8.90 | 12.88 |
| Finetuned | Image-Caption | ResNet50 | 5.38 | 15.52 | 22.7 | 5.9 | 16.6 | 23.86 |
| Finetuned | Image-Caption | ResNet101 | 7.44 | 20.6 | 28.48 | 7.02 | 19.70 | 27.58 |
| Finetuned | Image-Caption | ViT-B-32 | 10.24 | 25.56 | 35.06 | 9.88 | 25.90 | 35.08 |
| Finetuned | Image-Caption | ViT-L-14 | 20.58 | 43.14 | 53.00 | 20.44 | 42.34 | 52.56 |
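For readers unfamiliar with the R@K columns, the following is a minimal sketch of how Recall@K is commonly computed from a query-by-candidate similarity matrix. It is not necessarily the exact evaluation script used for the table; `recall_at_k` is a hypothetical helper, and it assumes that query `i` has exactly one correct match, candidate `i`, and that scores are reported as percentages.

```python
def recall_at_k(similarity, ks=(1, 5, 10)):
    """Compute Recall@K (in percent) for a square similarity matrix.

    similarity[i][j] is the score between query i and candidate j.
    Assumption: the only correct candidate for query i is candidate i.
    """
    n = len(similarity)
    hits = {k: 0 for k in ks}
    for i, row in enumerate(similarity):
        # Rank of the true match = number of candidates scoring strictly
        # higher (ties are resolved in favor of the true match).
        rank = sum(1 for score in row if score > row[i])
        for k in ks:
            if rank < k:
                hits[k] += 1
    return {k: 100.0 * hits[k] / n for k in ks}
```

If rows are text queries and columns are images, this gives the Text-Image direction; passing the transposed matrix, for example `recall_at_k(list(zip(*similarity)))`, gives the Image-Text direction.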
If you use the code or data in this repo for your work, please consider citing our paper and starring this repo:
@inproceedings{patent2024impact,
title={{IMPACT}: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents},
author={Homaira Huda Shomee and Zhu Wang and Sathya N. Ravi and Sourav Medya},
booktitle={The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2024},
url={https://openreview.net/forum?id=l0Ydsl10ci}
}