
IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents (NeurIPS 2024)


AI4Patents/IMPACT


🎨 IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents

We introduce IMPACT (Integrated Multimodal Patent Analysis and CreaTion Dataset) for Design Patents.

Check out our paper here: OpenReview 🔥

✒️ It is a large-scale multimodal patent dataset with detailed captions for design patent figures.

💥 Our dataset includes half a million design patents comprising 3.61 million figures along with captions, drawn from patents granted by the United States Patent and Trademark Office (USPTO) over a 16-year period from 2007 to 2022.

[Figure: main_fig]

Data

📗 The dataset can be viewed and downloaded from the Hugging Face Hub (dataset repo AI4Patents/IMPACT).

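import os
import shutil
from huggingface_hub import hf_hub_download

CSV_FILE = '2022.csv'
TARGET_DIR = 'data'  # local folder that will hold the downloaded file
os.makedirs(TARGET_DIR, exist_ok=True)
# Download into the Hugging Face cache; the returned path points inside that cache.
path = hf_hub_download(repo_id='AI4Patents/IMPACT', filename=CSV_FILE, repo_type="dataset")
destination = os.path.join(TARGET_DIR, CSV_FILE)
# Copy rather than rename: the cached path is usually a symlink and may sit on another filesystem.
shutil.copy(path, destination)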
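Once a yearly CSV such as 2022.csv is on disk, it can help to look at its header before writing any processing code. The sketch below is a minimal, standard-library-only way to do that. The helper name `preview_csv` is hypothetical (it is not part of this repo), and no particular column names are assumed.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=3):
    """Return (header, first n_rows data rows) without reading the whole file.

    Hypothetical helper for illustration; it makes no assumption about
    which columns the IMPACT CSV files contain.
    """
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        header = next(reader, [])            # first line holds the column names
        rows = list(islice(reader, n_rows))  # stop after n_rows data rows
    return header, rows
```

For example, `header, rows = preview_csv('data/2022.csv')` followed by `print(header)` lists the available columns.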

Patent Classification

python classification.py

PatentCLIP and multimodal retrieval tasks

🔥 PatentCLIP is based on CLIP, and we use the open-source open_clip implementation for finetuning and inference.

PatentCLIP with IMPACT dataset

Please download the train and val sets.

🤗 PatentCLIP-ViT-B checkpoint

🤗 PatentCLIP-Title-ViT-B checkpoint

Usage

Load a PatentCLIP model:

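import torch
import open_clip

# Use a GPU when available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:ellen625/PatentCLIP_ViT_B', device=device)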
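As a rough illustration of how a loaded PatentCLIP model might be used for text-image matching, the sketch below follows the usual open_clip pattern: tokenize with `open_clip.get_tokenizer`, embed with `encode_image` and `encode_text`, then compare by cosine similarity. The function names `score_texts` and `sort_by_score`, the image path, and the candidate texts are assumptions made for this example and are not taken from the repo.

```python
def sort_by_score(texts, scores):
    """Pair each text with its score and sort from best to worst match."""
    return sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)


def score_texts(image_path, texts,
                model_name='hf-hub:ellen625/PatentCLIP_ViT_B', device='cpu'):
    """Rank candidate texts against one image by cosine similarity.

    Hypothetical helper: image_path and texts are supplied by the caller.
    """
    # Imported lazily so sort_by_score stays usable without the heavy dependencies.
    import torch
    import open_clip
    from PIL import Image

    model, _, preprocess = open_clip.create_model_and_transforms(model_name, device=device)
    tokenizer = open_clip.get_tokenizer(model_name)
    model.eval()

    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    tokens = tokenizer(texts).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(tokens)
        # Normalize so the dot product equals the cosine similarity.
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)
        scores = (image_features @ text_features.T).squeeze(0).tolist()
    return sort_by_score(texts, scores)
```

A call such as `score_texts('figure.png', ['a chair', 'a table lamp'])` (placeholder inputs) returns the texts ordered from most to least similar to the image.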

Demo on PatentCLIP and text-image retrieval

Text-image retrieval with PatentCLIP (Colab notebook)

Multimodal retrieval results

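| Setting | Dataset | Backbone | Text-Image R@1 | Text-Image R@5 | Text-Image R@10 | Image-Text R@1 | Image-Text R@5 | Image-Text R@10 |
|---|---|---|---|---|---|---|---|---|
| Zero-shot | Image-Title | ResNet50 | 0.52 | 2.10 | 3.32 | 0.20 | 0.72 | 1.64 |
| Zero-shot | Image-Title | ResNet101 | 1.02 | 3.20 | 4.72 | 0.30 | 0.82 | 1.28 |
| Zero-shot | Image-Title | ViT-B-32 | 1.06 | 3.54 | 5.56 | 0.38 | 1.62 | 2.60 |
| Zero-shot | Image-Title | ViT-L-14 | 2.78 | 7.38 | 10.40 | 1.16 | 4.30 | 7.32 |
| Zero-shot | Image-Caption | ResNet50 | 0.82 | 2.52 | 4.08 | 0.78 | 2.32 | 3.48 |
| Zero-shot | Image-Caption | ResNet101 | 1.44 | 4.52 | 6.48 | 0.98 | 2.98 | 4.96 |
| Zero-shot | Image-Caption | ViT-B-32 | 1.98 | 5.24 | 7.42 | 1.06 | 4.26 | 6.32 |
| Zero-shot | Image-Caption | ViT-L-14 | 4.46 | 10.74 | 15.16 | 3.42 | 8.90 | 12.88 |
| Finetuned | Image-Caption | ResNet50 | 5.38 | 15.52 | 22.70 | 5.90 | 16.60 | 23.86 |
| Finetuned | Image-Caption | ResNet101 | 7.44 | 20.60 | 28.48 | 7.02 | 19.70 | 27.58 |
| Finetuned | Image-Caption | ViT-B-32 | 10.24 | 25.56 | 35.06 | 9.88 | 25.90 | 35.08 |
| Finetuned | Image-Caption | ViT-L-14 | 20.58 | 43.14 | 53.00 | 20.44 | 42.34 | 52.56 |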
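The R@K columns are recall at K: the share of queries (presumably in percent) whose correct match appears among the top K retrieved items. The sketch below shows one common way to compute it from a query-by-candidate similarity matrix. It assumes the correct candidate for query i is candidate i, which is a convention adopted for this example and not necessarily the exact evaluation protocol behind the numbers above. The function name `recall_at_k` is hypothetical.

```python
def recall_at_k(similarity, ks=(1, 5, 10)):
    """Recall@K in percent for a square similarity matrix (list of lists).

    Assumption for this sketch: similarity[i][j] scores query i against
    candidate j, and the correct candidate for query i is candidate i.
    """
    n = len(similarity)
    hits = {k: 0 for k in ks}
    for i, row in enumerate(similarity):
        # Zero-based rank of the true match = number of candidates scored strictly higher.
        rank = sum(1 for score in row if score > row[i])
        for k in ks:
            if rank < k:
                hits[k] += 1
    return {k: 100.0 * hits[k] / n for k in ks}
```

With text embeddings as queries and image embeddings as candidates this gives the Text-Image direction; transposing the matrix gives Image-Text.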

Acknowledgement

  • open-clip: the code base we built on for PatentCLIP.
  • LLaVA: used for caption generation.

Citation

If you use the code or data in this repo for your work, please consider citing our paper and starring this repo:

@inproceedings{patent2024impact,
    title={{IMPACT}: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents},
    author={Homaira Huda Shomee and Zhu Wang and Sathya N. Ravi and Sourav Medya},
    booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
    year={2024},
    url={https://openreview.net/forum?id=l0Ydsl10ci}
    }
