Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
📑 Paper
| 🤗 Hugging Models
| 🤗 Spaces Demo
| 🕹️ OpenBayes贝式计算 Demo
🤗 Datasets | 💬 X (Twitter)
| 🖥️ Computer Use
| 📖 GUI Paper List
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixian Lei, Lijuan Wang, Mike Zheng Shou
Show Lab @ National University of Singapore, Microsoft
- [2024.12.28] Update GPT-4o annotation recaptioning scripts.
- [2024.12.27] Update training codes and instructions.
- [2024.12.23] Update
showui
for UI-guided token selection implementation. - [2024.12.15] ShowUI received Outstanding Paper Award at NeurIPS2024 Open-World Agents workshop.
- [2024.12.9] Support int8 Quantization.
- [2024.12.5] Major Update: ShowUI is integrated into OOTB for local run!
- [2024.12.1] We support iterative refinement to improve grounding accuracy. Try it at HF Spaces demo.
- [2024.11.27] We release the arXiv paper, HF Spaces demo and
ShowUI-desktop-8K
. - [2024.11.16]
showlab/ShowUI-2B
is available at huggingface.
See Computer Use OOTB for using ShowUI to control your PC.
computer_use_with_showui-en-s.mp4
See Quick Start for model usage.
See Gradio for installation.
Our Training codebases supports:
- Wandb training monitor
- Self-customized model
- DeepSpeed Zero1, Zero2, Zero3
- Full-tuning (FP32, FP16, BF16), LoRA, QLoRA
- SDPA, Flash Attention 2
- Multiple datasets mixed training
- Interleaved data streaming
- Image randomly resize (crop, pad)
See Train for training set up.
Try test.ipynb
, which seamless support for Qwen2VL models.
Try recaption.ipynb
, where we provide instructions on how to recaption the original annotations using GPT-4o.
We extend our gratitude to SeeClick for providing their codes and datasets.
Special thanks to Siyuan for assistance with the Gradio demo and OOTB support.
If you find our work helpful, please kindly consider citing our paper.
@misc{lin2024showui,
title={ShowUI: One Vision-Language-Action Model for GUI Visual Agent},
author={Kevin Qinghong Lin and Linjie Li and Difei Gao and Zhengyuan Yang and Shiwei Wu and Zechen Bai and Weixian Lei and Lijuan Wang and Mike Zheng Shou},
year={2024},
eprint={2411.17465},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.17465},
}
If you like our project, please give us a star ⭐ on GitHub for the latest update.