a state-of-the-art open visual language model | multimodal pre-trained model
🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/
Commanding robots using only Language Models' prompts
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.
Build a simple multimodal large model from scratch 🤖
Implementation of the paper "Learn 'No' to Say 'Yes' Better".
Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"
Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models
Universal Adversarial Perturbations for Vision-Language Pre-trained Models
[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"
This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"
Chain of Images for Intuitively Reasoning
Code for the paper "Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models", ISBI 2024 (Oral).
A benchmark for evaluating hallucinations in large visual language models
A from-scratch implementation of PaliGemma, built by following a YouTube tutorial as a learning exercise in application, library, and system design, applying the development practices demonstrated in the original tutorial.
CLI for converting UForm models to CoreML.
Advancing the Understanding and Evaluation of AR-Generated Scenes: When Vision-Language Models Shine and Stumble
This project performs multimodal document analysis and query retrieval by downloading PDFs, converting pages to images, indexing them for semantic search, and analyzing retrieved images using visual-language models like Qwen2VL and Blip2.
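The retrieval step in a pipeline like the one above ranks page-image embeddings against a query embedding. A minimal sketch of that step, using cosine similarity over toy vectors; the function names and the stand-in embeddings are hypothetical, not taken from the project, and a real system would produce the vectors with a vision-language encoder such as Qwen2VL:

```python
import numpy as np

def cosine_sim(query, pages):
    # Cosine similarity between a query vector and each row of a matrix.
    query = query / np.linalg.norm(query)
    pages = pages / np.linalg.norm(pages, axis=1, keepdims=True)
    return pages @ query

def retrieve_pages(query_emb, page_embs, top_k=3):
    # Rank page-image embeddings by similarity to the query embedding
    # and return the indices and scores of the best matches.
    scores = cosine_sim(np.asarray(query_emb, dtype=float),
                        np.asarray(page_embs, dtype=float))
    order = np.argsort(scores)[::-1][:top_k]
    return order.tolist(), scores[order].tolist()

# Toy embeddings standing in for real page-image vectors (hypothetical values).
pages = [[1.0, 0.0, 0.0],   # page 0
         [0.0, 1.0, 0.0],   # page 1
         [0.7, 0.7, 0.0]]   # page 2
idx, scores = retrieve_pages([1.0, 0.1, 0.0], pages, top_k=2)
print(idx)  # page 0 ranks first
```

The retrieved page images, not the embeddings, would then be passed to the visual-language model for analysis.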