a state-of-the-art open visual language model | multimodal pre-trained model
🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/
Commanding robots using only Language Models' prompts
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.
Build a simple multimodal large model from scratch 🤖
Implementation of the paper "Learn 'No' to Say 'Yes' Better".
Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"
Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models
Universal Adversarial Perturbations for Vision-Language Pre-trained Models
[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"
This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"
Chain of Images for Intuitively Reasoning
Code for the paper "Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models", ISBI 2024 (Oral).
A benchmark for evaluating hallucinations in large visual language models
A from-scratch implementation of PaliGemma, built by following a YouTube tutorial as a learning exercise in application, library, and system design, applying the development practices demonstrated in the original tutorial.
CLI for converting UForm models to CoreML.
Advancing the Understanding and Evaluation of AR-Generated Scenes: When Vision-Language Models Shine and Stumble
This project performs multimodal document analysis and query retrieval by downloading PDFs, converting pages to images, indexing them for semantic search, and analyzing retrieved images using visual-language models like Qwen2VL and Blip2.
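The retrieval step in a pipeline like the one above ranks page-image embeddings against a query embedding. A minimal sketch of that step, using cosine similarity over toy vectors; the function names and the stand-in embeddings are hypothetical, not taken from the project, and a real system would produce the vectors with a vision-language encoder such as Qwen2VL:

```python
import numpy as np

def cosine_sim(query, pages):
    # Cosine similarity between a query vector and each row of a matrix.
    query = query / np.linalg.norm(query)
    pages = pages / np.linalg.norm(pages, axis=1, keepdims=True)
    return pages @ query

def retrieve_pages(query_emb, page_embs, top_k=3):
    # Rank page-image embeddings by similarity to the query embedding
    # and return the indices and scores of the best matches.
    scores = cosine_sim(np.asarray(query_emb, dtype=float),
                        np.asarray(page_embs, dtype=float))
    order = np.argsort(scores)[::-1][:top_k]
    return order.tolist(), scores[order].tolist()

# Toy embeddings standing in for real page-image vectors (hypothetical values).
pages = [[1.0, 0.0, 0.0],   # page 0
         [0.0, 1.0, 0.0],   # page 1
         [0.7, 0.7, 0.0]]   # page 2
idx, scores = retrieve_pages([1.0, 0.1, 0.0], pages, top_k=2)
print(idx)  # page 0 ranks first
```

The retrieved page images, not the embeddings, would then be passed to the visual-language model for analysis.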