Commit 17b6129: vault backup: 2023-12-02 - 1 files
swyx committed Dec 2, 2023 (1 parent: 105d8b3)

Affected file: stub notes/IMAGE2TEXT.md (11 additions, 1 deletion)

@@ -33,4 +33,14 @@ pulsr io and more from this thread
https://twitter.com/tunguz/status/1616190582606467089?s=46&t=eCig8-Pc5CuJQeXulVU7qQ


Flamingo model https://arxiv.org/abs/2204.14198


## VQA

LLaVA
- [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485): [Haotian Liu](https://arxiv.org/search/cs?searchtype=author&query=Liu,+H), [Chunyuan Li](https://arxiv.org/search/cs?searchtype=author&query=Li,+C), [Qingyang Wu](https://arxiv.org/search/cs?searchtype=author&query=Wu,+Q), [Yong Jae Lee](https://arxiv.org/search/cs?searchtype=author&query=Lee,+Y+J)

> Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding. Our early experiments show that LLaVA demonstrates impressive multimodal chat abilities, sometimes exhibiting the behaviors of multimodal GPT-4 on unseen images/instructions, and yields an 85.1% relative score compared with GPT-4 on a synthetic multimodal instruction-following dataset. When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%. We make GPT-4 generated visual instruction tuning data, our model and code base publicly available.

- https://llava-vl.github.io/
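
As a quick usage note: below is a minimal VQA sketch against the community llava-hf checkpoint in Hugging Face transformers. The model id, prompt template, and demo image URL come from the transformers LLaVA integration, not from this note; treat it as an assumption-laden starting point rather than the paper's reference code.

```python
# Minimal LLaVA-1.5 VQA sketch (assumes transformers >= 4.36, accelerate, and a GPU;
# llava-hf/llava-1.5-7b-hf is a community conversion of the LLaVA-1.5 weights).
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any RGB image works; this URL is the LLaVA project's demo image.
url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# LLaVA-1.5 expects a USER/ASSISTANT turn with an <image> placeholder token.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)

output_ids = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```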
