Commit

Added GPT4o
vinesmsuic committed May 17, 2024
1 parent 4e15aca commit d1f75a6
Showing 5 changed files with 111 additions and 26 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
imagen_museum
136 changes: 110 additions & 26 deletions index.html
@@ -209,7 +209,7 @@ <h2 class="title is-3">How effective is VIEScore with current state-of-the-art M
<div class="item">
<p>But how well can Multimodal large language models access different tasks of conditional Image generation? We reported that the best model GPT4v’s performance is significantly better than the open-source models. Most open-source MLLMs failed to adapt to our VieScore except LLaVA.</p>
<!-- Your image here -->
<img src="static/images/table_overall.png" alt="MY ALT TEXT" />
<img src="static/images/table_overall_new.png" alt="MY ALT TEXT" />
<h2 class="subtitle">
Table 1: Correlations across all tasks with different backbone models. We highlight the highest correlation numbers in green. Visit our paper for insights and challenges in VIEScore.
</h2>
@@ -256,10 +256,22 @@ <h2 class="title is-3">How is Traditional Metrics correlating with human compare
<td>-0.0114</td>
<td>-0.0881</td>
</tr>
<tr>
<td>VIEScore(GPT-4o<sub>0shot</sub>)</td>
<td style="color:black; font-weight: bold;">0.4989</td>
<td style="color:black; font-weight: bold;">0.2495</td>
<td>0.3928</td>
</tr>
<tr>
<td>VIEScore(GPT-4o<sub>1shot</sub>)</td>
<td>0.5124</td>
<td>0.0336</td>
<td>0.4042</td>
</tr>
<tr>
<td style="color:blue; font-weight: bold;">VIEScore(GPT-4v<sub>0shot</sub>)</td>
<td style="color:black; font-weight: bold;">0.4885</td>
<td style="color:black; font-weight: bold;">0.2379</td>
<td>0.4885</td>
<td>0.2379</td>
<td style="color:blue; font-weight: bold;">0.4614</td>
</tr>
<tr>
@@ -296,10 +308,22 @@ <h2 class="title is-3">How is Traditional Metrics correlating with human compare
<td>-0.0694</td>
</tr>
<tr>
<td style="color:blue; font-weight: bold;">VIEScore(GPT-4v<sub>0shot</sub>)</td>
<td style="color:black; font-weight: bold;">0.4508</td>
<td style="color:black; font-weight: bold;">0.2859</td>
<td style="color:blue; font-weight: bold;">0.4069</td>
<td style="color:blue; font-weight: bold;">VIEScore(GPT-4o<sub>0shot</sub>)</td>
<td style="color:black; font-weight: bold;">0.5421</td>
<td style="color:black; font-weight: bold;">0.3469</td>
<td style="color:blue; font-weight: bold;">0.4769</td>
</tr>
<tr>
<td>VIEScore(GPT-4o<sub>1shot</sub>)</td>
<td>0.5246</td>
<td>0.1272</td>
<td>0.4432</td>
</tr>
<tr>
<td>VIEScore(GPT-4v<sub>0shot</sub>)</td>
<td>0.4508</td>
<td>0.2859</td>
<td>0.4069</td>
</tr>
<tr>
<td>VIEScore(GPT-4v<sub>1shot</sub>)</td>
@@ -335,10 +359,22 @@ <h2 class="title is-3">How is Traditional Metrics correlating with human compare
<td>0.1142</td>
</tr>
<tr>
<td style="color:blue; font-weight: bold;">VIEScore(GPT-4v<sub>0shot</sub>)</td>
<td style="color:black; font-weight: bold;">0.2610</td>
<td style="color:black; font-weight: bold;">0.4274</td>
<td style="color:blue; font-weight: bold;">0.2456</td>
<td style="color:blue; font-weight: bold;">VIEScore(GPT-4o<sub>0shot</sub>)</td>
<td style="color:black; font-weight: bold;">0.4062</td>
<td style="color:black; font-weight: bold;">0.4863</td>
<td style="color:blue; font-weight: bold;">0.3821</td>
</tr>
<tr>
<td>VIEScore(GPT-4o<sub>1shot</sub>)</td>
<td>0.3684</td>
<td>0.1939</td>
<td>0.3438</td>
</tr>
<tr>
<td>VIEScore(GPT-4v<sub>0shot</sub>)</td>
<td>0.2610</td>
<td>0.4274</td>
<td>0.2456</td>
</tr>
<tr>
<td>VIEScore(GPT-4v<sub>1shot</sub>)</td>
@@ -368,17 +404,29 @@ <h2 class="title is-3">How is Traditional Metrics correlating with human compare
<td style="color:gray; font-weight: bold;">0.4653</td>
</tr>
<tr>
<td style="color:blue; font-weight: bold;">DINO</td>
<td style="color:black; font-weight: bold;">0.4160</td>
<td>DINO</td>
<td>0.4160</td>
<td>0.1206</td>
<td style="color:blue; font-weight: bold;">0.4246</td>
<td>0.4246</td>
</tr>
<tr>
<td>CLIP-I</td>
<td>0.2961</td>
<td>0.1694</td>
<td>0.3058</td>
</tr>
<tr>
<td style="color:blue; font-weight: bold;">VIEScore(GPT-4o<sub>0shot</sub>)</td>
<td style="color:black; font-weight: bold;">0.4806</td>
<td style="color:black; font-weight: bold;">0.2576</td>
<td style="color:blue; font-weight: bold;">0.4637</td>
</tr>
<tr>
<td>VIEScore(GPT-4o<sub>1shot</sub>)</td>
<td>0.4685</td>
<td>-0.0171</td>
<td>0.4292</td>
</tr>
<tr>
<td>VIEScore(GPT-4v<sub>0shot</sub>)</td>
<td>0.3979</td>
@@ -388,7 +436,7 @@ <h2 class="title is-3">How is Traditional Metrics correlating with human compare
<tr>
<td>VIEScore(GPT-4v<sub>1shot</sub>)</td>
<td>0.2757</td>
<td style="color:black; font-weight: bold;">0.2261</td>
<td>0.2261</td>
<td>0.2753</td>
</tr>
<tr>
@@ -413,21 +461,33 @@ <h2 class="title is-3">How is Traditional Metrics correlating with human compare
<td style="color:gray; font-weight: bold;">0.4747</td>
</tr>
<tr>
<td style="color:blue; font-weight: bold;">DINO</td>
<td>DINO</td>
<td>0.3022</td>
<td>-0.0381</td>
<td style="color:blue; font-weight: bold;">0.3005</td>
<td>0.3005</td>
</tr>
<tr>
<td>CLIP-I</td>
<td>0.2834</td>
<td>0.1248</td>
<td>0.2813</td>
</tr>
<tr>
<td style="color:blue; font-weight: bold;">VIEScore(GPT-4o<sub>0shot</sub>)</td>
<td style="color:black; font-weight: bold;">0.4800</td>
<td style="color:black; font-weight: bold;">0.3734</td>
<td style="color:blue; font-weight: bold;">0.3268</td>
</tr>
<tr>
<td>VIEScore(GPT-4o<sub>1shot</sub>)</td>
<td>0.3862</td>
<td>0.1273</td>
<td>0.3268</td>
</tr>
<tr>
<td>VIEScore(GPT-4v<sub>0shot</sub>)</td>
<td style="color:black; font-weight: bold;">0.3274</td>
<td style="color:black; font-weight: bold;">0.2960</td>
<td>0.3274</td>
<td>0.2960</td>
<td>0.1507</td>
</tr>
<tr>
@@ -470,10 +530,22 @@ <h2 class="title is-3">How is Traditional Metrics correlating with human compare
<td>0.1498</td>
</tr>
<tr>
<td style="color:blue; font-weight: bold;">VIEScore(GPT-4v<sub>0shot</sub>)</td>
<td style="color:black; font-weight: bold;">0.3209</td>
<td style="color:blue; font-weight: bold;">VIEScore(GPT-4o<sub>0shot</sub>)</td>
<td style="color:black; font-weight: bold;">0.4516</td>
<td>0.2751</td>
<td style="color:blue; font-weight: bold;">0.4136</td>
</tr>
<tr>
<td>VIEScore(GPT-4o<sub>1shot</sub>)</td>
<td>0.4120</td>
<td>-0.0141</td>
<td>0.3523</td>
</tr>
<tr>
<td>VIEScore(GPT-4v<sub>0shot</sub>)</td>
<td>0.3209</td>
<td style="color:black; font-weight: bold;">0.3025</td>
<td style="color:blue; font-weight: bold;">0.3346</td>
<td>0.3346</td>
</tr>
<tr>
<td>VIEScore(GPT-4v<sub>1shot</sub>)</td>
@@ -508,17 +580,29 @@ <h2 class="title is-3">How is Traditional Metrics correlating with human compare
<td>0.4204</td>
<td>0.4133</td>
</tr>
<tr>
<td style="color:blue; font-weight: bold;">VIEScore(GPT-4o<sub>0shot</sub>)</td>
<td style="color:black; font-weight: bold;">0.4972</td>
<td style="color:black; font-weight: bold;">0.4892</td>
<td style="color:blue; font-weight: bold;">0.5439</td>
</tr>
<tr>
<td>VIEScore(GPT-4o<sub>1shot</sub>)</td>
<td>0.5544</td>
<td>0.3699</td>
<td>0.5238</td>
</tr>
<tr>
<td>VIEScore(GPT-4v<sub>0shot</sub>)</td>
<td style="color:black; font-weight: bold;">0.4360</td>
<td style="color:black; font-weight: bold;">0.4975</td>
<td>0.4360</td>
<td>0.4975</td>
<td>0.3999</td>
</tr>
<tr>
<td style="color:blue; font-weight: bold;">VIEScore(GPT-4v<sub>1shot</sub>)</td>
<td>VIEScore(GPT-4v<sub>1shot</sub>)</td>
<td>0.3892</td>
<td>0.4132</td>
<td style="color:blue; font-weight: bold;">0.4237</td>
<td>0.4237</td>
</tr>
<tr>
<td>VIEScore(LLaVA<sub>0shot</sub>)</td>
Binary file removed static/images/table_full1.png
Binary file not shown.
Binary file removed static/images/table_full2.png
Binary file not shown.
Binary file added static/images/table_overall_new.png
