---
title: Task+Overall
lang: en
layout: default
views:
- type: rader
  target: ja
  width: 6
  title: Japanese
- type: rader
  target: en
  width: 6
  title: English
- type: rader
  target: ja_mtb
  width: 6
  title: Japanese MT-Bench
- type: bar
  target: avb
  width: 6
  title: Average
aspect_portrait: 1.1
aspect_landscape: 1.1
persistent_group: task
instructions:
- title: Usage and Notes
  text: "The radar charts show the scores on every task in the Japanese, Japanese MT-Bench, and English benchmarks for the LLMs selected in the table below. In addition, the bar chart shows their average scores. To copy a permalink for the selected models, click the 🔗 icon in the upper left corner of the site. Note that <strong>because some tasks have not been evaluated for some models, it may be inappropriate to judge which model is superior from average scores or sort order.</strong> For example, GPT-3.5 and GPT-4 are presumed to perform well on the Japanese and English tasks, but because they were not evaluated on them, their average scores for these tasks are treated as 0 and they are sorted last."
---
{% include view.html %}