Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 2 changed files with 30 additions and 20 deletions.
83ccb4c
Would it be possible to update the radar graph again, based on the latest models?
Thank you for pointing that out. Most of the latest models typically provide only overall scores rather than the fine-grained scores needed for radar graphs. Given the fast-paced evolution of these models, it's challenging for us to keep the radar graph continuously updated. However, we are actively maintaining the leaderboard, which you can find here: Leaderboard.
Thanks for getting back! I have indeed noticed that many models provide only an overall score. Is there a specific reason for this? It would be nice to have more complete per-category scores.
One possible reason is that these models are evaluated on a large number of datasets, so reporting a single overall score makes it easier to compare different models.