MMLU Benchmarks #1163
Conversation
This is great! Do we have logs to see what the test results look like?
Here is the output for llama3.1-8b with the default 0-shot prompt:

Final accuracy on MMLU dataset: 0.6413
Subcategory Accuracies:
Category Accuracies:
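For context on how the overall, subcategory, and category accuracies in output like the above could be produced, here is a minimal sketch of the aggregation step. The record format (`correct`, `subcategory`, `category` keys) is an assumption for illustration, not necessarily what the benchmark script actually uses.

```python
from collections import defaultdict

def aggregate_accuracies(results):
    """Aggregate per-question correctness into overall, subcategory,
    and category accuracies.

    `results` is a list of dicts with keys "correct" (bool),
    "subcategory" (str), and "category" (str) -- a hypothetical
    record format used here for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [num_correct, num_total]
    overall = [0, 0]
    for r in results:
        overall[1] += 1
        overall[0] += int(r["correct"])
        for kind, name in (("sub", r["subcategory"]), ("cat", r["category"])):
            counts[(kind, name)][1] += 1
            counts[(kind, name)][0] += int(r["correct"])

    acc = lambda c: c[0] / c[1]
    sub_acc = {name: acc(v) for (kind, name), v in counts.items() if kind == "sub"}
    cat_acc = {name: acc(v) for (kind, name), v in counts.items() if kind == "cat"}
    return acc(overall), sub_acc, cat_acc

results = [
    {"correct": True,  "subcategory": "math", "category": "STEM"},
    {"correct": False, "subcategory": "math", "category": "STEM"},
    {"correct": True,  "subcategory": "law",  "category": "humanities"},
]
final, sub, cat = aggregate_accuracies(results)
print(f"Final accuracy on MMLU dataset: {final:.4f}")
print("Subcategory Accuracies:", sub)
print("Category Accuracies:", cat)
```

Reporting subcategory and category breakdowns alongside the final number is useful because MMLU accuracy varies widely across subjects, and a single aggregate can hide regressions in specific domains.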
MMLU Benchmark Script (0-shot)
Tests
Tested with llama2-7b and llama3.1-8b.
Checklist
Before submitting this PR, please make sure (put an X in the square brackets):