Issues reproducing Llama3.2-1B results on MMLU #2528

Closed
VoiceBeer opened this issue Dec 1, 2024 · 2 comments

VoiceBeer commented Dec 1, 2024

Results Log

hf (pretrained=/data/models/meta-llama/Llama-3.2-1B,dtype=auto,), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto (4)

| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|---|---:|
| mmlu | 2 | none | | acc | 0.3107 | ± | 0.0039 |
| - humanities | 2 | none | | acc | 0.2912 | ± | 0.0066 |
| - formal_logic | 1 | none | 5 | acc | 0.1984 | ± | 0.0357 |
| - high_school_european_history | 1 | none | 5 | acc | 0.3758 | ± | 0.0378 |
| - high_school_us_history | 1 | none | 5 | acc | 0.3284 | ± | 0.0330 |
| - high_school_world_history | 1 | none | 5 | acc | 0.3291 | ± | 0.0306 |
| - international_law | 1 | none | 5 | acc | 0.4215 | ± | 0.0451 |
| - jurisprudence | 1 | none | 5 | acc | 0.3889 | ± | 0.0471 |
| - logical_fallacies | 1 | none | 5 | acc | 0.2761 | ± | 0.0351 |
| - moral_disputes | 1 | none | 5 | acc | 0.2775 | ± | 0.0241 |
| - moral_scenarios | 1 | none | 5 | acc | 0.2380 | ± | 0.0142 |
| - philosophy | 1 | none | 5 | acc | 0.3183 | ± | 0.0265 |
| - prehistory | 1 | none | 5 | acc | 0.3673 | ± | 0.0268 |
| - professional_law | 1 | none | 5 | acc | 0.2595 | ± | 0.0112 |
| - world_religions | 1 | none | 5 | acc | 0.4386 | ± | 0.0381 |
| - other | 2 | none | | acc | 0.3602 | ± | 0.0086 |
| - business_ethics | 1 | none | 5 | acc | 0.3600 | ± | 0.0482 |
| - clinical_knowledge | 1 | none | 5 | acc | 0.3358 | ± | 0.0291 |
| - college_medicine | 1 | none | 5 | acc | 0.2890 | ± | 0.0346 |
| - global_facts | 1 | none | 5 | acc | 0.2000 | ± | 0.0402 |
| - human_aging | 1 | none | 5 | acc | 0.3812 | ± | 0.0326 |
| - management | 1 | none | 5 | acc | 0.3301 | ± | 0.0466 |
| - marketing | 1 | none | 5 | acc | 0.4103 | ± | 0.0322 |
| - medical_genetics | 1 | none | 5 | acc | 0.3900 | ± | 0.0490 |
| - miscellaneous | 1 | none | 5 | acc | 0.4266 | ± | 0.0177 |
| - nutrition | 1 | none | 5 | acc | 0.3824 | ± | 0.0278 |
| - professional_accounting | 1 | none | 5 | acc | 0.2624 | ± | 0.0262 |
| - professional_medicine | 1 | none | 5 | acc | 0.2868 | ± | 0.0275 |
| - virology | 1 | none | 5 | acc | 0.4036 | ± | 0.0382 |
| - social sciences | 2 | none | | acc | 0.3191 | ± | 0.0084 |
| - econometrics | 1 | none | 5 | acc | 0.2807 | ± | 0.0423 |
| - high_school_geography | 1 | none | 5 | acc | 0.3687 | ± | 0.0344 |
| - high_school_government_and_politics | 1 | none | 5 | acc | 0.3472 | ± | 0.0344 |
| - high_school_macroeconomics | 1 | none | 5 | acc | 0.2462 | ± | 0.0218 |
| - high_school_microeconomics | 1 | none | 5 | acc | 0.2605 | ± | 0.0285 |
| - high_school_psychology | 1 | none | 5 | acc | 0.3376 | ± | 0.0203 |
| - human_sexuality | 1 | none | 5 | acc | 0.3511 | ± | 0.0419 |
| - professional_psychology | 1 | none | 5 | acc | 0.3023 | ± | 0.0186 |
| - public_relations | 1 | none | 5 | acc | 0.3000 | ± | 0.0439 |
| - security_studies | 1 | none | 5 | acc | 0.3510 | ± | 0.0306 |
| - sociology | 1 | none | 5 | acc | 0.3284 | ± | 0.0332 |
| - us_foreign_policy | 1 | none | 5 | acc | 0.5200 | ± | 0.0502 |
| - stem | 2 | none | | acc | 0.2829 | ± | 0.0080 |
| - abstract_algebra | 1 | none | 5 | acc | 0.2600 | ± | 0.0441 |
| - anatomy | 1 | none | 5 | acc | 0.3704 | ± | 0.0417 |
| - astronomy | 1 | none | 5 | acc | 0.2303 | ± | 0.0343 |
| - college_biology | 1 | none | 5 | acc | 0.2500 | ± | 0.0362 |
| - college_chemistry | 1 | none | 5 | acc | 0.2200 | ± | 0.0416 |
| - college_computer_science | 1 | none | 5 | acc | 0.3000 | ± | 0.0461 |
| - college_mathematics | 1 | none | 5 | acc | 0.2500 | ± | 0.0435 |
| - college_physics | 1 | none | 5 | acc | 0.2255 | ± | 0.0416 |
| - computer_security | 1 | none | 5 | acc | 0.5000 | ± | 0.0503 |
| - conceptual_physics | 1 | none | 5 | acc | 0.3447 | ± | 0.0311 |
| - electrical_engineering | 1 | none | 5 | acc | 0.2690 | ± | 0.0370 |
| - elementary_mathematics | 1 | none | 5 | acc | 0.2354 | ± | 0.0219 |
| - high_school_biology | 1 | none | 5 | acc | 0.3032 | ± | 0.0261 |
| - high_school_chemistry | 1 | none | 5 | acc | 0.2611 | ± | 0.0309 |
| - high_school_computer_science | 1 | none | 5 | acc | 0.3200 | ± | 0.0469 |
| - high_school_mathematics | 1 | none | 5 | acc | 0.2667 | ± | 0.0270 |
| - high_school_physics | 1 | none | 5 | acc | 0.2318 | ± | 0.0345 |
| - high_school_statistics | 1 | none | 5 | acc | 0.2778 | ± | 0.0305 |
| - machine_learning | 1 | none | 5 | acc | 0.3571 | ± | 0.0455 |

| Groups | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---:|---|---:|---|---:|---|---:|
| mmlu | 2 | none | | acc | 0.3107 | ± | 0.0039 |
| - humanities | 2 | none | | acc | 0.2912 | ± | 0.0066 |
| - other | 2 | none | | acc | 0.3602 | ± | 0.0086 |
| - social sciences | 2 | none | | acc | 0.3191 | ± | 0.0084 |
| - stem | 2 | none | | acc | 0.2829 | ± | 0.0080 |

Hi, thanks for the work. I'm trying to reproduce the reported results for the Llama-3.2-1B model on MMLU. The result I got is 0.3107, which is well below the 0.493 reported by Meta.

Could you please let me know if there are any specific settings I might have missed? Thanks in advance!
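
For reference, the settings in the log header above correspond to roughly the following invocation. This is a minimal sketch using lm-evaluation-harness's Python API (equivalent to the `lm_eval` CLI); the local model path is the one from the log, and it runs the harness's default `mmlu` task:

```python
# Sketch of the run that produced the log above, via the
# lm-evaluation-harness Python API (equivalent to the lm_eval CLI).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=/data/models/meta-llama/Llama-3.2-1B,dtype=auto",
    tasks=["mmlu"],        # the harness's default MMLU task
    num_fewshot=5,         # matches "num_fewshot: 5" in the log header
    batch_size="auto",     # matches "batch_size: auto (4)" in the log header
)
print(results["results"]["mmlu"])  # aggregate metrics, e.g. the acc above
```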


wukaixingxp commented Dec 2, 2024

Hi @VoiceBeer! Meta's MMLU eval uses a different prompt style; please follow this readme and check out this PR to reproduce our MMLU number for 1B.
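
(Context on why the prompt style matters: the harness's default `mmlu` task is likelihood-based, i.e. it picks the answer letter whose tokens the model assigns the highest log-probability under the harness's own few-shot template, while Meta's eval uses its own template and scoring, so the two numbers are not directly comparable. Below is a minimal sketch of that likelihood-based scoring, assuming local access to the weights; the question shown is illustrative, not from the dataset:)

```python
# Minimal sketch of likelihood-based multiple-choice scoring, the mechanism
# behind the harness's default mmlu task. The prompt below is illustrative
# only; the real harness builds a 5-shot prompt per MMLU subject.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # assumes access to the weights
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
model.eval()

prompt = (
    "The following are multiple choice questions (with answers) about astronomy.\n\n"
    "Which planet is known as the Red Planet?\n"
    "A. Venus\nB. Mars\nC. Jupiter\nD. Saturn\n"
    "Answer:"
)

def continuation_logprob(choice: str) -> float:
    """Log-likelihood of `choice` as a continuation of `prompt`.

    Simplified: assumes tokenizing prompt + choice does not merge tokens
    across the boundary, which the real harness handles more carefully.
    """
    ctx_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full = tok(prompt + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full).logits
    cont = full[0, ctx_len:]                        # continuation token ids
    lp = logits[0, ctx_len - 1:-1].log_softmax(-1)  # predictions for them
    return lp[torch.arange(len(cont)), cont].sum().item()

# The predicted answer is the choice with the highest log-likelihood; a
# different prompt template shifts these likelihoods and hence the accuracy.
print(max([" A", " B", " C", " D"], key=continuation_logprob))
```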

VoiceBeer (Author) commented

Thanks @wukaixingxp! Appreciate it!
