Note: The baselines use up to nearly 2× the training FLOPs of OpenMoE-8B. OpenMoE-8B has per-token FLOPs comparable to a 1.6B dense model.
Model (training tokens) | Training Cost (B params × T tokens) | Inference Cost (activated params, B) | MT-Bench 1st Turn | MT-Bench 2nd Turn | MT-Bench Avg |
---|---|---|---|---|---|
GPT-J-6B (0.4T) | 2.4 | 6 | 2.51 | 2.35 | 2.43 |
TinyLLaMA-1.1B (3T) | 3.3 | 1.1 | 4.08 | 2.54 | 3.31 |
OpenLLaMA-3B (1T) | 3 | 3 | 4.36 | 3.62 | 3.99 |
OpenMoE-8B/32E (1.1T) | 1.8 | 1.6 | 4.69 | 3.26 | 3.98 |
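The cost columns appear to follow a simple FLOPs proxy: inference cost is the activated parameter count in billions, and training cost is activated parameters (B) multiplied by training tokens (T). The sketch below reproduces the table's cost columns under that interpretation; the unit convention and the 1.6B activated-parameter figure for OpenMoE-8B/32E are inferred from the note above rather than stated as a formula in the source.

```python
# Minimal sketch (assumed units): inference cost ~ activated params in billions,
# training cost ~ activated params (B) x training tokens (T). These conventions
# are inferred from the table, not stated explicitly in the source.

models = {
    # name: (activated_params_B, training_tokens_T)
    "GPT-J-6B": (6.0, 0.4),
    "TinyLLaMA-1.1B": (1.1, 3.0),
    "OpenLLaMA-3B": (3.0, 1.0),
    "OpenMoE-8B/32E": (1.6, 1.1),  # 8B total params, ~1.6B activated per token
}

for name, (params_b, tokens_t) in models.items():
    training_cost = params_b * tokens_t  # proportional to total training FLOPs
    inference_cost = params_b            # proportional to per-token inference FLOPs
    print(f"{name}: training cost = {training_cost:.1f}, "
          f"inference cost = {inference_cost:.1f}")
```

Running this reproduces the cost columns above (OpenMoE-8B/32E's 1.76 rounds to the 1.8 shown), which is why the note can say the dense baselines spend up to nearly 2× OpenMoE-8B's training compute while still trailing it on first-turn MT-Bench.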