train opt-125M from scratch #725

emrecanacikgoz · 2023-05-30T08:35:30Z

I couldn't find detailed documentation (or a step-by-step guide) on pre-training OPT-125M with exactly the same corpus and model architecture that you used in the paper. In short, I would like to reproduce the results of your smallest model from scratch.

Could you point me to the relevant guide, or provide anything else that could help? Many thanks.

Gusicun · 2023-07-08T08:07:01Z

same questions

CaptainSlowWZY · 2024-06-14T06:45:46Z

same questions

emrecanacikgoz added the "question" label (Further information is requested) on May 30, 2023
