
[Epic][Improvement] Testing overhaul #368

Open
JamesKunstle opened this issue Dec 16, 2024 · 1 comment

Comments

@JamesKunstle (Contributor)

This repo itself has only basic smoke tests. In instructlab/instructlab, there are workflow tests that consume this library and confirm that training isn't outright broken, which is a good start.

There are multiple levels of testing that we should aspire to cover.

  1. Unit testing. These should prove that our utility functions (e.g., calculating packed batches with first-fit decreasing, FFD) work, that our assertions (e.g., blocking unsupported model architectures) are enforced, and that our organizational logic (e.g., loading checkpoints and restarting from a given epoch) functions correctly.
  2. Correctness and performance testing. If we're making changes to the core training loop, we ought to be able to quickly invoke a test that checks indicators like (a) the behavior of the loss curve and (b) iteration and epoch training time.
  3. Hardware-stack testing. We now support five hardware runtime categories: CPU, MPS, Nvidia, AMD, and Intel. We should be able to run the appropriate tests on the appropriate hardware without having to manually access machines and invoke tests ourselves.
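To make level 1 concrete, here is a minimal sketch of what a unit test for FFD batch packing could look like. `pack_ffd` is a hypothetical stand-in, not this library's actual API; the test checks the two properties any packing must satisfy: no bin exceeds the token budget, and every sequence is placed exactly once.

```python
def pack_ffd(lengths, max_tokens):
    """Hypothetical first-fit-decreasing packer: place each sequence length
    (longest first) into the first bin with room, opening a new bin if none fits."""
    bins = []
    for length in sorted(lengths, reverse=True):
        for b in bins:
            if sum(b) + length <= max_tokens:
                b.append(length)
                break
        else:
            bins.append([length])
    return bins


def test_ffd_respects_capacity():
    bins = pack_ffd([512, 300, 200, 100, 50], max_tokens=600)
    # No bin may exceed the token budget.
    assert all(sum(b) <= 600 for b in bins)
    # Every input sequence is packed exactly once.
    assert sorted(l for b in bins for l in b) == [50, 100, 200, 300, 512]
```

Property-style assertions like these stay valid even if the packer's internals change, which is exactly what we want when refactoring utility code.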
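For level 2, a loss-curve indicator can be as simple as comparing the mean loss at the start and end of a short training run. This is an illustrative helper, not an existing function in the repo; the name, window size, and threshold are all assumptions.

```python
def loss_decreased(losses, window=3, min_drop=0.0):
    """Hypothetical correctness indicator: compare the mean loss over the
    first `window` steps with the mean over the last `window` steps, and
    require the drop to exceed `min_drop`."""
    assert len(losses) >= 2 * window, "need enough steps to compare windows"
    first = sum(losses[:window]) / window
    last = sum(losses[-window:]) / window
    return first - last > min_drop


# A decreasing curve passes; a flat curve fails.
loss_decreased([2.0, 1.8, 1.5, 1.0, 0.8, 0.7])   # True
loss_decreased([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])   # False
```

A real test would run a few steps of the training loop on a tiny model and feed the recorded losses through a check like this, alongside wall-clock timing of an iteration and an epoch.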
