TST: Add regression tests #995
Closed
BenjaminBossan wants to merge 3 commits into huggingface:main from BenjaminBossan:TST-regression-tests
Conversation
The documentation is not available anymore as the PR was closed or merged.
Tests are currently failing because bitsandbytes is not installed. Is there any specific reason why we don't have it for tests?
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request on Nov 10, 2023:
This PR supersedes huggingface#995. The description below is copied and modified from that PR. For some technical reasons, it was easier for me to create a new PR than to update the previous one, sorry for that.

Description

In general, for regression tests, we need two steps:
1. Creating the regression artifacts, in this case the adapter checkpoint and the expected output of the model.
2. Running the regression tests, i.e. loading the adapter and checking that the output of the model is the same as the expected output.

My approach is to re-use as much code as possible between those two steps. Therefore, the same test script can be used for both, with only an environment variable to distinguish between the two. Step 1 is invoked by calling `REGRESSION_CREATION_MODE=True pytest tests/regression/test_regression.py`, and to run the second step, we call `pytest tests/regression/test_regression.py`.

Creating regression artifacts

The first step will create an adapter checkpoint and an output for the given PEFT version and test setting in a new directory. E.g. it will create a directory `tests/regression/lora_opt-125m_bnb_4bit/0.5.0/` that contains adapter_model.bin and output.pt. Before this step runs, there is a check that the git repo is clean (no dirty worktree) and that the commit is tagged (i.e. corresponds to a release version of PEFT). Otherwise, we may accidentally create regression artifacts that do not correspond to any PEFT release. The easiest way to get such a clean state (say, for PEFT v0.5.0) is by checking out a tagged commit, e.g. `git checkout v0.5.0`, before running the first step. The first step will also skip the creation of regression artifacts if they already exist. It is possible to circumvent all the aforementioned checks by setting the environment variable `REGRESSION_FORCE_MODE` to True, like so: `REGRESSION_FORCE_MODE=True REGRESSION_CREATION_MODE=True pytest tests/regression/test_regression.py`. You should only do this if you know exactly what you're doing.

Running regression tests

The second step is much simpler. It will load the adapters and the output created in the first step, and compare the output to the output from a new PEFT model using the loaded adapter. The outputs should be the same. If more than one version is discovered for a given test setting, all of them are tested.

Notes

As is, the adapters are stored in the git repo itself. Since they're relatively small, the total size of the repo is still reasonable. However, it could be better to store those adapters on HF Hub instead. This would, however, make things a bit more complicated (not sure how to parse directories etc. on Hub).

The regression tests included in this PR were used to check that huggingface#994 still allows loading checkpoints created with PEFT v0.6.1.
Closing in favor of #1115
This is a first step towards adding regression tests to the project. These tests allow us to load old adapters with new PEFT versions and ensure that the output generated by the model does not change.
The PR includes a framework for adding regression artifacts and for running regression tests based on those artifacts. Right now, only bnb + LoRA is covered, but it should be straightforward to add more tests.
Description
In general, for regression tests, we need two steps:
1. Creating the regression artifacts, in this case the adapter checkpoint and the expected output of the model.
2. Running the regression tests, i.e. loading the adapter and checking that the output of the model is the same as the expected output.
My approach is to re-use as much code as possible between those two steps. Therefore, the same test script can be used for both, with only an environment variable to distinguish between the two. Step 1 is invoked by calling:
REGRESSION_CREATION_MODE=True pytest tests/regression/test_regression.py
and to run the second step, we call:
pytest tests/regression/test_regression.py
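To give a rough idea of how one test function can serve both modes, here is a minimal sketch. The helper name, the env-var parsing, and the hard-coded artifact path are illustrative assumptions, not code from this PR:

```python
import os

import torch

# Example artifact location, matching the directory layout described below.
# The helper itself is hypothetical.
ARTIFACT_DIR = "tests/regression/lora_opt-125m_bnb_4bit/0.5.0"

creation_mode = os.environ.get("REGRESSION_CREATION_MODE", "") == "True"
force_mode = os.environ.get("REGRESSION_FORCE_MODE", "") == "True"


def check_or_create_output(output: torch.Tensor) -> None:
    path = os.path.join(ARTIFACT_DIR, "output.pt")
    if creation_mode:
        if os.path.exists(path) and not force_mode:
            return  # artifacts for this version already exist, skip creation
        os.makedirs(ARTIFACT_DIR, exist_ok=True)
        torch.save(output, path)  # step 1: record the expected output
    else:
        expected = torch.load(path)  # step 2: compare to the recorded output
        torch.testing.assert_close(output, expected)
```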
Creating regression artifacts
The first step will create an adapter checkpoint and an output for the given PEFT version and test setting in a new directory. E.g. it will create a directory
tests/regression/lora_opt-125m_bnb_4bit/0.5.0/
that contains adapter_model.bin and output.pt.

Before this step runs, there is a check that the git repo is clean (no dirty worktree) and that the commit is tagged (i.e. corresponds to a release version of PEFT). Otherwise, we may accidentally create regression artifacts that do not correspond to any PEFT release.
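The clean-worktree and tagged-commit check could be done with plain git subprocess calls, along these lines (a sketch; the PR's actual implementation may differ):

```python
import subprocess


def assert_clean_and_tagged() -> str:
    # A dirty worktree produces output from `git status --porcelain`.
    status = subprocess.run(
        ["git", "status", "--porcelain"], capture_output=True, text=True, check=True
    )
    if status.stdout.strip():
        raise RuntimeError("Worktree is dirty; commit or stash changes first.")

    # `git describe --exact-match --tags` fails unless HEAD sits exactly on a tag.
    tag = subprocess.run(
        ["git", "describe", "--exact-match", "--tags"], capture_output=True, text=True
    )
    if tag.returncode != 0:
        raise RuntimeError("HEAD is not on a tagged release commit.")
    return tag.stdout.strip()  # e.g. "v0.5.0"
```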
The easiest way to get such a clean state (say, for PEFT v0.5.0) is by checking out a tagged commit, e.g.:
git checkout v0.5.0
before running the first step.
The first step will also skip the creation of regression artifacts if they already exist.
It is possible to circumvent all the aforementioned checks by setting the environment variable REGRESSION_FORCE_MODE to True, like so:
REGRESSION_FORCE_MODE=True REGRESSION_CREATION_MODE=True pytest tests/regression/test_regression.py
You should only do this if you know exactly what you're doing.
Running regression tests
The second step is much simpler. It will load the adapters and the output created in the first step, and compare the output to the output from a new PEFT model using the loaded adapter. The outputs should be the same.
If more than one version is discovered for a given test setting, all of them are tested.
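Roughly, discovering every recorded version and comparing outputs could look like the sketch below. It uses the real PEFT/transformers loading APIs, but the prompt and test names are assumptions, and the bnb 4-bit quantization config is omitted for brevity; the true expected output depends on the exact inputs used when the artifacts were created:

```python
import os

import pytest
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

SETTING_DIR = "tests/regression/lora_opt-125m_bnb_4bit"  # from the example above


def discover_versions():
    # Each subdirectory (e.g. "0.5.0") holds the artifacts of one PEFT release.
    if not os.path.isdir(SETTING_DIR):
        return []
    return sorted(
        d for d in os.listdir(SETTING_DIR)
        if os.path.isdir(os.path.join(SETTING_DIR, d))
    )


@pytest.mark.parametrize("version", discover_versions())
def test_output_unchanged(version):
    artifact_dir = os.path.join(SETTING_DIR, version)
    expected = torch.load(os.path.join(artifact_dir, "output.pt"))

    # Build a fresh model with the current PEFT version and load the old adapter.
    base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
    model = PeftModel.from_pretrained(base, artifact_dir)
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

    inputs = tokenizer("Hello, world", return_tensors="pt")
    with torch.inference_mode():
        output = model(**inputs).logits

    torch.testing.assert_close(output, expected)
```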
Notes
As is, the adapters are stored in the git repo itself. Since they're relatively small, the total size of the repo is still reasonable. However, it could be better to store those adapters on HF Hub instead. This would, however, make things a bit more complicated (not sure how to parse directories etc. on Hub).
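For what it's worth, enumerating version directories on the Hub looks feasible with huggingface_hub's file listing. A rough sketch, with a made-up repo name (no such repo is implied by this PR):

```python
from huggingface_hub import list_repo_files

# Hypothetical dataset repo that would hold the regression artifacts.
REPO_ID = "peft-internal/regression-artifacts"

files = list_repo_files(REPO_ID, repo_type="dataset")
# Paths come back flat, e.g. "lora_opt-125m_bnb_4bit/0.5.0/output.pt",
# so version directories can be recovered by splitting on "/".
versions = sorted({f.split("/")[1] for f in files if f.count("/") >= 2})
```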
The regression tests included in this PR were used to check that #994 still allows loading checkpoints created with PEFT v0.5.0.