
TST: Add regression tests #995

Closed

Conversation

BenjaminBossan
Member

This is a first step towards adding regression tests to the project. These tests allow us to load old adapters with new PEFT versions and ensure that the output generated by the model does not change.

The PR includes a framework for how to add regression artifacts and how to run regression tests based on those artifacts. Right now, only bnb + LoRA is covered, but it should be straightforward to add more tests.

Description

In general, for regression tests, we need two steps:

  1. Creating the regression artifacts, in this case the adapter checkpoint and the expected output of the model.
  2. Running the regression tests, i.e. loading the adapter and checking that the output of the model is the same as the expected output.

My approach is to re-use as much code as possible between those two steps. Therefore, the same test script can be used for both, with only an environment variable to distinguish between the two. Step 1 is invoked by calling:

`REGRESSION_CREATION_MODE=True pytest tests/regression/test_regression.py`

and to run the second step, we call:

`pytest tests/regression/test_regression.py`
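
To make the toggle concrete, here is a minimal, self-contained sketch of the pattern, with a toy linear model standing in for the PEFT model; the paths and names are illustrative, not the code added in this PR:

```python
# Sketch of the creation/check toggle: the same test either saves the
# expected output (creation mode) or compares against the saved file.
import os

import torch

CREATION_MODE = os.environ.get("REGRESSION_CREATION_MODE", "").lower() in ("true", "1")
ARTIFACT_DIR = "tests/regression/toy_linear/0.5.0"  # hypothetical artifact directory


def test_toy_regression():
    torch.manual_seed(0)
    model = torch.nn.Linear(4, 4)  # stand-in for "base model + adapter"
    output = model(torch.ones(1, 4)).detach()

    output_path = os.path.join(ARTIFACT_DIR, "output.pt")
    if CREATION_MODE:
        # step 1: persist the expected output (the real tests also save the adapter)
        os.makedirs(ARTIFACT_DIR, exist_ok=True)
        torch.save(output, output_path)
    else:
        # step 2: the output must match what an older PEFT version produced
        expected = torch.load(output_path)
        torch.testing.assert_close(output, expected)
```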

Creating regression artifacts

The first step will create an adapter checkpoint and an output for the given PEFT version and test setting in a new directory. For example, it will create a directory `tests/regression/lora_opt-125m_bnb_4bit/0.5.0/` that contains `adapter_model.bin` and `output.pt`.

Before this step runs, there is a check that the git repo is clean (no dirty worktree) and that the commit is tagged (i.e. corresponds to a release version of PEFT). Otherwise, we may accidentally create regression artifacts that do not correspond to any PEFT release.
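
For illustration, a guard along these lines could enforce both conditions; this is a sketch of the idea, not necessarily the exact check in the test script:

```python
# Refuse to create regression artifacts from a dirty or untagged checkout.
import subprocess


def assert_clean_tagged_checkout() -> None:
    # any output from `git status --porcelain` means the worktree is dirty
    status = subprocess.run(
        ["git", "status", "--porcelain"], capture_output=True, text=True, check=True
    )
    if status.stdout.strip():
        raise RuntimeError("Worktree is dirty; commit or stash your changes first.")

    # `git tag --points-at HEAD` lists the tags on the current commit (empty if none)
    tags = subprocess.run(
        ["git", "tag", "--points-at", "HEAD"], capture_output=True, text=True, check=True
    )
    if not tags.stdout.strip():
        raise RuntimeError("HEAD is not tagged; check out a release tag such as v0.5.0.")
```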

The easiest way to get such a clean state (say, for PEFT v0.5.0) is by checking out a tagged commit, e.g.:

`git checkout v0.5.0`

before running the first step.

The first step will also skip the creation of regression artifacts if they already exist.
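
That skip could look roughly like this; the helper name and directory layout are assumptions:

```python
# Skip artifact creation for settings that already have stored artifacts.
import os

import pytest


def maybe_skip_existing(artifact_dir: str) -> None:
    if os.path.exists(os.path.join(artifact_dir, "output.pt")):
        pytest.skip(f"Regression artifacts already exist in {artifact_dir}")
```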

It is possible to circumvent all the aforementioned checks by setting the environment variable `REGRESSION_FORCE_MODE` to True like so:

`REGRESSION_FORCE_MODE=True REGRESSION_CREATION_MODE=True pytest tests/regression/test_regression.py`

You should only do this if you know exactly what you're doing.

Running regression tests

The second step is much simpler. It loads the adapter and the expected output created in the first step, and compares that expected output to the output of a freshly created PEFT model using the loaded adapter. The two outputs should be the same.

If more than one version is discovered for a given test setting, all of them are tested.
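
As an illustration, discovering and checking all stored versions could look roughly like this; the directory layout and the compute_output callable are assumptions, not the PR's actual helpers:

```python
# Check the current code against every stored artifact version of a setting.
import os

import torch


def iter_version_dirs(setting_dir: str):
    # e.g. tests/regression/lora_opt-125m_bnb_4bit/0.5.0, .../0.6.0, ...
    for name in sorted(os.listdir(setting_dir)):
        path = os.path.join(setting_dir, name)
        if os.path.isdir(path):
            yield path


def check_all_versions(setting_dir: str, compute_output) -> None:
    for version_dir in iter_version_dirs(setting_dir):
        expected = torch.load(os.path.join(version_dir, "output.pt"))
        # compute_output should load the adapter from version_dir and run the model
        actual = compute_output(version_dir)
        torch.testing.assert_close(actual, expected)
```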

Notes

As is, the adapters are stored in the git repo itself. Since they're relatively small, the total size of the repo is still reasonable. It could be better to store those adapters on the HF Hub instead, but that would make things a bit more complicated (e.g. it is not obvious how to enumerate the versioned directories on the Hub).
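
For what it's worth, the versioned layout could probably be discovered on the Hub with something like the sketch below, which uses huggingface_hub.list_repo_files; the repo id is made up:

```python
# List <setting>/<version> pairs stored in a (hypothetical) Hub repo.
from huggingface_hub import list_repo_files


def discover_versions(repo_id: str = "peft-internal-testing/regression-artifacts"):
    versions = set()
    for path in list_repo_files(repo_id):
        parts = path.split("/")  # e.g. "lora_opt-125m_bnb_4bit/0.5.0/output.pt"
        if len(parts) == 3:
            setting, version, _filename = parts
            versions.add((setting, version))
    return sorted(versions)
```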

The regression tests included in this PR were used to check that #994 still allows loading checkpoints created with PEFT v0.5.0.

HuggingFaceDocBuilderDev commented Oct 5, 2023

The documentation is not available anymore as the PR was closed or merged.

@BenjaminBossan
Member Author

Tests are currently failing because bitsandbytes is not installed. Is there any specific reason why we don't have it for tests?

BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Nov 10, 2023
This PR supersedes huggingface#995; its description is copied and modified from that PR. For some technical reasons, it was easier for me to create a new PR than to update the previous one, sorry for that.

The regression tests included in this PR were used to check that huggingface#994 still allows loading checkpoints created with PEFT v0.6.1.
@BenjaminBossan
Member Author

Closing in favor of #1115
