Refactor adapter composition implementation #591
Looks great to me 👍
Can be merged once the backwards compatibility tests have been run (and once the TODOs we want to address in this PR have been implemented).
And maybe rename `tests_adapters/methods/test_adapter_common.py` to something like `tests_adapters/methods/test_bottleneck_adapter.py`.
Co-authored-by: Leon Engländer <[email protected]>
@hSterz @lenglaender also updated the docs accordingly now. Feel free to have another look.
The documentation looks good to me.
Follow-up to #591.

This PR provides initial support for adapter composition in LoRA & (IA)³ modules. Currently LoRA & (IA)³ don't support composition. With this PR, the following blocks will be supported: **Stack, BatchSplit, Average, Parallel**

Additionally, the LoRA implementation is refactored a bit in an effort to make it cleaner.

### Limitations
- Split & Fuse compositions are **not** supported
- LoRA / (IA)³ composition is **not** supported for models using the `LoRAMergedLinear` implementation. These currently are: **GPT-2, DeBERTa (v1)**
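For readers unfamiliar with the four supported blocks, here is a toy sketch of their semantics in plain Python. It is only an illustration under simplifying assumptions: the "adapters" are hypothetical row-wise functions, and the helper names (`stack`, `parallel`, `batch_split`, `average`) are made up for this example and are not the library's API or its LoRA / (IA)³ implementation.

```python
from typing import Callable, Dict, List

Row = List[float]
Batch = List[Row]

# Hypothetical stand-ins for trained adapter modules: each maps one row to a new row.
ADAPTERS: Dict[str, Callable[[Row], Row]] = {
    "a": lambda row: [x + 1.0 for x in row],
    "b": lambda row: [x * 2.0 for x in row],
}


def apply(name: str, batch: Batch) -> Batch:
    """Run a single toy adapter over every row of the batch."""
    return [ADAPTERS[name](row) for row in batch]


def stack(names: List[str], batch: Batch) -> Batch:
    """Stack: feed the output of each adapter into the next one."""
    for name in names:
        batch = apply(name, batch)
    return batch


def parallel(names: List[str], batch: Batch) -> Batch:
    """Parallel: pass the same batch through each adapter and concatenate the results."""
    out: Batch = []
    for name in names:
        out.extend(apply(name, batch))
    return out


def batch_split(names: List[str], batch_sizes: List[int], batch: Batch) -> Batch:
    """BatchSplit: route consecutive slices of the batch to different adapters."""
    if sum(batch_sizes) != len(batch):
        raise ValueError("batch_sizes must add up to the batch size")
    out: Batch = []
    start = 0
    for name, size in zip(names, batch_sizes):
        out.extend(apply(name, batch[start:start + size]))
        start += size
    return out


def average(names: List[str], weights: List[float], batch: Batch) -> Batch:
    """Average: weighted average of the adapters' outputs, element by element."""
    outputs = [apply(name, batch) for name in names]
    return [
        [sum(w * out[i][j] for w, out in zip(weights, outputs)) for j in range(len(batch[i]))]
        for i in range(len(batch))
    ]
```

With `batch = [[1.0, 2.0], [3.0, 4.0]]`, `stack(["a", "b"], batch)` applies `a` and then `b` to every row, while `batch_split(["a", "b"], [1, 1], batch)` sends the first row to `a` and the second to `b`.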
Refactors the implementation of composition blocks in the model forward pass such that more of the logic is shared between all adapter methods.
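As a rough illustration of what "sharing the logic between adapter methods" can look like, here is a minimal sketch of the pattern: the composition logic is written once in a base class, and each adapter method only supplies a few small hooks plus its own `NamedTuple` state. All names here (`ToyComposableBase`, `apply_one`, `mean`, `ToyScalingLayer`, `ScaleState`) are hypothetical and chosen for this example; the actual hooks are the ones defined in the linked `adapter_layer_base.py`.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, NamedTuple


class ToyComposableBase(ABC):
    """Hypothetical base class: composition logic is implemented here once."""

    @abstractmethod
    def apply_one(self, name: str, state: NamedTuple) -> NamedTuple:
        """Method-specific hook: apply a single adapter to the state."""

    @abstractmethod
    def mean(self, states: List[NamedTuple], weights: List[float]) -> NamedTuple:
        """Method-specific hook: weighted mean of several states."""

    def compose_stack(self, names: List[str], state: NamedTuple) -> NamedTuple:
        # Shared logic: run the adapters one after another.
        for name in names:
            state = self.apply_one(name, state)
        return state

    def compose_average(self, names: List[str], weights: List[float], state: NamedTuple) -> NamedTuple:
        # Shared logic: run each adapter on the same input, then average.
        return self.mean([self.apply_one(name, state) for name in names], weights)


class ScaleState(NamedTuple):
    """Concrete state of the toy method: just a hidden vector."""
    hidden: List[float]


class ToyScalingLayer(ToyComposableBase):
    """Toy adapter method: every adapter multiplies the hidden vector by a scalar."""

    def __init__(self, scales: Dict[str, float]):
        self.scales = scales

    def apply_one(self, name: str, state: ScaleState) -> ScaleState:
        return ScaleState([h * self.scales[name] for h in state.hidden])

    def mean(self, states: List[ScaleState], weights: List[float]) -> ScaleState:
        size = len(states[0].hidden)
        return ScaleState(
            [sum(w * s.hidden[i] for w, s in zip(weights, states)) for i in range(size)]
        )
```

A second adapter method would subclass `ToyComposableBase` with its own state class and hooks and get `compose_stack` and `compose_average` for free, which is the point of the refactoring.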
Changes
- Move the adapter method implementations into a new `methods` folder.
- Introduce `ComposableAdapterLayerBase` as subclass of `AdapterLayerBase` as shared base class of all adapter methods that support composition.
  - It provides implementations of the composition blocks (`Stack`, `Parallel`, `BatchSplit`, `Average`) which can be used by all subclasses.
  - To support composition, the helper methods declared in `ComposableAdapterLayerBase` must be implemented. See https://github.com/calpt/adapter-transformers/blob/55fdc0cbe2f695914108a9c0e208127b13bc617e/src/adapters/methods/adapter_layer_base.py#L132-L222.
  - State is passed as a generic `NamedTuple` in the base class. Deriving classes should define concrete `NamedTuple`-derived state classes. E.g., see https://github.com/calpt/adapter-transformers/blob/55fdc0cbe2f695914108a9c0e208127b13bc617e/src/adapters/methods/bottleneck.py#L22
- Extend the `Split` composition block to support more than two child blocks. Splits are defined as a list of split indices, i.e. `Split("a", "b", "c", splits=[64, 64, 64])`. **Breaking change**
- Rename classes: `AdapterLayer` -> `BottleneckLayer`; `PrefixTuningShim` -> `PrefixTuningLayer`
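To make the new `splits` argument concrete, here is a small sketch of how a list such as `[64, 64, 64]` could divide a sequence among three child adapters. It assumes each entry gives the length of one consecutive segment, which is how the example above reads, and that the lengths must cover the whole sequence. `split_compose` and the toy adapters are hypothetical and not the library's implementation.

```python
from typing import Callable, Dict, List

Segment = List[float]


def split_compose(
    names: List[str],
    splits: List[int],
    sequence: Segment,
    adapters: Dict[str, Callable[[Segment], Segment]],
) -> Segment:
    """Send consecutive segments of `sequence` through different adapters.

    Assumption: `splits[i]` is the number of positions handled by `names[i]`.
    """
    if len(names) != len(splits):
        raise ValueError("need exactly one split length per child adapter")
    if sum(splits) != len(sequence):
        raise ValueError("split lengths must add up to the sequence length")

    out: Segment = []
    start = 0
    for name, length in zip(names, splits):
        # Each child only sees its own slice; the results are concatenated back in order.
        out.extend(adapters[name](sequence[start:start + length]))
        start += length
    return out


# Toy adapters standing in for trained modules (hypothetical).
toy_adapters: Dict[str, Callable[[Segment], Segment]] = {
    "a": lambda seg: [t + 1.0 for t in seg],
    "b": lambda seg: [t * 2.0 for t in seg],
    "c": lambda seg: [-t for t in seg],
}
```

For instance, `split_compose(["a", "b", "c"], [2, 1, 2], [1.0, 2.0, 3.0, 4.0, 5.0], toy_adapters)` routes the first two positions to `a`, the third to `b`, and the last two to `c`.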
Todo
cc @TimoImhof would be nice to run some backwards compatibility tests here :)