From e4209b28da0dd8677cffb7af21e2f297eb7d07d4 Mon Sep 17 00:00:00 2001 From: Steven Liu <59462357+stevhliu@users.noreply.github.com> Date: Mon, 4 Dec 2023 11:45:26 -0800 Subject: [PATCH] [docs] API docs (#1196) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * first draft * fix path * fix all paths * typo * last typo 🤞 * fix toctree * typo * fix section title * feedback * update --- docs/source/_toctree.yml | 49 +++++++++++++++---- docs/source/package_reference/adalora.md | 31 ++++++++++++ .../source/package_reference/adapter_utils.md | 31 ++++++++++++ docs/source/package_reference/config.md | 2 +- docs/source/package_reference/ia3.md | 31 ++++++++++++ .../source/package_reference/llama_adapter.md | 31 ++++++++++++ docs/source/package_reference/loha.md | 31 ++++++++++++ docs/source/package_reference/lokr.md | 27 ++++++++++ docs/source/package_reference/lora.md | 31 ++++++++++++ .../multitask_prompt_tuning.md | 31 ++++++++++++ docs/source/package_reference/p_tuning.md | 31 ++++++++++++ docs/source/package_reference/peft_model.md | 2 +- docs/source/package_reference/peft_types.md | 27 ++++++++++ .../source/package_reference/prefix_tuning.md | 31 ++++++++++++ .../source/package_reference/prompt_tuning.md | 31 ++++++++++++ docs/source/package_reference/tuners.md | 48 ++++++------------ 16 files changed, 421 insertions(+), 44 deletions(-) create mode 100644 docs/source/package_reference/adalora.md create mode 100644 docs/source/package_reference/adapter_utils.md create mode 100644 docs/source/package_reference/ia3.md create mode 100644 docs/source/package_reference/llama_adapter.md create mode 100644 docs/source/package_reference/loha.md create mode 100644 docs/source/package_reference/lokr.md create mode 100644 docs/source/package_reference/lora.md create mode 100644 docs/source/package_reference/multitask_prompt_tuning.md create mode 100644 docs/source/package_reference/p_tuning.md create mode 100644 docs/source/package_reference/peft_types.md create mode 100644 docs/source/package_reference/prefix_tuning.md create mode 100644 docs/source/package_reference/prompt_tuning.md diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml index 6b31749f0a8..43029ad158e 100644 --- a/docs/source/_toctree.yml +++ b/docs/source/_toctree.yml @@ -59,13 +59,42 @@ - local: conceptual_guides/ia3 title: IA3 -- title: Reference - sections: - - local: package_reference/auto_class - title: AutoPeftModel - - local: package_reference/peft_model - title: PEFT model - - local: package_reference/config - title: Configuration - - local: package_reference/tuners - title: Tuners +- sections: + - sections: + - local: package_reference/auto_class + title: AutoPeftModel + - local: package_reference/peft_model + title: PEFT model + - local: package_reference/peft_types + title: PEFT types + - local: package_reference/config + title: Configuration + - local: package_reference/tuners + title: Tuners + title: Main classes + - sections: + - local: package_reference/adalora + title: AdaLoRA + - local: package_reference/ia3 + title: IA3 + - local: package_reference/llama_adapter + title: Llama-Adapter + - local: package_reference/loha + title: LoHa + - local: package_reference/lokr + title: LoKr + - local: package_reference/lora + title: LoRA + - local: package_reference/adapter_utils + title: LyCORIS + - local: package_reference/multitask_prompt_tuning + title: Multitask Prompt Tuning + - local: package_reference/p_tuning + title: P-tuning + - local: package_reference/prefix_tuning + 
title: Prefix tuning + - local: package_reference/prompt_tuning + title: Prompt tuning + title: Adapters + title: API reference + diff --git a/docs/source/package_reference/adalora.md b/docs/source/package_reference/adalora.md new file mode 100644 index 00000000000..9cc51d0e091 --- /dev/null +++ b/docs/source/package_reference/adalora.md @@ -0,0 +1,31 @@ + + +# AdaLoRA + +[AdaLoRA](https://hf.co/papers/2303.10512) is a method for optimizing the number of trainable parameters to assign to weight matrices and layers, unlike LoRA, which distributes parameters evenly across all modules. More parameters are budgeted for important weight matrices and layers while less important ones receive fewer parameters. + +The abstract from the paper is: + +*Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. However, common practice fine-tunes all of the parameters in a pre-trained model, which becomes prohibitive when a large number of downstream tasks are present. Therefore, many fine-tuning methods are proposed to learn incremental updates of pre-trained weights in a parameter efficient way, e.g., low-rank increments. These methods often evenly distribute the budget of incremental updates across all pre-trained weight matrices, and overlook the varying importance of different weight parameters. As a consequence, the fine-tuning performance is suboptimal. To bridge this gap, we propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score. In particular, AdaLoRA parameterizes the incremental updates in the form of singular value decomposition. Such a novel approach allows us to effectively prune the singular values of unimportant updates, which is essentially to reduce their parameter budget but circumvent intensive exact SVD computations. We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA. Results demonstrate that AdaLoRA manifests notable improvement over baselines, especially in the low budget settings. Our code is publicly available at https://github.com/QingruZhang/AdaLoRA*. + +## AdaLoraConfig + +[[autodoc]] tuners.adalora.config.AdaLoraConfig + +## AdaLoraModel + +[[autodoc]] tuners.adalora.model.AdaLoraModel \ No newline at end of file diff --git a/docs/source/package_reference/adapter_utils.md b/docs/source/package_reference/adapter_utils.md new file mode 100644 index 00000000000..8f8b4e6c7f7 --- /dev/null +++ b/docs/source/package_reference/adapter_utils.md @@ -0,0 +1,31 @@ + + +# LyCORIS + +[LyCORIS](https://hf.co/papers/2309.14859) (Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion) are LoRA-like matrix decomposition adapters that modify the cross-attention layer of the UNet. The [LoHa](loha) and [LoKr](lokr) methods inherit from the `Lycoris` classes here. + +## LycorisConfig + +[[autodoc]] tuners.lycoris_utils.LycorisConfig + +## LycorisLayer + +[[autodoc]] tuners.lycoris_utils.LycorisLayer + +## LycorisTuner + +[[autodoc]] tuners.lycoris_utils.LycorisTuner \ No newline at end of file diff --git a/docs/source/package_reference/config.md b/docs/source/package_reference/config.md index 075ce906f60..9a4f755e1cd 100644 --- a/docs/source/package_reference/config.md +++ b/docs/source/package_reference/config.md @@ -4,7 +4,7 @@ rendered properly in your Markdown viewer. 
# Configuration -The configuration classes stores the configuration of a [`PeftModel`], PEFT adapter models, and the configurations of [`PrefixTuning`], [`PromptTuning`], and [`PromptEncoder`]. They contain methods for saving and loading model configurations from the Hub, specifying the PEFT method to use, type of task to perform, and model configurations like number of layers and number of attention heads. +[`PeftConfigMixin`] is the base configuration class for storing the adapter configuration of a [`PeftModel`], and [`PromptLearningConfig`] is the base configuration class for soft prompt methods (p-tuning, prefix tuning, and prompt tuning). These base classes contain methods for saving and loading model configurations from the Hub, specifying the PEFT method to use, type of task to perform, and model configurations like number of layers and number of attention heads. ## PeftConfigMixin diff --git a/docs/source/package_reference/ia3.md b/docs/source/package_reference/ia3.md new file mode 100644 index 00000000000..3885fd9c602 --- /dev/null +++ b/docs/source/package_reference/ia3.md @@ -0,0 +1,31 @@ + + +# IA3 + +Infused Adapter by Inhibiting and Amplifying Inner Activations, or [IA3](https://hf.co/papers/2205.05638), is a method that adds three learned vectors to rescale the keys and values of the self-attention and encoder-decoder attention layers, and the intermediate activation of the position-wise feed-forward network. + +The abstract from the paper is: + +*Few-shot in-context learning (ICL) enables pre-trained language models to perform a previously-unseen task without any gradient-based training by feeding a small number of training examples as part of the input. ICL incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. Parameter-efficient fine-tuning (PEFT) (e.g. adapter modules, prompt tuning, sparse update methods, etc.) offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and PEFT and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs. Along the way, we introduce a new PEFT method called (IA)^3 that scales activations by learned vectors, attaining stronger performance while only introducing a relatively tiny amount of new parameters. We also propose a simple recipe based on the T0 model called T-Few that can be applied to new tasks without task-specific tuning or modifications. We validate the effectiveness of T-Few on completely unseen tasks by applying it to the RAFT benchmark, attaining super-human performance for the first time and outperforming the state-of-the-art by 6% absolute. All of the code used in our experiments is publicly available*. + +## IA3Config + +[[autodoc]] tuners.ia3.config.IA3Config + +## IA3Model + +[[autodoc]] tuners.ia3.model.IA3Model \ No newline at end of file diff --git a/docs/source/package_reference/llama_adapter.md b/docs/source/package_reference/llama_adapter.md new file mode 100644 index 00000000000..52e6c537b20 --- /dev/null +++ b/docs/source/package_reference/llama_adapter.md @@ -0,0 +1,31 @@ + + +# Llama-Adapter + +[Llama-Adapter](https://hf.co/papers/2303.16199) is a PEFT method specifically designed for turning Llama into an instruction-following model. The Llama model is frozen and only a set of adaptation prompts prefixed to the input instruction tokens is learned. 
Since randomly initialized modules inserted into the model can cause the model to lose some of its existing knowledge, Llama-Adapter uses zero-initialized attention with zero gating to progressively add the instructional prompts to the model. + +The abstract from the paper is: + +*We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the input text tokens at higher transformer layers. Then, a zero-init attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserves its pre-trained knowledge. With efficient training, LLaMA-Adapter generates high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Furthermore, our approach can be simply extended to multi-modal input, e.g., images, for image-conditioned LLaMA, which achieves superior reasoning capacity on ScienceQA. We release our code at https://github.com/ZrrSkywalker/LLaMA-Adapter*. + +## AdaptionPromptConfig + +[[autodoc]] tuners.adaption_prompt.config.AdaptionPromptConfig + +## AdaptionPromptModel + +[[autodoc]] tuners.adaption_prompt.model.AdaptionPromptModel \ No newline at end of file diff --git a/docs/source/package_reference/loha.md b/docs/source/package_reference/loha.md new file mode 100644 index 00000000000..b4ca21ee14e --- /dev/null +++ b/docs/source/package_reference/loha.md @@ -0,0 +1,31 @@ + + +# LoHa + +Low-Rank Hadamard Product ([LoHa](https://huggingface.co/papers/2108.06098)) is similar to LoRA except it approximates the large weight matrix with more low-rank matrices and combines them with the Hadamard product. This method is even more parameter-efficient than LoRA and achieves comparable performance. + +The abstract from the paper is: + +*In this work, we propose a communication-efficient parameterization, FedPara, for federated learning (FL) to overcome the burdens on frequent model uploads and downloads. Our method re-parameterizes weight parameters of layers using low-rank weights followed by the Hadamard product. Compared to the conventional low-rank parameterization, our FedPara method is not restricted to low-rank constraints, and thereby it has a far larger capacity. This property enables to achieve comparable performance while requiring 3 to 10 times lower communication costs than the model with the original layers, which is not achievable by the traditional low-rank methods. The efficiency of our method can be further improved by combining with other efficient FL optimizers. In addition, we extend our method to a personalized FL application, pFedPara, which separates parameters into global and local ones. We show that pFedPara outperforms competing personalized FL methods with more than three times fewer parameters*. 
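To ground the reference below, here is a minimal sketch of applying LoHa to a causal language model with `get_peft_model`. The checkpoint name, rank, and `target_modules` values are illustrative assumptions, not recommendations, and should be adapted to the base model:

```py
from transformers import AutoModelForCausalLM
from peft import LoHaConfig, get_peft_model

# Placeholder base model; any supported Transformers model works the same way.
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

config = LoHaConfig(
    r=8,                                  # rank of the low-rank Hadamard factors
    alpha=8,                              # scaling factor for the weight update
    target_modules=["q_proj", "v_proj"],  # module names depend on the base model
    module_dropout=0.1,                   # randomly disables adapted modules during training
)

model = get_peft_model(base_model, config)
model.print_trainable_parameters()
```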
+ ## LoHaConfig + +[[autodoc]] tuners.loha.config.LoHaConfig + +## LoHaModel + +[[autodoc]] tuners.loha.model.LoHaModel \ No newline at end of file diff --git a/docs/source/package_reference/lokr.md b/docs/source/package_reference/lokr.md new file mode 100644 index 00000000000..5be43f85467 --- /dev/null +++ b/docs/source/package_reference/lokr.md @@ -0,0 +1,27 @@ + + +# LoKr + +Low-Rank Kronecker Product ([LoKr](https://hf.co/papers/2309.14859)) is a LoRA-variant method that approximates the large weight matrix with two low-rank matrices and combines them with the Kronecker product. LoKr also provides an optional third low-rank matrix to provide better control during fine-tuning. + +## LoKrConfig + +[[autodoc]] tuners.lokr.config.LoKrConfig + +## LoKrModel + +[[autodoc]] tuners.lokr.model.LoKrModel \ No newline at end of file diff --git a/docs/source/package_reference/lora.md b/docs/source/package_reference/lora.md new file mode 100644 index 00000000000..813b08211a2 --- /dev/null +++ b/docs/source/package_reference/lora.md @@ -0,0 +1,31 @@ + + +# LoRA + +Low-Rank Adaptation ([LoRA](https://huggingface.co/papers/2106.09685)) is a PEFT method that decomposes a large matrix into two smaller low-rank matrices in the attention layers. This drastically reduces the number of parameters that need to be fine-tuned. + +The abstract from the paper is: + +*An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical study on rank-deficiency in language adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA*. + +## LoraConfig + +[[autodoc]] tuners.lora.config.LoraConfig + +## LoraModel + +[[autodoc]] tuners.lora.model.LoraModel \ No newline at end of file diff --git a/docs/source/package_reference/multitask_prompt_tuning.md b/docs/source/package_reference/multitask_prompt_tuning.md new file mode 100644 index 00000000000..26bbbf6dab0 --- /dev/null +++ b/docs/source/package_reference/multitask_prompt_tuning.md @@ -0,0 +1,31 @@ + + +# Multitask Prompt Tuning + +[Multitask Prompt Tuning](https://huggingface.co/papers/2303.02861) decomposes the soft prompts of each task into a single learned transferable prompt instead of a separate prompt for each task. The single learned prompt can be adapted for each task by multiplicative low-rank updates. + +The abstract from the paper is: + +*Prompt tuning, in which a base pretrained model is adapted to each task via conditioning on learned prompt vectors, has emerged as a promising approach for efficiently adapting large language models to multiple downstream tasks. However, existing methods typically learn soft prompt vectors from scratch, and it has not been clear how to exploit the rich cross-task knowledge with prompt vectors in a multitask learning setting. 
We propose multitask prompt tuning (MPT), which first learns a single transferable prompt by distilling knowledge from multiple task-specific source prompts. We then learn multiplicative low rank updates to this shared prompt to efficiently adapt it to each downstream target task. Extensive experiments on 23 NLP datasets demonstrate that our proposed approach outperforms the state-of-the-art methods, including the full finetuning baseline in some cases, despite only tuning 0.035% as many task-specific parameters*. + +## MultitaskPromptTuningConfig + +[[autodoc]] tuners.multitask_prompt_tuning.config.MultitaskPromptTuningConfig + +## MultitaskPromptEmbedding + +[[autodoc]] tuners.multitask_prompt_tuning.model.MultitaskPromptEmbedding \ No newline at end of file diff --git a/docs/source/package_reference/p_tuning.md b/docs/source/package_reference/p_tuning.md new file mode 100644 index 00000000000..a35f7244c34 --- /dev/null +++ b/docs/source/package_reference/p_tuning.md @@ -0,0 +1,31 @@ + + +# P-tuning + +[P-tuning](https://hf.co/papers/2103.10385) adds trainable prompt embeddings to the input that are optimized by a prompt encoder to find a better prompt, eliminating the need to manually design prompts. The prompt tokens can be added anywhere in the input sequence, and p-tuning also introduces anchor tokens for improving performance. + +The abstract from the paper is: + +*While GPTs with traditional fine-tuning fail to achieve strong results on natural language understanding (NLU), we show that GPTs can be better than or comparable to similar-sized BERTs on NLU tasks with a novel method P-tuning -- which employs trainable continuous prompt embeddings. On the knowledge probing (LAMA) benchmark, the best GPT recovers 64\% (P@1) of world knowledge without any additional text provided during test time, which substantially improves the previous best by 20+ percentage points. On the SuperGlue benchmark, GPTs achieve comparable and sometimes better performance to similar-sized BERTs in supervised learning. Importantly, we find that P-tuning also improves BERTs' performance in both few-shot and supervised settings while largely reducing the need for prompt engineering. Consequently, P-tuning outperforms the state-of-the-art approaches on the few-shot SuperGlue benchmark*. + +## PromptEncoderConfig + +[[autodoc]] tuners.p_tuning.config.PromptEncoderConfig + +## PromptEncoder + +[[autodoc]] tuners.p_tuning.model.PromptEncoder \ No newline at end of file diff --git a/docs/source/package_reference/peft_model.md b/docs/source/package_reference/peft_model.md index a7bbcda9da9..0c98a918528 100644 --- a/docs/source/package_reference/peft_model.md +++ b/docs/source/package_reference/peft_model.md @@ -4,7 +4,7 @@ rendered properly in your Markdown viewer. # Models -[`PeftModel`] is the base model class for specifying the base Transformer model and configuration to apply a PEFT method to. The base `PeftModel` contains methods for loading and saving models from the Hub, and supports the [`PromptEncoder`] for prompt learning. +[`PeftModel`] is the base model class for specifying the base Transformer model and configuration to apply a PEFT method to. The base `PeftModel` contains methods for loading and saving models from the Hub. 
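As a minimal sketch of the usual round trip (the checkpoint id, adapter settings, and output path are placeholder assumptions), an adapter is attached with [`get_peft_model`], saved on its own, and later re-attached to a fresh copy of the base model:

```py
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

# Attach a LoRA adapter to a placeholder base model.
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
model = get_peft_model(base_model, LoraConfig(r=8, target_modules=["q_proj", "v_proj"]))

# save_pretrained stores only the adapter weights and config, not the base model.
model.save_pretrained("opt-350m-lora")

# Later, rebuild the PeftModel by pairing a fresh base model with the saved adapter.
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
model = PeftModel.from_pretrained(base_model, "opt-350m-lora")
```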
## PeftModel diff --git a/docs/source/package_reference/peft_types.md b/docs/source/package_reference/peft_types.md new file mode 100644 index 00000000000..55edbbd21a4 --- /dev/null +++ b/docs/source/package_reference/peft_types.md @@ -0,0 +1,27 @@ + + +# PEFT types + +[`PeftType`] includes the supported adapters in PEFT, and [`TaskType`] includes PEFT-supported tasks. + +## PeftType + +[[autodoc]] utils.peft_types.PeftType + +## TaskType + +[[autodoc]] utils.peft_types.TaskType \ No newline at end of file diff --git a/docs/source/package_reference/prefix_tuning.md b/docs/source/package_reference/prefix_tuning.md new file mode 100644 index 00000000000..62df037bb0a --- /dev/null +++ b/docs/source/package_reference/prefix_tuning.md @@ -0,0 +1,31 @@ + + +# Prefix tuning + +[Prefix tuning](https://hf.co/papers/2101.00190) prepends a sequence of learnable task-specific vectors (the prefix) to the input while keeping the pretrained model frozen. The prefix parameters are inserted in all of the model layers. + +The abstract from the paper is: + +*Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen, but optimizes a small continuous task-specific vector (called the prefix). Prefix-tuning draws inspiration from prompting, allowing subsequent tokens to attend to this prefix as if it were "virtual tokens". We apply prefix-tuning to GPT-2 for table-to-text generation and to BART for summarization. We find that by learning only 0.1\% of the parameters, prefix-tuning obtains comparable performance in the full data setting, outperforms fine-tuning in low-data settings, and extrapolates better to examples with topics unseen during training*. + +## PrefixTuningConfig + +[[autodoc]] tuners.prefix_tuning.config.PrefixTuningConfig + +## PrefixEncoder + +[[autodoc]] tuners.prefix_tuning.model.PrefixEncoder \ No newline at end of file diff --git a/docs/source/package_reference/prompt_tuning.md b/docs/source/package_reference/prompt_tuning.md new file mode 100644 index 00000000000..61dbb6a2e93 --- /dev/null +++ b/docs/source/package_reference/prompt_tuning.md @@ -0,0 +1,31 @@ + + +# Prompt tuning + +[Prompt tuning](https://hf.co/papers/2104.08691) adds task-specific prompts to the input, and these prompt parameters are updated independently of the pretrained model parameters, which are frozen. + +The abstract from the paper is: + +*In this work, we explore "prompt tuning", a simple yet effective mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3's "few-shot" learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method "closes the gap" and matches the strong performance of model tuning (where all model weights are tuned). 
This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed "prefix tuning" of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning*. + +## PromptTuningConfig + +[[autodoc]] tuners.prompt_tuning.config.PromptTuningConfig + +## PromptEmbedding + +[[autodoc]] tuners.prompt_tuning.model.PromptEmbedding \ No newline at end of file diff --git a/docs/source/package_reference/tuners.md b/docs/source/package_reference/tuners.md index a4b7305864b..ae059462471 100644 --- a/docs/source/package_reference/tuners.md +++ b/docs/source/package_reference/tuners.md @@ -1,43 +1,27 @@ - - -# Tuners - -Each tuner (or PEFT method) has a configuration and model. - -## LoRA - -For finetuning a model with LoRA. - -[[autodoc]] LoraConfig + -## Prompt tuning +# Tuners -[[autodoc]] tuners.prompt_tuning.PromptTuningConfig +A tuner (or adapter) is a module that can be plugged into a `torch.nn.Module`. [`BaseTuner`] is the base class for other tuners and provides shared methods and attributes for preparing an adapter configuration and replacing a target module with the adapter module. [`BaseTunerLayer`] is a base class for adapter layers. It offers methods and attributes for managing adapters, such as activating and disabling them. -[[autodoc]] tuners.prompt_tuning.PromptEmbedding +## BaseTuner -## IA3 +[[autodoc]] tuners.tuners_utils.BaseTuner -[[autodoc]] tuners.ia3.IA3Config +## BaseTunerLayer -[[autodoc]] tuners.ia3.IA3Model \ No newline at end of file +[[autodoc]] tuners.tuners_utils.BaseTunerLayer \ No newline at end of file
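To make the shared interface concrete, here is a small sketch that toggles every adapter layer through [`BaseTunerLayer`] without needing to know which PEFT method injected it; the base model checkpoint and LoRA settings are placeholder assumptions:

```py
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from peft.tuners.tuners_utils import BaseTunerLayer

# Placeholder model and illustrative adapter settings.
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
model = get_peft_model(base_model, LoraConfig(r=8, target_modules=["q_proj", "v_proj"]))

# Every injected adapter layer subclasses BaseTunerLayer, so the same loop
# works for LoRA, LoHa, IA3, and the other module-based tuners.
for module in model.modules():
    if isinstance(module, BaseTunerLayer):
        module.enable_adapters(False)  # run with the frozen base weights only

# ... compare outputs against the base model here ...

for module in model.modules():
    if isinstance(module, BaseTunerLayer):
        module.enable_adapters(True)   # switch the adapter weights back on
```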