From 9f2535a2719d03231994d776acbccb83828c3d9b Mon Sep 17 00:00:00 2001 From: Steven Liu <59462357+stevhliu@users.noreply.github.com> Date: Mon, 4 Dec 2023 11:00:29 -0800 Subject: [PATCH] [docs] Update index and quicktour (#1191) * first draft * fix toctree * lora subby section * feedback * iframe height * feedback --- docs/source/_toctree.yml | 28 ++-- docs/source/index.md | 109 +------------ docs/source/package_reference/auto_class.md | 48 ++++++ docs/source/quicktour.md | 164 +++++++++++--------- 4 files changed, 164 insertions(+), 185 deletions(-) create mode 100644 docs/source/package_reference/auto_class.md diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml index 25992b3966e..6b31749f0a8 100644 --- a/docs/source/_toctree.yml +++ b/docs/source/_toctree.yml @@ -9,24 +9,26 @@ - title: Task guides sections: - - local: task_guides/image_classification_lora - title: Image classification using LoRA - local: task_guides/seq2seq-prefix-tuning title: Prefix tuning for conditional generation - local: task_guides/clm-prompt-tuning title: Prompt tuning for causal language modeling - - local: task_guides/semantic_segmentation_lora - title: Semantic segmentation using LoRA - local: task_guides/ptuning-seq-classification title: P-tuning for sequence classification - - local: task_guides/dreambooth_lora - title: Dreambooth fine-tuning with LoRA - - local: task_guides/token-classification-lora - title: LoRA for token classification - - local: task_guides/int8-asr - title: int8 training for automatic speech recognition - - local: task_guides/semantic-similarity-lora - title: Semantic similarity with LoRA + - title: LoRA + sections: + - local: task_guides/image_classification_lora + title: Image classification + - local: task_guides/semantic_segmentation_lora + title: Semantic segmentation + - local: task_guides/token-classification-lora + title: Token classification + - local: task_guides/semantic-similarity-lora + title: Semantic similarity + - local: task_guides/int8-asr + title: int8 training for automatic speech recognition + - local: task_guides/dreambooth_lora + title: DreamBooth - title: Developer guides sections: @@ -59,6 +61,8 @@ - title: Reference sections: + - local: package_reference/auto_class + title: AutoPeftModel - local: package_reference/peft_model title: PEFT model - local: package_reference/config diff --git a/docs/source/index.md b/docs/source/index.md index 5faf706e50e..cfb57c0678d 100644 --- a/docs/source/index.md +++ b/docs/source/index.md @@ -16,11 +16,9 @@ rendered properly in your Markdown viewer. # PEFT -🤗 PEFT, or Parameter-Efficient Fine-Tuning (PEFT), is a library for efficiently adapting pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters. -PEFT methods only fine-tune a small number of (extra) model parameters, significantly decreasing computational and storage costs because fine-tuning large-scale PLMs is prohibitively costly. -Recent state-of-the-art PEFT techniques achieve performance comparable to that of full fine-tuning. +🤗 PEFT (Parameter-Efficient Fine-Tuning) is a library for efficiently adapting large pretrained models to various downstream applications without fine-tuning all of a model's parameters because it is prohibitively costly. PEFT methods only fine-tune a small number of (extra) model parameters - significantly decreasing computational and storage costs - while yielding performance comparable to a fully fine-tuned model. 
This makes it more accessible to train and store large language models (LLMs) on consumer hardware. -PEFT is seamlessly integrated with 🤗 Accelerate for large-scale models leveraging DeepSpeed and [Big Model Inference](https://huggingface.co/docs/accelerate/usage_guides/big_modeling). +PEFT is integrated with the Transformers, Diffusers, and Accelerate libraries to provide a faster and easier way to load, train, and use large models for inference.
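+
+As a minimal sketch of what this integration looks like in practice (the base model and adapter IDs below are only examples, borrowed from the quicktour later in this patch), a trained PEFT adapter can be loaded on top of its Transformers base model in a couple of lines:
+
+```py
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftModel
+
+# Load the base model with Transformers, then attach the trained LoRA adapter with PEFT
+base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
+model = PeftModel.from_pretrained(base_model, "ybelkada/opt-350m-lora")
+tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
+```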
@@ -43,100 +41,9 @@ PEFT is seamlessly integrated with 🤗 Accelerate for large-scale models levera
-## Supported methods - -1. LoRA: [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2106.09685.pdf) -2. Prefix Tuning: [Prefix-Tuning: Optimizing Continuous Prompts for Generation](https://aclanthology.org/2021.acl-long.353/), [P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks](https://arxiv.org/pdf/2110.07602.pdf) -3. P-Tuning: [GPT Understands, Too](https://arxiv.org/pdf/2103.10385.pdf) -4. Prompt Tuning: [The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/pdf/2104.08691.pdf) -5. AdaLoRA: [Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning](https://arxiv.org/abs/2303.10512) -6. [LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention](https://github.com/ZrrSkywalker/LLaMA-Adapter) -7. IA3: [Infused Adapter by Inhibiting and Amplifying Inner Activations](https://arxiv.org/abs/2205.05638) - -## Supported models - -The tables provided below list the PEFT methods and models supported for each task. To apply a particular PEFT method for -a task, please refer to the corresponding Task guides. - -### Causal Language Modeling - -| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 | -|--------------| ---- | ---- | ---- | ---- | ---- | -| GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ | -| Bloom | ✅ | ✅ | ✅ | ✅ | ✅ | -| OPT | ✅ | ✅ | ✅ | ✅ | ✅ | -| GPT-Neo | ✅ | ✅ | ✅ | ✅ | ✅ | -| GPT-J | ✅ | ✅ | ✅ | ✅ | ✅ | -| GPT-NeoX-20B | ✅ | ✅ | ✅ | ✅ | ✅ | -| LLaMA | ✅ | ✅ | ✅ | ✅ | ✅ | -| ChatGLM | ✅ | ✅ | ✅ | ✅ | ✅ | - -### Conditional Generation - -| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 | -| --------- | ---- | ---- | ---- | ---- | ---- | -| T5 | ✅ | ✅ | ✅ | ✅ | ✅ | -| BART | ✅ | ✅ | ✅ | ✅ | ✅ | - -### Sequence Classification - -| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 | -| --------- | ---- | ---- | ---- | ---- | ---- | -| BERT | ✅ | ✅ | ✅ | ✅ | ✅ | -| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ | -| GPT-2 | ✅ | ✅ | ✅ | ✅ | | -| Bloom | ✅ | ✅ | ✅ | ✅ | | -| OPT | ✅ | ✅ | ✅ | ✅ | | -| GPT-Neo | ✅ | ✅ | ✅ | ✅ | | -| GPT-J | ✅ | ✅ | ✅ | ✅ | | -| Deberta | ✅ | | ✅ | ✅ | | -| Deberta-v2 | ✅ | | ✅ | ✅ | | - -### Token Classification - -| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 | -| --------- | ---- | ---- | ---- | ---- | --- | -| BERT | ✅ | ✅ | | | | -| RoBERTa | ✅ | ✅ | | | | -| GPT-2 | ✅ | ✅ | | | | -| Bloom | ✅ | ✅ | | | | -| OPT | ✅ | ✅ | | | | -| GPT-Neo | ✅ | ✅ | | | | -| GPT-J | ✅ | ✅ | | | | -| Deberta | ✅ | | | | | -| Deberta-v2 | ✅ | | | | | - -### Text-to-Image Generation - -| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 | -| --------- | ---- | ---- | ---- | ---- | ---- | -| Stable Diffusion | ✅ | | | | | - - -### Image Classification - -| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 | -| --------- | ---- | ---- | ---- | ---- | ---- | ---- | -| ViT | ✅ | | | | | -| Swin | ✅ | | | | | - -### Image to text (Multi-modal models) - -We have tested LoRA for [ViT](https://huggingface.co/docs/transformers/model_doc/vit) and [Swin](https://huggingface.co/docs/transformers/model_doc/swin) for fine-tuning on image classification. -However, it should be possible to use LoRA for any [ViT-based model](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads&search=vit) from 🤗 Transformers. -Check out the [Image classification](/task_guides/image_classification_lora) task guide to learn more. If you run into problems, please open an issue. 
- -| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 | -| --------- | ---- | ---- | ---- | ---- | ---- | -| Blip-2 | ✅ | | | | | - - -### Semantic Segmentation - -As with image-to-text models, you should be able to apply LoRA to any of the [segmentation models](https://huggingface.co/models?pipeline_tag=image-segmentation&sort=downloads). -It's worth noting that we haven't tested this with every architecture yet. Therefore, if you come across any issues, kindly create an issue report. - -| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 | -| --------- | ---- | ---- | ---- | ---- | ---- | -| SegFormer | ✅ | | | | | - + diff --git a/docs/source/package_reference/auto_class.md b/docs/source/package_reference/auto_class.md new file mode 100644 index 00000000000..c1b78a2c342 --- /dev/null +++ b/docs/source/package_reference/auto_class.md @@ -0,0 +1,48 @@ + + +# AutoPeftModels + +The `AutoPeftModel` classes loads the appropriate PEFT model for the task type by automatically inferring it from the configuration file. They are designed to quickly and easily load a PEFT model in a single line of code without having to worry about which exact model class you need or manually loading a [`PeftConfig`]. + +## AutoPeftModel + +[[autodoc]] auto.AutoPeftModel + - from_pretrained + +## AutoPeftModelForCausalLM + +[[autodoc]] auto.AutoPeftModelForCausalLM + +## AutoPeftModelForSeq2SeqLM + +[[autodoc]] auto.AutoPeftModelForSeq2SeqLM + +## AutoPeftModelForSequenceClassification + +[[autodoc]] auto.AutoPeftModelForSequenceClassification + +## AutoPeftModelForTokenClassification + +[[autodoc]] auto.AutoPeftModelForTokenClassification + +## AutoPeftModelForQuestionAnswering + +[[autodoc]] auto.AutoPeftModelForQuestionAnswering + +## AutoPeftModelForFeatureExtraction + +[[autodoc]] auto.AutoPeftModelForFeatureExtraction diff --git a/docs/source/quicktour.md b/docs/source/quicktour.md index a6678f59a88..d7dae7b7ad4 100644 --- a/docs/source/quicktour.md +++ b/docs/source/quicktour.md @@ -16,21 +16,19 @@ rendered properly in your Markdown viewer. # Quicktour -🤗 PEFT contains parameter-efficient finetuning methods for training large pretrained models. The traditional paradigm is to finetune all of a model's parameters for each downstream task, but this is becoming exceedingly costly and impractical because of the enormous number of parameters in models today. Instead, it is more efficient to train a smaller number of prompt parameters or use a reparametrization method like low-rank adaptation (LoRA) to reduce the number of trainable parameters. +PEFT offers parameter-efficient methods for finetuning large pretrained models. The traditional paradigm is to finetune all of a model's parameters for each downstream task, but this is becoming exceedingly costly and impractical because of the enormous number of parameters in models today. Instead, it is more efficient to train a smaller number of prompt parameters or use a reparametrization method like low-rank adaptation (LoRA) to reduce the number of trainable parameters. -This quicktour will show you 🤗 PEFT's main features and help you train large pretrained models that would typically be inaccessible on consumer devices. You'll see how to train the 1.2B parameter [`bigscience/mt0-large`](https://huggingface.co/bigscience/mt0-large) model with LoRA to generate a classification label and use it for inference. 
+This quicktour will show you PEFT's main features and how you can train or run inference on large models that would typically be inaccessible on consumer devices. -## PeftConfig +## Train -Each 🤗 PEFT method is defined by a [`PeftConfig`] class that stores all the important parameters for building a [`PeftModel`]. +Each PEFT method is defined by a [`PeftConfig`] class that stores all the important parameters for building a [`PeftModel`]. For example, to train with LoRA, load and create a [`LoraConfig`] class and specify the following parameters: -Because you're going to use LoRA, you'll need to load and create a [`LoraConfig`] class. Within `LoraConfig`, specify the following parameters: - -- the `task_type`, or sequence-to-sequence language modeling in this case -- `inference_mode`, whether you're using the model for inference or not -- `r`, the dimension of the low-rank matrices -- `lora_alpha`, the scaling factor for the low-rank matrices -- `lora_dropout`, the dropout probability of the LoRA layers +- `task_type`: the task to train for (sequence-to-sequence language modeling in this case) +- `inference_mode`: whether you're using the model for inference or not +- `r`: the dimension of the low-rank matrices +- `lora_alpha`: the scaling factor for the low-rank matrices +- `lora_dropout`: the dropout probability of the LoRA layers ```python from peft import LoraConfig, TaskType @@ -40,25 +38,21 @@ peft_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, -💡 See the [`LoraConfig`] reference for more details about other parameters you can adjust. +See the [`LoraConfig`] reference for more details about other parameters you can adjust, such as the modules to target or the bias type. -## PeftModel - -A [`PeftModel`] is created by the [`get_peft_model`] function. It takes a base model - which you can load from the 🤗 Transformers library - and the [`PeftConfig`] containing the instructions for how to configure a model for a specific 🤗 PEFT method. +Once the [`LoraConfig`] is setup, create a [`PeftModel`] with the [`get_peft_model`] function. It takes a base model - which you can load from the Transformers library - and the [`LoraConfig`] containing the parameters for how to configure a model for training with LoRA. -Start by loading the base model you want to finetune. +Load the base model you want to finetune. ```python from transformers import AutoModelForSeq2SeqLM -model_name_or_path = "bigscience/mt0-large" -tokenizer_name_or_path = "bigscience/mt0-large" -model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path) +model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large") ``` -Wrap your base model and `peft_config` with the `get_peft_model` function to create a [`PeftModel`]. To get a sense of the number of trainable parameters in your model, use the [`print_trainable_parameters`] method. In this case, you're only training 0.19% of the model's parameters! 🤏 +Wrap the base model and `peft_config` with the [`get_peft_model`] function to create a [`PeftModel`]. To get a sense of the number of trainable parameters in your model, use the [`print_trainable_parameters`] method. ```python from peft import get_peft_model @@ -68,83 +62,109 @@ model.print_trainable_parameters() "output: trainable params: 2359296 || all params: 1231940608 || trainable%: 0.19151053100118282" ``` -That is it 🎉! Now you can train the model using the 🤗 Transformers [`~transformers.Trainer`], 🤗 Accelerate, or any custom PyTorch training loop. 
+Out of [bigscience/mt0-large's](https://huggingface.co/bigscience/mt0-large) 1.2B parameters, you're only training 0.19% of them! -## Save and load a model +That is it 🎉! Now you can train the model with the Transformers [`~transformers.Trainer`], Accelerate, or any custom PyTorch training loop. -After your model is finished training, you can save your model to a directory using the [`~transformers.PreTrainedModel.save_pretrained`] function. You can also save your model to the Hub (make sure you log in to your Hugging Face account first) with the [`~transformers.PreTrainedModel.push_to_hub`] function. +For example, to train with the [`~transformers.Trainer`] class, setup a [`~transformers.TrainingArguments`] class with some training hyperparameters. -```python +```py +training_args = TrainingArguments( + output_dir="your-name/bigscience/mt0-large-lora", + learning_rate=1e-3, + per_device_train_batch_size=32, + per_device_eval_batch_size=32, + num_train_epochs=2, + weight_decay=0.01, + evaluation_strategy="epoch", + save_strategy="epoch", + load_best_model_at_end=True, +) +``` + +Pass the model, training arguments, dataset, tokenizer, and any other necessary component to the [`~transformers.Trainer`], and call [`~transformers.Trainer.train`] to start training. + +```py +trainer = Trainer( + model=model, + args=training_args, + train_dataset=tokenized_datasets["train"], + eval_dataset=tokenized_datasets["test"], + tokenizer=tokenizer, + data_collator=data_collator, + compute_metrics=compute_metrics, +) + +trainer.train() +``` + +### Save model + +After your model is finished training, you can save your model to a directory using the [`~transformers.PreTrainedModel.save_pretrained`] function. + +```py model.save_pretrained("output_dir") +``` + +You can also save your model to the Hub (make sure you're logged in to your Hugging Face account first) with the [`~transformers.PreTrainedModel.push_to_hub`] function. -# if pushing to Hub +```python from huggingface_hub import notebook_login notebook_login() -model.push_to_hub("my_awesome_peft_model") +model.push_to_hub("your-name/bigscience/mt0-large-lora") ``` -This only saves the incremental 🤗 PEFT weights that were trained, meaning it is super efficient to store, transfer, and load. For example, this [`bigscience/T0_3B`](https://huggingface.co/smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM) model trained with LoRA on the [`twitter_complaints`](https://huggingface.co/datasets/ought/raft/viewer/twitter_complaints/train) subset of the RAFT [dataset](https://huggingface.co/datasets/ought/raft) only contains two files: `adapter_config.json` and `adapter_model.bin`. The latter file is just 19MB! +Both methods only save the extra PEFT weights that were trained, meaning it is super efficient to store, transfer, and load. For example, this [facebook/opt-350m](https://huggingface.co/ybelkada/opt-350m-lora) model trained with LoRA only contains two files: `adapter_config.json` and `adapter_model.safetensors`. The `adapter_model.safetensors` file is just 6.3MB! -Easily load your model for inference using the [`~transformers.PreTrainedModel.from_pretrained`] function: +
+  <figcaption class="text-center">The adapter weights for an opt-350m model stored on the Hub are only ~6MB compared to the full size of the model weights, which can be ~700MB.</figcaption>
+</div>
-```diff - from transformers import AutoModelForCausalLM, AutoTokenizer -+ from peft import PeftModel, PeftConfig +## Inference -+ peft_model_id = "merve/Mistral-7B-Instruct-v0.2" -+ config = PeftConfig.from_pretrained(peft_model_id) - model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path) -+ model = PeftModel.from_pretrained(model, peft_model_id) - tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path) + - model = model.to(device) - model.eval() - inputs = tokenizer("Tell me the recipe for chocolate chip cookie", return_tensors="pt") +Take a look at the [AutoPeftModel](package_reference/auto_class) API reference for a complete list of available `AutoPeftModel` classes. - with torch.no_grad(): - outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=10) - print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]) - 'Tell me the recipe for chocolate chip cookie dough. - 1. Preheat oven' -``` + -## Easy loading with Auto classes +Easily load any PEFT-trained model for inference with the [`AutoPeftModel`] class and the [`~transformers.PreTrainedModel.from_pretrained`] method: -If you have saved your adapter locally or on the Hub, you can leverage the `AutoPeftModelForxxx` classes and load any PEFT model with a single line of code: +```py +from peft import AutoPeftModelForCausalLM +from transformers import AutoTokenizer +import torch -```diff -- from peft import PeftConfig, PeftModel -- from transformers import AutoModelForCausalLM -+ from peft import AutoPeftModelForCausalLM +model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora") +tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m") -- peft_config = PeftConfig.from_pretrained("ybelkada/opt-350m-lora") -- base_model_path = peft_config.base_model_name_or_path -- transformers_model = AutoModelForCausalLM.from_pretrained(base_model_path) -- peft_model = PeftModel.from_pretrained(transformers_model, peft_config) -+ peft_model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora") -``` +model = model.to("cuda") +model.eval() +inputs = tokenizer("Preheat the oven to 350 degrees and place the cookie dough", return_tensors="pt") + +outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=50) +print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]) -Currently, supported auto classes are: `AutoPeftModelForCausalLM`, `AutoPeftModelForSequenceClassification`, `AutoPeftModelForSeq2SeqLM`, `AutoPeftModelForTokenClassification`, `AutoPeftModelForQuestionAnswering` and `AutoPeftModelForFeatureExtraction`. For other tasks (e.g. Whisper, StableDiffusion), you can load the model with: +"Preheat the oven to 350 degrees and place the cookie dough in the center of the oven. In a large bowl, combine the flour, baking powder, baking soda, salt, and cinnamon. In a separate bowl, combine the egg yolks, sugar, and vanilla." +``` -```diff -- from peft import PeftModel, PeftConfig, AutoPeftModel -+ from peft import AutoPeftModel -- from transformers import WhisperForConditionalGeneration +For other tasks that aren't explicitly supported with an `AutoPeftModelFor` class - such as automatic speech recognition - you can still use the base [`AutoPeftModel`] class to load a model for the task. 
-- model_id = "smangrul/openai-whisper-large-v2-LORA-colab" +```py +from peft import AutoPeftModel -peft_model_id = "smangrul/openai-whisper-large-v2-LORA-colab" -- peft_config = PeftConfig.from_pretrained(peft_model_id) -- model = WhisperForConditionalGeneration.from_pretrained( -- peft_config.base_model_name_or_path, load_in_8bit=True, device_map="auto" -- ) -- model = PeftModel.from_pretrained(model, peft_model_id) -+ model = AutoPeftModel.from_pretrained(peft_model_id) +model = AutoPeftModel.from_pretrained("smangrul/openai-whisper-large-v2-LORA-colab") ``` ## Next steps -Now that you've seen how to train a model with one of the 🤗 PEFT methods, we encourage you to try out some of the other methods like prompt tuning. The steps are very similar to the ones shown in this quickstart; prepare a [`PeftConfig`] for a 🤗 PEFT method, and use the `get_peft_model` to create a [`PeftModel`] from the configuration and base model. Then you can train it however you like! +Now that you've seen how to train a model with one of the PEFT methods, we encourage you to try out some of the other methods like prompt tuning. The steps are very similar to the ones shown in the quicktour: + +1. prepare a [`PeftConfig`] for a PEFT method +2. use the [`get_peft_model`] method to create a [`PeftModel`] from the configuration and base model + +Then you can train it however you like! To load a PEFT model for inference, you can use the [`AutoPeftModel`] class. -Feel free to also take a look at the task guides if you're interested in training a model with a 🤗 PEFT method for a specific task such as semantic segmentation, multilingual automatic speech recognition, DreamBooth, and token classification. +Feel free to also take a look at the task guides if you're interested in training a model with another PEFT method for a specific task such as semantic segmentation, multilingual automatic speech recognition, DreamBooth, token classification, and more.