Merge branch 'huggingface:main' into patch-1
Akash190104 authored Dec 4, 2023
2 parents 136e92f + 6a57472 commit f196f99
Showing 111 changed files with 11,371 additions and 1,584 deletions.
26 changes: 14 additions & 12 deletions .github/workflows/nightly.yml
@@ -8,28 +8,30 @@ on:
env:
RUN_SLOW: "yes"
IS_GITHUB_CI: "1"
# To be able to run tests on CUDA 12.2
NVIDIA_DISABLE_REQUIRE: "1"
SLACK_API_TOKEN: ${{ secrets.SLACK_API_TOKEN }}


jobs:
run_all_tests_single_gpu:
runs-on: [self-hosted, docker-gpu, multi-gpu]
strategy:
fail-fast: false
runs-on: [self-hosted, single-gpu, nvidia-gpu, t4, ci]
env:
CUDA_VISIBLE_DEVICES: "0"
TEST_TYPE: "single_gpu"
container:
image: huggingface/peft-gpu:latest
options: --gpus all --shm-size "16gb"
options: --gpus all --shm-size "16gb" -e NVIDIA_DISABLE_REQUIRE=true
defaults:
run:
working-directory: peft/
shell: bash
steps:
- name: Update clone & pip install
- uses: actions/checkout@v3
- name: Pip install
run: |
source activate peft
git config --global --add safe.directory '*'
git fetch && git checkout ${{ github.sha }}
pip install -e . --no-deps
pip install pytest-reportlog
@@ -55,23 +57,23 @@ jobs:
python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY
run_all_tests_multi_gpu:
runs-on: [self-hosted, docker-gpu, multi-gpu]
strategy:
fail-fast: false
runs-on: [self-hosted, multi-gpu, nvidia-gpu, t4, ci]
env:
CUDA_VISIBLE_DEVICES: "0,1"
TEST_TYPE: "multi_gpu"
container:
image: huggingface/peft-gpu:latest
options: --gpus all --shm-size "16gb"
options: --gpus all --shm-size "16gb" -e NVIDIA_DISABLE_REQUIRE=true
defaults:
run:
working-directory: peft/
shell: bash
steps:
- name: Update clone
- uses: actions/checkout@v3
- name: Pip install
run: |
source activate peft
git config --global --add safe.directory '*'
git fetch && git checkout ${{ github.sha }}
pip install -e . --no-deps
pip install pytest-reportlog
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -28,7 +28,7 @@ jobs:
needs: check_code_quality
strategy:
matrix:
python-version: ["3.8", "3.9", "3.10"]
python-version: ["3.8", "3.9", "3.10", "3.11"]
os: ["ubuntu-latest", "macos-latest", "windows-latest"]
runs-on: ${{ matrix.os }}
steps:
51 changes: 45 additions & 6 deletions README.md
@@ -33,6 +33,9 @@ Supported methods:
6. $(IA)^3$: [Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning](https://arxiv.org/abs/2205.05638)
7. MultiTask Prompt Tuning: [Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning](https://arxiv.org/abs/2303.02861)
8. LoHa: [FedPara: Low-Rank Hadamard Product for Communication-Efficient Federated Learning](https://arxiv.org/abs/2108.06098)
9. LoKr: [KronA: Parameter Efficient Tuning with Kronecker Adapter](https://arxiv.org/abs/2212.10650) based on the [Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation](https://arxiv.org/abs/2309.14859) implementation
10. LoftQ: [LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models](https://arxiv.org/abs/2310.08659)
11. OFT: [Controlling Text-to-Image Diffusion by Orthogonal Finetuning](https://arxiv.org/abs/2306.07280)

## Getting started

@@ -134,13 +137,13 @@ Try out the 🤗 Gradio Space which should run seamlessly on a T4 instance:
**NEW** ✨ Multi Adapter support and combining multiple LoRA adapters in a weighted combination
![peft lora dreambooth weighted adapter](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/weighted_adapter_dreambooth_lora.png)
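
For reference, here is a minimal sketch of how such a weighted combination can be set up; it assumes two LoRA adapters trained on the same base model, and the adapter names, paths, and weights are illustrative:

```python
from peft import PeftModel

# base_model is the pretrained transformers model; paths and names are hypothetical.
model = PeftModel.from_pretrained(base_model, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

# Combine the LoRA deltas as a weighted sum and activate the result.
model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[0.7, 0.3],
    adapter_name="combined",
    combination_type="linear",
)
model.set_adapter("combined")
```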

**NEW** ✨ Dreambooth training for Stable Diffusion using LoHa adapter [`examples/stable_diffusion/train_dreambooth_loha.py`](examples/stable_diffusion/train_dreambooth_loha.py)
**NEW** ✨ Dreambooth training for Stable Diffusion using LoHa and LoKr adapters [`examples/stable_diffusion/train_dreambooth.py`](examples/stable_diffusion/train_dreambooth.py)

### Parameter Efficient Tuning of LLMs for RLHF components such as Ranker and Policy
- Here is an example in the [trl](https://github.com/lvwerra/trl) library using PEFT+INT8 to tune the policy model: [gpt2-sentiment_peft.py](https://github.com/lvwerra/trl/blob/main/examples/sentiment/scripts/gpt2-sentiment_peft.py) and the corresponding [blog post](https://huggingface.co/blog/trl-peft)
- An example using PEFT for instruction fine-tuning, reward model, and policy: [stack_llama](https://github.com/lvwerra/trl/tree/main/examples/research_projects/stack_llama/scripts) and the corresponding [blog post](https://huggingface.co/blog/stackllama)

### INT8 training of large models in Colab using PEFT LoRA and bits_and_bytes
### INT8 training of large models in Colab using PEFT LoRA and bitsandbytes

- Here is a demo of how to fine-tune [OPT-6.7b](https://huggingface.co/facebook/opt-6.7b) (14GB in fp16) in a Google Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jCkpikz0J2o20FBQmYmAGdiKmJGOMo-o?usp=sharing)
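
The recipe boils down to loading the base model in 8-bit and wrapping it with a LoRA config. A minimal sketch, with illustrative hyperparameters:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained("facebook/opt-6.7b", load_in_8bit=True)
model = prepare_model_for_kbit_training(model)  # cast norms/head for stability, enable input grads

config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```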

@@ -222,11 +225,13 @@ DeepSpeed version required `v0.8.0`. An example is provided in `~examples/condit
```

### Example of PEFT model inference using 🤗 Accelerate's Big Model Inferencing capabilities
An example is provided in `~examples/causal_language_modeling/peft_lora_clm_accelerate_big_model_inference.ipynb`.
An example is provided in [this notebook](https://github.com/huggingface/peft/blob/main/examples/causal_language_modeling/peft_lora_clm_accelerate_big_model_inference.ipynb).
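
In outline, the approach pairs 🤗 Accelerate's `device_map="auto"` dispatch with `PeftModel.from_pretrained`; a brief sketch (the model and adapter path are illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Weights are automatically dispatched across available GPUs/CPU by Accelerate.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-6.7b", device_map="auto")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # hypothetical adapter path
model.eval()
```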


## Models support matrix

Find models that are supported out of the box below. Note that PEFT works with almost all models -- if a model is not listed here, you just need to [do some manual configuration](https://huggingface.co/docs/peft/developer_guides/custom_models), as sketched below.
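
For instance, here is a minimal sketch of such a manual configuration on a plain `torch` module; the layer names are simply those of this toy model:

```python
import torch
from peft import LoraConfig, get_peft_model

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin0 = torch.nn.Linear(10, 20)
        self.relu = torch.nn.ReLU()
        self.lin1 = torch.nn.Linear(20, 2)

    def forward(self, x):
        return self.lin1(self.relu(self.lin0(x)))

# Point LoRA at the modules to adapt; no task_type is needed for custom models.
config = LoraConfig(target_modules=["lin0", "lin1"], r=4)
model = get_peft_model(MLP(), config)
model.print_trainable_parameters()
```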

### Causal Language Modeling
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 |
|--------------| ---- | ---- | ---- | ---- | ---- |
@@ -238,6 +243,7 @@ An example is provided in `~examples/causal_language_modeling/peft_lora_clm_acce
| GPT-NeoX-20B | ✅ | ✅ | ✅ | ✅ | ✅ |
| LLaMA | ✅ | ✅ | ✅ | ✅ | ✅ |
| ChatGLM | ✅ | ✅ | ✅ | ✅ | ✅ |
| Mistral | ✅ |  |  |  |  |

### Conditional Generation
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 |
@@ -273,9 +279,9 @@ An example is provided in `~examples/causal_language_modeling/peft_lora_clm_acce

### Text-to-Image Generation

| Model | LoRA | LoHa | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 |
| --------- | ---- | ---- | ---- | ---- | ---- | ---- |
| Stable Diffusion | ✅ | ✅ |  |  |  |  |
| Model | LoRA | LoHa | LoKr | OFT | Prefix Tuning | P-Tuning | Prompt Tuning | IA3 |
| --------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Stable Diffusion | ✅ | ✅ | ✅ | ✅ |  |  |  |  |


### Image Classification
@@ -361,6 +367,8 @@ any GPU memory savings. Please refer issue [[FSDP] FSDP with CPU offload consume

## 🤗 PEFT as a utility library

### Injecting adapters directly into the model

Inject trainable adapters on any `torch` model using the `inject_adapter_in_model` method. Note that the method makes no further changes to the model.

```python
@@ -395,6 +403,37 @@
dummy_inputs = torch.LongTensor([[0, 1, 2, 3, 4, 5, 6, 7]])
dummy_outputs = model(dummy_inputs)
```
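
For a self-contained picture, here is a sketch of the full pattern the snippet above is taken from; the toy model and config values are illustrative:

```python
import torch
from peft import LoraConfig, inject_adapter_in_model

class DummyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = torch.nn.Embedding(10, 10)
        self.linear = torch.nn.Linear(10, 10)
        self.lm_head = torch.nn.Linear(10, 10)

    def forward(self, input_ids):
        return self.lm_head(self.linear(self.embedding(input_ids)))

lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1, target_modules=["linear"])
model = inject_adapter_in_model(lora_config, DummyModel())  # modifies the model in place

dummy_inputs = torch.LongTensor([[0, 1, 2, 3, 4, 5, 6, 7]])
dummy_outputs = model(dummy_inputs)
```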

Learn more about the [low level API in the docs](https://huggingface.co/docs/peft/developer_guides/low_level_api).

### Mixing different adapter types

Usually, it is not possible to combine different adapter types in the same model, e.g. combining LoRA with AdaLoRA, LoHa, or LoKr. With a mixed model, however, this can be achieved:

```python
from transformers import AutoModelForCausalLM
from peft import PeftMixedModel

model = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-OPTForCausalLM").eval()
peft_model = PeftMixedModel.from_pretrained(model, <path-to-adapter-0>, "adapter0")
peft_model.load_adapter(<path-to-adapter-1>, "adapter1")
peft_model.set_adapter(["adapter0", "adapter1"])
result = peft_model(**inputs)
```

The main intent is to load already trained adapters and use the mixed model only for inference. However, it is also possible to create a PEFT model for training by passing `mixed=True` to `get_peft_model`:

```python
from peft import get_peft_model, LoraConfig, LoKrConfig

base_model = ...
config0 = LoraConfig(...)
config1 = LoKrConfig(...)
peft_model = get_peft_model(base_model, config0, "adapter0", mixed=True)
peft_model.add_adapter(config1, "adapter1")
peft_model.set_adapter(["adapter0", "adapter1"])
for batch in dataloader:
...
```

## Contributing

If you would like to contribute to PEFT, please check out our [contributing guide](https://huggingface.co/docs/peft/developer_guides/contributing).
24 changes: 14 additions & 10 deletions docker/peft-gpu/Dockerfile
@@ -29,18 +29,9 @@ ENV PATH /opt/conda/envs/peft/bin:$PATH
# Activate our bash shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Activate the conda env and install transformers + accelerate from source
RUN source activate peft && \
python3 -m pip install --no-cache-dir \
librosa \
"soundfile>=0.12.1" \
scipy \
git+https://github.com/huggingface/transformers \
git+https://github.com/huggingface/accelerate \
peft[test]@git+https://github.com/huggingface/peft

# Stage 2
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 AS build-image
FROM nvidia/cuda:12.2.2-devel-ubuntu22.04 AS build-image
COPY --from=compile-image /opt/conda /opt/conda
ENV PATH /opt/conda/bin:$PATH

@@ -55,6 +46,19 @@ RUN apt-get update && \
apt-get clean && \
rm -rf /var/lib/apt/lists*

# Activate the conda env and install transformers + accelerate from source
RUN source activate peft && \
python3 -m pip install -U --no-cache-dir \
librosa \
"soundfile>=0.12.1" \
scipy \
git+https://github.com/huggingface/transformers \
git+https://github.com/huggingface/accelerate \
peft[test]@git+https://github.com/huggingface/peft

RUN source activate peft && \
pip freeze | grep transformers

RUN echo "source activate peft" >> ~/.profile

# Activate the virtualenv
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -34,6 +34,8 @@
title: Working with custom models
- local: developer_guides/low_level_api
title: PEFT low level API
- local: developer_guides/mixed_models
title: Mixing different adapter types
- local: developer_guides/contributing
title: Contributing to PEFT
- local: developer_guides/troubleshooting
@@ -1,3 +1,7 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# DeepSpeed

[DeepSpeed](https://www.deepspeed.ai/) is a library designed for speed and scale for distributed training of large models with billions of parameters. At its core is the Zero Redundancy Optimizer (ZeRO) that shards optimizer states (ZeRO-1), gradients (ZeRO-2), and parameters (ZeRO-3) across data parallel processes. This drastically reduces memory usage, allowing you to scale your training to billion parameter models. To unlock even more memory efficiency, ZeRO-Offload reduces GPU compute and memory by leveraging CPU resources during optimization.
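
As a hedged illustration of what choosing a ZeRO stage looks like in practice, a config fragment along the following lines (values illustrative, not the exact config from this guide) can be passed to the launcher or to `transformers.TrainingArguments(deepspeed=...)`:

```python
# Illustrative ZeRO-3 fragment with optimizer offload.
ds_config = {
    "zero_optimization": {
        "stage": 3,  # ZeRO-3: shard optimizer states, gradients, and parameters
        "offload_optimizer": {"device": "cpu"},  # ZeRO-Offload: lean on CPU memory
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
}
```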
@@ -1,3 +1,7 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Fully Sharded Data Parallel

[Fully sharded data parallel](https://pytorch.org/docs/stable/fsdp.html) (FSDP) was developed for distributed training of large pretrained models of up to 1T parameters. FSDP achieves this by sharding the model parameters, gradients, and optimizer states across data parallel processes, and it can also offload sharded model parameters to the CPU. The memory efficiency afforded by FSDP allows you to scale training to larger batch or model sizes.
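
A minimal sketch of the underlying PyTorch call (not PEFT-specific; the model and arguments are illustrative):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import CPUOffload
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")  # assumes the usual torchrun environment variables
model = torch.nn.Linear(10, 10).cuda()  # stand-in for a large pretrained model
# Shard parameters, gradients, and optimizer state; optionally offload parameters to CPU.
model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))
```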
@@ -8,6 +8,10 @@ http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# IA3
@@ -28,10 +32,13 @@ Being similar to LoRA, IA3 carries many of the same advantages:
* Performance of models fine-tuned using IA3 is comparable to the performance of fully fine-tuned models.
* IA3 does not add any inference latency because adapter weights can be merged with the base model.

In principle, IA3 can be applied to any subset of weight matrices in a neural network to reduce the number of trainable
parameters. Following the authors' implementation, IA3 weights are added to the key, value and feedforward layers
of a Transformer model. Given the target layers for injecting IA3 parameters, the number of trainable parameters
can be determined based on the size of the weight matrices.
In principle, IA3 can be applied to any subset of weight matrices in a neural network to reduce the number of trainable
parameters. Following the authors' implementation, IA3 weights are added to the key, value and feedforward layers
of a Transformer model. To be specific, for transformer models, IA3 weights are added to the outputs of key and value layers, and to the input of the second feedforward layer
in each transformer block.
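
In symbols, following the paper's formulation with $\odot$ denoting element-wise multiplication, the learned vectors act as rescalings: $k \leftarrow l_k \odot k$, $v \leftarrow l_v \odot v$, and $x_{ff} \leftarrow l_{ff} \odot x_{ff}$.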

Given the target layers for injecting IA3 parameters, the number of trainable parameters
can be determined based on the size of the weight matrices.


## Common IA3 parameters in PEFT
@@ -43,10 +50,19 @@ As with other methods supported by PEFT, to fine-tune a model using IA3, you nee
3. Wrap the base model with `get_peft_model()` to get a trainable `PeftModel`.
4. Train the `PeftModel` as you normally would train the base model.

`IA3Config` allows you to control how IA3 is applied to the base model through the following parameters:
`IA3Config` allows you to control how IA3 is applied to the base model through the following parameters:

- `target_modules`: The modules (for example, attention blocks) to apply the IA3 vectors.
- `feedforward_modules`: The list of modules to be treated as feedforward layers in `target_modules`. While learned vectors are multiplied with
the output activation for attention blocks, the vectors are multiplied with the input for classic feedforward layers.
- `feedforward_modules`: The list of modules to be treated as feedforward layers in `target_modules`. While learned vectors are multiplied with
the output activation for attention blocks, the vectors are multiplied with the input for classic feedforward layers. Note that `feedforward_modules` must be a subset of `target_modules`.
- `modules_to_save`: List of modules apart from IA3 layers to be set as trainable and saved in the final checkpoint. These typically include the model's custom head, which is randomly initialized for the fine-tuning task.

## Example Usage

For the task of sequence classification, one can initialize the IA3 config for a Llama model as follows:

```py
from peft import IA3Config, TaskType

peft_config = IA3Config(
task_type=TaskType.SEQ_CLS, target_modules=["k_proj", "v_proj", "down_proj"], feedforward_modules=["down_proj"]
)
```
@@ -8,6 +8,10 @@ http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# LoRA
@@ -1,3 +1,8 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->


# Prompting

Training large pretrained language models is very time-consuming and compute-intensive. As they continue to grow in size, there is increasing interest in more efficient training methods such as *prompting*. Prompting primes a frozen pretrained model for a specific downstream task by including a text prompt that describes the task or even demonstrates an example of the task. With prompting, you can avoid fully training a separate model for each downstream task, and use the same frozen pretrained model instead. This is a lot easier because you can use the same model for several different tasks, and it is significantly more efficient to train and store a smaller set of prompt parameters than to train all the model's parameters.
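
As a hedged sketch of what this looks like with PEFT's prompt tuning (the base model and prompt length are illustrative):

```python
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")
# Prepend 8 trainable virtual tokens; the base model itself stays frozen.
config = PromptTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=8)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```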
@@ -8,6 +8,10 @@ http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Contributing to PEFT
@@ -8,13 +8,17 @@ http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Working with custom models

Some fine-tuning techniques, such as prompt tuning, are specific to language models. That means in 🤗 PEFT, it is
assumed a 🤗 Transformers model is being used. However, other fine-tuning techniques - like
[LoRA](./conceptual_guides/lora) - are not restricted to specific model types.
[LoRA](../conceptual_guides/lora) - are not restricted to specific model types.

In this guide, we will see how LoRA can be applied to a multilayer perceptron and a computer vision model from the [timm](https://huggingface.co/docs/timm/index) library.

@@ -8,6 +8,10 @@ http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# PEFT as a utility library
@@ -17,7 +21,7 @@ The development of this API has been motivated by the need for super users to no

## Supported tuner types

Currently the supported adapter types are the 'injectable' adapters, meaning adapters where an inplace modification of the model is sufficient to correctly perform the fine tuning. As such, only [LoRA](./conceptual_guides/lora), AdaLoRA and [IA3](./conceptual_guides/ia3) are currently supported in this API.
Currently the supported adapter types are the 'injectable' adapters, meaning adapters where an inplace modification of the model is sufficient to correctly perform the fine tuning. As such, only [LoRA](../conceptual_guides/lora), AdaLoRA and [IA3](../conceptual_guides/ia3) are currently supported in this API.

## `inject_adapter_in_model` method
