
Commit

Merge pull request #6 from zenml-io/feature/OSSK-569-accelerated-template

Accelerated template
avishniakov authored Jun 20, 2024
2 parents e379d3a + a38ec25 commit 91e6f82
Showing 22 changed files with 388 additions and 130 deletions.
2 changes: 1 addition & 1 deletion .github/actions/llm_finetuning_template_test/action.yml
@@ -78,7 +78,7 @@ runs:
     - name: Run pytests
       shell: bash
       run: |
-        pytest ./local_checkout/tests
+        pytest -s ./local_checkout/tests
     - name: Clean-up
       shell: bash
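For context on this one-line change: `-s` is pytest's shorthand for `--capture=no`, so the template tests' stdout/stderr stream live into the CI log instead of being captured and shown only for failures:

```shell
# -s == --capture=no: stream print/log output live rather than only on failure
pytest -s ./local_checkout/tests
```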
4 changes: 4 additions & 0 deletions copier.yaml
@@ -62,6 +62,10 @@ steps_of_finetuning:
   type: int
   help: The number of steps of the finetuning job.
   default: 300
+use_fast_tokenizer:
+  type: bool
+  help: Whether to use fast tokenization or not; make sure your base model supports it.
+  default: false
 cuda_version:
   type: str
   help: The available CUDA version. (Only relevant when using a remote orchestrator)
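To see where the new option lands: the config files below template `use_fast_tokenizer` into the `use_fast` step parameter. A minimal sketch of how such a flag typically reaches the Hugging Face tokenizer loader (illustrative only; the template's real wiring lives in `utils/tokenizer.py`, which is not shown in this diff):

```python
# Illustrative sketch, not the template's actual code.
from transformers import AutoTokenizer

def load_tokenizer(base_model_id: str, use_fast: bool = False):
    # use_fast=True selects the Rust-backed "fast" tokenizer; not every base
    # model ships one, hence the caveat in the option's help text.
    return AutoTokenizer.from_pretrained(base_model_id, use_fast=use_fast)
```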
56 changes: 36 additions & 20 deletions template/README.md
@@ -34,6 +34,11 @@ pip install -r requirements.txt

### 👷 Combined feature engineering and finetuning pipeline

+> [!WARNING]
+> All steps of this pipeline have a `clean_gpu_memory(force=True)` call at the beginning. This is used to ensure that the memory is properly cleared after previous steps.
+>
+> This functionality might affect other GPU processes running in the same environment, so if you don't want to clean the GPU memory between the steps, you can delete those utility calls from all steps.
+
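For a rough idea of what such a cleanup call involves, here is a hedged sketch of the usual pattern; the helper the template actually calls is not part of this diff, and its exact behavior (including what `force=True` does) may differ:

```python
# Hypothetical sketch of a clean_gpu_memory(force=True)-style helper;
# not the template's actual implementation.
import gc

import torch

def clean_gpu_memory(force: bool = False) -> None:
    if not force and not torch.cuda.is_available():
        return
    gc.collect()  # drop dangling Python references to CUDA tensors first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached allocator blocks to the driver
        torch.cuda.ipc_collect()  # reclaim memory held by expired IPC handles
```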
The easiest way to get started with just a single command is to run the finetuning pipeline with the `orchestrator_finetune.yaml` configuration file, which will do data preparation, model finetuning, evaluation with [Rouge](https://huggingface.co/spaces/evaluate-metric/rouge) and promotion:

```shell
@@ -50,6 +55,17 @@
When running the pipeline like this, the trained model will be stored in the ZenML artifact store.
<br/>
</div>

+### ⚡ Accelerate your finetuning
+
+Do you want to benefit from multi-GPU training with Distributed Data Parallelism (DDP)? Then you can use other configuration files prepared for this purpose.
+For example, `orchestrator_finetune.yaml` can run a finetuning of [`{{ model_repository }}`](https://huggingface.co/{{ model_repository }}) powered by [Hugging Face Accelerate](https://huggingface.co/docs/accelerate/en/index) on all GPUs available in the environment. To do so, just call:
+
+```shell
+python run.py --config orchestrator_finetune.yaml --accelerate
+```
+
+Under the hood, the finetuning step will spin up an accelerated job using the step code, which will then run on all available GPUs.
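The mechanism is visible in `template/pipelines/train_accelerated.py` later in this commit: the `finetune` step is wrapped with ZenML's `run_with_accelerate` before being invoked, so the step code is re-launched through Accelerate across all visible GPUs. The relevant excerpt:

```python
# Excerpt from template/pipelines/train_accelerated.py (shown in full below).
from zenml.integrations.huggingface.steps import run_with_accelerate

ft_model_dir = run_with_accelerate(finetune)(
    base_model_id=base_model_id,
    dataset_dir=datasets_dir,
    use_fast=use_fast,
    load_in_8bit=load_in_8bit,
    load_in_4bit=load_in_4bit,
)
```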

## ☁️ Running with a step operator in the stack

To finetune an LLM on remote infrastructure, you can either use a remote orchestrator or a remote step operator. Follow these steps to set up a complete remote stack:
@@ -80,26 +96,26 @@
The project loosely follows [the recommended ZenML project structure](https://do

```
 .
 ├── configs                            # pipeline configuration files
-│   ├── orchestrator_finetune.yaml     # default local or remote orchestrator
+│   ├── orchestrator_finetune.yaml     # default local or remote orchestrator configuration
 │   └── remote_finetune.yaml           # default step operator configuration
 ├── materializers
 │   └── directory_materializer.py      # custom materializer to push whole directories to the artifact store and back
 ├── pipelines                          # `zenml.pipeline` implementations
 │   └── train.py                       # Finetuning and evaluation pipeline
 ├── steps                              # logically grouped `zenml.steps` implementations
 │   ├── evaluate_model.py              # evaluate base and finetuned models using Rouge metrics
 │   ├── finetune.py                    # finetune the base model
+│   ├── log_metadata.py                # helper step to ensure that model metadata is always logged
 │   ├── prepare_datasets.py            # load and tokenize dataset
 │   └── promote.py                     # promote good models to target environment
 ├── utils                              # utility functions
 │   ├── callbacks.py                   # custom callbacks
-│   ├── cuda.py                        # helpers for CUDA
 │   ├── loaders.py                     # loaders for models and data
 │   ├── logging.py                     # logging helpers
 │   └── tokenizer.py                   # load and tokenize
 ├── .dockerignore
 ├── README.md                          # this file
 ├── requirements.txt                   # extra Python dependencies
 └── run.py                             # CLI tool to run pipelines on ZenML Stack
```
7 changes: 5 additions & 2 deletions template/configs/orchestrator_finetune.yaml
@@ -14,14 +14,18 @@ settings:
   parent_image: pytorch/pytorch:2.2.2-{{ cuda_version }}-cudnn8-runtime
   requirements: requirements.txt
   python_package_installer: uv
+  python_package_installer_args:
+    system: null
   apt_packages:
     - git
   environment:
     PJRT_DEVICE: CUDA
     USE_TORCH_XLA: "false"
+    MKL_SERVICE_FORCE_INTEL: "1"

 parameters:
   base_model_id: {{ model_repository }}
-  use_fast: False
+  use_fast: {{ use_fast_tokenizer }}
   load_in_4bit: True
   system_prompt: |
     {{ system_prompt.split("\n") | join("\n ") }}

@@ -32,7 +36,6 @@ steps:
       dataset_name: {{ dataset_name }}

   finetune:
-    enable_step_logs: False
     parameters:
       max_steps: {{ steps_of_finetuning }}
       eval_steps: {{ steps_of_finetuning // 10 }}
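A note on the `python_package_installer_args` addition: as I read ZenML's Docker settings, each key in that mapping is rendered as a CLI flag for the chosen installer, and a null value yields a bare flag, so this configuration passes `--system` to `uv` (install into the image's system interpreter rather than requiring a virtualenv). A sketch of that assumed rendering:

```python
# Assumed flag-rendering behavior for illustration; not ZenML's actual code.
def render_installer_args(args: dict) -> list[str]:
    flags = []
    for key, value in args.items():
        # A null value becomes a bare flag; anything else becomes --key=value.
        flags.append(f"--{key}" if value is None else f"--{key}={value}")
    return flags

print(render_installer_args({"system": None}))  # ['--system']
```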
19 changes: 17 additions & 2 deletions template/configs/remote_finetune.yaml
@@ -14,14 +14,18 @@ settings:
   parent_image: pytorch/pytorch:2.2.2-{{ cuda_version }}-cudnn8-runtime
   requirements: requirements.txt
   python_package_installer: uv
+  python_package_installer_args:
+    system: null
   apt_packages:
     - git
   environment:
     PJRT_DEVICE: CUDA
     USE_TORCH_XLA: "false"
+    MKL_SERVICE_FORCE_INTEL: "1"

 parameters:
   base_model_id: {{ model_repository }}
-  use_fast: False
+  use_fast: {{ use_fast_tokenizer }}
   load_in_4bit: True
   system_prompt: |
     {{ system_prompt.split("\n") | join("\n ") }}

@@ -32,17 +36,28 @@ steps:
       dataset_name: {{ dataset_name }}

   finetune:
-    enable_step_logs: False
+    retry:
+      max_retries: 3
+      delay: 10
+      backoff: 2
     step_operator: {{ step_operator }}
     parameters:
       max_steps: {{ steps_of_finetuning }}
       eval_steps: {{ steps_of_finetuning // 10 }}
       bf16: {{ bf16 }}

   evaluate_finetuned:
+    retry:
+      max_retries: 3
+      delay: 10
+      backoff: 2
     step_operator: {{ step_operator }}

   evaluate_base:
+    retry:
+      max_retries: 3
+      delay: 10
+      backoff: 2
     step_operator: {{ step_operator }}

   promote:
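The `retry` blocks added to the three GPU-bound steps guard against transient failures such as spot-instance preemptions or temporary capacity errors on the step operator. Assuming the usual exponential-backoff semantics (a wait of `delay` seconds multiplied by `backoff` after each attempt), the schedule works out to roughly 10s, 20s, and 40s:

```python
# Assumed semantics: wait_n = delay * backoff**n before retry n.
delay, backoff, max_retries = 10, 2, 3
waits = [delay * backoff**n for n in range(max_retries)]
print(waits)  # [10, 20, 40] seconds between attempts
```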
47 changes: 35 additions & 12 deletions template/pipelines/train.py
@@ -1,7 +1,7 @@
 # {% include 'template/license_header' %}


-from steps import evaluate_model, finetune, prepare_data, promote
+from steps import evaluate_model, finetune, prepare_data, promote, log_metadata_from_step_artifact
 from zenml import pipeline

@@ -13,7 +13,7 @@ def {{ product_name.replace("-","_") }}_full_finetune(
     load_in_8bit: bool = False,
     load_in_4bit: bool = False,
 ):
-    """Pipeline for finetuning an LLM with peft.
+    """Pipeline for finetuning an LLM with PEFT.

     It will run the following steps:
@@ -22,36 +22,59 @@
     - evaluate_model: evaluate the base and finetuned model
     - promote: promote the model to the target stage, if evaluation was successful
     """
+    if not load_in_8bit and not load_in_4bit:
+        raise ValueError(
+            "At least one of `load_in_8bit` and `load_in_4bit` must be True."
+        )
+    if load_in_4bit and load_in_8bit:
+        raise ValueError("Only one of `load_in_8bit` and `load_in_4bit` can be True.")
+
     datasets_dir = prepare_data(
         base_model_id=base_model_id,
         system_prompt=system_prompt,
         use_fast=use_fast,
     )
-    ft_model_dir = finetune(
+
+    evaluate_model(
         base_model_id,
         system_prompt,
         datasets_dir,
+        None,
         use_fast=use_fast,
-        load_in_4bit=load_in_4bit,
         load_in_8bit=load_in_8bit,
+        load_in_4bit=load_in_4bit,
+        id="evaluate_base",
     )
-    evaluate_model(
+    log_metadata_from_step_artifact(
+        "evaluate_base",
+        "base_model_rouge_metrics",
+        after=["evaluate_base"],
+        id="log_metadata_evaluation_base",
+    )
+
+    ft_model_dir = finetune(
         base_model_id,
         system_prompt,
         datasets_dir,
-        ft_model_dir,
         use_fast=use_fast,
         load_in_8bit=load_in_8bit,
         load_in_4bit=load_in_4bit,
-        id="evaluate_finetuned",
     )
+
     evaluate_model(
         base_model_id,
         system_prompt,
         datasets_dir,
-        None,
+        ft_model_dir,
         use_fast=use_fast,
         load_in_8bit=load_in_8bit,
         load_in_4bit=load_in_4bit,
-        id="evaluate_base",
+        id="evaluate_finetuned",
     )
-    promote(after=["evaluate_finetuned", "evaluate_base"])
+    log_metadata_from_step_artifact(
+        "evaluate_finetuned",
+        "finetuned_model_rouge_metrics",
+        after=["evaluate_finetuned"],
+        id="log_metadata_evaluation_finetuned",
+    )
+
+    promote(after=["log_metadata_evaluation_finetuned", "log_metadata_evaluation_base"])
86 changes: 86 additions & 0 deletions template/pipelines/train_accelerated.py
@@ -0,0 +1,86 @@
# {% include 'template/license_header' %}

from steps import (
    evaluate_model,
    finetune,
    prepare_data,
    promote,
    log_metadata_from_step_artifact,
)
from zenml import pipeline
from zenml.integrations.huggingface.steps import run_with_accelerate


@pipeline
def {{ product_name.replace("-","_") }}_full_finetune(
    system_prompt: str,
    base_model_id: str,
    use_fast: bool = True,
    load_in_8bit: bool = False,
    load_in_4bit: bool = False,
):
    """Pipeline for finetuning an LLM with PEFT powered by Accelerate.

    It will run the following steps:

    - prepare_data: prepare the datasets and tokenize them
    - finetune: finetune the model
    - evaluate_model: evaluate the base and finetuned model
    - promote: promote the model to the target stage, if evaluation was successful
    """
    if not load_in_8bit and not load_in_4bit:
        raise ValueError(
            "At least one of `load_in_8bit` and `load_in_4bit` must be True."
        )
    if load_in_4bit and load_in_8bit:
        raise ValueError("Only one of `load_in_8bit` and `load_in_4bit` can be True.")

    datasets_dir = prepare_data(
        base_model_id=base_model_id,
        system_prompt=system_prompt,
        use_fast=use_fast,
    )

    evaluate_model(
        base_model_id,
        system_prompt,
        datasets_dir,
        None,
        use_fast=use_fast,
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
        id="evaluate_base",
    )
    log_metadata_from_step_artifact(
        "evaluate_base",
        "base_model_rouge_metrics",
        after=["evaluate_base"],
        id="log_metadata_evaluation_base",
    )

    ft_model_dir = run_with_accelerate(finetune)(
        base_model_id=base_model_id,
        dataset_dir=datasets_dir,
        use_fast=use_fast,
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
    )

    evaluate_model(
        base_model_id,
        system_prompt,
        datasets_dir,
        ft_model_dir,
        use_fast=use_fast,
        load_in_8bit=load_in_8bit,
        load_in_4bit=load_in_4bit,
        id="evaluate_finetuned",
    )
    log_metadata_from_step_artifact(
        "evaluate_finetuned",
        "finetuned_model_rouge_metrics",
        after=["evaluate_finetuned"],
        id="log_metadata_evaluation_finetuned",
    )

    promote(after=["log_metadata_evaluation_finetuned", "log_metadata_evaluation_base"])
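As the README section earlier in this commit describes, this accelerated variant is selected from the CLI rather than imported directly; the user-facing change is just the extra flag:

```shell
python run.py --config orchestrator_finetune.yaml --accelerate
```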