How to Fine-Tune LLMs with Axolotl on RunPod

Learn how to fine-tune large language models (LLMs) using Axolotl on RunPod. This step-by-step guide covers setup, configuration, and training with LoRA, 8-bit quantization, and DeepSpeed—all on scalable GPU infrastructure.

Introduction

Axolotl offers a range of tools for fine-tuning large language models (LLMs) from pre-trained weights, building on frameworks such as Hugging Face Transformers. RunPod is a scalable GPU cloud provider with ready-made environments for machine learning workloads, which makes it a good fit for resource-intensive LLM fine-tuning. This tutorial shows how to set up Axolotl on RunPod to streamline the fine-tuning process.

Prerequisites

To get the most out of this guide, you will need a few resources and technical skills:

  • A high-end GPU, a compatible OS, and Python 3.8 or higher.
  • A RunPod account.
  • Proficiency with basic Linux commands, Python, and model fine-tuning principles.

Setting Up the Environment on RunPod

Choosing a RunPod instance

When selecting an instance, match the GPU, storage, and RAM to your model's demands. A 7B-parameter model can be fine-tuned on a single A100 with 40 GB of VRAM, but larger models (13B and above) will not fit on that same instance and call for a multi-GPU configuration or an A100 with 80 GB of VRAM.

There’s an overview of the hourly cost involved in running each instance type on RunPod’s pricing page, and you can choose based on your workload requirements and budget.
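
As a rough sanity check before committing to an instance, you can estimate the memory the base weights alone will occupy at a given precision. This back-of-the-envelope sketch covers weights only; activations, gradients, optimizer state, and long sequence lengths all add considerable headroom on top:

# Back-of-the-envelope base-weight memory at different precisions.
# Weights only: activations, gradients, and optimizer state add
# substantially more, especially at long sequence lengths.
def weight_memory_gb(params_billions, bytes_per_param):
    return params_billions * bytes_per_param

for size in (7, 13):
    print(f"{size}B weights: bf16 ~{weight_memory_gb(size, 2):.0f} GB, "
          f"8-bit ~{weight_memory_gb(size, 1):.0f} GB")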

If you'd like to skip the setup below, feel free to deploy this Axolotl template by winglian. If you'd rather install from scratch, you can do that in any PyTorch pod.

Installing and Setting Up Axolotl

Environment Setup

If you prefer an isolated environment, create and activate a virtual environment for the project:

python3 -m venv axolotl_env
source axolotl_env/bin/activate

Install Axolotl

You can install Axolotl directly from GitHub with the command below:

pip install "git+https://github.com/OpenAccess-AI-Collective/axolotl.git#egg=axolotl[flash-attn,deepspeed]"
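
Once the install finishes, a quick smoke test confirms that the package imports cleanly and that the pod's GPU is visible to PyTorch:

# Post-install smoke test: confirm axolotl imports and CUDA is visible.
import torch
import axolotl

print("axolotl imported from:", axolotl.__file__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))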

Data preparation for fine-tuning

Axolotl supports datasets in several formats, such as CSV and JSON, so it is important to structure your dataset for training, validation, and testing before you begin.

Uploading data to RunPod

You can transfer the dataset to RunPod via SCP or SFTP, or stage it in cloud storage such as S3. For example, using SCP to transfer a dataset file:

scp path/to/dataset.json user@runpod_ip:/path/to/upload/

If you are working with a small dataset, you can simply drag and drop it into the pod through Jupyter Notebook, or upload it using runpodctl.
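
If you prefer to script the transfer, here is a minimal SFTP sketch using paramiko; the host, username, and key path are placeholders to replace with the SSH connection details shown for your pod in the RunPod console:

# Minimal SFTP upload sketch using paramiko.
# Host, username, and key path are placeholders -- substitute the SSH
# connection details from your pod in the RunPod console.
import os
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("your-pod-host", port=22, username="root",
               key_filename=os.path.expanduser("~/.ssh/id_ed25519"))

sftp = client.open_sftp()
sftp.put("path/to/dataset.json", "/workspace/dataset.json")
sftp.close()
client.close()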

Data formatting

A common choice is the Alpaca-style instruction format, where each record contains an instruction, an optional input, and the expected output:

[
  {
    "instruction": "Write a poem about mountains",
    "input": "",
    "output": "Majestic peaks that touch the sky..."
  },
  {
    "instruction": "Summarize this text",
    "input": "AI is transforming industries around the world...",
    "output": "AI is driving global industrial change."
  }
]
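
Before training, it is worth checking that every record carries the expected fields. A small validation sketch like this one (assuming the file above is saved as dataset.json) catches malformed rows early:

# Validate that every record has the instruction-format fields shown above.
import json

with open("dataset.json") as f:
    records = json.load(f)

required = {"instruction", "input", "output"}
for i, record in enumerate(records):
    missing = required - record.keys()
    if missing:
        raise ValueError(f"Record {i} is missing fields: {missing}")

print(f"OK: {len(records)} records validated")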

Configuring Axolotl for fine-tuning

Creating a configuration file

Axolotl uses YAML configuration files. Create a file named config.yml with the following structure:

base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer

datasets:
  - path: /path/to/dataset.json
    type: alpaca  # instruction/input/output records, as shown above

dataset_prepared_path: /tmp/dataset_cache
val_set_size: 0.05
output_dir: ./output

adapter: lora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true

load_in_8bit: true

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
learning_rate: 2e-4
train_on_inputs: false
group_by_length: false
bf16: true
tf32: true
wandb_project: my-finetune-project

You can also browse the examples/ folder in the Axolotl repository for premade .yml files that may suit your needs.

Adjust parameters like base_model, model_type, lora_target_modules, and resources based on your specific model and hardware constraints.

Parameter explanation

  • Efficient training methods:
    • load_in_8bit: true: Loads the base model with 8-bit quantization to reduce VRAM usage
    • adapter: lora: Uses a LoRA adapter for parameter-efficient fine-tuning
    • lora_r, lora_alpha: Control the rank and scaling of the LoRA adapters
  • Batch size and resources:
    • micro_batch_size: Number of samples processed per GPU per step
    • gradient_accumulation_steps: Number of micro-batches accumulated before each weight update
    • Adjust these based on your GPU memory (see the sketch after this list)
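
The effective batch size the optimizer sees is the product of these two settings and the number of GPUs, which makes the trade-off easy to reason about:

# Effective batch size = micro_batch_size * gradient_accumulation_steps * num_gpus.
# With the config above on a single GPU: 2 * 4 * 1 = 8 samples per optimizer step.
def effective_batch_size(micro_batch_size, grad_accum_steps, num_gpus=1):
    return micro_batch_size * grad_accum_steps * num_gpus

print(effective_batch_size(micro_batch_size=2, grad_accum_steps=4))  # 8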

Running the fine-tuning process on RunPod

Start the fine-tuning process with:

accelerate launch -m axolotl.cli.train config.yml

For multi-GPU training with DeepSpeed, point Axolotl at one of the DeepSpeed configs it ships, for example:

accelerate launch -m axolotl.cli.train config.yml --deepspeed deepspeed_configs/zero2.json

Monitoring the training process

Monitor training progress directly in the terminal output. For more detailed monitoring:

  1. Weights & Biases: If you've configured wandb_project in your config, you can monitor training metrics in real-time at wandb.ai.
  2. TensorBoard: Axolotl saves logs that can be viewed with TensorBoard: tensorboard --logdir ./output/tensorboard
  3. GPU Monitoring: Use RunPod's dashboard or run: watch -n 1 nvidia-smi

Evaluating and exporting the fine-tuned model

Evaluating the model’s performance

Evaluate your model with Axolotl's built-in evaluation:

accelerate launch -m axolotl.cli.evaluate config.yml
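
To export the fine-tuned model for deployment, merge the LoRA adapter back into the base weights. The sketch below does this with peft directly, assuming Axolotl wrote the adapter and tokenizer to ./output as configured above; recent Axolotl versions also ship their own merge utility (axolotl.cli.merge_lora).

# Merge the trained LoRA adapter into the base model and save a standalone copy.
# Assumes the adapter and tokenizer were written to ./output by the run above.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained("./output", torch_dtype="auto")
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights
merged.save_pretrained("./merged-model")

tokenizer = AutoTokenizer.from_pretrained("./output")
tokenizer.save_pretrained("./merged-model")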

Conclusion

To maximize efficiency and minimize costs on RunPod:

  1. Select an instance sized appropriately for your model
  2. Use LoRA and quantization techniques to reduce VRAM requirements
  3. Utilize RunPod volumes for data persistence between sessions
  4. Monitor training actively with W&B or TensorBoard
  5. Consider spot instances for non-critical training jobs to reduce costs

For hyperparameter tuning, experiment with different learning rates, LoRA configurations, and batch sizes while monitoring the model's performance.
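
One lightweight way to organize those experiments is to generate config variants programmatically and launch them one at a time. This sketch uses PyYAML and the config.yml from above; the swept values are illustrative, not tuned recommendations:

# Generate config variants for a simple learning-rate / LoRA-rank sweep.
# The swept values are examples, not tuned recommendations.
import yaml

with open("config.yml") as f:
    base = yaml.safe_load(f)

for lr in (1e-4, 2e-4):
    for rank in (8, 16):
        variant = dict(base, learning_rate=lr, lora_r=rank,
                       output_dir=f"./output-lr{lr}-r{rank}")
        with open(f"config-lr{lr}-r{rank}.yml", "w") as out:
            yaml.safe_dump(variant, out)

# Launch each variant with: accelerate launch -m axolotl.cli.train <variant>.yml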