How to Fine-Tune LLMs with Axolotl on RunPod
Learn how to fine-tune large language models (LLMs) using Axolotl on RunPod. This step-by-step guide covers setup, configuration, and training with LoRA, 8-bit quantization, and DeepSpeed—all on scalable GPU infrastructure.

Introduction
Axolotl is an open-source toolkit for fine-tuning large language models (LLMs) from pre-trained weights, built on top of frameworks like Hugging Face Transformers. RunPod is a GPU cloud provider offering scalable, on-demand instances well suited to machine learning workloads, which makes it a practical choice for resource-intensive LLM fine-tuning. This tutorial shows how to set up Axolotl on RunPod to streamline LLM fine-tuning.
Prerequisites
To get the best out of this guide, you need specific resources and technical skills:
- A high-end GPU, a compatible OS, and Python 3.8 or higher.
- A RunPod account.
- Familiarity with basic Linux commands, Python, and model fine-tuning principles.
Setting Up the Environment on RunPod
Choosing a RunPod instance
When selecting an instance, match the GPU, storage, and RAM to your model's requirements. As a rough rule of thumb, 8-bit weights alone need about one byte of VRAM per parameter, before optimizer states and activations. Fine-tuning a 7B-parameter model with LoRA and 8-bit quantization fits on a single A100 with 40GB of VRAM, but larger models (13B and above) will generally need an A100 with 80GB of VRAM or a multi-GPU instance.
RunPod's pricing page lists the hourly cost of each instance type, so you can choose based on your workload requirements and budget.
If you'd like to skip the setup below, you can simply deploy the Axolotl template by winglian. If you'd rather install it from scratch, you can do so in any PyTorch pod.
Installing and Setting Up Axolotl
Environment Setup
Create a virtual environment for your project if you prefer:
python3 -m venv axolotl_env
source axolotl_env/bin/activate
Install Axolotl
You can install Axolotl from GitHub in the terminal with the command below:
pip install "git+https://github.com/OpenAccess-AI-Collective/axolotl.git#egg=axolotl[flash-attn,deepspeed]"
Data preparation for fine-tuning
Axolotl supports datasets in several formats, such as CSV, JSON, and JSONL, so it is important to structure your dataset consistently and split it into training, validation, and test sets.
Uploading data to RunPod
You can transfer the dataset to your RunPod instance via SCP or SFTP, or pull it from cloud storage such as S3. For example, using SCP to transfer a dataset file:
scp path/to/dataset.json user@runpod_ip:/path/to/upload/
If you are working with a small dataset, you can simply drag and drop it into the pod through the Jupyter interface, or transfer it with runpodctl.
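A runpodctl transfer typically looks like the following; the exact subcommands may differ across runpodctl versions, so check runpodctl --help:
# On your local machine: prints a one-time transfer code
runpodctl send dataset.json
# On the pod: paste the code printed by the previous command
runpodctl receive <one-time-code>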
Data formatting
A common choice is the Alpaca-style instruction format, where each example has an instruction, an optional input, and an output:
[
  {
    "instruction": "Write a poem about mountains",
    "input": "",
    "output": "Majestic peaks that touch the sky..."
  },
  {
    "instruction": "Summarize this text",
    "input": "AI is transforming industries around the world...",
    "output": "AI is driving global industrial change."
  }
]
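Before pointing Axolotl at the file, it can save a failed run to quickly confirm the dataset is valid JSON, for example with Python's built-in json.tool:
python -m json.tool dataset.json > /dev/null && echo "dataset.json is valid JSON"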
Configuring Axolotl for fine-tuning
Creating a configuration file
Axolotl uses YAML configuration files. Create a file named config.yml with the following structure:
base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
datasets:
  - path: /path/to/dataset.json
    type: alpaca
dataset_prepared_path: /tmp/dataset_cache
val_set_size: 0.05
output_dir: ./output
adapter: lora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true
load_in_8bit: true
adapter_load_in_8bit: true
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
learning_rate: 2e-4
train_on_inputs: false
group_by_length: false
bf16: true
tf32: true
wandb_project: my-finetune-project
You can also look at the examples/ folder in the Axolotl repository for several premade .yml files that may suit your needs.
Adjust parameters like base_model, model_type, lora_target_modules, and the resource-related settings based on your specific model and hardware constraints.
Parameter explanation
- Efficient training methods:
  - load_in_8bit: true: Uses 8-bit quantization to reduce VRAM usage
  - adapter: lora: Uses a LoRA adapter for parameter-efficient fine-tuning
  - lora_r, lora_alpha: Control the rank and scaling of the LoRA adapters
- Batch size and resources:
  - micro_batch_size: Size of each per-device training batch
  - gradient_accumulation_steps: Number of batches whose gradients are accumulated before the weights are updated
  - Adjust these based on your GPU memory; the effective batch size is micro_batch_size × gradient_accumulation_steps × number of GPUs, so the values above give an effective batch size of 8 on a single GPU
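Before launching a long run, you can optionally pre-tokenize and cache the dataset, which surfaces formatting or path errors early. Axolotl provides a preprocess entry point for this; verify the exact invocation against the version you installed:
python -m axolotl.cli.preprocess config.yml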
Running the fine-tuning process on RunPod
Start the fine-tuning process with:
accelerate launch -m axolotl.cli.train config.yml
For multi-GPU training, use DeepSpeed by pointing Axolotl at a DeepSpeed JSON config (the Axolotl repository ships samples such as deepspeed_configs/zero2.json); you can set it via the deepspeed: key in config.yml or pass it on the command line:
accelerate launch -m axolotl.cli.train config.yml --deepspeed deepspeed_configs/zero2.json
Monitoring the training process
Monitor training progress directly in the terminal output. For more detailed monitoring:
- Weights & Biases: If you've configured
wandb_project
in your config, you can monitor training metrics in real-time at wandb.ai. - TensorBoard: Axolotl saves logs that can be viewed with TensorBoard:
--logdir ./output/tensorboard
- GPU Monitoring: Use RunPod's dashboard or run:
-n 1 nvidia-smi
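If you use Weights & Biases, authenticate inside the pod once before starting training so that metrics are reported (this requires a W&B account and API key):
wandb login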
Evaluating and exporting the fine-tuned model
Evaluating the model’s performance
Evaluate your model with Axolotl's built-in evaluation:
accelerate launch -m axolotl.cli.evaluate config.yml
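Exporting the model
Because this run trains a LoRA adapter, you can merge the adapter weights back into the base model for deployment using Axolotl's merge utility. The adapter path below assumes the output_dir from the config above; adjust it to wherever your adapter was saved, and check your installed version's documentation for the exact options:
python3 -m axolotl.cli.merge_lora config.yml --lora_model_dir="./output"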
Conclusion
To maximize efficiency and minimize costs on RunPod:
- Select the right instance size for your model
- Use LoRA and quantization techniques to reduce VRAM requirements
- Utilize RunPod volumes for data persistence between sessions
- Monitor training actively with W&B or TensorBoard
- Consider spot instances for non-critical training jobs to reduce costs
For hyperparameter tuning, experiment with different learning rates, LoRA configurations, and batch sizes while monitoring the model's performance.