Orchestrating RunPod's Workloads Using dstack
Today, we're announcing the integration between RunPod and dstack, an open-source orchestration engine that aims to simplify the development, training, and deployment of AI models while leveraging the open-source ecosystem.
What is dstack?
While dstack shares a number of similarities with Kubernetes, it is more lightweight and focuses entirely on training and deploying AI workloads. With dstack, you describe workloads declaratively and conveniently run them via the CLI. It supports development environments, tasks, and services.
Getting Started With RunPod and dstack
To use RunPod with dstack, you only need to install dstack and configure it with your RunPod API key.
pip install "dstack[all]"
Then, specify your RunPod API key in ~/.dstack/server/config.yml:
projects:
- name: main
  backends:
  - type: runpod
    creds:
      type: api_key
      api_key: US9XTPDIV8AR42MMINY8TCKRB8S4E7LNRQ6CAUQ9
Once it's configured, the dstack server can be started:
dstack server
Applying ~/.dstack/server/config.yml...
The admin token is bbae0f28-d3dd-4820-bf61-8f4bb40815da
The server is running at http://127.0.0.1:3000/
Now, you can use dstack's CLI (or API) to run and manage workloads on RunPod.
The Capabilities dstack Brings To RunPod Users
dstack supports three types of configurations: dev-environment (for provisioning interactive development environments), task (for running training, fine-tuning, and various other jobs), and service (for deploying models).
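As a quick illustration, a dev-environment and a service can each be described in a few lines of YAML. The values below (Python version, IDE, command, port, and GPU size) are placeholder sketches, not recommendations:

```yaml
type: dev-environment
python: "3.11"
ide: vscode    # open the provisioned environment in your desktop VS Code
resources:
  gpu: 24GB
```

```yaml
type: service
python: "3.11"
commands:
  - python -m http.server 8000   # placeholder; launch your model server here
port: 8000
```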
Here's an example of a task:
type: task
python: "3.11" # Or specify your Docker image
env:
  - HUGGING_FACE_HUB_TOKEN
  - HF_HUB_ENABLE_HF_TRANSFER=1
commands:
  - pip install -r fine-tuning/qlora/requirements.txt
  - tensorboard --logdir results/runs &
  - python fine-tuning/qlora/train.py --merge_and_push ${{ run.args }}
ports:
  - 6006
resources:
  gpu: 16GB..24GB
Once defined, you can run the configuration via the dstack CLI:
dstack run . -f fine-tuning/qlora/train.dstack.yml
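Because the task interpolates ${{ run.args }}, any extra arguments passed to dstack run after the configuration file are forwarded to the training script. The flags below are hypothetical script arguments, shown only to illustrate the mechanism:

```shell
dstack run . -f fine-tuning/qlora/train.dstack.yml --train_batch_size 1 --num_train_epochs 2
```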
dstack will automatically provision resources via RunPod and take care of the rest, including uploading your code and forwarding ports.
You can find more examples of various training and deployment configurations here. We'd love to hear about your deployments in either the dstack or RunPod Discord server!