Orchestrating RunPod's Workloads Using dstack
Today, we're announcing an integration between RunPod and dstack, an open-source orchestration engine that aims to simplify the development, training, and deployment of AI models while leveraging the open-source ecosystem.

What is dstack?

While dstack shares a number of similarities with Kubernetes, it is more lightweight and focuses entirely on the training and deployment of AI workloads. With dstack, you describe workloads declaratively and run them conveniently via the CLI. It supports development environments, tasks, and services.

Getting Started With RunPod and dstack

To use RunPod with dstack, you only need to install dstack and configure it with your RunPod API key.

pip install "dstack[all]"

Then, specify your RunPod API key in ~/.dstack/server/config.yml:

projects:
- name: main
  backends:
  - type: runpod
    creds:
      type: api_key
      api_key: US9XTPDIV8AR42MMINY8TCKRB8S4E7LNRQ6CAUQ9

Once the file is configured, start the dstack server:

dstack server

Applying ~/.dstack/server/config.yml...

The admin token is bbae0f28-d3dd-4820-bf61-8f4bb40815da
The server is running at http://127.0.0.1:3000/

Now, you can use dstack's CLI (or API) to run and manage workloads on RunPod.

The Capabilities dstack Brings To RunPod Users

dstack supports three types of configurations: dev-environment (for provisioning interactive development environments), task (for running training, fine-tuning, and various other jobs), and service (for deploying models).
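For comparison with the task example below, here is a minimal sketch of a dev-environment configuration; the exact fields (such as the `ide` option) may vary between dstack versions, so treat this as illustrative rather than canonical:

```yaml
type: dev-environment

python: "3.11" # Or specify your Docker image

# Open the environment in your desktop IDE
ide: vscode

resources:
  gpu: 16GB..24GB
```

Running it with `dstack run` provisions a GPU instance on RunPod and attaches your IDE to it, so you can iterate interactively before turning the same setup into a task or service.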

Here's an example of a task:

type: task

python: "3.11" # Or specify your Docker image

env:
  - HUGGING_FACE_HUB_TOKEN
  - HF_HUB_ENABLE_HF_TRANSFER=1

commands:
  - pip install -r fine-tuning/qlora/requirements.txt
  - tensorboard --logdir results/runs &
  - python fine-tuning/qlora/train.py --merge_and_push ${{ run.args }}

ports:
  - 6006

resources:
  gpu: 16GB..24GB

Once defined, you can run the configuration via the dstack CLI:

dstack run . -f fine-tuning/qlora/train.dstack.yml

dstack will automatically provision resources via RunPod and take care of the rest, including uploading your code and forwarding ports.

You can find more examples of various training and deployment configurations here. We'd love to hear about your deployments in either the dstack or RunPod Discord servers!