Introducing Instant Clusters: Multi-Node AI Compute, On Demand

Until now, RunPod users could generally scale up to 8 GPUs in a single pod. For most use cases, like running inference on Llama 70B or fine-tuning FLUX, that was plenty. But some workloads need more compute than a single server can provide. They need to scale across multiple machines.

Today, we’re excited to launch Instant Clusters: a fast, on-demand way to deploy networked multi-node GPU clusters on RunPod’s platform.

With Instant Clusters, your GPUs aren’t limited to a single node anymore. You can now connect up to 8 nodes for up to 64 H100s, with high-speed interconnects that enable private node-to-node communication right out of the box. No sales calls, no integration waits, no long-term commitments: launch large GPU clusters instantly.

Why This Matters

The rise of large-scale models like DeepSeek R1 (671B parameters) and LLaMA 405B is pushing infrastructure to its limits. Even with 8x H100s and 640 GB of combined VRAM, you're nowhere near the 1,600 GB+ needed to run these models efficiently.
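
To make the gap concrete, here's a rough back-of-envelope sketch (assuming 16-bit weights and 80 GB of HBM per H100, and ignoring KV cache, activations, and framework overhead):

# Rough, illustrative VRAM estimate: model weights only, at 16-bit precision.
# Real deployments also need KV cache, activations, and framework overhead.
BYTES_PER_PARAM = 2   # FP16/BF16
H100_VRAM_GB = 80     # HBM per H100

def weights_gb(params_billion: float) -> float:
    """GB of memory needed just to hold the weights."""
    return params_billion * 1e9 * BYTES_PER_PARAM / 1e9

for name, params_b in [("LLaMA 405B", 405), ("DeepSeek R1", 671)]:
    gb = weights_gb(params_b)
    print(f"{name}: ~{gb:,.0f} GB of weights -> at least "
          f"{gb / H100_VRAM_GB:.0f}x H100s before any overhead")

Either way, the weights alone overflow a single 8-GPU node, and that's before serving traffic touches the KV cache.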

To meet these demands, you need more than just powerful GPUs—you need infrastructure that can scale across machines. Instant Clusters make that possible.

Here are just a few things Instant Clusters make possible:

  • Inference on massive models with 600B+ parameters
  • Fine-tuning foundation models like LLaMA 405B
  • Training smaller foundation models from scratch (250M to 7B parameters)
  • Accelerating simulations and research in fields like computational biology, physics, and finance

With support for 16 to 64 GPUs, and no long-term contracts, Instant Clusters give researchers and engineers the flexibility they’ve been waiting for.

How It Works

With Instant Clusters, you can spin up multi-node GPU clusters in minutes—no bare metal setup, no SSH juggling. Once your cluster is live, you can run distributed jobs using the frameworks you already know and love, like Slurm, Ray, or PyTorch’s torchrun utility.

Here’s how it looks in practice:

Example: Multi-Node Job with PyTorch

  1. Ensure main.py exists on every node, then run the following command on each node:

# Set these per cluster / per node (values below are examples, not defaults):
#   NUM_TRAINERS - GPUs per node (e.g., 8)
#   NUM_NODES    - number of nodes in the cluster (e.g., 2)
#   NODE_RANK    - unique per node: 0 on the master, 1..NUM_NODES-1 elsewhere
#   MASTER_ADDR  - private IP of the rank-0 node
#   MASTER_PORT  - any open port, identical on every node (e.g., 29500)
export NCCL_DEBUG=WARN
torchrun \
  --nproc_per_node=$NUM_TRAINERS \
  --nnodes=$NUM_NODES \
  --node_rank=$NODE_RANK \
  --master_addr=$MASTER_ADDR \
  --master_port=$MASTER_PORT \
  main.py

This example assumes:

  • You’re using PyTorch with torchrun for multi-node orchestration
  • main.py is your training script (e.g., training something like Mistral-7B); a minimal sketch follows after this list
  • Each node runs the same command with its own NODE_RANK (0 through NUM_NODES-1), while the other values stay identical across nodes
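
For reference, here's a minimal sketch of what main.py could look like, using plain PyTorch DDP with a toy model standing in for your real architecture (illustrative only; torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process automatically):

import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL is the standard backend for multi-node GPU training
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model stands in for your actual architecture
    model = DDP(torch.nn.Linear(1024, 1024).cuda(local_rank),
                device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).square().mean()
        optimizer.zero_grad()
        loss.backward()  # gradients sync across all nodes here
        optimizer.step()
        if dist.get_rank() == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launched with the torchrun command above, every process joins the same job; the only per-node difference is NODE_RANK.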

You can find more detailed implementations of these examples in our Documentation, but this shows how simple it is to get started with multi-node training on RunPod.

Instant Clusters vs. Bare Metal: What’s the Difference?

Some teams turn to long-term bare metal contracts when they need full system access or specialized configurations. That makes sense for many production environments—but it comes with tradeoffs: setup time, long-term commitments, and more manual overhead.

Instant Clusters offer a different approach:

  • Deploy clusters in minutes, not days
  • Pay only for what you use, billed by the second
  • Manage your cluster through RunPod’s intuitive UI, with templates, billing insights, and team-level controls

Technical Details

  • GPU Type: NVIDIA H100 (more GPU types coming soon)
  • Cluster Size: 16 to 64 GPUs (2 to 8 nodes)
  • Containerized: Runs on Docker
  • Interconnect: High-performance networking
  • End-to-end onboarding time: Just a few minutes

Ideal For

  • ML Engineers fine-tuning large models
  • Research labs training from scratch
  • Startups iterating quickly with flexible infrastructure
  • Open-source projects needing temporary access to high-end hardware

Instant Clusters bring the power of a full-scale training cluster to anyone—with no commitments, no setup headaches, and no overpriced contracts.

For full system-level access, Bare Metal is still your go-to. But for fast, flexible scaling without any commitments or contracts, Instant Clusters are a game changer.

Try It Now

Ready to deploy your first Instant Cluster? Head to your RunPod console and choose "Clusters" to get started.

Questions? Feedback? Reach out at clusters@runpod.io or join us on Discord.


Instant Clusters are now live. Let the multi-node era begin.