What is the Difference Between Spot and On-Demand Instances?
We've gotten a lot of questions about "spot" instances versus "on-demand" instances. These terms are familiar to anyone who has used AWS EC2, but many people have not come across them before. Let's quickly summarize the pros and cons of each type of instance and go through example use cases for each.
Spot instances originated as a type of AWS EC2 instance that lets you request spare compute capacity from AWS at a discounted price, with the caveat that the instance can be interrupted if that capacity is needed elsewhere. On-demand instances are a type of AWS EC2 instance that lets you pay for compute capacity by the hour with no long-term commitment. The key differences are the price and the availability of these instances.
RunPod instances are similar. Spot instances can be interrupted without notice, while on-demand instances are non-interruptible. Why would you ever choose a spot instance, then? Because spot instances are usually much cheaper (around 50%) than their on-demand counterparts. As of this writing, a spot A6000 instance on RunPod costs $0.232/GPU/hour while an on-demand instance costs $0.491/GPU/hour, a discount of roughly 53%. This discount does, however, come with some risk, as your workload can be stopped abruptly.
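To make the price comparison concrete, here is a small Python sketch of the arithmetic. The two prices are the A6000 figures quoted above; the helper names (discount, job_cost, break_even_slowdown) are made up for illustration and are not part of any RunPod tooling.

```python
# Prices quoted in the text, in dollars per GPU per hour (A6000, at time of writing).
SPOT_PRICE = 0.232
ON_DEMAND_PRICE = 0.491


def discount(spot: float, on_demand: float) -> float:
    """Fraction saved by paying the spot price instead of the on-demand price."""
    return 1 - spot / on_demand


def job_cost(price_per_gpu_hour: float, gpus: int, hours: float) -> float:
    """Total cost of a job that keeps `gpus` GPUs busy for `hours` hours."""
    return price_per_gpu_hour * gpus * hours


def break_even_slowdown(spot: float, on_demand: float) -> float:
    """How many times longer a spot run can take (for example, because of
    work redone after interruptions) before it costs as much as on-demand."""
    return on_demand / spot


print(f"Spot discount: {discount(SPOT_PRICE, ON_DEMAND_PRICE):.1%}")
print(f"Break-even slowdown: {break_even_slowdown(SPOT_PRICE, ON_DEMAND_PRICE):.2f}x")
```

With these prices the discount works out to a little under 53%, and a spot job would need to burn slightly more than twice the GPU hours of an on-demand job before it stopped being the cheaper option.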
Spot instances are great for workloads that are stateless or have built-in checkpointing. For example, a training job that automatically checkpoints to persistent volume storage, and perhaps also uploads to cloud storage every once in a while, may be a good candidate for a spot instance. If it gets interrupted, you can resume from your latest checkpoint. This strategy lets you get your training done for much less money, but it may take longer to complete in wall-clock time.
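The checkpoint-and-resume pattern can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a recipe tied to any particular training framework: the function names, the JSON checkpoint format, and the stop_at parameter (which simulates an interruption) are all assumptions made for the example, and the "training step" is a stand-in for real work.

```python
import json
import os


def save_checkpoint(path: str, state: dict) -> None:
    """Write the checkpoint to a temp file, then atomically swap it into place,
    so an interruption mid-write never leaves a corrupt checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path: str) -> dict:
    """Return the latest saved state, or a fresh state if none exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0.0}


def train(path: str, total_steps: int, checkpoint_every: int = 100, stop_at=None) -> dict:
    """Run (or resume) a toy training loop, checkpointing every N steps.

    `stop_at` is a hypothetical knob that simulates a spot interruption by
    returning early without saving.
    """
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if stop_at is not None and state["step"] >= stop_at:
            return state  # simulated interruption: progress since last save is lost
        state["total"] += state["step"]  # stand-in for one real training step
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

In practice you would point path at a file on your persistent volume and call train again whenever the instance comes back. Only the work done since the last checkpoint is repeated, so the checkpoint interval is a trade-off between time spent saving and time lost per interruption.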
On-demand instances are better for interactive workloads or cases where time is of the essence. No one wants to be interrupted mid-flow while experimenting in a Jupyter notebook!
To summarize, use spot instances when things are well automated, or when the workload just isn't that important and you can take a gamble. Use on-demand instances if you need the guarantee that your work won't be stopped.