RunPod Blog

RunPod Blog

Sign in Subscribe

Master the Art of Serverless Scaling: Optimize Performance and Costs on RunPod

Master the Art of Serverless Scaling: Optimize Performance and Costs on RunPod

In many sports – golf, baseball, tennis, among others – there is a "sweet spot" to aim for which results in the maximum amount of lift or distance for the ball given an equivalent amount of kinetic energy in the swing. While you'll still get somewhere with an

Introducing RunPod’s New and Improved Referral Program

Introducing RunPod’s New and Improved Referral Program

Referring friends to RunPod just got much easier. From now until the end of the year (December 31st, 2024), we've removed all eligibility requirements for the referral program and increased the referral commission from 2% to 3% on GPU Pods and from 0% to 5% on Serverless. No

RunPod Weekly #14 - Pricing Changes, Community Changelog, Blogs

RunPod Weekly #14 - Pricing Changes, Community Changelog, Blogs

Welcome to another round of RunPod Weekly! This week, we are excited to share the following: 💸 Pricing Changes RunPod pricing is dropping by up to -40% on Serverless and up to -18% on Secure Cloud. Why We're Doing This GPUs aren't cheap, nor is the infrastructure

How to run vLLM with RunPod Serverless

How to run vLLM with RunPod Serverless

In this blog you’ll learn: 1. When to choose between closed source LLMs like ChatGPT and open source LLMs like Llama-7b 2. How to deploy an open source LLM with vLLM If you're not familiar, vLLM is a powerful LLM inference engine that boosts performance (up to

RunPod Slashes GPU Prices: Powering Your AI Applications for Less

RunPod Slashes GPU Prices: Powering Your AI Applications for Less

RunPod is dropping prices across our Serverless and Secure Cloud services. Why? Because we believe in giving you the firepower you need to build applications without breaking the bank. The Lowdown on Our New Pricing Let's cut to the chase. Here's what's changing: Serverless:

RAG vs. Fine-Tuning: Which Method is Best for Large Language Models (LLMs)?

RAG vs. Fine-Tuning: Which Method is Best for Large Language Models (LLMs)?

Large Language Models (LLMs) have changed the way we interact with technology, powering everything from chatbots to content-generation tools. But these models often struggle with handling domain-specific prompts and new information that isn't included in their training data. So, how can we make these powerful models more adaptable?

Understanding VRAM and how Much Your LLM Needs

Understanding VRAM and how Much Your LLM Needs

You found an open-source large language model (LLM) model you want to deploy, but aren't sure which GPU you should run your model with. Each GPU has a "VRAM" size, but what does that really mean? You could just play it safe and pick the highest

Benchmarking LLMs: A Deep Dive into Local Deployment and Performance Optimization

Community Contribution

Benchmarking LLMs: A Deep Dive into Local Deployment and Performance Optimization

I just love the idea of running an LLM locally. It has huge implications for data security and the ability to use AI on private datasets. Get your company’s DevOps teams some real GPU servers as soon as possible. Benchmarking LLM performance has been a blast, and I’ve

AMD MI300X vs. Nvidia H100 SXM: Performance Comparison on Mixtral 8x7B Inference

AMD MI300X vs. Nvidia H100 SXM: Performance Comparison on Mixtral 8x7B Inference

There’s no denying Nvidia's historical dominance when it comes to AI training and inference. Nearly all production AI workloads run on their graphics cards. However, there’s been some optimism recently around AMD, seeing as the MI300X, their intended competitor to Nvidia's H100, is strictly

Partnering with Defined AI to bridge the data wealth gap

Partnering with Defined AI to bridge the data wealth gap

RunPod is dedicated to democratizing access to AI development and bridging the data wealth gap. Alongside Defined.ai, the world’s largest ethical AI training data marketplace, RunPod launched a pilot program to give startups access to enterprise-grade datasets for training SOTA models. The Genesis of Collaboration To build SOTA

Run Larger LLMs on RunPod Serverless Than Ever Before - Llama-3 70B (and beyond!)

Language Models

Run Larger LLMs on RunPod Serverless Than Ever Before - Llama-3 70B (and beyond!)

Up until now, RunPod has only supported using a single GPU in Serverless, with the exception of using two 48GB cards (which honestly didn't help, given the overhead involved in multi-GPU setups for LLMs.) You were effectively limited to what you could fit in 80GB, so you would

Introduction to vLLM and PagedAttention

Introduction to vLLM and PagedAttention

What is vLLM? vLLM is an open-source LLM inference and serving engine that utilizes a novel memory allocation algorithm called PagedAttention. It can run your models with up to 24x higher throughput than HuggingFace Transformers (HF) and up to 3.5x higher throughput than HuggingFace Text Generation Inference (TGI). How

Announcing RunPod's New Serverless CPU Feature

Announcing RunPod's New Serverless CPU Feature

We are thrilled to introduce the latest addition to the RunPod platform: Serverless CPU. This feature allows you to create high-performance VM containers with up to 3.75 GHz deviated cores, DDR5 memory, and NVME SSD storage. With Serverless CPU, you have the flexibility to choose between Compute-Optimized or General

Enable SSH Password Authentication on a RunPod Pod

Enable SSH Password Authentication on a RunPod Pod

When connecting to a RunPod Pod, a common issue is that SSH doesn't work out of the box. In this tutorial, we will examine a method of using a username and password to access a RunPod Pod through SSH. By the end of this guide, you'll

RunPod's $20MM Milestone: Fueling Our Vision, Empowering Our Team

RunPod's $20MM Milestone: Fueling Our Vision, Empowering Our Team

Exciting news! RunPod has raised $20MM led by Intel Capital and Dell Technologies Capital. This boost will further our mission to revolutionize AI/ML cloud computing.

How Coframe used RunPod Serverless to Scale During their Viral Product Hunt Launch

How Coframe used RunPod Serverless to Scale During their Viral Product Hunt Launch

Coframe uses RunPod Serverless to scale inference from 0 GPUs to hundreds in minutes. With RunPod, Coframe launched their generative UI tool on Product Hunt to thousands of users in a single day without having to worry about their infrastructure failing. In under a week, Coframe was able to deploy

How KRNL AI Scaled to 10,000+ Concurrent Users while Cutting Infrastructure Costs by 65% with RunPod Serverless

How KRNL AI Scaled to 10,000+ Concurrent Users while Cutting Infrastructure Costs by 65% with RunPod Serverless

When Giacomo, Founder and CPO of KRNL, reached out to RunPod in May 2023, we weren’t actually sure if we could support his use case. They needed a provider that could cost-effectively scale up to handle hundreds of thousands of users, and scale back down to zero in minutes.

Configurable Endpoints for Deploying Large Language Models

Configurable Endpoints for Deploying Large Language Models

RunPod introduces Configurable Templates, a powerful feature that allows users to easily deploy and run any large language model. With this feature, users can provide the Hugging Face model name and customize various template parameters to create tailored endpoints for their specific needs. Why Use Configurable Templates? Configurable Templates offer

Orchestrating RunPod's Workloads Using dstack

Orchestrating RunPod's Workloads Using dstack

Today, we're announcing the integration between Runpod and dstack, an open-source orchestration engine, that aims to simplify the development, training, and deployment of AI models while leveraging the open-source ecosystem. What is dstack? While dstack shares a number of similarities with Kubernetes, it is more lightweight and focuses

Revolutionizing Real Estate: Virtual Staging AI's Success Story with RunPod

Revolutionizing Real Estate: Virtual Staging AI's Success Story with RunPod

Virtual Staging AI, an innovative startup from the Harvard Innovation Lab, is transforming the real estate industry by leveraging cutting-edge AI technology and RunPod's powerful GPU infrastructure. Their state-of-the-art solution enables realtors to virtually stage properties in just 30 seconds at a fraction of the cost of traditional

Generate Images with Stable Diffusion on RunPod

💡RunPod is hosting an AI art contest, find out more on our Discord in the #art-contest channel. In this tutorial, you will learn how to generate images using Stable Diffusion, a powerful text-to-image model, on the RunPod platform. By following the step-by-step instructions, you'll set up the prerequisites,

Announcing RunPod’s Integration with SkyPilot

Announcing RunPod’s Integration with SkyPilot

RunPod is excited to announce its latest integration with SkyPilot, an open-source framework for running LLMs, AI, and batch jobs on any cloud. This collaboration is designed to significantly enhance the efficiency and cost-effectiveness of your development process, particularly for training, fine-tuning, and deploying models. What is SkyPilot? SkyPilot is

Elevating Veterinary Care: A Customer Success Story with ScribbleVet and RunPod

Customer Success

Elevating Veterinary Care: A Customer Success Story with ScribbleVet and RunPod

Discover how ScribbleVet transformed veterinary care with RunPod's AI technology, showcasing our commitment to empowering businesses and enhancing service quality.

Introducing the A40 GPUs: Revolutionize Machine Learning with Unmatched Efficiency

Introducing the A40 GPUs: Revolutionize Machine Learning with Unmatched Efficiency

In the rapidly evolving world of artificial intelligence and machine learning, the need for powerful, cost-effective hardware has never been more critical. The launch of the A40 GPUs marks a significant milestone in this journey, offering unparalleled performance and affordability. These GPUs are designed to cater to the needs of