RunPod Blog
RunPod Weekly #17 - Pricing Updates, SGLang Worker (Beta), Blogs
RunPod Weekly

Welcome to another round of RunPod Weekly! This week, we are excited to share the following: 📈 Pricing Updates We've been running a temporary promotion for A40 48GB GPUs, known for their exceptional combination of VRAM, performance, and pricing. We've been thrilled to see the amazing products
30 Aug 2024 3 min read
Run Gemma 7b with vLLM on RunPod Serverless

In this blog, you'll learn: * About RunPod's latest vLLM worker for the newest models * Why vLLM is an excellent choice for running Google’s Gemma 7B * A step-by-step guide to get Google Gemma 7B up and running on RunPod Serverless with the quick deploy vLLM worker.
22 Aug 2024 5 min read
Run Llama 3.1 with vLLM on RunPod Serverless

In this blog, you'll learn: * About RunPod's latest vLLM worker for the newest models * Why vLLM is an excellent choice for running Meta's Llama 3.1 * A step-by-step guide to get Meta Llama 3.1's 8b-instruct version up and running on RunPod
20 Aug 2024 7 min read
RunPod Weekly #16 - Serverless Improvements, Llama 3.1 on vLLM, Better RAG Support, Blogs
RunPod Weekly

Welcome to another round of RunPod Weekly! This week, we are excited to share the following: ✨ Serverless Improvements Our workers view has been revamped to give a more in-depth overview of each worker, where it's located, and its current state. You can now also expose HTTP
16 Aug 2024 2 min read
Supercharge Your LLMs Using SGLang For Inference: Why Speed and Efficiency Matter More Than Ever

RunPod is proud to partner with LMSys once again to put a spotlight on its inference engine SGLang. LMSys has a storied history within the realm of language models with prior contributions such as the Chatbot Arena which compares outputs from competing models, Vicuna, an open source competitor to ChatGPT,
15 Aug 2024 6 min read
How to Run Flux Image Generator with ComfyUI

What is Flux? Flux is an innovative text-to-image AI model developed by Black Forest Labs that has quickly gained popularity among generative AI enthusiasts and digital artists. Its ability to generate high-quality images from simple text prompts sets it apart. The Flux 1 family includes three versions of their image
13 Aug 2024 5 min read
How to run Flux image generator with RunPod

What is Flux? Flux is a new and exciting text-to-image AI model developed by Black Forest Labs. This innovative model family has quickly captured the attention of generative AI enthusiasts and digital artists alike, thanks to its remarkable ability to generate high-quality images from simple text prompts. The Flux 1 family
08 Aug 2024 6 min read
RunPod Weekly #15 - New Referral Program, Community Changelog, Blogs
RunPod Weekly

Welcome to another round of RunPod Weekly! This week, we are excited to share the following: 🤝 New Referral Program We've reworked our referral program to make it easier (and more lucrative) for anyone to get started. These changes include higher reward rates, a new serverless referral program, no
02 Aug 2024 3 min read
How to run SAM 2 on a cloud GPU with RunPod

What is SAM 2? Meta has unveiled Segment Anything Model 2 (SAM 2), a revolutionary advancement in object segmentation. Building on the success of its predecessor, SAM 2 integrates real-time, promptable object segmentation for both images and videos, enhancing accuracy and speed. Its ability to operate across previously unseen visual
02 Aug 2024 6 min read
Run Llama 3.1 405B with Ollama: A Step-by-Step Guide

Meta’s recent release of the Llama 3.1 405B model has made waves in the AI community. This groundbreaking open-source model not only matches but even surpasses the performance of leading closed-source models. With impressive scores on reasoning tasks (96.9 on ARC Challenge and 96.8 on GSM8K)
29 Jul 2024 5 min read
Master the Art of Serverless Scaling: Optimize Performance and Costs on RunPod

In many sports – golf, baseball, tennis, among others – there is a "sweet spot" to aim for which results in the maximum amount of lift or distance for the ball given an equivalent amount of kinetic energy in the swing. While you'll still get somewhere with an
25 Jul 2024 7 min read
Introducing RunPod’s New and Improved Referral Program

Referring friends to RunPod just got much easier. From now until the end of the year (December 31st, 2024), we've removed all eligibility requirements for the referral program and increased the referral commission from 2% to 3% on GPU Pods and from 0% to 5% on Serverless. No
23 Jul 2024 2 min read
RunPod Weekly #14 - Pricing Changes, Community Changelog, Blogs
RunPod Weekly

Welcome to another round of RunPod Weekly! This week, we are excited to share the following: 💸 Pricing Changes RunPod pricing is dropping by up to 40% on Serverless and up to 18% on Secure Cloud. Why We're Doing This GPUs aren't cheap, nor is the infrastructure
19 Jul 2024 3 min read
How to run vLLM with RunPod Serverless

In this blog you’ll learn: 1. When to choose between closed source LLMs like ChatGPT and open source LLMs like Llama-7b 2. How to deploy an open source LLM with vLLM If you're not familiar, vLLM is a powerful LLM inference engine that boosts performance (up to
18 Jul 2024 5 min read
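As a taste of what that post covers, here is a minimal sketch of the request a RunPod Serverless vLLM endpoint consumes. The endpoint ID is a placeholder and the exact payload schema is an assumption based on the vLLM worker's `input`/`sampling_params` convention, so check the worker's README before relying on it:

```python
# Sketch of a request body for a RunPod Serverless vLLM endpoint.
# "my-endpoint-id" and the schema below are illustrative assumptions,
# not the documented API -- verify against the vLLM worker docs.

def build_vllm_request(endpoint_id: str, prompt: str,
                       max_tokens: int = 256, temperature: float = 0.7):
    """Return the URL and JSON body for a hypothetical /runsync call."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    body = {
        "input": {
            "prompt": prompt,
            "sampling_params": {
                "max_tokens": max_tokens,
                "temperature": temperature,
            },
        }
    }
    return url, body

url, body = build_vllm_request("my-endpoint-id", "Why is the sky blue?")
print(url)
```

Sending `body` to `url` with your API key in an `Authorization` header is all the client-side plumbing a quick-deploy worker needs.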
RunPod Slashes GPU Prices: Powering Your AI Applications for Less

RunPod is dropping prices across our Serverless and Secure Cloud services. Why? Because we believe in giving you the firepower you need to build applications without breaking the bank. The Lowdown on Our New Pricing Let's cut to the chase. Here's what's changing: Serverless:
12 Jul 2024 3 min read
RAG vs. Fine-Tuning: Which Method is Best for Large Language Models (LLMs)?

Large Language Models (LLMs) have changed the way we interact with technology, powering everything from chatbots to content-generation tools. But these models often struggle with handling domain-specific prompts and new information that isn't included in their training data. So, how can we make these powerful models more adaptable?
11 Jul 2024 8 min read
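The RAG half of that comparison boils down to retrieve-then-prompt. This toy sketch uses plain word overlap as the retriever purely to stay self-contained; production RAG systems use vector embeddings and a proper similarity search:

```python
# Toy sketch of the RAG pattern: pick the most relevant snippet,
# then prepend it to the prompt as context. Word-overlap scoring
# here is a stand-in for real embedding-based retrieval.

def retrieve(query: str, docs: list[str]) -> str:
    q = set(query.lower().split())
    # Score each doc by how many query words it shares.
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str, docs: list[str]) -> str:
    context = retrieve(query, docs)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "RunPod Serverless bills per second of GPU time.",
    "Flux is a text-to-image model from Black Forest Labs.",
]
print(build_prompt("How does RunPod Serverless billing work", docs))
```

Fine-tuning, by contrast, bakes the knowledge into the weights instead of the prompt, which is the trade-off the post walks through.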
How Much VRAM Does Your LLM Need? A Guide to GPU Memory Requirements
GPU Power

Discover how to determine the right VRAM for your Large Language Model (LLM). Learn about GPU memory requirements, model parameters, and tools to optimize your AI deployments.
08 Jul 2024 5 min read
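The core estimate is simple enough to sketch: weight memory is parameter count times bytes per parameter, plus headroom for the KV cache and activations. The 20% overhead factor below is a rough assumption; actual usage varies with context length and batch size:

```python
# Back-of-the-envelope VRAM estimate for loading an LLM's weights.
# The 1.2x overhead for KV cache and activations is a rough rule of
# thumb, not a guarantee -- long contexts and big batches need more.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def estimate_vram_gb(params_billion: float, dtype: str = "fp16",
                     overhead: float = 1.2) -> float:
    weights_gb = params_billion * BYTES_PER_PARAM[dtype]
    return round(weights_gb * overhead, 1)

print(estimate_vram_gb(7))          # a 7B model in fp16
print(estimate_vram_gb(70, "int4"))  # a 70B model, 4-bit quantized
```

By this estimate a 7B fp16 model wants roughly 17 GB, which is why 24GB cards are the comfortable floor for that class.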
Benchmarking LLMs: A Deep Dive into Local Deployment and Performance Optimization
Community Contribution

I just love the idea of running an LLM locally. It has huge implications for data security and the ability to use AI on private datasets. Get your company’s DevOps teams some real GPU servers as soon as possible. Benchmarking LLM performance has been a blast, and I’ve
04 Jul 2024 5 min read
AMD MI300X vs. Nvidia H100 SXM: Performance Comparison on Mixtral 8x7B Inference

There’s no denying Nvidia's historical dominance when it comes to AI training and inference. Nearly all production AI workloads run on their graphics cards. However, there’s been some optimism recently around AMD, seeing as the MI300X, their intended competitor to Nvidia's H100, is strictly
01 Jul 2024 7 min read
Partnering with Defined AI to bridge the data wealth gap

RunPod is dedicated to democratizing access to AI development and bridging the data wealth gap. Alongside Defined.ai, the world’s largest ethical AI training data marketplace, RunPod launched a pilot program to give startups access to enterprise-grade datasets for training SOTA models. The Genesis of Collaboration To build SOTA
17 Jun 2024 3 min read
Run Larger LLMs on RunPod Serverless Than Ever Before - Llama-3 70B (and beyond!)
Language Models

Up until now, RunPod has only supported using a single GPU in Serverless, with the exception of using two 48GB cards (which honestly didn't help, given the overhead involved in multi-GPU setups for LLMs.) You were effectively limited to what you could fit in 80GB, so you would
06 Jun 2024 3 min read
Introduction to vLLM and PagedAttention

What is vLLM? vLLM is an open-source LLM inference and serving engine that utilizes a novel memory allocation algorithm called PagedAttention. It can run your models with up to 24x higher throughput than HuggingFace Transformers (HF) and up to 3.5x higher throughput than HuggingFace Text Generation Inference (TGI). How
31 May 2024 11 min read
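PagedAttention's central idea can be shown in a few lines: instead of reserving one contiguous KV-cache region per sequence, blocks are allocated on demand from a shared pool, so memory grows with actual tokens rather than the worst case. The block size of 16 mirrors vLLM's default, but everything else here is a simplified illustration, not vLLM's implementation:

```python
# Toy illustration of PagedAttention-style KV-cache paging: each
# sequence holds a block table mapping onto physical blocks that
# are handed out on demand from a shared pool.

class PagedKVCache:
    def __init__(self, block_size: int = 16):
        self.block_size = block_size
        self.tokens: dict[str, int] = {}        # tokens held per sequence
        self.tables: dict[str, list[int]] = {}  # block table per sequence
        self.next_block = 0                     # next free physical block

    def append(self, seq_id: str, n: int) -> list[int]:
        total = self.tokens.get(seq_id, 0) + n
        table = self.tables.setdefault(seq_id, [])
        # Grab fresh physical blocks only when the current ones are full.
        while len(table) * self.block_size < total:
            table.append(self.next_block)
            self.next_block += 1
        self.tokens[seq_id] = total
        return table

cache = PagedKVCache()
cache.append("seq-a", 20)  # 20 tokens -> 2 blocks of 16
cache.append("seq-b", 5)   # 5 tokens  -> 1 block
print(cache.tables)
```

Because unused slots exist only in the final block of each sequence, fragmentation stays bounded per sequence instead of per reservation, which is where the throughput gains come from.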
Announcing RunPod's New Serverless CPU Feature

We are thrilled to introduce the latest addition to the RunPod platform: Serverless CPU. This feature allows you to create high-performance VM containers with up to 3.75 GHz dedicated cores, DDR5 memory, and NVMe SSD storage. With Serverless CPU, you have the flexibility to choose between Compute-Optimized or General
28 May 2024 2 min read
Enable SSH Password Authentication on a RunPod Pod

When connecting to a RunPod Pod, a common issue is that SSH doesn't work out of the box. In this tutorial, we will examine a method of using a username and password to access a RunPod Pod through SSH. By the end of this guide, you'll
16 May 2024 2 min read
RunPod's $20MM Milestone: Fueling Our Vision, Empowering Our Team
Featured

Exciting news! RunPod has raised $20MM led by Intel Capital and Dell Technologies Capital. This boost will further our mission to revolutionize AI/ML cloud computing.
08 May 2024 4 min read
Page 4 of 9