RunPod Weekly #17 - Pricing Updates, SGLang Worker (Beta), Blogs

Welcome to another round of RunPod Weekly! This week, we are excited to share the following:

📈 Pricing Updates

We've been running a temporary promotion for A40 48GB GPUs, known for their exceptional combination of VRAM, performance, and price. We've been thrilled to see the amazing products that people have built with A40s.

We'll be gradually raising A40 pricing to market rate going forward; on September 1st, the A40 rate will increase to $0.39/hour. This is your chance to secure the current lower rate!

Why Act Now?

By committing to a savings plan today, you can lock in the current low rates and maximize your savings over the long term. Our long-term plans are designed to keep your costs down, allowing you to continue building without interruption.

🧪 SGLang Worker (Beta)

SGLang is a fast serving framework for large language and vision models. It makes your interactions with models faster and more controllable by co-designing the backend runtime and frontend language. In some cases, SGLang outperforms vLLM, with up to 3.1x higher throughput on Llama-70B.

We've published Docker images for the SGLang worker under the tag runpod/worker-sglang:preview-cuda12.1.0. These are preview images because SGLang is still in active development and undergoing frequent changes.
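If you'd like to try the preview image, one way is to create a serverless endpoint with it and send a request to RunPod's `/runsync` API. The sketch below is a minimal example, not an official reference: the endpoint ID is a placeholder, and the exact `input` schema (a `prompt` plus `sampling_params`) is an assumption, since the worker is still in beta and its payload format may change.

```python
import os

import requests

# Minimal sketch of calling a serverless endpoint running the SGLang worker.
# ENDPOINT_ID is a placeholder for the ID RunPod assigns when you create an
# endpoint with the runpod/worker-sglang:preview-cuda12.1.0 image. The
# "input" schema below is an assumption; check the worker repo for the
# current payload format.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = os.environ["RUNPOD_API_KEY"]

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            "prompt": "Explain what SGLang is in one sentence.",
            "sampling_params": {"max_new_tokens": 128, "temperature": 0.7},
        }
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```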

As SGLang stabilizes and we gain confidence in its production readiness, we'll add it as a quick-deploy template on Serverless.

โœ๏ธ Blogs

We're thrilled to share two new blog posts, packed with tons of valuable information.

Run Llama 3.1 with vLLM on RunPod Serverless

This post introduces RunPod's latest vLLM worker for deploying Meta's Llama 3.1, focusing on the 8B instruct version. It explains why vLLM is an excellent choice for running Llama 3.1, highlighting its speed and wide-ranging model support, then walks step by step through deploying Llama 3.1 8B with vLLM on RunPod Serverless, including how to test the model from both RunPod's Web UI and Google Colab. The setup is straightforward and opens up high-performance language model inference for a wide range of applications.
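As a taste of what the post covers, here is a minimal sketch of querying a deployed vLLM worker endpoint from Python (for example, from a Colab cell), using the worker's OpenAI-compatible route. The endpoint ID is a placeholder, and it's assumed the endpoint was deployed with the Llama 3.1 8B instruct model; see the full post for the deployment steps.

```python
import os

from openai import OpenAI

# Minimal sketch: the vLLM worker exposes an OpenAI-compatible API under the
# endpoint's /openai/v1 route. ENDPOINT_ID is a placeholder; the model name
# must match the model the endpoint was deployed with.
ENDPOINT_ID = "your-endpoint-id"

client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
    api_key=os.environ["RUNPOD_API_KEY"],
)

completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Give me one fun fact about llamas."}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```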

Run Gemma 7B with vLLM on RunPod Serverless

This post covers the same workflow for Google's Gemma 7B: it explains why vLLM is a great fit for the model, then provides a detailed step-by-step guide to deploying Gemma 7B with vLLM on RunPod Serverless, again with instructions for testing via both RunPod's Web UI and Google Colab.
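The Python sketch shown above for Llama 3.1 applies here unchanged apart from the model name: point it at a Gemma 7B endpoint and swap the model string for the variant you deployed (for example, `google/gemma-7b-it` for the instruction-tuned model).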

Read previous editions of RunPod Weekly: RunPod Weekly #16, RunPod Weekly #15, and RunPod Weekly #14.


That's all for this week's newsletter. We're constantly striving to improve our platform and services, and your feedback is invaluable in this journey. We welcome you to join our Discord server and share what you've been working on.

Thanks for being part of the RunPod community!

P.S. We're still hiring! Learn more here.