RunPod Weekly #16 - Serverless Improvements, Llama 3.1 on vLLM, Better RAG Support, Blogs

Welcome to another round of RunPod Weekly! This week, we are excited to share the following:

✨ Serverless Improvements

Our workers view has been revamped to give a more in-depth overview of each worker, where it is located, and its current state. You can now also expose HTTP and TCP ports when creating a serverless template.


🦙 Llama 3.1 on vLLM

We've released a new version of our vLLM worker which now supports Llama 3.1 on serverless! Click here to deploy vLLM on RunPod serverless and get started with Llama 3.1 in minutes.
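If you'd rather call the new worker from code, here's a minimal sketch of hitting a deployed endpoint through RunPod's synchronous run API. The endpoint ID is a placeholder, and the input payload (a prompt plus sampling_params) assumes the vLLM worker's default request format; check the worker's documentation for the exact schema of your deployment.

```python
import os
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder: copy yours from the RunPod console
API_KEY = os.environ["RUNPOD_API_KEY"]

# Send a synchronous request to the serverless vLLM endpoint.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            # Assumed vLLM worker schema: raw prompt plus sampling parameters.
            "prompt": "Explain what makes Llama 3.1 different in two sentences.",
            "sampling_params": {"max_tokens": 200, "temperature": 0.7},
        }
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```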

🔍 Better RAG Support

We've released an Infinity Vector Embeddings quick deploy template which supports deploying a wide range of text-embedding models and frameworks using Infinity on RunPod serverless. Click here to get started with deploying an embedding model.
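As a quick illustration, the sketch below sends a batch of texts to a deployed Infinity endpoint via the same synchronous run API. The model name and the OpenAI-style payload here are assumptions; consult the template's README for the request schema your deployment actually expects.

```python
import os
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder: copy yours from the RunPod console
API_KEY = os.environ["RUNPOD_API_KEY"]

# Request embeddings for a batch of texts from the serverless Infinity endpoint.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            # Assumed OpenAI-style embedding payload; verify against the
            # template's README for your deployment.
            "model": "BAAI/bge-small-en-v1.5",
            "input": ["RunPod Weekly #16", "Serverless embeddings with Infinity"],
        }
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```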

✍️ Blogs

We're thrilled to share three new blog posts packed with valuable information.

How To Run Flux Image Generator With RunPod

Black Forest Labs' Flux is a new text-to-image AI model family featuring three versions: Pro, Dev, and Schnell. This post highlights Flux's standout features, including its speed, hybrid architecture, and superior image quality. It then provides a step-by-step guide on how to run Flux 1 Schnell on RunPod, demonstrating its accessibility and potential for AI-driven creativity in various applications.

How To Run Flux Image Generator With ComfyUI

Similar to the above blog post, this post explains Flux's standout features and provides a step-by-step guide on how to run Flux 1 Dev using ComfyUI on RunPod.

Supercharge Your LLMs Using SGLang For Inference: Why Speed and Efficiency Matter More Than Ever

This blog introduces SGLang, an efficient inference engine for large language models developed by LMSys. It highlights SGLang's impressive performance, including its ability to process up to 10,000 tokens per second with certain models. This post explains how SGLang achieves its efficiency gains and provides a step-by-step guide on how to get started with SGLang on RunPod, emphasizing its potential to significantly improve response times and reduce costs in LLM applications.
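For a taste of what the post covers, here is a minimal sketch of querying a running SGLang server through its OpenAI-compatible API. It assumes you've already launched the server on your pod (for example with `python -m sglang.launch_server --model-path <model> --port 30000`) and that the port is reachable; adjust the URL and model name for your setup.

```python
import requests

# Assumes an SGLang server is already listening on this port.
BASE_URL = "http://localhost:30000/v1"

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "default",  # SGLang serves the model it was launched with
        "messages": [{"role": "user", "content": "Why does inference speed matter?"}],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```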

Read previous editions of RunPod Weekly: RunPod Weekly #14, RunPod Weekly #15


That's all for this week's newsletter. We're constantly striving to improve our platform and services, and your feedback is invaluable in this journey. We welcome you to join our Discord server and share what you've been working on.

Thanks for being part of the RunPod community!

P.S. We're still hiring; learn more here!