RunPod Blog
RunPod Weekly #17 - Pricing Updates, SGLang Worker (Beta), Blogs
RunPod Weekly

Welcome to another round of RunPod Weekly! This week, we are excited to share the following: 📈 Pricing Updates We've been running a temporary promotion for A40 48GB GPUs, known for their exceptional combination of VRAM, performance, and pricing. We've been thrilled to see the amazing products
30 Aug 2024 3 min read
Run Gemma 7b with vLLM on RunPod Serverless

In this blog, you'll learn: * About RunPod's latest vLLM worker for the newest models * Why vLLM is an excellent choice for running Google’s Gemma 7B * A step-by-step guide to get Google Gemma 7B up and running on RunPod Serverless with the quick deploy vLLM worker.
22 Aug 2024 5 min read
Run Llama 3.1 with vLLM on RunPod Serverless

In this blog, you'll learn: * About RunPod's latest vLLM worker for the newest models * Why vLLM is an excellent choice for running Meta's Llama 3.1 * A step-by-step guide to get Meta Llama 3.1's 8b-instruct version up and running on RunPod
20 Aug 2024 7 min read
RunPod Weekly #16 - Serverless Improvements, Llama 3.1 on vLLM, Better RAG Support, Blogs
RunPod Weekly

Welcome to another round of RunPod Weekly! This week, we are excited to share the following: ✨ Serverless Improvements Our workers view has been revamped to give a more in-depth overview of each worker, where it's located, and its current state. You can now also expose HTTP
16 Aug 2024 2 min read
Supercharge Your LLMs Using SGLang For Inference: Why Speed and Efficiency Matter More Than Ever

RunPod is proud to partner with LMSys once again to put a spotlight on its inference engine SGLang. LMSys has a storied history within the realm of language models with prior contributions such as the Chatbot Arena which compares outputs from competing models, Vicuna, an open source competitor to ChatGPT,
15 Aug 2024 6 min read
How to Run Flux Image Generator with ComfyUI

What is Flux? Flux is an innovative text-to-image AI model developed by Black Forest Labs that has quickly gained popularity among generative AI enthusiasts and digital artists. Its ability to generate high-quality images from simple text prompts sets it apart. The Flux 1 family includes three versions of their image
13 Aug 2024 5 min read
How to run Flux image generator with RunPod

What is Flux? Flux is a new and exciting text-to-image AI model developed by Black Forest Labs. This innovative model family has quickly captured the attention of generative AI enthusiasts and digital artists alike, thanks to its remarkable ability to generate high-quality images from simple text prompts. The Flux 1 family
08 Aug 2024 6 min read
RunPod Weekly #15 - New Referral Program, Community Changelog, Blogs
RunPod Weekly

Welcome to another round of RunPod Weekly! This week, we are excited to share the following: 🤝 New Referral Program We've reworked our referral program to make it easier (and more lucrative) for anyone to get started. These changes include higher reward rates, a new serverless referral program, no
02 Aug 2024 3 min read
How to run SAM 2 on a cloud GPU with RunPod

What is SAM 2? Meta has unveiled Segment Anything Model 2 (SAM 2), a revolutionary advancement in object segmentation. Building on the success of its predecessor, SAM 2 integrates real-time, promptable object segmentation for both images and videos, enhancing accuracy and speed. Its ability to operate across previously unseen visual
02 Aug 2024 6 min read
Run Llama 3.1 405B with Ollama: A Step-by-Step Guide

Meta’s recent release of the Llama 3.1 405B model has made waves in the AI community. This groundbreaking open-source model not only matches but even surpasses the performance of leading closed-source models. With impressive scores on reasoning tasks (96.9 on ARC Challenge and 96.8 on GSM8K)
29 Jul 2024 5 min read
Master the Art of Serverless Scaling: Optimize Performance and Costs on RunPod

In many sports – golf, baseball, tennis, among others – there is a "sweet spot" to aim for which results in the maximum amount of lift or distance for the ball given an equivalent amount of kinetic energy in the swing. While you'll still get somewhere with an
25 Jul 2024 7 min read
Introducing RunPod’s New and Improved Referral Program

Referring friends to RunPod just got much easier. From now until the end of the year (December 31st, 2024), we've removed all eligibility requirements for the referral program and increased the referral commission from 2% to 3% on GPU Pods and from 0% to 5% on Serverless. No
23 Jul 2024 2 min read
RunPod Weekly #14 - Pricing Changes, Community Changelog, Blogs
RunPod Weekly

Welcome to another round of RunPod Weekly! This week, we are excited to share the following: 💸 Pricing Changes RunPod pricing is dropping by up to 40% on Serverless and up to 18% on Secure Cloud. Why We're Doing This GPUs aren't cheap, nor is the infrastructure
19 Jul 2024 3 min read
How to run vLLM with RunPod Serverless

In this blog you’ll learn: 1. When to choose between closed source LLMs like ChatGPT and open source LLMs like Llama-7b 2. How to deploy an open source LLM with vLLM If you're not familiar, vLLM is a powerful LLM inference engine that boosts performance (up to
18 Jul 2024 5 min read
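As a taste of what that post covers, here is a minimal sketch of the request a RunPod Serverless vLLM endpoint consumes. The endpoint ID is a placeholder and the exact payload schema is an assumption based on the vLLM worker's `input`/`sampling_params` convention, so check the worker's README before relying on it:

```python
# Sketch of a request body for a RunPod Serverless vLLM endpoint.
# "my-endpoint-id" and the schema below are illustrative assumptions,
# not the documented API -- verify against the vLLM worker docs.

def build_vllm_request(endpoint_id: str, prompt: str,
                       max_tokens: int = 256, temperature: float = 0.7):
    """Return the URL and JSON body for a hypothetical /runsync call."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    body = {
        "input": {
            "prompt": prompt,
            "sampling_params": {
                "max_tokens": max_tokens,
                "temperature": temperature,
            },
        }
    }
    return url, body

url, body = build_vllm_request("my-endpoint-id", "Why is the sky blue?")
print(url)
```

Sending `body` to `url` with your API key in an `Authorization` header is all the client-side plumbing a quick-deploy worker needs.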
RunPod Slashes GPU Prices: Powering Your AI Applications for Less

RunPod is dropping prices across our Serverless and Secure Cloud services. Why? Because we believe in giving you the firepower you need to build applications without breaking the bank. The Lowdown on Our New Pricing Let's cut to the chase. Here's what's changing: Serverless:
12 Jul 2024 3 min read
RAG vs. Fine-Tuning: Which Method is Best for Large Language Models (LLMs)?

Large Language Models (LLMs) have changed the way we interact with technology, powering everything from chatbots to content-generation tools. But these models often struggle with handling domain-specific prompts and new information that isn't included in their training data. So, how can we make these powerful models more adaptable?
11 Jul 2024 8 min read
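The RAG half of that comparison boils down to retrieve-then-prompt. This toy sketch uses plain word overlap as the retriever purely to stay self-contained; production RAG systems use vector embeddings and a proper similarity search:

```python
# Toy sketch of the RAG pattern: pick the most relevant snippet,
# then prepend it to the prompt as context. Word-overlap scoring
# here is a stand-in for real embedding-based retrieval.

def retrieve(query: str, docs: list[str]) -> str:
    q = set(query.lower().split())
    # Score each doc by how many query words it shares.
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str, docs: list[str]) -> str:
    context = retrieve(query, docs)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "RunPod Serverless bills per second of GPU time.",
    "Flux is a text-to-image model from Black Forest Labs.",
]
print(build_prompt("How does RunPod Serverless billing work", docs))
```

Fine-tuning, by contrast, bakes the knowledge into the weights instead of the prompt, which is the trade-off the post walks through.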
How Much VRAM Does Your LLM Need? A Guide to GPU Memory Requirements
GPU Power

Discover how to determine the right VRAM for your Large Language Model (LLM). Learn about GPU memory requirements, model parameters, and tools to optimize your AI deployments.
08 Jul 2024 5 min read
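The core estimate is simple enough to sketch: weight memory is parameter count times bytes per parameter, plus headroom for the KV cache and activations. The 20% overhead factor below is a rough assumption; actual usage varies with context length and batch size:

```python
# Back-of-the-envelope VRAM estimate for loading an LLM's weights.
# The 1.2x overhead for KV cache and activations is a rough rule of
# thumb, not a guarantee -- long contexts and big batches need more.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def estimate_vram_gb(params_billion: float, dtype: str = "fp16",
                     overhead: float = 1.2) -> float:
    weights_gb = params_billion * BYTES_PER_PARAM[dtype]
    return round(weights_gb * overhead, 1)

print(estimate_vram_gb(7))          # a 7B model in fp16
print(estimate_vram_gb(70, "int4"))  # a 70B model, 4-bit quantized
```

By this estimate a 7B fp16 model wants roughly 17 GB, which is why 24GB cards are the comfortable floor for that class.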
Benchmarking LLMs: A Deep Dive into Local Deployment and Performance Optimization
Community Contribution

I just love the idea of running an LLM locally. It has huge implications for data security and the ability to use AI on private datasets. Get your company’s DevOps teams some real GPU servers as soon as possible. Benchmarking LLM performance has been a blast, and I’ve
04 Jul 2024 5 min read
AMD MI300X vs. Nvidia H100 SXM: Performance Comparison on Mixtral 8x7B Inference

There’s no denying Nvidia's historical dominance when it comes to AI training and inference. Nearly all production AI workloads run on their graphics cards. However, there’s been some optimism recently around AMD, seeing as the MI300X, their intended competitor to Nvidia's H100, is strictly
01 Jul 2024 7 min read
Partnering with Defined AI to bridge the data wealth gap

RunPod is dedicated to democratizing access to AI development and bridging the data wealth gap. Alongside Defined.ai, the world’s largest ethical AI training data marketplace, RunPod launched a pilot program to give startups access to enterprise-grade datasets for training SOTA models. The Genesis of Collaboration To build SOTA
17 Jun 2024 3 min read
Run Larger LLMs on RunPod Serverless Than Ever Before - Llama-3 70B (and beyond!)
Language Models

Up until now, RunPod has only supported using a single GPU in Serverless, with the exception of using two 48GB cards (which honestly didn't help, given the overhead involved in multi-GPU setups for LLMs.) You were effectively limited to what you could fit in 80GB, so you would
06 Jun 2024 3 min read
Introduction to vLLM and PagedAttention

What is vLLM? vLLM is an open-source LLM inference and serving engine that utilizes a novel memory allocation algorithm called PagedAttention. It can run your models with up to 24x higher throughput than HuggingFace Transformers (HF) and up to 3.5x higher throughput than HuggingFace Text Generation Inference (TGI). How
31 May 2024 11 min read
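PagedAttention's central idea can be shown in a few lines: instead of reserving one contiguous KV-cache region per sequence, blocks are allocated on demand from a shared pool, so memory grows with actual tokens rather than the worst case. The block size of 16 mirrors vLLM's default, but everything else here is a simplified illustration, not vLLM's implementation:

```python
# Toy illustration of PagedAttention-style KV-cache paging: each
# sequence holds a block table mapping onto physical blocks that
# are handed out on demand from a shared pool.

class PagedKVCache:
    def __init__(self, block_size: int = 16):
        self.block_size = block_size
        self.tokens: dict[str, int] = {}        # tokens held per sequence
        self.tables: dict[str, list[int]] = {}  # block table per sequence
        self.next_block = 0                     # next free physical block

    def append(self, seq_id: str, n: int) -> list[int]:
        total = self.tokens.get(seq_id, 0) + n
        table = self.tables.setdefault(seq_id, [])
        # Grab fresh physical blocks only when the current ones are full.
        while len(table) * self.block_size < total:
            table.append(self.next_block)
            self.next_block += 1
        self.tokens[seq_id] = total
        return table

cache = PagedKVCache()
cache.append("seq-a", 20)  # 20 tokens -> 2 blocks of 16
cache.append("seq-b", 5)   # 5 tokens  -> 1 block
print(cache.tables)
```

Because unused slots exist only in the final block of each sequence, fragmentation stays bounded per sequence instead of per reservation, which is where the throughput gains come from.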
Announcing RunPod's New Serverless CPU Feature

We are thrilled to introduce the latest addition to the RunPod platform: Serverless CPU. This feature allows you to create high-performance VM containers with up to 3.75 GHz dedicated cores, DDR5 memory, and NVMe SSD storage. With Serverless CPU, you have the flexibility to choose between Compute-Optimized or General
28 May 2024 2 min read
Enable SSH Password Authentication on a RunPod Pod

When connecting to a RunPod Pod, a common issue is that SSH doesn't work out of the box. In this tutorial, we will examine a method of using a username and password to access a RunPod Pod through SSH. By the end of this guide, you'll
16 May 2024 2 min read
RunPod's $20MM Milestone: Fueling Our Vision, Empowering Our Team
Featured

Exciting news! RunPod has raised $20MM led by Intel Capital and Dell Technologies Capital. This boost will further our mission to revolutionize AI/ML cloud computing.
08 May 2024 4 min read
Page 4 of 9