RunPod Blog
How Much VRAM Does Your LLM Need? A Guide to GPU Memory Requirements
GPU Power

Discover how to determine the right VRAM for your Large Language Model (LLM). Learn about GPU memory requirements, model parameters, and tools to optimize your AI deployments.
08 Jul 2024 5 min read
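The sizing rule the guide walks through can be sketched as a quick back-of-the-envelope calculation. The helper below is illustrative, not code from the post; the 1.2 overhead multiplier for KV cache and activations is an assumption for the sketch:

```python
def estimate_vram_gb(params_billions: float,
                     bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate for LLM inference.

    params_billions: model size, e.g. 70 for Llama-3 70B
    bytes_per_param: 2.0 for FP16/BF16, 1.0 for 8-bit, 0.5 for 4-bit
    overhead: illustrative multiplier for KV cache and activations
    """
    return params_billions * bytes_per_param * overhead

# A 70B model in FP16 lands around 168 GB before quantization,
# while 4-bit quantization brings it down to roughly 42 GB.
print(estimate_vram_gb(70))                       # 168.0
print(estimate_vram_gb(70, bytes_per_param=0.5))  # 42.0
```

The same arithmetic explains why a 7B model in FP16 fits comfortably on a single 24 GB card while a 70B model needs multiple 80 GB GPUs or aggressive quantization.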
Benchmarking LLMs: A Deep Dive into Local Deployment and Performance Optimization
Community Contribution

I just love the idea of running an LLM locally. It has huge implications for data security and the ability to use AI on private datasets. Get your company’s DevOps teams some real GPU servers as soon as possible. Benchmarking LLM performance has been a blast, and I’ve
04 Jul 2024 5 min read
AMD MI300X vs. Nvidia H100 SXM: Performance Comparison on Mixtral 8x7B Inference

There’s no denying Nvidia's historical dominance when it comes to AI training and inference. Nearly all production AI workloads run on their graphics cards. However, there’s been some optimism recently around AMD, seeing as the MI300X, their intended competitor to Nvidia's H100, is strictly
01 Jul 2024 7 min read
Partnering with Defined AI to bridge the data wealth gap

RunPod is dedicated to democratizing access to AI development and bridging the data wealth gap. Alongside Defined.ai, the world’s largest ethical AI training data marketplace, RunPod launched a pilot program to give startups access to enterprise-grade datasets for training SOTA models. The Genesis of Collaboration To build SOTA
17 Jun 2024 3 min read
Run Larger LLMs on RunPod Serverless Than Ever Before - Llama-3 70B (and beyond!)
Language Models

Up until now, RunPod has only supported using a single GPU in Serverless, with the exception of using two 48GB cards (which honestly didn't help, given the overhead involved in multi-GPU setups for LLMs.) You were effectively limited to what you could fit in 80GB, so you would
06 Jun 2024 3 min read
Introduction to vLLM and PagedAttention

What is vLLM? vLLM is an open-source LLM inference and serving engine that utilizes a novel memory allocation algorithm called PagedAttention. It can run your models with up to 24x higher throughput than HuggingFace Transformers (HF) and up to 3.5x higher throughput than HuggingFace Text Generation Inference (TGI). How
31 May 2024 11 min read
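The full post digs into how PagedAttention achieves those throughput gains. As a toy illustration (not vLLM's actual implementation), the core idea is to allocate the KV cache in fixed-size blocks on demand, instead of reserving one contiguous, worst-case-sized region per sequence:

```python
BLOCK_SIZE = 16  # tokens per KV-cache block; illustrative choice

class ToyBlockAllocator:
    """Minimal sketch of paged KV-cache bookkeeping, for illustration only."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> tokens stored so far

    def append_token(self, seq_id: int) -> None:
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:  # current block is full (or this is the first token)
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1

    def release(self, seq_id: int) -> None:
        # A finished sequence returns its blocks to the pool immediately,
        # so other requests can reuse them.
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

alloc = ToyBlockAllocator(num_blocks=64)
for _ in range(20):
    alloc.append_token(seq_id=0)
# 20 tokens occupy ceil(20/16) = 2 blocks; internal waste is
# bounded by one partially filled block per sequence.
print(len(alloc.block_tables[0]))  # 2
```

Because memory is handed out block by block, many sequences of unpredictable length can share one GPU's cache without the large pre-reservations that limit batch size in naive serving.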
Announcing RunPod's New Serverless CPU Feature

We are thrilled to introduce the latest addition to the RunPod platform: Serverless CPU. This feature allows you to create high-performance VM containers with up to 3.75 GHz dedicated cores, DDR5 memory, and NVMe SSD storage. With Serverless CPU, you have the flexibility to choose between Compute-Optimized or General
28 May 2024 2 min read
Enable SSH Password Authentication on a RunPod Pod

When connecting to a RunPod Pod, a common issue is that SSH doesn't work out of the box. In this tutorial, we will examine a method of using a username and password to access a RunPod Pod through SSH. By the end of this guide, you'll
16 May 2024 2 min read
RunPod's $20MM Milestone: Fueling Our Vision, Empowering Our Team
Featured

Exciting news! RunPod has raised $20MM led by Intel Capital and Dell Technologies Capital. This boost will further our mission to revolutionize AI/ML cloud computing.
08 May 2024 4 min read
How Coframe used RunPod Serverless to Scale During their Viral Product Hunt Launch

Coframe uses RunPod Serverless to scale inference from 0 GPUs to hundreds in minutes. With RunPod, Coframe launched their generative UI tool on Product Hunt to thousands of users in a single day without having to worry about their infrastructure failing. In under a week, Coframe was able to deploy
07 May 2024 2 min read
How KRNL AI Scaled to 10,000+ Concurrent Users while Cutting Infrastructure Costs by 65% with RunPod Serverless

When Giacomo, Founder and CPO of KRNL, reached out to RunPod in May 2023, we weren’t actually sure if we could support his use case. They needed a provider that could cost-effectively scale up to handle hundreds of thousands of users, and scale back down to zero in minutes.
07 May 2024 3 min read
Refocusing on Core Strengths: The Shift from Managed AI APIs to Serverless Flexibility
AI Integration

RunPod is transitioning from Managed AI APIs to focusing on Serverless solutions, offering users more control and customization with comprehensive guidance.
29 Apr 2024 2 min read
Configurable Endpoints for Deploying Large Language Models

RunPod introduces Configurable Templates, a powerful feature that allows users to easily deploy and run any large language model. With this feature, users can provide the Hugging Face model name and customize various template parameters to create tailored endpoints for their specific needs. Why Use Configurable Templates? Configurable Templates offer
15 Apr 2024 2 min read
Orchestrating RunPod's Workloads Using dstack

Today, we're announcing the integration between RunPod and dstack, an open-source orchestration engine that aims to simplify the development, training, and deployment of AI models while leveraging the open-source ecosystem. What is dstack? While dstack shares a number of similarities with Kubernetes, it is more lightweight and focuses
12 Apr 2024 2 min read
Revolutionizing Real Estate: Virtual Staging AI's Success Story with RunPod

Virtual Staging AI, an innovative startup from the Harvard Innovation Lab, is transforming the real estate industry by leveraging cutting-edge AI technology and RunPod's powerful GPU infrastructure. Their state-of-the-art solution enables realtors to virtually stage properties in just 30 seconds at a fraction of the cost of traditional
10 Apr 2024 2 min read
Generate Images with Stable Diffusion on RunPod

💡 RunPod is hosting an AI art contest; find out more on our Discord in the #art-contest channel. In this tutorial, you will learn how to generate images using Stable Diffusion, a powerful text-to-image model, on the RunPod platform. By following the step-by-step instructions, you'll set up the prerequisites,
27 Mar 2024 3 min read
Announcing RunPod’s Integration with SkyPilot

RunPod is excited to announce its latest integration with SkyPilot, an open-source framework for running LLMs, AI, and batch jobs on any cloud. This collaboration is designed to significantly enhance the efficiency and cost-effectiveness of your development process, particularly for training, fine-tuning, and deploying models. What is SkyPilot? SkyPilot is
13 Mar 2024 3 min read
Elevating Veterinary Care: A Customer Success Story with ScribbleVet and RunPod
Customer Success

Discover how ScribbleVet transformed veterinary care with RunPod's AI technology, showcasing our commitment to empowering businesses and enhancing service quality.
12 Mar 2024 2 min read
Introducing the A40 GPUs: Revolutionize Machine Learning with Unmatched Efficiency

In the rapidly evolving world of artificial intelligence and machine learning, the need for powerful, cost-effective hardware has never been more critical. The launch of the A40 GPUs marks a significant milestone in this journey, offering unparalleled performance and affordability. These GPUs are designed to cater to the needs of
11 Mar 2024 3 min read
RunPod's Latest Innovation: Dockerless CLI for Streamlined AI Development
Dockerless CLI

Discover the future of AI development with RunPod's Dockerless CLI tool. Experience seamless deployment, enhanced performance, and intuitive design, revolutionizing how you bring AI projects from concept to reality.
02 Feb 2024 4 min read
Embracing New Beginnings: Welcoming Banana.dev Community to RunPod
Runpod Platform

RunPod extends a warm welcome to the Banana.dev community, offering a supportive transition to our platform. Honoring the path paved by Banana.dev, we commit to empowering developers with innovative serverless solutions.
02 Feb 2024 3 min read
Maximizing AI Efficiency on a Budget: The Unbeatable Value of NVIDIA A40 and A6000 GPUs for Fine-Tuning LLMs
Featured

Harnessing Power and Economy in AI Hardware In the dynamic world of AI, the balance between cutting-edge performance and cost-effectiveness is a crucial consideration for those fine-tuning large language models (LLMs). While the allure of NVIDIA's flagship H100 and A100 GPUs is undeniable, the focus of this exploration
01 Feb 2024 3 min read
RunPod's Infrastructure: Powering Real-Time Image Generation and Beyond
Cloud Computing

Discover how RunPod's infrastructure powers real-time AI image generation on our unique 404 page, using SDXL Turbo AI model. A blend of creativity and high-speed tech!
30 Jan 2024 3 min read
A Fresh Chapter in RunPod's Documentation Saga: Embracing Docusaurus for Enhanced User Experience
Documentation Overhaul

Discover RunPod's revamped documentation, now more intuitive and user-friendly. Our recent overhaul with Docusaurus offers a seamless, engaging experience, ensuring easy access to our comprehensive GPU computing resources. Explore at docs.runpod.io
28 Jan 2024 1 min read
New Navigational Changes To RunPod UI
Runpod Platform

Today, we are releasing a brand new look to the RunPod control panel, resulting in saved clicks and faster navigation through the platform. A few key changes will need some attention as you get acclimated. Here's a quick rundown of what has changed! GPU Cloud Secure and Community
17 Jan 2024 2 min read
Page 5 of 9
RunPod Blog © 2025