James Sandy - RunPod Blog

James Sandy

How to Fine-Tune LLMs with Axolotl on RunPod

Learn how to fine-tune large language models (LLMs) using Axolotl on RunPod. This step-by-step guide covers setup, configuration, and training with LoRA, 8-bit quantization, and DeepSpeed—all on scalable GPU infrastructure.

Cost-effective Computing with Autoscaling on RunPod

Runpod Platform

Learn how RunPod helps you autoscale AI workloads for both training and inference. Explore Pods vs. Serverless, cost-saving strategies, and real-world examples of dynamic resource management for efficient, high-performance compute.

Deploying Multimodal Models on RunPod

Multimodal AI models integrate several types of data, such as text, images, audio, or video, to enable tasks such as image-text retrieval, video question answering, and speech-to-text. Models such as CLIP, BLIP, and Flamingo show what is possible by combining these modalities, but deploying them presents unique challenges including

Serverless for Artificial Intelligence and Machine Learning Workloads

Serverless computing is reshaping AI/ML workloads because teams need to scale up, reduce operational overhead, and control costs. With traditional infrastructure, scaling often brings cost-management and hardware-maintenance burdens that become hard to sustain. RunPod dynamically allocates resources in these cases so that they fit modern AI workflows. This

How Much Can a GPU Cloud Save You, Really?

Machine learning, AI, and data science workloads rely on powerful GPUs to run effectively, so organizations must decide whether to invest in on-prem GPU clusters or use cloud-based GPU solutions like RunPod. This article covers infrastructure requirements and compares cost and performance to help you choose

Comparing Different Quantization Methods: Speed Versus Quality Tradeoffs

Quantization is a key technique in machine learning for reducing model size and speeding up inference, especially when deploying models on resource-constrained hardware. However, a good quantization setup must balance model performance against the computational efficiency that the deployment environment requires.