Moritz Wallawitsch - RunPod Blog

Introduction to vLLM and PagedAttention

What is vLLM?

vLLM is an open-source LLM inference and serving engine that uses a novel memory allocation algorithm called PagedAttention. It can run your models with up to 24x higher throughput than Hugging Face Transformers (HF) and up to 3.5x higher throughput than Hugging Face Text Generation Inference (TGI). How