Talking to AI, at Human Scale: How Scatterlab Powers 1,000+ RPS with RunPod

Learn how Scatterlab scaled to 1,000+ requests per second using RunPod to deliver real-time AI conversations at nearly half the cost of hyperscalers.

When you fall into a deep conversation with an AI character, the last thing on your mind is GPU architecture. But for the team at Scatterlab, keeping those conversations alive—fluid, responsive, and emotionally resonant—takes hundreds of GPUs running in perfect sync.

Their flagship platform, Zeta, is a place where people can become the main character in a story and talk to AI characters as if they're real. Available in Korea and Japan, Zeta doesn't just simulate conversation; it sparks connection. And the numbers back that up: within its first year, Zeta grew to 2.1 million cumulative active users, with users spending over 2 hours per day on average chatting with AI personas. That level of engagement isn't just impressive, it's rare: average daily time spent surpasses even platforms like TikTok and YouTube.

But building intimacy at scale comes with its own set of challenges. And for Scatterlab, the biggest one wasn’t user growth. It was infrastructure.


The Problem: A Ceiling at the Peak of Growth

Behind every AI conversation on Zeta is a large language model processing inputs in real time. That means a massive LLM serving infrastructure, capable of handling thousands of simultaneous interactions with low latency and high reliability.

To make that happen, Scatterlab needed hundreds of GPUs running live—every day, around the clock. But when they turned to the major cloud providers to expand, they hit a wall.

“Even with access to AWS, GCP, and Azure, we couldn’t get the GPUs we needed,” one team member shared. “It felt like hitting a ceiling just as we were trying to soar.”

It wasn’t just the scarcity of GPUs that posed a problem. The cost of available instances was often prohibitively high, putting the economics of their rapidly growing business at risk. At the exact moment they needed to scale their infrastructure to meet overwhelming user demand, they were stuck navigating quota limitations and hardware shortages.

And the clock was ticking.


The Solution: Scaling Without Friction

Scatterlab responded with a thoughtful shift in strategy: rearchitecting their system to support a multi-cloud GPU deployment model. Instead of depending on one provider, they would draw resources from multiple sources—and orchestrate it all themselves.

A critical component of that shift was RunPod.

“By leveraging RunPod’s APIs, we were able to dynamically scale the number of GPU cloud servers according to the live service load,” the team explained. “It allowed us to serve our large-scale infrastructure at nearly half the cost compared to major cloud providers.”

This wasn’t a backup plan—it was an upgrade. With RunPod’s stable and affordable GPU resources, Scatterlab gained the ability to allocate exactly the right number of GPUs at exactly the right time, adapting their capacity in real time as user demand fluctuated.

Integration was seamless, thanks to RunPod’s developer-friendly APIs. Autoscaling was built into their workflow. What had once been a bottleneck became a flexible, reliable foundation for real-time inference at scale.
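
Scatterlab hasn't published its orchestration code, but the pattern the team describes, reconciling a fleet of GPU pods against a target size through RunPod's API, can be sketched in a few lines with RunPod's Python SDK. The pod naming scheme, container image, and GPU type below are illustrative assumptions, not details from their actual setup:

```python
import runpod  # RunPod's official Python SDK: pip install runpod

runpod.api_key = "YOUR_RUNPOD_API_KEY"

POD_PREFIX = "zeta-llm"  # hypothetical naming scheme for inference pods

def scale_fleet(target: int) -> None:
    """Reconcile the number of running GPU pods to `target`."""
    pods = [p for p in runpod.get_pods() if p["name"].startswith(POD_PREFIX)]

    # Scale up: launch new pods until the fleet matches the target.
    for i in range(len(pods), target):
        runpod.create_pod(
            name=f"{POD_PREFIX}-{i}",
            image_name="your-org/llm-server:latest",  # hypothetical serving image
            gpu_type_id="NVIDIA A100 80GB PCIe",
        )

    # Scale down: terminate surplus pods once demand falls.
    for pod in pods[target:]:
        runpod.terminate_pod(pod["id"])
```

Run on a short interval against a live load metric, a reconcile loop like this is the core of the dynamic scaling the team describes; a production version would add connection draining, health checks, and rate limiting.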


The Results: Conversations That Keep Up With Demand

The impact was immediate. With RunPod in the mix, Scatterlab’s LLM infrastructure can now handle over 1,000 requests per second—serving users at scale without compromising on speed or quality.
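
As a back-of-the-envelope check (the per-GPU throughput here is an assumed figure, since Scatterlab hasn't published one), a 1,000+ RPS target maps onto fleet size like this:

```python
import math

TARGET_RPS = 1_000   # peak load cited in this article
RPS_PER_GPU = 5      # assumed sustained throughput per GPU; varies with model,
                     # batch size, and context length
HEADROOM = 1.2       # 20% buffer for traffic spikes

gpus_needed = math.ceil(TARGET_RPS * HEADROOM / RPS_PER_GPU)
print(gpus_needed)   # 240, consistent with the "hundreds of GPUs" cited above
```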

And the cost savings are real: compared to hyperscalers, RunPod cut infrastructure spend by nearly 50%, giving the team both the breathing room and the confidence to keep growing.

“RunPod gave us a way forward when everything else felt stuck,” one team member reflected. “It’s not just a vendor—it’s part of how we operate.”

Thanks to dynamic autoscaling, Scatterlab doesn’t have to over-provision or guess. Their GPU fleet adjusts live, expanding and contracting with user demand. That means better performance, lower latency, and a more reliable experience for every user—even during peak usage windows.


Looking Ahead: Infrastructure That Enables Emotion

Scatterlab isn’t just scaling conversations—they’re reimagining what digital interaction can feel like. As Zeta evolves, the stakes get higher: more users, more nuanced conversations, and new markets to reach.

With RunPod, they’ve built the infrastructure that lets them dream bigger.

“Without RunPod, sustaining our business would have been extremely difficult. We truly appreciate it.”

It’s a partnership built not just on compute power, but on shared values—agility, accessibility, and the belief that great infrastructure should empower people, not constrain them.

And while most users will never see the servers behind their conversations, they’ll feel the difference in every seamless, responsive moment with their favorite AI character.