Careers @ RunPod

About Us

Our mission is to deliver the best user experience in core GPU computing. Whether you're an experienced ML developer training a large language model, or an enthusiast tinkering with text-to-image models, we strive to make accessing GPU resources as seamless and affordable as possible.

Our founding team has decades of cloud architecture and machine learning experience - we're deeply familiar with the pain points developers face when training, benchmarking, and scaling AI models in production.

The Roles

All roles are remote. Most of our team is based in New Jersey and SF.

Everyone on our team is technical, whether they're in sales, operations, design, or product. Our customers are developers, and everything we do is downstream from their needs. So a strong grasp of existing GPU cloud workflows, and the ability to deeply resonate with our customers' pain points, is a strong plus. Being technical is preferred, but not a hard requirement.

Everyone on our team wears multiple hats. Our sales team helps manage infrastructure, our growth team ships new features based on customer feedback, our engineering team helps onboard customers, our ML team runs benchmarks and creates custom deployments for enterprise customers, etc. At this stage, we need to move quickly. That means you may have to take on a couple of functions that aren't in your job description as they come up.

We would like you to grow into a leadership position as the team scales. We currently have 9 full-time people on the team, and we anticipate scaling to ~30 in the next 6-9 months. We're looking for people with a bias towards leadership, who can hire and manage talented builders as their function within RunPod scales to dozens or hundreds of employees.

Below is a list of open positions at RunPod. After applying, we'll provide a more thorough outline of roles and responsibilities.

ML Engineer

ML Engineer with a solid research background and a track record of implementing AI best practices.

  • You can develop and deploy holistic solutions, enabling users to train new models, fine-tune existing ones, or run efficient inference.
  • You can keep up with the latest in AI research, ensuring our tech stack remains up to par with best practices.
  • You can engage proactively with RunPod customers, grasping their challenges and consistently delivering valuable solutions.
  • You are proficient in identifying and circumventing common technical pitfalls.

FullStack Engineer

Software engineer who can deliver end-to-end product functionality, from the backend (NodeJS) to the frontend (ReactJS / NextJS).

  • You have developed end-to-end features using NodeJS & ReactJS.
  • You can communicate and understand complex web architectures.
  • You have insight into the current AI landscape.

Support Engineer

Generalist engineer who can communicate with clients, knows how to debug common issues, and has general knowledge of the web.

  • You have excellent communication skills.
  • You solve problems by any means necessary and can engineer complex solutions.
  • You can dig through logs, data sets, and other sources to find the root cause of issues.

Systems Engineer

Ability to solve complex problems to help accelerate AI adoption.

  • You can develop and optimize complex systems using Golang or Rust.
  • You have a proven track record of accomplishments.
  • You can work independently, and with a team when needed.
  • You strive for perfection but understand MVP delivery.
  • You can reduce container cold-starts for AI workloads.
  • You can optimize network storage to increase throughput and store LLMs at scale.
  • You can optimize container runtime for specific workloads to get the best performance.

UX Engineer

UX Engineer who specializes in pixel-perfect user interfaces.

  • You are a designer first, engineer second.
  • You can create pixel-perfect UI.
  • You can design and develop in ReactJS.
  • You have a strong understanding of workflows used for deploying and scaling AI models.

Customer Support Technician

You make sure our customers are happy and solve their problems. You have a level of technical proficiency that allows you to diagnose customer issues and report them to our engineering team. You are patient, can juggle dozens of communication channels at once, and have a passion for creating an impeccable user experience. The ideal candidate will be able to:

  • Provide exceptional client service through technical written support, addressing inquiries and ensuring client satisfaction.
  • Solve client issues using your AI, GPU, and ML knowledge.
  • Adapt to changing environments, prioritize support inquiries, and foster positive relationships.
  • Be part of RunPod by exchanging knowledge daily with your coworkers and engaging with your new favorite Discord community.

The RunPod Engineering Stack

We've built RunPod atop dozens of frameworks (many of which we've written ourselves), but here are the primary stacks you'll be using:

  • AI: Python
  • RunPod website and user console: NodeJS, NextJS, GraphQL
  • The Cloud: Golang

The RunPod Team

Work is an incredible place when you’re working with a team of people who are relentless about the mission and energized to help each other grow in every way. Through the highs and lows, we have shared many moments of laughter, tears, jokes, and joy.

Things that make our culture what it is:

  • We all have experience building products and hacking on GPUs. Many of our founding team members ran data centers and joined RunPod after integrating their hardware onto the platform.
  • We live and breathe Discord. We hate needless bureaucracy and make many of our most important decisions over Discord voice. Feel free to join our Discord and say hi!
  • We love new product ideas. Regardless of your official role, if you have a great idea and want to see it implemented, you can always make a PR. 9/10 times, we will be all for it.
  • We are a team of intrinsically curious and ambitious people. We ask a lot of questions, move quickly, and pivot on the fly. We value a bias towards action very highly and take pride in our work.
  • Our last offsite was in New Jersey. We raced go-karts and went to Dave and Buster's.

Our Value Prop to You

  • Compensation package with sign-on bonus, company equity, and benefits. We know how rare mission-driven talent is, and we strive to reflect this through ownership and pay.
  • Environment for growth and learning. You will have the opportunity to drive great impact and gain exposure to all functions of the company. Here, you can flex multiple realms of your skillset, strategic mindset, and creativity.
  • Accelerate innovation in GPU cloud infrastructure. We are leading the change in GPU cloud infrastructure against Big Cloud and outdated systems. You’ll be able to operate in a fast-paced environment and iterate quickly.
  • An energizing, ambitious team. Our team cares deeply about each other. We strive to elevate and uplift each other in our day-to-day work to do the best for one another. We don't believe in bureaucratic nonsense.
  • Supporting your wellbeing. We provide benefits to allow you to do your best work:
  1. Remote and in-person hybrid work options. We’re based in NJ and SF.
  2. Stipend to upgrade your work-from-home setup.
  3. Unlimited paid time off (PTO).
  4. Paid company off-sites, meetups, and team bonding events. You’ll get to see everyone outside of their Zoom box.

RunPod's Founding Story

Our founding team comes from Comcast, where we led the cloud architecture division and cut costs by $100M per year.

We founded RunPod in March 2022 with two core insights: 1) AI infrastructure requirements are compounding every year, and will continue to grow exponentially over the next decade. 2) There aren't any AI-native cloud service providers built specifically to accelerate training, benchmarking, and inference workflows.

Existing providers like Big Cloud (AWS, GCP, Azure) have made it incredibly costly for developers, startups, and enthusiasts to access GPU resources. We knew we wouldn't get far as a fancy wrapper on top of Big Cloud, so we built our own infrastructure from the ground up.

In the early days, we didn't have the capital required to purchase thousands of GPUs, so we turned to the AI community for support. Hundreds of GPU owners across the world deeply resonated with our mission and listed their GPUs on Community Cloud - RunPod's first on-demand cloud platform.

As we scaled our capacity to the thousands, we saw more and more users reach out about needing larger clusters, higher reliability, and extremely fast networking speeds to train foundational models and deploy them in production. So we introduced Secure Cloud to the platform - GPUs we source and manage in some of the most reliable data centers across the world. With Secure Cloud, developers can access clusters of up to 1,000 GPUs with incredibly high data transfer speeds, RAID 2 redundancy, localized network volumes, and best-in-class security, all at a 50%+ lower rate than Big Cloud.

Since then, we've built Serverless - autoscaling architecture that abstracts away all of the DevOps expertise required to scale inference infrastructure up and down. We also launched Flashboot, our cache architecture that enables <250ms P70 cold-start times on hundreds of models.
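For a flavor of what building on Serverless looks like from a developer's seat, here's a minimal sketch of a worker. The handler shape and the `runpod.serverless.start` registration are assumptions based on the public `runpod` Python SDK, and the handler body is a placeholder, not real inference code:

```python
# Sketch of a RunPod Serverless worker (illustrative, not an official example).
# In a real worker, the handler would load a model once at startup and run
# inference per request; here we just echo the input to show the job contract.

def handler(event):
    """Receives a job payload and returns the result as a JSON-serializable dict."""
    prompt = event["input"].get("prompt", "")
    # Placeholder "inference": a real worker would call the model here.
    return {"output": prompt.upper()}

# In production this would be registered with the SDK (assumed API):
#   import runpod
#   runpod.serverless.start({"handler": handler})

if __name__ == "__main__":
    # Local smoke test with a fake event payload.
    print(handler({"input": {"prompt": "hello runpod"}}))
```

The platform handles queuing, autoscaling, and cold starts around the handler, so the developer's surface area stays this small.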

We have lots of cool stuff on our product roadmap, and we're excited to bring on engineers who can help shape RunPod into the world's best platform for building and scaling AI.