About LiveKit
LiveKit is building the infrastructure layer for the agentic era of computing. Our platform gives developers everything they need to build, test, deploy, scale, and observe AI agents in production. Founded in 2021, LiveKit powers voice and agentic AI applications for OpenAI, Salesforce, Spotify, Meta, and tens of thousands of other developers, collectively facilitating billions of calls each year.
About This Role
We're hiring a Senior Infrastructure Engineer to join the small team that owns the cloud foundation all of LiveKit's real-time products run on — reliability, networking, and shared platform primitives like CockroachDB, NATS, and Nebula.
LiveKit is at 3B+ calls per year, and the reliability surface area is growing faster than team capacity. The team's work falls into three buckets: (1) Product SRE: jumping directly into the product codebase and implementing reliability goals within it, including load balancing, load shedding, instrumentation, scalability, and efficiency. (2) Self-service platform tooling: building frameworks and libraries so product teams can self-serve reliability without Infra as a bottleneck. (3) Reactive work: on-call, urgent feature requests, and reliability debt. The goal is to shrink bucket 3 by investing in buckets 1 and 2.
There is still meaningful reliability debt to work through when you join. We're being honest about that up front. The upside: small team, high ownership, and real influence over how a global real-time platform scales.
You'll Thrive Here If You:
obsess over crafting code that is fast, reliable, and practical for the problem
are known as the go-to person for tackling tough technical problems
work hard and can build and ship fast
can clearly explain complex technical concepts to other engineers
are a fast learner, frequently picking up new languages and tools
The best way to impress us is with thoughtful Issues and/or PRs on our GitHub repos.
What You'll Do
Ramp up on LiveKit's global architecture (CockroachDB, NATS, Nebula, Kubernetes) and map where reliability debt lives
Ship product SRE work directly in the product codebase: load balancing, load shedding, instrumentation, scalability, efficiency
Build and extend common tooling so product teams can self-serve reliability without Infra as a bottleneck
Participate in the on-call rotation and help resolve recurring reliability patterns
Bring informed systems opinions that improve how the team makes architectural decisions
Who You Are
Strong Go fluency — you write production Go, not just read it
Kubernetes at depth (internals + networking), not just ops
Comfortable working directly in product codebases, not only around them
Debug by looking behind the curtain, not just at dashboards
Track record of building or operating production distributed systems at scale
Nice to Have
Experience with CockroachDB, NATS, or similar distributed systems
Google SRE or equivalent high-scale background
Nebula, WireGuard, or similar overlay networking experience
Open source contributions, especially on infra or developer tools
Experience in real-time, audio/video, or low-latency systems
Our Commitment to You
An opportunity to build something with real impact on the world
Contribute to open source alongside world-class engineers
Competitive salary and equity package
Health, dental, and vision benefits
Flexible vacation policy
LiveKit is an equal opportunity employer and does not discriminate on the basis of any characteristic protected by applicable law. If you require a reasonable accommodation during the application or interview process, please contact recruiting@livekit.io.
