Why this role
As our Director of Site Reliability Engineering, reporting to our VP of Platform Engineering, you'll own the core infrastructure layers that everything at Doctolib runs on: cloud infrastructure, database operations, network infrastructure, and observability. You will also lead the Doctolib Operations Center (DOC) and drive a decisive shift from reactive operations to a proactive, world-class reliability culture.
This is a rare opportunity to shape the infrastructure backbone of Europe's leading healthtech company, at a moment when Doctolib is actively expanding multi-cloud capabilities, scaling to new countries, and building the reliability culture that will define the next decade of healthcare innovation.
Why this is an extraordinary challenge
- Real stakes, every day. When Doctolib is down, consultations don't happen, diagnoses are delayed, and care journeys are interrupted. The infrastructure you build is a direct lever on patient outcomes, in a world where 8 of the top 10 causes of death in Europe are preventable.
- A once-in-a-generation platform transition. Multi-cloud, monolith modularisation, and international expansion are all happening simultaneously. You won't inherit a finished platform. You'll define what it becomes.
- Reliability as the competitive moat. As we scale AI health companions, automate clinical workflows, and launch across Europe, the speed and resilience of the platform directly determine how fast 700+ engineers can ship innovations that change healthcare.
- A cultural build, not just a technical one. The incident response culture, observability standards, and operational ownership model you establish here will shape how Doctolib engineers work for years to come.
What you'll do
- Build and run a world-class SRE org of 25+ engineers across Cloud Infrastructure, Database & Storage, Network Infrastructure, Observability Tooling, and the Doctolib Operations Center
- Own the infrastructure strategy and roadmap — cloud, database, network, observability — and deliver against company OKRs
- Lead the Doctolib Operations Center: set incident response standards, drive MTTR reduction, embed blameless post-mortem culture across engineering
- Architect and execute our multi-cloud strategy — reducing vendor lock-in, cutting migration costs, and enabling international expansion
- Own network infrastructure at scale: load balancing, CDN/WAF, VPCs, peering, zero-trust networking across a high-traffic, multi-country platform
- Drive observability as a product — give 700+ engineers true visibility into system health and turn observability maturity into an operational excellence lever
- Lead from the front as a senior technical voice in the Platform org and broader Tech leadership team
Who you are
- 12+ years in software engineering, including 5+ years leading managers and running infrastructure or SRE organisations at scale
- Track record of taking SRE practices from reactive to proactive — with measurable reductions in incidents and MTTR
- Strong multi-cloud and network infrastructure experience: load balancing, CDN/WAF, VPCs, peering, at high-traffic scale
- Deep database operations background: large-scale transactional systems (PostgreSQL, Aurora), streaming/CDC (Kafka), data layer FinOps
- Experience building observability platforms that give teams genuine visibility — metrics, logs, traces, alerting
- Sharp process thinking: SLOs, error budgets, incident management, blameless post-mortems
- Outcome-driven: you track reliability, cost efficiency, and engineering velocity as business metrics, not just technical ones
- Strong communicator and influencer at the executive level, equally credible with senior engineers and business stakeholders
- Builder of high-performing, people-first engineering cultures
- Fluent in English; comfortable in fast-paced, international environments
- You recognise yourself in our playbook values
Bonus points if you have…
- Experience in healthcare, regulated, or high-compliance industries (HDS, ISO 27001, SOC2, GDPR, data sovereignty)
- Familiarity with our stack: Ruby on Rails, Node.js, Go, Python, React, AWS, GCP, Kubernetes, PostgreSQL, Datadog, GitHub Actions
- French language proficiency
- Experience with AI-augmented infrastructure tooling or ML platform operations
- M&A or post-acquisition infrastructure integration experience
What we offer
- Free comprehensive health insurance for you and your children
- Parent Care Program: receive additional leave on top of the legal parental leave
- Free mental health and coaching services through our partner Moka.care
- For caregivers and workers with disabilities, a package including an adaptation of the remote policy, extra days off for medical reasons, and psychological support
- Work from abroad for up to 10 days per year thanks to our flexibility days policy
- Works council subsidy to refund part of a sport club membership or creative class
- Up to 14 days of RTT (additional paid days off)
- Lunch vouchers via a Swile card
