Lead Site Reliability Engineer
13 hours ago
Own Reliability. Shape the Platform. Empower Millions.
At Holland & Barrett, we're transforming into a truly product- and platform-led technology organisation, and we're looking for a Lead Site Reliability Engineer who's excited by scale, complexity, and impact.
Our mission? Build and evolve the resilient, high-performance systems that power health and wellness for millions of customers. If you're obsessed with reliability, driven by automation, and thrive in high-ownership engineering cultures, this is your opportunity to lead from the front.
What You'll Lead & Deliver
Reliability & Performance at Scale
- Architect and improve cloud-native systems with reliability as a first-class principle.
- Shape SLIs/SLOs, error budgets, capacity planning, and performance strategies.
- Continuously improve availability, efficiency, and resilience across our platforms.
Technical Leadership That Raises the Bar
- Mentor SREs, platform engineers, and developers across the organisation.
- Champion automation, observability, DevSecOps, and modern operational practices.
- Influence engineering culture and architectural direction.
Operational Excellence
- Own and lead high-severity incident response with calm, clarity, and technical depth.
- Run world-class post-incident reviews and drive meaningful, measurable improvements.
- Strengthen monitoring, alerting, on-call practices, and reliability processes.
- Support resilience validation through load testing, stress testing, and chaos engineering.
Automation, Tooling & Engineering Efficiency
- Build tools and automation that remove toil and accelerate teams.
- Develop CI/CD pipelines and Infrastructure-as-Code environments.
- Drive consistency, repeatability, and self-service across engineering.
Cross-Team Collaboration
- Partner with Security, Platform, and Engineering teams to align reliability with security and resilience goals.
- Lead teams toward better design, operational readiness, and measurable service health.
- Contribute to documentation, runbooks, and operational processes that scale.
Key requirements:
- 5–8+ years in SRE, Platform, Cloud Infrastructure, or operational engineering roles.
- Hands-on experience architecting and improving large-scale, distributed systems.
- Strong coding proficiency in Python, Go, Bash, or similar automation-focused languages.
- Expertise with observability stacks: Datadog, Prometheus, Grafana, OpenTelemetry.
- Deep AWS experience across EC2, EKS, Lambda, VPC, DynamoDB, S3, CloudFront, RDS, IAM, KMS, and more.
- Proficiency with Terraform, CloudFormation, or AWS CDK.
- Incident response leadership and root-cause analysis expertise.
- Excellent documentation and communication skills.
- Strong analytical and troubleshooting abilities.
Bonus
- Experience mentoring or leading engineers within SRE or platform teams.
- Experience with load testing, stress testing, and chaos engineering.
- A passion for uplifting engineering culture through tooling, automation, and reliability-first thinking.
Why Build the Future with Holland & Barrett?
Technology is at the heart of our mission to make health & wellness accessible to everyone. As a Lead SRE, you won't just keep systems running; you'll design the reliability, resilience, and operational maturity that accelerate our entire business.
We offer:
- A modern engineering culture built on autonomy, experimentation, and learning.
- The chance to create real impact across critical customer and internal platforms.
- A collaborative team that values innovation, continuous improvement, and technical excellence.
If you're ready to lead reliability for platforms with massive real-world impact, we'd love to meet you.
Apply now and help shape the future of H&B Technology.
-
Site Reliability Engineer
3 days ago
London Area, United Kingdom Amelco Limited Full time
Role: Site Reliability Engineer
Type: Full-time permanent role
Location: Hybrid / Shoreditch, London 3 days per week
About Us
Amelco Ltd are a leading gaming and gambling solution software provider with a strong presence in the USA, UK, and Europe. Through partnerships with global gaming companies, we build cutting-edge technical platforms across sportsbooks,...
-
Site Reliability Engineer
5 days ago
London Area, United Kingdom Autonomai Recruitment Full time £120,000 - £200,000 per year
Role: Senior SRE
Skills: Deep Linux, Scripting - Python, DevOps, Kubernetes
Salary: £500k Plus
Location: London
The ideal candidate comes from a top-tier tech environment (FAANG, elite trading, hyperscale infra). They have experience building technology 0→1, owning systems end-to-end, and working close to the metal. They will operate across everything...
-
Site Reliability Engineer Team Lead
3 days ago
London Area, United Kingdom Cornwallis Elt Full time
Site Reliability Engineer Team Lead – Leadership, Cloud, SLI/SLO, Infrastructure, Risk, Incident Management, Monitoring, Automation – Financial Services – Up to £110,000 Base + Bonus
My client, a leading Private and Commercial Bank, is seeking an experienced SRE Lead to join their London-based team on a permanent basis.
In this role, you will define and...
-
Lead Cloud Site Reliability Engineer
5 days ago
London Area, United Kingdom LSA Recruit Full time £60,000 - £120,000 per year
Job opportunity for Lead Cloud Site Reliability Engineer (SRE) based in London, UK - Contract (SC Cleared)
Job Description:
We're looking for a Lead Cloud Site Reliability Engineer (SRE) with strong expertise in Azure, Kubernetes, Terraform, and GitHub to lead large-scale projects and mentor a growing team.
Key Responsibilities
Lead SRE activities...
-
Site Reliability Engineer
5 days ago
London Area, United Kingdom Xpertise Recruitment Full time £50,000 - £150,000 per year
Site Reliability Engineer (SRE) – AWS
Location: London
Salary: £100,000 per annum + Bonus + Excellent Benefits
I am looking for an SRE for a large-scale digital organisation in the middle of a major engineering modernisation journey. This is not a BAU support role; it is a chance to help define what "good" looks like as SRE is brought fully in-house for the...
-
Site Reliability Engineer
3 days ago
London Area, United Kingdom Eutelsat Full time
Connect with Eutelsat
Be part of a new era in communications, transforming connectivity with Eutelsat – the world's first GEO-LEO integrated global satellite operator.
As a leader in satellite communications, we provide global connectivity solutions, connecting businesses, communities, and governments around the world. Whether on land, at sea, or in the...
-
Lead Cloud Site Reliability Engineer
5 days ago
London Area, United Kingdom Response Informatics Full time £60,000 - £120,000 per year
Job Description:
We're looking for a Lead Cloud Site Reliability Engineer (SRE) with strong expertise in Azure, Kubernetes, Terraform, and GitHub to lead large-scale projects and mentor a growing team.
Key Responsibilities
Lead SRE activities for large-scale cloud projects, providing technical guidance to engineers.
Deliver solutions across VMs and Kubernetes,...
-
Site Reliability Engineer
3 days ago
London Area, United Kingdom Vertus Partners Full time
My client, specialising in systematic trading, is looking to hire a Site Reliability Engineer to help form a team that will be responsible for ensuring best-in-class reliability and performance for their low-latency market-making systems.
This is a unique opportunity to shape the SRE function within the organisation and work on several initiatives around...
-
Lead Site Reliability Engineer
5 days ago
London Area, United Kingdom GetGround Full time £65,000 - £130,000 per year
London, Waterloo (Hybrid, 4 days in-office - Wednesday is our set work from home day, though you can come in on Wednesday too if you wish)
We are disrupting one of the world's largest asset classes, property. With £2Bn+ assets on our platform and 30,000+ users across 70 countries, we're building the future of asset ownership and in doing so, are able to...
-
Site Reliability Engineering Manager
1 day ago
London Area, United Kingdom Corecom Consulting Full time
SRE Manager
London or Leeds (Hybrid) – up to £95,000
MUST BE UK BASED
We're looking for an SRE-focused Platform Manager to take full ownership of a mission-critical, cloud-native platform reshaping how the UK housing market operates. In this role, you'll lead the UK Site Reliability function, drive incident and operational excellence, and build the...