Site Reliability Engineer
5 days ago
Permanent
Clapham Junction / Hybrid
This is a hybrid role with time required at our HQ in East Croydon. In March 2025, however, we will be moving to our new HQ in Clapham Junction.
As a Site Reliability Engineer at The Gym Group (TGG), you’ll ensure fast, reliable, and delightful experiences for every user by maintaining highly available, performant, and observable cloud infrastructure. You will collaborate across Development, DevOps, InfoSec, QA, and SRE teams to continuously improve system reliability, deployment strategies, and alerting infrastructure.
Key duties & responsibilities:
- Maintain and enhance monitoring, logging, and alerting systems to proactively detect and resolve potential issues across our digital channels.
- Collaborate with Development, Platform/DevOps, InfoSec, QA, and SRE teams, as well as with Technical Architects and the Digital Ops Manager, to ensure reliability and observability of infrastructure and applications.
- Optimise deployment strategies and streamline recovery processes to support high availability and performance in a cloud environment.
- Build resilient, observable systems using a modern stack that includes Terraform, Kubernetes, GitHub, Azure DevOps, Service Bus, Cosmos DB, Redis, and Cloudflare.
- Support the transition to a microservices-based architecture that leverages Microsoft Azure and Azure APM, while welcoming knowledge of other cloud providers and toolchains.
- Lead continuous improvement initiatives for deployment practices, monitoring, and alerting to ensure seamless user experiences.
- Contribute to incident response strategies, including detection, communication, and swift recovery processes, without on-call obligations outside of office hours.
Essential Skills:
- Can articulate core SRE principles (e.g. Golden Signals, SLIs and SLOs, SRE metrics, release engineering, blameless retrospectives, process capability) and apply them in practice
- Excellent log analysis and incident triage skills
- Performance monitoring
- Dashboard creation and alerting rules management
- Shell scripting and coding (e.g. Bash, PowerShell, Python)
- Understanding of Root Cause Analysis, Fault Tree Analysis, FMEA and/or similar safety engineering and reliability engineering methods
- Expertise with DevSecOps tools and methodologies and with Infrastructure-as-Code
- Deep experience with a major public cloud platform (e.g. Azure, AWS, GCP)
- Containerisation (Docker, Helm, etc.)
- Awareness of network security and networking protocols
- Strong general computing knowledge (e.g. hardware performance metrics, software fault modes, vulnerability patching and hardening)
Desirable Skills:
- Microsoft Azure (e.g. VNets, Storage Containers, Application Gateway, APIM, App Service)
- Kubernetes
- Azure DevOps (YAML Pipelines)
- Azure Monitor, Azure Application Insights
- Terraform
- FinOps and cloud infrastructure optimisation
- Cloudflare
- Azure Active Directory / Entra ID
- Deploying and supporting Node.js stack applications
- Deploying and supporting .NET stack applications
- GitOps / Policy-As-Code
- Design patterns for distributed systems (e.g. event-driven, microservices)
Benefits:
- 25 days' holiday plus bank holidays
- Pension: 5% employee contribution and either 3% or 4% employer contribution, depending on the scheme chosen (auto-enrolment or staff pension)
- Share purchase plan eligibility
- Save as you earn
- Cycle to work
- Gym membership from day 1, spouse/friend membership after 6 months
- Single private medical insurance (after 6 months)
- Up to 10% performance-related bonus
- Life Assurance 3x annual salary