Current jobs related to Head of Site Reliability Engineering - London, Greater London - loveholidays
-
Head of Site Reliability Engineering
7 days ago
London, Greater London, United Kingdom | Rewardgateway | Full time
Job Title: Head of Site Reliability Engineering
Job Summary: We are seeking a highly experienced Head of Site Reliability Engineering to join our team at Rewardgateway. As a key member of our infrastructure team, you will be responsible for establishing and managing our new SRE function, operating and modernising our existing cloud infrastructure, and...
-
Head of Site Reliability Engineering
1 week ago
London, Greater London, United Kingdom | Rewardgateway | Full time
Job Title: Head of Site Reliability Engineering
At Reward Gateway, we're seeking a highly skilled and experienced Head of Site Reliability Engineering to join our team. As a key member of our engineering organization, you will be responsible for establishing and managing our new SRE function, operating and modernizing our existing cloud infrastructure, and...
-
Digital Site Reliability Engineer
2 weeks ago
London, Greater London, United Kingdom | American Institute of CPAs | Full time
About the Role: We are seeking a highly skilled Site Reliability Engineering Lead to join our team at the American Institute of CPAs. As a key member of our digital team, you will be responsible for leading the Site Reliability Engineering team and ensuring the performance and scalability of our systems and services.
Key Responsibilities: Promote automation to...
-
Site Reliability Engineer
4 weeks ago
London, Greater London, United Kingdom | Insight Global | Full time
Site Reliability Engineer Opportunity
Insight Global is seeking a skilled Site Reliability Engineer to join their team in West London. As a key member of the streaming engineering group, you will be responsible for ensuring the smooth operation of linear streaming channels.
The ideal candidate will have previous experience in Site Reliability Engineering, with...
-
Site Reliability Engineer
1 month ago
London, Greater London, United Kingdom | LinuxRecruit | Full time
Unlock Your Potential as a Site Reliability Engineer
We are seeking a seasoned Site Reliability Engineer to join our team and contribute to the development and maintenance of our cutting-edge platform.
As a Site Reliability Engineer, you will be responsible for designing, developing, and maintaining systems and applications using Golang. You will work closely...
-
Site Reliability Engineer
3 weeks ago
London, Greater London, United Kingdom | Lorien | Full time
Site Reliability Engineer / DevOps Engineer
We are seeking a skilled Site Reliability Engineer / DevOps Engineer to join our team at Lorien, a leading consultancy. The ideal candidate will have experience with Salesforce automation and a strong background in SRE / Site Reliability Engineering.
Key Responsibilities: Design and implement scalable and efficient...
-
Site Reliability Engineering Manager
2 weeks ago
London, Greater London, United Kingdom | College of Charleston | Full time
Transformative SRE Leadership Opportunity
Are you a seasoned leader with a passion for strategy, leadership, and engineering excellence? Do you want to make a meaningful impact at a global financial institution? We're seeking a talented Site Reliability Engineering Manager to join our Operations and Technology Chief Information Office Business area.
About the...
-
Site Reliability Engineer
1 month ago
London, Greater London, United Kingdom | Fourier | Full time
Key Responsibilities
We are seeking a highly skilled Site Reliability Engineer to join our team at Fourier. As a member of our Site Reliability Engineering team, you will be responsible for developing tools to enhance and monitor our production systems.
Required Skills
- Excellent Python scripting skills
- Experience with Version Control best practices
- Good knowledge...
-
Site Reliability Engineer
4 weeks ago
London, Greater London, United Kingdom | iO Associates - UK/EU | Full time | £500
Job Opportunity: Site Reliability Engineer
iO Associates - UK/EU is seeking a skilled Site Reliability Engineer to join our team for a short-term project within the Law Enforcement sector.
Key Responsibilities:
- Monitor system performance and security to ensure optimal functionality.
- Collaborate with our team to identify and resolve technical issues.
Project...
-
Site Reliability Engineer
3 weeks ago
London, Greater London, United Kingdom | iO Associates | Full time
Job Opportunity: Site Reliability Engineer
iO Associates is seeking a skilled Site Reliability Engineer to join our team for a short-term project within the Law Enforcement sector.
Key Responsibilities:
- Monitor system performance and security to ensure optimal functionality.
- Collaborate with our team to identify and resolve technical issues.
This role offers a...
-
Site Reliability Engineer
2 weeks ago
London, Greater London, United Kingdom | LinuxRecruit | Full time
Unlock Your Potential as a Site Reliability Engineer
Are you a seasoned Site Reliability Engineer looking to take your skills to the next level? Do you thrive in fast-paced environments and have a keen eye for detail? We're seeking a talented individual to join our team and contribute to the development and maintenance of our cutting-edge platform.
As a Site...
Head of Site Reliability Engineering
2 months ago
About Us
We are a dynamic online travel agency that places technology at the forefront of our operations. Our platform facilitates millions of dream vacations each year.
Around a million visitors a day use our services, which handle 8,000 requests per second with a p95 search latency of 150 ms. Our observability stack processes 1 TB of logs a day and 350,000 metric samples a second.
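To make the scale figures above concrete, here is a minimal Python sketch of how a p95 latency is read off a set of samples and how per-second rates translate into daily totals. It uses the nearest-rank percentile method, which is only one of several percentile definitions (monitoring systems often estimate percentiles from histograms instead). The function names and the sample latencies are hypothetical illustrations, not production code or real data.

```python
import math
from typing import Sequence

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value such that at least p% of samples are <= it.
    # Integer-friendly arithmetic (p * n / 100) avoids float surprises like 0.95 * 100.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def per_day(rate_per_second: float) -> float:
    """Daily total implied by a rate sustained for a full 24 hours."""
    return rate_per_second * SECONDS_PER_DAY


def per_second(total_per_day: float) -> float:
    """Average per-second rate implied by a daily total."""
    return total_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical search latencies in ms: 1, 2, ..., 100.
    latencies_ms = [float(ms) for ms in range(1, 101)]
    print("p95 latency (ms):", percentile_nearest_rank(latencies_ms, 95))
    print("requests/day at 8,000 req/s:", per_day(8_000))
    print("MB/s for 1 TB/day:", per_second(1e12) / 1e6)
    print("samples/day at 350,000/s:", per_day(350_000))
```

For scale: 8,000 requests per second, if sustained around the clock, would be about 691 million requests a day; 1 TB of logs per day averages roughly 11.6 MB/s (decimal units); and 350,000 metric samples per second is about 30 billion samples a day.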
We differentiate ourselves through heavy use of open-source software, actively contributing to public repositories and embracing open-source principles.
Responsibilities
As our first Site Reliability Engineer, you will play a key role in advancing SRE practices such as incident management, blameless postmortems, SLOs, and error budgets. You will help build reliable, high-performance, autoscaling, and highly available systems, supported by our existing Platform Infrastructure team.
- Applying an SRE perspective to our operational strategy.
- Enhancing SRE practices across various teams.
- Boosting the reliability KPIs of our platform.
- Balancing reliability with feature delivery through SLOs and error budgets.
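Balancing reliability against feature delivery with SLOs and error budgets comes down to a little arithmetic. The Python sketch below assumes a request-based availability SLO (the fraction of successful requests over a fixed window); the 99.9% target, the 10% release-gate threshold, and every function and type name are hypothetical examples, not a description of any real policy.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts over one SLO window (the window length, e.g. 28 days, is an assumption)."""
    total_requests: int
    failed_requests: int


def _check_target(slo_target: float) -> None:
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")


def error_budget_remaining(slo_target: float, window: SloWindow) -> float:
    """Fraction of the window's error budget still unspent; negative once overspent.

    slo_target is the availability objective, e.g. 0.999 for 99.9% of requests succeeding.
    """
    _check_target(slo_target)
    if window.total_requests == 0:
        return 1.0  # no traffic yet, so no budget spent
    allowed_failures = (1 - slo_target) * window.total_requests
    return 1 - window.failed_requests / allowed_failures


def burn_rate(slo_target: float, observed_error_ratio: float) -> float:
    """Speed of budget consumption: 1.0 spends exactly the whole budget over the window."""
    _check_target(slo_target)
    return observed_error_ratio / (1 - slo_target)


def can_ship_risky_change(slo_target: float, window: SloWindow,
                          min_remaining: float = 0.1) -> bool:
    """Hypothetical release gate: hold risky launches once less than 10% of budget remains."""
    return error_budget_remaining(slo_target, window) >= min_remaining


if __name__ == "__main__":
    window = SloWindow(total_requests=1_000_000, failed_requests=250)
    print(f"budget remaining: {error_budget_remaining(0.999, window):.0%}")
    print(f"burn rate at 0.2% errors: {burn_rate(0.999, 0.002):.1f}x")
    print("ship risky change?", can_ship_risky_change(0.999, window))
```

With a 99.9% target and one million requests, the budget is about 1,000 failed requests; 250 failures leave roughly 75% of it, and a sustained 0.2% error ratio burns the budget at twice the sustainable rate.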
Our engineering teams own the entire lifecycle of their services, from initial development to high-load production operations. Your role is to help those teams succeed operationally, not to run their services for them.
What You'll Be Working On
- Launching our SRE function by promoting best practices and processes for reliability.
- Identifying slow-running code paths in critical applications using tools like Java Flight Recorder or Go's pprof.
- Developing tools or enhancing existing applications with a focus on reliability and performance.
- Improving our systems and their components so they can handle a tenfold increase in load.
- Reducing mean time to discovery and recovery by enhancing observability and alerting mechanisms.
- Identifying system vulnerabilities through rigorous analysis.
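Mean time to discovery (often called mean time to detection) and mean time to recovery are averages over incident timelines. A minimal Python sketch, assuming each incident records when the fault started, when it was detected, and when user impact ended; the `Incident` type and its field names are hypothetical, and recovery is measured here from fault start, although some teams measure it from detection instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Sequence


@dataclass
class Incident:
    """Hypothetical incident record, e.g. filled in during a blameless postmortem."""
    started: datetime    # when the fault began
    detected: datetime   # when an alert fired or someone noticed
    recovered: datetime  # when user impact ended


def _mean_duration(durations: Sequence[timedelta]) -> timedelta:
    if not durations:
        raise ValueError("need at least one incident")
    return timedelta(seconds=mean(d.total_seconds() for d in durations))


def mean_time_to_detect(incidents: Sequence[Incident]) -> timedelta:
    return _mean_duration([i.detected - i.started for i in incidents])


def mean_time_to_recover(incidents: Sequence[Incident]) -> timedelta:
    # Measured from fault start; some teams measure from detection instead.
    return _mean_duration([i.recovered - i.started for i in incidents])


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    history = [
        Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=30)),
        Incident(t0, t0 + timedelta(minutes=6), t0 + timedelta(minutes=50)),
    ]
    print("MTTD:", mean_time_to_detect(history))
    print("MTTR:", mean_time_to_recover(history))
```

Tracking the two numbers separately shows whether an improvement came from better alerting (lower detection time) or from faster mitigation once a problem is known.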
Our runtime architecture is service-based and hosted on a robust cloud infrastructure. Our engineering teams provision and manage their services' infrastructure using modern tools and practices.
We emphasize observability, continuously refining our monitoring and alerting stack, which is currently centered around a comprehensive ecosystem. Our service mesh provides consistent observability of all production services at ten-second intervals.
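Cumulative request counters collected every ten seconds still have to be turned into rates before they are useful on dashboards and in alerts. The Python sketch below computes an average per-second rate from (timestamp, counter) samples and treats any drop in value as a counter reset, as happens when a process restarts. It is a simplified, hypothetical helper; real monitoring systems add extrapolation and staleness handling that are omitted here.

```python
from typing import Sequence, Tuple


def per_second_rate(samples: Sequence[Tuple[float, float]]) -> float:
    """Average per-second increase of a monotonic counter over the given samples.

    samples: (timestamp_seconds, counter_value) pairs in time order. A drop in value
    is treated as a counter reset, so the post-reset value counts as the increase
    since the reset.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # Normal case: add the delta. After a reset the counter restarted from zero.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    return increase / elapsed


if __name__ == "__main__":
    # Hypothetical 10-second samples of a request counter.
    steady = [(0, 0), (10, 80_000), (20, 160_000), (30, 240_000)]
    print("req/s:", per_second_rate(steady))
```

Handling resets explicitly matters because a naive "last minus first" calculation would report a negative or sharply underestimated rate for any window that spans a restart.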
Performance and scalability are core to our software and infrastructure development processes, achieved by integrating computer science fundamentals with cutting-edge cloud technologies.
Our teams are encouraged to select the most suitable tools for their tasks, programming in languages such as Java, Go, Rust, Python, JavaScript, and more.
You Should Have a Strong Understanding Of
- Site Reliability Engineering principles.
- Performance and scalability considerations.
- HTTP, web services, and RESTful APIs.
- Containerization and cloud technologies.
- Testing, reliability, and monitoring practices.
- Linux operating systems.
- Low-level debugging and troubleshooting techniques.
What We'll Offer You
- Company pension contributions at 5%.
- A training budget to enhance your skills and knowledge.
- Discounted holiday packages for you, your family, and friends.
- 25 days of annual leave (plus 8 public holidays), increasing by one day for every two years of service, up to a maximum of 30 days.
- Options to buy and sell annual leave.
- Cycle-to-work scheme, season ticket loans, and eye care vouchers.
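The leave accrual rule above is simple arithmetic. A minimal sketch, assuming the extra day is added on each completed two-year anniversary (the exact accrual date is not stated): entitlement is 25 + floor(years / 2) days, capped at 30, so the maximum is reached after ten years of service. The function and constant names are hypothetical.

```python
BASE_LEAVE_DAYS = 25
MAX_LEAVE_DAYS = 30


def annual_leave_days(completed_years: int) -> int:
    """Annual leave entitlement, excluding the 8 public holidays.

    Assumes one extra day per two completed years of service (a hypothetical
    reading of the policy), capped at MAX_LEAVE_DAYS.
    """
    if completed_years < 0:
        raise ValueError("completed_years must be non-negative")
    return min(MAX_LEAVE_DAYS, BASE_LEAVE_DAYS + completed_years // 2)


if __name__ == "__main__":
    for years in (0, 2, 5, 10, 15):
        print(years, annual_leave_days(years))
```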