Head of Site Reliability Engineering

2 months ago


London, Greater London, United Kingdom | loveholidays | Full time

About Us

We are a dynamic online travel agency that places technology at the forefront of our operations. Our platform facilitates millions of dream vacations each year.

With a million visitors a day, our services handle 8,000 requests per second while keeping p95 search latency at 150 ms. Our observability stack ingests 1 TB of logs per day and 350,000 metric samples per second.

We rely heavily on open-source software, contribute to public repositories, and embrace open-source principles.

Responsibilities

As our first Site Reliability Engineer, you will lead the adoption of SRE practices such as incident management, blameless postmortems, service level objectives (SLOs), and error budgets. You will help build reliable, high-performance, autoscaling, and highly available systems, supported by our existing Platform Infrastructure team.

  • Applying an SRE perspective to our operational strategy.
  • Improving SRE practices across teams.
  • Improving the reliability KPIs of our platform.
  • Balancing reliability with feature delivery through SLOs and error budgets.

Our engineering teams own the entire lifecycle of their services, from initial development to running them in production under high load. Your role is to help those teams succeed operationally, not to run their services for them.

What You'll Be Working On

  • Launching our SRE function by promoting best practices and processes for reliability.
  • Identifying slow-running code paths in critical applications using tools like Java Flight Recorder or Go's pprof.
  • Developing tools or enhancing existing applications with a focus on reliability and performance.
  • Improving our systems and their components so they can handle a tenfold increase in load.
  • Reducing mean time to discovery and mean time to recovery by improving observability and alerting.
  • Identifying system vulnerabilities through rigorous analysis.

Our runtime architecture is service-based and runs on cloud infrastructure. Each engineering team provisions and manages the infrastructure for its own services.

Observability is a priority for us, and we continuously refine our monitoring and alerting stack, which is built around a comprehensive ecosystem of tools. Our service mesh gives us consistent observability of every production service at ten-second resolution.

Performance and scalability are central to how we build software and infrastructure: we combine computer science fundamentals with modern cloud technologies.

Our teams are encouraged to choose the right tool for each task and work in languages including Java, Go, Rust, Python, and JavaScript.

You Should Have a Strong Understanding Of

  • Site Reliability Engineering principles.
  • Performance and scalability considerations.
  • HTTP, web services, and RESTful APIs.
  • Containerization and cloud technologies.
  • Testing, reliability, and monitoring practices.
  • Linux operating systems.
  • Low-level debugging and troubleshooting techniques.

What We'll Offer You

  • Employer pension contributions of 5%.
  • A training budget to enhance your skills and knowledge.
  • Discounted holiday packages for you, your family, and friends.
  • 25 days of annual leave (plus 8 public holidays), increasing by one day for every two years of service, up to a maximum of 30 days.
  • Options to buy and sell annual leave.
  • Cycle-to-work scheme, season ticket loans, and eye care vouchers.