Lead Site Reliability Engineer

3 months ago


London, United Kingdom loveholidays Full time

About us

We are a rapidly growing online travel agency with technology at the heart of our success. In 2022, we sent millions of people on their dream holiday.

With a million visitors a day, our 100+ services handle 8k requests per second, while maintaining p95 search latency of 150ms. Our observability captures and processes 1TB of logs a day and 350k metric samples a second.

We focus on differentiation by relying heavily on open source, while also giving back through contributions to public repositories, open sourcing and .

Responsibilities

As our first Site Reliability Engineer, you will contribute to the evolution of SRE practices like incident management, blameless postmortems, SLOs and error budgets. You will contribute to building reliable, performant, auto-scalable and highly available systems. You will have the support of the existing Platform Infrastructure team.

  • The application of our through an SRE lens.
  • Levelling up SRE practices across the teams.
  • Improving the reliability KPIs of the platform.
  • Helping balance reliability with feature delivery using SLOs and error budgets.

Our engineering teams own the lifecycle of services from first commit to high-load operation in production. Your responsibility will be to help engineering teams succeed at operations, not to run their services for them.

What you'll be working on

  • Kick-starting our SRE function by evangelising reliability best practices and processes.
  • Exposing slow-running code paths in critical applications using tools like Java Flight Recorder or Go's pprof.
  • Writing tools or modifying existing applications with reliability and performance in mind.
  • Ensuring our systems and their individual components can withstand 10x load by improving our .
  • Shortening mean time to discovery and recovery with improvements to observability and alerting.
  • Exposing system weaknesses with .

Our runtime architecture is service-based and hosted on . Our engineering teams provision and manage their services' infrastructure with using and .

We place a strong focus on observability, continually evolving our monitoring and alerting stack, currently centred around the (), , , ecosystem. Our service mesh () provides uniform observability of all production services at 10s intervals.

Performance and scalability are integral to our software and infrastructure development process, achieved by combining Computer Science fundamentals and cutting edge cloud technologies.

Our teams are encouraged to use the right tool for the right job. We program in Java, Go, Rust, Python, JavaScript and others.

You should have a good understanding of

  • principles.
  • Performance, scalability.
  • HTTP, web services, REST.
  • Containers, cloud.
  • Testing, reliability, monitoring.
  • Linux.
  • Low-level debugging and troubleshooting.

What we'll give back to you

  • Company pension contributions at 5%.
  • Training budget for you to learn on the job and level yourself up.
  • Discounted holidays for you, your family and friends.
  • 25 days of holiday per annum (plus 8 public holidays), increasing by 1 day for every second year of service, up to a maximum of 30 days per annum.
  • Ability to buy and sell annual leave.
  • Cycle to work scheme, season ticket loan and eye care vouchers.
