Reliable Systems Engineer

6 days ago


London, Greater London, United Kingdom loveholidays Full time

We are a rapidly growing online travel agency with technology at the heart of our success.

In 2022, we sent millions of people on their dream holiday. With a million visitors a day, our 100+ services handle 8k requests per second, while maintaining p95 search latency of 150ms.

You will contribute to building reliable, performant, auto-scalable, and highly available systems, with the support of the existing Platform Infrastructure team.

  • Help balance reliability with feature delivery using SLOs and error budgets.

Our engineering teams own the lifecycle of services from first commit to high-load operation in production. Your responsibility will be to help engineering teams succeed at operations, not to run their services for them.

  • Exposing slow-running code paths in critical applications using tools like Java Flight Recorder or Go's pprof.
  • Writing tools or modifying existing applications with reliability and performance in mind.
  • Exposing system weaknesses with proactive analysis.

We place a strong focus on observability, continually evolving our monitoring and alerting stack, currently centred around the cloud ecosystem. Our service mesh provides uniform observability of all production services at 10s intervals.

Technical Skills
  • HTTP, web services, REST
  • Testing, reliability, monitoring
  • Linux

Salary: £80,000 - £110,000 per annum depending on experience.

Benefits include company pension contributions at 5%, training budget for you to learn on the job and level yourself up, discounted holidays for you, your family, and friends, 25 days of holidays per annum (plus 8 public holidays), ability to buy and sell annual leave, cycle to work scheme, season ticket loan, and eye care vouchers.



  • London, Greater London, United Kingdom Google Full time

    Job Description: As a System Reliability Engineer at Google, you will play a critical role in ensuring the reliability and scalability of our systems. You will work closely with cross-functional teams to design, deploy, and operate large-scale systems that are fault-tolerant and highly available. Your expertise will help us build and maintain infrastructure...


  • London, Greater London, United Kingdom FactSet Full time

    About the Role: We are seeking a highly skilled Senior System Reliability Engineer to join our team at FactSet. As a key member of our infrastructure team, you will play a critical role in ensuring the reliability, scalability, and performance of our software systems and infrastructure. The ideal candidate will possess a strong background in coding, automation,...


  • London, Greater London, United Kingdom Trade Nation Full time

    Site Reliability Engineer Job Description: At Trade Nation, we're seeking a highly skilled Site Reliability Engineer to join our dynamic team. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining scalable and reliable systems that ensure high availability and performance. Key Responsibilities: Design and Implement...


  • London, Greater London, United Kingdom Trade Nation Full time

    Trade Nation is a dynamic and collaborative team that ensures the reliability, availability, and performance of our web services and applications. As a Site Reliability Engineer at Trade Nation, you will work closely with developers, operations, and product teams to design, build, and maintain scalable, secure, and efficient systems. About You: We are looking...


  • London, Greater London, United Kingdom loveholidays Full time

    About the Role: We are seeking a highly skilled Reliability Engineer to join our team at LoveHolidays. As a key member of our infrastructure team, you will be responsible for ensuring the reliability and performance of our systems, which handle millions of users and thousands of requests per second. Our runtime architecture is Service Based and hosted on cloud...


  • London, Greater London, United Kingdom Amazon Full time

    Veeqo is an innovative company that helps high-growth ecommerce businesses build efficient inventory and fulfillment operations. We're seeking a talented DevOps Engineer to join our team and help us improve our system's resilience and security. About the Job: As a DevOps Engineer, you will work closely with multiple teams to build tooling and infrastructure...


  • London, Greater London, United Kingdom Apexon Full time

    Job Opportunity: At Apexon, we specialize in accelerating business transformation and delivering digital experiences. We are a technology services firm accelerating digital transformation and delivering human-centric experiences. With expertise in AI, analytics, app development, cloud, commerce, CX, data, DevOps, IoT, mobile, quality engineering, UX, and more,...


  • London, Greater London, United Kingdom LinuxRecruit Full time

    Company Overview: We are a leading technology company, celebrated globally for our cutting-edge solutions and unparalleled user base. Our culture of excellence, collaboration, and continuous learning drives us to innovate and push boundaries. Salary: $120,000 - $180,000 per year (dependent on experience). Job Description: As a Site Reliability Engineer, you will...


  • London, Greater London, United Kingdom CloudFlare Full time

    About Cloudflare: Cloudflare is a leading edge technology company that helps build a better Internet. Our mission is to protect and accelerate any Internet application online without adding hardware, installing software, or changing a line of code. Job Description: We are seeking a highly skilled System Reliability Engineer to join our team. As an SRE, you will...


  • London, Greater London, United Kingdom Amazon TA Full time

    Job Description: The Robotics Systems Engineer, Reliability and Automation Engineering Team is a critical role within Amazon, responsible for implementing and continuously improving world-class maintenance, repair, and supportability solutions for the Amazon Robotics portfolio of automated fulfillment and sortation systems. This position will work closely with...


  • London, Greater London, United Kingdom Marshall Aerospace and Defence Group Full time

    Reliable Refrigeration Systems Engineer: An exciting opportunity has become available for a Mobile Refrigeration Systems Specialist to work remotely covering South West/Wales. This role would suit an experienced Transport Refrigeration Engineer, although consideration would be given to skilled Motor Vehicle Technicians looking to get into the Transport...


  • London, Greater London, United Kingdom Anson McCade Full time

    About the Role: Spearhead the development of a cutting-edge Network Automation platform as a Site Reliability Engineer for Anson McCade. Collaborate with the elite Infrastructure and Cloud Operations team to architect scalable, automated infrastructure solutions. As a Site Reliability Engineer, you will be responsible for designing and deploying infrastructure...


  • London, Greater London, United Kingdom Viasat Full time

    Job Title: Digital Reliability Engineer. Job Summary: We are seeking a Digital Reliability Engineer to join our platform team at Viasat. The successful candidate will be responsible for ensuring the reliability and resilience of our cloud-based systems. Lead the design and implementation of cloud-based solutions to enhance platform reliability and...


  • London, Greater London, United Kingdom Apple Inc. Full time

    About the Opportunity: We're looking for an experienced Site Reliability Engineer to join our Wallet & Payments Engineering team. You will play a key role in designing, implementing, and deploying high-scalability solutions for critical payment systems. What We Offer: A competitive salary ($170,000 - $200,000 per year) and comprehensive benefits package. The...

  • Reliability Engineer

    4 weeks ago


    London, Greater London, United Kingdom xAI Full time

    About xAI's Distributed Systems Team: The xAI London team is a team of software engineers with a focus on building high-quality, scalable and reliable distributed systems. Our team works on various levels of the stack, from build systems and production backend infrastructure to frontend development. We focus on solving complex problems the right way and aren't...


  • London, Greater London, United Kingdom Apple Full time

    Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Apple. The successful candidate will be responsible for developing and running automated services that provide extraordinary availability, scalability, and security for our customers. This is a full-time position that involves understanding the team's roadmap, taking...


  • London, Greater London, United Kingdom ClearScore Full time

    About Us: ClearScore is a leading fintech company that prioritizes productivity, reliability, and efficiency. We're expanding our Site Reliability Engineering team to support the growth of our internal developer platform, which requires three nines of uptime for critical services. Estimated Salary: £80,000 - £110,000 per annum. The Role: We're seeking...


  • London, Greater London, United Kingdom Fourier Full time

    Key Responsibilities: As a Site Reliability Engineer at Fourier, you will be responsible for designing and implementing tools to enhance the reliability and resilience of our production systems. This includes investigating failures, improving system performance, and automating manual processes. Required Skills: Excellent Python scripting skills. Experience with...


  • London, Greater London, United Kingdom undisclosed Full time

    Role Overview: At our company, we're pushing the boundaries of trading technology. As a member of our Trading Engineering team, you'll play a critical role in ensuring our systems operate with high availability, stability, and performance. You'll work closely with traders and technologists to develop and implement innovative solutions that drive our business...

  • Reliability Engineer

    3 weeks ago


    London, Greater London, United Kingdom Butterworths Limited Company Full time

    About the Role: You will be a key member of our team responsible for ensuring the reliability and performance of our systems. As a Senior Site Reliability Engineer at Butterworths Limited Company, you will play a critical role in designing, implementing, and maintaining monitoring tools and processes to ensure continuous tracking of system performance,...