Reliability and Performance Specialist

4 weeks ago


London, Greater London, United Kingdom Apple Inc. Full time
What You'll Do

You will lead SRE teams responsible for the reliability and performance of on-prem and cloud-based services. This includes managing staging and production environments to maximize availability and scalability. Growing and developing engineers on your team to achieve exceptional results is also crucial for success in this role.

As a leader, you will promote observability of systems for monitoring, alerting, and metrics reporting. Advocating best practices of reliability engineering will drive excellence in our team's work. We value a strong understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts.

Additionally, experience with large-scale distributed systems, especially ML infrastructure and services, including LLMs, Generative AI, and transformers, is highly valued. A Bachelor's or Master's degree in computer science or an equivalent field is also required.



  • London, Greater London, United Kingdom Huntress Talent Full time

    Site Reliability Engineer Position at Huntress Talent. Estimated annual salary: £80,000 - £100,000. Job Role: We are seeking an experienced Site Reliability Engineer to join our team. As a key member of our hybrid team, you will be responsible for delivering high-quality information systems and resolving system issues. Main Responsibilities: Real-Time System...


  • London, Greater London, United Kingdom Subsea Oil and Gas Directory Full time

    Job Overview: We are seeking a highly skilled Reliability and Maintenance Specialist to join our team at Shell UK in Norfolk, UK. The successful candidate will be responsible for improving the performance of offshore and onshore structures through maintenance analysis and reliability loops. Key Job Responsibilities: Responsible for planning structural inspections...


  • London, Greater London, United Kingdom Oil And Gas Job Search Ltd Full time

    About the Role: We are seeking a highly skilled Reliability and Safety Specialist to join our team at Oil And Gas Job Search Ltd. This exciting opportunity is available due to the company's growth in the industry. Job Summary: The Senior Maintenance Professional will be responsible for ensuring the reliability and safety of our systems, processes, and equipment....


  • London, Greater London, United Kingdom Reliability Plus Full time

    About Reliability Plus: We are a leading provider of web-based applications for financial institutions. Our mission is to deliver innovative solutions that streamline complex workflows. Our team works closely with clients to understand their needs and develop tailored designs that meet their business objectives.


  • London, Greater London, United Kingdom BCT Resourcing Full time

    Job Requirements: Ensure system performance and reliability. The ideal candidate will have strong technical skills and experience in developing and maintaining IT infrastructure. We are looking for a highly motivated individual who can work effectively in a team environment and communicate complex technical information to non-technical stakeholders.


  • London, Greater London, United Kingdom ProTech Recruitment Ltd Full time

    Job Summary: We are seeking a Reliability Specialist to join our team at ProTech Recruitment Ltd. The successful candidate will play a critical role in designing and executing a reliability program to minimise risks, reduce production losses, and control maintenance costs. Main Responsibilities: Develop and Implement Reliability Program: Create and implement a...


  • London, Greater London, United Kingdom Experian Full time

    Reliability and Performance Expert Job Description: As a Reliability and Performance Expert at Experian, you'll join our Data Quality team in London and work in a hybrid setup reporting directly to the QA Director. Your main responsibility will be to guarantee the dependability, efficiency, and expandability of our data management products. Key tasks include...


  • London, Greater London, United Kingdom Recruit Mint Full time

    We are recruiting for a Reliable Maintenance Specialist to join our team at Recruit Mint. As a key member of our maintenance department, you will be responsible for ensuring the reliability and optimal performance of our process equipment. The ideal candidate will have a strong background in mechanical maintenance, with experience in production equipment and...


  • London, Greater London, United Kingdom InfoSum Full time

    Job Description: As a Reliability and Performance Engineer at InfoSum, you will play a crucial role in ensuring the reliability, scalability, and performance of our platform. This includes designing, implementing, and maintaining cloud infrastructure components, developing and maintaining automation tools, and participating in incident response...


  • London, Greater London, United Kingdom Clarke Energy Full time

    Job Description: Reliability Specialist. Location: Northern Ireland. Job Purpose: To maximize the availability and performance of gas engines and associated systems through expert maintenance, troubleshooting, and fault resolution. Responsibilities: Perform scheduled maintenance to ensure equipment reliability and avoid unnecessary part replacements. Respond...


  • London, Greater London, United Kingdom JLL Full time

    Job Overview: JLL is seeking an experienced Cloud Reliability Specialist to support the administration and maintenance of the Datadog monitoring platform. This role focuses on ensuring the reliability, scalability, and efficiency of Datadog for monitoring and AIOps within the organization. The primary goal is to maximize application availability, performance,...


  • London, Greater London, United Kingdom Avature Full time

    Key Responsibilities: The Reliability Specialist will play a crucial role in maintaining the reliability of our print machinery. This includes performing planned maintenance, responding to breakdowns, and analyzing problems to minimize machine downtime.


  • London, Greater London, United Kingdom Spectrum IT Recruitment Full time £75,000 - £85,000

    Key Responsibilities: In this role, you will be responsible for maintaining and improving the reliability, performance, and scalability of our software systems. This includes analyzing issues, developing fixes, and implementing changes using cloud-based infrastructure and containerization tools. You will also work closely with our engineering team to design,...


  • London, Greater London, United Kingdom The ICE Group, LLC. Full time

    The ICE Group, LLC. is a leading provider of search-driven data analytics solutions, used by some of the world's most successful companies. We're seeking a talented Cloud Reliability Specialist to join our team in Europe. In this role, you'll be responsible for ensuring the reliability and performance of our cloud-based data analytics solution. You'll work...


  • London, Greater London, United Kingdom Shorterm Group Full time

    Role Overview: The Reliability and Availability Specialist will be responsible for ensuring the optimal performance of our Class 345 Crossrail fleet of trains. This role involves high-level electrical and mechanical fault finding, providing technical advice on train systems engineering, and maintaining warranty issues. If you have a strong background in...


  • London, Greater London, United Kingdom Tbwa ChiatDay Inc Full time

    We're seeking a talented Reliable Infrastructure Specialist to join our team at Reddit. As a key member of our Infrastructure SRE team, you'll play a critical role in ensuring the reliability and performance of our engineering platforms and services. Job Description: Advise: Collaborate with engineering teams to design and develop systems that are resilient...


  • London, Greater London, United Kingdom Google Full time

    At Google, we're building a team of talented Site Reliability Specialists to help us ensure the reliability and uptime of our services. As a member of this team, you'll have the opportunity to work on complex technical challenges and develop solutions that impact the entire company. Responsibilities: Monitor and analyze system performance to identify areas for...


  • London, Greater London, United Kingdom Algo Capital Group Full time

    An innovative global systematic hedge fund is seeking a System Reliability Specialist to join their Crypto Assets division. In this critical role, you will ensure the reliability and scalability of their cryptocurrency trading infrastructure. Responsibilities: Architect and maintain high-performance infrastructure for cryptocurrency trading...


  • London, Greater London, United Kingdom EFG Full time

    EFG, a pioneer in the esports world, is looking for an exceptional System Reliability Specialist to enhance their technical capabilities. This position comes with an estimated annual salary of $150,000 - $220,000. About the Role: In this role, you will be responsible for ensuring the stability and performance of EFG's systems, including monitoring,...


  • London, Greater London, United Kingdom The Blackstone Group L.P. Full time

    The Systems Performance Specialist will play a critical role in ensuring the reliability and performance of our systems at The Blackstone Group L.P. This position involves leading the development and implementation of system performance strategies, collaborating with service owners on design and implementation, and partnering with colleagues across various...