Reliability Engineer
3 weeks ago
About xAI's Distributed Systems Team
The xAI London team consists of software engineers focused on building high-quality, scalable and reliable distributed systems. Our team works at various levels of the stack, from build systems and production backend infrastructure to frontend development. We focus on solving complex problems the right way and aren't afraid to delve into technically challenging topics to achieve high-quality software.
About the Role
We're looking for an experienced Site Reliability Engineer to join our dynamic team. The main responsibilities for this role are:
- Improving our observability by adding or adjusting metrics
- Building easily parsable dashboards using monitoring technologies such as Prometheus and Grafana
- Designing and overseeing on-call rotations to ensure high system availability
- Improving our deployment process to increase system reliability
An ideal candidate should have at least the following qualifications:
- Expert knowledge of at least one programming language that compiles to machine code, such as Rust, C++ or Go
- Expert knowledge of monitoring technologies
- Expert knowledge of deployment technologies, such as Pulumi or Terraform
- Expert knowledge of Kubernetes
Location
The role is based in our London office near Piccadilly Circus underground station. We work from the office 5 days a week, but allow for work-from-home days when needed. Candidates must be willing to attend late meetings to coordinate with our team in California and participate in semi-regular business trips to California.
Interview Process
After you submit your application, our team reviews your CV and statement of exceptional work. If your application passes this stage, you'll be invited to a 15-minute interview, followed by a series of technical interviews: a coding interview, a monitoring & deployment design interview, a distributed systems design interview and a presentation about the most difficult technical problems you have solved.
Benefits
- Competitive cash-based compensation
- xAI equity
- Private health and dental insurance
- Unlimited time off subject to prior approval
-
London, Greater London, United Kingdom AVT Reliability Ltd Full time
About AVT Reliability Ltd: We are a leading company in the field of asset integrity and reliability. Our team is passionate about delivering high-quality services to our clients. Job Summary: This is an exciting opportunity for a talented engineering graduate to join our Asset Integrity Division as a specialist. You will be responsible for supporting a diverse...
-
Reliability Engineer
1 month ago
London, Greater London, United Kingdom newscientist - Jobboard Full time
West London (hybrid working). We are seeking a skilled Reliability Engineer to join our team and contribute to the development of our RAMS engineering capabilities. The successful candidate will have a strong background in reliability engineering and a passion for ensuring the safety and maintainability of complex systems. Key Responsibilities: Develop and...
-
Reliability Engineering Lead
2 weeks ago
London, Greater London, United Kingdom Victrex Full time
Senior Reliability Engineer Role. About the Job: We are seeking an experienced Senior Reliability Engineer to lead our asset management strategy and drive improvements in plant performance across all UK plants. Job Summary: The successful candidate will be responsible for developing and implementing systems and procedures that enhance safety, asset availability,...
-
Reliability Engineering Manager
1 month ago
London, Greater London, United Kingdom AWE Full time
AWE is seeking a Reliability Engineering Manager to lead the delivery of engineering services across the lifecycle of an asset. The successful candidate will have a background in reliability engineering management or maintenance and alteration of plant-based engineering projects. Key responsibilities include: Providing leadership to Maintenance & Reliability...
-
Reliability Systems Engineer
1 month ago
London, Greater London, United Kingdom https:www.energyjobline.comsitemap Full time
Reliability Systems Engineer. AWE is seeking a skilled Reliability Systems Engineer to join our Dependability Team. The successful candidate will provide specialist systems engineering support to technically challenging projects, ensuring the reliability and availability of our core products. The ideal candidate will have a deep understanding of dependability...
-
Reliability Engineer
2 days ago
London, Greater London, United Kingdom Florida Crystals ASR Group Full time
Job Overview: As a Maintenance Engineer at Tate & Lyle Sugars, you will be responsible for maintaining the efficiency and reliability of our plant and equipment. Responsibilities: Perform routine maintenance tasks to prevent equipment failure and downtime. Conduct root cause analysis to identify and resolve equipment issues. Develop and...
-
Reliability Engineer
4 weeks ago
London, Greater London, United Kingdom The Sterling Choice Full time
About the Role: As a Planned Maintenance Engineer at The Sterling Choice, you will play a pivotal role in ensuring the reliability and performance of equipment in our production facility. You will develop and implement maintenance strategies, conduct root cause analysis, and optimize preventive maintenance to reduce downtime and boost productivity. Key...
-
Digital Reliability Engineer
3 weeks ago
London, Greater London, United Kingdom Viasat Full time
Job Title: Digital Reliability Engineer. Job Summary: We are seeking a Digital Reliability Engineer to join our platform team at Viasat. The successful candidate will be responsible for ensuring the reliability and resilience of our cloud-based systems. Lead the design and implementation of cloud-based solutions to enhance platform reliability and...
-
Cloud Reliability Engineer
4 weeks ago
London, Greater London, United Kingdom 83zero Full time
Job Description: We are seeking a skilled Cloud Reliability Engineer to join our team at 83zero, a global leader in digital services. As a Cloud Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and efficiency of our clients' platforms. Your Responsibilities: Ensure the reliability, scalability, and efficiency of clients'...
-
Reliability Engineering Specialist
3 weeks ago
London, Greater London, United Kingdom Cutover Full time
Cutover is a pioneering enterprise that has developed the world's first work orchestration and observability platform, enabling seamless collaboration between humans and machines. We're looking for a skilled Reliability Engineer to join our team and ensure the robustness and performance of our Cutover Enterprise platform. The platform features a ReactJS...
-
Site Reliability Engineer
3 weeks ago
London, Greater London, United Kingdom Fourier Full time
Key Responsibilities: As a Site Reliability Engineer at Fourier, you will be responsible for designing and implementing tools to enhance the reliability and resilience of our production systems. This includes investigating failures, improving system performance, and automating manual processes. Required Skills: Excellent Python scripting skills. Experience with...
-
Site Reliability Engineer
4 weeks ago
London, Greater London, United Kingdom Curve Full time
About the Role: We are seeking a skilled Site Reliability Engineer to join our team at Curve. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our infrastructure, identifying areas for improvement, and implementing solutions to optimize our systems. Key responsibilities include: Collaborating with...
-
Cloud Reliability Engineer
2 days ago
London, Greater London, United Kingdom GoCardless Full time
About the Role: We are seeking an experienced Cloud Reliability Engineer to join our distributed team at GoCardless. As a key member of our engineering team, you will be responsible for designing and implementing scalable and reliable infrastructure solutions. With a strong interest in infrastructure management and site reliability engineering, you will...
-
Site Reliability Engineer
1 month ago
London, Greater London, United Kingdom Fourier Full time
Key Responsibilities: We are seeking a skilled Site Reliability Engineer to join our team at Fourier. As a member of our Site Reliability Engineering team, you will be responsible for developing tools for surveillance and enhancement of our production systems. Key responsibilities include increasing system resilience, investigating failure, and improving...
-
Senior Reliability Engineer
4 weeks ago
London, Greater London, United Kingdom Trevett Project Services Full time
Job Title: Senior Reliability Engineer. Job Summary: We are seeking a Senior Reliability Engineer to join our team at Trevett Project Services. As a key member of our maintenance operations team, you will be responsible for ensuring the reliability and efficiency of our equipment and systems. Key Responsibilities: Provide technical support to engineers and...
-
Site Reliability Engineer
1 month ago
London, Greater London, United Kingdom ESL FACEIT Group Full time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at ESL FACEIT Group. As a key member of our infrastructure team, you will be responsible for designing, analyzing, and troubleshooting large-scale distributed systems. As a Site Reliability Engineer, you will work closely with our software engineering teams to deploy and...
-
Site Reliability Engineering Manager
1 month ago
London, Greater London, United Kingdom Apple Full time
About the Role: We're seeking a seasoned Site Reliability Engineering Manager to lead our team of engineers responsible for the reliability and performance of our on-prem and cloud-based services. As a key member of our Apple Services Engineering team, you will be responsible for managing staging and production environments, promoting observability of systems,...
-
Reliability Maintenance Engineer
1 month ago
London, Greater London, United Kingdom Amazon UK Services Ltd. Full time
Amazon UK Services Ltd. is seeking a skilled Reliability Maintenance Engineering Technician to join our team. As a key member of our Reliability Maintenance Engineering (RME) team, you will play a vital role in maintaining the reliability and efficiency of our equipment and workspaces. Key Responsibilities: Perform proactive and preventative maintenance...
-
Senior Reliability Engineer
1 month ago
London, Greater London, United Kingdom Transport for London Full time
Job Title: Senior Reliability Engineer. About the Role: We are seeking a highly skilled Senior Reliability Engineer to join our team at Transport for London. As a key member of our RAM Engineering team, you will be responsible for leading the specification and delivery of Reliability, Availability, and Maintainability (RAM) activities to achieve the operational...
-
Site Reliability Engineer
2 weeks ago
London, Greater London, United Kingdom Selby Jennings Full time
About Selby Jennings: We're a leading global financial services firm where technologists and investment professionals collaborate to drive innovation and operational excellence. About the Role: As a Site Reliability Engineer, you'll apply your expertise in software and systems engineering to design, build, and maintain our robust infrastructure. You'll reduce...