Reliability Engineer

1 week ago


London, Greater London, United Kingdom Luma AI Full time £80,000 - £120,000 per year
The Opportunity
At Luma AI, "full-stack" has a distinct meaning. It means understanding everything from the generative model down to the silicon it runs on. We are pushing the physical limits of current hardware to train Omni models that understand the world. This requires a level of engineering rigor that standard cloud environments simply do not demand. We are looking for engineers who are tired of high-level abstractions and want to work on the metal that powers the AI revolution.
Where You Come In
You will operate at the jagged edge where software meets hardware. Standard cloud providers abstract away the complexity; we embrace it. You will be responsible for maximizing efficiency from our heterogeneous fleet of NVIDIA and AMD accelerators. This role is about precision, performance, and the relentless pursuit of system optimization in a multi-vendor supercomputing environment.
What You Will Build
The Bare Metal Stack: Manage and optimize the lifecycle of bare-metal servers, ensuring that our OS, drivers, and firmware are tuned for peak AI performance.
High-Throughput Interconnects: Engineer the software configurations for our InfiniBand and RoCE fabrics, solving the intricate data movement challenges that define modern distributed training.
Performance Diagnostics: Build the tooling to visualize what is happening inside the cluster, turning opaque hardware counters into actionable signals for debugging latency and throughput.
The Profile We Are Looking For
Low-Level Fluency: You are not afraid of the kernel. You understand interrupts, memory management, and how the OS interacts with peripheral devices.
Hardware Curiosity: You understand that software doesn't run in a vacuum. You are interested in the physical constraints of GPUs, networking cards, and storage subsystems.
First-Principles Reasoning: When a system behaves unexpectedly, you don't just restart it; you investigate the physics of the failure to ensure it is solved permanently.

  • London, Greater London, United Kingdom JLL Full time

    JLL empowers you to shape a brighter way. Our people at JLL and JLL Technologies are shaping the future of real estate for a better world by combining world class services, advisory and technology for our clients. We are committed to hiring the best, most talented people and empowering them to thrive, grow meaningful careers and to find a place where...


  • London, Greater London, United Kingdom eMFusion Global Full time £60,000 - £120,000 per year

    Job Opportunity: Freelance Site Reliability Engineer (Outside IR35) £ | Remote (UK-Based) | Occasional travel to Farnborough or Hammersmith. Contract until 2026. We're hiring two hands-on Site Reliability Engineers (SREs) to join a fast-moving platform team on a long-term contract. This role is ideal for engineers with strong coding skills who are comfortable...


  • London, Greater London, United Kingdom La Fosse Full time £6,600 - £66,200 per year

    Contract Opportunity: Site Reliability Engineer (Azure & AWS). Location: UK (Hybrid/Remote). Rate: £550/day (Inside IR35). Contract Length: 12 Months Initially. The client is looking for a highly skilled Site Reliability Engineer (SRE) with deep experience across Azure and AWS to take a lead role in migrating an existing on-prem HPC solution into the Cloud. You'll be...


  • London, Greater London, United Kingdom Pinpoint Full time

    Product Reliability Engineer. Department: Engineering. Employment Type: Full Time. Location: Remote. Reporting To: VP of Engineering. Description: Hi, I'm Dom, VP of Engineering at Pinpoint. We're a high-growth HR tech startup building and selling software that helps in-house recruitment teams attract, hire, and onboard the right talent. Today, we have a strong...


  • London, Greater London, United Kingdom Spait Infotech Private Limited Full time

    Job Description: Site Reliability Engineer (Remote, UK, Permanent). Job Title: Site Reliability Engineer (SRE). Location: Remote (United Kingdom). Experience: 0-10 years. Employment Type: Full-time, Permanent. Eligibility: Must be eligible to work full-time in UK. Key Responsibilities: Maintain and improve availability, performance, and reliability of production and...


  • London, Greater London, United Kingdom -ea3a-4317-8f52-46b52766e55f Full time

    Join us in redefining the creator economy with AI. Fanvue is the fastest-growing creator monetisation platform in the creator economy. We are the leading AI-powered creator-first platform, designed to empower creators worldwide to directly monetise their audience. We're on a mission to redefine the creator economy by empowering creators to connect, share, and...

  • Reliability Engineer

    2 weeks ago


    London, Greater London, United Kingdom Digital Realty Global Full time £60,000 - £120,000 per year

    Description: Your role. The Engineer will provide a range of support which may include technical difficulties, working with vendors to overcome intrinsic issues, working with site operations teams to improve usage and efficiency aspects, and identifying any areas for improvement. This may include site specific improvements, region wide improvement programmes,...


  • London, Greater London, United Kingdom Group Full time £40,000 - £80,000 per year

    Site Reliability Engineer - UK. Optum is a global organisation that delivers care, aided by technology to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture...


  • London, Greater London, United Kingdom Ditto Full time £60,000 - £120,000 per year

    About Ditto: Ditto is redefining how data moves at the edge. Our mission is to make it seamless for developers to build resilient, real-time applications, regardless of network conditions. Whether you're in a stadium, airplane, or remote military base, Ditto's peer-to-peer sync engine ensures devices stay connected and data stays consistent, even without...


  • London, Greater London, United Kingdom Pinpoint Full time

    Description: Hi, I'm Dom, VP of Engineering at Pinpoint. We're a high-growth HR tech startup building and selling software that helps in-house recruitment teams attract, hire, and onboard the right talent. Today, we have a strong foundation in place: a mature product, rapid growth, strong product-market fit, and happy customers. We're scaling fast - more...