Senior ML Infrastructure Engineer
5 hours ago
The Ellison Institute of Technology (EIT) Oxford's purpose is to have a global impact by fundamentally reimagining the way science and technology translate into end-to-end solutions and delivering these solutions in programmes and platforms that respond to humanity's most challenging problems.
EIT Oxford will ensure scientific discoveries and pioneering science are turned into products for the benefit of society that can have a high impact worldwide and, over time, be commercialised to ensure long-term sustainability.
Led by a world-class faculty of scientists, technologists, policy makers, economists and entrepreneurs, the Ellison Institute of Technology aims to develop and deploy commercially sustainable solutions to solve some of humanity's most enduring challenges. Our work is guided by four Humane Endeavours: Health, Medical Science & Generative Biology; Food Security & Sustainable Agriculture; Climate Change & Managing Atmospheric CO2; and Artificial Intelligence & Robotics.
Set for completion in 2027, the EIT Campus in Littlemore will include more than 300,000 sq ft of research laboratories, educational and gathering spaces. Fuelled by growing ambition and the strength of Oxford's science ecosystem, EIT is now expanding its footprint to a 2 million sq ft Campus across the western part of The Oxford Science Park. Designed by Foster + Partners, led by Lord Norman Foster, this will become a transformative workplace for up to 7,000 people, with autonomous laboratories, purpose-built laboratories (including a plant sciences building), and dynamic spaces to spark interdisciplinary collaboration.
Our MLOps team
Join our MLOps team to build the cloud and compute foundation that enables scientific breakthroughs. Deliver reliable, secure platforms and self-service guardrails that accelerate experimentation and turn ideas into results faster, at scale, and with confidence.
Day-to-day, you might:
- Build, operate, and continuously optimise our high-performance GPU training and inference clusters, focusing on robust, high-availability scheduling, isolation, and automated lifecycle management.
- Drive systems design and implementation for high-throughput data paths, optimising I/O, caching, and data locality across compute and storage (including our current Lustre implementation).
- Proactively benchmark, profile, and resolve performance bottlenecks across the compute, network, and orchestration layers to maximise efficiency for distributed training and inference.
- Establish comprehensive observability, resilience, and automated security controls to ensure compliance and robust operation of sensitive research environments.
- Partner with Research, Data, and Applied teams to forecast capacity and cost for GPU and storage needs, setting quotas and streamlining ML experimentation pipelines.
What makes you a great fit:
- Proven experience leading the design, build, and operation of high-performance ML compute clusters at scale
- A proactive, autonomous approach to systems design and the proven ability and desire to ideate, co-create and implement optimal solutions
- Exposure to migrating or transforming ML infrastructure from traditional schedulers to modern, containerised systems
- Expertise with high-throughput storage systems for ML/HPC workloads
- Expert-level understanding of GPU architecture, high-speed networking for distributed training, and performance profiling to resolve bottlenecks
- A solid grasp of IaC and CI/CD practices (e.g., Terraform, Argo CD)
We offer the following salary and benefits:
- Enhanced holiday pay
- Pension
- Life Assurance
- Income Protection
- Private Medical Insurance
- Hospital Cash Plan
- Therapy Services
- Perk Box
- Electric Car Scheme
Why work for EIT:
At the Ellison Institute, we believe a collaborative, inclusive team is key to our success. We are building a supportive environment where creative risks are encouraged, and everyone feels heard. Valuing emotional intelligence, empathy, respect, and resilience, we encourage people to be curious and to have a shared commitment to excellence. Join us and make an impact.