AI Infrastructure Engineer

1 week ago


London, Greater London, United Kingdom Writer Full time

About Writer

We are a leading provider of transformative AI solutions for enterprises, empowering hundreds of customers like Accenture, Intuit, L'Oreal, and Vanguard to revolutionize their workflows.

Our all-in-one platform makes it easy to deploy customized AI apps and workflows that accelerate growth, increase productivity, and ensure compliance. We provide enterprise-grade accuracy, security, and efficiency through our suite of development tools supported by Palmyra – our state-of-the-art family of LLMs – alongside our industry-leading graph-based RAG and customizable AI guardrails.

About this role

As an AI Infrastructure Engineer, you will be responsible for deploying and managing cutting-edge infrastructure crucial for AI/ML operations. You will collaborate with AI/ML engineers and researchers to develop a robust CI/CD pipeline that supports safe and reproducible experiments. Your expertise will also extend to setting up and maintaining monitoring, logging, and alerting systems to oversee extensive training runs and client-facing APIs.

This role demands a proactive approach to maintaining large Kubernetes clusters, optimizing system performance, and providing operational support for our suite of software solutions. Some key responsibilities include:

  • Designing and deploying a CI/CD pipeline that ensures safe and reproducible experiments
  • Setting up and managing monitoring, logging, and alerting systems for extensive training runs and client-facing APIs
  • Ensuring training environments are consistently available and prepared across multiple clusters
  • Improving reliability, quality, and time-to-market of our suite of software solutions
  • Measuring and optimizing system performance
  • Providing primary operational support and engineering for multiple large-scale distributed software applications

Requirements

We are looking for someone with professional experience in the following areas:

  • Model training
  • Hugging Face Transformers
  • PyTorch
  • vLLM
  • TensorRT
  • Infrastructure as code tools like Terraform
  • Scripting languages such as Python or Bash
  • Cloud platforms such as Google Cloud, AWS or Azure
  • Git and GitHub workflows
  • Tracing and monitoring

Familiarity with high-performance, large-scale ML systems is also essential. You should have a knack for troubleshooting complex systems and enjoy solving challenging problems. Proactive identification of problems, performance bottlenecks, and areas for improvement is also required.

We offer a competitive salary of $150,000 - $180,000 per year, depending on experience, plus benefits including generous PTO, medical, dental, and vision coverage, paid parental leave, fertility and family planning support, and annual work-life stipends for home office setup, cell phone, internet, wellness, and learning and development.



  • London, Greater London, United Kingdom Xcede Full time

    Xcede is seeking an experienced AI Infrastructure Engineer to join our growing GenAI team. This role requires a strong background in Python and proficiency in AWS, with a bonus for experience with Kafka, Databricks, and RAG. Your primary responsibility will be to develop effective prompts for AI models while fine-tuning them, collaborating with Data Science...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    We are seeking an exceptional Cybersecurity Research Engineer to join our team at the AI Safety Institute. Our goal is to develop first-of-its-kind government-run infrastructure to benchmark the progress of advanced AI capabilities in cyber security. The selected candidate will work closely with a cross-functional team of cybersecurity researchers, machine...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    As advanced AI systems continue to evolve, the potential risks associated with their cyber capabilities pose a significant threat to organizational and individual security. These risks are particularly concerning when combined with other AI risk areas, such as harmful outcomes from biological and chemical capabilities, and autonomous systems. The AI Safety...


  • London, Greater London, United Kingdom Microsoft Full time

    We are looking for a highly skilled and motivated AI Infrastructure Specialist to join our team at Microsoft AI. About Us: At Microsoft AI, we are on a mission to create the leading pretraining platform to develop the world's most capable AI frontier models. This platform will span one of the world's foremost GPU clusters, pushing the boundaries of scale,...


  • London, Greater London, United Kingdom Atla Ai Full time

    Atla Ai: Safeguarding the Future of Humanity. About Us: We're Atla Ai, a pioneering London-based start-up dedicated to engineering safe and beneficial AI systems. Our mission is to drive positive change in the world by developing cutting-edge AI evaluation models. Role Overview: As our alignment research engineer, you'll play a pivotal role in shaping the future...


  • London, Greater London, United Kingdom Engine AI Full time

    We are expanding the AI capabilities of our company, Engine AI, and seeking a seasoned AI Data Insights Engineer to spearhead the development of Data Agents. This critical role involves creating tools that translate natural language questions into actionable insights, including SQL query generation, entity matching, and data visualizations. This is an...


  • London, Greater London, United Kingdom Atla Ai Full time

    About Atla: We are a London-based start-up building the most capable AI evaluation models. Our mission is to engineer safe, beneficial AI systems that will have a massive positive impact on the future of humanity. Role: As Atla's alignment research engineer, you'll develop language models as evaluators and use your insights to construct safety guardrails for...


  • London, Greater London, United Kingdom Encord Full time

    We are on the cusp of a revolution in AI infrastructure, and we need your expertise to take us to the next level. As a seasoned software engineer, you will play a crucial role in building and extending our cutting-edge platform. With $30M in Series B funding, we're a talented team of 60, working at the forefront of computer vision and deep learning. As a key...

  • AI Expert

    5 days ago


    London, Greater London, United Kingdom Engine AI Full time

    Senior AI Engineer: We're expanding our AI capabilities at Engine AI and seeking a seasoned Senior AI Engineer to spearhead the development of Data Agents. This role involves crafting tools that translate natural language queries into actionable insights, including SQL query generation, entity matching, and data visualizations. As a key member of our team,...


  • London, Greater London, United Kingdom C3 AI Full time

    About the Role: We are seeking an experienced AI Solutions Architect to join our team at C3.ai. The ideal candidate will have a strong background in developing and deploying enterprise-scale AI applications. Job Description: The successful candidate will work with large companies to build the next generation of AI-powered enterprise applications on the C3 AI...


  • London, Greater London, United Kingdom Atla Ai Full time

    Atla Ai is committed to creating safe, beneficial AI systems that will have a significant positive impact on humanity's future. We are a London-based start-up developing the most capable AI evaluation models. Role and Responsibilities: As Atla Ai's Research Engineer, you'll develop and fine-tune language models as evaluators and use your insights to construct...


  • London, Greater London, United Kingdom Signal AI Full time

    About the Reputation Team: The Reputation Team at Signal AI is dedicated to delivering exceptional customer experiences in the Reputation space. Our mission is to provide innovative tools and solutions that help PR executives and Chief Communications Officers navigate the vast volume of world media data. As a key member of our team, you will be responsible for...


  • London, Greater London, United Kingdom Aitopics Full time

    Job Title: Senior Infrastructure Engineer - AI Development and Training. Huawei R&D UK is seeking a highly skilled Senior IT Engineer to manage a large-scale AI development and training infrastructure. The role involves overseeing GPU servers, Kubernetes clusters (Rancher), and storage systems to ensure seamless operations and optimized performance. You will...

  • AI Safety Engineer

    1 week ago


    London, Greater London, United Kingdom AI Safety Institute Full time

    The Post-Training Team at the AI Safety Institute is dedicated to optimizing AI systems for state-of-the-art performance in various risk domains. This involves a combination of scaffolding, prompting, supervised and RL fine-tuning of AI models. Key Responsibilities: Improve model performance using cutting-edge machine learning techniques. Develop methodologies...


  • London, Greater London, United Kingdom Higher - AI recruitment Full time

    About the Job Description: This Data Engineer position is an exciting opportunity to join our team at Higher - AI recruitment and contribute to the development of sophisticated data-driven products that support our clients' journey towards Net Zero. As a mid-senior level Data Engineer, you will have the opportunity to work with cutting-edge technologies and...


  • London, Greater London, United Kingdom Symphony Industrial AI, Inc. Full time

    Job Summary: A highly skilled AI Engineer is needed to lead the development of cutting-edge AI solutions for our London-based Trading & Investing team. This role involves collaborating with Data Engineers and Full Stack developers to build innovative AI systems, leveraging expertise in multi-agent generative AI frameworks, machine learning, AIOps, and...


  • London, Greater London, United Kingdom Tag Full time

    AI Infrastructure Specialist: We are seeking a highly skilled AI Infrastructure Specialist to join our team in London. In this role, you will be responsible for designing and implementing AI infrastructure that meets the needs of our data science teams. About the Role: You will work closely with our data science teams to ensure seamless integration of machine...


  • London, Greater London, United Kingdom Encord Full time

    About Encord: We are a cutting-edge AI infrastructure company that is revolutionizing the field of computer vision and deep learning. Our team of talented engineers is pushing the boundaries of what is possible with AI, and we are looking for an experienced engineer to join us. Job Description: We are seeking an outstanding AI infrastructure engineer to help us...


  • London, Greater London, United Kingdom Artifact AI Full time

    Redefine accounting with intelligent AI agents, automating complex financial processes for businesses and accounting firms. Lead ML research and develop enterprise-grade AI agents to automate workflows. Drive technical vision and create robust AI agents handling bookkeeping, tax compliance, and more. Design scalable AI agents tackling real-world challenges in...


  • London, Greater London, United Kingdom Predict X Full time

    PredictX is a leading SaaS scale-up revolutionising critical decision-making for global businesses. Job Description: We are seeking an experienced Cloud Systems Engineer to lead the migration of our on-prem systems to Google Cloud Platform (GCP). As a key player in our IT infrastructure team, you will be responsible for setting up and maintaining computer...