Platform Engineer
1 week ago
We are building the UK's next generation AI platform, powered by renewable energy, rooted in sovereign capability, and designed to give enterprises and innovators the compute they need.
AI Platform Operations Manager
We are looking for a Support Engineer / Cluster Administrator to provide Level 1 and Level 2 support for the AI platform. The role is customer facing and involves technical troubleshooting and collaboration with vendor engineering teams to ensure seamless AI platform operations.
Key Responsibilities
- L1 support for customer-reported issues and requests
- L2 support: diagnose, replicate, and troubleshoot issues across platform and infrastructure
- Escalate complex (L3) issues to vendor product/engineering teams, coordinate their resolution, and manage vendor responses
- Monitor system health, alerts, and customer usage patterns
- Document solutions and workarounds, create and maintain knowledge articles, and document support procedures
- Automate common tasks and fixes
- Configure and integrate tooling to support optimal operation of the platform, and support tool selection
- Assist customers with platform configuration, onboarding, and usage best practices
- Collaborate with platform and infrastructure support/engineering teams to resolve platform integration issues
- Ensure SLAs and customer satisfaction targets are met
- Work with customers and multiple stakeholders to understand requirements and challenges, and provide reporting on usage, workflow, and billing
Technical responsibilities
- Cluster infrastructure management: Manage the NVIDIA GPU cluster
- High availability and resilience: Implement failover strategies and manage maintenance events to minimise downtime
- Resource allocation and optimisation: Resource partitioning (GPU resources), workload scheduling, capacity planning
- Performance monitoring and troubleshooting: Performance analysis and real-time monitoring with the available NVIDIA and HPE tools
- Incident response: Manage node failures, network issues, and driver issues; troubleshoot common issues and work with vendor support to resolve critical ones
- Security and access control: Manage user permissions, RBAC, security hardening, data protection
Required Skills & Experience
- In-depth experience in technical support, system engineering, or platform operations
- Strong understanding of L1 and L2 support processes (ticketing, escalation, troubleshooting)
- Familiarity with cloud-based platforms, APIs, and distributed systems
- Understanding of AI/ML concepts and tooling (model training, inference, data pipelines basics)
- Experience with monitoring/logging tools (e.g., Grafana, Kibana, Splunk)
- Excellent communication skills to interface with both customers and internal / vendor teams
- Good understanding of the tooling requirements of ML engineers and data scientists, and how to optimise their experience
Core Technical Skills
- System administration experience with operating systems such as RHEL/CentOS and Ubuntu, including Linux kernel tuning
- Proficiency with Ansible, the NVIDIA and CUDA toolkits, Kubernetes, and container orchestration
- Understanding of automation, monitoring, and security in a GPU-as-a-service environment
Preferred Experience
- Experience supporting HPE PCAI or other AI/HPC infrastructure and platforms
- Experience with GPU resource allocation (across instances, GPU count, and time)
- Advanced networking skills, including high-performance networking, troubleshooting, and fine-tuning
- Background in DevOps or SRE practices
- ITIL familiarity
Success Metrics
- Customers receive timely, effective support with minimal escalations
- Issues are resolved or routed correctly with high-quality documentation
- The platform maintains strong uptime and customer satisfaction