Senior SRE Engineer
2 weeks ago
Join our team as an HPC SRE
- Manage, optimize, and ensure the reliability of high-performance computing environments
- Be the go-to expert for all technical aspects of HPC infrastructure
- Collaborate with cross-functional teams to drive innovations aligning with business objectives
- Provide 24/7 support to maintain high availability and performance for HPC systems
- Set up HPC clusters with DGX or HGX platforms and GPU Direct, and optimize the network
- Configure and troubleshoot routing and switching (R&S) network hardware from Cisco, Juniper, or other relevant vendors
- Write, execute, and debug Ansible Playbooks for Cumulus Linux automation
- Lead investigations into high-priority incidents and prepare Root Cause Analysis (RCA) reports
- Monitor data centre health checks, licensing, and life-cycle management upgrades
- Use observability and metrics tools to monitor system health and performance
Performance:
- Continuously optimize the performance of HPC systems
- Set and meet clear Service Level Objectives (SLOs) for reliability and performance
- Define and monitor Service Level Indicators (SLIs) to ensure service quality
Requirements:
- Bachelor's or Master's degree in Telecommunications, Computer Science, Electrical and Computer Engineering (ECE), or related field
- 6+ years of proven experience in networking and data centre operations
- Expertise in networking technologies including network protocols and topologies
- Background in troubleshooting server hardware/firmware, Linux OS, and scripting
- Experience with automated configuration management systems
- Ability to handle high-pressure situations in HPC AI data centres
-
Senior SRE
2 weeks ago
London, Greater London, United Kingdom EF Education First Full time. Job Description: EF is investing big in new software innovation products for the next generation of Education experiences. We want to reinvent Learning and drive new and engaging ways for Students and Teachers to get the best out of our platform. We're looking for like-minded individuals who love to grow and solve new and exciting problems. ROLE: We are...
-
Senior SRE
2 weeks ago
London, Greater London, United Kingdom StarRez Full time. StarRez is a leading global proptech company with a strong differentiated market position focused on transforming the resident experience by providing the engagement solutions and insights critical to successful residential communities. Our team is committed to building software that positively impacts the lives of millions of residents each year. We're a...
-
SRE Engineering Manager
2 weeks ago
London, Greater London, United Kingdom Nominet Full time. Engineering Manager - Site Reliability Engineering. Location: London / Hybrid, GB. Contract Type: Permanent. Location: Hybrid (minimum 20% on-site in our London Shoreditch office). We're proud to be an Equal Opportunity and Affirmative Action Employer, and we're...
-
SRE / Site Reliability Engineer
2 weeks ago
London, Greater London, United Kingdom Durlston Partners Full time £120,000 - £150,000. Job Description: Senior SRE - Boutique HFT - Up to £150k + Bonus. Our client is a boutique HFT hiring a Senior SRE to work on their ULL infrastructure. Your role will consist of optimising the firm's core infrastructure to support ULL, 24/7 trading operations. You will spend the majority of the day coding over monitoring and implement reactive strategies to...
-
Senior Database SRE
2 weeks ago
London, Greater London, United Kingdom Sky Group Full time. We believe in better. And we make it happen. Better content. Better products. And better careers. Working in Tech, Product or Data at Sky is about building the next and the new. From broadband to broadcast, streaming to mobile, SkyQ to Sky Glass, we never stand still. We optimise and innovate. We turn big ideas into the products, content and services...
-
SRE Engineer
2 weeks ago
London, Greater London, United Kingdom eFinancialCareers Full time. TEKsystems is currently engaged with a financial services company to recruit a Site Reliability Engineer who will be responsible for delivering continuous improvement, automation and self-service offerings to operational teams across the company. Primary: Develop software to make infrastructure services self-managing and self-service. Deliver continuous service...
-
DevOps / SRE Lead
2 weeks ago
London, Greater London, United Kingdom LinuxRecruit Full time. We have an opportunity to lead a team of SREs responsible for building a new Kubernetes Product. You'll still be hands on, you'll have a background in AWS, Kubernetes and Terraform and you'll have an ability to code in Go. You'll enjoy working with a Software Engineering mindset, but you'll also enjoy building and maintaining Platforms. It's a pure DevOps...
-
Site Reliability Engineer SRE GCP or AWS
2 weeks ago
London, Greater London, United Kingdom Prism Digital Full time. Site Reliability Engineer (SRE) | GCP or AWS & Kubernetes | SaaS HealthTech | 100% Remote. After successfully placing several Engineers into the cloud team here, I am now on the lookout for another SRE to join the growing cloud team. If you are passionate about Site Reliability and you are ready for your next challenge, this 'Greenfield' project and the future...
-
SRE / Site Reliability Engineer
2 weeks ago
London, Greater London, United Kingdom NP Group Full time. Start Date: ASAP. My client is one of the leading absolute return/hedge fund managers, overseeing assets on behalf of institutional investors from around the world, including pension funds, endowments, insurance companies, government agencies, private banks and fund of funds. At least 5 years professional experience in a DevOps / SRE role. Experience building...
-
SRE / DevOps Lead
2 weeks ago
London, Greater London, United Kingdom LinuxRecruit Full time. Moving jobs can cause apprehension. It can also be worrying to think about who you might end up working with: are they good enough, do they follow the same principles as you, could you share a beer with them in an evening, or will they put your stapler in jelly.... In a unique turn of events, we're looking for two people, a Lead/Manager and a trusted...
-
Network Site Reliability Engineer SRE
2 weeks ago
London, Greater London, United Kingdom eFinancialCareers Full time. The successful Network Site Reliability Engineer / Network SRE will be based in the heart of Mayfair. Incumbents will not only receive unrivalled compensation packages, including an above-market base salary and excellent annual bonus scheme, but our client also offers flexible working hours, extensive medical benefits for both you and your family, 25+ days...
-
SRE Manager
2 weeks ago
London, Greater London, United Kingdom Vodafone Full time. Location: London OR Newbury + *Hybrid. Salary: Excellent basic salary plus bonus and Vodafone benefits. Working Hours: Full time hours per week – Mon to Fri *Hybrid. At Vodafone UK we believe that through collaboration and connection we can achieve great things. Our hybrid working approach allows our people to work both in the office and at home,...
-
SRE / Site Reliability Engineer
2 weeks ago
London, Greater London, United Kingdom Experian Health Full time. We're looking for a highly skilled and motivated Site Reliability Engineer (SRE) to join our Experian Data Quality team. As an SRE, you will be responsible for ensuring the reliability, performance, and scalability of our market leading suite of data management products, with an initial focus on observability to support incident resolution and drive...
-
Senior Site Reliability Engineer - Remote
2 weeks ago
London, Greater London, United Kingdom HOVER SENIOR LIVING COMMUNITY Full time. Senior Site Reliability Engineer - Remote. ClickHouse. Published 10 Apr 2024. UK Remote. Role Highlights: GO SQL Data Governance Computer Science Distributed Systems SRE Site Reliability Security Operations Automation Database. Tools, Libraries and Frameworks: GCP ClickHouse AWS Docker Terraform Cisco Ansible. Description: As...
-
DevOps-SRE Lead Engineer
2 weeks ago
London, Greater London, United Kingdom Lloyds Banking Group Full time £85,255 - £127,300. DevOps-SRE Lead Engineer at Lloyds Banking Group. Location: London based, 2 days per week in the office and the rest from home. Salary & Benefits: £85,255 to £127,300 per annum, plus annual personal bonus, 15% employer pension contribution, private medical insurance, 30 days holiday plus bank holidays. About us: We are part of the Business and Client...
-
SRE / Site Reliability Engineer
2 weeks ago
London, Greater London, United Kingdom Lloyds Banking Group Full time. We support agile working. Agile Working Options: Hybrid Working. JOB TITLE: Site Reliability Engineer – Homes Platform. LOCATION(S): Halifax or Leeds. HOURS: Full-time. Our work style is hybrid, which involves spending at least two days per week currently, or 40% of our time, at...
-
Site Reliability Engineer SRE
2 weeks ago
London, Greater London, United Kingdom Tec Partners Full time. Job Title: Site Reliability Engineer (Software Dev Background). Type: Permanent. Location: Fully remote. Salary: 55-65K. Our client is growing their team and is looking for a Site Reliability Engineer (ideally from a software development / software engineering background) to contribute to the development and maintenance of our cloud infrastructure, help...
-
SRE / Site Reliability Engineer
2 weeks ago
London, Greater London, United Kingdom Qurated Network Full time. Job Description: Site Engineering Manager | Cross-Border Payment Fintech. We are working with the leading cross-border payments provider that went through an IPO last year and is now completing an extensive digital transformation. You will be responsible for keeping their new technology platforms available 24/7/365 by monitoring the Performance, Reliability,...
-
London, Greater London, United Kingdom NearTech Search Full time. Senior Site Reliability Engineer (GCP, AWS, K8s), UK (remote), £120,000 + bens. An extremely well-funded and fast-growing AI-Driven Data company is in need of a new (GCP & AWS) Senior Site Reliability Engineer to join their growing tech team. They have cultivated an extremely innovative culture and working environment where the team are encouraged to...
-
Senior Software Engineer/SRE
2 weeks ago
London, Greater London, United Kingdom eFinancialCareers Full time. Site Reliability Engineers in Market Data at Bloomberg fill the mission-critical role of ensuring our complex, real-time enterprise product is healthy, automated, observable, and designed for reliability. We work at enormous scale - billions of financial ticks are being processed every day - and we ingest, enrich, and deliver it to clients within...