System Reliability Engineer
3 weeks ago
We're seeking a highly skilled System Reliability Engineer to join our team as our first dedicated SRE/DevOps hire. This role offers an exciting opportunity to design, implement, and manage our infrastructure, CI/CD pipelines, and production operations from the ground up. You'll have autonomy in shaping our tech stack, defining best practices, and building scalable systems that will set the foundation for future engineering growth.
We're offering a competitive salary of $120,000 - $180,000 per year, depending on experience. If you thrive in startup environments and enjoy the blend of software engineering, operations, and infrastructure, we'd love to hear from you.
You should have 4+ years of experience in a DevOps, SRE, or related role, with hands-on experience building and maintaining infrastructure. You should be proficient with Infrastructure as Code (IaC) and GitOps tools in the Kubernetes ecosystem, such as Terraform and FluxCD. Solid experience with Docker and Kubernetes for container management is also required.
Key Responsibilities:
- Set Up and Manage Infrastructure:
  - Design, build, and maintain a robust, cloud-based infrastructure on Azure
  - Develop and maintain infrastructure as code (IaC) using tools like Terraform
  - Own the reliability and scalability of our systems, laying a strong foundation for our engineering environment
- Deploy and Orchestrate Containers:
  - Use Kubernetes and Docker to manage containerised applications, ensuring high availability, scaling, and resource optimisation
  - Set up and manage Kubernetes clusters to support reliable and scalable infrastructure
- Develop CI/CD Pipelines:
  - Design and implement CI/CD pipelines to automate our build, test, and deployment processes
  - Collaborate with development teams to streamline code integration and ensure high-quality releases
- Implement Monitoring and Incident Management:
  - Set up proactive monitoring, logging, and alerting systems to detect and resolve issues before they impact users
  - Develop and refine incident response protocols and conduct root cause analyses to continuously improve system reliability
Requirements:
- 4+ years of experience in a DevOps, SRE, or related role with hands-on experience building and maintaining infrastructure
- Expertise with a major cloud provider (preferably Azure)
- Proficiency with Infrastructure as Code (IaC) and GitOps tools in the Kubernetes ecosystem, such as Terraform and FluxCD
- Solid experience with Docker and Kubernetes for container management
- Knowledge of CI/CD tools (e.g., Jenkins, GitLab CI, CircleCI, GitHub Actions) and experience setting up automated workflows
- Familiarity with monitoring and logging tools like Prometheus, Grafana, ELK stack, or DataDog
- Strong scripting skills (Python, Bash, or similar) for automation and tooling
Benefits:
- Competitive salary of $120,000 - $180,000 per year
- Opportunities for career growth and professional development
- Collaborative and dynamic work environment
-
Reliable Systems Engineer
2 weeks ago
London, Greater London, United Kingdom Apple Inc. Full time
About the Role: We are seeking a highly skilled Reliable Systems Engineer to join our team at Apple Inc. This individual will be responsible for designing and developing scalable and reliable systems that meet the needs of our customers. Responsibilities: Design and develop scalable and reliable systems. Collaborate with development teams to identify system...
-
Reliability Systems Engineer
3 days ago
London, Greater London, United Kingdom Leap29 Full time
Leap29 is a leading provider of cloud implementation, application development, and managed services, with a strong focus on reliability and efficiency. As a Reliability Systems Engineer, you will be responsible for ensuring the seamless operation of their systems, meeting the high expectations of their clients. Responsibilities: Ensuring system reliability and...
-
System Reliability Engineer
3 weeks ago
London, Greater London, United Kingdom TRIA Full time £60,000 - £70,000
TRIA is seeking a highly skilled System Reliability Engineer to join our team. Job Description: You will be responsible for designing, building, and maintaining scalable and reliable systems that meet the needs of our business. Develop and implement automation scripts using tools like Ansible or Terraform. Liaise with the Platform team to ensure alignment with...
-
System Reliability Engineer
2 months ago
London, Greater London, United Kingdom Google Full time
Job Description: As a System Reliability Engineer at Google, you will play a critical role in ensuring the reliability and scalability of our systems. You will work closely with cross-functional teams to design, deploy, and operate large-scale systems that are fault-tolerant and highly available. Your expertise will help us build and maintain infrastructure...
-
Site Reliability Systems Engineer
3 weeks ago
London, Greater London, United Kingdom Google Full time
We are seeking an experienced Site Reliability Systems Engineer to join our Site Reliability Engineering team at Google. In this role, you will be responsible for designing, building, and maintaining large-scale distributed systems that support Google's product portfolio. As a Site Reliability Systems Engineer, you will work closely with cross-functional...
-
Electrical Systems Design Engineer
2 months ago
London, Greater London, United Kingdom AYS System Full time
Role Overview: We are seeking an experienced Electrical Systems Design Engineer to join our team at AYS System. This is a fantastic opportunity for a motivated and skilled professional to take on new challenges and contribute to the success of our organization. Job Description: The successful candidate will be responsible for designing, developing, and...
-
System Reliability Engineer
4 days ago
London, Greater London, United Kingdom Proactive Appointments Full time
Estimated Salary: $90,000 - $120,000 per year. About the Job: This System Reliability Engineer position involves overseeing daily operations, ensuring seamless network performance, and participating in disaster recovery and business continuity planning. Key Responsibilities: Prioritize and efficiently handle 'Keep-The-Lights-On' (KTLO)...
-
Electrical Systems Specialist
2 weeks ago
London, Greater London, United Kingdom AYS System Full time
Role Summary: We are seeking an experienced Electrical Systems Specialist to join our team at AYS System. The ideal candidate will have a strong background in electrical engineering, with a focus on designing and developing high-voltage control systems. This is an excellent opportunity for a motivated professional to take their career to the next level and work...
-
Reliability Systems Engineer
3 days ago
London, Greater London, United Kingdom Arcus Search Full time
About Arcus Search: Arcus Search is a leading organization in the financial services industry, based in London. Job Summary: We are seeking a Senior Site Reliability Engineer to join our established technology team. This role will provide an exciting opportunity to work on cutting-edge infrastructure, middleware, and CI/CD systems, driving performance,...
-
London, Greater London, United Kingdom AVT Reliability Ltd Full time
About AVT Reliability Ltd: We are a leading company in the field of asset integrity and reliability. Our team is passionate about delivering high-quality services to our clients. Job Summary: This is an exciting opportunity for a talented engineering graduate to join our Asset Integrity Division as a specialist. You will be responsible for supporting a diverse...
-
System Reliability Architect
3 weeks ago
London, Greater London, United Kingdom Bumble Full time
Job Description: We are seeking a skilled System Reliability Architect to join our team at Bumble Inc. Estimated salary: $120,000 - $180,000 per year. About the Role: As a System Reliability Architect, you will be responsible for ensuring the reliability, scalability, and performance of our software systems. This involves proactively managing, automating, and...
-
Reliable System Architect
2 days ago
London, Greater London, United Kingdom Promote Project Full time
We are seeking an experienced Reliable System Architect to join our team. As a Reliable System Architect, you will be responsible for designing and implementing a reliable and scalable system architecture that meets the needs of Neon's growing customer base. About the Role: Design and implement a reliable and scalable system architecture that meets the needs of...
-
System Reliability Specialist
2 days ago
London, Greater London, United Kingdom EFG Full time
EFG, a pioneer in the esports world, is looking for an exceptional System Reliability Specialist to enhance their technical capabilities. This position comes with an estimated annual salary of $150,000 - $220,000. About the Role: In this role, you will be responsible for ensuring the stability and performance of EFG's systems, including monitoring,...
-
Reliability Engineer
1 week ago
London, Greater London, United Kingdom The Blackstone Group L.P. Full time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at The Blackstone Group L.P. in London. The successful candidate will be responsible for improving the reliability of systems and services across the firm, working closely with service owners to design, implement, and manage services for continuous improvements. Key...
-
Cloud Systems Reliability Engineer
1 week ago
London, Greater London, United Kingdom Google Full time
About the Role: We are seeking a highly skilled Cloud Systems Reliability Engineer to join our team at Google. As a key member of our Technical Infrastructure team, you will be responsible for designing, building, and maintaining large-scale, fault-tolerant systems that power our cloud platforms. Your primary focus will be on optimizing existing systems,...
-
Payment System Reliability Engineer
1 week ago
London, Greater London, United Kingdom Apple Inc. Full time
Apple Inc. is seeking an experienced Payment System Reliability Engineer to join our Wallets & Payments team in London. As a key member of our team, you'll be responsible for designing, deploying, and maintaining our payment systems to ensure they meet the highest standards of scalability, reliability, and security. Key Responsibilities: Design and implement...
-
Reliability Engineer
3 days ago
London, Greater London, United Kingdom BAE Systems Full time
Job Title: We are seeking a highly skilled Reliability Engineer to join our team in Warton. As a Reliability Engineer, you will be responsible for undertaking reliability and maintainability analysis and engineering tasks to influence the design and future sustainment of platforms and systems. Your Key Responsibilities: Undertake reliability and maintainability...
-
System Reliability Expert
2 weeks ago
London, Greater London, United Kingdom Anson McCade Full time £65,000 - £85,000
Anson McCade is seeking a seasoned System Reliability Expert to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our systems and applications. About the Job: You will collaborate with engineers to identify and implement improvements in system architecture and...
-
London, Greater London, United Kingdom ENGINEERINGUK Full time
Job Summary: The role of a Robotics Systems Engineer in the Reliability and Automation Engineering Team involves working with cross-functional teams to drive the implementation and continuous improvement of world-class maintenance, repair, and supportability solutions for Amazon Robotics portfolio. You will analyze large-scale data from databases, PLCs,...
-
Reliability Engineer
3 days ago
London, Greater London, United Kingdom Jump Trading Full time
Job Summary: We are seeking a highly skilled Reliability Engineer - Trading Systems to join our team at Jump Trading. As a key member of our operations team, you will be responsible for ensuring the smooth operation of our trading systems, providing technical support to our traders and developers, and contributing to the design and implementation of new...