Site Reliability Engineering Lead

18 hours ago


London, Greater London, United Kingdom BenevolentAI Full time

**Job Overview:**

We are seeking a highly skilled Senior Site Reliability Engineer to join our team at BenevolentAI. As a key member of our squad, you will play a crucial role in ensuring the reliability and scalability of our cloud infrastructure.

The ideal candidate will have a strong background in software development, with experience in implementing cloud infrastructure solutions using best engineering practices. You will work closely with our cross-functional teams to identify areas for improvement and develop strategies to increase infrastructure availability and reliability.

We are looking for someone who is passionate about staying up-to-date with recent technology trends and tools, and has hands-on experience with Kubernetes, AWS, and other cloud technologies. Your ability to code fluently in one or more programming languages (Python/Java/Go/C++) is essential.

**Responsibilities:**

  • Implementing software solutions for cloud infrastructure in accordance with specification and best engineering practices.
  • Working towards improving long-term infrastructure availability and reliability.
  • Monitoring and handling incident response of the infrastructure, platforms and core engineering services.
  • Constructing pipelines to automate infrastructure and software deployments.
  • Troubleshooting infrastructure, network and software issues.
  • Staying up to date with recent technology trends and tools.
  • Automating repetitive manual processes and procedures.

**Requirements:**

  • Ability to code fluently in one or more programming languages (Python/Java/Go/C++).
  • Hands-on experience with Kubernetes.
  • Good understanding of, and experience administering, cloud technologies (we work with AWS, but experience with other cloud providers is also a benefit).
  • Comfortable working with Unix-based operating systems.
  • Good understanding of infrastructure-as-code and tools such as Ansible, Terraform, Helm.
  • Experience with cloud networking, cloud operations, automation and workload orchestration.
  • Understanding of service-level measures (SLIs, SLOs, SLAs).
  • Experience with monitoring and alerting solutions (for example InfluxDB/Grafana/Prometheus).
  • High-level understanding of database technologies (for example, relational, NoSQL, Graph) and their basic use cases.

**Estimated Salary:** £110,000 - £130,000 per year.



  • London, Greater London, United Kingdom BenevolentAI Full time

    About the Role: BenevolentAI is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a Senior Site Reliability Engineer, you will be responsible for designing and implementing software solutions for cloud infrastructure, improving long-term infrastructure availability and reliability, and monitoring and handling incident response of...


  • London, Greater London, United Kingdom LinuxRecruit Full time

    Join Our Team. About Us: LinuxRecruit is a leading organization dedicated to delivering innovative solutions and exceptional user experiences. We are committed to fostering a culture of excellence, collaboration, and continuous learning. By joining our team, you will be part of a dynamic and inclusive organization that values your contributions and supports...


  • London, Greater London, United Kingdom Google Full time

    Job Overview: This is an exciting opportunity to join our Site Reliability Engineering (SRE) team at Google Cloud, where you will play a critical role in ensuring the reliability and uptime of our services. As a seasoned technical leader, you will be responsible for leading teams, developing and implementing solutions, and providing expert guidance to drive...


  • London, Greater London, United Kingdom Fourier Full time

    Key Responsibilities: As a Site Reliability Engineer at Fourier, you will be responsible for designing and implementing tools to enhance the reliability and resilience of our production systems. This includes investigating failures, improving system performance, and automating manual processes. Required Skills: Excellent Python scripting skills. Experience with...


  • London, Greater London, United Kingdom Selby Jennings Full time

    About Selby Jennings: We're a leading global financial services firm where technologists and investment professionals collaborate to drive innovation and operational excellence. About the Role: As a Site Reliability Engineer, you'll apply your expertise in software and systems engineering to design, build, and maintain our robust infrastructure. You'll reduce...


  • London, Greater London, United Kingdom Rewardgateway Full time

    Engineering, London. Earn a salary of £110,000 - £130,000 per year with Reward Gateway. We are seeking an experienced Site Reliability Engineer to lead our team and drive the transformation of our operational workloads to a Service Reliability Engineering (SRE) approach. The successful candidate will be responsible for establishing and managing our new SRE...


  • London, Greater London, United Kingdom Lorien Full time

    Key Responsibilities: Collaborate with the existing team to deliver a brand-new project. Work on a hybrid model with 1 day a week on-site in London. Develop and maintain reliable and efficient systems. Utilize experience with Java, Python, Splunk, ServiceNow, and MongoDB. Contribute to incident management and application monitoring. Ensure seamless interaction...


  • London, Greater London, United Kingdom Apple Inc. Full time

    Unlock the Future of Cloud Services. At Apple Inc., we're not just building products - we're crafting experiences that our customers love and depend on. Our Apple Services Engineering (ASE) team is responsible for the systems that make these daily experiences possible. If you've used Apple products, you've likely interacted with us. Our iCloud Services SRE...


  • London, Greater London, United Kingdom J Bandy Consulting Full time

    Job Summary: J Bandy Consulting is seeking an experienced Site Reliability Engineer to join our team. The ideal candidate will have a strong background in software engineering and a passion for building scalable and reliable systems. Key Responsibilities: Develop and implement automation tools to improve the efficiency of our systems. Collaborate with...


  • London, Greater London, United Kingdom Remotestar Full time

    Remotestar is seeking a Senior Site Reliability Engineering Manager to join our client's team in the UK. The client is building a B2B marketplace for diamonds, and we need someone to ensure the reliability, scalability, and performance of our infrastructure and services. The ideal candidate will have a strong track record of building and maintaining highly...


  • London, Greater London, United Kingdom GoCardless Full time

    The Role: GoCardless is looking for a Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining the infrastructure and systems that support our payment and open banking products. Key Responsibilities: Design and implement scalable and efficient infrastructure solutions. Develop...


  • London, Greater London, United Kingdom Preqin Full time

    About the Role: Preqin is seeking an experienced Site Reliability Engineer to join our team in London. As a Site Reliability Engineer, you will work across Preqin's full suite of services, supporting our clients around the world. You will be responsible for designing, building, and operating our infrastructure, middleware, and CI/CD systems to ensure our teams...


  • London, Greater London, United Kingdom Apple Inc. Full time

    Site Reliability Engineering Manager, Apple. At Apple, we're not just building products - we're crafting experiences our customers love and depend on. Our Apple Services Engineering (ASE) team builds and supports the systems that make many of these daily experiences possible. If you've used Apple products, you've likely interacted with us. Our iCloud Services...


  • London, Greater London, United Kingdom Highfield Professional Solutions Ltd Full time

    Highfield Professional Solutions Ltd is seeking a Site Reliability Engineer to join our team in Central London. The successful candidate will be responsible for managing and maintaining critical engineering systems within our Data Centre, ensuring that they operate efficiently and effectively. This role offers a competitive salary of up to 48,000 per year,...


  • London, Greater London, United Kingdom BenevolentAI Full time

    BenevolentAI. Estimated Salary: £110,000 - £140,000 per annum. Company Overview: BenevolentAI is a leading artificial intelligence company that uses machine learning to accelerate scientific discovery. We are seeking a highly skilled Senior Site Reliability Engineer to join our team and help us maintain the reliability and scalability of our cloud...

  • Site Reliability Lead

    3 weeks ago


    London, Greater London, United Kingdom Kroo Bank Ltd Full time

    Site Reliability Engineer. We're on a mission to build the world's greatest social bank, and we believe that banking needs to change for the better. When money is used correctly, it can transform our daily lives and positively impact the planet. We're a varied team of experienced tech, customer experience, marketing, legal and banking professionals, and we're...


  • London, Greater London, United Kingdom Kinetech Full time

    At Kinetech, we're seeking a talented Site Reliability Engineer to join our team. This role is responsible for ensuring the smooth operation of our software systems, with a focus on scalability, reliability, and performance. Key Responsibilities: Design and implement CI/CD pipelines to automate code integration, testing, and deployment. Automate repetitive...


  • London, Greater London, United Kingdom Victrex Full time

    Senior Reliability Engineer Role. About the Job: We are seeking an experienced Senior Reliability Engineer to lead our asset management strategy and drive improvements in plant performance across all UK plants. Job Summary: The successful candidate will be responsible for developing and implementing systems and procedures that enhance safety, asset availability,...


  • London, Greater London, United Kingdom College of Charleston Full time

    Transformative SRE Leadership Opportunity. Are you a seasoned leader with a passion for strategy, leadership, and engineering excellence? Do you want to make a meaningful impact at a global financial institution? We're seeking a talented Site Reliability Engineering Manager to join our Operations and Technology Chief Information Office Business area. About the...


  • London, Greater London, United Kingdom Citigroup, Inc. Full time

    As a seasoned Engineering Manager with Citigroup, Inc., you will be responsible for driving operational excellence and fostering collaboration within the Site Reliability Engineering (SRE) team. To achieve this, you will create effective reporting mechanisms, manage multiple projects simultaneously, and ensure timely and budget-conscious completion of work....