Current jobs related to Principal Cloud Reliability Engineer - London, Greater London - Department for Work and Pensions


  • London, Greater London, United Kingdom Arrows Full time

    Cloud Reliability Engineer. Arrows is seeking a skilled Cloud Reliability Engineer to join our team. As a Cloud Reliability Engineer, you will be responsible for ensuring the high availability and performance of our cloud services. You will work closely with our development, operations, and support teams to integrate cost-effective practices into our software...


  • London, Greater London, United Kingdom Arrows Full time

    Cloud Reliability Engineer. Arrows is seeking a skilled Cloud Reliability Engineer to join our team. As a Cloud Reliability Engineer, you will be responsible for ensuring the high availability and performance of our cloud services. Key Responsibilities: Design and implement cloud infrastructure to ensure scalability and reliability. Monitor and analyze cloud...


  • London, Greater London, United Kingdom McGregor Boyall Full time

    Lead Site Reliability Engineer. We are seeking a highly skilled Lead Site Reliability Engineer to join our team at McGregor Boyall. As a key member of our engineering team, you will be responsible for ensuring the reliability and scalability of our cloud-based systems. Key Responsibilities: Design and implement scalable and reliable cloud infrastructure on...


  • Cloud Engineer

    2 weeks ago


    London, Greater London, United Kingdom Uniting Cloud Full time

    Cloud Engineer. Uniting Cloud is seeking a skilled Cloud Engineer to lead our DevOps and SRE efforts. As a key member of our team, you will be responsible for setting up, refining, and owning our platform, as well as building a team as the business grows. About the Company: Uniting Cloud is committed to simplifying our industry with technology. Our team is small...


  • London, Greater London, United Kingdom Arrows Full time

    Job Title: Cloud Reliability Engineer. Arrows is seeking a highly skilled Cloud Reliability Engineer to join our team. As a Cloud Reliability Engineer, you will be responsible for ensuring the high availability and performance of our cloud services. Key Responsibilities: Help produce and maintain cloud spending dashboards to illustrate cloud spending...


  • London, Greater London, United Kingdom Arrows Full time

    Job Title: Cloud Reliability Engineer. Arrows is seeking a skilled Cloud Reliability Engineer to join our team. As a Cloud Reliability Engineer, you will be responsible for ensuring the high availability and performance of our cloud services. Key Responsibilities: Help produce and maintain cloud spending dashboards to illustrate cloud spending improvements over...


  • London, Greater London, United Kingdom Moralis Full time

    Principal DevOps Engineer. We are seeking a highly skilled Principal DevOps Engineer to lead our cloud infrastructure strategy, ensuring it scales securely, efficiently, and reliably. As a senior technical leader, you will drive the adoption of best DevOps practices across the organisation, architect cloud solutions, and mentor teams to achieve operational...


  • London, Greater London, United Kingdom McGregor Boyall Full time

    Lead Site Reliability Engineer. A leading provider of financial services is seeking two experienced professionals to lead their Site Reliability Engineering team. The ideal candidate will have a solid background in Azure or GCP and a proven track record of optimizing existing systems, building new PaaS and IaaS solutions, and eliminating manual work through...


  • London, Greater London, United Kingdom Arrows Full time

    Job Title: Cloud Reliability Engineer. Arrows is seeking a highly skilled Cloud Reliability Engineer to join our team. As a key member of our cloud infrastructure team, you will be responsible for ensuring the high availability and performance of our cloud services. Key Responsibilities: Design and implement cloud infrastructure solutions to optimize resource...

Principal Cloud Reliability Engineer

2 months ago


London, Greater London, United Kingdom Department for Work and Pensions Full time

Position Overview

Are you adept at managing stakeholder relationships effectively?

Do you enjoy diagnosing issues and creating automated solutions to prevent future occurrences?

If this resonates with you, we would be eager to connect.

In the role of Senior Site Reliability Engineer, you will champion the implementation of SRE best practices throughout our cloud infrastructure.

Leveraging both your interpersonal skills and technical expertise, you will collaborate with various teams to ensure compliance with standards and governance during the onboarding of our services into the cloud, facilitated by a structured assessment process. This will ensure that our public-facing applications meet all operational and security requirements necessary for production environments.

Your responsibilities will include executing deployments through established runbooks, investigating production incidents, and providing dedicated support to teams in identifying root causes.

You will focus on minimizing manual work and enhancing automation by developing reliable systems that decrease the time to resolution and reduce costs associated with repetitive tasks.

Collaboration with development teams will be key, as you will provide guidance on best practices and ensure that application monitoring is effectively implemented.

Successful candidates will be expected to participate in on-call services to assist in restoring operations, utilizing either runbooks or technical knowledge.

Travel to other digital hubs may be required periodically, with the frequency to be discussed further upon selection.

Please note that this position requires successful clearance. For more details, refer to the 'Selection process details'.

Role Responsibilities

The SRE team will empower you to collaborate with application teams across the organization to develop reliable and secure solutions for citizens throughout the UK.

You will engage with development teams from the design phase, guiding them to adhere to best practices and departmental standards while constructing their application infrastructure.

Key responsibilities include:

  • Providing authoritative advice and guidance to internal and external stakeholders.
  • Designing and developing methodologies to enhance application reliability, including runbooks and knowledge transfer to the User Experience Command Centre (UXCC), alongside ongoing SRE strategies within your Functional and Professional Communities.
  • Managing the error budget established with the product owner for the application, ensuring a balanced workload in alignment with it.
  • Acting as the primary contact for investigating and resolving major or complex incidents, ensuring that skilled personnel are readily available to respond effectively.
  • Evaluating the impact of change requests in consultation with stakeholders, offering technical expertise and authorizing subsequent changes.
  • Overseeing on-call rotations to ensure all applications have out-of-hours SRE coverage.
  • Coaching and mentoring application development and operations engineers in SRE practices and techniques.
  • Conducting retrospectives for all high-priority and major incidents, ensuring they are completed promptly and documented.
  • Regularly soliciting feedback and ideas from stakeholders and team members for improvements, fostering collaboration and innovation.
  • Participating in interdepartmental discussions and meetings with various external organizations, leading community discussions on SRE best practices within Engineering.

Candidate Profile

When detailing your employment history and personal statement, please emphasize your experience in relation to the essential criteria outlined below:

  • LEAD CRITERIA: Proficiency in scripting to automate processes and eliminate manual tasks, including infrastructure and configuration as code.
  • Experience in building and enhancing CI/CD pipelines.
  • Expertise in resolving complex technical incidents.
  • Experience in reliability engineering, including capacity and performance management through monitoring, logging, and alerting.
  • Familiarity with orchestration platforms and tools for managing containerized applications.
  • Experience in engaging with stakeholders at various levels to provide feedback and support.

An initial assessment may be conducted based on the lead criteria mentioned above. Candidates who pass this initial assessment will proceed to a comprehensive evaluation.

Benefits

• Employer pension contribution of up to
• Annual leave increasing up to 30 days, based on your working pattern.
• Family-friendly flexible working arrangements, including hybrid working, job sharing, term-time working, flexi-time, and compressed hours.
• Tailored learning and development opportunities, which may include industry-recognized qualifications, coaching, and mentoring.
• An inclusive and diverse workplace with opportunities to join staff networks such as the Women’s Network, National Race Network, National Disability Network (THRIVE), and more.

Salary Information

Compensation for this role ranges from £52,412 to £78,517.

The maximum salary for this grade is £63,517; however, a Digital Allowance of up to £15,000 per annum is available for exceptional candidates, based on our assessment of your skills and experience.

Our offer to successful candidates will be determined by an evaluation of your skills and experience as demonstrated during the interview process.

Current Civil Servants who secure a new role through lateral transfer will maintain their existing salary.

Current Civil Servants who receive a promotion may transition to the bottom of the next grade pay scale or receive a 10% salary increase, whichever is greater.
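
For readers who want to check the figures, the pay rules above can be sketched as a short calculation. This is an illustrative sketch only, not an official DWP calculator: the function names are hypothetical, the example current salaries are invented, and it assumes that the "bottom of the next grade pay scale" for this role is the £52,412 minimum quoted above.

```python
from decimal import Decimal

# Figures quoted in the posting.
GRADE_MINIMUM = Decimal("52412")
GRADE_MAXIMUM = Decimal("63517")
DIGITAL_ALLOWANCE_MAX = Decimal("15000")


def maximum_total_pay() -> Decimal:
    """Grade maximum plus the largest possible Digital Allowance."""
    return GRADE_MAXIMUM + DIGITAL_ALLOWANCE_MAX


def promotion_salary(current_salary: Decimal,
                     next_grade_minimum: Decimal = GRADE_MINIMUM) -> Decimal:
    """Greater of the next grade's minimum and a 10% increase on current pay.

    Assumption: the next grade's minimum is the £52,412 quoted for this role.
    """
    ten_percent_increase = current_salary * Decimal("1.10")
    return max(next_grade_minimum, ten_percent_increase)


# Hypothetical current salaries, chosen only to show both branches of the rule.
print(maximum_total_pay())                  # top of the advertised range
print(promotion_salary(Decimal("45000")))   # grade minimum is the greater value
print(promotion_salary(Decimal("50000")))   # 10% increase is the greater value
```

Under these assumptions, the grade maximum plus the full allowance matches the £78,517 upper figure in the advertised range. Any allowance actually offered depends on the assessment described above.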