SRE - Observability Specialist

Reigate, United Kingdom Willis Towers Watson Full time

May 12, 2023

You will be joining Insurance Consulting and Technology (ICT) at an exciting time of transformation as we work on improving the delivery of value for customers and the business. You will be working in the Site Reliability and Response team, whose responsibility is to deliver and manage business critical services that are used 24×7 by our clients and colleagues around the world.
This role is open to flexible and hybrid working arrangements, with presence in the Reigate office on average two days per week.

**Key responsibilities**:

- Develop and implement Observability strategies for SaaS products across Technology Delivery, ensuring metrics, logs and traces are effectively captured, analysed and acted upon
- Manage the Datadog implementation, showcasing its capabilities and ensuring the tool is configured correctly and used efficiently across our services, whilst maintaining cost-effectiveness
- Advocate for Observability: a big part of the role is being able to clearly communicate the benefits of Observability to technical and non-technical stakeholders, emphasising the critical role it plays in delivering successful SaaS offerings
- Collaborate with cross-functional teams: work alongside engineering managers, product owners and operational teams, understanding their requirements and ensuring our Observability strategies are aligned with business objectives
- Define and implement appropriate monitoring and alerting standards to proactively identify and address issues, whilst minimising alert fatigue
- Provide training and support to a wide variety of teams, ensuring they are familiar with the disciplines of Observability and how to take advantage of Datadog to maximise the availability, performance and reliability of their services
- Regularly review and assess our standards and practices to ensure Datadog is used effectively across our SaaS platform

**Essential skills and experience**:

- Solid experience in an Observability, Site Reliability Engineering or similar role, such as DevOps
- Previous involvement in defining, planning and implementing Observability strategies, using Datadog or similar tools
- Understanding of cloud infrastructure and services
- Experience with Vendor Management
- Strong interpersonal skills, with the ability to work effectively with many stakeholders
- Excellent communication and presentation skills, with the ability to effectively convey complex concepts to both technical and non-technical audiences
- Experience with conducting Post-mortems or Post Incident Reviews
- Confidence in making decisions and taking ownership of projects
- You’re collaborative, enjoy problem solving and mentoring others

**Other highly desirable, but not essential skills are**:

- Familiarity with Infrastructure as Code (IaC) tools like Pulumi, Terraform, ARM Templates, or Azure Bicep
- Understanding of programming languages such as C#
- Experience with other popular monitoring tools, e.g. Prometheus, Grafana or the Elastic Stack
- Awareness of ITIL
- Understanding of Azure DevOps and CI/CD Pipelines and how to better integrate Observability into the development and deployment process

(ICT_TECH ED_2023_88R)