SRE Engineer
1 week ago
What you'll be doing
1. Responsible for providing technical leadership to drive service restoration in the event of outages, including any remedial actions, and for ensuring that all parties who are affected or can help address the problem are engaged promptly to reach a solution
2. Responsible for providing professional and effective 2nd and 3rd line support services and working with the development teams to resolve issues
3. Proactively identifies and manages risk through regular assessment and diligent execution of controls and mitigations, raising any concerns early
4. Responsible for performing and managing change activities in live and model environments to agreed timescales
5. Holds operational responsibility for developing resource plans; prioritizes and allocates work to deliver projects or their components
6. Exercises effective control through planning, monitoring, progressing and cost control of the tasks assigned to you and, where appropriate, to third-party suppliers
7. Executes metric/monitoring analysis that creates stability, security, and performance improvements
8. Designs, analyses, develops and troubleshoots highly distributed, large-scale production systems spanning on-prem and cloud-based hosting
9. Executes approaches that scale systems sustainably through mechanisms like automation, and evolves systems by pushing for changes that improve reliability and velocity
10. Inspects queue and support processing to ensure early warning of support issues
11. Implements robust monitoring and alerting systems and performs root cause analysis and post-mortems with an eye towards future prevention
12. Executes retrospective and preventive actions after each high-severity production incident
13. Analyses complex systems from a reliability and resilience perspective and identifies sources of instability in distributed systems
14. Champions, continuously develops, and shares with the team knowledge of emerging trends and changes in site reliability engineering best practices and industry standards
15. Mentors other site reliability engineers, helping to improve the team's abilities by acting as a technical resource
-
Senior Infrastructure Automation Specialist
1 week ago
Southampton, Southampton, United Kingdom
Spectrum IT Recruitment Limited
Full time
About the Role:
We are seeking a Senior DevOps Engineer to join our engineering team. As a key member of our team, you will play a crucial role in streamlining software delivery pipelines, enhancing reliability, performance, and scalability of systems, and driving continuous improvement across the software lifecycle.
Responsibilities:
- Experience with GCP as...