Principal Site Reliability Engineer



London, Warwick Court, United Kingdom | myGwork | Full time

This inclusive employer is a member of myGwork – the largest global platform for the LGBTQ+ business community.

There is a place for you at T. Rowe Price to grow, contribute, learn, and make a difference. We are a premier asset manager focused on delivering global investment management excellence and retirement services that investors can rely on today and in the future. The work we do matters. We invite you to explore the opportunity to join us and grow your career with us.

Job Title: Principal Site Reliability Engineer (SRE)

Department: CDO Technology Group

Summary:

We are seeking a highly motivated and experienced Principal Site Reliability Engineer (SRE) to join the CDO Technology leadership team to stand up and lead the SRE function within CDO Technology. In this role, you will be responsible for ensuring the availability, latency, performance, efficiency, and stability of our critical infrastructure, which supports a range of data platforms, applications, and services. You will collaborate closely with development teams to implement and maintain reliable and scalable systems while adhering to industry best practices and security standards.

Responsibilities:

Availability:

  • Proactively monitor our systems and identify potential issues that could impact their availability.

  • Implement and maintain automated alerting mechanisms to notify the appropriate parties of potential outages or performance degradation.

  • Collaborate with development teams to design and implement solutions that enhance system resilience and reduce downtime.

Latency:

  • Analyze performance metrics to identify and resolve latency bottlenecks in our infrastructure.

  • Implement performance optimization techniques and tools to improve the overall responsiveness of our systems.

  • Work with development teams to ensure that new features and code changes do not introduce performance regressions.

Performance:

  • Develop and maintain metrics dashboards to track key performance indicators (KPIs) for our critical systems.

  • Identify performance trends and anomalies that may indicate potential issues or areas for improvement.

  • Recommend and implement performance optimization strategies to enhance the overall efficiency of our systems.

Efficiency:

  • Optimize resource utilization and minimize unnecessary expenditure on IT infrastructure.

  • Identify and implement cost-effective solutions to improve the efficiency of our IT operations.

Release Management:

  • Design and implement automated deployment and rollback procedures to mitigate risks associated with software updates.

  • Monitor the performance of new releases and promptly address any issues that arise.

  • Lead the team responsible for executing release management.

Monitoring:

  • Design, implement, and maintain a comprehensive monitoring infrastructure to track the health and performance of our systems.

  • Analyze monitoring data to identify potential issues and proactively troubleshoot problems before they impact users.

  • Develop and implement alerts and notifications for critical events to ensure timely intervention.

Emergency Response:

  • Build and lead the team that responds to incidents and works collaboratively to resolve them promptly.

  • Analyze root causes of incidents to identify and implement preventive measures to minimize their recurrence.

  • Document incident responses and communicate lessons learned to enhance our incident handling processes.

  • Collaborate with your peers on the leadership team to define a multi-year technical roadmap. Stay up to date with industry developments and enterprise infrastructure, and anticipate significant risks.

  • Work with development teams to review architecture designs and ensure high availability and appropriate disaster recovery strategies.

  • Collaborate with the reliability and infrastructure engineering teams at T. Rowe Price to create synergies in the tooling used to implement observability, tracing, and alerting.

Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or a related field preferred.

  • 10+ years of experience as a Site Reliability Engineer or in an equivalent role.

  • Proven experience in monitoring, analyzing, and optimizing the performance of large-scale distributed systems.

  • Expertise in Linux systems administration, including managing servers, operating systems, and network configurations.

  • Strong scripting and automation skills, preferably with experience in Bash, Python, or similar languages.

  • Familiarity with AWS.

  • Experience with DevOps tools and practices, such as GitLab CI/CD and Docker.

  • Excellent troubleshooting and problem-solving skills with a knack for identifying and resolving complex technical issues.

  • Ability to work independently and as part of a collaborative team, effectively communicating technical concepts to both technical and non-technical stakeholders.

  • A passion for maintaining high availability, performance, and reliability of critical systems in a fast-paced financial environment.

Benefits:

  • Competitive salary and comprehensive benefits package.

  • Opportunity to work with cutting-edge technologies and contribute to the development of innovative solutions.

  • Collaborative and supportive work environment with a focus on continuous learning and professional development.

T. Rowe Price operates a hybrid working model, with a minimum of two days per week expected in the London office.

Commitment to Diversity, Equity, and Inclusion:

We strive for equity, equality, and opportunity for all associates. When we embrace the power of diversity and create an environment where people can bring their authentic and best selves to work, our firm is stronger, and we create greater value for our clients. Our commitment and inclusive programming aim to lift the experience for each associate and build allies for our global associate community. We know that a sense of belonging is key not only to your success at the firm, but also to your ability to bring your best each day.

T. Rowe Price is an equal opportunity employer and values diversity of thought, gender, and race. We believe our continued success depends upon the equal treatment of all associates and applicants for employment without discrimination on the basis of race, religion, creed, colour, national origin, sex, gender, age, mental or physical disability, marital status, sexual orientation, gender identity or expression, citizenship status, military or veteran status, pregnancy, or any other classification protected by country, federal, state, or local law.
