Site Reliability Engineering Lead
2 weeks ago
Are you passionate about building resilient systems and empowering teams to deliver reliable cloud solutions? Do you thrive on designing and managing scalable platforms that keep services running smoothly?

About our team:
The LexisNexis Intellectual Property (IP) division (https://www.lexisnexisip.com) provides international patent content and a suite of online and analytic tools that meet the evolving needs of the intellectual property market. We deliver data to support LexisNexis IP search and analytics applications, empowering our customers with actionable insights and metrics for critical business decisions. Our corporate culture thrives on excellence, innovation, and a strong dedication to our customers, employees, and communities. Working here means joining a vibrant, diverse, and collaborative team where you are free to grow and contribute actively.

About the role:
We are seeking a highly skilled and motivated SRE and Platform/Cloud Engineering Lead to head a team responsible for ensuring the reliability, scalability, and resilience of mission-critical systems for our IP business. This role is pivotal: you will manage a small team of senior engineers, drive operational excellence, and foster a culture of continuous improvement. You will collaborate closely with the central SRE organization, as well as with IP product, development, architecture, and security teams, to implement best practices in site reliability engineering, cloud platform management, and environment support for internal development and customer systems. The Lead will drive initiatives around incident response, disaster recovery, automation, monitoring, FinOps cost optimisation, and customer support escalations. This is a junior management-level position requiring strong leadership, technical depth across cloud and infrastructure technologies, and the ability to influence both technical direction and business outcomes.
Skills & Experience:
Cloud Platforms & Services: AWS (EKS, EC2, S3, RDS, Lambda) and Azure (VMs, Functions).
Infrastructure as Code: Terraform, ARM/Bicep.
Containerization & Orchestration: Docker, Kubernetes (EKS/AKS), Helm, Argo CD.
Monitoring & Observability: Datadog, Splunk, Coralogix, CloudWatch, Azure Monitor, along with an understanding of baseline metrics.
Scripting & Automation: Python, Bash, PowerShell, TypeScript, JavaScript.
Programming Knowledge: Java, .NET/C#, SQL, React (for integration with supported products).
Systems & Networking: Linux/UNIX/Windows administration, networking, and security best practices.
Specialized Knowledge: Databricks, FinOps cost management, disaster recovery planning.
Core Competencies: Incident management, troubleshooting, IT service management frameworks, and GitOps/DevOps practices.

Soft Skills:
Solid understanding of Site Reliability Engineering (SRE) principles and practices.
Strong understanding of incident management, monitoring tools, IT service management frameworks, and automation processes.
Previous experience in customer-facing roles or managing customer support escalations.
Excellent technical problem-solving and troubleshooting abilities.
Strong communication and interpersonal skills, with the ability to collaborate across teams.
Leadership skills with a track record of mentoring and guiding technical teams.
Strong collaboration and advanced communication skills at peer and senior management level.
Strong skills in setting, communicating, implementing, and achieving business objectives and goals through indirect leadership of, and collaboration with, others.
Strong organization/project planning, time management, and change management skills across multiple functional groups and departments, plus strong delegation skills, including prioritizing and reprioritizing projects and managing projects of varying size and complexity.
Advanced problem-solving experience leading teams to identify, research, and coordinate the resources needed to troubleshoot and diagnose complex project issues; prior success translating findings into alternatives and solutions; and identifying risks, impacts, and schedule adjustments to support management decision-making.
Ability to manage multiple priorities and work effectively in a fast-paced environment.
Passion for continuous learning and staying up to date with industry trends and best practices.

Responsibilities:
The SRE and Platform/Cloud Engineering Lead will be accountable for the following areas.

Building & Leading the SRE Organization:
Hire, mentor, and lead a team of SRE and platform engineers to ensure timely and accurate performance of all team activities.
Foster a culture of reliability, blameless post-mortems, and proactive incident prevention.
Define and implement SRE best practices for reliability, scalability, and performance.

Customer & Incident Management:
Manage intake, prioritization, and resolution of critical customer-reported issues.
Act as an escalation point for high-severity incidents and outages.
In collaboration with Product Support and Development Managers, ensure SLAs, performance benchmarks, and response protocols are met.

Live System Monitoring & Support:
Design and maintain robust monitoring, alerting, and incident response systems.
In collaboration with the Product Support Manager, lead incident management from detection through resolution and post-incident analysis.
Ensure high-availability goals are met.
Oversee disaster recovery and business continuity planning within the IP Technology organization.
Support cloud resource management and workload capacity planning.
Drive automation to reduce manual intervention and improve efficiency.

Platform & Cloud Engineering:
Support product development teams with infrastructure, non-functional requirements, and environment stability.
Manage Kubernetes deployments, Databricks environments, and other critical platforms.
Collaborate with cross-functional teams to deliver secure, reliable, and cost-effective platform and cloud solutions.
Ensure all systems comply with security patching requirements and are covered by vulnerability management tools.
In collaboration with architects, support FinOps practices to monitor, optimize, and control cloud costs.

Leadership & Continuous Improvement:
Provide clear direction, performance evaluations, and career growth for team members.
Ensure proper documentation, reporting, and compliance with security and regulatory standards.
Promote continuous learning, knowledge sharing, and operational excellence.
Write and review documentation for the management, improvement, and support of platforms and assets.
Complete complex bug fixes and root-cause investigations.
Work closely with development and platform teams to understand requirements and translate them into high-quality solutions.
Implement infrastructure management and deployment best practices, including code and solution reviews.
Operate in various development environments (Agile, Waterfall, etc.) while collaborating with key stakeholders.

Why Join Us?
Join our team and contribute to a culture of innovation, collaboration, and excellence. If you are ready to advance your career and make a significant impact, we encourage you to apply.
Benefits:
Dutch Share Purchase Plan
Annual Profit Share Bonus
Comprehensive Pension Plan
Home, office or commuting allowance
Generous vacation entitlement and option for sabbatical leave
Maternity, Paternity, Adoption and Family Care leave
Flexible working hours
Personal Choice budget
Variety of online training courses and career roadshows
Wellbeing programs and gym facility in the office
Internal communities and networks
Various employee discounts
Recruitment introduction reward
Work from anywhere
Employee Assistance Program (global)
Annual Event

We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law. EEO Know Your Rights.
-
Lead Site Reliability Engineer
3 days ago
Greater London, United Kingdom | JPMorganChase | Full time
Job Description: Join us and make a real impact by shaping the future of technology at JPMorgan Chase. As a Lead Site Reliability Engineer, you’ll collaborate with talented colleagues to deliver and operate firmwide solutions that power our business. You’ll have the opportunity to grow your career, apply your technical expertise, and solve diverse...
-
Site Reliability Engineer
7 days ago
Greater London, United Kingdom | Arrows | Full time
This range is provided by Arrows. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.
Site Reliability Engineer | Contract | London | Up to £600/day Inside IR35 | Hybrid
Up to £650 per day (Inside IR35), 2 days per week onsite in London.
I'm working with a leading media and technology...
-
Site Reliability Engineer
4 days ago
Greater London, United Kingdom | Trades Workforce Solutions | Full time
Site Reliability Engineer (SC Cleared)
Duration: 12 months
Rate: £675 per day
Location: London or Manchester & remote (hybrid working)
IR35 Status: Inside
Start: ASAP
Role Overview: A Site Reliability Engineer (SC Cleared) is required for our government department to be part of a multidisciplinary team developing and supporting the client's data hub which...
-
Site Reliability Engineer
1 day ago
Greater London, United Kingdom | TP ICAP | Full time
The TP ICAP Group is a world-leading provider of market infrastructure. Our purpose is to provide clients with access to global financial and commodities markets, improving price discovery, liquidity, and distribution of data, through responsible and innovative solutions. Through our people and...
-
Site Reliability Engineer
3 weeks ago
City of London, Greater London, United Kingdom | Amelco Limited | Full time
Role: Site Reliability Engineer
Type: Full-time permanent role
Location: Hybrid (Shoreditch, London), 3 days per week
About Us: Amelco Ltd are a leading gaming and gambling solution software provider with a strong presence in the USA, UK, and Europe. Through partnerships with global gaming companies, we build cutting-edge technical platforms across sportsbooks,...
-
Lead Site Reliability Engineer
6 days ago
London, Greater London, United Kingdom | JPMorganChase | Full time | £80,000 - £120,000 per year
Description: Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Infrastructure Platforms team, you hold a leadership role in your team, demonstrate strong knowledge...
-
Lead Site Reliability Engineer
9 hours ago
London, Greater London, United Kingdom | JPMorgan Chase | Full time | £80,000 - £150,000 per year
Assume a critical role in defining the future of a globally recognized firm and have a direct and significant effect in a realm tailored for top achievers in site reliability. As a Lead Site Reliability Engineer at JPMorgan Chase within the Infrastructure Platforms team, you hold a leadership role in your team, demonstrate strong knowledge across multiple...
-
Senior Site Reliability Engineer
2 days ago
Greater London, United Kingdom | Stratospherec Ltd | Full time
Overview: Senior DevOps Engineer / Senior Site Reliability Engineer
Fully remote working for candidates based in the UK. Salary up to £90k + benefits.
We are looking for a Senior DevOps Engineer with strong C# coding knowledge combined with strong knowledge of DevOps tools such as Kubernetes (EKS or AKS) and the Azure or AWS cloud platforms. We are looking for a...
-
Lead Site Reliability Engineer
4 days ago
London, United Kingdom | JPMorganChase | Full time
Description: Join us and make a real impact by shaping the future of technology at JPMorgan Chase. As a Lead Site Reliability Engineer, you’ll collaborate with talented colleagues to deliver and operate firmwide solutions that power our business. You’ll have the opportunity to grow your career, apply your technical expertise, and solve diverse challenges across...