Senior Lead Site Reliability Engineer

4 weeks ago


United Kingdom JPMorgan Chase & Co. Full time

Following the successful launch of Chase in 2021, we’re a new team with a new mission. We’re creating products that solve real-world problems and put customers at the center, all in an environment that nurtures skills and helps you realize your potential. Our team is key to our success. We’re people-first. We value collaboration, curiosity and commitment.

As a Senior Lead Site Reliability Engineer at JPMorgan Chase within the Accelerators Engineering team, you are the heart of this venture, focused on getting smart ideas into the hands of our customers. You have a curious mindset, thrive in collaborative squads, and are passionate about new technology. By nature, you are also solution-oriented, commercially savvy and have a head for fintech. You enjoy working in tribes and squads that focus on specific products and projects, and depending on your strengths and interests, you'll have the opportunity to move between them.

While we’re looking for professional skills, culture is just as important to us. We understand that everyone's unique, and that diversity of thought, experience and background is what makes a good team great. By bringing people with different points of view together, we can represent everyone and truly reflect the communities we serve. This way, there's scope for you to make a huge difference to us as a company, and to our clients and business partners around the world.

Job responsibilities

  • Creates high-quality designs, roadmaps, and program charters that are delivered by you or the engineers under your guidance
  • Provides advice and mentoring to other engineers and acts as a key resource for technologists seeking advice on technical and business-related issues
  • Demonstrates site reliability principles and practices every day and champions the adoption of site reliability throughout your team
  • Collaborates with others to create and implement observability and reliability designs for complex systems that are robust, stable, and do not incur additional toil or technical debt
  • Infrastructure as code: use Terraform and GitLab CI/CD for automation, containerize our environments (Kubernetes, Helm charts), and leverage cloud technologies to meet our goals
  • Expertly manage, configure and troubleshoot operating system issues, storage (block and object), networking (VPCs, proxies and CDNs), and administer high-availability CockroachDB, PostgreSQL and Redis clusters
  • Monitoring and instrumentation: implement metrics in Prometheus and Grafana, log management and related systems, and Slack/PagerDuty integrations
  • Evolves and debugs critical components of applications and platforms
  • Provides comprehensive and ongoing guidance, tools, and solutions to support the firm’s growth
  • Makes significant contributions to JPMorgan Chase’s site reliability community via internal forums, communities of practice, guilds, and conferences

Required qualifications, capabilities, and skills

  • Advanced knowledge in site reliability culture and principles with demonstrated ability to implement site reliability within an application or platform
  • Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices with the ability to implement these practices within an application or platform
  • Proven public or private cloud experience (GCP preferred; AWS or Azure also considered)
  • Fluency in at least one programming language (e.g., Python, Java, Go)
  • Extensive Kubernetes operational experience (ideally including Istio, ArgoCD)
  • Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitHub, Terraform)
  • Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker)
  • Experience with troubleshooting common networking technologies and issues
  • Advanced knowledge and experience in observability such as white and black box monitoring, service level objectives, alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, etc.
  • Advanced knowledge of software applications and technical processes with considerable depth in one or more technical disciplines
  • Ability to communicate data-based solutions with complex reporting and visualization methods

Preferred qualifications, capabilities, and skills

  • Recognized as an active contributor to the engineering community

Our Technology Stack:

  • GCP (AWS and Azure to come)
  • Kubernetes, ArgoCD, Helm, Ambassador, Istio, Fastly
  • JVM-based languages, Go
  • Infrastructure as Code (Terraform, Crossplane)
  • Grafana Cloud
  • Pulsar, CockroachDB, HashiCorp Vault

