Principal Site Reliability Engineer

4 weeks ago


London, United Kingdom Prolific Full time

The Role

We are looking for a Principal Site Reliability Engineer to lead Site Reliability at Prolific, focusing on advancing the resilience and scalability of our GCP and AWS environments. You will play a pivotal role in overseeing and enhancing our Kubernetes clusters in GCP, which support our Django application, and in driving the SRE strategic transition to AWS, particularly towards serverless and event-driven architecture.

What you'll be doing
  • Provide strategic oversight of continuous monitoring, maintenance, and optimisation processes for our Django application, ensuring the highest levels of performance and reliability.
  • Lead the evolution of our cloud Kubernetes estate, focusing on advanced security, reliability, and observability strategies.
  • Spearhead infrastructure optimisations and architectural improvements in collaboration with cross-functional teams, addressing complex challenges and ensuring scalability.
  • Promote knowledge sharing and reduce silos across teams to strengthen resilience and reduce dependency on key individuals, increasing our Bus Factor.
  • Drive hands-on coding and system design improvements, with a focus on Python/Django, to optimise system performance and efficiency.
  • Develop comprehensive documentation and training programs to elevate the operability skills of the engineering team and foster a culture of continuous learning.
  • Support our Service Delivery response strategies by joining an out-of-hours support rota and collaborating with our Service Delivery Lead to enhance overall service quality.
  • Lead security initiatives, addressing emerging threats, ensuring robust compliance, and setting best practices for the organisation.
What you’ll bring
  • Extensive experience as a Site Reliability Engineer / Platform Engineer, with proven staff-plus leadership in managing a large-scale enterprise Kubernetes platform in GCP.
  • Deep expertise in security, compliance, and cloud architecture best practices.
  • A track record of implementing observability-first approaches and familiarity with tools like Datadog.
  • Experience in leading out-of-hours incident management and on-call rotations.
  • Demonstrated ability to mentor teams, lead strategic initiatives, and drive significant technology transformations.
  • Certification in any of the following would be an advantage, but is not required:
    • GCP Professional Cloud Architect
    • GCP Professional Cloud Security Engineer
    • GCP Professional Cloud Network Engineer
    • GCP Professional Cloud DevOps Engineer
    • CKA (Certified Kubernetes Administrator)

  • London, United Kingdom Plutus Full time

    BPP Education is entering a new phase of its growth and evolution, attracting thousands more students each year and expanding into new verticals and new markets globally. The BPP Product & Technology (P&T) organisation is evolving rapidly, and driving transformation of its platforms, digital products and experiences, in order to help BPP Education scale and...


  • London, United Kingdom Apple Inc. Full time

    Site Reliability Engineering (SRE) Manager, iCloud People at Apple don’t just build products — they craft experiences our customers love and depend on. Apple Services Engineering (ASE) builds and supports the systems that make many of these daily experiences possible. If you’ve used Apple products, you’ve likely interacted with us. iCloud Services...


  • London, United Kingdom TEKsystems Full time

    Site Reliability Engineer / SRE Description: My global client is looking for a Site Reliability Engineer / SRE to join their growing team who must have strong experience working within the financial services industry on large complex projects. To be successful in this Site Reliability / SRE project you will need expert experience within: AWS ...


  • London, United Kingdom Understanding Recruitment Full time

    Job Description: Site Reliability Engineer. I am seeking a Site Reliability Engineer for one of the world's fastest-growing social media platforms, with over 900 million daily users. The SRE group come from diverse technical backgrounds, Reliability, Software Engineering and Security Engineering, and have a broad remit ensuring high availability and performance,...


  • London, United Kingdom Understanding Recruitment Full time

    Site Reliability Engineer. I am seeking a Site Reliability Engineer for one of the world's fastest-growing social media platforms, with over 900 million daily users. The SRE group come from diverse technical backgrounds, Reliability, Software Engineering and Security Engineering, and have a broad remit ensuring high availability and performance, and...


  • London, United Kingdom Understanding Recruitment Full time

    Site Reliability Engineer. Check all associated application documentation thoroughly before clicking on the apply button at the bottom of this description. I am seeking a Site Reliability Engineer for one of the world's fastest-growing social media platforms, with over 900 million daily users. The SRE group come from diverse technical backgrounds, Reliability,...


  • London, United Kingdom Understanding Recruitment Full time

    Job Description: Site Reliability Engineer. I am seeking a Site Reliability Engineer for one of the world's fastest-growing social media platforms, with over 900 million daily users. The SRE group come from diverse technical backgrounds, Reliability, Software Engineering and Security Engineering, and have a broad remit ensuring high availability and...


  • London, United Kingdom Understanding Recruitment Full time

    Site Reliability Engineer. I am seeking a Site Reliability Engineer for one of the world's fastest-growing social media platforms, with over 900 million daily users. The SRE group come from diverse technical backgrounds, Reliability, Software Engineering and Security Engineering, and have a broad remit ensuring high availability and performance, and currently...


  • London, United Kingdom Experian Full time

    Job Description. Work that matters – what you’ll be doing: We’re looking for a Site Reliability Engineer to join our Experian Data Quality team where you will be working on cutting-edge products within our Aperture suite (Data Studio and Data Governance). This role has aspects of both reliability engineering (SRE) and test engineering (SDET)....


  • London, United Kingdom N Consulting Ltd Full time

    Job title: Site Reliability Engineer. Work mode: 3 days in office mandatory. Location: 5 Broadgate, London EC2M 2QS, United Kingdom. Contract duration: 12 months. We’re looking for a Site Reliability Engineer to: · determine the reliability of our digital products, technology services, and the infrastructure that underpins them · minimize the risk and impact of...


  • London, United Kingdom McGregor Boyall Full time

    Permanent role. £70k - £120k per annum (+ package). Sponsorship available. Location: Central London (hybrid working model). The Company: A Fortune 500 company based in Central London. The Role: As a Site Reliability Engineer you will collaborate with product development teams. You will be instrumental providing engineering...


  • London Area, United Kingdom Understanding Recruitment Full time

    Site Reliability Engineer. I am seeking a Site Reliability Engineer for one of the world's fastest-growing social media platforms, with over 900 million daily users. The SRE group come from diverse technical backgrounds, Reliability, Software Engineering and Security Engineering, and have a broad remit ensuring high availability and performance, and currently...


  • London Area, United Kingdom Understanding Recruitment Full time

    Site Reliability Engineer. I am seeking a Site Reliability Engineer for one of the world's fastest-growing social media platforms, with over 900 million daily users. The SRE group come from diverse technical backgrounds, Reliability, Software Engineering and Security Engineering, and have a broad remit ensuring high availability and performance, and...


  • London, United Kingdom eFinancialCareers Full time

    Join us as a Senior Site Reliability Engineer. We'll look to you to establish and run an SRE function to help design, build, deliver and run highly reliable, scalable and secure software systems. This is a great opportunity to hone your existing engineering skills and advance your career in this critical role. What you'll do: As a Senior Site Reliability...


  • London, United Kingdom Mondrian Alpha Full time

    Site Reliability Engineer / Windows Environment / Prestigious Hedge Fund / London. My client, a renowned hedge fund with a global presence, is in search of a seasoned Site Reliability Engineer to join their London team. As part of this team, you'll play a pivotal role in maintaining the technology infrastructure that drives the fund's operations, directly...