Site Reliability Engineer

5 days ago


London, Greater London, United Kingdom Oracle Full time £80,000 - £100,000 per year
Description

We are seeking an experienced Site Reliability Engineer with strong Linux troubleshooting skills and deep knowledge of virtual cloud networks and access technologies. The ideal candidate will have proven experience resolving complex issues across large-scale network infrastructure and cloud services in real time.

Responsibilities include diagnosing and resolving production incidents, writing Python and Bash scripts on the fly to support live troubleshooting and automation, and maintaining operational reliability across cloud networking environments.

Candidates should have hands-on expertise with remote access technologies such as FastConnect and IPsec, and with BGP for secure and scalable route distribution. A strong understanding of Linux system processes, memory utilisation, disk and log management, network functionality, containerisation, and the TCP/IP stack is essential.

The role involves triaging and resolving Severity 1 and 2 incidents using logs, metrics, and CLI tools under pressure, including failed changes or system and process failures that directly impact customers in a 24/7 operational environment.

Responsibilities

Work with the Virtual Networking team to share full-stack ownership of a collection of services and technology areas, providing operational support as part of an on-call rotation. Understand the end-to-end configuration, technical dependencies, and overall behavioural characteristics of production services. Take responsibility for the delivery of the mission-critical stack with a strong focus on security, resiliency, scalability, and performance.

Hold authority for end-to-end performance and operability. Partner with global development teams to define and implement improvements in service architecture. Clearly articulate the technical characteristics of services and technology areas, guiding development teams to engineer and deliver premier capabilities within the Oracle Cloud service portfolio.

Develop and communicate a clear understanding of the scale, capacity, security, and performance attributes and requirements of the service and technology stack. Demonstrate a solid grasp of automation and orchestration principles.

Act as the ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Apply a deep understanding of service topologies and their dependencies to troubleshoot issues and define mitigations. Understand and explain the impact of product architecture decisions on distributed systems.

Exhibit professional curiosity and a desire to develop a deep technical understanding of services and technologies.

Ensure that high-quality, accurate, and timely technical documentation of incidents, problems, changes, and standard operating procedures is maintained using tools such as Jira and Confluence.

Work is non-routine and highly complex, involving the application of advanced technical and business skills within the Virtual Networking specialisation of Oracle Cloud Infrastructure (OCI).

Qualifications

  • Strong understanding of virtual network architecture, security, and automation
  • Understanding of the TCP/IP stack and routing concepts in Linux systems and networking environments, specifically IPsec, VPNs, and BGP.
  • Experience with containerisation technologies and orchestration platforms.
  • Solid understanding of Virtual Cloud Networks (VCNs) in public cloud environments.
  • Experience with CI/CD systems and release automation tools.
  • Experience in scripting languages such as Python or Shell.
  • Familiarity with infrastructure automation tools such as Terraform and Chef.
  • Leadership experience to ensure that appropriate changes, upgrades, and enhancements are made based on technical analysis.
  • Experience supporting network segmentation (e.g., security lists, network security groups, or firewalls).
  • Deep understanding of manipulating telemetry data (traffic flows, health status) using Grafana dashboards and MQL.
  • Experience with major public cloud providers (e.g., Oracle Cloud Infrastructure (OCI) or equivalent).
  • Experience using Jira and Confluence for incident tracking, knowledge management, and ongoing technical documentation.

    This position requires the successful candidate to meet the criteria for UK Government security clearance. As per government policy, this normally includes holding UK citizenship.

Career Level - IC3




  • London, Greater London, United Kingdom eMFusion Global Full time £60,000 - £120,000 per year

    Job Opportunity: Freelance Site Reliability Engineer (Outside IR35) | Remote (UK-Based) | Occasional travel to Farnborough or Hammersmith. Contract until 2026. We're hiring two hands-on Site Reliability Engineers (SREs) to join a fast-moving platform team on a long-term contract. This role is ideal for engineers with strong coding skills who are comfortable...


  • London, Greater London, United Kingdom Arrows Full time £60,000 - £65,000 per year

    Site Reliability Engineer | Contract | London | Up to £600/day Inside IR35 | Hybrid. Up to £650 per day (Inside IR35). 2 days per week onsite in London. I'm working with a leading media and technology client that's building next-generation digital platforms used by millions across the UK. They're looking for an experienced Site Reliability Engineer to join...


  • London, Greater London, United Kingdom La Fosse Full time £6,600 - £66,200 per year

    Contract Opportunity: Site Reliability Engineer (Azure & AWS). Location: UK (Hybrid/Remote). Rate: £550/day (Inside IR35). Contract Length: 12 Months Initially. The client is looking for a highly skilled Site Reliability Engineer (SRE) with deep experience across Azure and AWS to take a lead role in migrating an existing on-prem HPC solution into the Cloud. You'll be...


  • London, Greater London, United Kingdom Group Full time £40,000 - £80,000 per year

    Site Reliability Engineer - UK. Optum is a global organisation that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture...


  • London, Greater London, United Kingdom Ditto Full time £60,000 - £120,000 per year

    About Ditto: Ditto is redefining how data moves at the edge. Our mission is to make it seamless for developers to build resilient, real-time applications, regardless of network conditions. Whether you're in a stadium, airplane, or remote military base, Ditto's peer-to-peer sync engine ensures devices stay connected and data stays consistent, even without...

  • Site Reliability

    7 days ago


    London, Greater London, United Kingdom Arrows Full time

    Site Reliability Engineer (Contract). Up to £650 per day (Inside IR35). 2 days per week onsite in Osterley. I'm working with a leading media and technology client that's building next-generation digital platforms used by millions across the UK. They're looking for an experienced Site Reliability Engineer to join their growing team and help drive automation,...


  • London, Greater London, United Kingdom Lloyds Banking Group Full time £81,000 - £91,500 per year

    End Date: Sunday 16 November 2025. Salary Range: £81,999 - £91,110. We support flexible working. Flexible Working Options: Job Share. Job Title: Senior Site Reliability Engineer. Salary: £70,929 – £78,810 (Chester); £81,999 - £91,110 (London only). Location: Chester...


  • London, Greater London, United Kingdom Computer Futures Full time £6,300 - £6,480 per year

    Contract Role: SRE Engineer (AWS & Azure). Location: UK. IR35: Inside. Day Rate: £525-540. Contract Length: 12 Months. We're looking for a skilled SRE (site reliability engineer) to lead the migration of an on-prem HPC solution to the cloud. You'll design and maintain scalable, reliable infrastructure across Azure and AWS, using automation and software engineering best...


  • London, Greater London, United Kingdom Leap29 Full time £100,000 - £120,000 per year

    Job Description: Senior Site Reliability Engineer (SRE) – Contract – UK-Based. Are you a proven Site Reliability Engineer with a passion for driving operational excellence and helping teams mature their SRE practices? We are working with a highly regarded cloud consultancy on a short-term assignment supporting a major digital services project. This is a...


  • London, Greater London, United Kingdom Apple Full time £60,000 - £120,000 per year

    At Apple, we build systems that power services used by hundreds of millions of people around the world, and every second counts. The Services Engineering organization is at the heart of this mission, ensuring our platforms are performant, secure, and always available. We're seeking a technically strong Site Reliability Engineer (SRE) to join our growing...