Site Reliability Engineer

1 week ago


London, Greater London, United Kingdom | TikTok | Full time | £62,000 - £1,100,000 per year

About The Team

The security engineering team's mission is to build security services, platforms and technologies, and to support cross-functional teams in protecting our users, products and infrastructure. In this team you'll have a unique opportunity for first-hand exposure to the company's strategy on key security initiatives, especially building scalable, secure-by-design systems and solutions. You will also have the opportunity to work through the whole lifecycle of security products and services, and you will be encouraged to participate in every phase and part of each project so that you see the whole picture of what we are working on.

We count on our site reliability engineers (SREs) to give our users the high availability and stellar performance they need to pursue their missions from a security perspective. As we expand our business, we are seeking an experienced SRE to deliver insights from massive-scale security systems in real time. You will not only address regular day-to-day technical problems but will also be encouraged to bring fresh ideas, investigate the existing infrastructure, identify problems, and develop new solutions to challenges of a kind not previously addressed by big tech. We are moving fast while expanding on a large scale (1B+ users). This role has a huge impact and plays a key part in the company's business.

Responsibilities

  • Lead or drive an SRE team to design and implement the security SRE framework for the company's security infrastructure, and build cutting-edge SRE technologies for system deployment, upgrades, capacity planning, rapid troubleshooting and disaster recovery.

  • Using cutting-edge technologies, drive the implementation and improvement of automation and intelligence in the SRE infrastructure and evolve it into a platform; build solutions to measure and monitor the availability, scalability, latency and overall system health of security products and services developed by partner teams; and improve the efficiency and sustainability of system maintenance.

  • Drive the design and development of the SRE infrastructure and maintenance tools for the full lifecycle of security system development; support the rapid iteration and system reliability of the company's security services.

  • Coordinate support for cross-functional teams and external customers using our security products and services.

  • Build, scale, manage and coach the SRE team, and drive technical decisions as a leader.

Qualifications
Minimum Qualifications

  • Bachelor's degree in Computer Science or a related field, with at least 3 years of relevant experience in developing and maintaining large-scale distributed SRE platforms/tooling with automation.

  • Solid programming skills, with mastery of at least one programming language such as Go, Java, Python or Shell, and the ability to deliver high-quality code; familiarity with at least one web framework, such as Gin, Django or Spring, with a decent understanding of its design principles.

  • Hands-on experience in debugging, troubleshooting and optimizing sophisticated distributed systems and platforms.

  • Deep understanding of operating systems (Linux, Windows) and networking (TCP/IP, HTTP, etc.), with good exposure to network, storage and computer architecture.

  • Familiarity with the architecture and working principles of Redis/MySQL/PostgreSQL databases, and with their daily operation and maintenance, including but not limited to high-availability cluster construction, monitoring, backup, fault handling and performance optimization.

  • Familiarity with cloud-native frameworks, with experience in Kubernetes. Good experience with SRE tools such as Ansible, ELK, Prometheus and Grafana.

Preferred Qualifications

  • Passion for self-studying cutting-edge technologies and staying relevant. Strong communication and collaboration skills, and a willingness to take ownership.

About TikTok
TikTok is the leading destination for short-form mobile video. At TikTok, our mission is to inspire creativity and bring joy. TikTok's global headquarters are in Los Angeles and Singapore, and we also have offices in New York City, London, Dublin, Paris, Berlin, Dubai, Jakarta, Seoul, and Tokyo.

Why Join Us
Inspiring creativity is at the core of TikTok's mission. Our innovative product is built to help people authentically express themselves, discover and connect – and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity and bring joy – a mission we work towards every day.

We strive to do great things with great people. We lead with curiosity, humility, and a desire to make an impact in a rapidly growing tech company. Every challenge is an opportunity to learn and innovate as one team. We're resilient and embrace challenges as they come. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our company, and our users. When we create and grow together, the possibilities are limitless. Join us.

Diversity & Inclusion

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.
