Site Reliability Engineer
3 days ago
At Tote, we're on a mission to deliver a seamless and reliable digital experience for racing fans across the UK and beyond. As a Site Reliability Engineer (SRE), you'll play a critical role in keeping our online platforms and infrastructure fast, stable, and scalable, especially during the most exciting moments in the racing calendar. This is an opportunity to shape how we build, monitor, and continuously improve our systems while working in a collaborative, forward-thinking engineering culture.
What You'll Be Doing
In this role, you'll be at the forefront of ensuring the reliability, performance, and scalability of Tote's digital ecosystem. You'll monitor live production systems, using observability tools to detect potential issues before they impact users, and take proactive steps to optimise system performance and stability. You'll analyse telemetry data, identify bottlenecks, and drive improvements across our infrastructure and applications.
You'll lead the development of our SRE strategy, defining standards, best practices, and ways of working that embed reliability into everything we build. Working closely with engineering, operations, and product teams, you'll help shape our SLAs, SLOs, and error budgets to align with business priorities.
Performance and resilience will be at the heart of what you do. You'll design and implement performance testing strategies to simulate peak traffic and ensure our systems remain stable during major racing events. You'll build intuitive dashboards, refine alerting systems, and create tools that provide clear visibility into system health, enabling the wider business to make confident, data-driven decisions.
Collaboration is key in this role. You'll work alongside software engineers to design scalable solutions, with compliance teams to meet internal and regulatory standards, and with operations to ensure smooth deployment and monitoring. You'll also play a crucial role in incident management, from leading real-time response efforts to conducting thorough post-incident reviews that identify root causes and long-term improvements.
What We Are Looking For
We're looking for an engineer who thrives on solving complex challenges and improving how systems perform at scale. You'll have a deep understanding of system reliability, performance optimisation, and cloud-native architectures. You'll bring strong hands-on experience with modern observability tools such as Grafana, Prometheus, and OpenTelemetry, as well as a solid grasp of distributed systems and networking fundamentals.
You'll be confident working with infrastructure-as-code tools (like Terraform) and container orchestration platforms such as Kubernetes. Experience in cloud environments, ideally AWS, will be highly beneficial. You'll also be comfortable coding in at least one modern programming language such as Go or .NET, using your technical expertise to automate processes, build internal tools, and debug complex issues.
Beyond your technical skills, you'll bring a calm, analytical mindset to high-pressure situations. You'll be the kind of person who thrives during live incidents — focused, clear-headed, and methodical — ensuring our teams can respond quickly and effectively. You'll also be an advocate for modern engineering practices, championing DevOps culture, CI/CD pipelines, and automation wherever possible.
Finally, communication is key. You'll work closely with engineers, operations specialists, and product managers, translating technical insights into clear, actionable information for all stakeholders. You'll value transparency, collaboration, and the shared goal of keeping our systems reliable and our customers delighted.
What's in it for you?
At the Tote you can expect a friendly working environment with a strong sense of teamwork and pride in what we do. Within this role you'll develop a broad range of skills and experiences that can enhance your career at the Tote. Additionally, our company benefits package includes:
- Competitive Basic Salary
- Discretionary Bonus Scheme
- Company Shares Option Plan
- Contributory pension scheme
- Life insurance (4 x basic salary)
- Simply Health Cash Plan
- Holiday entitlement (33 days inclusive of bank holidays)
- Study Support and opportunity for progression and development
- Confidential 24/7 365 employee assistance helpline
- Agile and collaborative office environment with free parking, fruit, biscuits, and drinks
- Regular social events, charity events and volunteering opportunities
-
Site Reliability Engineer
2 weeks ago
Wigan, Greater Manchester, United Kingdom | Searchability® | Full time
SITE RELIABILITY ENGINEER, £70,000 p/a. Join a growing, technology-driven business operating at scale within the online gaming and sports sector. Opportunity to shape the SRE strategy. ABOUT THE CLIENT: Our client is a fast-growing digital technology company at the forefront of delivering high-availability platforms for the sports and gaming industry. They...
-
Site Reliability Engineer
1 day ago
Greater Manchester, United Kingdom | CustomAir | Full time
Overview: CustomAir, Greater Manchester, England, United Kingdom. At Tote, we're on a mission to deliver a seamless and reliable digital experience for racing fans across the UK and beyond. As a Site Reliability Engineer (SRE), you'll play a critical role in keeping our online platforms and infrastructure fast, stable, and scalable, especially during the most...
-
Site Reliability Engineer
4 weeks ago
Manchester, United Kingdom | Searchability | Full time
SITE RELIABILITY ENGINEER, £40k salary. Join a growing, technology-driven business operating at scale within the online gaming and sports sector. Opportunity to shape the SRE strategy. ABOUT THE CLIENT: Our client is a fast-growing digital technology company at the forefront of delivering high-availability platforms for the sports and gaming industry. They...
-
Site Reliability Engineer
2 weeks ago
Greater Manchester, United Kingdom | UK Tote Group | Full time
At Tote, we're on a mission to deliver a seamless and reliable digital experience for racing fans across the UK and beyond. As a Site Reliability Engineer (SRE), you'll play a critical role in keeping our online platforms and infrastructure fast, stable, and scalable, especially during the most exciting moments in the racing calendar. This is an...
-
Site Reliability Engineer
4 weeks ago
Bolton, Greater Manchester, United Kingdom | Caspian One | Full time
We're building a Centralised SRE team to champion reliability engineering across global technology infrastructure. As a Senior Site Reliability Engineer, you'll be at the forefront of this transformation: engineering scalable systems, automating operations, and embedding resilience into every layer of the tech stack. This isn't just about keeping the lights...
-
Site Reliability Engineer
2 weeks ago
Manchester, United Kingdom | hackajob | Full time
Company Description: At bet365, we're one of the world's leading online gambling companies, revolutionising the industry since 2000. Founded by Denise Coates CBE, we now employ over 9,000 people and serve over 100 million customers in 27 languages. Our focus on In-Play betting has solidified our...
-
Site Reliability Engineer
5 days ago
Manchester, United Kingdom | Lorien | Full time
Site Reliability Engineer, 12-month contract. We are looking for an experienced Site Reliability Engineer to come in on a 12-month contract with one of our Public Sector clients. We're interested in people who:
- can demonstrate a working familiarity with at least one programming language such as Ruby, JavaScript, or Go
- are experienced with AWS and...
-
Site Reliability Engineer
4 weeks ago
Manchester, United Kingdom | Anson McCade | Full time
Job Description. About the Role: Are you passionate about building resilient systems and eliminating operational toil through automation? We're looking for a Site Reliability Engineer (SRE) to join our high-impact team and help shape the future of our digital infrastructure. As an SRE, you'll blend software engineering with systems engineering to ensure the...
-
Site Reliability Engineer
2 weeks ago
Manchester, United Kingdom | hackajob | Full time
hackajob is collaborating with Bet365 to connect them with exceptional tech professionals for this role. Company Description: At bet365, we're one of the world's leading online gambling companies, revolutionising the industry since 2000. Founded by Denise Coates CBE, we now employ over 9,000 people and serve over 100 million customers in 27 languages. Our focus...