Principal SRE

3 months ago


Remote, United Kingdom | Tyk Technologies | Full time

Who are Tyk, and what do we do?
The Tyk API Management platform is helping to drive the connected world and power new products and services. We’re changing the way that organisations connect any number of their systems and services. Whether those systems are internal, external, public or highly encrypted, Tyk helps businesses drive value across the retail, finance, telecoms, healthcare and media industries, to name just a few.

If you’ve banked online, used an app to check the news, or perhaps even driven a connected car, APIs, and by extension Tyk, make that possible. Founded in 2015, with offices in London (UK), London (Ontario), Atlanta and Singapore, we have many thousands of users of our B2B platform across the globe. Brands using Tyk range from Lotte, Bell and T-Mobile to RBS, Capital One and Vinci. We have a varied user base hailing from every continent, even Antarctica.

Our Mission

Tyk is on a mission to connect every system in the world. We’ve started by building an API Management platform.

Total flexibility, default remote, radical responsibility

We offer unlimited paid holidays and remote working from anywhere in the world, for everyone. Why? Tyk was founded on the principle of offering flexibility and autonomy to our employees; we believe this allows them to achieve their best results. It also means we can build the best possible team, because location and working hours are no barrier.

If this sounds like an environment that you believe could work for you then read on to find out more.

The role:
We’re looking for a Principal SRE to drive the overall short-, medium- and long-term SRE strategy, working with related functions and people to shape the plans and roadmaps that allow the SRE function to grow with Tyk.

Here’s what you’ll be getting up to:

- Create operational growth and scaling plans with the Head of Delivery to ensure we are proactively planning how we scale, and what needs to change in the operating model.
- Drive the operational direction of the SRE team, working with Scrum teams to organise and allocate work, and ensure it's a fair mix of proactive and reactive work.
- Lead the drive to automation, ensuring we have runbooks, breakpoints and processes that move manual tasks to automated ones, and making space in sprints to make that happen.
- Drive SRE performance through clear, strategy-aligned performance agreements and development plans, and address underperformance in positive and constructive ways.
- Build a security-first mindset into SRE, and sit on the Security community of practice.
- Design, run and optimise the on-call process, ensuring SRE teams are protected and not overworked, and that the on-call process is equitable.
- Own the risk profile for Cloud performance and scaling, and horizon-scan to call out risks and mitigate them.
- Ensure that our production Cloud environment runs to defined SLAs through proactive monitoring, and lead the team to achieve that.
- Own our proactive alerting and monitoring strategy for Cloud, driving a coherent roadmap and working with other product domains to optimise their capabilities.
- Own the measurement and optimisation of system performance.
- Drive strategy and execution of our automation agenda with clear strategies, runbooks, and breakpoints for when we automate.
- Ensure that recommendations from post-mortems are assessed, planned for and executed.
- Be accountable for creating policies and runbooks so that everything we learn and execute is documented and repeatable, and our operational processes are documented and followed.
- Be accountable for our on-call support proposition, which gives Cloud a follow-the-sun model covering alert response, hitting SLAs for response and fix times, ticket triage, and automation of root causes.
- Own the planning and execution of software upgrades that keep Cloud optimised, such as Kubernetes version upgrades, and drive a proactive dependency roadmap for SRE.
- Understand, advocate for and lead our chosen delivery process (Scrum), adhering to the principles of Scrum execution.

**Requirements**:

- Deep SDLC knowledge
- Deep operability knowledge
- Good working knowledge of best-in-class security practices and protocols
- Team leadership and development
- Site reliability engineering knowledge
- Data-led strategy derivation and continuous improvement

We all share the same vision: we value authenticity, respect, responsibility, independence, honesty, diversity and inclusion, and, most importantly, treating others how you wish to be treated. We look for like-minded people who bring their personalities to work every day, strive to achieve their personal goals and are willing to challenge the way we do things. Why? To make what we do even better.

Our values tell the story of Tyk - here’s how:

- It’s ok to screw up

We’ve found that it’s often the ‘stupid’ or unexpected ideas that turn out to be the successful ones - so try it, at l