Site Reliability Engineer, Lead
3 weeks ago
Our client is a remote-first company with team members across the globe, offering a SaaS-based Learning Management System that powers the world's leading education programs. Our client helps large brands and fast-moving companies increase revenue, improve customer retention, and decrease support costs through external education. The platform includes all the tools an organization needs to create, manage, track, and improve highly personalized learning experiences for customers, partners, and employees.
Successful Candidate:
- SaaS experience
- Experienced in, and able to thrive in, a small-to-medium, high-growth environment
- Invested in upskilling, learning new tech
- Deeply curious, creative, and innovative
- Flexible in working hours and able to collaborate across different time zones
The Lead Site Reliability Engineer has a pivotal role at the forefront of our engineering operations, responsible for guiding the Platform Team toward achieving exceptional standards of reliability, performance, and stability across all our applications. The successful candidate will possess deep expertise in these core areas and will be instrumental in defining and implementing industry-leading practices. As a key leader, this role will not only shape the strategic direction of our platform operations but also establish the benchmarks and processes by which our engineering excellence is measured.
Responsibilities
- Lead the SRE Team, setting clear goals and priorities in line with business objectives. In collaboration with the department Director, develop and execute strategies that enhance technological capabilities across the company.
- Ensure all platforms and systems operate smoothly and remain highly available, scalable, and fault-tolerant. Implement best practices for continuous monitoring, preventive maintenance, and rapid response.
- Continuously assess system performance, identify bottlenecks, and make data-driven recommendations for infrastructure enhancements.
- Ensure that developers have access to the best tools and platforms to facilitate efficient coding practices and to understand the performance of applications.
- Educate the rest of engineering about best practices for writing performant code, and help troubleshoot problematic areas.
- Develop and refine incident management protocols. Lead efforts to troubleshoot and resolve high-impact issues, minimizing downtime and preventing future occurrences.
- Work closely with other engineering teams and departments to understand their needs and ensure platform initiatives support overall company goals.
- Monitor virtual infrastructure and be part of a 24x7 on-call rotation to respond to alerts
Requirements
- 8+ years of experience as a software engineer
- 5+ years of experience working with Ruby on Rails
- Proven experience leading SRE teams
- 3+ years of experience working in infrastructure and operations
- Expertise with SQL databases such as PostgreSQL
- Experience with cloud computing platforms such as Amazon Web Services and/or Google Cloud
- Ability to dig into unfamiliar code bases
- Ability to document solutions and train operational teams on supportability
- Comfortable working in a team-oriented and collaborative environment
- Communicates clearly and seeks help and support proactively
- Takes ownership of tasks and leads them to completion
Desired
- Experience in developing solutions using server automation tools such as Ansible.
- Experience writing and maintaining CI/CD pipelines and services.
Education
- Bachelor’s degree in Computer Science or a related technical field