WALT Labs
Site Reliability Engineer
2 weeks ago
At WALT Labs, we are committed to empowering businesses to leverage the transformative power of cloud technology, facilitating innovation and operational efficiency. Specializing in managed services across Google Cloud Platform (GCP) and Amazon Web Services (AWS), we are seeking a dedicated Site Reliability Engineer (SRE) who is passionate about technology, excels at problem-solving, and is committed to providing unparalleled customer service. You will become the subject-matter expert (SME) on the scale, resiliency, and uptime of both our own environments and the customer environments we support.
Role Summary
As a critical member of our team, the SRE will provide technical support and expertise to our managed services clients. This role involves diagnosing and resolving complex issues across diverse cloud environments and technologies to ensure high performance and reliability. The ideal candidate is a tech enthusiast who is eager to expand their knowledge and skills daily and is committed to problem-solving and delivering customer-focused solutions within defined Service Level Agreement (SLA) guidelines.
Key Responsibilities:
- Ensure high availability and reliability of software systems and infrastructure. Build out SLOs and SLAs and continuously improve the reliability of systems.
- Design, implement, and maintain monitoring and alerting systems to detect and address issues proactively, primarily using Datadog, GCP Cloud Monitoring, and PagerDuty/incident.io.
- Debug and troubleshoot production issues across various customer environments, technology stacks, and cloud providers, primarily focusing on GCP and AWS.
- Participate in an on-call rotation to respond to and resolve production incidents, and conduct root cause analyses (RCAs) and postmortems to identify and address underlying issues.
- Develop and maintain runbooks and playbooks for incident response and troubleshooting.
- Proactively analyze systems and application environments to identify bottlenecks and areas for improvement, and optimize them accordingly.
- Conduct load testing and capacity planning to ensure systems can handle expected traffic and growth.
- Develop and maintain infrastructure as code (IaC) with Terraform, and configuration management with tools such as Ansible and Helm.
- Work closely with development teams to understand system architecture, identify potential reliability risks, and implement solutions.
- Collaborate with operations teams to ensure smooth deployment and operation of software systems.
- Master a broad range of technologies, including but not limited to VMs, container orchestration, networking, security, databases, data warehouses, serverless technologies, and storage solutions.
- Deploy applications to Kubernetes using Helm, and handle Kubernetes administration and troubleshooting.
- Provide direct support to clients during production outages, offering expert assistance to swiftly rectify issues, adhering to SLA expectations.
- Diligently document solutions and processes, constantly seeking to improve knowledge, skills, and operational efficiency.
Requirements
- 3+ years of experience in an SRE role
- A deep understanding of how important SLOs, SLIs, and KPIs are to the systems you support, with observability as your grounding point on a daily basis.
- Extensive knowledge of all major services in GCP (Cloud Run, BigQuery, GKE, etc.)
- In-depth knowledge of all major services in AWS
- Experience setting up and managing monitoring solutions such as Datadog, Google Cloud Operations Suite, CloudWatch, Nagios, and Zabbix.
- Familiarity with various CI/CD systems (Jenkins, Codefresh, GitLab CI, GitHub Actions, Argo CD).
- Exceptional problem-solving capabilities, the ability to work under pressure, and strong critical thinking skills.
- The ability to act as the voice and incident commander for incidents managed both internally and with customers.
- A passion for technology and an unquenchable thirst for learning new skills.
- A customer-focused mindset, dedicated to delivering the highest level of service.
Benefits
- We cover 100% of your base medical plan
- Dental, vision, disability, and life insurance available
- Generous PTO policy that increases with longevity
- 401(k)
- Professional development and advancement opportunities
- Bonus incentives