AWS Site Reliability Engineer
4 weeks ago
Site Reliability Engineer (SRE) - LLM and Machine Learning
London/Remote
We are a pioneering technology company specialising in cutting-edge Large Language Models (LLMs) and Machine Learning solutions. We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team and ensure the reliability, scalability, and performance of our LLM and Machine Learning infrastructure.
As an SRE, you will play a critical role in maintaining the stability and efficiency of our LLM and Machine Learning platforms.
Infrastructure Design and Automation: Collaborate with engineering and research teams to design, implement, and automate infrastructure for LLM and Machine Learning workloads, ensuring scalability and reliability.
Deployment and Configuration: Manage deployment pipelines, configuration management, and orchestration tools to streamline the deployment of models and services.
Monitoring and Alerting: Implement and maintain robust monitoring, alerting, and logging systems to proactively identify and resolve issues. Ensure optimal system performance.
Capacity Planning: Perform capacity planning and scaling to accommodate growing workloads and ensure resource efficiency.
Security and Compliance: Collaborate with security teams to implement security best practices, vulnerability assessments, and compliance requirements for LLM and Machine Learning systems.
Continuously evaluate and improve system reliability, performance, and efficiency through automation and optimisation.
Maintain comprehensive documentation for infrastructure configurations, procedures, and incident reports.
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
Proven experience as a Site Reliability Engineer or a related role with a focus on LLM and Machine Learning infrastructure.
Experience with cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies.
Experience with configuration management tools.
Knowledge of monitoring and observability tools.
Scripting skills (e.g., Python, Bash).
-
AWS Site Reliability Engineer
4 weeks ago
London, United Kingdom Techruiter Full time
Site Reliability Engineer (SRE) - LLM and Machine Learning London/Remote We are a pioneering technology company specialising in cutting-edge Large Language Models (LLMs) and Machine Learning solutions. We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team and ensure the...
-
Site Reliability Engineer | AWS | Contract
2 months ago
London, United Kingdom Salt Full time
Site Reliability Engineer – Hybrid – London Day rate: £500 - £700 (inside IR35) Start: ASAP My new client is looking for a Site Reliability Engineer to join the team on a contract basis. This is a hybrid role so 2 days per week in the London office. Over 4 years solid SRE experience (No DevOps engineers) AWS experience Monitoring Python,...
-
Site Reliability Engineer | AWS | Contract
2 weeks ago
London, United Kingdom Salt Full time
Site Reliability Engineer – Hybrid – London Day rate: £500 - £700 (inside IR35) Start: ASAP My new client is looking for a Site Reliability Engineer to join the team on a contract basis. This is a hybrid role so 2 days per week in the London office. Over 4 years solid SRE experience (No DevOps engineers) AWS experience Monitoring Python,...
-
Site Reliability Engineer | AWS | Contract
1 week ago
London, United Kingdom Salt Full time
Site Reliability Engineer – Hybrid – London Day rate: £500 - £700 (inside IR35) Start: ASAP My new client is looking for a Site Reliability Engineer to join the team on a contract basis. This is a hybrid role so 2 days per week in the London office. Over 4 years solid SRE experience (No DevOps engineers) AWS experience Monitoring Python,...
-
Site Reliability Engineer | AWS | Contract
4 weeks ago
London, United Kingdom Salt Full time
Site Reliability Engineer – Hybrid – London Day rate: £500 - £700 (inside IR35) Start: ASAP My new client is looking for a Site Reliability Engineer to join the team on a contract basis. This is a hybrid role so 2 days per week in the London office. Over 4 years solid SRE experience (No DevOps engineers) AWS experience ...
-
Site Reliability Engineer | AWS | Contract
1 week ago
London, United Kingdom Salt Full time
Site Reliability Engineer – Hybrid – London Day rate: £500 - £700 (inside IR35) Start: ASAP My new client is looking for a Site Reliability Engineer to join the team on a contract basis. This is a hybrid role so 2 days per week in the London office. Over 4 years solid SRE experience (No DevOps engineers) AWS experience ...
-
Site Reliability Engineer | AWS | Contract
2 weeks ago
City of London, Greater London, United Kingdom Salt Full time
Site Reliability Engineer - Hybrid - London Day rate: £500 - £700 (inside IR35) Start: ASAP My new client is looking for a Site Reliability Engineer to join the team on a contract basis. This is a hybrid role so 2 days per week in the London office. Over 4 years solid SRE experience (No DevOps engineers) • AWS experience • Monitoring • Python,...
-
Site Reliability Engineer
2 weeks ago
London, United Kingdom Reed Full time
**SRE | SITE RELIABILITY ENGINEER | DEVOPS | AWS | AMAZON WEB SERVICES | CLOUDFORMATION | KINESIS | CODEPIPELINE | FARGATE | BATCH | PYTHON | GOLANG | DJANGO | REACT | UK | FULLY REMOTE** **Site Reliability Engineer - £80k** A renowned SEO business is looking for a Senior Site Reliability Engineer to build and improve a rapidly evolving infrastructure...
-
London, United Kingdom Marsh McLennan Companies Full time
Description: Mercer IT Systems Engineering seeks candidates for a hands-on, experienced Site Reliability Engineering Manager for AWS Cloud, based in our London office. We have ambitious and exciting plans to expand further into AWS. Here, you will have the opportunity to share your depth of technical AWS expertise with our dedicated high...
-
Site Reliability Engineer
2 months ago
London, United Kingdom Redefined Ltd Full time
Tesla is seeking a Site Reliability Engineer to build, improve, and scale the infrastructure that powers our Energy IoT applications. These applications provide real-time monitoring, optimization, control for our flagship Tesla Energy products including Powerwall, Megapack, Solar Roof, Supercharger, Autobidder and Virtual Power Plants. You must enjoy...
-
Site Reliability Engineer
4 weeks ago
London, United Kingdom Tesla Full time
Tesla is seeking a Site Reliability Engineer to build, improve, and scale the infrastructure that powers our Energy IoT applications. These applications provide real-time monitoring, optimization, control for our flagship Tesla Energy products including Powerwall, Megapack, Solar Roof, Supercharger, Autobidder and Virtual Power Plants. You must enjoy...
-
Site Reliability Engineering Manager, AWS Cloud
3 weeks ago
London, United Kingdom Marsh McLennan Companies Full time
Description: Mercer IT Systems Engineering is seeking candidates for an experienced Site Reliability Engineering Manager for AWS Cloud, based in our London office. We have ambitious and exciting plans to expand further into AWS. Here, you will have the opportunity to share your depth of technical AWS expertise with our great global SRE Cloud...
-
Site Reliability Engineer SRE
1 month ago
London, United Kingdom Henderson Scott Full time
**Site Reliability Engineer - AWS - London/Hybrid - £ Negotiable** One of my enterprise consultancy clients is looking for an SRE who has great skills around AWS, monitoring tools and operational expertise. **The Role**: - Triage production support issues. You will effectively monitor a wide range of systems, triage & trouble-shoot bugs - Gain valuable...
-
Site Reliability Engineer SRE GCP or AWS
3 weeks ago
London, United Kingdom Prism Digital Full time
**Site Reliability Engineer (SRE) | GCP or AWS & Kubernetes | SaaS HealthTech** **100% Remote** After successfully placing several Engineers into the cloud team here, I am now on the lookout for another SRE to join the growing cloud team. If you are passionate about Site Reliability and you are ready for your next challenge, this 'Greenfield' project and...
-
Lead Site Reliability Engineer
2 weeks ago
London, United Kingdom LinuxRecruit Full time
Fancy working with Python in the Amazon? I'm talking DevOps, not the South American jungle… One of London's top-ranked start-ups is looking to expand its existing platform team in a Lead capacity, mixing a great blend of hands-on work with mentoring and management. Technically this is a DevOps role with an emphasis on the DEV, primarily in Python,...
-
Site Reliability Engineer
3 weeks ago
London, United Kingdom Prism Digital Full time
**Senior Site Reliability Engineer (SRE) | GCP/AWS | Market Intelligence Leaders** We have an exciting opportunity for a Senior Site Reliability Engineer (SRE) to join a global organisation involved in the market intelligence space. Our client's AI-powered platform provides businesses with world-class and real-time consumer analytics. They are looking for...
-
SRE Site Reliability Engineer DevOps AWS
4 weeks ago
London, United Kingdom Cameron Connect Ltd Full time
Join Our Client's Dynamic Mortgages Team at the Heart of Technological Innovation! Are you an experienced Java or C# engineer with a passion for building and maintaining reliable, high-performing systems? Do you thrive in roles where you can make a significant impact on the availability, performance, and efficiency of critical services? These opportunities...