Site Reliability Engineer

4 weeks ago


London, Greater London, United Kingdom Cisco Full time

Job Overview

The Cisco Site Reliability Engineering team is responsible for providing tools, services, and infrastructure to monitor and observe the ThousandEyes platform. As a Senior Site Reliability Engineer, you will own our logging pipeline and monitoring stack while working with developers to continuously improve our view of the platform.

Key Responsibilities

  • Design and implement visibility into our platform as we grow to multi-region scale.
  • Design, deploy, and maintain cloud-native monitoring services in AWS and GCP that are elastic and resilient to failure.
  • Provide standards and best practices for instrumentation of container-based services and cloud-managed services.
  • Maintain our alerting pipeline so that we are notified of the right things, at the right time, in the right places.
  • Drive automation wherever possible, enabling our monitoring platforms to scale effortlessly.
  • Participate in, and help improve, our 24x7 incident response and on-call rotation.

Requirements

  • Strong Infrastructure as Code skills, ideally with Terraform and Kubernetes.
  • Strong knowledge of modern logging tool sets, including Logstash or Fluentd.
  • Understanding of Prometheus and its ecosystem, including Alertmanager.
  • Good knowledge of Application Performance Monitoring tools and crash reporting tools, such as Sentry.
  • Good knowledge of cloud provider managed services, and how they can be leveraged in our context.
  • Ability to write high-quality code in Python, Go, or equivalent languages.

About Cisco

Cisco values the perspectives and skills that emerge from employees with diverse backgrounds. We believe that everyone has something to offer and that diverse teams are better equipped to solve problems, innovate, and create a positive impact.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification.

Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.

Cisco will consider for employment, on a case-by-case basis, qualified applicants with arrest and conviction records.



  • London, Greater London, United Kingdom Fourier Full time

    Key Responsibilities: As a Site Reliability Engineer at Fourier, you will be responsible for designing and implementing tools to enhance the reliability and resilience of our production systems. This includes investigating failures, improving system performance, and automating manual processes. Required Skills: Excellent Python scripting skills. Experience with...


  • London, Greater London, United Kingdom ESL FACEIT Group Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at ESL FACEIT Group. As a key member of our infrastructure team, you will be responsible for designing, analyzing, and troubleshooting large-scale distributed systems. As a Site Reliability Engineer, you will work closely with our software engineering teams to deploy and...


  • London, Greater London, United Kingdom J Bandy Consulting Full time

    Job Summary: The Site Reliability Engineer will be responsible for ensuring the reliability, scalability, and performance of our systems. This role requires a strong understanding of SRE best practices, expertise in Git and GitOps, and experience with logging and monitoring solutions. Key Responsibilities: Develop and maintain the Site Reliability Engineering...


  • London, Greater London, United Kingdom JPMorganChase Full time

    About the Role: We're seeking a skilled Senior Site Reliability Engineer to join our team at JPMorgan Chase. As a key member of our Accelerators Engineering team, you will play a crucial role in ensuring the reliability and scalability of our products. As a Senior Site Reliability Engineer, you will be responsible for creating high-quality designs, roadmaps,...


  • London, Greater London, United Kingdom Experian Full time

    About the Role: We're seeking a skilled Site Reliability Engineer to join our Experian Data Quality team in London, working on a hybrid schedule. As a key member of our QA team, you'll ensure the reliability, performance, and scalability of our market-leading data management products, focusing on observability to support incident resolution and drive ongoing...


  • London, Greater London, United Kingdom Apple Full time

    Job Summary: At Apple, we're looking for talented Site Reliability Engineers to join our team. As a Site Reliability Engineer, you'll play a critical role in ensuring the reliability and scalability of our services. You'll work closely with our development teams to design, build, and operate the systems and infrastructure that power our products and...


  • London, Greater London, United Kingdom Curve Full time

    Job Description: At Curve, we're on a mission to simplify your finances and help you live inspired. We're looking for a talented Site Reliability Engineer to join our team and help us scale our platforms to meet the needs of millions of customers. The ideal candidate will have a strong background in cloud infrastructure, with experience deploying...


  • London, Greater London, United Kingdom STAND 8 Technology Services Full time $75 - $85

    About the Role: We are seeking an experienced Site Reliability Engineer to support our systems focused on linear channel delivery and modernization efforts. The ideal candidate will be responsible for maintaining existing systems, working on infrastructure modernization, and supporting the streaming engineering team to ensure smooth operation of linear...


  • London, Greater London, United Kingdom Fourier Full time

    Key Responsibilities: As a Site Reliability Engineer at Fourier, you will be responsible for developing tools to enhance and monitor production systems, increasing system resilience, investigating failures, and improving reliability. You will also be responsible for automating manual processes and remediating incidents in real-time. Requirements: Excellent Python...


  • London, Greater London, United Kingdom IO Associates Full time

    Job Opportunity: IO Associates is seeking a highly skilled Site Reliability Engineer to join their team for a short-term project within the Law Enforcement sector. Monitor system performance and security to ensure optimal functionality. Collaborate with the team to identify and resolve technical issues. This role offers a competitive daily rate of up to £500 per...


  • London, Greater London, United Kingdom J Bandy Consulting Full time

    Job Summary: J Bandy Consulting is seeking an experienced Site Reliability Engineer to join our team. The ideal candidate will have a strong background in software engineering and a passion for building scalable and reliable systems. Key Responsibilities: Develop and implement automation tools to improve the efficiency of our systems. Collaborate with...


  • London, Greater London, United Kingdom Selby Jennings Full time

    About Selby Jennings: We're a leading global financial services firm where technologists and investment professionals collaborate to drive innovation and operational excellence. About the Role: As a Site Reliability Engineer, you'll apply your expertise in software and systems engineering to design, build, and maintain our robust infrastructure. You'll reduce...


  • London, Greater London, United Kingdom GoCardless Full time

    The Role: GoCardless is looking for a Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining the infrastructure and systems that support our payment and open banking products. Key Responsibilities: Design and implement scalable and efficient infrastructure solutions. Develop...


  • London, Greater London, United Kingdom Preqin Full time

    About the Role: Preqin is seeking an experienced Site Reliability Engineer to join our team in London. As a Site Reliability Engineer, you will work across Preqin's full suite of services, supporting our clients around the world. You will be responsible for designing, building, and operating our infrastructure, middleware, and CI/CD systems to ensure our teams...


  • London, Greater London, United Kingdom Highfield Professional Solutions Ltd Full time

    Highfield Professional Solutions Ltd is seeking a Site Reliability Engineer to join our team in Central London. The successful candidate will be responsible for managing and maintaining critical engineering systems within our Data Centre, ensuring that they operate efficiently and effectively. This role offers a competitive salary of up to 48,000 per year,...


  • London, Greater London, United Kingdom Hamilton Barnes Associates Limited Full time

    Revolutionize Automation and Reliability: You'll have the opportunity to join a dynamic tech environment as a Site Reliability Engineer (SRE) working on the 'Platform as a Service' toolset. If you're passionate about enterprise Linux systems, networking, and automation, this role is tailor-made for you. Key Responsibilities: Contribute, build and maintain the...


  • London, Greater London, United Kingdom Kinetech Full time

    At Kinetech, we're seeking a talented Site Reliability Engineer to join our team. This role is responsible for ensuring the smooth operation of our software systems, with a focus on scalability, reliability, and performance. Key Responsibilities: Design and implement CI/CD pipelines to automate code integration, testing, and deployment. Automate repetitive...


  • London, Greater London, United Kingdom Mondrian Alpha Recruitment Solutions Full time

    At Mondrian Alpha Recruitment Solutions, we are seeking a highly skilled Site Reliability Engineer to join our team responsible for engineering and supporting the company's critical infrastructure platforms. This team handles the centralized development infrastructure and works alongside engineering teams across the business to ensure the optimal route of...


  • London, Greater London, United Kingdom Fourier Full time

    Key Responsibilities: We are seeking a skilled Site Reliability Engineering Specialist to join our team at Fourier. As a key member of our Site Reliability Engineering team, you will be responsible for developing tools for surveillance and enhancement of our production systems. You will work closely with our team to increase system resilience, investigate...


  • London, Greater London, United Kingdom Lorien Full time

    Key Responsibilities: Collaborate with the existing team to deliver a brand-new project. Work on a hybrid model with 1 day a week on-site in London. Develop and maintain reliable and efficient systems. Utilize experience with Java, Python, Splunk, ServiceNow, and MongoDB. Contribute to incident management and application monitoring. Ensure seamless interaction...