Observability Site Reliability Engineer

2 weeks ago


London, United Kingdom Apple Inc. Full time

People at Apple don’t just build products; they craft the kind of experiences that have revolutionized entire industries. The diverse collection of our people and their ideas inspires innovation in everything we do. Imagine what you could do here. Join Apple, and help us leave the world better than we found it.

The Apple Service Engineering (ASE) team builds and provides systems and infrastructure that fuel Apple’s services (such as iCloud, iTunes, Siri, and Maps). We are the foundation on which Apple’s software developers build the products that our customers love. We are looking for passionate and talented Site Reliability Engineers to continue our focus on providing our customers the highest quality Apple Services experience. Our services have to scale globally, stay highly available, and “just work.” If you love designing, engineering, and running systems and infrastructure that will help millions of customers, then this is the place for you.

The Observability SRE organization is specifically tasked with enabling other teams to better understand their infrastructure and services by providing world-class observability capabilities.

Key Qualifications

  • Strong sense of ownership and integrity demonstrated through clear communication and collaboration
  • Experience in managing and scaling distributed systems in a public, private, or hybrid cloud environment
  • Experience with the Prometheus ecosystem
  • The ability to design, author, and release code in languages like Go or Python
  • Acute drive to automate manual operations and to improve them through repeated iteration
  • Understanding of the Linux Operating System, standard networking protocols, and components
  • Hands-on experience managing large numbers of diverse systems with configuration management or software delivery platforms (such as Puppet, Chef, Ansible, and Spinnaker)
  • Experience with deploying, supporting and monitoring new and existing services, platforms, and application stacks
  • Excellent troubleshooting and problem solving skills
  • Experience with scale testing, disaster recovery, and capacity planning
  • Familiarity with microservices architecture and container orchestration with Kubernetes

Description

Apple Services Engineering infrastructure is BIG. Operating at our scale, across multiple geographically dispersed data centers and serving hundreds of millions of users, presents unique challenges. As an SRE at Apple, you'll need to solve these problems using data, teamwork, and your own expertise. SREs at Apple own the full infrastructure stack: from device driver performance debugging to content delivery network traffic management, our responsibilities are both broad and deep.

ASE runs the majority of its systems on Linux. We run a mix of open source, vendor licensed, and internally developed tools to perform functions such as system configuration management, provisioning, software deployment, logging, and monitoring. You'll learn these tools and have opportunities to improve them. Our team is collaborative; we work closely with the development teams we support to deliver the best results for Apple. We think critically and strive to balance the best solution with the need to get things done for each engineering challenge we face. Good ideas are heard and results are rewarded.

Education & Experience

BS/MS in Computer Science or equivalent (5+ years of software development or production operations experience in a large-scale environment)


  • London, United Kingdom Prolific Full time

    The Role: We are looking for a Principal Site Reliability Engineer to lead Site Reliability at Prolific, focusing on advancing the resilience and scalability of our GCP and AWS environments. You will play a pivotal role in overseeing and enhancing our Kubernetes clusters in GCP, which support our Django application, and in driving the SRE strategic...


  • London, United Kingdom McDonald's Limited Full time

    The Opportunity: An exciting opportunity to work as part of the Service Operations Team. The Site Reliability Officer will be responsible for improving the value of IT to the business by reducing the occurrence of systematic issues within our services. These improvements could be technical, procedural or behavioural and will require...


  • London, United Kingdom Neo4j Inc Full time

    The Role: The Site Reliability Engineering team’s mission is to improve the overall reliability of Neo4j’s DBaaS product: Neo4j Aura. Our product operates at scale and spans all 3 major cloud providers, with hundreds of Kubernetes clusters running in production. Until recently, the SRE function at Neo4j Aura achieved this by filling the shoes of a...

  • Site Reliability Engineer

    Found in: Jooble UK C2 - 5 days ago


    London, United Kingdom Acquire Me Full time

    Site Reliability Engineer - Developer Tooling: Our client is a renowned global market making firm. They're hiring for an SRE with strong full-stack SWE skills and a background working on complex high-availability infrastructure. You'll join a small group of high-calibre SWEs building custom tooling from the ground up through to production, with a...

  • Site Reliability Engineer

    Found in: Talent UK 2A C2 - 6 days ago


    London, United Kingdom Acquire Me Full time

    Site Reliability Engineer - Developer Tooling: Our client is a renowned global market making firm. They're hiring for an SRE with strong full-stack SWE skills and a background working on complex high-availability infrastructure. You'll join a small group of high-calibre SWEs building custom tooling from the ground up through to production, with a core focus on...


  • London, United Kingdom Braze Inc. Full time

    WHAT YOU’LL DO: Our products rely on sophisticated real-time and batch processing of massive amounts of data and instrumentation to provide analytics, automated decision-making, and an industry-leading customer engagement tool. We are looking for a Site Reliability Engineer (SRE) to join our Reliability team within our product and engineering...

  • Site Reliability Engineer

    Found in: Appcast UK C C2 - 6 days ago


    London Area, United Kingdom Acquire Me Full time

    Site Reliability Engineer - Developer Tooling: Our client is a renowned global market making firm. They're hiring for an SRE with strong full-stack SWE skills and a background working on complex high-availability infrastructure. You'll join a small group of high-calibre SWEs building custom tooling from the ground up through to production, with a core focus on...

  • Site Reliability Engineer

    Found in: Appcast UK C2 - 6 days ago


    London Area, United Kingdom Acquire Me Full time

    Site Reliability Engineer - Developer Tooling: Our client is a renowned global market making firm. They're hiring for an SRE with strong full-stack SWE skills and a background working on complex high-availability infrastructure. You'll join a small group of high-calibre SWEs building custom tooling from the ground up through to production, with a core focus on...

  • Site Reliability Engineer

    Found in: Appcast Linkedin GBL C2 - 6 days ago


    London Area, United Kingdom Acquire Me Full time

    Site Reliability Engineer - Developer Tooling: Our client is a renowned global market making firm. They're hiring for an SRE with strong full-stack SWE skills and a background working on complex high-availability infrastructure. You'll join a small group of high-calibre SWEs building custom tooling from the ground up through to production, with a core focus on...

  • Site Reliability Engineer

    Found in: Talent UK C2 - 3 days ago


    London, United Kingdom TEKsystems Full time

    Site Reliability Engineer / SRE Description: My global client is looking for a Site Reliability Engineer / SRE to join their growing team who must have strong experience working within the financial services industry on large complex projects. To be successful in this Site Reliability / SRE project you will need expert experience within: AWS ...

  • Observability Engineer

    Found in: Jooble UK C2 - 5 days ago


    London, United Kingdom Atyeti Inc Full time

    About the job: We are seeking a dynamic and experienced Observability Engineer with expertise in any cloud, Grafana/Prometheus/Datadog. Role & Responsibilities: * Develop and improve instrumentation for monitoring and logging the health and availability of services. * Proactively monitor systems, networks, and applications to provide input in improving...


  • London, United Kingdom Globality, Inc. Full time

    At Globality, we’re proud to embody the core values of innovation, collaboration, and trust in both our culture and product. We’re creating ground-breaking technology utilizing a world-class, AI-powered Platform that revolutionizes how businesses buy and sell services. Our co-founders, Joel Hyatt and Lior Delgo, are seasoned entrepreneurs who bring an...


  • London, United Kingdom Anaplan Full time

    Observability Engineer: At Anaplan, we are looking for a self-motivated Observability Engineer to join our dedicated Observability Infrastructure team. Anaplan is a high-growth company that is leading the way in enterprise planning. We look for exceptional Engineers of all levels - inquisitive, hard-working people who believe in simplicity, agility and...


  • London, United Kingdom Palantir Technologies Full time

    A World-Changing Company: Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more. The Role: We’re looking for Site Reliability...

  • Site Reliability Engineer

    Found in: Jooble UK C2 - 5 days ago


    London, United Kingdom TravelPerk Full time

    Backed by world-class investors with portfolios including Airbnb, Stripe, Slack, Trello, Gusto, Twitter, Farfetch and Deliveroo, our team is made up of A-players from across the travel and technology industries. Over the past few years, we’ve been named the fastest-growing SaaS startup in the world by SaaS1000 and featured as one of the hottest...


  • Observability Engineer

    Found in: Whatjobs ES C2 - 6 days ago


    London Area, United Kingdom Atyeti Inc Full time

    About the job: We are seeking a dynamic and experienced Observability Engineer with expertise in any cloud, Grafana/Prometheus/Datadog. Role & Responsibilities: * Develop and improve instrumentation for monitoring and logging the health and availability of services. * Proactively monitor systems, networks, and applications to provide input in improving...

  • Observability Engineer

    Found in: Appcast UK C C2 - 6 days ago


    London Area, United Kingdom Atyeti Inc Full time

    About the job: We are seeking a dynamic and experienced Observability Engineer with expertise in any cloud, Grafana/Prometheus/Datadog. Role & Responsibilities: * Develop and improve instrumentation for monitoring and logging the health and availability of services. * Proactively monitor systems, networks, and applications to provide input in improving the...