Senior Site Reliability Engineer, Observability DevOps
3 weeks ago
Teamwork makes the stream work.
Roku is changing how the world watches TV
Roku is the #1 TV streaming platform in the US, and we've set our sights on powering every television in the world. Roku pioneered streaming to the TV. Our mission is to be the TV streaming platform that connects the entire TV ecosystem. We connect consumers to the content they love, enable content publishers to build and monetize large audiences, and provide advertisers unique capabilities to engage consumers.
From your first day at Roku, you'll make a valuable - and valued - contribution. We're a fast-growing public company where no one is a bystander. We offer you the opportunity to delight millions of TV streamers around the world while gaining meaningful experience across a variety of disciplines.
About the team
The Platform Infrastructure team ensures that all Roku systems run smoothly. These systems support 100M+ users and billions in transaction revenue per year. We are a group of highly skilled infrastructure and software engineers who help build and operate systems at internet scale, including Platform (Kubernetes, Istio, Envoy, operators, and more) and Observability (OSS/CNCF-supported observability projects). We engage with multiple teams to achieve company-impacting results.
About the role
We are looking for a highly skilled DevOps/SRE professional with a specialization in observability to join our team. The ideal candidate will have a strong background in designing, implementing, and maintaining monitoring and observability solutions for complex cloud environments. If you have a consistent track record of architecting and building large-scale systems, enjoy solving intriguing system challenges at internet scale, are innovative at heart, and strike a good balance among learning, organizing, and building while making an impact, this role might be a great fit for you.
What you will be doing
- Develop and implement an observability strategy to ensure comprehensive monitoring, logging, and tracing of systems and applications. Define key metrics, logs, and traces to provide actionable insights into system performance and reliability.
- Evaluate, select, and implement monitoring and observability tools and technologies best suited to our environment. This may include solutions such as Prometheus, Grafana, ELK stack, Datadog, Jaeger, or OpenTelemetry.
- Work closely with development and operations teams to instrument infrastructure, applications, and services for effective monitoring and tracing. Implement best practices for instrumentation to minimize overhead and maximize visibility.
- Configure alerting thresholds and escalation policies to proactively identify and respond to issues before they impact users. Lead incident response efforts, leveraging observability data to troubleshoot and resolve issues quickly.
- Utilize observability data to identify performance bottlenecks and optimization opportunities. Collaborate with development teams to improve application performance and scalability.
- Develop automation scripts and integrations to streamline monitoring and observability processes. Integrate observability solutions with CI/CD pipelines, configuration management tools, and incident management systems.
- Create comprehensive documentation of monitoring and observability setups, best practices, and troubleshooting guides. Provide training and support to team members on observability tools and techniques.
- Stay current with industry trends and emerging technologies in observability. Drive continuous improvement initiatives to enhance the effectiveness and efficiency of observability practices.
- Participate in the 24x7 on-call rotation and be available to work with global teams in the event of critical outages.
- Experience with a number of the following: ECS, Docker, Kubernetes, Envoy, Istio.
- Experience with infrastructure as code (IaC) tools such as Terraform, Ansible, or CloudFormation.
- Proficiency in monitoring and observability tools such as Prometheus, Grafana, ELK stack, Datadog, Jaeger, or OpenTelemetry.
- Strong understanding of distributed systems, microservices architecture, and cloud-native technologies.
- Deep knowledge of the TCP and TLS stacks is very desirable.
- Knowledge of distributed tracing technologies like Jaeger, Zipkin, or OpenTelemetry.
- Familiarity with log management solutions such as Loki, Elasticsearch, Logstash, and Kibana (ELK stack).
- Demonstrated understanding of overall infrastructure design and of developing tools to enable and automate infrastructure.
- 8+ years of experience in cloud-focused software development in Java, Go, Python, or other object-oriented programming languages.
- The drive and self-motivation to understand the intricate details of a complex infrastructure environment.
- Demonstrated ability to communicate clearly with both technical and non-technical project stakeholders, with the ability to work effectively in a cross-functional team environment.
- BS degree in Computer Science or equivalent.
- Preferred: certifications in relevant technologies such as Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer, or Certified Information Systems Security Professional (CISSP).
Benefits
Roku is committed to offering a diverse range of benefits as part of our compensation package to support our employees and their families. Our comprehensive benefits include global access to mental health and financial wellness support and resources. Local benefits include statutory and voluntary benefits which may include healthcare (medical, dental, and vision), life, accident, disability, commuter, and retirement options (401(k)/pension). Our employees can take time off work for vacation and other personal reasons to balance their evolving work and life needs. It's important to note that not every benefit is available in all locations or for every role. For details specific to your location, please consult with your recruiter.
The Roku Culture
Roku is a great place for people who want to work in a fast-paced environment where everyone is focused on the company's success rather than their own. We try to surround ourselves with people who are great at their jobs, who are easy to work with, and who keep their egos in check. We appreciate a sense of humor. We believe a smaller number of very talented people can do more, at lower cost, than a larger number of less talented teams. We’re independent thinkers with big ideas who act boldly, move fast, and accomplish extraordinary things through collaboration and trust. In short, at Roku you'll be part of a company that's changing how the world watches TV.
We have a unique culture that we are proud of. We think of ourselves primarily as problem-solvers, which itself is a two-part idea. We come up with the solution, but the solution isn't real until it is built and delivered to the customer. That penchant for action gives us a pragmatic approach to innovation, one that has served us well since 2002.
To learn more about Roku, our global footprint, and how we've grown, visit .
-
Site Reliability Engineer/DevOps
5 days ago
United Kingdom Fortice Full time
We are heading up a recruitment drive for a global consultancy that require an SC Cleared Site Reliability Engineer to join them on a major government project that's based 2 days per week in Wokingham. The SRE team have L2 support responsibilities and will lead the triages. You will be trained in and exposed to many different modern technologies...
-
Senior DevOps Engineer
4 weeks ago
United Kingdom Oliver Bernard Full time
Senior DevOps Engineer – FinTech / Payments – AWS, Kubernetes, Terraform
Oliver Bernard are currently working with a fast-growing Payments company, based in London, who are seeking a strong and experienced Senior DevOps Engineer to join their Platform team as part of plans to scale their infrastructure and drive DevOps best practices. The incoming...
-
Senior DevOps Engineer
4 weeks ago
United Kingdom Oliver Bernard Full time
Senior DevOps Engineer - FinTech / Payments - AWS, Kubernetes, Terraform
Oliver Bernard are currently working with a fast-growing Payments company, based in London, who are seeking a strong and experienced Senior DevOps Engineer to join their Platform team as part of plans to scale their infrastructure and drive DevOps best practices. Strong Cloud experience...
-
Senior Site Reliability Engineer
2 weeks ago
United Kingdom THINKalpha Full time
Location: 100% Remote. The working timezone is EU/GMT. ThinkAlpha is looking for a Senior Site Reliability Engineer to work in the core infrastructure team supporting our data analytics platform and transactional trading engine. Our team provides solutions for real-time analytics, financial search, data integration, robust transactional systems,...
-
Senior DevOps Engineer
1 day ago
United Kingdom EVera Recruitment Full time
We are currently seeking a highly skilled Senior DevOps Engineer to join our client’s team, a world-renowned luxury car manufacturer. The successful candidate will enhance the software development and deployment processes with DevOps capabilities. Automate and accelerate the testing, release, and deployment of automotive software applications into a...
-
Site Reliability Engineer II
4 weeks ago
United Kingdom Axon Full time
Your Impact
As a contributor in the SRE (Site Reliability Engineering) organization, you are passionate about delivering solutions to the real-time problems our mission-critical cloud native services encounter. You are also obsessed about achieving the high quality and reliability our customers demand. You will work closely not only with the SRE division,...
-
Senior Lead Site Reliability Engineer
4 weeks ago
United Kingdom JPMorgan Chase & Co. Full time
Out of the successful launch of Chase in 2021, we’re a new team, with a new mission. We’re creating products that solve real world problems and put customers at the center - all in an environment that nurtures skills and helps you realize your potential. Our team is key to our success. We’re people-first. We value collaboration, curiosity and...
-
Senior Site Reliability Engineer
2 weeks ago
United Kingdom THINKalpha Full time
Location: 100% Remote. The working timezone is EU/GMT. ThinkAlpha is looking for a Senior Site Reliability Engineer to work in the core infrastructure team supporting our data analytics platform and transactional trading engine. Our team provides solutions for real-time analytics, financial search, data integration, robust transactional systems, backtesting,...
-
United Kingdom JPMorgan Chase & Co. Full time
We’re creating products that solve real world problems and put customers at the center - all in an environment that nurtures skills and helps you realize your potential. As a Senior Lead Site Reliability Engineer at JPMorgan Chase within the Accelerators Engineering team, you are the heart of this venture, focused on getting smart ideas into the hands of...
-
Salesforce DevOps Engineer
2 weeks ago
United Kingdom Oliver Bernard Full time
Salesforce/DevOps Engineer - Remote - £95K
One of our clients, a household name in the retail space, is looking for a DevOps Engineer to play a key role in delivering the future of their Salesforce & AWS platform. They are offering remote working in the UK. As a DevOps engineer you will sit within the Platform Engineering, you will play a...
-
Senior Site Reliability Engineer II
2 days ago
United Kingdom LexisNexis Risk Solutions Inc Full time
Senior Site Reliability Engineer
Would you like to join our great reliability engineering team? Do you have a passion for cloud infrastructure technologies?
About The Business
At Cirium, our goal is to keep the world connected. We are the industry leader in aviation analytics; helping our customers understand the past, present, and predicting what will...
-
Senior Site Reliability Engineer
2 days ago
United Kingdom LexisNexis Risk Solutions Inc Full time
Senior Site Reliability Engineer
Would you like to join our great reliability engineering team? Do you have a passion for cloud infrastructure technologies? We are the industry leader in aviation analytics; helping our customers understand the past, present, and predicting what will happen tomorrow. Our mission is to transform the aviation industry by...
-
DevOps Engineer
2 days ago
United Kingdom IT Talent Hub Full time
Role Description
This is a full-time on-site role for a DevOps Engineer at Grethena. As a DevOps Engineer, you will be responsible for tasks related to infrastructure as code (IaC), software development, continuous integration, system administration, and Linux. You will play a critical role in ensuring the reliability, scalability, and security of our...