Site Reliability Engineering

2 weeks ago


London, United Kingdom Globality, Inc. Full time

At Globality, we’re proud to embody the core values of innovation, collaboration, and trust in both our culture and product.
We’re creating ground-breaking technology utilizing a world-class, AI-powered Platform that revolutionizes how businesses buy and sell services. Our co-founders, Joel Hyatt and Lior Delgo, are seasoned entrepreneurs who bring extensive business-building experience to our organization. Come help us build something great.
Site Reliability Engineers (SREs) are responsible for keeping all customer-facing services and other Globality production systems running smoothly as a unit within the broader Production Engineering team.
SREs are a blend of pragmatic technical operators and tooling craftspeople who apply sound engineering principles, operational discipline, and mature automation to our production environment and the Globality codebase. We have a DevOps-driven culture with a particular team interest in improving our product stack insight, automation tooling, and scalability.
Globality is a unique product stack which brings unique challenges – it’s a ground-breaking technology utilizing a world-class, AI-powered microservices platform that revolutionizes how businesses buy and sell services. The experience of our team feeds back into other engineering groups within the company, perpetuating product improvement. Be part of the team responsible for managing an enterprise-grade AI-driven data and messaging platform.
Be on the (non-overnight) on-call rotation to respond to Globality availability incidents and provide support for other customer-impacting incidents.
Use your on-call shift to prevent incidents from ever happening.
Help make our monitoring and alerting trigger on symptoms, not on outages.
Work with the Infrastructure and QA/TestEng teams to make the deployment process as efficient and boring as possible.
Work with the architects to implement the baseline technologies, policies, and practices to build a high-velocity, high-security, strong-compliance platform that allows Globality to scale to support exponential growth.
Keep a keen eye on security issues in every project you work on, and contribute to improving security in the systems already in place.
Help plan the growth of Globality's infrastructure.
Embrace a DevOps philosophy.
Know your way around Linux and the command line.
Have strong programming skills in Python, Go, Ruby, or similar languages.
Have experience using the advanced tools of AWS, GCP, or other cloud providers.
Projects you could work on:
Improve our Metrics collection scope or improve our metrics-driven Monitoring story.
Work with the QA / Test Engineering team to fully pipeline our internal tools.
Work with Test Engineering on scale testing initiatives.
Develop a relationship with a product group, define their SLOs, help analyze our metrics data on those SLOs and improve their reliability.
Further our "Infrastructure as Code" mission using Terraform and CI/CD-focused automation.
Administration of a variety of high-availability clusters.
Firm grasp of Metrics and Monitoring systems, Grafana visualization implementation, and delivery of well-targeted alerting with Slack/PagerDuty integrations.
Backend storage management and scaling
Disaster Recovery and High Availability strategy
Knowledge of Globality product stack and service interoperations
Collaboration and Communication:
Maintaining good relationships with other engineering teams in Globality that help improve the product
Are early-career Site Reliability Engineers who are expected to work toward:
Provides timely responses to requests from Globality teammates, reacts to alerts from monitoring, and escalates appropriately when needed.
Proposes ideas and solutions within the Production Engineering team to reduce the workload through automation.
Executes configuration change operations at the infrastructure level.
Actively looks for opportunities to improve the availability and performance of the system by applying the knowledge gained from monitoring and observation.
Shares gained knowledge readily with the team, either by creating issues that provide context for anyone to understand it or by writing Confluence articles.
Provides emergency response, either by being on-call or by reacting to symptoms according to monitoring, and sees incidents through to resolution or escalates as appropriate.
Plans and executes configuration change operations at both the application and the infrastructure level.
Actively looks for opportunities to improve the availability and performance of the system by applying the knowledge gained from monitoring and observation.
Shares gained knowledge readily with the team, either by creating issues that provide context for anyone to understand it or by writing Confluence articles.
Senior Site Reliability Engineer I/II
Identifies significant projects that result in substantial cost savings or revenue.
Identifies changes for the product architecture from the reliability, performance, and availability perspective with a data-driven approach.
Proactively works on efficiency and capacity planning to set clear requirements and reduce system resource usage, making Globality cheaper to run.
Identifies parts of the system that do not scale, provides immediate palliative measures, and drives long-term resolution of these incidents.
Are Senior SREs who meet the following criteria:
Significant modification to open source or major from-scratch tooling to deliver best-of-breed implementation of our production ecosystem.
Strives for automation either by coding it or by leading and influencing developers to build systems that are easy to run in production.
Measures the risk of introduced features to plan ahead and improve the infrastructure.
Proposes and drives architectural changes that affect the whole company to solve scaling and performance problems.
Leads significant project work for KPI-level goals for the team.
Works with engineers across the whole company, influencing design to create features that will work well in multi-region, multi-cloud, massive-scale implementations.
Runs RCAs and epic-level planning meetings to get meaningful work scheduled into the plan.
Routinely has an impact on the broader Engineering organization.
Helps develop other team members into more senior roles and into leaders within the team.

We are an equal opportunity employer and a participant in the E-Verify program. We believe diversity makes teams better and that discrimination based on race, gender, or anything else is self-defeating.