Director of Production Engineering
3 days ago
Toshiba Global Commerce Solutions is seeking a Director of Production Engineering (Reliability Platform Engineering) to lead the reliability backbone of our global POS, cloud, and middleware platform. This strategic role owns system availability, resilience, performance, observability, and release reliability across a distributed, mission-critical commerce ecosystem.
This leader will unify Site Reliability Engineering (SRE), Resilience & Performance Engineering, Observability, and AI-driven Reliability Automation into one cohesive function. As AI accelerates development velocity, verification and reliability become the core bottlenecks—making this role a cornerstone of our engineering organization.
You will partner closely with Architecture, Cloud Operations, Functional Quality Engineering, and Software Development to ensure predictable reliability, smooth releases, and dramatically fewer Sev-1/Sev-2 incidents.
Responsibilities
System Reliability & Uptime:
Define and enforce SLO/SLA frameworks, error budgets, and release criteria. Lead availability, resilience, and performance strategy across all services. Own MTTR, MTBF, incident prevention, and rollback strategies at scale.
Unified Reliability Engineering Organization:
Lead teams across SRE & L3 Engineering, Resilience & Performance Engineering, Observability & Telemetry, and AI Reliability Automation. Build a culture focused on prevention over firefighting.
Architecture-Level Reliability:
Collaborate with Principal Engineers and Architects to define system guardrails, resilience patterns, and failure modes. Ensure high-quality Production Readiness Reviews (PRRs) and architectural consistency.
Resilience & Performance Engineering:
Own chaos, failover, load, stress, and soak testing strategies. Validate store-mode behavior, payment workflows, edge-device dependencies, and multi-service interactions.
Observability & Telemetry:
Ensure complete, accurate signals across logs, traces, metrics, and business health. Partner with AI systems to build intelligent anomaly detection pipelines.
AI-Driven Release Reliability:
Integrate AI-based reliability scoring, resiliency prediction, automated gating, regression analysis, and incident pattern detection. Define the path toward autonomous release reliability pipelines.
Cross-Org Leadership:
Partner with Software Development, Functional Quality Engineering, Cloud Operations, Architecture, and TPM/TPO teams. Drive multi-team initiatives and ensure readiness across complex release trains.
Required Experience:
- Bachelor's degree in Computer Science or Engineering, or 10–15 years of direct experience.
- 10–15+ years in SRE, Reliability Engineering, Production Engineering, Distributed Systems, and Performance/Resilience Engineering.
- Proven ownership of uptime and system reliability in complex distributed architectures.
- Expertise in distributed systems, cloud platforms (AKS, Kubernetes), observability stacks (OpenTelemetry, Grafana, App Insights, Datadog), performance tuning, fault tolerance, network fundamentals, DB/service scaling, and chaos testing.
- Architectural Leadership: Experience designing resilience patterns (timeouts, retries, hedging, circuit breakers). Strong partnership with architects and senior engineers.
- Operational Maturity: Led SRE/on-call organizations. Defined SLOs, SLIs, and error budgets at scale. Track record of driving an incident prevention culture.
- Leadership & Communication: Builds strong engineering teams and hires top talent. Influential communicator with executives and cross-functional teams. Highly collaborative and low-ego.
Preferred Requirements
- AI-driven anomaly detection, regression analysis, incident clustering, and reliability scoring.
- Experience with retail POS, payments, edge devices, or store environments.
- Hybrid cloud + edge architectures.
- Leading reliability transformations and scaling engineering organizations (200→500+).
Why This Role Matters
As AI accelerates development velocity, the bottleneck shifts from coding to verification, reliability, and release safety. This role ensures:
- Uptime becomes engineered, not reactive.
- Development and QA operate at AI-enabled speed.
- Our platform grows safely while delivering stability and performance.
- We match or surpass best-in-class tech organizations (Google, Amazon, Azure, Stripe).
You will build the production engineering foundation that powers our next decade of innovation.
Toshiba Global Commerce Solutions is a dynamic billion-dollar global company based in Research Triangle Park, NC, providing retail store solutions to your favorite brands. Have you ever been in a hurry and made use of the self-checkout at Lowe's Foods, earned fuel rewards at Kroger, or just paid for purchases at retailers such as Walmart, Michaels, Carrefour, The Gap, Calvin Klein, Boots, Cencosud, BJ's, or Costco? These are just a few examples of our in-store solutions and impressive customer base that made us the world's installed market share leader.
The nature of retail is changing quickly, so if you share our 'Together Commerce' vision of a seamless two-way, participatory shopping experience, let's get together to drive the new economy.
Toshiba Global Commerce Solutions, Inc. offers a competitive salary and generous benefits package including the following:
EEO:
Toshiba Global Commerce Solutions is an equal opportunity/affirmative action employer that evaluates qualified applicants without regard to age, ancestry, color, religious creed, disability, marital status, medical condition, genetic information, military or veteran status, national origin, race, sex, gender, gender identity, gender expression and sexual orientation or any other protected factor. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements.
Individuals who need a reasonable accommodation because of a disability for any part of the employment process should email to request an accommodation
DIVERSITY, EQUITY & INCLUSION:
We at Toshiba Global Commerce Solutions firmly believe that our people are integral to the success of our customers. Furthermore, we're committed to Diversity, Equity, and Inclusion for all our people, as highlighted by our 5 Core Principles (Create Outreach, Foster Belonging, Unleash Opportunity, Diverse Cultural Engagement, and Culture of Transparency). We're passionate about our customers, the retail industry, and becoming a more responsible company as we help create a brighter future.