Software Engineer, Internal Infrastructure
1 week ago
Who are we?
Our mission is to scale intelligence to serve humanity. We're training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.
We obsess over what we build. Each one of us is responsible for helping to increase the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what's best for our customers.
Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.
Join us on our mission and shape the future!
Why this team?
The internal infrastructure team is responsible for building world-class infrastructure and tools used to train, evaluate, and serve Cohere's foundational models. By joining our team, you will work in close collaboration with AI researchers to support their cutting-edge AI workload needs, with a strong focus on stability, scalability, and observability. You will be responsible for building and operating Kubernetes GPU superclusters across multiple clouds. Your work will directly accelerate the development of the industry-leading AI models that power Cohere's platform, North.
We're hiring software engineers at multiple levels. Whether you're early in your career or a seasoned staff engineer, you'll find opportunities to grow and make an impact here.
Please Note: All of our infrastructure roles require participating in a 24x7 on-call rotation, where you are compensated for your on-call schedule.
As a Software Engineer in the Internal Infrastructure team, you will:
Build and operate Kubernetes compute superclusters across multiple clouds
Partner with cloud providers to optimize infrastructure costs, performance, and reliability for AI workloads
Work closely with research teams to understand their infrastructure needs and identify ways to improve stability, performance, and efficiency of novel model training techniques
Design and build resilient, scalable systems for training AI models, with a focus on intuitive user interfaces that let researchers troubleshoot and resolve problems on their own
Encourage software best practices across our company and participate in team processes such as knowledge sharing, reviews, and on-call
You may be a good fit if you:
Have deep experience running Kubernetes clusters at scale and/or scaling and troubleshooting Cloud Native infrastructure, including Infrastructure as Code
Have strong programming skills in Go or Python
Prefer contributing to Open Source solutions rather than building solutions from the ground up
Are self-directed and adaptable, and excel at identifying and solving key problems
Draw motivation from building systems that help others be more productive
See mentorship, knowledge transfer, and review as essential prerequisites for a healthy team
Have excellent communication skills and thrive in fast-paced environments
Bonus qualifications:
You've previously worked with ML training infrastructure and GPU workloads and have familiarity with RDMA networking
You have the expertise to support and troubleshoot low-level Linux systems
You have experience collaborating with research teams or machine learning engineers
If some of the above doesn't line up perfectly with your experience, we still encourage you to apply.
We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.
Full-Time Employees at Cohere enjoy these Perks:
An open and inclusive culture and work environment
Work closely with a team on the cutting edge of AI research
Weekly lunch stipend, in-office lunches & snacks
Full health and dental benefits, including a separate budget to take care of your mental health
100% Parental Leave top-up for up to 6 months
Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
6 weeks of vacation (30 working days)
London, Greater London, United Kingdom. Cohere. Full time. £60,000 - £120,000 per year.