Researcher: Multimodal
2 weeks ago
**About Cartesia**
Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text—1B text tokens, 10B audio tokens and 1T video tokens—let alone do this on-device.
We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team combines deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.
We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors, and 90+ angels across many industries, including the world's foremost experts in AI.
**The Role**
We’re opening our first-ever office in Europe and looking to hire incredible talent in London to advance our mission of building real-time multimodal intelligence. In this role, you'll:
- Lead the design, creation, and optimization of datasets for training and evaluating multimodal models across diverse modalities, including audio, text, video, and images.
- Develop strategies for curating, aligning, and augmenting multimodal datasets to address challenges in synchronization, variability, and scalability.
- Design innovative methods for data augmentation, synthetic data generation, and cross-modal sampling to enhance the diversity and robustness of datasets.
- Create datasets tailored for specific multimodal tasks, such as audio-visual speech recognition, text-to-video generation, or cross-modal retrieval, with attention to real-world deployment needs.
- Collaborate closely with researchers and engineers to ensure datasets are optimized for target architectures, training pipelines, and task objectives.
- Build scalable pipelines for multimodal data processing, annotation, and validation to support research and production workflows (a rough sketch of one such validation step follows this list).
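As a loose illustration of the kind of validation stage such pipelines include (a sketch under assumed record fields and tolerance, not Cartesia's actual pipeline), the snippet below drops clips whose audio and video durations disagree beyond a threshold:

```python
# Illustrative sketch only: one validation stage of a hypothetical multimodal
# curation pipeline. ClipRecord fields and the tolerance value are assumptions.
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class ClipRecord:
    clip_id: str
    audio_duration_s: float
    video_duration_s: float
    transcript: Optional[str] = None


def filter_misaligned(records: Iterable[ClipRecord],
                      tolerance_s: float = 0.1) -> Iterator[ClipRecord]:
    """Yield only clips whose audio and video durations agree within tolerance."""
    for rec in records:
        if abs(rec.audio_duration_s - rec.video_duration_s) <= tolerance_s:
            yield rec


# Usage: wire the filter into a larger pipeline as a streaming stage.
clips = [
    ClipRecord("a", 10.02, 10.00, "hello"),
    ClipRecord("b", 9.40, 10.00),  # dropped: durations differ by 0.6 s
]
print([c.clip_id for c in filter_misaligned(clips)])  # ['a']
```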
**What We’re Looking For**
- Expertise in multimodal data curation and processing, with a deep understanding of challenges in combining diverse data types like audio, text, images, and video.
- Proficiency in tools and libraries for handling specific modalities, such as librosa (audio), OpenCV (video), and Hugging Face (text).
- Familiarity with data alignment techniques, including time synchronization for audio and video, embedding alignment for cross-modal learning, and temporal consistency checks (see the audio-video alignment sketch after this list).
- Programming expertise in Python and experience with frameworks like PyTorch or TensorFlow for building multimodal data pipelines.
- Comfortable with large-scale data processing and distributed systems for multimodal dataset storage, processing, and management.
- A collaborative mindset with the ability to work cross-functionally with researchers, engineers, and product teams to align data strategies with project goals.
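To make the audio-video time-synchronization requirement concrete, here is a minimal sketch (illustrative only; the file paths, window length, and class are hypothetical and not part of this posting) that pairs each video frame with the audio window centered on its timestamp, using librosa, OpenCV, and a PyTorch Dataset:

```python
# Illustrative sketch only: pairing video frames with time-aligned audio
# windows. Paths, sample rate, and window size are assumptions.
import cv2
import librosa
import numpy as np
import torch
from torch.utils.data import Dataset


class AudioVideoWindowDataset(Dataset):
    """Yields (audio_window, frame) pairs sampled on the video's frame clock."""

    def __init__(self, video_path: str, audio_path: str,
                 window_s: float = 0.5, sr: int = 16_000):
        self.audio, self.sr = librosa.load(audio_path, sr=sr, mono=True)
        self.window = int(window_s * sr)

        cap = cv2.VideoCapture(video_path)
        self.fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS metadata is missing
        self.frames = []
        ok, frame = cap.read()
        while ok:
            self.frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            ok, frame = cap.read()
        cap.release()

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, idx):
        # Time synchronization: map the frame index to a timestamp on the
        # video clock, then slice the audio window centered on that timestamp.
        t = idx / self.fps
        center = int(t * self.sr)
        start = max(0, center - self.window // 2)
        audio = self.audio[start:start + self.window]
        audio = np.pad(audio, (0, self.window - len(audio)))  # pad edge windows
        frame = torch.from_numpy(self.frames[idx]).permute(2, 0, 1).float() / 255.0
        return torch.from_numpy(audio), frame
```

In a production pipeline one would stream frames rather than hold them in memory and check drift against container timestamps, but the frame-clock indexing above is the core of keeping the two modalities aligned.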
**Nice-to-Haves**
- Experience in creating synthetic multimodal datasets using generative models, simulation environments, or advanced augmentation techniques.
- Background in annotating and aligning multimodal datasets for tasks such as audio-visual speech recognition, video captioning, or multimodal reasoning.
- Early-stage startup experience or a proven track record of building datasets for cutting-edge research in fast-paced environments.
**Our culture**
We’re an in-person team based out of San Francisco, Bangalore & London. We love being in the office, hanging out together and learning from each other every day.
We ship fast. All of our work is novel and cutting edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality and design along the way.
We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.
**Our perks**
- Lunch, dinner and snacks at the office.
- Fully covered medical, dental, and vision insurance for employees.
- Pension plan.
- Relocation and immigration support.
- Your own personal Yoshi.
-
Research Engineer, Multimodal
2 weeks ago
London, Greater London, United Kingdom. Anthropic. Full time. £250,000 - £270,000 per year. About Anthropic: Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. At Anthropic, we...
-
Senior Researcher
3 weeks ago
London, United Kingdom. Intelix.AI. Full time. Senior Research Scientist - Multimodal AI & Generative Models. Leading Enterprise AI & LLM Company. Remote/Hybrid | Competitive Package + Equity. This role is with a world-class research team at a rapidly growing AI company that's revolutionizing how enterprises adopt and deploy AI solutions and is hiring exceptional researchers who can bridge the gap between...
-
Research Engineer, Multimodal and Video Modeling
2 weeks ago
Greater London, United Kingdom. Google DeepMind. Full time. Research Engineer, Multimodal and Video Modeling, London, UK. At Google DeepMind, we value diversity of experience, knowledge, backgrounds and perspectives and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age,...
-
Research Engineer: Multimodal Video Modeling
2 weeks ago
Greater London, United Kingdom. The Rundown AI, Inc. Full time. A leading AI research lab is seeking a Research Engineer with exceptional programming skills and experience in training large-scale multimodal models. The successful candidate will develop and maintain data pipelines and translate innovative research into production. Candidates should have a relevant degree and strong knowledge of deep learning...
-
Multimodal Video Modeling Research Engineer
7 days ago
Greater London, United Kingdom. Google DeepMind. Full time. A leading AI research organization in Greater London is seeking a Research Engineer who excels in programming and has a solid understanding of neural network training. The ideal candidate will focus on developing and improving multimodal models, especially for video, while collaborating closely with researchers to translate ideas into production-ready code...
-
Research Engineer, Multimodal and Video Modeling
2 weeks ago
London, Greater London, United Kingdom. DeepMind. Full time. £120,000 - £140,000 per year. At Google DeepMind, we value diversity of experience, knowledge, backgrounds and perspectives and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual...