AI Safety Expert

17 hours ago


London, Greater London, United Kingdom Anthropic Limited Full time

**About the Anthropic Fellows Program**

The Anthropic Fellows Program is a 6-month collaboration program aimed at accelerating progress in AI safety research. We provide promising talent with the opportunity to gain research experience, bridging the gap between industry expertise and the research skills required for impactful work in AI safety.

Fellows use external infrastructure (e.g., open-source models, public APIs) to work on an empirical project aligned with our research priorities, producing a public output (e.g., a paper submission). They receive substantial support, including mentorship from Anthropic researchers, funding, compute resources, and access to a shared workspace, enabling them to contribute meaningfully to critical AI safety research.

This role is employed by our third-party talent partner and may be eligible for benefits through the employer of record.

We are piloting this program with a cohort of 10-15 new collaborators. We aim to onboard our first cohort of Fellows in March 2025, with the possibility of more cohorts depending on applicant interest and logistical needs.

This position is ideal for those who are motivated by reducing catastrophic risks from advanced AI systems and who have a strong technical background in computer science, mathematics, physics, or related fields.

You will undergo a project selection & mentor matching process in March 2025. Potential mentors include:

  • Ethan Perez
  • Jan Leike
  • Andi Peng
  • Samuel Marks
  • Joe Benton
  • Akbir Khan
  • Fabien Roger
  • Alex Tamkin
  • Kyle Fish
  • Nina Panickssery
  • Mrinank Sharma
  • Evan Hubinger

Our mentors lead projects in select AI safety research areas, such as:

  • Scalable Oversight: Developing techniques to keep highly capable models helpful and honest, even as they surpass human-level intelligence in various domains.
  • Adversarial Robustness and AI Control: Creating methods to ensure advanced AI systems remain safe and harmless in unfamiliar or adversarial scenarios.
  • Model Organisms: Creating model organisms of misalignment to improve our empirical understanding of how alignment failures might arise.
  • Model Internals / Interpretability: Advancing our understanding of the internal workings of large language models to enable more targeted interventions and safety measures.
  • AI Welfare: Improving our understanding of potential AI welfare and developing related evaluations and mitigations.

**Compensation:**

  • This is not a full-time role with Anthropic Limited; Fellows will be hired via our third-party talent partner.
  • The expected base pay for this role is £1,300/week, with an expectation of 40 hours per week.

**Location Policy:**

  • While we currently expect all staff to be in one of our offices at least 25% of the time, this role is exempt from that policy and can be done remotely from anywhere in the UK.
  • We strongly prefer candidates who can be based in London and make use of the shared workspace we've secured for our Fellows.


  • London, Greater London, United Kingdom AI Safety Institute Full time

    We are advancing the state of the science in risk modeling, incorporating insights from safety-critical and adversarial domains, while developing novel techniques. Our research aims to empirically evaluate these risks by building one of the world's largest agentic evaluation suites and pushing forward the science of model evaluations. Job Role: You will work as...

  • AI Safety Researcher

    1 month ago


    London, Greater London, United Kingdom AI Safety Institute Full time

    We're focused on addressing extreme risks from autonomous AI systems that can interact with the real world. To do this, we're advancing the state of the art in risk modeling, incorporating insights from other safety-critical and adversarial domains, and developing novel techniques. We're also empirically evaluating these risks through one of the world's...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    **Job Title:** Policy Lead. **Location:** N/A. As a leading expert in the field of human-AI interaction risks, you will lead a multidisciplinary research team at the AI Safety Institute to evaluate and mitigate the behavioral and psychological risks that emerge from AI systems. The position offers a unique opportunity to push forward an emerging field and be...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    The AI Safety Institute is seeking an exceptional Advanced AI Research Expert to join its research unit. The successful candidate will work on mitigating extreme risks from autonomous AI systems, including risk models such as auto-replication and large-scale targeted manipulation. Key Responsibilities: Conducting research on advanced AI systems and their...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    We are seeking an exceptional Cybersecurity Research Engineer to join our team at the AI Safety Institute. Our goal is to develop first-of-its-kind government-run infrastructure to benchmark the progress of advanced AI capabilities in cyber security. The selected candidate will work closely with a cross-functional team of cybersecurity researchers, machine...

  • AI Safety Researcher

    3 weeks ago


    London, Greater London, United Kingdom AI Safety Institute Full time

    About the Role: We are seeking a highly motivated and talented Research Scientist/Engineer to join our Societal Impacts team at the AI Safety Institute. The successful candidate will work with other researchers to design and run studies that answer important questions about the effect of AI on society. The ideal candidate will have a strong background in...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    Role Overview: The AI Safety Institute is seeking a highly skilled Senior AI Safety Researcher to join its Safeguard Analysis Team. The successful candidate will play a key role in researching and developing interventions that secure systems from abuse by bad actors. About the Role: This is a challenging and rewarding opportunity for an experienced researcher to...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    About AI Safety Institute: AISI is a leading research institution in the field of artificial intelligence safety. We are dedicated to developing and applying cutting-edge technologies to ensure that AI systems align with human values. We are currently seeking a highly skilled researcher to join our Mechanistic Interpretability team. As a researcher, you will be...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    The AI Safety Institute is seeking an Expert in Autonomous Systems Safety to work on studying and evaluating risks from autonomous AI systems. The successful candidate will have a strong understanding of large language models and hands-on experience with pre-training or fine-tuning LLMs. About the Role: Studying and evaluating risks from autonomous AI...

  • AI Safety Engineer

    21 hours ago


    London, Greater London, United Kingdom AI Safety Institute Full time

    Company Overview: The AI Safety Institute is a leading organization in the field of artificial intelligence safety. Our mission is to ensure that AI systems are developed and used in ways that benefit society. Salary: £80,000 - £120,000 per annum, depending on experience. Job Description: We are seeking a highly skilled Research Engineer to join our...

  • AI Safety Engineer

    4 weeks ago


    London, Greater London, United Kingdom AI Safety Institute Full time

    The Post-Training Team at the AI Safety Institute is dedicated to optimizing AI systems for state-of-the-art performance in various risk domains. This involves a combination of scaffolding, prompting, supervised and RL fine-tuning of AI models. Key Responsibilities: Improve model performance using cutting-edge machine learning techniques. Develop methodologies...

  • AI Safety Researcher

    3 weeks ago


    London, Greater London, United Kingdom AI Safety Institute Full time

    About the Role: We are seeking a highly motivated and talented Research Scientist to join our Societal Impacts team at the AI Safety Institute. The successful candidate will work with our team to design and run studies that answer important questions about the effect AI will have on society. Key Responsibilities: Design and run studies to evaluate the impact of...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    Estimated Salary: £80,000 - £110,000 per annum. About the Role: We are seeking an exceptional Senior AI Safety Researcher to join our team at the AI Safety Institute. This is a unique opportunity to contribute to the development of safety cases and advance the field of AI governance. Key Responsibilities: Conduct foundational research on safety cases to help...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    At the AI Safety Institute, we are dedicated to optimizing AI systems for state-of-the-art performance across various risk domains. Our Post-Training Team works tirelessly to fine-tune and scaffold AI models, ensuring they reach their full potential. About the Role: We are seeking a strong Research Scientist to join our team. As a member of this team, you will...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    **Job Description:** The AI Safety Institute is launching a new Psychological and Social Risks workstream, focused on understanding and mitigating the risks that arise from repeated or prolonged human-AI interaction. As a leading expert in this field, you will build and lead a multidisciplinary research team to develop behavioral and psychological research...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    Job Description: We are seeking a highly skilled Research Scientist to join our team at the AI Safety Institute. This role offers an exciting opportunity to contribute to the development of rigorous scientific techniques for the measurement of frontier AI system capabilities. As a member of our Science of Evaluations team, you will be responsible for conducting...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    About AI Safety Institute: The AI Safety Institute is a leading research organization dedicated to developing and applying cutting-edge technologies to ensure that AI systems align with human values. We are currently seeking a highly skilled researcher to join our Mechanistic Interpretability team. As a researcher, you will be responsible for advancing our...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    The AI Safety Institute is a pioneering organization at the forefront of developing safety evaluations for next-generation frontier AI systems. Our platform is the backbone of this critical initiative, and we're seeking an experienced Cloud Software Architect to join our Platform Engineering team. This is an exceptional opportunity to drive innovation in an...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    **About the Role:** We are seeking a highly skilled Research Lead to join our team at the AI Safety Institute. In this role, you will be responsible for advancing the state of science in evaluating societal-level harms caused by advanced AI systems. The Crime and Social Destabilisation workstream is a new initiative that focuses on assessing and mitigating...


  • London, Greater London, United Kingdom AI Safety Institute Full time

    As advanced AI systems continue to evolve, the potential risks associated with their cyber capabilities pose a significant threat to organizational and individual security. These risks are particularly concerning when combined with other AI risk areas, such as harmful outcomes from biological and chemical capabilities, and autonomous systems. The AI Safety...