I Reviewed Hundreds of AI Safety Applications. Here's What Actually Matters

Oct 26, 2025 · Georg Lange · 10 min read

TLDR

  • 2-6 month AI safety programs (MATS, Anthropic Fellows, Goodfire Fellows, Astra, etc.) are among the best ways to break into AI safety
  • Admissions are highly competitive: single-digit acceptance rates for MATS, 1.5% for Anthropic Fellows, ~15% for my SPAR projects
  • Having reviewed applications for multiple MATS cohorts and SPAR projects, I’m sharing insights that can improve your chances of acceptance

I originally wrote this for my SPAR scholars, but it applies to anyone applying to competitive AI safety programs. This guide has two goals: help you develop the skills that actually matter, and make sure reviewers can see them.

Understanding the Applicant Pool

Applications follow a normal distribution: very few are truly bad or exceptional. The median applicant is near the end of a Master’s degree or partway through a PhD, has completed a few projects and internships, and has shown some AI safety interest. At ~5% acceptance rates, you need to clearly stand out from this baseline. How?

Technical Skills

Technical ability is crucial but harder to assess than quantifiable metrics like papers or years of experience.

Coding Assessments Through a Recruiter’s Lens

Most programs now use automated coding screens. While they may seem artificial, they have one critical advantage: they produce a simple, standardized number that’s directly comparable across all applicants. Everything else on your application is extremely hard to compare. Evaluating a CV requires judgment calls. Assessing GitHub contributions can take 15+ minutes per applicant to determine whether repos are substantial independent work or just toy projects and minor contributions. In contrast, a coding assessment gives recruiters an instant, objective comparison point. This makes these scores far more influential in decisions than applicants often realize. To excel:

  • Don’t cheat: CodeSignal’s detection is surprisingly sensitive and flags a substantial number of applicants, so avoid copy/pasting code from outside the IDE.
  • Practice LeetCode: Focus not on algorithms and data structures, but on writing everyday Python quickly. Can you iterate over a dictionary? Sort dataclasses by multiple keys? Keep solutions extensible? Since most developers now rely heavily on coding assistants, these basics need deliberate practice (see the sketch after this list).
  • Practice deliberately: Turn off coding assistance occasionally to avoid unlearning fundamentals. Review your own code and identify improvements. Record yourself coding and talking through problems for an hour, then review it the next day. Watch Python tutorials and apply new concepts immediately.
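
A quick self-test along those lines, as a minimal sketch (the Submission dataclass and its fields are invented purely for illustration):

```python
# Everyday-Python warm-up: the kind of fluency a timed screen rewards.
from dataclasses import dataclass


@dataclass
class Submission:
    name: str
    score: int
    runtime_ms: float


subs = [
    Submission("alice", 92, 130.0),
    Submission("bob", 92, 95.5),
    Submission("carol", 88, 80.2),
]

# Iterating a dictionary: count submissions per name without reaching for a library.
counts: dict[str, int] = {}
for s in subs:
    counts[s.name] = counts.get(s.name, 0) + 1
for name, n in counts.items():
    print(name, n)

# Sorting dataclasses by multiple keys: score descending, then runtime ascending.
ranked = sorted(subs, key=lambda s: (-s.score, s.runtime_ms))
print([s.name for s in ranked])  # ['bob', 'alice', 'carol']
```

If writing something like this fluently takes real effort without an assistant, that is exactly the gap to practice.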

Project Experience

Substantial technical projects are the most convincing evidence - but surprisingly hard to evaluate from CVs alone. Most applicants describe what they did without demonstrating quality or impact. A 3-month internship reads identically on a CV whether you excelled or coasted. In group projects, the hard worker and the free rider write the same CV line.

Recruiters need evidence of technical ability:

  • GitHub repos: Make it clear whether your repo is independent work, a fork/tutorial, or part of a larger codebase. Avoid posting course exercises. For your own code: write a clear README explaining what it is and what you did, clean up the files, and get a few stars. For contributions to larger codebases, highlight your specific contributions.
  • Papers or products: Link to concrete outputs from your projects.
  • Selective program acceptances: Acceptance into highly selective technical programs (e.g., quant internships at top firms) serves as strong external validation.

Bottom line: provide enough evidence (links, demos, outputs) that reviewers can actually assess your work. A primary rejection reason is uncertainty about whether your projects are substantive or exist only as descriptions.

Research Experience

The median applicant is a Masters or PhD student listing several research projects from coursework, theses, or research assistant positions. The problem? These descriptions look identical regardless of whether the applicant contributed meaningfully or was a passive participant in a group project.

Recruiters cannot reliably assess research ability from project descriptions alone. Instead, they look for concrete, verifiable deliverables, particularly publications that have undergone external review. A first-author paper at a known conference provides far stronger evidence of research ability than any CV description.

In practice, I often put applicants into these categories:

  1. No experience; only small projects (a few months) without public output; or at most middle authorship on a paper
  2. First-author on a NeurIPS/ICML/ICLR workshop paper, another ML conference, or a high-quality technical LessWrong post
  3. First-author on a NeurIPS/ICML/ICLR full paper

Some advice:

  • For upskilling programs (ARENA, SPAR, etc.): If your goal is acceptance into programs like MATS, Anthropic Fellows, Goodfire Fellows, or similar, focus on producing at least a LessWrong post that you can submit to a conference workshop.
  • Visibility matters: If you are first author on any ML paper, make it immediately obvious on your CV and application.
  • Be precise: Don’t write “ICLR” if it was only an ICLR workshop paper or merely submitted but not accepted. Recruiters notice self-aggrandizement. The reverse also holds: if your work has been accepted to a prestigious journal, make sure reviewers know.
  • Work in progress counts: If you’re drafting your first research write-up, this is research experience - but you must show evidence. List it under publications on your CV, clearly label it as “In Preparation,” and link to a Google Drive PDF of the current draft (with your name on it). There’s a huge difference between “worked on research project for 4 months” and “did research, currently drafting for publication, here’s the draft link.”
  • Polish your Google Scholar: If you have a profile, clean it up. Neel Nanda has a helpful guide: https://x.com/NeelNanda5/status/1864799600869326873

AI Safety Interest and Motivation

Reviewers evaluate two things: (1) what you’ve done that demonstrates AI safety interest, and (2) how you articulate your motivation in the application itself.

These programs invest significant time and money in you on the premise that you’ll pursue AI safety work. Being junior with limited prior engagement isn’t necessarily disqualifying, but lack of genuine motivation is. Demonstrating previous engagement helps, but how you frame your interest matters just as much.

In fact, some programs think in counterfactuals: they would rather spend money on a promising junior person who might otherwise not be able to do research in the field than on an accomplished researcher who would do the same work anyway.

Common mistakes:

  • Dismissing the question: Writing “I haven’t done AI safety stuff yet” without explaining your motivation or interest (yes, there are surprisingly many applications like that).
  • Low-effort applications: Research success requires strong intrinsic motivation and sustained effort. If your application has incomplete answers, obvious copy/paste errors, or poor formatting, reviewers assume you’ll show the same lack of effort during the program.
  • Generic reasoning: If your answer could describe any applicant, it’s too generic. Reviewers want specifics. What got you interested? Why? Weak signals: “I took a university course” or “someone shared a program link and it seemed interesting.” Strong signals: specific, personal motivations like “I watched a friend develop mental health issues from replacing social connections with ChatGPT and want to work on AI alignment” or “I studied neuroscience and want to apply those methods to AI models because it’s a promising research direction.”
  • AI-generated reasoning: Your goals are to (a) convince reviewers you’ll sustain effort throughout the program and (b) make your application interesting or insightful to read. AI-generated answers signal low effort. Reading AI-generated slop is not insightful. The reviewer should have fun. Reviewers are (still) human, and having to read 100 AI-generated motivation statements makes humans sad. Sad humans don’t want to work with you.
  • Missing the “why”: Listing AI safety courses or programs you’ve completed without articulating why you’re genuinely interested in this work.
  • Wrong length: Too brief signals low effort or lack of substance. Too long is equally problematic. As the saying goes: “I didn’t have time to write a short letter, so I wrote a long one instead.” Writing long responses is easy; distilling your message into concise, information-dense prose takes real effort. When reviewers see overly long answers, they interpret it as disrespect for their time (they’re reading hundreds of applications) and lack of effort on your part to make your case clearly and efficiently. They’ll simply skim or skip it. This skill of writing concisely matters for paper writing too. Front-load your most convincing arguments in the first one or two sentences. Respect reviewers’ time by putting in the work to help them understand your answer quickly. Don’t assume they’ll work hard to extract meaning from convoluted or lengthy text.

References

References are primarily a checkbox: they rarely boost your application significantly, but they can actively hurt it in a meaningful fraction of cases. Most references are neutral, providing standard positive statements that don’t move the needle. Exceptional references include specific examples of what makes the applicant outstanding (e.g., “solved this difficult technical problem that stumped the team” or “independently proposed and executed this research idea”).

Why some references fail the checkbox:

  • Both referees from the same project: If you’ve listed 10 different positions/projects on your CV, why are both references (e.g., supervisor and coworker) from the same project? This looks suspicious and suggests either limited engagement elsewhere or that other supervisors wouldn’t give you strong recommendations.
  • Referees who barely know you: Professors whose course you attended, supervisors from a 2-week project, or anyone without substantial direct experience working with you. These provide weak, generic recommendations that fail to convince.
  • Wrong reference group: Most programs now ask referees to rank you against a comparison group rather than write lengthy letters. This makes references more useful to the program and respects referees’ time. If you ask someone with a very strong comparison group (e.g., a senior professor who works with top students) and you’d rank as median or below within it, your reference looks weaker than if you’d asked a more junior person (e.g., a coworker or recent graduate) who considers you exceptional in their reference group. Strategic selection matters.

CV

Surprisingly many CVs are poorly structured, even though guidelines exist. Here are specific issues to check that standard guides often miss:

  • Prioritize real estate wisely: A common mistake is compressing major education (Bachelor’s, Master’s, PhD) into single lines while giving disproportionate space to minor 2-month internships or small projects. Allocate CV space according to importance: the most significant positions, where you invested the most time and effort, should get the most real estate.

  • Add hyperlinks: Few applicants do this, but it’s highly effective. Include links in your header to your personal website, GitHub, Google Scholar, and LinkedIn. For individual entries, link directly to publications, project demos, or repositories. This creates an impression of substantial public, verifiable output and makes reviewers happy by giving them easy access to details they want to investigate. A happy reviewer is more likely to favor your application.

  • Provide concrete evidence: For positions where your contribution might be unclear (smaller projects, research roles, etc.), don’t just describe what you did. Link to or specify actual outputs: GitHub repositories, publications, product demos, videos, etc. This is especially crucial for technical work, where CVs often leave reviewers uncertain about the depth of your technical contributions. Full-time roles at known companies need this less, but for everything else, verifiable outputs strengthen your CV significantly.

Low-Hanging Fruit for Junior Applicants

If you’re very junior and short on time before the deadline, here are the highest-impact improvements you can make with minimal time investment:

  • Master the coding assessment: Download CodeSignal’s whitepaper (https://codesignal.com/resource/industry-coding-framework/), which outlines what they expect applicants to know and what they exclude. Train specifically on those topics. Use an LLM to generate practice assignments based on their specifications (see the sketch at the end of this post).

  • Craft a compelling narrative: If you lack AI safety engagement, focus on why you’re committed now and why you’re worth the investment. Strong examples: “I read about AI safety and quit my job to pursue this full-time” or “I’ve spent years in quant trading / frontier AI research and am now pivoting to safety.” Organizations think in counterfactuals: if you have strong potential and they could be the tipping point in your transition to AI safety, they’ll see high counterfactual impact and be more likely to accept you despite limited direct experience.

  • Add technical evidence to your CV: Review your CV for opportunities to demonstrate technical ability. If you’ve done technical projects on GitHub, write clear READMEs and link them, even if the projects are basic. Most applicants don’t provide this, so it’s a positive signal even for simple projects (just ensure they’re independent work, not copied coursework).
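
As a rough illustration of the LLM-generated-practice idea above, here is a minimal sketch. The `openai` client, the model name, and the prompt wording are my own assumptions for illustration, not anything taken from CodeSignal’s materials:

```python
# Minimal sketch: asking an LLM for a practice task shaped like an industry coding screen.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set in the environment;
# the model name and prompt wording are placeholders, not recommendations.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Generate a practice exercise in the style of an industry coding assessment: "
    "a small in-memory system built with plain Python (dicts, lists, dataclasses), "
    "specified in three levels of increasing difficulty. Include example inputs and "
    "expected outputs so I can verify my solution."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```

Time-box whatever exercise it produces and solve it with coding assistance switched off, as described earlier.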