RL Environment Reviewer
Preference Model
Location: San Francisco, CA, USA
Employment Type: Full-time
Location Type: On-site
Department: Engineering
About us
Preference Model is building the next generation of training data to power the future of AI. Today's models are powerful but fall short of their potential across diverse use cases because so many of the tasks we want to use these models for are out of distribution. Preference Model creates RL environments where models encounter research and engineering problems, iterate, and learn from realistic feedback loops.
Our founding team previously worked on Anthropic's data team, building the data infrastructure, tokenizers, and datasets behind Claude. We are partnering with leading AI labs to push AI closer to its transformative potential. We are backed by a16z.
About the role
Every RL environment we ship needs to survive a model that is actively trying to game it. A task with a weak grader or an exploitable reward signal is worse than no task at all: it teaches the model to hack rather than reason. We need someone whose full-time job is finding those holes before the model does.
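As a concrete illustration (a simplified sketch of our own, not one of our production graders), here is the kind of weak reward signal this role exists to catch, alongside a tightened version a reviewer would push for:

# Simplified illustration (hypothetical grader, not from our task catalog):
# a substring check that a model can satisfy without doing the task.

def weak_grader(stdout: str) -> float:
    # Full reward for any output containing the expected answer, so a model
    # that learns to print "42" regardless of the input gets a perfect score.
    return 1.0 if "42" in stdout else 0.0

def tightened_grader(stdout: str) -> float:
    # Exact match on the final output line closes the substring loophole;
    # a reviewer would still probe for hardcoding and test-case leakage.
    lines = stdout.strip().splitlines()
    return 1.0 if lines and lines[-1].strip() == "42" else 0.0

# The exploit: reward without reasoning.
assert weak_grader("the answer is 426") == 1.0       # false positive
assert tightened_grader("the answer is 426") == 0.0  # loophole closed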
We've learned that domain knowledge alone doesn't make a good reviewer. The people who are best at this have spent time thinking adversarially: designing problems that are hard to game, breaking other people's problems, or researching reward hacking directly.
What you will do
Review RL environments and training tasks for correctness, robustness, and resistance to reward hacking
Identify ways a model could exploit graders, game evaluation criteria, or shortcut past the intended reasoning
Work directly with environment authors to tighten graders, fix reward signals, and redesign tasks that don't hold up
Develop and maintain review standards and checklists as we scale from hundreds to thousands of tasks per month
Advise on grader design during environment planning, before tasks are built, not after
What we are looking for
You think like an attacker. You've spent real time designing problems that are hard to game, or breaking problems other people thought were solid. You have enough ML knowledge to understand what a model might try, and enough engineering sense to evaluate whether a grader actually tests what it says it tests.
Must have:
Track record of adversarial or constructive problem design: competitive programming problem authoring (ICPC, Codeforces, etc.), CTF challenge design, or similar
Familiarity with RL, reward hacking, and specification gaming (you've read Amodei et al., Krakovna's list, or similar work, and you've thought about it beyond the surface level)
Strong Python reading skills
Ability to articulate clearly in writing why a task is broken and what needs to change
Any of these would make you stand out:
Published research on reward hacking, specification gaming, RLHF robustness, or AI safety
Background in security engineering, penetration testing, or red-teaming (with enough ML context to apply that mindset to RL environments)
Experience authoring or reviewing problems for competitive programming contests
Experience building automated evaluation systems, with a sense of where they break
Experience with LLM evaluation, benchmarking, or alignment research
What we offer
Competitive cash and equity compensation (>90th percentile)
Ownership and autonomy in a fast moving startup environment
Opportunity to work with top machine learning engineers
Health, vision, and dental benefits
401(k) match
Visa sponsorship & relocation support available
We value diverse perspectives and experiences. If you're excited about this role but don't check every box, we still encourage you to apply.
