Reinforcement Learning Environments Engineer
XOR.ai
XOR is exclusively hiring on behalf of an elite Silicon Valley AI startup currently operating in stealth mode.
Our partner is redefining the future of AI by building the next generation of training data. While today’s LLMs are powerful, they often struggle with real-world tasks that fall outside their training distribution. This team is solving that by creating sophisticated reinforcement learning (RL) environments that ground AI feedback in reality.
Why Join?
- Elite Lineage: The founding team comes directly from Anthropic’s data team, having built the core data infrastructure, tokenizers, and datasets behind the Claude models.
- Tier-1 Backing: Seed round funded by the world’s most prestigious Silicon Valley VCs.
- Strategic Impact: You will work directly with top-tier AI labs, influencing the timelines and priorities of the world’s most advanced models.
- True Innovation: This isn't about "wrapping an API"; it's about architecting the environments where the next leap in intelligence will happen.
Brief Description of the Vacancy
We’re hiring RL Environments Engineers to design and build MLE/SWE environments that deliver high-quality, diverse tasks with minimal supervision. You will target a specific language model, meet a defined difficulty distribution, and deliver roughly one task every 8–10 hours. This is a remote contractor role that requires at least 4 hours of overlap with PST and advanced English (C1/C2).
Key Responsibilities
- Architect Environments: Design and build production-grade MLE/SWE environments for LLM interaction.
- Model Targeting: Tailor tasks to specific language models while maintaining a rigorous difficulty distribution.
- Rapid Delivery: Once onboarded, maintain high-velocity output (roughly one complex task every 8–10 hours).
- Iterative Design: Refine and edit tasks within 24 hours based on customer/researcher feedback.
What We’re Looking For (Must-Haves)
- Strong Python (engineering-quality, not notebook-only).
- Hands-on LLM/GenAI work in production: you’ve shipped and operated real systems (not “wrapped an API and called it AI”).
- Strong product/engineering ownership: comfortable building, fixing, and scaling end-to-end pipelines.
- Docker and a production mindset (debugging, reliability, iteration speed).
- At least 4 hours of overlap with PST and advanced English (C1/C2) for specs, reviews, and feedback.
- Ability to meet throughput expectations and respond quickly to feedback.
Strong Signals (Nice-to-Haves)
- Experience designing environments/tasks for RL and/or evaluations.
- Experience in high-stakes or regulated domains (e.g., healthcare, finance, fraud/risk, safety-critical systems).
- ML systems experience: CI/CD, monitoring, evaluation harnesses, MLOps, scalable pipelines.
- Systems depth: C++/Rust/Scala/Java, performance/infra optimization, distributed systems.
- Exposure to RL, bandits, or agentic systems (not required, but a strong signal).
Not a Fit If
- You’re primarily a prompt engineer without strong ML/engineering foundations.
- You’re a research-only / academic-only profile with little or no shipping/production ownership.
- You’ve only built in notebooks or rely heavily on managed AutoML tools.
Compensation & Benefits
- Base Pay: $90 – $160 USD / hour ($15,000 – $22,500 monthly equivalent), based on seniority and technical performance.
- Performance Bonuses: Monthly bonuses based on task delivery and quality.
- Flexibility: 100% remote, 40 hours per week, with a flexible schedule.
- Growth: A potential path to full-time employment (FTE) and relocation for high performers.
The Hiring Process
- Application: Submit your CV and a brief note on your technical track.
- Initial Challenge: A short take-home form/task to assess baseline skills. You can also schedule a call with XOR during this stage to learn more about the client.
- Technical Deep Dive: An interview with the client's technical leadership.
- Final Coding Task: A comprehensive assignment to prove your production-ready skills.
Note: Time spent on the final take-home assignment is compensated if you receive an offer.
