Software Engineer
Software Engineering
San Francisco, CA, USA
About XOR
XOR is a platform that helps world-class companies pushing the frontier of AI hire exceptional ML, RL, and AI engineering talent.
About Our Client
Our client is a well-funded AI startup working on next-generation training systems for large language models. The team is small, technical, and moving fast, with a strong focus on hands-on engineering over process.
About the Role
We're looking for engineers who are keen to find where the best coding models still fail on real software work (large codebases with existing conventions and technical debt, ambiguous design decisions, multi-step problems) and to build rigorous, gradeable test cases around those failures. You'll own each case end to end.
What You'll Do
- Hunt for where coding models break on real software work, and build the hard, high-fidelity scenarios that expose those failures and push the ceiling of what the best models can do
- Own the hardest problems on the roadmap end to end: multi-step workflows, realistic stakeholder interactions, large codebases with real conventions and technical debt, and challenging system design
- Build verification robust enough that a model can't hack it, and tell genuine capability gaps apart from artifacts of your own setup
- Direct coding agents heavily in day-to-day work, evaluate their output critically, and recognize when they are failing in subtle ways
- Build the tooling your own work depends on
- Mentor newer engineers on the team as it grows
What We're Looking For
- Deep software engineering experience across multiple domains, with genuine expertise in at least one specialty: infrastructure, distributed systems, performance, security, compilers, databases, or similar
- Proficiency in Python
- Extensive hands-on experience with coding agents, including an intuition for where they cut corners and how to direct them well
- Strong intuition for how models behave, even without prior ML or AI experience: you can anticipate where a model will take shortcuts and design around that
- Comfort working independently on complex, ambiguous problems with minimal direction
- Track record of owning work end to end in previous roles
Nice to Have
- Senior or staff software engineer at a company known for engineering rigor (e.g., a frontier AI company, infrastructure startup, or systems-heavy team) who wants to apply that experience to model training
- Deep specialty expertise in an area current models struggle with (distributed systems, low-level performance, security, compilers)
- Excitement about building a new hard problem from scratch on a regular basis
- Early engineer at a previous startup who shipped independently and wants to do it again in AI
- Significant time spent building with coding agents, writing about their failure modes, or contributing to agent evaluation work
