Low-Level / Kernels Engineer

XOR.ai

San Francisco, CA, USA

Posted on Jul 5, 2026

About XOR

XOR is a platform that helps world-class companies pushing the frontier of AI hire exceptional ML, RL, and AI engineering talent.

About Our Client

Our client is a well-funded AI startup working on next-generation training systems for large language models. The team is small, technical, and moving fast, with a strong focus on hands-on engineering over process.

About the Role

We’re looking for experienced engineers for a Low-Level / Kernels team that builds training tasks at the lowest layers of the stack: GPU and accelerator kernels, vector ISAs, codec and crypto primitives, FPGA work, and more. These are domains where current frontier models are weakest: niche paradigms, hardware underrepresented in training data, and open benchmarks where models lag. The role blends research and engineering. You'll develop novel approaches and realize them in code, owning tasks end-to-end: choosing the domain, designing the problems, building the scoring and infrastructure, and hardening them against shortcuts.

What You'll Do

Design and build low-level / kernel-focused training tasks that target a specified model and difficulty distribution
Choose which tasks are worth building. Good candidates target niche or genuinely hard domains; exercise real hardware features (tiling, streaming, async copy, vector ISAs); use interesting hardware or simulators (FPGAs, novel accelerators, gem5); are grounded in benchmarks where models lag; have a recognized reference to measure against (cuBLAS, FFTW, OpenSSL, etc.); and scale into many diverse tasks from a single design
Build correctness and performance scoring that's deterministic and can't be gamed - the objective is clear, and the only way to hit it is to actually write the kernel

What We're Looking For

Strong low-level / systems engineering: fluent in C / C++ / CUDA (or an equivalent kernel language), comfortable dropping to assembly when it matters
Strong, engineering-quality Python across prior work - production code, automation and deployment scripts, data analysis and plotting (not notebook-only)
Hardware-aware coding: writing with the silicon in mind, considering memory hierarchy, occupancy, data movement, parallelism, latency vs throughput
Kernel development experience: writing kernels and optimizing them iteratively against a profiler
An adversarial mindset: turning fuzzy goals into robust, ungameable scoring, and asking 'how would a model cheat this?'
Hands-on work with LLMs
Ownership and autonomy: building, debugging, and shipping end-to-end with minimal supervision

Nice to Have

Has shipped a kernel that approached state of the art and can explain the remaining gap
Depth in a niche hardware target or ISA: FPGA/HLS, RISC-V Vector, DSPs, SIMD/AVX, TPUs
Depth in an adjacent discipline: HPC/heterogeneous clusters, hardware design (RTL/HDL, HLS), compilers and kernel toolchains (MLIR/LLVM, Mojo, Triton, gem5), or formal verification (Lean, Coq, SMT)
Reads performance and architecture papers and turns them into running code
Open-source contributions others rely on
Strong competitive-programming background (ideally in a low-level language)
Experience building evaluation infrastructure or agent harnesses
