AI Search Engineer
Shade Inc.
Software Engineering, Data Science
New York, NY, USA
USD 160k-220k / year + Equity
Location
New York City
Employment Type
Full time
Location Type
Hybrid
Department
Engineering
Compensation
- $160K – $220K • 0.3% – 0.5%
In-person (3 days/week minimum) · $160-220K base + 0.3-0.5% equity
Shade is scaling, fast. In a year and a half, we’ve built the combined functionality of Frame.io (acquired by Adobe for $1.275B) and LucidLink ($40M ARR) and paired it with proprietary AI search and labeling. We’re a critical piece of infrastructure for post-production houses, creative agencies, sports teams, and internal media teams at large companies.
Customers include Salesforce, Snowflake, A24, the Boston Celtics, HelloFresh, Deloitte, and Motorola. We’re already ingesting a volume equal to 20-30% of Dropbox’s daily data: 50TB a day, 10M minutes of video a month, hundreds of millions of assets. We’re growing 200% QoQ with 120% NRR.
We also just closed a $14M Series A, backed by Khosla, General Catalyst, Construct Capital, Contrary, SignalFire, and Bling.
How we think about Shade:
Search at petabyte scale is unsolved. People want a multi-modal search engine that finds what they mean (not what they typed) across vector, transcript, metadata, and folder structure. This is the role you’d own.
Data transfer is unsolved. From hot storage → archive storage, cloud → cloud, camera → editor, moving high volumes of data is still flaky, unreliable, and difficult. We’re building the tooling and UI directly into the platform.
Version control wasn’t built for creatives. Git history is useful for engineers; the same concepts are useful for media teams. We’ve built the backend (every version of every file saved as a commit) but the creative UI isn’t there yet.
Storage layers don’t talk to other tools. Files move constantly between project management tools, AI tools, ad generators, MCP servers, and Premiere panels. Someone has to store them and reconcile versions. We want to be that layer.
Why this role exists
Shade is a kernel-level distributed file system. The technical work it took to get here—intercepting OS calls, keeping distributed state across machines, streaming bytes from the cloud as if they were local—is what makes the search layer worth building. Google and Dropbox can’t easily replicate the foundation, which means they can’t easily replicate what gets built on top of it.
On the strategic side: Dropbox already stores hundreds of petabytes. Reformatting that data, reindexing it, and restructuring it into both a streamable file system and a fully indexed search engine is a gargantuan effort. We started at zero petabytes. Their inertia is our advantage.
But the advantage only matters if the search itself is great. We have the foundation (file system, data transfer, version control, and integrations are all in active development), but we don’t yet have the engineer who’ll make the layer on top of it as good as the foundation deserves.
The problem you’ll own
Search at Shade today is good at generic queries: “skiing down a hill,” “B-roll of someone with a laptop.” Where it falls short is on business-specific intent. Grüns, one of our customers, makes gummy vitamins; they want to find clips of gummies falling from the sky. Our vectors aren’t strong enough alone to retrieve that, and pure semantic search doesn’t get there. The standard fix is LLM re-ranking, which works fine until you do the math at our scale.
We’re ingesting hundreds of millions of assets, and we hold ourselves to a 70-75% gross margin. Whatever indexing and retrieval pipeline you build has to be both more precise and more cost-efficient than what’s there now. Every architectural decision—what gets cached, how often things get re-indexed, what the chunking strategy is—has a margin consequence.
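To make "do the math" concrete, here is a rough back-of-envelope sketch of how LLM re-ranking cost scales and how it compares with the cost budget a gross-margin target allows. Every number below (query volume, candidates per query, tokens per candidate, token price, revenue) is a hypothetical placeholder for illustration, not a Shade figure, and the class and function names are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class RerankCostModel:
    """Back-of-envelope cost model for LLM re-ranking. All inputs are hypothetical."""
    queries_per_month: int         # assumed search volume
    candidates_per_query: int      # retrieved items sent to the LLM per query
    tokens_per_candidate: int      # prompt tokens needed to describe one candidate
    usd_per_million_tokens: float  # assumed LLM input price

    def monthly_cost_usd(self) -> float:
        # Cost grows with the product of all three volume factors.
        tokens = (
            self.queries_per_month
            * self.candidates_per_query
            * self.tokens_per_candidate
        )
        return tokens / 1_000_000 * self.usd_per_million_tokens


def max_cogs_for_margin(revenue_usd: float, gross_margin: float) -> float:
    """Largest cost of goods sold that still meets the target gross margin."""
    return revenue_usd * (1.0 - gross_margin)


# Hypothetical inputs, chosen only to show the shape of the calculation.
model = RerankCostModel(
    queries_per_month=1_000_000,
    candidates_per_query=100,
    tokens_per_candidate=200,
    usd_per_million_tokens=1.0,
)
print(model.monthly_cost_usd())  # 20000.0

# At a 75% margin target, hypothetical revenue of $100,000 leaves $25,000
# for ALL costs (storage, compute, bandwidth), not just re-ranking.
print(max_cogs_for_margin(100_000, 0.75))  # 25000.0
```

In this toy model, the lever with the most direct effect is `candidates_per_query`: halving the shortlist sent to the LLM halves the re-ranking bill, which is why retrieval precision before the re-ranker matters as much as the re-ranker itself.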
The system already does multi-signal retrieval. It works, but it needs to handle the cases it currently misses, and do so at a cost that fits our margin target. You’d own that work end to end: designing what to index and how, building the LLM re-ranking layer, owning the eval harness, and deciding what to keep and what to rebuild.
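As one illustration of what multi-signal retrieval can look like, the sketch below merges ranked lists from several signals using reciprocal rank fusion (RRF). This is a common, generic technique and an assumption for the sake of example; it is not a description of how Shade's pipeline works today. The signal names and asset IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: dict[str, list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of asset IDs into one list.

    Each signal contributes 1 / (k + rank) for every asset it returns,
    so assets that rank well across multiple signals rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, asset_id in enumerate(ranked_ids, start=1):
            scores[asset_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-signal results for a single query.
rankings = {
    "vector": ["clip_a", "clip_b", "clip_c"],
    "transcript": ["clip_b", "clip_d"],
    "metadata": ["clip_b", "clip_a"],
}

fused = reciprocal_rank_fusion(rankings)
top_ids = [asset_id for asset_id, _ in fused]
print(top_ids)  # ['clip_b', 'clip_a', 'clip_d', 'clip_c']
```

A cheap fusion step like this can also decide how short a shortlist is passed to a more expensive LLM re-ranker, which is where the cost trade-off shows up.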
Stack
Shade is built on Python, Node.js, Next.js, and C++ with a Postgres database. The AI Search role lives in the Python backend, with pgvector for vector storage. You wouldn’t write C++, but you’d integrate with it.
Our core tenets for design
Keep dependencies to a minimum. You inherit every issue in your subprocessors and dependencies. To build a durable, reliable company, you must be deliberate about adding dependencies and stay in control of all the code you ship.
Monolith > microservices. Transactional everything requires one database.
Solve the core issue. Don’t invent a Band-Aid. If a database query is slow, address it directly rather than reaching for a cache.
The simplicity of fs.readFile() always wins. Have you tried to access files in a Dropbox local drive from your programs? It doesn’t work: files must be manually downloaded in their entirety to be accessed. We've built Shade to be accessible like a hard drive where files are streamed. Building an AI video editor? Works with Shade. Using n8n automations? Works with Shade. Using DaVinci Resolve? Works with Shade.
Our core tenets for the team
When we hire, we like to keep those hires. Because of that, we offer benefits on top of salary and equity:
Free lunch (under $30)
Free dinner (under $30) if you stay more than 9 hours
Fully covered health insurance, including dental and vision
401k with % match
Unlimited PTO
Lifetime gym membership
Commuter benefit for the subway
Qualifications
The greatest qualification in our eyes is that you can ship and maintain high volumes of quality code. If you’ve built side projects that are used by thousands of people, or worked at companies where you’ve owned features end to end, then we’re probably excited about you. What (we think) this looks like in bullet points:
3+ years of full-time engineering experience, with at least 2 years owning AI search or information retrieval in production end to end
Strong Python experience, including building and maintaining backend services and data pipelines
Hands-on experience with LLM-powered search: RAG, re-ranking, hybrid retrieval, embeddings
Experience operating vector pipelines at scale: chunking, metadata enrichment, backfills, continuous re-indexing
Experience building eval harnesses and measuring retrieval quality
Experience at a pre-Series B startup
MCP server work is a bonus
Who’ll thrive here
The strongest pattern in hiring has been people who’ve felt the file system pain firsthand. Brandon was a videographer; I directed films before I wrote code. Most people here were creatives before they were engineers.
The second pattern is opinionatedness: engineers willing to fight for an idea, push back when they disagree, and hear they’re wrong without taking it personally. The work depends on people doing all three.
The third is comfort with ambiguity. When you go online to figure out how to solve something at our scale, the answers usually aren't there. The engineers who do well here are the ones who treat that as a feature, not a bug.
