I'm a founding engineer at Falconer, where I build production AI agent systems — tool execution, retrieval pipelines, evals, and streaming at scale. Before that, staff engineer at Stripe. MIT EECS. I take on a small number of consulting engagements.
Teams often overfit to rigid workflow pipelines — the kind that break the moment a user does something unexpected. In many cases a general agentic loop with well-designed tools works better, and the architecture decision is easier to get right early than to fix later. Most eval suites are also built on the happy path, which misses where the real failures happen and makes it harder to ship meaningful improvements. And premature model cost optimization tends to distort product decisions before you even know what you're building.
I advise on architecture, tool design, and eval strategy — the decisions that are expensive to undo. I also sometimes build: if you need an end-to-end prototype to figure out whether an idea is worth pursuing at all, I can do that in a tight sprint.
If you're building an agentic product and need a second opinion on architecture, evals, or whether something is actually worth building, reach out with a bit of context.