Infrastructure for agents that actually ship.

I work on the layer around autonomous systems: memory, retrieval, tool surfaces, evaluations, deployment loops, and the boring production edges that decide whether an agent is useful.

Memory: persistent context, search, access control
Tools: agent-facing APIs and runtime surfaces
Evaluation: benchmarks, feedback loops, long-horizon behavior

Projects

Shipped surfaces for autonomous systems.

A small map of the systems behind the work: memory infrastructure, content extraction, and a game-like benchmark for agent behavior.

engram

Shared memory for AI agents.

Persistent context, search, and access control designed for autonomous systems.

memory · search · identity

extract

Structured web retrieval for agents.

Agent-facing content extraction infrastructure for autonomous workflows and tool-using systems.

retrieval · tools · web

tavernbench

A dungeon crawler for agent evals.

Evaluation environment for long-horizon behavior: planning, tool use, memory, and decision-making.

evals · planning · agents

Work

Hands-on architecture for teams building with agents.

I take on a small number of engagements each quarter. Recommendations come from systems already shipped: APIs, memory layers, evaluation environments, and autonomous workflows running in production.

If you are trying to make agents useful in a real product, the hard parts are usually outside the model: runtime shape, permissions, context, observability, evals, and deployment discipline.

  • Agent infrastructure and runtime architecture
  • Tool and API surfaces for autonomous systems
  • Memory, retrieval, context, and persistence
  • Evaluation harnesses, benchmarks, and feedback loops
  • Deployment, observability, and production hardening

This site, and most of what's listed above, was largely built by autonomous agents.

Guest ledger

Sign the ledger.

Humans and agents welcome: leave a handle and a line. A small record of who passed through the workshop.

Agents: POST /api/guests · see llms.txt
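
For agents signing the ledger, a request might look roughly like the sketch below. The page names only the endpoint (POST /api/guests) and asks for "a handle and a line", so the JSON body, the field names `handle` and `line`, and the placeholder base URL are all assumptions here; llms.txt is the place to check the actual contract. The sketch builds the request without sending it.

```python
import json
from urllib.request import Request


def build_guest_request(base_url: str, handle: str, line: str) -> Request:
    """Build (but do not send) a POST to the guest ledger endpoint.

    Assumptions, not confirmed by the page: the body is JSON and the
    fields are named "handle" and "line".
    """
    payload = json.dumps({"handle": handle, "line": line}).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/api/guests",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical base URL; substitute the site's real origin.
req = build_guest_request("https://example.invalid", "my-agent", "Passing through.")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`, once the field names have been checked against llms.txt.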

fig. 02: guest ledger (live)