prompt engineer · hermosa beach, ca
I design the systems engineering teams use to ship AI features safely — multi-agent code-change workflows, subagent contracts, schema-first prompts, evaluation harnesses. My job is to make the AI engineering SDLC itself a designed product.
I build reliable AI coding agents at enterprise scale. The answer isn't better models — it's better prompt engineering and systematic evaluation.
My methodology: let friction surface the gaps. Every time I correct an agent twice, that correction becomes a persistent constraint. The result is behavioral specifications across the full Claude Code toolkit — CLAUDE.md, Skills, Rules, Hooks, Commands, Memory — that make agents reliable from the first prompt.
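The "correct twice, then encode" rule can be sketched in a few lines. This is a minimal illustration, not the tooling behind the projects described here: `FrictionLog`, `record`, `render`, and the threshold of two are hypothetical names and values, and the rendered block is only assumed to be something one might paste into a CLAUDE.md file.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FrictionLog:
    """Counts repeated corrections and promotes them to persistent constraints."""

    threshold: int = 2  # assumption: the second identical correction triggers promotion
    counts: Counter = field(default_factory=Counter)
    constraints: list[str] = field(default_factory=list)

    def record(self, correction: str) -> bool:
        """Record one correction; return True only when it has just been promoted."""
        key = " ".join(correction.lower().split())  # normalize case and whitespace
        self.counts[key] += 1
        if self.counts[key] == self.threshold:
            self.constraints.append(key)
            return True
        return False

    def render(self) -> str:
        """Render promoted constraints as a plain bullet list."""
        return "\n".join(f"- {constraint}" for constraint in self.constraints)
```

The first correction is only counted; the second identical one is promoted exactly once, so later repeats do not duplicate the constraint.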
This approach has shipped 4 production applications with compounding velocity: each project ships faster than the last as learnings accumulate.
I'm now focused on a level up: from making one agent reliable to making the system an engineering team uses to ship with AI feel safe. Multi-agent orchestration with hard tool-call budgets, file-domain isolation between subagent teams, JSON-schema-enforced outputs, risk-driven mode selection, and durable task-log contracts between planning and execution. The work shifts from "how do I prompt this well" to "how do I design a workflow where the model can't go off the rails — and where the cost of being wrong is bounded by the contract, not by hope."
A systematic approach to making AI coding agents reliable at enterprise scale.
"I design systems that trigger their own improvement cycles. Friction detection, learning, and replanning happen continuously — not when I remember to ask."
the L3 thesis — self-improving systems
Every principle below serves this goal: removing the human as the bottleneck for planning, learning, and iteration.
Bootstrap the loop by letting friction surface what's missing. Don't over-engineer prompts upfront — let failures reveal the gaps, then encode the fixes into persistent context.
Every solved problem is documented in a searchable format. Agents search past solutions before planning, so the system accumulates intelligence across sessions and knowledge compounds automatically.
Claude generates the prompts that Claude executes. Design specs, implementation plans, behavioral constraints — all authored by AI, curated by humans. The engineer designs the system that writes itself.
Full Claude Code toolkit: CLAUDE.md for behavioral specs, Skills for capabilities, Rules for constraints, Hooks for automation, Memory for persistence. Context inheritance flows from project root through the directory tree.
Prompts are hypotheses, not products. Ablation testing reveals load-bearing components. Cross-model validation proves generalization. LLM-as-judge scales evaluation. If you can't measure it, you're guessing.
The prompt's job is to populate a schema, not to produce prose. Define the JSON output shape first; prompts converge faster, downstream consumers don't need to parse intent, and disagreement becomes typeable.
Hard tool-call budgets, retry caps, and phase boundaries are not pessimism — they're the contract that makes a multi-agent system debuggable. Without budgets, "the agent kept trying" is indistinguishable from progress.
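A hard tool-call budget can be as small as a counter that raises once it runs out. The sketch below assumes an orchestrator that drives a list of `(tool_name, callable)` steps; `ToolBudget`, `BudgetExceeded`, and `run_with_budget` are illustrative names, not part of any agent framework.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when an agent tries to spend past its tool-call budget."""


@dataclass
class ToolBudget:
    max_calls: int
    used: int = 0

    def spend(self, tool_name: str) -> None:
        """Charge one tool call; fail closed once the budget is exhausted."""
        if self.used >= self.max_calls:
            raise BudgetExceeded(
                f"{tool_name}: budget of {self.max_calls} calls exhausted"
            )
        self.used += 1


def run_with_budget(steps, budget):
    """Run (tool_name, callable) steps in order; the budget halts the loop."""
    results = []
    for tool_name, call in steps:
        budget.spend(tool_name)  # charge before calling, so overruns never execute
        results.append(call())
    return results
```

Because the charge happens before the call, the step that would exceed the budget never runs, and the exception names the tool that was attempted. That is what separates "the agent kept trying" from a bounded, inspectable failure.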
Systematic approach to reliable AI agent development
A comprehensive methodology and toolkit for making AI coding agents reliable at scale. Demonstrates measurable acceleration in project delivery (8→6→4→3 weeks) through friction-driven context refinement, hierarchical CLAUDE.md specifications, and autonomous verification loops.
impact
Demonstrates how systematic context management transforms agents into reliable executors at scale. Behavioral constraints in CLAUDE.md encode values as persistent rules rather than per-prompt instructions — friction-driven refinement is reinforcement learning at the workflow level. Directly applicable to any team scaling AI coding agents in production.
AI persona dialogue system with prompt ablation research
Open-source platform for multi-persona philosophical dialogues, paired with systematic prompt engineering research. The ablation study tests which system prompt components are load-bearing vs decorative when embodying famous historical figures — including cross-model validation across Haiku, Sonnet, and Opus.
impact
Research contribution to prompt engineering: for well-known personas, minimal prompts work as well as elaborate ones — and this holds across model capability tiers. Even Haiku produces recognizable Nietzsche with just "You are Friedrich Nietzsche." The finding has implications for prompt optimization, context-window efficiency, and how teams reason about what their system prompts are actually buying.
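For readers unfamiliar with the technique, a leave-one-out ablation loop looks roughly like this. It is a sketch under assumptions: `ablate`, the `tolerance` value, and the injected `score` function (standing in for an LLM-as-judge call) are hypothetical, and the code does not reproduce the study's actual harness or results.

```python
from typing import Callable


def ablate(
    components: dict[str, str],
    score: Callable[[str], float],
    tolerance: float = 0.05,
) -> dict[str, bool]:
    """Leave-one-out ablation: map each component name to whether it is load-bearing.

    A component counts as load-bearing when removing it drops the score by more
    than `tolerance` relative to the full prompt.
    """
    full_prompt = "\n".join(components.values())
    baseline = score(full_prompt)
    verdict = {}
    for name in components:
        # Rebuild the prompt with exactly one component removed.
        reduced = "\n".join(
            text for key, text in components.items() if key != name
        )
        verdict[name] = (baseline - score(reduced)) > tolerance
    return verdict
```

In practice `score` would average a judge's ratings over many sampled responses. Passing it in as a parameter keeps the loop testable with a deterministic stand-in.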
Production work — written up in industry-portable terms.
A production prompt suite for multi-phase code changes — plan, work, review — with subagent teams, hard tool-call and retry budgets, and schema-enforced output contracts between phases.
Each change is sized into a Light or Heavy track up front. Light mode handles isolated UI tweaks, copy edits, and refactors that don't touch schemas or contracts. Heavy mode triggers when a change touches one of the danger surfaces: models, migrations, auth, services, or API contracts. The mode determines which subagents are invoked, what their tool-call budgets are, and what's required in the review phase.
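The Light/Heavy sizing step can be sketched as a path check over the changed files. The glob patterns below are assumptions chosen for illustration; the real suite's surface definitions are not shown in this write-up.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns for the danger surfaces named above.
DANGER_SURFACES = {
    "models": "*/models/*",
    "migrations": "*/migrations/*",
    "auth": "*/auth/*",
    "services": "*/services/*",
    "api contracts": "*/api/*",
}


def select_mode(changed_files: list[str]) -> tuple[str, list[str]]:
    """Return ("heavy", surfaces_hit) if any danger surface is touched, else ("light", [])."""
    hit = sorted(
        surface
        for surface, pattern in DANGER_SURFACES.items()
        if any(fnmatchcase(path, pattern) for path in changed_files)
    )
    return ("heavy", hit) if hit else ("light", [])
```

Returning the list of surfaces alongside the mode lets the orchestrator record why a change was routed to the Heavy track, which is useful when someone later disputes the sizing.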
Subagents are file-domain-isolated. The backend implementer can only touch files matching the backend domain; the frontend implementer can only touch the frontend domain. Both write to a shared task log — a structured JSON file that acts as the durable contract between phases. The plan phase populates the task log; the work phase reads from and writes to it; the review phase verifies against it.
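A minimal sketch of both ideas together: a write check per subagent, and a JSON task log on disk. The agent names, domain patterns, and log fields (`task_id`, `agent`, `files`, `status`) are hypothetical; the production schema is not reproduced here.

```python
import json
from fnmatch import fnmatchcase
from pathlib import Path

# Hypothetical domain map: which path patterns each subagent may write to.
DOMAINS = {
    "backend-implementer": ["backend/*"],
    "frontend-implementer": ["frontend/*"],
}


def check_write(agent: str, path: str) -> None:
    """Reject a write outside the agent's file domain."""
    allowed = DOMAINS.get(agent, [])
    if not any(fnmatchcase(path, pattern) for pattern in allowed):
        raise PermissionError(f"{agent} may not write {path}")


def append_task_log(log_path: Path, agent: str, task_id: str, files: list[str]) -> dict:
    """Validate every touched file, then append an entry to the shared JSON task log."""
    for path in files:
        check_write(agent, path)
    if log_path.exists():
        log = json.loads(log_path.read_text())
    else:
        log = {"entries": []}
    entry = {"task_id": task_id, "agent": agent, "files": files, "status": "done"}
    log["entries"].append(entry)
    log_path.write_text(json.dumps(log, indent=2))
    return entry
```

Validation runs before anything is written, so an out-of-domain edit leaves the log untouched. Because the log lives on disk rather than in the conversation, it survives a context reset between phases.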
Output is schema-enforced. Subagents don't return prose — they return JSON conforming to a per-subagent schema, validated by the orchestrator before the next phase starts. Disagreement becomes typeable; downstream consumers don't have to parse intent.
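The orchestrator-side check can be illustrated with the standard library alone. This is a hand-rolled stand-in, assuming a hypothetical review-agent contract with three fields; a real system would more likely validate against a JSON Schema document with a dedicated library.

```python
import json

# Hypothetical review-subagent contract: required field -> expected Python type.
REVIEW_SCHEMA = {"verdict": str, "issues": list, "tool_calls_used": int}
ALLOWED_VERDICTS = {"approve", "request_changes"}


def validate_review(raw: str) -> dict:
    """Parse a subagent's raw reply and fail closed on any contract violation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    missing = [key for key in REVIEW_SCHEMA if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Exact type match, so a JSON boolean cannot pass as an integer.
    wrong = [k for k, t in REVIEW_SCHEMA.items() if type(data[k]) is not t]
    if wrong:
        raise ValueError(f"wrong types: {wrong}")
    if data["verdict"] not in ALLOWED_VERDICTS:
        raise ValueError(f"unknown verdict: {data['verdict']!r}")
    return data
```

A prose reply such as "I think it looks fine." fails at the first step, so the next phase never starts on unparsed intent.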
// outcome
A multi-agent workflow that engineering teams can use without constantly babysitting the loop. The system fails closed: when budgets are exceeded or schemas don't validate, the workflow halts and escalates instead of silently degrading. The meta-tool, not the agent.
Surveyed another team's prompt suite, identified five patterns that travel, adopted them into a different greenfield codebase, and wrote the playbook.
The premise: prompt engineering is converging across teams, but nobody's harvesting what travels. I surveyed a brownfield team's prompt suite — roughly 6,800 lines, 21 slash commands, 5 subagent templates, and a year of accumulated tooling — and read it the way you'd read another engineer's library: looking for what's generalizable versus what's load-bearing on their specific context.
Five patterns travelled. (1) Risk-driven mode selection — sizing a change up front based on what surfaces it touches, then routing through different workflows. (2) File-domain isolation between subagents — preventing the backend agent from rewriting the frontend during a "while I'm here" detour. (3) Task-log files as durable phase contracts — JSON-on-disk as the source of truth between plan and work phases, surviving context window resets. (4) Active-fixer review agents — agents that resolve high-severity issues in-place under a budget, not just file a report. (5) Frontmatter-registered subagents — discoverable, restricted, with declared tool sets.
I adopted all five into a different team's greenfield suite — adapting them to the new codebase's conventions and constraints — and wrote a teammate-facing report so other AI engineers could pilot the same patterns without re-doing the survey.
// outcome
Multi-team capability lift from one engineer's archaeology. The meta-skill is pattern extraction across codebases — knowing which conventions survive translation and which are scaffolding.
Actual prompt-engineering work and methodology artifacts.
Multi-step Figma-to-React implementation specifications with 1500+ line prompts including component inventories, state machines, and acceptance criteria. Generated via Claude + Figma MCP iteration, executed by Claude Code.
Frontend behavioral specifications with "NEVER" constraints, service layer architecture enforcement, autonomous debugging workflows, and agent delegation triggers that make agents deterministic executors.
Registered subagents with frontmatter metadata, restricted tool sets, JSON output schemas, and active-fixer (not passive-reporter) contracts under hard tool-call budgets. A subagent's contract is its API — what it can call, what it can return, what counts as done.
The goal isn't to remove agent thinking — it's to shape it. Deterministic specs handle the 80% where execution matters. Behavioral constraints handle the 20% where agents must reason. The art is knowing which is which.
"We are not prompting anymore. We are orchestrating."// the L3 thesis
Context engineering is greater than prompting. The CLAUDE.md is where the magic lives — it's the constitution that makes agents reliable.
Each project teaches you where humans are doing work agents could do. Find the bottleneck. Give the agent eyes and hands. Encode the learnings. Compound.
The next frontier isn't better prompts — it's designing agent behavior. Not "do X" but "think this way before deciding." L2 prompting tells agents what to do. L3 engineering shapes how they reason. The difference: one makes agents execute, the other makes them reliable at novel tasks.
The unit of design isn't the prompt — it's the workflow. Multi-agent systems with hard budgets, schema-enforced outputs, and durable task-log contracts move prompt engineering from craft to discipline. The prompt is one component of a system; the system has invariants you can engineer for.
Designing the prompts and the systems that hold them.
CLAUDE.md · Skills · Rules · Hooks · Commands · Memory
System prompt design, evaluation suites, and full-stack context engineering across the complete Claude Code toolkit. Behavioral specs, capability modules, constraint systems, automation hooks, and persistent memory — orchestrated for reliable agent execution.
plan → work → review · file-domain isolation · JSON schema enforcement
Multi-phase workflows with subagent teams. File-domain isolation, hard tool-call and retry budgets, JSON-schema-enforced outputs, risk-driven mode selection, durable task-log contracts between phases. Registered subagents with frontmatter, restricted tools, and active-fixer contracts.
JSON Schema · output contracts · typed disagreement
The prompt populates a schema; the schema is the artifact. JSON-schema-enforced subagent outputs make multi-agent disagreement typeable and downstream consumption parser-free.
iterative agent calibration
Converting conversation friction into persistent context. Building "compressed histories of agent failures" that evolve from project learnings.
agent failure studies · constraint design
Studying how agents fail to understand why. Converting failure patterns into behavioral constraints that prevent entire categories of errors.
steelman prompts · challenge loops
Building structured disagreement into agent workflows. Forcing agents to argue against their conclusions before presenting them — L3 behavioral engineering.
The infrastructure agents work inside of — local CLI, MCP, parallel orchestration.
slash commands · registered subagents · output design
Designing the workflows engineering teams use to ship AI features safely. Slash commands, registered subagents, output design as a first-class concern, schema-bounded contracts. The meta-tools, not the agents.
Puppeteer · Chrome DevTools Protocol
Enabling agents to see and interact with the UI. Local CLI tools have far less overhead than MCP for autonomous debugging workflows.
port management · CDP integration
Spinning up isolated worktrees for parallel AI agents with deterministic port allocation. Agents start and work reliably by design.
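One way to make port allocation deterministic is to hash the worktree name into a reserved range. This is a sketch of that idea, not necessarily the scheme used here: `BASE_PORT`, `PORT_SPAN`, and `port_for_worktree` are assumed names and values.

```python
import hashlib

BASE_PORT = 3000  # assumption: start of the range reserved for agent worktrees
PORT_SPAN = 1000  # assumption: ports 3000-3999 are available


def port_for_worktree(name: str, taken: frozenset[int] = frozenset()) -> int:
    """Map a worktree name to a stable port, probing linearly past collisions."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    offset = int.from_bytes(digest[:4], "big") % PORT_SPAN
    for step in range(PORT_SPAN):
        port = BASE_PORT + (offset + step) % PORT_SPAN
        if port not in taken:
            return port
    raise RuntimeError("no free port in the reserved range")
```

The same worktree name always maps to the same port across runs and machines, so an agent can be told its dev-server and debugging ports up front instead of discovering them at startup.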
Remote MCP · tool exposure
Exposing tools to agents as MCP servers. Figma Remote MCP for design integration, custom CLI tools for validation and verification.
launchd · perl · structured prompt pipelines
Health-check pipelines delivering structured notifications via launchd-scheduled diagnostics. Multi-line prompt substitution via perl + env vars. Self-locating data files for scripts deployed outside the repo. Autonomous PM-style workflows.
What ships the AI features once the agent's work is done.
React 19 · Next.js · Vite 7 · TypeScript
Building responsive and interactive user interfaces with modern React features, Next.js for optimal performance, and Vite for fast development.
FastAPI · Python 3.12 · Node.js · ASP.NET Core
Creating robust server-side applications with FastAPI microservices, async programming, and clean architecture.
asyncpg · asyncio.gather · SQLAlchemy AsyncSession · Celery
asyncpg, asyncio.gather concurrency-safety, SQLAlchemy AsyncSession lifecycle. Familiar with the asymmetries that matter in production — what SQLite tolerates that asyncpg crashes on, where Loguru silently swallows extra= kwargs, when a bound async session can and can't be shared.
PostgreSQL 16 · SQLAlchemy 2.0 · Supabase · SQL Server
Designing and implementing efficient database schemas with async ORMs, migrations (Alembic), and secure data management.
Pydantic / SQLAlchemy / FastAPI / React / TypeScript
Production root-cause work across Pydantic, SQLAlchemy, FastAPI, React, TypeScript. Multi-layer defenses for schema invariants (Alembic clamp + DB constraint + Pydantic + read-time helper). Backend/frontend parser parity (snake_case ↔ camelCase, JSON-escape normalization). Migration safety on existing rows.
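Parser parity between a snake_case backend and a camelCase frontend is easiest to hold with a round-trip check. The helpers below are a minimal sketch with hypothetical names; they assume plain lower-case snake_case keys and do not handle acronyms such as `userID`.

```python
import re


def snake_to_camel(name: str) -> str:
    """Convert a snake_case key such as user_id to camelCase (userId)."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camel_to_snake(name: str) -> str:
    """Convert a camelCase key such as userId to snake_case (user_id)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def convert_keys(value, convert):
    """Recursively rename dict keys, leaving values untouched."""
    if isinstance(value, dict):
        return {convert(k): convert_keys(v, convert) for k, v in value.items()}
    if isinstance(value, list):
        return [convert_keys(item, convert) for item in value]
    return value
```

A parity test converts a representative payload to camelCase and back, then asserts the result equals the original. Any key that fails the round trip marks a spot where the two parsers disagree.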
Tailwind CSS 4 · Radix UI · Storybook · Figma Code Connect
Crafting beautiful and intuitive user interfaces with modern design principles, component libraries, and bidirectional design system sync.
pytest · Vitest · React Testing Library · MSW
Comprehensive testing strategies including unit, integration, and API mocking for reliable software delivery.
Dispositions — how I think about engineering teams, not just code.
survey · identify · encode
Surveys other teams' prompt suites and tooling, identifies what travels and what's team-specific, encodes shared patterns into reusable tooling. The meta-skill is knowing which patterns survive contact with a different codebase.
meta-tools, not agents
Builds the meta-tools engineering teams use to ship AI features safely. Treats the SDLC itself as a product surface — multi-agent workflows, subagent contracts, output design, schema-first prompts.
decomposition before polish
Treats agent output presentation as a design decision downstream of work decomposition, not a polish layer. What the agent says is determined by how the work is broken up; the prompt's structure is the user interface.
creator's default ↔ engineer's default
Comfortable in both modes — "creator's default" (conversational, exploratory, narrating) and "engineer's default" (state machines, budgets, schemas, silence-until-result). Knows which mode each task calls for.
University of California, Riverside
// open to AI engineering roles · prompt engineering · agent orchestration