What's Really Happening When You Talk to an AI
Tokens, transformers, context window, system prompt, tools: the conceptual foundations you need to actually understand how ChatGPT, Claude, or Gemini work. No equations.
The essays lay down the principles. These articles put them into practice.
Each guide explores a technical topic in depth — RAG architecture, prompt engineering, evaluations, fine-tuning — with code examples, decision trees, and lessons drawn from production systems like WHOOP Coach and Cursor.
An inventory of the techniques that fill the window, the phenomena that degrade it, and the heuristics to master it. And along the way, the most expensive anti-pattern in production agents.
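The simplest window-management heuristic the guide's topic implies can be sketched in a few lines: keep the system prompt, then admit the most recent turns that fit a token budget. This is an illustrative toy, not the article's code — `count_tokens` is a crude word-count stand-in for a real tokenizer, and production systems would also summarize or log what they evict.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in: a real system would use the model's tokenizer.
    return len(text.split())

def fit_context(system: str, history: list[str], budget: int) -> list[str]:
    # Walk the history newest-first, keeping turns until the budget
    # is exhausted; the system prompt is always retained.
    kept: list[str] = []
    used = count_tokens(system)
    for turn in reversed(history):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return [system] + list(reversed(kept))
```

With a budget of 5 toy tokens, `fit_context("sys", ["a b", "c d e", "f"], 5)` keeps only the two most recent turns.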
The full RAG pipeline — chunking, embedding, retrieval, reranking — and the production concerns that separate prototypes from systems that work.
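The four stages named above — chunking, embedding, retrieval, reranking — can be sketched end to end. This is a dependency-free toy: the "embedding" is a bag-of-words counter standing in for a real embedding model, and fixed-size chunking stands in for semantic splitting; every name here is illustrative, not from the guide itself.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    # Naive fixed-size chunking; real pipelines split on semantic
    # boundaries (headings, paragraphs) instead.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a production system calls an
    # embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk against the query and keep the top k; a
    # reranker would re-score this shortlist with a stronger model.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The structure survives the upgrade to real components: swap `embed` for a model call and `retrieve` for a vector index, and the pipeline shape is unchanged.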
A practical guide to multi-agent patterns — orchestrator-workers, pipelines, ensembles, and swarms — and where they break.
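Of the patterns listed, orchestrator-workers is the easiest to sketch: one coordinator decomposes a task, fans the pieces out to workers in parallel, and aggregates the results. The `worker` below is a placeholder for an LLM call, and the decomposition is hard-coded — a real orchestrator would let a model decide the split.

```python
from concurrent.futures import ThreadPoolExecutor

def worker(subtask: str) -> str:
    # Stand-in for an LLM call handling one subtask.
    return f"result({subtask})"

def orchestrator(task: str) -> str:
    # Decompose, fan out in parallel, aggregate: the
    # orchestrator-workers pattern in miniature.
    subtasks = [f"{task}:part{i}" for i in range(3)]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(worker, subtasks))  # map preserves order
    return " | ".join(results)
```

The aggregation step is where these systems tend to break: workers drift out of sync with each other's context, which is one way the patterns above fail.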
Fine-tuning changes how the model thinks. RAG changes what it sees. A practical decision framework for when to use each — and when to use both.
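The distinction above — fine-tuning changes behavior, RAG changes knowledge — reduces to a two-question decision tree. The questions and return labels below are a hypothetical condensation for illustration, not the guide's actual framework.

```python
def choose_adaptation(needs_private_knowledge: bool,
                      needs_behavior_change: bool) -> str:
    # RAG injects knowledge the model never saw; fine-tuning shifts
    # style, format, or reasoning habits the model already has.
    if needs_private_knowledge and needs_behavior_change:
        return "both"
    if needs_private_knowledge:
        return "rag"
    if needs_behavior_change:
        return "fine-tuning"
    return "prompting"
```

Note the default branch: if neither question is a yes, neither technique is warranted and prompting alone usually suffices.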
A practical guide to LLM evaluation — code-based checks, LLM-as-a-judge, human review, and how to build an eval suite that catches regressions before they ship.
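The cheapest tier mentioned above, code-based checks, can anchor a minimal regression suite: deterministic assertions over stored model outputs, rolled up into a pass rate you can gate CI on. The check names and case format here are invented for the sketch; LLM-as-a-judge and human review would plug in as additional check functions.

```python
import json

def check_valid_json(output: str) -> bool:
    # Code-based check: deterministic, fast, zero-cost to run.
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def check_no_refusal(output: str) -> bool:
    # Crude string heuristic for unwanted refusals.
    return "i cannot help" not in output.lower()

def run_eval_suite(cases: list[dict]) -> float:
    # Each case pairs a recorded model output with the checks it must
    # pass; the suite returns a pass rate to gate releases on.
    checks = {"json": check_valid_json, "no_refusal": check_no_refusal}
    passed = sum(
        all(checks[name](case["output"]) for name in case["checks"])
        for case in cases
    )
    return passed / len(cases)
```

Running this on every prompt change is what turns "it seemed fine in the playground" into a regression signal.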
Patterns that separate prompts that work in demos from prompts that work in production — context management, structured outputs, few-shot engineering, and version control.
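One of the patterns named above, few-shot engineering, has a mechanical core: examples are injected as prior conversation turns so the model imitates their format. The sketch below assembles a chat-style message list — the message schema follows the common role/content convention, and keeping the template in a function like this is what makes it diffable and version-controllable like any other code.

```python
def build_prompt(system: str,
                 examples: list[tuple[str, str]],
                 query: str) -> list[dict]:
    # Few-shot examples become alternating user/assistant turns;
    # the live query goes last.
    messages = [{"role": "system", "content": system}]
    for user_msg, assistant_msg in examples:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    messages.append({"role": "user", "content": query})
    return messages
```

Because the template is a pure function of its inputs, two versions of it can be compared in a diff and evaluated side by side — the version-control half of the pattern.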