Engineering notes from the agent era
Most posts here are about what changes when the agent becomes a first-class collaborator: local code intelligence, testing systems that analyze behavior over time, knowledge tooling that compounds. The rest of my career (cloud, low-latency, mobile) shows up when something's worth writing down.
Backtesting AI Agents: Replay to Catch Regressions
54% of enterprises ship AI agents in production. Most cannot tell when a CLAUDE.md edit silently regresses behavior. Backtesting is the missing discipline.
Context Engineering in Practice: Where Does Each Piece Go?
Context engineering became the top skill shift of 2026. Anthropic's research notes that a context of n tokens carries n² pairwise token relationships. Here's the per-surface decision framework.
- context engineering
- Claude Code
- MCP
- AI agents
- developer productivity
Treat AI as a Team Member, Not a Chat Window
84% of developers use AI, but 46% distrust it. The right scaffolding (constitution, skills, memory, MCP, subagents) turns an assistant into a team member.
- AI agents
- developer productivity
- Claude Code
- MCP
- team workflows
How to Track Claude Code 5-Hour Window Usage
40.8% of devs use Claude Code, but the 5-hour window is opaque. Build a local dashboard that parses transcripts and estimates your token budget.
- claude-code
- token-usage
- developer-tools
- ai-coding
- cost-tracking
Your AI Agent Is Flying Blind Without Local Code Intelligence
84% of developers use AI tools but 46% distrust the output. Three on-device models, 32 MCP tools, 9.93/10 relevance, and zero source code leaving your machine.
- local code intelligence
- AI agents
- MCP
- code search
- developer tools
Building an LLM Wiki: From Karpathy's Gist to a Working CLI
I turned Andrej Karpathy's LLM wiki concept into a Bun CLI (~500 lines of TypeScript) that automatically builds a persistent knowledge base from Claude Code sessions, files, and URLs.
- llm
- cli
- knowledge-management
- claude-code
- bun
How Do You Test Systems That Analyze Behavior Over Time?
Backtesting borrows from quant finance to catch temporal bugs unit tests miss. Poor US software quality costs $2.41T per year. Here's the technique.
- backtesting
- software-engineering
- data-pipelines
- temporal-data
- regression-testing
- synthetic-data
- developer-tooling