
How I Improved an AI Agent from 40% to 60% — With A/B Test Data
Same model, same test cases, 20% better results. 7 out of 8 fixes were pure code, zero LLM cost. Here's exactly what I changed and why it worked.
Long-form writing on production AI agent engineering — validation loops, tool-call reliability, completion ownership, context engineering, cost forensics.

Same model, same test cases, 20% better results. 7 out of 8 fixes were pure code, zero LLM cost. Here's exactly what I changed and why it worked.

Don't bind to a single AI. Run three in competition, make the final call yourself, and let results judge everyone. The operating model for staying valuable in the AI age.

Building an AI agent that works is easy. Building one that doesn't break is 90% of the work. Here's what that 90% actually looks like — from leaked source code and production A/B data.

How to use OpenAI's Codex plugin inside Claude Code — turning Claude Opus and GPT-5.4 into a dual-brain coding system. Setup, commands, rescue workflows, and when each brain wins.

Claude Code's memory system looks simple on purpose. This piece breaks down the tradeoffs behind Markdown memories, Sonnet side-queries, and the decision to avoid vector databases.

Claude Code doesn't just stuff conversations into a 1M-token window. It uses a 5-level compression pipeline, cache-aware edits, and a final autocompact fallback to keep sessions alive.

Inside query.ts — the 1,729-line async generator that is Claude Code's beating heart. 10 steps per iteration, 9 continue points, 4-stage compression, and streaming tool execution. With line numbers.

AI API calls are unlike ordinary RPC: per-request cost varies 100×, tokens and models are first-class, streaming muddies timing, caching changes the pricing. A T-shaped instrumentation architecture — shared stem, specialized arms — that handles tracing, billing, and cost analytics without any of them contaminating the others.

The MEMORY.md frontmatter spec from Claude Code's leaked source: 4 types (user, feedback, project, reference), 200-line index cap, LLM-based picker. What's right, where it breaks at scale.

Claude Code v2.1.88 accidentally exposed 510K lines. The 5 hidden features: Kairos (permanent memory), Undercover Mode (stealth), Ultraplan (deep planning), Pet System (Buddy), Multi-Agent. Source-cited.
© 2026 HarrisonSec. All rights reserved.