Last Tuesday I lost about three hours to a regression in our checkout service. The cart total was off by a cent on certain promo combinations, and the only signal was a Slack ping from finance with a screenshot. No stack trace. No exception. Just wrong numbers. I did what I always do first. I opened the diff for the last deploy, scrolled, squinted, and tried to feel my way to the bug. Forty minute
My project is starting to get solid. I really like how it’s starting to look. Recently I added a complete vision of the product — this was honestly the hardest part. I’m trying to keep everything minimalistic. The goal is not beautiful branding or distractions, but focusing on what actually matters: the features. As I mentioned, here are the features: Capture HTTP requests & responses Inspect head
We had ArgoCD running perfectly. Every deployment was reconciled from Git. Drift detection worked. Rollbacks were one-click. Our GitOps setup was clean. Developers still couldn't provision a staging environment without pinging the platform team. That gap — between "GitOps in place" and "developers can actually self-serve" — is where most platform engineering teams get stuck. GitOps solves a real p
At 3:17 AM on a Tuesday in Q3 2024, our production Kotlin 2.0 microservice fleet hit a 92% memory utilization threshold across 140 nodes, traced to a silent coroutine leak in Ktor 2.2’s request pipeline that had been bleeding 12MB of heap per second for 72 hours. We lost $14k in SLO credits before we found the root cause. A Couple Million Lines of Haskell: Production Engineering at Mercury (78 p
Anthropic now ships at least three different memory models inside the Claude product family, and they don't behave the same way. Claude.ai has a chat memory feature for Pro, Max, Team, and Enterprise users that summarizes prior conversations and injects that summary into new chats. Claude Code has CLAUDE.md files plus a separate "auto memory" directory the model writes to itself, both loaded at se
Part 2 of 5 in The New Engineering Contract - what it means to lead engineers when AI is doing more of the coding. Stripe never skipped the boring stuff. They ship 1,300 AI PRs a week. Amazon skipped it. Their storefront went down for six hours. Kent Beck wrote the answer in Extreme Programming Explained in 1999. We read it. Then chose velocity anyway. A friend of mine leads engineering at a funde