There is a point in many serverless platforms where a Step Functions workflow that once felt elegant starts to feel like a mini application platform of its own. I have seen this happen in teams that are doing many things correctly: they standardized orchestration, they improved visibility, and they moved fragile glue logic out of Lambdas. Then six months later, the workflow has 100+ states, a maze
Overview Let's get our hands dirty. This part covers the full setup and the actual demo: deploy PayLedger to both regions, wire up Route 53 failover, configure the Agent Space, inject three simultaneous faults, and walk through exactly what the agent found. Quick recap from Part 1: PayLedger is a demo payment ledger deployed to ap-southeast-1 (primary) and ap-northeast-1 (secondary) with Route 5
We had ArgoCD running perfectly. Every deployment was reconciled from Git. Drift detection worked. Rollbacks were one-click. Our GitOps setup was clean. Developers still couldn't provision a staging environment without pinging the platform team. That gap — between "GitOps in place" and "developers can actually self-serve" — is where most platform engineering teams get stuck. GitOps solves a real p
Anthropic now ships at least three different memory models inside the Claude product family, and they don't behave the same way. Claude.ai has a chat memory feature for Pro, Max, Team, and Enterprise users that summarizes prior conversations and injects that summary into new chats. Claude Code has CLAUDE.md files plus a separate "auto memory" directory the model writes to itself, both loaded at se
Part 2 of 5 in The New Engineering Contract - what it means to lead engineers when AI is doing more of the coding. Stripe never skipped the boring stuff. They ship 1,300 AI PRs a week. Amazon skipped it. Their storefront went down for six hours. Kent Beck wrote the answer in Extreme Programming Explained in 1999. We read it. Then chose velocity anyway. A friend of mine leads engineering at a funde