There is a point in many serverless platforms where a Step Functions workflow that once felt elegant starts to feel like a mini application platform of its own. I have seen this happen in teams that are doing many things correctly: they standardized orchestration, they improved visibility, and they moved fragile glue logic out of Lambdas. Then six months later, the workflow has 100+ states, a maze
State that survives a docker compose down is one of those things you don't think about until your test suite needs it, your local dev needs it, and your CI pipeline absolutely doesn't. LocalStack handles persistence with one switch (PERSISTENCE=1), and it's a Pro-only feature. Floci ships four storage modes, all free, all in core, with per-service overrides. Pick the right tradeoff for the job.
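To make the "who needs persistence" split concrete, here is a minimal Python sketch that picks a value for the PERSISTENCE switch based on where the emulator is running. The function name `persistence_env` and the context labels "dev", "test", and "ci" are hypothetical, chosen for illustration. Only the PERSISTENCE variable comes from the text above, and Floci's storage modes are not modeled here because their configuration names are not given.

```python
def persistence_env(context: str) -> dict[str, str]:
    """Return the environment variables to pass to the emulator container.

    `context` is a hypothetical label: "dev", "test", or "ci".
    Local dev and the test suite want state to survive a restart;
    CI wants a clean slate on every run.
    """
    known = {"dev", "test", "ci"}
    if context not in known:
        raise ValueError(f"unknown context: {context!r}")

    persist = context in {"dev", "test"}
    # PERSISTENCE=1 is the single switch described above; "0" leaves it off.
    return {"PERSISTENCE": "1" if persist else "0"}


if __name__ == "__main__":
    for ctx in ("dev", "test", "ci"):
        print(ctx, persistence_env(ctx))
```

The resulting dictionary could be merged into whatever launches the container, for example the `environment` section of a compose file rendered by a script. Keeping the decision in one function avoids the switch being flipped by hand in one place and forgotten in another.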
Overview

Let's get our hands dirty. This part covers the full setup and the actual demo: deploy PayLedger to both regions, wire up Route 53 failover, configure the Agent Space, inject three simultaneous faults, and walk through exactly what the agent found. Quick recap from Part 1: PayLedger is a demo payment ledger deployed to ap-southeast-1 (primary) and ap-northeast-1 (secondary) with Route 5