The Wall Street Journal ran a piece yesterday on JustPaid, a 9-person Mountain View startup. They used OpenClaw and Claude Code to stand up seven AI agents that write code, review it, and run QA around the clock. In one month: 10 major features shipped. Each one would have taken a human engineer a month or more. This story is getting passed around as proof that the autonomous engineering team is h
MCP vs Skills: a practical decision guide for builders

"I need my agent to do X. Skill or MCP?" If you build agents on Claude or anything MCP-compatible, this is the question that actually matters. The two patterns get pitched as alternatives. They are not. They solve different problems. Most production agents need both. Here is the decision rule, the framing for each, and the anti-patterns I keep
Most cloud sustainability tools are built for sustainability officers. They pull three-month-old billing data, run it through a proprietary model, and produce a PDF that engineers never see. By the time you know your us-east-1 cluster emits twice as much as us-west-2 would have, it's been running for a quarter. The architecture is locked in. The carbon is already burnt. The only moment you can act
In March 2026, a rogue AI agent at Meta triggered a Sev 1 security incident. Sensitive company and user data was exposed to unauthorized employees for nearly two hours. The agent held valid credentials. It operated inside authorized boundaries. It passed every identity check. And yet. Identity and Access Management answers one question: Is this agent who it says it is? It doesn't answer: Was this
The Problem Nobody Talks About

AI can write code, generate content, analyze data, design systems, and manage projects. It's getting better every month. The natural question: what's left for humans? The wrong answer: "AI will replace us." The right answer is uncomfortable: stop picking the best AI. Run multiple AIs in competition, and become the judge.

Three rules, learned the hard way: Multiple
Anthropic now ships at least three different memory models inside the Claude product family, and they don't behave the same way. Claude.ai has a chat memory feature for Pro, Max, Team, and Enterprise users that summarizes prior conversations and injects that summary into new chats. Claude Code has CLAUDE.md files plus a separate "auto memory" directory the model writes to itself, both loaded at se
Iris v0.4.0 ships today. It's the release where protocol-native eval crosses from "deterministic rules" into "semantic scoring" — without giving up any of what made the deterministic layer work. Three headline features plus a lot of infrastructure work that quietly compounds. I'll go through each, why it matters, and how it fits the thesis.

Heuristic rules catch a lot: length, keyword overlap, PII