SOFTWARE ARCHITECTURE & REFACTORING 3 Domain-Centric Architectures Every Software Architect Should Know The first concern of the architect is to make sure that the house is usable; it is not to ensure that the house is made of brick. — Uncle Bob The expression domain is occurring in software bibles for a very long time now and is heavily discussed in the book Domain-Driven
很多团队的网络监控并不算差。 链路可用率有、接口带宽有、CPU 和内存有、异常告警也接进了企业微信、飞书和短信。但真正出了事,复盘时还是会出现同一句话:当时知道出问题了,但没有把现场留住。 这就是为什么越来越多团队开始关注网络回溯分析系统。 它解决的不是“能不能看到告警”这个初级问题,而是更关键的两个问题: 告警发生时,能不能快速还原到底是哪一段流量、哪一条路径、哪一种会话出了问题 事故结束后,能不能基于证据复盘,而不是靠聊天记录和印象拼凑过程 对云上和混合云场景来说,这件事尤其重要。因为链路更长、设备更多、路径更动态,很多故障不是“持续坏”,而是短时抖动、瞬时拥塞、路径切换、策略误命中。如果没有回溯能力,排障就很容易沦为赛后猜谜。 这篇文章不讲空洞概念,直接从一线运维视角拆清楚:云上网络回溯分析系统到底该怎么建,应该覆盖哪些能力,落地时最容易踩哪些坑。 先说结论: 传统监控擅长发现“异常
Or: what broke on my first three attempts so you don't have to repeat it I've built two prediction markets from scratch. The first one crashed on testnet. The second one launched but had zero users for two months. The third one? Actually works. Here's what I learned in the process. Ask yourself three boring but critical questions: Binary outcomes (Yes/No) or multiple choices? Who decides the trut
What if your Kubernetes cluster simply refused to run unsigned images? I spent some time experimenting with enforcing image provenance in a small Kubernetes setup using MicroK8s. The idea was simple: Only container images with valid cryptographic signatures are allowed to run in the cluster. For this I used: GitLab CI/CD (build + signing pipeline) Cosign / Sigstore (image signing) Kyverno (admissi
What Should Humans Design When AI Can Write Most of the Code? AI can now write code. Not perfectly. Not always safely. Not without review. But it can write a great deal of code. It can generate functions, create tests, call APIs, build UI components, handle common errors, and produce large amounts of implementation detail at a speed no human developer can match. This changes the meaning of prog
We are currently witnessing a massive shift in AI development. We’ve moved past the "Chatbot" era and into the era of Agentic Systems—AI that doesn’t just suggest text, but actually executes code, moves money, and modifies databases. However, there is a fundamental architectural flaw in how most agents are built today: we are giving "Intelligence" and "Authority" to the same probabilistic model.
Most teams I have worked with have one auth test in their suite. It looks like this: test('valid token verifies', () => { const token = signSync({ sub: 'user-1', aud: 'api://backend' }, secret); const result = verify(token, options); expect(result.valid).toBe(true); }); That test is fine. It is also a smoke test, not a regression suite. It catches the case where verification is completely b
The on-call alert at 02:14 said auth_5xx_rate spiked from 0.01 to 31.4. Not a deploy window. Not a traffic spike. Just thirty-one percent of authenticated requests failing for ~four minutes, then back to baseline. The cause was a JWKS rotation on the issuer side. New keys came in. Old keys went out. Caches in our service didn't refresh fast enough. Tokens signed with the new key were rejected beca