Architecture

5 Mar 2026

Scaling AI: Simplicity Over Better Models

Last month I sat in a room with a dozen engineering leads from companies running AI in production. Not demos, not prototypes—actual revenue-generating workloads. I asked what their biggest bottleneck was.

Not one person said “model quality.”

Every answer was some version of: our data is a mess, our infrastructure can’t keep up, or we’re burning GPU budget on workflows that don’t need it. A CockroachDB survey of over 1,000 tech leaders backs this up: AI workloads scale faster than the systems underneath them can adapt. And yet the industry conversation stays fixated on the next frontier release. It’s maddening, honestly.

31 Dec 2025

2025's Radical Frontend Shift: Rise of Agent Runners

A year ago, the AI-in-frontend conversation went roughly like this: Copilot writes your React components, ChatGPT scaffolds your Next.js pages, Cursor auto-completes your CSS. The AI was a faster keyboard. That was the whole pitch.

That’s not where 2025 ended up.

What actually happened was stranger — and I think more consequential. AI agents started living inside frontend architectures. Not as code generators that hand off to a human for the last mile, but as autonomous actors that monitor production, diagnose breakage, and ship fixes without waiting for anyone to approve a pull request.

21 Nov 2025

Cloud-to-Edge: Global Architectures for Healthcare

An ICU bed generates roughly 2,000 data points per second. Vital signs, ventilator readings, infusion pump metrics, waveform data from cardiac monitors. When a patient’s condition starts sliding, the clinical window for intervention shrinks to seconds.

So do you really want that data making a round trip to us-east-1?

The Latency Problem Nobody Can Ignore

Cloud computing solved a lot of problems in healthcare. Centralized data storage, elastic compute for genomics workloads, accessible ML model training. But it introduced a new one that nobody budgeted for: latency that’s flatly incompatible with clinical urgency.

29 Apr 2025

Spec-Driven Development at Enterprise Scale

Three weeks ago I wrote about the shift from code-as-truth to intent-as-truth and collaborative intent articulation. The response was interesting: individual developers got it immediately, but engineering managers kept asking the same question.

“How does this work at scale?”

Fair question. And the honest answer is: it’s messy.

Spec-driven development at the single-team level is relatively straightforward. You write an OpenAPI spec, generate client and server stubs, validate implementations against the contract, and iterate. Good tools exist for this. The workflow is well-understood.
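
As a rough illustration of the "validate implementations against the contract" step, here is a minimal sketch in Python. It assumes a drastically simplified contract (a map of required fields to types) rather than a real OpenAPI document, and the names `ORDER_CONTRACT` and `contract_violations` are hypothetical, not part of any tool mentioned above.

```python
from typing import Any

# Hypothetical, heavily simplified stand-in for one response schema in a spec.
ORDER_CONTRACT: dict[str, type] = {
    "id": str,
    "total_cents": int,
    "paid": bool,
}


def contract_violations(payload: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return human-readable violations; an empty list means the payload conforms."""
    problems = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
            continue
        value = payload[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

A check like this would typically run in CI against recorded or generated responses, so drift between the spec and the implementation fails a build instead of surfacing in production.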

22 Apr 2024

Infrastructure from Code: The Death of Traditional IaC?

Every couple of years, someone declares that Terraform is dead. The replacement changes — Pulumi, CDK, now “Infrastructure from Code” — but the thesis stays the same: writing infrastructure definitions separately from application code is busywork, and a sufficiently smart tool should just infer the infrastructure from the code itself.

It’s a compelling idea. It’s also mostly wrong, at least in 2024.

I’ve spent the past few months evaluating IfC tools for DreamFlare, and the landscape tells an interesting story about what happens when elegant abstractions meet production reality.

27 Nov 2023

AI Agents: The Transition from Chatbots to Actors

Last week one of our engineers at DreamFlare asked me a question that stuck: “When does a chatbot become an agent?” My first instinct, to say something about autonomy, felt too clean. The real answer is messier than that.

A chatbot reads. An agent reads and writes.

The distinction sounds simple, but software architecture reshapes entirely around it. And we’re right in the middle of that shift, watching it happen in real time, building pieces of it ourselves.

1 Jan 2023

LLMs as Forensic Architects for Architecture Discovery

It’s New Year’s Day and I’m thinking about legacy code. Specifically, a conversation I had last week with a friend who just inherited a monolith. Two million lines of Java. The original architects left years ago. The documentation—such as it was—describes a system that no longer exists. The actual architecture is embedded in the code itself, visible only to someone willing to spend weeks reading it.

His question was simple: “Can I just paste chunks of this into ChatGPT and ask it what the architecture is?”

29 Mar 2022

Cybersecurity Mesh: Moving toward Zero Trust Designs

Gartner loves naming things. They’ve built an entire industry around it — the Hype Cycle, the Magic Quadrant, the annual list of Strategic Technology Trends that every CTO feels obligated to reference in their board deck. But occasionally, buried under all that branding, there’s a genuinely useful architectural concept. Cybersecurity Mesh Architecture (CSMA) is one of those.

CSMA made Gartner’s top strategic technology trends for 2022, sitting there alongside hyperautomation and autonomic systems. The headline prediction caught my eye: organizations that adopt a cybersecurity mesh architecture will reduce the financial impact of security incidents by an average of 90% by 2024.

6 Jan 2022

Data Plus Architecture: Integrating Data into Design

I’ve watched three organizations try to migrate their data at scale this past year. All three started the same way — someone said “we need better analytics.” All three hit the same wall: the architecture wasn’t built for data flow, so extracting anything meant duct-taping brittle pipelines onto systems that fought back every time you queried them.

This is how it always goes. Data gets treated like exhaust — a byproduct of the “real” work of building apps. You ship features, hit deadlines, and then someone from analytics shows up asking for a warehouse. That’s when you realize your microservices scattered data across a dozen schemas, and cross-service queries are now a nightmare you can’t wake up from.

9 Nov 2021

Dapr Acceptance into CNCF: Logic Decoupling

Last week the CNCF TOC voted to accept Dapr as an incubating project. I’ve been watching Dapr since its Microsoft launch in 2019 — mostly with skepticism, if I’m honest — and this feels like the right moment to talk about what makes it different from the dozen other “cloud-native” projects begging for attention.

The short version? Dapr doesn’t try to be a platform. It’s a set of building blocks that sit between your application and whatever infrastructure you happen to be running on. That distinction matters more than it sounds.

19 Aug 2021

Sustainable Software: Measuring our Carbon Footprint

I spent most of my career thinking about software performance in terms of latency, throughput, and cost. CPU cycles per request. P99 response times. Monthly AWS bills. Those are the metrics that show up in dashboards; those are the numbers managers care about.

Carbon emissions never made the list.

That’s starting to change, and I think it’s worth paying attention — not because sustainability is trendy (though it is) but because the numbers are genuinely staggering once you look at them.

28 May 2021

Data Mesh: Decentralized Ownership in Practice

About a year ago I wrote about why data lakes are failing. The gist: organizations dump everything into a centralized store, a small data team becomes the bottleneck for every analytical question, and the result is stale dashboards and frustrated stakeholders.

Zhamak Dehghani published a piece on Martin Fowler’s site in 2019 that crystallized what I’d been feeling. She called it “data mesh,” and the core argument is deceptively simple: treat analytical data the way we treat operational services. Decentralize ownership. Push responsibility to the teams closest to the data.

27 Apr 2021

Designing for Portability with Cloud-Native Abstractions

I’ve had the “should we go multi-cloud” conversation at every company I’ve worked at. The answer is always complicated. Pure multi-cloud is expensive and operationally brutal. Full single-cloud commitment is efficient right up until pricing changes, a region goes down, or an acquisition brings a different cloud into the picture.

The pragmatic middle ground — and the one I keep coming back to — is designing for portability without necessarily deploying to multiple clouds.
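
One way to picture "designing for portability" is a narrow interface that application code depends on, with provider-specific adapters behind it. The sketch below is an assumption about what that could look like, not a description of any particular system: `BlobStore`, `InMemoryBlobStore`, and `archive_report` are hypothetical names, and the in-memory adapter stands in for a real cloud storage client.

```python
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical port: the only storage surface application code may depend on."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Stand-in adapter; a cloud-specific adapter would expose the same two methods."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def archive_report(store: BlobStore, report_id: str, body: str) -> str:
    """Application logic written against the port, not against any provider SDK."""
    key = f"reports/{report_id}.txt"
    store.put(key, body.encode("utf-8"))
    return key
```

Only one adapter ever needs to exist in production; the point is that swapping it later touches one class rather than every call site.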

24 Mar 2021

Temporal Node.js SDK: Orchestration for Distributed Logic

Every backend team I’ve managed eventually hits the same wall. You’ve got a multi-step operation — charge the customer, reserve inventory, send a notification, update the dashboard — and somewhere in the middle, something fails. Network blip. Downstream timeout. OOM kill. Now what?

The usual answer? A patchwork of retry logic, dead letter queues, state machines, and prayer. It works until it doesn’t. And debugging it when it doesn’t is its own special hell.
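
To make the patchwork concrete, here is a minimal sketch of the hand-rolled version: retry each step a few times, and if one keeps failing, undo the completed steps in reverse. It is plain Python, not Temporal's API, and `run_with_compensation` and the step tuple layout are hypothetical.

```python
from typing import Callable

# (name, action, compensation) for one step of the multi-step operation.
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: list[Step], max_attempts: int = 3) -> list[str]:
    """Run steps in order; if a step keeps failing, undo completed steps in reverse."""
    log: list[str] = []
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        for attempt in range(1, max_attempts + 1):
            try:
                action()
                log.append(f"{name}: ok (attempt {attempt})")
                completed.append(step)
                break
            except Exception as exc:
                log.append(f"{name}: failed (attempt {attempt}): {exc}")
        else:
            # Retries exhausted: roll back everything that already succeeded.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name}: compensated")
            return log
    return log
```

Everything here lives in memory, so an OOM kill partway through loses the record of which steps completed. A crash-safe version would have to persist that state somewhere durable, and that is where the patchwork starts to grow.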

22 Jan 2021

Socio-Technical Systems: Designing for Human Cohesion

In 1968, Melvin Conway submitted a paper with a claim that’s been haunting software engineering ever since: “Any organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure.”

Harvard Business Review rejected it. Not enough evidence, they said.

Fifty-two years later, every engineering leader I know treats it as a law of nature. Funny how that works.

We usually quote Conway as a warning. Your architecture will mirror your org chart whether you plan for it or not. But I’ve been thinking about it differently lately — if org structure determines system structure, then org design is architecture design. You can use that relationship intentionally.

19 May 2020

Why Data Lakes are Failing the Modern Enterprise

Gartner estimated that 85% of big data projects fail. Back in 2016, that number was 60%. It moved in the wrong direction. Data lakes sit at the center of it.

I’ve seen this firsthand. At TaskRabbit, we made data infrastructure decisions that started with optimistic architecture diagrams and ended with engineers complaining that nobody could find anything. The pattern repeats across the industry, and the root causes are almost always more organizational than technical.

3 Mar 2020

The Case for Modular Monoliths in Distributed Teams

Every architecture conversation I’ve had in the last two years eventually arrives at the same question: “When do we move to microservices?”

Not if. When.

I think that’s the wrong framing. At TaskRabbit, I manage a team of nine engineers spread across four time zones. We’ve been through the architecture discussion more than once, and what I keep coming back to is this: the coordination overhead of microservices might actually be worse than the monolith problems they’re supposed to solve.