Data

25 Sep 2025

Trillion-Row Creative Insights: Scaling Netflix Muse

Netflix published a deep dive into Muse, their internal platform that helps creative strategists figure out which artwork and video clips land with which audiences. The engineering caught my attention—not because the technology stack is exotic, but because it combines well-understood techniques (probabilistic data structures, precomputed aggregates, columnar storage) into something that serves interactive analytics over a trillion rows.

A trillion rows. Interactive latency. And the use case? Creative decision-making, not ad targeting or fraud detection. That contrast alone makes the architecture worth pulling apart.

5 Feb 2022

The Golden Thread: Managing Building Data Integrity

I rarely write about construction. But the concept emerging from the UK Building Safety Bill caught my attention—it maps so precisely to problems I spent years solving in software that I couldn’t ignore it.

The “golden thread” demands that every higher-risk building carry a continuous, authoritative digital record from design through construction through decades of operation. One source of truth, maintained across the building’s entire lifecycle, because there is no single handoff point at which responsibility passes to someone else. That sounds like a data lineage problem. Because it is one.

6 Jan 2022

Data Plus Architecture: Integrating Data into Design

I’ve watched three organizations try to migrate their data at scale this past year. All three started the same way — someone said “we need better analytics.” All three hit the same wall: the architecture wasn’t built for data flow, so extracting anything meant duct-taping brittle pipelines onto systems that fought back every time you queried them.

This is how it always goes. Data gets treated like exhaust — a byproduct of the “real” work of building apps. You ship features, hit deadlines, and then someone from analytics shows up asking for a warehouse. That’s when you realize your microservices scattered data across a dozen schemas, and cross-service queries are now a nightmare you can’t wake up from.

28 May 2021

Data Mesh: Decentralized Ownership in Practice

About a year ago I wrote about why data lakes are failing. The gist: organizations dump everything into a centralized store, a small data team becomes the bottleneck for every analytical question, and the result is stale dashboards and frustrated stakeholders.

Zhamak Dehghani published a piece on Martin Fowler’s site in 2019 that crystallized what I’d been feeling. She called it “data mesh,” and the core argument is deceptively simple: treat analytical data the way we treat operational services. Decentralize ownership. Push responsibility to the teams closest to the data.

19 May 2020

Why Data Lakes are Failing the Modern Enterprise

Gartner estimated that 85% of big data projects fail. Back in 2016, that number was 60%. The figure moved in the wrong direction, and data lakes sit at the center of the problem.

I’ve seen this firsthand. At TaskRabbit, we made data infrastructure decisions that started with optimistic architecture diagrams and ended with engineers complaining that nobody could find anything. The pattern repeats across the industry, and the root causes are almost always more organizational than technical.