Trillion-Row Creative Insights: Scaling Netflix Muse
Netflix published a deep dive into Muse, their internal platform that helps creative strategists figure out which artwork and video clips land with which audiences. The engineering caught my attention—not because the technology stack is exotic, but because it combines well-understood techniques (probabilistic data structures, precomputed aggregates, columnar storage) into something that serves interactive analytics over a trillion rows.
A trillion rows. Interactive latency. And the use case? Creative decision-making, not ad targeting or fraud detection. That contrast alone makes the architecture worth pulling apart.
The Problem #
Netflix’s creative team faces a very specific challenge, one that gets worse as the platform scales. They produce promotional assets (thumbnail artwork, trailer clips, banner images) for every single title on the platform. Each asset performs differently across audience segments—sometimes dramatically so. A dark, moody thumbnail might hook thriller fans but push away viewers who gravitate toward comedies. An action-heavy trailer clip can misrepresent the pacing of a film, attracting the wrong crowd and inflating bounce rates.
So the creative team needs answers to questions like: “How did this artwork variant land with viewers who watch sci-fi?” or “Did this trailer clip pull in audiences who actually enjoy this genre, or did it mislead them into clicking?”
Getting those answers means joining promotional impression data (who saw which asset) with viewing behavior data (who watched what, how long they stuck around, whether they finished). At Netflix scale, that join spans trillions of rows.
And here’s the thing—traditional approaches fall apart. Run the query, wait for results, rinse and repeat. But when a creative strategist needs to iterate on hypotheses in real time, that doesn’t fly. “What if we compare this asset across these three taste clusters?” can’t take twenty minutes to return. The strategist loses their thread; the meeting moves on. The insight never surfaces.
HyperLogLog: Approximate Counts That Scale #
The first architectural decision is the most important one: accept approximation.
Netflix uses HyperLogLog (HLL) sketches from Apache DataSketches to compute distinct user counts. Instead of counting exactly how many unique users saw a particular thumbnail, HLL gives you an estimate within roughly 1% error.
Why accept the error? Because exact distinct counts at trillion-row scale cost a fortune in compute. They require either massive memory (holding every unique user ID in a set) or expensive multi-pass computations. HLL sketches compress a set of arbitrary cardinality into a fixed-size data structure—4 to 16 KB regardless of whether you’re counting a thousand users or a billion. That’s a wild compression ratio when you think about it.
The math behind HLL is elegant (genuinely, it’s one of those algorithms I enjoy re-reading). It hashes each element and looks at the number of leading zeros in each hash value. The maximum leading-zero count observed gives a probabilistic estimate of the distinct count: a hash with many leading zeros is rare, so seeing one implies many distinct elements were hashed. Multiple “registers,” each tracking its own subset of hashes, are combined with a harmonic mean to reduce variance.
For Muse’s use case, 1% error on a count of 50 million users means the answer lands within 500,000. For creative strategy decisions (“does this artwork variant resonate with sci-fi fans?”), that precision is more than enough. Nobody changes their creative direction based on whether 50.2 million or 50.7 million users saw the thumbnail. That difference lives entirely inside the noise.
The property that makes HLL practical here is mergeability. You precompute HLL sketches at the granularity of (asset, date, audience_segment) and merge them on the fly for ad-hoc queries. “Show me the combined distinct reach of these three assets across these two audience segments for Q3” turns into merging precomputed sketches: milliseconds, not minutes.
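Muse uses the production HLL implementation from Apache DataSketches; as a rough illustration of the mechanics described above (hashing, leading-zero ranks, register maxima, and merge-as-register-max), here is a toy HLL in Python. This is a sketch of the algorithm, not the DataSketches API, and it omits the bias corrections a real implementation carries.

```python
import hashlib
import math

class ToyHLL:
    """Minimal HyperLogLog sketch for illustration (not Apache DataSketches)."""

    def __init__(self, p=12):
        self.p = p               # 2^p registers; more registers, less variance
        self.m = 1 << p
        self.registers = [0] * self.m

    def _hash(self, item):
        # Deterministic 64-bit hash of the item.
        digest = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def update(self, item):
        h = self._hash(item)
        idx = h >> (64 - self.p)                  # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining bits
        # rank = position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        # Harmonic mean of register values, scaled by a bias-correction constant.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:         # small-range correction
            return self.m * math.log(self.m / zeros)
        return raw

    def merge(self, other):
        """Union two sketches: element-wise max of registers."""
        assert self.p == other.p
        out = ToyHLL(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out
```

Merging is just an element-wise max, which is why precomputed sketches at (asset, date, audience_segment) granularity can be combined in any ad-hoc grouping at query time with a fixed, tiny cost.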
Hollow: In-Memory Serving at Scale #
Computing the aggregates is one thing. Serving them interactively is another problem entirely.
Netflix uses Hollow, their own open-source library, to serve precomputed results as in-memory key-value stores. Hollow was originally built for Netflix’s content metadata (title information, cast data, episode listings) but turns out to work well for any read-heavy workload where the dataset fits in memory and updates come in batches rather than continuously.
Here’s how it plugs into Muse. A batch pipeline reading from Apache Iceberg tables precomputes aggregates: HLL sketches, performance metrics, audience affinity scores. Those aggregates get serialized into Hollow’s compact binary format and published to a centralized store. Each Muse application server loads the Hollow dataset into memory and serves queries straight from RAM.
The “hollow” in the name—which I always found oddly poetic for an infrastructure library—refers to how updates work. When new data arrives, Hollow computes a delta (the diff between the old and new datasets) and applies it without requiring a full reload. The application server never stops serving. It applies the delta atomically and continues with the updated data.
For Muse, the creative strategist sees data current as of the last batch run (hourly or daily) with query latency in the single-digit millisecond range. No database round trips. No query compilation overhead. The data already sits in memory, pre-aggregated, indexed for the exact access patterns Muse’s UI demands.
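Hollow is a Java library with a much richer API than this, but the serving pattern it implements (readers see an immutable in-memory snapshot; a batch delta produces the next snapshot, which is swapped in atomically) can be sketched in a few lines of Python. The class and method names here are mine, for illustration only.

```python
import threading

class InMemoryServingStore:
    """Sketch of Hollow-style serving: reads hit an immutable snapshot;
    deltas build the next snapshot and swap it in atomically."""

    def __init__(self, snapshot):
        self._snapshot = snapshot       # dict: key -> precomputed aggregate
        self._lock = threading.Lock()   # serializes writers; readers never block

    def get(self, key):
        # Readers dereference whichever snapshot is current; no locking needed
        # because snapshots are never mutated in place.
        return self._snapshot.get(key)

    def apply_delta(self, upserts, deletes=()):
        with self._lock:
            nxt = dict(self._snapshot)  # copy-on-write next snapshot
            nxt.update(upserts)
            for key in deletes:
                nxt.pop(key, None)
            self._snapshot = nxt        # atomic reference swap
```

The server keeps answering queries from the old snapshot until the single reference assignment flips it to the new one, which is the property the post describes: the application server never stops serving.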
Iceberg: The Foundation Layer #
Underneath the HLL computations and Hollow stores sits Apache Iceberg, providing the table format for the raw impression and viewing data.
Iceberg’s role here is less flashy but essential. Three capabilities matter most:
Time travel and snapshot isolation. When a batch job computes HLL sketches, it reads from a consistent snapshot. No worries about partially-written files or concurrent updates corrupting the aggregation. That bug has bitten me before—Spark jobs reading half-written Parquet files, a total nightmare—so I appreciate this more than most.
Partition pruning. Muse queries scope by time range and content category. Iceberg’s metadata layer lets the compute engine skip entire partitions without scanning them, cutting down the data volume that flows through the aggregation pipeline.
Schema evolution. As the creative team’s analytical needs shift (new audience segments, new asset types, new performance metrics), the underlying tables evolve without rewriting historical data.
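The partition-pruning capability above boils down to comparing per-partition min/max statistics against the query predicate before reading any data. This toy function shows the idea, not Iceberg’s API; the field names and stats layout are hypothetical (Iceberg stores such bounds in its manifest metadata).

```python
def prune_partitions(partitions, start, end):
    """Keep only partitions whose [min_date, max_date] stats overlap
    the query's [start, end] range; everything else is skipped unread."""
    return [
        p for p in partitions
        if not (p["max_date"] < start or p["min_date"] > end)
    ]

# Hypothetical partition metadata, one entry per partition file/directory.
partitions = [
    {"path": "p1", "min_date": "2024-01", "max_date": "2024-03"},
    {"path": "p2", "min_date": "2024-04", "max_date": "2024-06"},
    {"path": "p3", "min_date": "2024-07", "max_date": "2024-09"},
]

# A query scoped to May through August touches only p2 and p3.
kept = prune_partitions(partitions, "2024-05", "2024-08")
```

The payoff is that pruning happens against a few kilobytes of metadata rather than the trillion-row tables themselves.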
The compute layer reading Iceberg tables runs the HLL sketch computations and materializes results into Hollow-compatible format. This is the batch ETL part of the system. Not glamorous, but it’s where the trillion-row aggregation actually happens.
Dynamic Audience Affinity #
The most interesting analytical capability Muse unlocks is dynamic audience affinity analysis. Netflix infers viewer “taste clusters”—algorithmically determined groups of users with similar viewing patterns. These aren’t simple genre buckets. They’re multidimensional preference profiles that capture nuances a genre label never captures: someone who watches horror but only the psychological kind, or sci-fi fans who drop off when the CGI budget goes up.
Muse lets creative strategists compare how a promotional asset performs across these taste clusters. “This horror-themed thumbnail attracts viewers from the ‘psychological thriller’ cluster at 2x the rate of the ‘slasher’ cluster.” That kind of insight directly informs creative strategy.
Why does this require the full HLL + Hollow architecture? Combinatorial explosion. Netflix has thousands of assets, dozens of taste clusters, and months of historical data. The number of possible (asset, cluster, time_range) combinations is enormous. Precomputing every combination isn’t feasible; instead, you precompute the building blocks (HLL sketches at the appropriate granularity) and merge them at query time for whatever combination the strategist happens to request.
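The query-time pattern is worth making concrete. In this sketch, Python sets stand in for HLL sketches (set union and sketch merge are both associative, which is the property being exploited); the keys and data are invented for illustration. A real system would merge fixed-size sketches instead of materializing user sets.

```python
from itertools import product

# Precomputed building blocks at (asset, date, segment) granularity.
# Sets stand in for HLL sketches; values are user IDs.
sketches = {
    ("art-A", "2024-07-01", "sci-fi"): {1, 2, 3},
    ("art-A", "2024-07-01", "horror"): {3, 4},
    ("art-B", "2024-07-01", "sci-fi"): {2, 5},
}

def distinct_reach(assets, dates, segments):
    """Answer an ad-hoc query by merging precomputed blocks on demand."""
    merged = set()
    for key in product(assets, dates, segments):
        merged |= sketches.get(key, set())
    return len(merged)

# "Combined reach of both artwork variants among sci-fi viewers on July 1"
reach = distinct_reach(["art-A", "art-B"], ["2024-07-01"], ["sci-fi"])
# → 4 (users 1, 2, 3, 5; user 2 saw both assets but is counted once)
```

Because only the fine-grained blocks are precomputed, the system sidesteps the combinatorial explosion: any of the enormous number of (asset, cluster, time_range) combinations is assembled from a handful of merges at query time.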
What Makes This Transferable #
I find Muse interesting not because I’m building a creative analytics platform anytime soon, but because the architectural pattern generalizes well beyond Netflix’s specific problem.
The pattern: precompute approximate building blocks, serve them from memory, merge on demand.
Any analytics workload with these characteristics fits the same mold. Queries involving distinct counts or set operations (where HLL and other sketches shine). Access patterns that are read-heavy with periodic batch refreshes (where Hollow or similar in-memory serving makes sense). Raw data too large to query interactively but amenable to pre-aggregation (where Iceberg or similar columnar storage provides the foundation).
I’ve seen this exact skeleton in ad tech—reach and frequency estimation, specifically—and in cybersecurity for counting distinct attacker IPs across massive log volumes. IoT analytics uses it too, for unique device event counting. The specific tools differ; you might reach for Redis instead of Hollow, or Druid instead of a custom batch pipeline. But the architectural pattern holds.
Netflix’s real contribution here isn’t any single technology. HLL has been around for over a decade. Iceberg is widely adopted at this point. Hollow is open source but fairly niche—I’d never heard of it before reading this write-up, honestly. The contribution is showing that these components, composed with care, enable interactive analytics at a scale that would otherwise demand either massive compute clusters or painful query wait times.
That’s the kind of data engineering worth writing about. Not a new framework launch. Not a new database promising to solve everything. Just well-understood techniques, composed well, solving a real problem at genuine scale.