Rosetta 2 and the Architecture of Apple Silicon

Felipe Hlibco

Back in 2006, Apple shipped Rosetta — a binary translation layer that let PowerPC apps run on Intel Macs. It was slow. It was imperfect. And it bought them just enough time to pull off one of the most successful architecture transitions in computing history.

Fourteen years later, they’re doing it again.

At WWDC in June, Apple announced the transition from Intel to their own ARM-based silicon. The first chip of that family, the M1. And alongside it, Rosetta 2.

I’ve been thinking about this a lot — not just the chip itself (which looks impressive on paper), but the translation problem. How do you take x86-64 binaries and run them on ARM without everything falling apart?

The translation approach #

Rosetta 2 uses two strategies. The first is ahead-of-time (AOT) translation. When you install an x86-64 app on a Mac running Apple Silicon, the system translates the binary before you ever launch it. This is where the bulk of the work happens. The translated code sits on disk, ready to execute as if it were native ARM.

The second strategy is just-in-time (JIT) translation for dynamically generated code. Some applications — JIT compilers in browsers, for example — produce machine code at runtime. You can’t pre-translate that. So Rosetta 2 catches it on the fly.
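The two tiers fit together as a cache with a fallback. Here's a minimal sketch of that conceptual model in Python (all names and the string-based "translation" are hypothetical, not how Rosetta 2 is actually implemented):

```python
# Conceptual model of a two-tier binary translator (hypothetical names):
# translate known code regions ahead of time, fall back to on-the-fly
# translation for code pages generated at runtime.

class TranslationCache:
    """Maps x86-64 code regions to their ARM translations."""

    def __init__(self):
        self._cache = {}        # region id -> translated code
        self.jit_translations = 0

    def aot_translate(self, binary_regions):
        # Install time: translate every code region found in the binary.
        for region in binary_regions:
            self._cache[region] = f"arm64({region})"

    def execute(self, region):
        # Run time: use the AOT result if we have one. Otherwise the
        # region was generated dynamically (a browser's JIT output,
        # say) and must be translated on the fly before it can run.
        if region not in self._cache:
            self._cache[region] = f"arm64({region})"
            self.jit_translations += 1
        return self._cache[region]


rosetta = TranslationCache()
rosetta.aot_translate(["main", "libfoo"])   # install-time pass
rosetta.execute("main")                      # hits the AOT cache
rosetta.execute("jit_page_0")                # falls back to JIT
print(rosetta.jit_translations)              # only the dynamic page needed JIT
```

The point of the split: the expensive work happens once, at install, and the JIT path exists only for code that couldn't possibly be seen ahead of time.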

Here’s what caught my attention: the performance numbers. From the developer transition kit demos, translated apps appear to run at roughly 78-79% of native Apple Silicon performance. That sounds like a penalty — until you realize that 79% of Apple Silicon performance still outpaces most recent Intel Macs.

Think about that. A translated app on the new hardware runs faster than the native app on the old hardware.

That’s a strange sentence to write. But the math checks out.
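To make the arithmetic concrete, assume the M1 is 1.5x a recent Intel Mac on native code. That 1.5 is an illustrative figure I'm picking for the example, not a benchmark; the 79% comes from the demo numbers above:

```python
# Illustrative arithmetic only; the 1.5x speedup is an assumption,
# not a measured benchmark.
intel_native = 1.00          # recent Intel Mac running a native x86-64 app
m1_native = 1.50             # hypothetical M1 running the same app natively
rosetta_efficiency = 0.79    # from the DTK-era demo figures

m1_translated = m1_native * rosetta_efficiency
print(m1_translated)         # 1.185: still ahead of the Intel baseline
```

Any native speedup above about 1.27x (1 / 0.79) keeps the translated app ahead of the old hardware.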

Why it works better this time #

The original Rosetta had a harder job. PowerPC and x86 are fundamentally different architectures with different endianness, different register conventions, different memory models. The translation overhead was significant.

x86-64 to ARM is still nontrivial. The instruction sets differ in philosophy — ARM uses a load-store architecture with fixed-width instructions; x86 is a CISC design with variable-length instructions and decades of backwards-compatible cruft.
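One way to picture the gap: a single x86-64 read-modify-write instruction has no direct ARM equivalent, because a load-store machine can only touch memory through explicit loads and stores. A toy translator makes the expansion visible (hypothetical mnemonics and register choices, not real Rosetta output):

```python
# Toy illustration: one x86-64 instruction with a memory operand expands
# into an explicit load / compute / store sequence on a load-store
# architecture. Register names are illustrative, not real codegen.

def translate_rmw(x86_insn):
    """Expand something like 'add [rbx], rax' into a load/add/store triple."""
    op, mem, reg = x86_insn          # e.g. ("add", "[rbx]", "rax")
    addr = mem.strip("[]")
    return [
        f"ldr x9, [{addr}]",         # load the memory operand into a scratch reg
        f"{op} x9, x9, {reg}",       # do the arithmetic register-to-register
        f"str x9, [{addr}]",         # store the result back
    ]

print(translate_rmw(("add", "[rbx]", "rax")))
# One variable-length x86 instruction became three fixed-width ARM ones.
```

Multiply that expansion across a whole binary and you see why translation quality, not correctness, is the hard part.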

But Apple has one advantage they didn’t have in 2006: they control the entire stack. They designed the M1’s ARM implementation. They can (and apparently did) add specific hardware features to make translated x86 code run better.

There’s a persistent rumor that Apple included x86-style total store ordering (TSO) support in the M1’s microarchitecture specifically to help Rosetta 2. If true, that’s a hardware team building silicon to support a software compatibility layer. The integration is impressive — almost unnervingly so.
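If the rumor is right, the payoff is concrete. x86-64 code assumes total store ordering, while ARM's memory model is weaker, so a translator running on plain ARM would have to conservatively pair memory operations with barriers. A toy sketch of that trade-off (hypothetical translator logic, though `dmb ish` is a real AArch64 barrier instruction):

```python
# Toy sketch: without hardware TSO, a conservative translator pairs
# stores with memory barriers to preserve x86 ordering guarantees.
# With a hardware TSO mode, the extra barriers disappear.

def translate_store(insn, hardware_tso):
    out = [insn]
    if not hardware_tso:
        out.append("dmb ish")    # explicit barrier restoring x86-like ordering
    return out

without_tso = translate_store("str x0, [x1]", hardware_tso=False)
with_tso = translate_store("str x0, [x1]", hardware_tso=True)
print(len(without_tso), len(with_tso))   # 2 1
```

Barriers on every store are expensive; a hardware mode that makes ordinary stores behave like x86 stores removes that tax from every translated binary at once.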

Unified memory changes the game #

The M1’s unified memory architecture is maybe the most underappreciated part of this announcement. Traditional laptops have separate memory pools for CPU and GPU. Data has to be copied between them. It’s a bottleneck that’s been accepted as normal for so long that people forget it’s a design choice, not a law of physics.

Apple’s approach puts CPU, GPU, and Neural Engine on the same memory fabric. No copies. The GPU reads directly from the same pool the CPU wrote to. For workloads that mix compute and graphics — which is most creative software — this eliminates an entire class of performance overhead.

From a developer perspective, this means thinking differently about memory management. The old mental model — allocate on CPU, copy to GPU buffer, dispatch compute — gets simpler. Whether existing apps can take advantage of that through Rosetta 2 translation is a separate question. I suspect the real gains show up in native ARM builds.
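The difference is loosely analogous to sharing a buffer versus copying it. In Python terms — and this is an analogy only, not how Metal or the M1 actually manage memory — `bytes` behaves like the copy-to-GPU model and `memoryview` like the unified one:

```python
# Analogy only: a discrete-GPU model copies data into a separate pool;
# a unified model hands both sides a view of the same bytes.

data = bytearray(b"frame pixels")

# Discrete model: the "GPU" works on a copy, so CPU writes made after
# the copy are invisible to it until another transfer happens.
gpu_copy = bytes(data)

# Unified model: a zero-copy view over the same underlying storage.
gpu_view = memoryview(data)

data[0:5] = b"FRAME"                 # CPU-side update
print(bytes(gpu_copy[0:5]))          # b'frame' (the copy is stale)
print(bytes(gpu_view[0:5]))          # b'FRAME' (the view sees the update)
```

The stale copy is the class of overhead unified memory removes: no transfer step, so no window where the two sides disagree.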

What this means for developers #

Here’s where I’m less certain.

The developer transition kits that Apple shipped out are running an A12Z chip — essentially an iPad Pro processor, not the M1. So the actual performance characteristics of the shipping hardware remain speculative. We’re extrapolating from WWDC demos and DTK experiences, which is always risky.

What seems clear is that Universal Binary 2 — fat binaries containing both x86-64 and ARM code — will be the migration path. Ship both architectures in one package; let the OS pick. It’s the same strategy they used in 2006 with Universal Binaries for PowerPC and Intel.
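The fat binary format itself is refreshingly simple: a big-endian header with a magic number and a slice count, followed by one record per architecture. Here's a minimal parser over a hand-built header — the magic and CPU-type constants are the real Mach-O values, but the header bytes, offsets, and sizes are synthetic placeholders for illustration:

```python
import struct

# Mach-O universal ("fat") binary constants, from <mach-o/fat.h>.
FAT_MAGIC = 0xCAFEBABE
CPU_TYPE_X86_64 = 0x01000007         # CPU_TYPE_X86 | CPU_ARCH_ABI64
CPU_TYPE_ARM64 = 0x0100000C          # CPU_TYPE_ARM | CPU_ARCH_ABI64

def list_architectures(blob):
    """Return the CPU type of each slice in a fat binary header."""
    magic, nfat_arch = struct.unpack_from(">II", blob, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat binary")
    names = {CPU_TYPE_X86_64: "x86_64", CPU_TYPE_ARM64: "arm64"}
    archs = []
    for i in range(nfat_arch):
        # struct fat_arch: cputype, cpusubtype, offset, size, align.
        cputype, _, _, _, _ = struct.unpack_from(">5I", blob, 8 + 20 * i)
        archs.append(names.get(cputype, hex(cputype)))
    return archs

# Synthetic two-slice header (offset/size/align are placeholder values).
header = struct.pack(">II", FAT_MAGIC, 2)
header += struct.pack(">5I", CPU_TYPE_X86_64, 3, 0x4000, 0x1000, 14)
header += struct.pack(">5I", CPU_TYPE_ARM64, 0, 0x8000, 0x1000, 14)
print(list_architectures(header))    # ['x86_64', 'arm64']
```

On a real universal binary, `lipo -archs` or `file` reports the same information; the loader just picks the slice matching the machine it's on.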

It worked then.

The migration burden falls on developers who maintain native code, C++ extensions, or rely on third-party libraries that haven’t been recompiled. If your stack is pure Swift or Objective-C and you’re using Apple’s frameworks, Xcode apparently handles the recompilation without drama.

If you’re shipping a complex build with native dependencies — Node.js addons, Python C extensions, Electron apps with native modules — the path gets rockier. I’ve been there. It’s not fun.

The bigger picture #

I think what Apple’s really demonstrating here isn’t just chip design. It’s vertical integration as competitive advantage.

They control the ISA, the silicon, the OS, the translation layer, the compiler toolchain, and the app distribution pipeline. When you own all of those pieces, you can make transitions that would be impossible for a horizontally structured ecosystem.

Microsoft tried something similar with Windows on ARM, and the app compatibility story has been rough. Google runs ARM in some Chromebooks, but Chrome OS apps are mostly web-based, which sidesteps the translation problem entirely.

Neither has the stack control that Apple does.

The first Macs with Apple Silicon ship next month. We’ll know pretty quickly whether the WWDC promises hold up under real workloads. But if the Rosetta 2 performance numbers are even close to what’s been demonstrated, Apple will have pulled off the same magic trick twice.

And this time with better hardware underneath it.