Spec-Driven Development at Enterprise Scale
Three weeks ago I wrote about the shift from code-as-truth to intent-as-truth and collaborative intent articulation. The response was interesting: individual developers got it immediately, but engineering managers kept asking the same question.
“How does this work at scale?”
Fair question. And the honest answer is: it’s messy.
Spec-driven development at the single-team level is relatively straightforward. You write an OpenAPI spec, generate client and server stubs, validate implementations against the contract, and iterate. Good tools exist for this. The workflow is well-understood.
Enterprise scale is a different animal. When fifty teams across three time zones all need their APIs to play nice together, the spec stops being a convenience and starts being a governance mechanism. That transition is where most organizations stumble.
The Spec as Executable Contract #
The core principle of SDD at scale is that specifications aren’t documentation—they’re executable contracts. If the implementation drifts from the spec, the build breaks. Not “someone should check this.” Breaks. Hard failure. Pipeline stops.
This sounds strict because it is. The alternative is drift, and drift at enterprise scale is how you end up with three different date formats across your microservices. (I’ve seen this; it took a team four months to reconcile.)
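What "the build breaks" looks like in practice: a CI step validates real responses against the schema declared in the spec and exits nonzero on any mismatch. The schema and response below are hypothetical stand-ins for what you'd extract from an OpenAPI file; production setups use tools like Specmatic or a JSON Schema validator rather than this hand-rolled check, but the enforcement posture is the same.

```python
# Hypothetical CI gate: fail hard when a response drifts from the schema
# the spec declares. ORDER_SCHEMA mirrors the shape of a 200-response
# schema in an OpenAPI file, reduced to required fields and types.
ORDER_SCHEMA = {
    "required": ["id", "created_at"],
    "properties": {"id": str, "created_at": str},
}

def check_contract(body: dict) -> list[str]:
    """Return a list of violations; CI treats any violation as a hard failure."""
    errors = []
    for field in ORDER_SCHEMA["required"]:
        if field not in body:
            errors.append(f"missing required field: {field}")
    for field, expected in ORDER_SCHEMA["properties"].items():
        if field in body and not isinstance(body[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors

violations = check_contract({"id": "ord_42"})  # created_at drifted away
if violations:
    print("CONTRACT VIOLATION:", violations)  # in CI: raise SystemExit(1)
```

The important design choice is the exit path: violations aren't logged as warnings, they stop the pipeline.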
OpenAPI 3.1 is the most common entry point because the tooling ecosystem is mature. You get code generation, request/response validation, interactive documentation, and mock servers from a single spec file. Specmatic takes this further with contract-driven testing—your tests are derived directly from the OpenAPI spec, so there’s no gap between what the spec promises and what the tests verify.
But OpenAPI alone doesn’t solve the enterprise problem. It solves the single-API problem.
Beyond Code Generation #
The challenges that emerge at scale go far beyond generating stubs from a spec. I’ve been cataloging these as I talk to teams at Google and across the developer ecosystem, and they cluster into four categories.
Breaking change detection. When Team A modifies their API spec, will it break Team B’s integration? This sounds simple until you realize that “breaking” is context-dependent. Adding a required field to a request body is obviously breaking. But what about changing the maximum length of a string field from 255 to 100? Or deprecating an enum value that three other services depend on?
You need tooling that understands semantic versioning at the field level, not just the endpoint level.
Optic does interesting work here—it diffs API specs and categorizes changes as breaking, non-breaking, or potentially breaking, with enough context for a human to make the call. But “potentially breaking” is where the complexity lives, and no tool I’ve found fully automates that judgment.
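The three cases above can be sketched as a field-level classifier. This is illustrative, not Optic's actual taxonomy: the rules encode the judgments from the text (a new required field rejects every existing request; a tightened maxLength invalidates payloads that used to be legal; a removed enum value only breaks consumers that actually send it, so it gets kicked to a human).

```python
# Sketch of field-level breaking-change detection, in the spirit of Optic.
# Takes two versions of one field's schema fragment and classifies the diff.
def classify_field_change(old: dict, new: dict) -> str:
    # A newly required field rejects every existing request: clearly breaking.
    added_required = set(new.get("required", [])) - set(old.get("required", []))
    if added_required:
        return "breaking"
    # Tightening maxLength invalidates payloads that used to be legal.
    old_max, new_max = old.get("maxLength"), new.get("maxLength")
    if old_max is not None and new_max is not None and new_max < old_max:
        return "breaking"
    # A removed enum value only breaks consumers that send it: a human call.
    removed_enum = set(old.get("enum", [])) - set(new.get("enum", []))
    if removed_enum:
        return "potentially breaking"
    return "non-breaking"

print(classify_field_change({"maxLength": 255}, {"maxLength": 100}))
```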
Cross-team governance. Who owns the naming conventions? Who decides whether a new field should be created_at or createdAt or creation_date? At Google, we have style guides for this. Most enterprises don’t, and by the time they realize they need one, fifty APIs are already in production with inconsistent patterns.
Backstage is becoming the control plane for this at many organizations; it provides a catalog of all services and their API specs, making inconsistencies visible if not automatically fixable. But visibility is only half the battle. The other half is enforcement, and enforcement requires organizational authority that engineering platforms rarely have.
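The automatable half of enforcement is mechanical. A hypothetical style-guide lint, assuming the org settled on snake_case, looks like this; real shops typically express the same rule as a Spectral ruleset, but the shape of the check is the same and it runs in any CI job.

```python
import re

# Hypothetical naming-convention lint: enforce snake_case property names
# across a spec's schema definitions.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")

def lint_property_names(schemas: dict) -> list[str]:
    """Return style violations like 'Order.createdAt: not snake_case'."""
    violations = []
    for schema_name, schema in schemas.items():
        for prop in schema.get("properties", {}):
            if not SNAKE_CASE.match(prop):
                violations.append(f"{schema_name}.{prop}: not snake_case")
    return violations

schemas = {"Order": {"properties": {"created_at": {}, "createdAt": {}}}}
print(lint_property_names(schemas))
```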
Security review integration. API specs contain security-relevant information: authentication schemes, data sensitivity classifications, rate limiting policies. In a spec-driven world, the security review should happen at the spec level before implementation begins. Most organizations still review implementation, which means security feedback arrives late and costs ten times more to address.
Documentation drift. This one’s ironic. SDD is supposed to eliminate documentation drift because the spec is the documentation. In practice, teams add supplementary docs—usage guides, integration tutorials, troubleshooting pages—that aren’t generated from the spec and drift immediately. The spec stays accurate (because the build enforces it); everything else decays.
Arazzo and the Workflow Problem #
Single-API specs don’t capture how APIs work together. If your checkout flow requires calling the inventory API, then the pricing API, then the payment API in sequence—with conditional logic depending on the pricing response—an OpenAPI spec for each individual API tells you nothing about that orchestration.
The Arazzo specification, a companion standard from the OpenAPI Initiative, describes multi-step workflows across APIs: the sequence of calls, the data passed between them, and the success criteria for each step. It’s relatively new and the tooling is still catching up, but the concept addresses a real gap. At enterprise scale, understanding individual API contracts isn’t enough; you need to understand how those contracts compose into business processes.
I’ve seen teams try to capture this in Confluence pages with sequence diagrams. It works for about three months, at which point the diagrams are outdated and nobody trusts them. Arazzo’s promise is that these workflow descriptions become executable and testable, just like individual API specs.
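To make that promise concrete, here’s a toy orchestrator loosely modeled on Arazzo’s workflow/steps shape (the real spec has `stepId`, `operationId`, `successCriteria`, and output mappings; this is a simplification, and the operation names are invented). The point is that the checkout sequence from above becomes data that can be executed and tested rather than a diagram that decays.

```python
# Toy workflow runner, loosely Arazzo-shaped: ordered steps, each bound to
# an operation, each with a success check, with outputs visible downstream.
def run_workflow(workflow: dict, operations: dict) -> dict:
    outputs = {}
    for step in workflow["steps"]:
        result = operations[step["operationId"]](outputs)
        if not step["success"](result):
            raise RuntimeError(f"workflow failed at step {step['stepId']}")
        outputs[step["stepId"]] = result
    return outputs

checkout = {
    "workflowId": "checkout",
    "steps": [
        {"stepId": "reserve", "operationId": "inventory.reserve",
         "success": lambda r: r["reserved"]},
        {"stepId": "price", "operationId": "pricing.quote",
         "success": lambda r: r["total"] > 0},
        {"stepId": "pay", "operationId": "payments.charge",
         "success": lambda r: r["status"] == "captured"},
    ],
}

# Stub operations standing in for real API calls; "pay" reads the quoted
# total from the "price" step's output -- the cross-step dependency that
# per-API specs can't express.
operations = {
    "inventory.reserve": lambda out: {"reserved": True},
    "pricing.quote": lambda out: {"total": 42.0},
    "payments.charge": lambda out: {"status": "captured",
                                    "amount": out["price"]["total"]},
}

print(run_workflow(checkout, operations)["pay"])
```

Swap the stubs for real HTTP calls derived from each API’s OpenAPI spec and the same workflow description doubles as an end-to-end contract test.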
What Enterprise Adoption Actually Looks Like #
The pattern I’ve observed across organizations that successfully adopt SDD at scale follows a predictable arc.
Phase one is grassroots. One or two teams start using OpenAPI specs for code generation and contract testing. They get immediate benefits: fewer integration bugs, faster onboarding for new team members, automated documentation. Word spreads.
Phase two is the tooling investment. The platform team (or whoever plays that role) standardizes on a spec format, sets up a centralized catalog, and builds CI pipelines that validate specs on every commit. Tools like Fern and Speakeasy handle code generation across multiple languages from a single spec; this is where the enterprise efficiency gains start compounding.
Phase three is governance. This is where it gets political. Someone has to own the API style guide. Someone has to enforce naming conventions, versioning policies, and deprecation timelines. The tooling can automate enforcement, but the policies require cross-team agreement, and cross-team agreement requires organizational authority.
Most organizations stall between phase two and phase three. The technology works. The organizational dynamics are harder.
The Spec Review as the New Code Review #
One shift I’m particularly interested in is the elevation of spec review to the same status as code review. In a spec-driven world, reviewing the spec is arguably more important than reviewing the implementation, because the spec defines behavior and the implementation can be regenerated.
This changes who’s qualified to review. API spec review requires understanding of consumer needs (product knowledge), system architecture (technical knowledge), and standards compliance (governance knowledge). That’s a broader skill set than implementation review, which mostly requires programming expertise.
I’ve started advising teams to include at least one consumer-side representative in every API spec review. If Team A is building an API that Team B will consume, someone from Team B should review the spec before implementation begins. This catches misalignments weeks earlier than the traditional approach of discovering integration issues during testing.
It’s slow at first. Teams aren’t used to reviewing specs with the same rigor they apply to code. But the payoff compounds quickly, because a spec defect caught at review time costs a fraction of what it costs when discovered through a broken integration in staging.
My Honest Assessment #
SDD at enterprise scale works. I’ve seen it work. But it requires more organizational investment than the tooling vendors suggest, and the governance challenges are genuinely hard.
The technology layer—OpenAPI, Arazzo, Specmatic, code generation tools—is mature enough. The organizational layer is where enterprises struggle: getting fifty teams to agree on naming conventions, enforcing spec-first workflows when deadline pressure encourages shortcuts, and building review processes that treat specs as first-class artifacts rather than afterthoughts.
If you’re considering adopting SDD at scale, start with the governance conversation, not the tooling decision. Figure out who owns API standards, how breaking changes get communicated, and what the enforcement mechanism is. Then pick tools that support those decisions.
The spec-as-source-of-truth model is powerful. But a source of truth only works if everyone agrees it’s the truth. At enterprise scale, achieving that agreement is the hard part—and it’s not a problem any tool can solve for you.