DALL-E 2: Moving Image Synthesis into the API
Last week, OpenAI launched outpainting for DALL-E 2. You take an existing image, and the model extends it beyond its original borders — generating new content that matches the style, lighting, and subject matter of what’s already there. The demos look magical. A Vermeer painting expanded to reveal the rest of the room. A photograph extended to show what was just outside the frame.
It’s impressive. But I keep thinking about something else entirely.
Since April, when DALL-E 2 was announced, the conversation around generative AI art has been dominated by what these models can produce. “Look at this image.” “Is this real or AI?” “Can AI be creative?” Those are interesting philosophical questions, sure. But from a platform perspective — and this is the lens I look through every day — the more consequential question is: what happens when image generation becomes a programmatic capability?
Not an art tool. An API.
The DALL-E 2 Timeline So Far #
Let me trace the arc, because the speed here is worth noting.
April 6, 2022: OpenAI announces DALL-E 2. The research paper and demo images circulate widely. The quality gap between DALL-E 1 (January 2021) and DALL-E 2 is enormous: where DALL-E 1 produced collage-like compositions, DALL-E 2 produces coherent, high-resolution imagery.
July 20, 2022: Beta access opens. OpenAI starts working through a waitlist of over a million people. Users get a credit system: a fixed number of free generations per month, with the option to buy more.
August 31, 2022: Outpainting launches. This is the feature that shifts DALL-E 2 from “generate an image from nothing” to “edit and extend existing images,” making it useful for actual creative workflows rather than just novelty.
The direction is clear. OpenAI is systematically moving DALL-E 2 from a research demo toward a product. And the next logical step — the one I think changes everything — is an API.
Why the API Is the Real Milestone #
Right now, if you want to use DALL-E 2, you go to OpenAI’s web interface, type a prompt, and wait. It’s a consumer experience. Fun, but limited.
An API changes the equation entirely. With an API, image generation becomes a building block. Any developer can embed it in any application. Consider what that enables:
Design tools. Figma, Canva, Adobe — any design application could add “generate variations” or “extend this canvas” as native features. Instead of searching stock photo libraries, a designer describes what they need and generates it in context.
E-commerce. Product images, lifestyle shots, background removal and replacement — all of these are expensive to produce with traditional photography. An API that generates product imagery from descriptions (or modifies existing imagery) could slash content production costs by 90%.
Gaming and entertainment. Procedural content generation in games has been around for decades. But generating high-quality textures, concept art, and environmental imagery from text descriptions? That’s a different level entirely. An indie game studio with a DALL-E 2 API key has capabilities that AAA studios spent millions on just two years ago.
Marketing and content. Social media managers generate dozens of image variations for A/B testing. Blog authors create custom header images without hiring a designer. Email campaigns include dynamically generated visuals tailored to the recipient’s context.
None of this works with a web UI. All of it works with an API.
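To make the distinction concrete, here is what an integration might look like. Everything in this sketch is hypothetical: OpenAI has not published an image API spec, so the endpoint URL, field names, and auth scheme are my assumptions, modeled loosely on the GPT-3 API's conventions.

```python
import json

# Hypothetical endpoint for a DALL-E-style image API.
# URL, field names, and auth header are assumptions, not a published spec.
API_URL = "https://api.example.com/v1/images/generations"

def build_generation_request(prompt, n=1, size="1024x1024", api_key="sk-placeholder"):
    """Assemble (url, headers, body) for a hypothetical image-generation call.

    Returning the pieces instead of sending them keeps the sketch runnable
    without a real service behind it.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"prompt": prompt, "n": n, "size": size})
    return API_URL, headers, body

# A design tool might fire this off every time a user asks for variations.
url, headers, body = build_generation_request("a red chair in a sunlit room", n=4)
```

The point is not the specific shape of the payload; it's that once generation is one function call, it can sit inside any loop, button handler, or batch job.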
The Pricing Question #
OpenAI hasn’t announced API pricing yet, but the consumer credit system gives some hints. Beta users currently get 50 free credits in their first month and 15 per month after that (one credit = one generation or edit), with additional credits sold at $15 for 115 — roughly $0.13 each.
For an API, I’d expect something closer to the GPT-3 pricing model: per-request pricing based on resolution and complexity. My guess? Somewhere in the $0.01-$0.03 per image range for standard resolutions. That price point makes it viable for high-volume applications (thousands of images per day) while still generating meaningful revenue.
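As a sanity check on that guess, the back-of-envelope arithmetic for a high-volume application looks like this (the per-image prices are my speculation from above, not announced figures, and the volume is an arbitrary example):

```python
# Speculative monthly cost at the guessed API price points.
# 5,000 images/day is an illustrative volume, not a benchmark.
images_per_day = 5_000
days_per_month = 30

for price_per_image in (0.01, 0.03):
    monthly_cost = images_per_day * days_per_month * price_per_image
    print(f"${price_per_image:.2f}/image -> ${monthly_cost:,.0f}/month")
```

At $1,500–$4,500 a month for 150,000 images, the API undercuts traditional photography or stock licensing at scale by a wide margin, which is what makes the price band plausible.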
The interesting strategic question is how pricing interacts with the open-source competition. Stable Diffusion just released its model weights publicly. You can run it locally for free. So what’s the value proposition of paying OpenAI per image when a comparable (if slightly lower quality) alternative costs nothing?
The answer, I think, is the same answer that makes any managed service viable: convenience, reliability, and incremental capability. Running Stable Diffusion locally requires a decent GPU, technical setup, and maintenance. An API call requires none of that. For a product team that needs to ship next quarter, the API wins every time, even if the per-image cost is nonzero.
Outpainting and the Editing Paradigm #
The outpainting launch on August 31 deserves more attention than it got. Most coverage focused on the cool demos (and they are cool), but the underlying capability shift is what matters.
Before outpainting, DALL-E 2 had two modes: generation (create from nothing) and inpainting (edit a selected area within an existing image). Outpainting adds a third: extension (generate new content beyond the borders of an existing image).
Together, these three capabilities form a complete image editing paradigm:
- Generate the base image from a text prompt
- Inpaint specific regions that need refinement
- Outpaint to extend the canvas in any direction
That’s not a toy anymore. That’s a workflow. And when that workflow becomes an API, application developers can build sophisticated image editing pipelines without touching a single pixel manually.
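A sketch of how that three-step workflow might compose programmatically. The client functions here are stand-ins I've invented, not a real SDK; they just record which operations were applied so the pipeline shape is visible:

```python
# Hypothetical generate -> inpaint -> outpaint pipeline.
# These functions are stand-ins for API calls: instead of producing pixels,
# each one records the operation applied to the "image" dict.

def generate(prompt):
    return {"prompt": prompt, "ops": ["generate"]}

def inpaint(image, mask, prompt):
    image["ops"].append("inpaint")
    return image

def outpaint(image, direction, prompt):
    image["ops"].append("outpaint")
    return image

img = generate("a cozy reading nook, soft morning light")
img = inpaint(img, mask="window region", prompt="stained glass window")
img = outpaint(img, direction="right", prompt="bookshelves continuing the wall")
```

Each step takes the previous step's output, so the whole sequence can run unattended — which is exactly what "pipeline" means from a developer's point of view.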
The implications for creative tools are obvious. But I’m more interested in the implications for any application that deals with images at scale. Real estate platforms generating virtual staging. Travel sites creating destination imagery. Fashion platforms showing how clothes look in different settings. Every one of these use cases currently depends on expensive photography or stock imagery.
The Content Moderation Challenge #
OpenAI has been cautious about DALL-E 2 access — the waitlist, the usage policies, the content filters. That caution makes sense. Generative image models can produce harmful content: deepfakes, violent imagery, non-consensual intimate images, misinformation-enabling content.
An API amplifies this challenge by orders of magnitude.
With a web UI, OpenAI can moderate at the interface level. Content filters screen prompts before generation. Human review catches edge cases. The rate of generation is limited by the speed of human typing.
With an API, generation is limited only by the rate of API calls. A thousand images per minute, each generated programmatically, each potentially testing the boundaries of the content policy. The moderation system has to operate at machine speed, not human speed. And machine-speed moderation is an unsolved problem across every content platform.
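Any real system would need a first line of defense that runs on every request before generation. The sketch below pairs a crude keyword filter with a token-bucket rate limiter; the blocklist and the rate numbers are purely illustrative (real moderation relies on ML classifiers and human review, which is precisely why it's hard at machine speed):

```python
import time

# Illustrative pre-generation gate: a toy keyword filter plus a
# token-bucket rate limiter. Not a real policy -- it only shows why
# every check must complete in the request path, at request speed.

BLOCKED_TERMS = {"deepfake", "gore"}  # toy blocklist, not an actual content policy

class TokenBucket:
    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def admit(prompt, bucket):
    if any(term in prompt.lower() for term in BLOCKED_TERMS):
        return False  # policy rejection
    return bucket.allow()  # rate rejection

bucket = TokenBucket(rate_per_sec=10, capacity=5)
```

A keyword filter like this is trivially evaded, of course. That gap between what's cheap to check at machine speed and what's actually harmful is the trust-and-safety problem in miniature.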
This is, I suspect, why the API hasn’t launched yet even though the technology is clearly ready. The engineering problem is straightforward: serve the model behind an endpoint. The trust and safety problem is hard.
What the Competitive Landscape Looks Like #
As of September 2022, the generative image space has three major players with very different strategies:
DALL-E 2 (OpenAI): Highest quality (arguably), closed model, moving toward API and platform. Revenue model: per-image pricing. Moat: quality, brand, first-mover advantage with developers.
Midjourney: Strong aesthetic defaults, Discord-native distribution, community-driven. Revenue model: subscription tiers. Moat: community, UX, artistic style.
Stable Diffusion (Stability AI + collaborators): Open-source weights, runs locally, permissive license. Revenue model: cloud services and enterprise support (for Stability AI). Moat: openness, ecosystem, zero marginal cost for users.
Each strategy has merit. But I think the API play — which only OpenAI is positioned to execute right now — has the largest addressable market. Artists and creators are important, but they’re a fraction of the potential users. Developers building products that happen to need image generation? That market is enormous.
The Platform Shift #
When GPT-3 launched as an API in 2020, it kicked off a wave of AI-powered startups that embedded language generation into applications that had nothing to do with “AI” as a primary value proposition. Copywriting tools, customer support bots, code assistants, summarization services — all built on the same underlying API call.
I think DALL-E 2’s API will trigger the same dynamic for images. The companies that will benefit most aren’t AI companies. They’re companies that have an image problem — literally — and suddenly have a programmable solution.
The transition from “research demo” to “developer platform” is where the real impact happens. Not because the technology changes (it doesn’t), but because the audience changes. Researchers care about capability. Developers care about integration. And when developers can integrate, the use cases multiply in ways nobody predicted.
We’re about to watch that multiplication happen with images. And I don’t think we’re ready for how fast it’s going to move.