Midjourney and the Rise of Generative Media
I woke up this morning and generated a Renaissance painting of a cat in a spacesuit. It took about sixty seconds. The cat looked contemplative.
This is where we are now.
Midjourney opened its beta to the public today, and if you haven’t tried it yet, the onboarding experience alone tells you something about where generative AI is headed. You don’t download an app. You don’t sign up for a waitlist (well, not anymore). You join a Discord server; you type a text prompt in a chat channel; you wait. And then an image appears that didn’t exist thirty seconds ago.
David Holz—the guy behind Leap Motion, the hand-tracking hardware company—founded Midjourney back in February. The Discord server went live on March 14th. Four months later, the thing has a community that feels more like an art collective than a software product. People share prompts, remix each other’s outputs, argue about aesthetics in real time. It’s genuinely weird and genuinely interesting.
The Discord Distribution Model
Here’s what I keep thinking about: Midjourney doesn’t have a website with a “Generate” button. It doesn’t have a mobile app. The entire product lives inside Discord channels. If you told a product manager two years ago that a billion-dollar-trajectory AI company would ship exclusively through a gaming chat platform, they’d have questioned your sanity.
But it works. And it works for a reason that I think matters more than the AI itself.
Discord gives Midjourney something no standalone app could: a built-in social layer. Every generation is semi-public. You see what other people are prompting; you see what’s possible before you even try. The community teaches itself, in real time, how to use the tool. No docs needed (though they exist). No onboarding flow. Just observation and experimentation.
This is a distribution insight, not a technology insight. DALL-E 2 was announced in April, and it’s arguably more technically impressive; OpenAI’s research on CLIP back in 2021 laid much of the groundwork for how these text-to-image systems understand language. But DALL-E 2 sits behind a waitlist. Midjourney sits in your Discord. Accessibility beats capability when you’re trying to build a movement.
What the Images Actually Look Like
Let’s be honest about the output quality right now. Midjourney’s current model produces images that are evocative, stylized, sometimes stunning—and sometimes deeply wrong. Hands are a disaster. Text in images is nonsensical. Faces can slip into uncanny valley territory fast.
But the aesthetic defaults are remarkably good. There’s a painterly quality to Midjourney’s outputs that makes even mediocre prompts produce something you’d hang on a wall. Compare that to DALL-E 2, which aims for photorealism and often lands in an uncomfortable middle ground. Different philosophies; different strengths.
Google announced Imagen back in May—beautiful outputs in the demos, but no public access. That’s become a pattern: Google builds impressive research, publishes the paper, and then… waits. Meanwhile, Midjourney ships to a Discord server and lets people go wild.
I’m not saying Google’s approach is wrong (I work there; I understand the caution around responsible AI). But the gap between “we built something amazing” and “you can use something amazing” matters enormously for adoption.
The Democratization Question
Every time a new creative tool launches, someone calls it “democratizing.” Photoshop democratized design. GarageBand democratized music production. YouTube democratized video distribution.
Generative image tools are different, though. They don’t lower the skill floor for an existing craft; they remove the craft entirely. You don’t need to learn anything about composition, color theory, or visual design to produce a striking image with Midjourney. You need to learn prompting—which is its own emerging skill—but that’s closer to writing a sentence than learning to paint.
Is that democratization? Or is it something else?
I don’t have a clean answer. Part of me thinks this is genuinely exciting; the ability to visualize ideas without years of training opens creative exploration to everyone. Part of me worries about what happens to illustrators, concept artists, stock photographers. Not in the abstract “technology displaces jobs” way. In the specific, near-term, “my friend does freelance illustration and this tool can produce 80% of what her clients need for $10/month” way.
Authorship Gets Complicated
If I type “oil painting of a sunset over San Francisco in the style of Monet” and Midjourney produces something beautiful, who made it? I provided the concept; Midjourney provided the execution. The model was trained on (among other things) actual Monet paintings. Does Monet get credit? Does Midjourney? Do I?
Copyright law doesn’t have good answers here yet. The US Copyright Office has historically required human authorship for registration. But what counts as “human authorship” when the human’s contribution is a sentence of text and a click of a button?
These questions aren’t theoretical anymore; they’re practical. People are already using Midjourney outputs commercially—for book covers, marketing materials, concept art, social media content. The legal frameworks haven’t caught up.
What I’m Actually Excited About
Forget the art debate for a second. What excites me about Midjourney—and this whole wave of generative image models—is the shift toward generative media broadly.
Text-to-image is the visible tip of the iceberg. But the same underlying approaches (diffusion models, CLIP-style alignment between text and visual representations) will extend to video, 3D, audio, and combinations we haven’t imagined yet. We’re watching the infrastructure layer for a new kind of media creation get built in real time.
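To make the CLIP-style alignment idea concrete, here’s a toy sketch: a text encoder and an image encoder project into one shared embedding space, and cosine similarity in that space scores how well an image matches a prompt. Everything below is illustrative; the random projections stand in for trained encoders, and none of this reflects Midjourney’s actual (unpublished) architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Project encoder features into the shared space and L2-normalize."""
    v = features @ projection
    return v / np.linalg.norm(v)

# Stand-ins for trained encoders. In a real CLIP-style model, a text
# transformer and an image network are trained jointly so that matching
# (caption, image) pairs land close together in the shared space.
text_proj = rng.standard_normal((512, 128))    # hypothetical text head
image_proj = rng.standard_normal((1024, 128))  # hypothetical image head

text_features = rng.standard_normal(512)    # pretend output for a prompt
image_features = rng.standard_normal(1024)  # pretend output for an image

text_emb = embed(text_features, text_proj)
image_emb = embed(image_features, image_proj)

# Cosine similarity of the two unit vectors: the alignment score a
# generator can use to steer outputs toward the prompt.
score = float(text_emb @ image_emb)
print(f"alignment score: {score:+.3f}")
```

The generation side is a separate piece: a diffusion model starts from noise and iteratively denoises toward an image, with a score like this (or a conditioning signal learned the same way) pulling each step toward the prompt. That shared-space trick is also why the recipe generalizes: swap the image encoder for a video, 3D, or audio encoder and the alignment machinery carries over.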
The speed is staggering. DALL-E was announced in January 2021. Eighteen months later, there are multiple competing systems, a thriving community of users, and serious conversations about regulation. For context, it took roughly a decade for smartphone cameras to seriously threaten professional photography. This transition is measured in months.
So What Now?
I’ve been playing with Midjourney for a few hours today and I’ve already generated maybe forty images. Some are garbage. A handful are genuinely beautiful. One—a cyberpunk cityscape at dusk—is now my desktop wallpaper.
I still don’t know if I “made” that image or if I just described it well enough for a model to make it for me. Maybe the distinction doesn’t matter as much as I think it does. Maybe the interesting question isn’t “who made this?” but “what do we do now that anyone can make this?”
We’re going to find out fast.