OpenAI Codex: Structuring AI-Native Teams

Felipe Hlibco

Six months into building DreamFlare’s engineering team with AI tools baked into the daily workflow, I’ve formed some opinions. A few of them are probably wrong. But the experience has gotten concrete enough that the lessons on structuring teams — when AI pair programming is the default, not the exception — are worth sharing.

The 55% number and what it misses #

GitHub’s research blog (September 2022) reported that developers using Copilot completed a benchmark task about 55% faster than those without it; the controlled academic study by Peng et al. (February 2023, conducted with GitHub Research) put the figure at 55.8%. These numbers get cited constantly; they’re real.

But they measure the wrong thing.

Speed on individual tasks is rarely what bottlenecks an engineering team. Architecture decisions, coordination overhead, debugging production issues, onboarding new engineers, code review quality — these determine team velocity. AI coding assistants affect all of them, in ways the productivity studies don’t capture.

At DreamFlare, the engineers who get the most out of Copilot aren’t the fastest typists. The best users review suggestions critically, catch when generated code goes subtly wrong, and use the tool to explore implementation options rather than just accept the first completion. That skill — AI-augmented judgment — never shows up in task-completion benchmarks.

Smaller teams, different shapes #

The conventional wisdom: “AI makes developers faster, so you need fewer.” That’s partially true; at our stage (about 15 people total) the engineering team stayed leaner than it would have without AI tools.

But “fewer developers” represents the shallow take. The deeper change sits in team composition.

Pre-AI, a typical feature team might look like: 2 backend engineers, 1 frontend engineer, 1 QA engineer, maybe half a designer. Post-AI — at least in our experience — the ratio shifts. Fewer people writing boilerplate code; more people making architectural decisions, reviewing AI-generated output, and designing systems correct by construction rather than correct by exhaustive testing.

The team shrinks, but it skews more senior. Or rather, seniority takes a different shape now — one that emphasizes judgment over output volume.

What “senior” means now #

This part makes a lot of engineers uncomfortable.

For years, seniority in software engineering reflected raw productivity. Senior engineers shipped more features, wrote code faster, knew the codebase intimately enough to stay efficient. AI tools compress that advantage. A mid-level engineer with Copilot matches (or exceeds) the raw code output of a senior engineer without it.

So what does seniority mean in an AI-augmented world?

System design. AI generates code for a given specification; deciding what the right specification should be remains human work. Understanding tradeoffs between consistency and availability, choosing the right data model, knowing when to build vs. buy — that’s senior territory, and AI leaves it untouched.

AI output evaluation. Someone has to verify that generated code stays correct, performant, secure, and maintainable. Catching subtle bugs AI introduces — the kind that pass tests but fail in production under edge conditions — requires deep enough understanding to recognize what looks plausible but breaks at scale.
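A concrete (and entirely hypothetical) illustration of the kind of bug this evaluation skill catches — the function names and scenario below are invented for the example, not taken from DreamFlare’s codebase. The first version looks plausible and passes a happy-path test, but silently drops data on an edge case:

```javascript
// Plausible-looking generated helper: split an array into fixed-size chunks.
// Passes a test like chunkBuggy([1, 2, 3, 4], 2) → [[1, 2], [3, 4]], but the
// loop condition silently drops the trailing partial chunk whenever the
// array length isn't an exact multiple of `size`.
function chunkBuggy(items, size) {
  const out = [];
  for (let i = 0; i + size <= items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out; // chunkBuggy([1, 2, 3], 2) → [[1, 2]] — the 3 is lost
}

// What a critical reviewer would insist on: iterate to the end of the
// array and keep the remainder as a short final chunk.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size)); // slice clamps at the array's end
  }
  return out; // chunk([1, 2, 3], 2) → [[1, 2], [3]]
}
```

The point isn’t that the bug is exotic — it’s that both versions look equally reasonable at a glance, and only a reviewer who asks “what happens when the length isn’t divisible by size?” catches the difference before production does.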

Cross-team coordination. No AI tool sits in an architecture review. Negotiating API contracts between teams, resolving the organizational tensions that arise when two teams need the same database table to do different things — that remains stubbornly human.

Mentorship, but differently. Junior engineers now need to learn how to work with AI tools effectively — not just how to write code. Teaching someone to evaluate Copilot suggestions critically differs from teaching them to write clean JavaScript. Senior engineers who handle both command enormous value.

Restructuring at DreamFlare #

When I joined DreamFlare as CTO in October 2023, I had the luxury of building the team from scratch, which made it possible to design the org structure with AI-native workflows in mind rather than retrofitting an existing team.

What changed:

Flatter hierarchy. No team leads in the traditional sense. Every engineer owns their domain and makes local decisions. AI tools give individual engineers enough capacity that one person handles scope that previously required a small team. The tradeoff: this demands high trust and strong engineering judgment from every hire.

Explicit review culture. Every PR gets reviewed by a human, regardless of whether human hands or AI assistance wrote the code. The standard stays identical: does the code work correctly, handle edge cases, and maintain architectural patterns? Reviewers apply more skepticism toward “too clean” code, which often signals AI generation that skipped critical evaluation.

Pair programming as default, AI as third participant. Most non-trivial work happens in pairs. The AI tool runs for whoever types. The combination — one person driving with AI suggestions, one person reviewing and questioning in real time — catches more issues than either solo-with-AI or traditional pairing alone.

Onboarding acceleration. This one surprised me. New hires at DreamFlare reach productivity faster than at any previous company I’ve managed; AI tools account for a big part of that. A new engineer uses Copilot to explore the codebase conversationally, generate boilerplate for new features, and focus learning time on the genuinely complex parts of the system.

Hiring criteria shifted #

The traits that matter now vs. three years ago:

Adaptability over specific stack expertise. The stack changes. AI tools evolve. Give me someone who learns quickly and evaluates new tools critically over someone expert in a specific framework that may become obsolete (or be dramatically AI-augmented) in 18 months.

Critical thinking over coding speed. Can a candidate spot subtle issues in AI-generated code? Can they articulate why an approach fails — not just that it does? Harder to interview for than FizzBuzz; matters far more.

Communication skills. Smaller teams with more autonomy demand clear, async communication. Writing design docs, explaining decisions in PRs, articulating tradeoffs in Slack — soft skills become hard requirements.

Comfort with ambiguity. AI-native development remains new enough that best practices lack clear consensus. People who need a well-defined playbook struggle; people who operate in ambiguity and iterate thrive.

The honest uncertainties #

Whether our approach lands correctly remains an open question. Small team, specific product — what works for us may not generalize to a 500-person engineering org.

Some things still unresolved:

Do AI tools make generalists more valuable than specialists? The current approach favors generalists (or “T-shaped” engineers with broad competence and one deep area). Specialists with AI augmentation may prove more effective. Too early to call.

Does the “smaller, more senior” model hold at scale? Above-market pay for every hire works at our scale. At 50 engineers, the salary bill may force a rethink.

Where does junior development happen? If the team skews senior and AI handles the “learning-by-doing” work that used to grow junior engineers into mid-levels, what does the career ladder look like? No good answer here; the industry needs to figure this one out.

Where this lands #

AI-native team structure differs from “same team minus a few people.” The shape changes fundamentally — flatter, more senior, more autonomous, with different hiring criteria and different failure modes.

Companies that treat AI tools as a headcount reduction opportunity face disappointment. The ones that rethink how teams work, what roles look like, and what skills matter — those build something genuinely new.

DreamFlare is the experiment. I’ll share updates as the model proves out or fails. For now, the early returns are promising enough to bet the company on this approach.