SEMR Report 2025: AI Moves from Experimental to Essential
Jellyfish just published their 2025 State of Engineering Management Report, and one number jumped off the page: 90% of engineering teams now use AI coding tools. A year ago, that figure was 61%. Only 3% of respondents said they have no plans to adopt.
The adoption question is settled. The measurement question—wide open.
The Adoption Curve Collapsed #
I’ve managed engineering teams through several technology shifts—containerization at TaskRabbit, cloud migration at earlier companies—and none moved this fast. From “should we try Copilot?” to “everyone’s using something” in twelve months. GitHub Copilot leads at 42% adoption, but here’s the part that surprised me: 48% of teams report using two or more AI coding tools simultaneously.
Two or more. That means engineers are using Copilot in their IDE and Cursor for refactoring and maybe Gemini Code Assist for documentation. They’re tool-shopping in real time, mixing and matching based on which model handles which task better.
From a management perspective, this is both exciting and chaotic. Exciting because developers are self-organizing around productivity. Chaotic because you now have multiple AI tools touching your codebase with no unified governance.
The Money Moved #
The SEMR data on budgets tells a story about maturity. Innovation budgets for LLM tools dropped from 25% to 7% of total LLM spending. That’s not a reduction in spending—it’s a reclassification. AI coding tools moved from “experimental innovation budget” to “centralized IT and business-unit budgets.” In accounting terms, they went from R&D to operating expenses.
That shift matters because operating expenses get scrutinized differently. Innovation budgets have slack built in; you’re expected to experiment, and some experiments fail. Operating expenses need justification. Someone is going to ask “what are we getting for this?” And right now, most organizations can’t answer that question well.
The Measurement Gap #
Here’s the number that bothers me most: only 20% of engineering teams use engineering metrics to track AI tool impact.
Twenty percent.
We’ve deployed AI coding tools to 90% of teams, 62% of respondents report at least a 25% increase in developer velocity, and yet four out of five organizations have no systematic way to verify those claims. The velocity improvement could be real. It could be perception bias. It could be that developers feel faster because the tedious parts—boilerplate, test scaffolding, documentation—got automated, while the hard parts (architecture decisions, debugging race conditions, understanding legacy code) take the same time they always did.
I’ve seen this pattern before. When we introduced feature flags at TaskRabbit, everyone “felt” like deployments were faster. They were—by about 15%. But the team’s initial estimate was 40%. Perception amplifies improvement. Without measurement, you can’t distinguish genuine gains from enthusiasm.
What Would Good Measurement Look Like? #
The SEMR report identifies this gap but doesn’t prescribe solutions, which is honest. Measuring AI’s impact on engineering productivity is genuinely hard. Cycle time? Lead time? Lines of code? (Please no.) PR throughput? Bug escape rate?
My take: measure outcomes, not outputs. I don’t care if a developer writes 30% more code per day if the bug rate goes up proportionally. I care about:
- Time from ticket to production (cycle time, but measured end-to-end, including review and QA)
- Change failure rate (are we shipping more bugs alongside the faster output?)
- Rework ratio (how much code written with AI assistance needs significant revision?)
- Developer satisfaction (are engineers actually happier, or just busier?)
That last one isn’t soft: 81% of developers surveyed for the SEMR say they expect a significant portion of development to shift from humans to AI. If that expectation creates anxiety rather than excitement, your velocity gains will evaporate through attrition.
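Three of those four outcomes can be computed from data most teams already have in their ticket tracker. Here’s a minimal sketch in Python; the record fields and numbers are invented for illustration and don’t come from the SEMR or any specific tool:

```python
from statistics import median

# Hypothetical per-ticket records. Field names are illustrative only.
# prod_deploy_h: hours from ticket creation to production (incl. review/QA).
tickets = [
    {"prod_deploy_h": 52, "caused_incident": False, "ai_assisted": True,  "reworked": False},
    {"prod_deploy_h": 30, "caused_incident": False, "ai_assisted": True,  "reworked": True},
    {"prod_deploy_h": 71, "caused_incident": True,  "ai_assisted": False, "reworked": False},
    {"prod_deploy_h": 45, "caused_incident": False, "ai_assisted": True,  "reworked": False},
]

# Cycle time: measured end-to-end, ticket to production.
cycle_time = median(t["prod_deploy_h"] for t in tickets)

# Change failure rate: share of changes that caused a production incident.
change_failure_rate = sum(t["caused_incident"] for t in tickets) / len(tickets)

# Rework ratio, scoped to AI-assisted changes only.
ai_changes = [t for t in tickets if t["ai_assisted"]]
rework_ratio = sum(t["reworked"] for t in ai_changes) / len(ai_changes)

print(f"median cycle time:   {cycle_time}h")
print(f"change failure rate: {change_failure_rate:.0%}")
print(f"AI rework ratio:     {rework_ratio:.0%}")
```

The fourth outcome, developer satisfaction, comes from surveys rather than ticket data, which is why it gets its own treatment below.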
The Leadership Obligation #
Engineering managers have a window right now—maybe six to twelve months—where the expectation from leadership is “adopt AI tools.” That’s a relatively low bar. The next wave of expectation will be “prove AI tools are working.” Teams that built measurement infrastructure during the adoption phase will have answers ready. Teams that didn’t will scramble.
My advice to engineering leaders reading this: start measuring now, even imperfectly. A flawed baseline is infinitely more useful than no baseline. Track cycle time before and after AI tool adoption at the team level. Run quarterly developer surveys with specific questions about AI tool utility (not general satisfaction—specific: “which tasks does Copilot help with most?” and “where does it get in the way?”). Correlate AI tool usage with code quality metrics you already track.
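A team-level before/after baseline needs nothing fancier than two lists of cycle times and a median. A sketch, with invented numbers standing in for a team’s real data:

```python
from statistics import median

# Hypothetical cycle times in hours (ticket to production) for one team,
# sampled before and after that team adopted an AI coding tool.
before = [60, 48, 72, 55, 66, 50]
after = [44, 52, 38, 47, 41, 58]

baseline = median(before)
post_adoption = median(after)
change = (post_adoption - baseline) / baseline

print(f"baseline median:      {baseline}h")
print(f"post-adoption median: {post_adoption}h")
print(f"measured change:      {change:+.0%}")
```

Medians rather than means keep one pathological ticket from dominating the comparison. Even this crude a baseline lets you put a measured number next to the team’s perceived improvement, which is exactly the gap the 15%-felt-like-40% feature-flag story illustrates.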
The SEMR confirms what I’ve been seeing at Google and hearing from peers across the industry: AI coding tools work. The question is no longer whether to adopt them. It’s whether you’re managing the adoption or just letting it happen.
The 90% already answered the first question. The 20% measurement rate suggests we haven’t even started on the second.