Rethinking Errors, Warnings, and Lints in Large Codebases

Felipe Hlibco

Last week I opened a pull request on one of our older services at TaskRabbit and got hit with 347 ESLint warnings. Not errors. Warnings. The build passed, the tests passed, and the PR got merged. Nobody looked at a single one of those 347 signals.

That moment bothered me more than it should have.

I’ve been managing large codebases for a while now, and the way most teams handle the spectrum between errors, warnings, and lints is — to put it politely — chaotic. We tend to treat them as a binary: either something blocks the build or it doesn’t. The nuance in between? Lost. Completely.

The Spectrum Nobody Talks About

Errors are straightforward. Your code won’t compile (or won’t pass type-checking, in TypeScript’s case). You fix them or nothing ships. There’s no ambiguity there.

Lints sit at the other end. They’re suggestions. Style preferences. “Hey, you might want to consider…” signals that carry zero urgency by design. You can disagree with a lint rule and that’s perfectly fine; the tool expects it. I have opinions about semicolons that I will take to my grave, but they’re just that — opinions.

Warnings live in the murky middle, and that’s where things fall apart. A warning says: “This probably works, but something smells off.” It’s not confident enough to block you, but it’s not trivial enough to ignore. The problem is that developers — under deadline pressure, with 300 other things to think about — treat warnings exactly like lints. Background noise. Filed under “I’ll get to it later” (spoiler: they don’t).

I’ve watched this pattern play out across three companies now. The warning count creeps up over months; nobody notices because the build stays green. By the time someone does notice, you’re looking at thousands of warnings, and the signal-to-noise ratio has cratered. Good luck triaging that mess.

The -Werror Trap

The instinctive reaction (I’ve had it myself) is to flip the switch: treat all warnings as errors. -Werror in C/C++. "strict": true with all the trimmings in TypeScript. Set every ESLint rule to error-level across the board, or run the CLI with --max-warnings=0 so a single warning fails the build.
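
Concretely, the blanket approach looks something like this in an `.eslintrc.js` (the rule choices here are illustrative; the point is the uniform severity):

```javascript
// .eslintrc.js -- the "-Werror" analog: every rule at maximum severity.
// A sketch of the blanket approach, not a recommendation.
const config = {
  rules: {
    // A rule that catches real bugs...
    "no-unused-vars": "error",
    // ...and a pure style preference, treated identically.
    "comma-dangle": ["error", "always-multiline"],
  },
};

module.exports = config;
```

With this config, a missing trailing comma blocks the build exactly as hard as dead code does, which is the whole problem.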

It sounds principled. Zero tolerance. Clean codebase.

In practice, it backfires. Hard.

Here’s why: not all warnings are created equal. Some are catching real bugs — unused variables that indicate dead code paths, implicit type coercions that will bite you in production. Others are stylistic preferences that the toolchain authors elevated to warning status because they ran out of categories. When you treat both the same way, you’re telling developers that a missing trailing comma is as serious as an unhandled null reference. Which, come on.

Developers are smart. They notice. And once they’ve been blocked three times by a trailing comma “error,” they start seeing the entire warning system as adversarial. That’s when the real damage happens — they lose trust in the tooling, and legitimate warnings get dismissed along with the noise. The baby goes out with the bathwater, as my grandmother used to say (about completely different things, but the metaphor holds).

Rust gets this right, by the way. Its lint system has four levels: allow, warn, deny, and forbid. That granularity matters. You can deny the things that actually catch bugs and warn on style preferences. The compiler gives you a vocabulary to express intent; most other toolchains give you a binary switch. I sometimes wonder if the people designing these tools have ever actually watched a team of five engineers try to ship under a deadline.
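
For contrast, ESLint’s severity vocabulary has only three values (off, warn, error). A rough mapping of Rust’s four levels onto it shows where the expressiveness gets lost (the mapping itself is my sketch, not something either tool defines):

```javascript
// Rust's four lint levels, mapped to the nearest ESLint severity.
// The mapping is lossy: ESLint has nothing like `forbid`, which is
// a `deny` that nested code is not allowed to override back down.
const rustToEslint = {
  allow: "off",    // silence the lint entirely
  warn: "warn",    // report it, never block on it
  deny: "error",   // block, but an inner scope may still opt out
  forbid: "error", // block, and refuse any downgrade (no ESLint analog)
};
```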

False Positives Kill Trust

I keep coming back to trust because it’s the variable nobody measures. You can count warnings. You can track lint violations. But you can’t easily quantify how much your team trusts the signals their tools produce.

Every false positive is a withdrawal from that trust account.

At TaskRabbit we had a period where one of our custom ESLint rules was firing on valid code about 15% of the time. Not a huge rate, but enough that engineers started reflexively adding // eslint-disable-next-line comments without reading the warning. When we later updated the rule to catch an actual pattern that caused production bugs, those same engineers muscle-memoried right past it. The disable comments were already there; the habit was already formed.

I think about this constantly. The tooling ecosystem is obsessed with coverage (more rules, more checks, more signals) but spends almost no energy on precision. In machine learning terms, we’re optimizing for recall at the expense of precision, and the cost of false positives is developer attention — the scarcest resource in any engineering organization. Scarcer than compute, scarcer than office snacks, scarcer than uninterrupted focus time on a Tuesday afternoon.

A Phased Approach That Actually Works

What I’ve landed on — and this is evolving, not gospel — is a phased model.

During local development, warnings stay as warnings. They’re informational. The IDE shows them, developers can act on them or not. No judgment. If someone wants to carry 47 warnings on their branch, that’s their call. (It’s probably not a great call, but it’s theirs.)

In CI, the calculus changes. The PR check runs a stricter ruleset where certain warnings become blocking. Not all of them (that’s -Werror all over again) but the ones the team has explicitly agreed represent real invariants. Unused imports. Any-typed function parameters. Known unsafe patterns. The stuff that has actually caused outages.
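
One way to wire this up with stock ESLint is a second config that only CI uses, extending the everyday one and promoting the agreed-on rules (file names and rule choices are illustrative; `@typescript-eslint/no-explicit-any` assumes the typescript-eslint plugin is installed):

```javascript
// .eslintrc.ci.js -- the stricter ruleset the PR check runs.
// Locally these rules stay at "warn" in the base config.
const ciConfig = {
  extends: "./.eslintrc.js",
  rules: {
    // Blocking because the team explicitly agreed these represent
    // real invariants; the reason lives right next to the rule.
    "no-unused-vars": "error", // unused vars and imports hide dead code paths
    "@typescript-eslint/no-explicit-any": "error", // any-typed params have caused outages
  },
};

module.exports = ciConfig;
```

CI then runs `eslint -c .eslintrc.ci.js .` while editors keep pointing at the base config.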

The key word is “explicitly agreed.” When the team decides together which warnings graduate to blocking status, they own the decision. It’s not some config file someone set up two years ago that nobody remembers. Each blocking rule has a reason, and that reason is documented. Usually in a comment above the rule. Sometimes in a doc nobody reads. But it’s there.

We review the blocking list quarterly. Rules that produce false positives get demoted. Rules that catch real bugs get promoted. The list is alive; it reflects what the team actually values, not what some style guide prescribed. Style guides are written by people who don’t have your codebase, your constraints, your legacy decisions made in 2019 under completely different circumstances.

The Volume Problem

At scale, even well-categorized warnings create a triage problem. I’m talking about codebases with 500+ files and dozens of contributors. You can have a perfect taxonomy of errors, warnings, and lints, and still drown in the sheer volume.

A few things help.

First, trend lines matter more than absolute counts. Going from 200 warnings to 250 in a sprint is a signal. Sitting at a stable 200 for six months is just your baseline — you’ve implicitly accepted it. Maybe that’s fine. Maybe it’s not. But at least you know.
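
Tracking the trend takes a few lines once you record the count per sprint (the tolerance is whatever the team agrees to; a sketch, not a finished tool):

```javascript
// Compare warning counts across sprints and flag growth, rather
// than alerting on the absolute number.
function warningTrend(counts, tolerance = 0) {
  // counts: warning totals per sprint, oldest first.
  const deltas = [];
  for (let i = 1; i < counts.length; i++) {
    deltas.push(counts[i] - counts[i - 1]);
  }
  // How many sprints grew the count beyond the agreed tolerance.
  const regressions = deltas.filter((d) => d > tolerance).length;
  return { deltas, regressions };
}
```

`warningTrend([200, 200, 250])` reports one regression; a flat `[200, 200, 200]` reports none, because a stable baseline is just that, a baseline.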

Second, ownership is everything. Warnings without owners are warnings nobody fixes. We tag lint rules to teams (or at least to service boundaries) so that when the count spikes, there’s a clear escalation path. It’s not glamorous work — nobody puts “reduced ESLint warnings by 40%” on their promotion packet — but it’s the kind of infrastructure that keeps codebases healthy.

Third, new code vs. existing code. Blocking warnings on new code while allowing them in legacy modules gives you a ratchet: the codebase can only get better, never worse. Grandfathering existing violations (and tracking them separately) removes the “boil the ocean” anxiety that stops teams from tightening standards at all. I’ve seen teams paralyzed by the prospect of fixing 3,000 existing warnings. Don’t. Just don’t let there be 3,001.
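
The ratchet reduces to a small check against a recorded baseline, assuming you store the grandfathered count in a checked-in file and feed the current count from something like `eslint -f json` output:

```javascript
// Fail only when the warning count exceeds the recorded baseline;
// lock in improvements so the count cannot drift back up.
function ratchet(baseline, current) {
  if (current > baseline) {
    return { ok: false, baseline }; // someone just added warning 3,001
  }
  return { ok: true, baseline: Math.min(baseline, current) };
}
```

On a passing run, CI rewrites the baseline file with the returned (possibly lower) value, so the codebase can only get better.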

The Real Question

The taxonomy itself — error vs. warning vs. lint — matters less than the intent behind it. What are you trying to communicate to the developer encountering this signal? “Stop, this is broken.” “Heads up, this might cause problems.” “Consider this alternative.”

Those three messages require different UX. Different severity. Different escape hatches. Most toolchains collapse them into two buckets and call it done.

I don’t think that’s good enough. Not for teams managing a million lines of code across time zones, where the person who wrote the rule and the person who encounters the violation have never met. Where the rule was written in San Francisco at 2pm and the violation shows up at 2am in Bangalore.

The tooling should express what you mean. If it can’t, the gap gets filled with tribal knowledge and disable comments. And tribal knowledge doesn’t scale. It ages out, it walks out the door when people leave, it mutates in the retelling like a game of telephone played across Slack threads and Zoom calls.

So yeah. 347 warnings. That’s where this started. And maybe the real lesson is that the number itself isn’t the problem — it’s what the number represents. A system that’s trying to communicate but hasn’t quite figured out how to be heard.