Defending your code quality from AI coding
Your engineering organization is starting everybody on AI coding. Won’t this hurt quality?
How’s your automated verification?
When you’re trying to maintain quality as your developers turn up the AI coding, automated verification is the obvious first approach to lean on. You don’t want your senior developers swamped in low-quality code review, and test suites, linters, security tools, and the like can help keep the noise manageable.
But let me share a dirty secret from more than a decade of consulting with software companies: If you work at an average software company, your automated checks are probably not good enough.
Everybody knows what a linter is, but I’ve seen lots of companies implement them dysfunctionally:
- Run them against PRs, but without blocking a merge, so they’re usually ignored
- Run them to block PRs, but never tune their configurations over time, so the rules never quite fit your company’s working style, and you hurt satisfaction and retention among your most passionate developers
- Don’t run them at all
This is because linters themselves may be clear-cut, but how to use them is highly subjective. There are lots of tiny judgment calls involved in mapping the narrow output of an automated check onto the more nebulous idea of quality. Is it true that requiring all methods to be 15 lines or fewer automatically leads to more maintainable code? Do you feel confident enough to block all PRs based on that opinion?
The thing about the old days: they the old days
In the age before AI coding, many teams got by without much help from automated verification. Static checks were poorly tuned and underused, humans maintained quality in various planned and ad-hoc ways, and everybody muddled through. This was apparently fine.
But today, you run the risk of drowning in a swamp of messy, half-finished PRs. So it’s time to roll up your sleeves and use these tools for real:
- Automated checks should be well-tuned enough that you feel confident using them to block PRs.
- Most automated checks have ways to exclude sections of code from enforcement: You should add friction around adding exclusions or make them completely impossible. For example, it’s trivial to write a CI script that counts `rubocop:disable` lines and fails when the count goes above an arbitrary limit (see the sketch after this list).
- These configurations should be explicitly owned, either by a high-level engineer in a “benevolent dictatorship” model, or collectively with an explicit process for any developer to propose changes. The exact type of ownership matters less than the idea that they will be tuned over time.
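As a concrete starting point, here’s a minimal version of that script in Ruby. The limit, the file globs, and the script’s location are all assumptions to tune for your repository:

```ruby
#!/usr/bin/env ruby
# ci/check_rubocop_disables.rb -- a minimal sketch of the "count the escape
# hatches" check described above. LIMIT and the globs are assumptions.
LIMIT = 25 # arbitrary ceiling; ratchet it down over time

count = Dir.glob("{app,lib,spec}/**/*.rb").sum do |path|
  File.foreach(path).count { |line| line.include?("rubocop:disable") }
end

if count > LIMIT
  warn "Found #{count} rubocop:disable comments (limit: #{LIMIT})."
  warn "Remove an existing exclusion before adding a new one."
  exit 1
end

puts "rubocop:disable count OK (#{count}/#{LIMIT})."
```

Wire it in as a required CI check, and ratchet `LIMIT` down as the old exclusions get cleaned up.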
Once you’ve invested time improving your automated verification, make it easy for AI tools to hook into them for local development. If there’s any friction to a developer telling their AI to run verifications on their changed files, smooth it down.
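One way to smooth it down, sketched here under the assumption of a Ruby project using RuboCop and RSpec, is to spell the expected commands out in an AGENTS.md file so the agent runs the same checks your CI will:

```markdown
## Verification

After editing any Ruby file:

1. Run `bundle exec rubocop <changed files>` and fix every offense.
2. Run `bundle exec rspec` against the specs covering your change.

Do not present a change until both pass.
```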
With AI coding, you’ll find that the cost-benefit ratio of automated verifications has shifted. It’s easier to be strict when developers are getting the AI to help with conformance. And AIs don’t have feelings to hurt when you ask them to redo everything.
AI code review: Not there yet
As for the issues that are beyond static analysis: You’re not going to get a lot of help from AI code review. Maybe there’s a brilliant new tool I haven’t seen, but personally I’ve found AI code review to be more noise than signal.
Part of the problem is that a proposed code change has to be evaluated on at least two axes:
- Imperfection: Are there suboptimal design choices in this PR?
- Significance: Are those mistakes large enough that it’s worth blocking approval?
AIs seem to be decent at detecting imperfection, but they’re poor at grasping significance. The result: A chatty review that gives you a dozen hints you could look into if you wanted to, but very little help on whether to approve or ask for changes.
Help is on the way, maybe?
This is where we are in September 2025: Generating code is a lot faster, reviewing it takes about the same amount of time, and there aren’t any silver bullets to address the mismatch. Still, I am personally optimistic that better practices and tools will emerge around this problem.
Some possible areas to watch:
Holding AI-generated code to a higher standard
Imagine defining a stricter set of standards for only AI code:
- Stricter linter configs
- AGENTS.md files that tell the AI to spend more time on subjective standards like meeting the single-responsibility principle or deduplicating code
… and then you tell your engineers “always use these checks when you use AI”.
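To sketch how this could look in a Ruby shop: a second RuboCop config that inherits the team baseline and tightens it, with engineers pointing their agents at the stricter file. The file name and the specific limits here are invented for illustration:

```yaml
# .rubocop_ai.yml -- hypothetical stricter ruleset for AI-assisted changes,
# run as `rubocop -c .rubocop_ai.yml`. Inherits the team's baseline config.
inherit_from: .rubocop.yml

Metrics/MethodLength:
  Max: 10           # tighter than the human baseline

Metrics/AbcSize:
  Max: 15           # push the agent toward smaller, simpler methods

Style/Documentation:
  Enabled: true     # make the agent document every class and module
```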
Maybe you can raise the baseline of AI code quality this way? Of course, agents make opaque judgment calls when given ambiguous advice like “try to deduplicate code as you go.” And it’s hard for a reviewer to tell whether a junior engineer is actually using the stricter checks locally, or just misconfigured their agent and conveniently forgot to fix it.
Codifying expectations about minimum PR quality
As AI coding continues to grow, I imagine we’ll hear about companies emphasizing a minimum quality standard for opening a PR in the first place. I’ve already heard one story of an engineer spamming their teammates with low-quality AI-generated PRs. Now that generating code is nearly free, that cheapness invites some antisocial behaviors.
Of course, this opens up one more area that an engineering manager needs to be aware of. As always, it’s best not to get distracted by the shiny new tool and to focus on the root issues surrounding interpersonal collaboration. The standard of behavior is not “you must use AI correctly”, it’s “you must contribute holistically to the team’s efforts, and not jam up your teammates in an effort to cheat the system”.
Transitioning to languages and frameworks with automated verification built-in
AI has a head start on coding compared with so many other kinds of knowledge work because many of the rules of writing software are unambiguous and cheap to verify. It seems likely that as AI coding becomes the new normal, teams will shift towards languages and frameworks that offer even more built-in rules, like static typing or API schemas.
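For a sense of what that shift can look like without abandoning an existing codebase, consider Sorbet, which retrofits static type signatures onto plain Ruby. A minimal sketch (the class and method here are my own invention):

```ruby
# typed: true
# With a signature in place, a sloppy AI-generated call becomes a cheap
# mechanical failure instead of a review comment.
require 'sorbet-runtime'

class Invoice
  extend T::Sig

  sig { params(amount_cents: Integer).returns(String) }
  def format_total(amount_cents)
    format('$%.2f', amount_cents / 100.0)
  end
end

# `srb tc` rejects this at check time: expected Integer, got String.
Invoice.new.format_total('19.99')
```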
This one makes me a little wistful, as someone who’s actually given talks about the beauty of dynamic typing. Such is the price of progress, I suppose.
Building an AI code review that works
It’s fun to think about how you’d make AI code review actually helpful. Maybe you set up a bot with the ability to reject a PR when some imperfection metric passes a threshold, steer it with the right reinforcement learning, and after X iterations you’ve got something that actually helps your team instead of getting in their way. Coming soon from an AI coding startup near you?
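To make the shape of such a gate concrete, here’s a toy sketch. Every name and number in it is hypothetical, and the hard part (a model whose severity scores you can actually trust) is exactly what doesn’t exist yet:

```ruby
# Score each finding the model reports, and only block the PR when the
# weighted total crosses a threshold. Entirely illustrative.
Finding = Struct.new(:description, :severity) # severity: 0.0..1.0

def verdict(findings, threshold: 2.0)
  findings.sum(&:severity) > threshold ? :request_changes : :approve
end

findings = [
  Finding.new('method does two unrelated things', 0.6),
  Finding.new('duplicated validation logic', 0.8)
]
puts verdict(findings) # => approve: imperfect, but not worth blocking
```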
Human code review: Works fine, I guess
For now, I think senior engineers are just going to be doing more code review and less coding themselves. This will be less fun for some of us. But for those who can adapt, we’re going to help our companies deliver high-quality code faster than we would’ve thought possible. Even if the transition is awkward for a while.