
Vibe Coding Is Dead. Karpathy Just Named What's Next.

Imran Gardezi · 13 min read


Written by Imran Gardezi: 15 years at Shopify, Brex, Motorola, and Pfizer; now at Modh.

Published February 26, 2026.





Six months ago, I called vibe coding the most dangerous trend in software.

People told me I was overreacting.

Last week, Andrej Karpathy, the guy who literally coined the term "vibe coding," declared it passé. And named what comes next. He didn't walk it back quietly. He put a name on the replacement: agentic engineering. His words.

Today I'm going to break down exactly what happened, why the backlash is justified, and give you the 3-level framework for using AI without destroying your codebase.

And this isn't one guy changing his mind. The leading voices are shifting. Hackaday covered a research paper: "How Vibe Coding Is Killing Open Source." Red Hat published "The Uncomfortable Truth About Vibe Coding." Stack Overflow's 2025 Developer Survey (49,000 respondents) found 66% of developers say "almost right" AI code is their number one frustration. Trust in AI accuracy dropped from 40% to 29%. Eleven points in a single year.

The hype cycle is cracking. And the people who built their entire workflow on vibe coding are about to find out what that costs.

I've been building production software for 15 years. Shopify. Brex. Motorola. Pfizer. I prompted Claude 47 times yesterday. I'm not anti-AI. I'm anti-reckless. And I've been inside the codebases where vibe coding is doing real damage.

Let me show you.


The Real Damage

Let me tell you what I'm seeing in the field. Because the data is one thing. The reality is worse.

I work with companies. I get inside their codebases. And over the last six months, a pattern has emerged that is devastating.

True story. Client brings me in. Series A startup. "We moved fast with AI tools. Now things are breaking." I open the codebase.

First thing I find: a PKCE authentication flow. That's the security handshake your app does when users log in. The LLM generated it. It looked right. It passed basic tests. But there was a subtle flaw in how it handled the code verifier. In production, roughly one in every fifty login attempts would silently fail. Users would get a blank screen and leave. They'd been losing customers for three months. Nobody knew.
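The article doesn't show the client's actual code, but the mechanics it got wrong are well defined in RFC 7636. As a hedged sketch (all function names here are invented for illustration), a correct PKCE handshake generates a random verifier, derives the challenge as base64url(SHA-256(verifier)) with padding stripped, and the server recomputes that challenge byte-for-byte from the verifier it receives. Any re-encoding or trimming along the way produces exactly the kind of intermittent silent login failure described above.

```python
# Hypothetical sketch of the PKCE pieces a flawed flow can get wrong.
# Per RFC 7636: verifier must survive the round trip unchanged; the
# challenge is base64url(SHA-256(verifier)), padding stripped.
import base64
import hashlib
import secrets

def make_verifier() -> str:
    # 32 random bytes -> 43-char base64url string, inside RFC 7636's 43-128 range
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")

def make_challenge(verifier: str) -> str:
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def server_check(challenge: str, verifier: str) -> bool:
    # The token endpoint recomputes the challenge from the verifier it
    # receives, byte-for-byte. Mangle the verifier anywhere in transit
    # and some fraction of logins silently fail.
    return secrets.compare_digest(challenge, make_challenge(verifier))

verifier = make_verifier()
challenge = make_challenge(verifier)
assert server_check(challenge, verifier)
assert not server_check(challenge, verifier + "x")
```

The point isn't the ten lines of crypto. It's that "looked right and passed basic tests" is not the bar for a security handshake; the bar is the spec.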

Second thing I find: ESLint rules. The team had them configured. Good rules. But the AI-generated code ignored them. Not sometimes. Every time. The AI would write code, the developer would accept it, and the linting errors would get suppressed because the AI's output "looked right."

Third thing: duplicate logic. The same business rule, pricing calculation, implemented in four different places. Four. Slightly different each time. Because every time a developer asked the AI to build a feature, it generated the logic fresh instead of referencing what already existed. When they needed to change the pricing model? Four places to update. They found three. The fourth one silently charged customers the wrong amount for two weeks.
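The fix for the duplication problem is old discipline, not new tooling: one source of truth the AI is told to call, never to re-derive. A minimal sketch, with invented plan names and rules:

```python
# Hypothetical illustration: ONE pricing function that checkout,
# invoicing, admin, and the API all import, instead of four freshly
# generated copies. Plans and discount rule are made up for the example.
from decimal import Decimal

def unit_price(plan: str, seats: int) -> Decimal:
    """The only place the pricing rule lives."""
    base = {"starter": Decimal("10"), "pro": Decimal("25")}[plan]
    # Volume discount: 10% off once a team passes 20 seats.
    discount = Decimal("0.9") if seats > 20 else Decimal("1")
    return (base * seats * discount).quantize(Decimal("0.01"))

assert unit_price("pro", 10) == Decimal("250.00")
assert unit_price("pro", 30) == Decimal("675.00")
```

When the pricing model changes, there is exactly one place to update and exactly one function to test. Four near-copies means four places, and, as this client learned, you'll find three.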

Think about that. A broken auth flow. Ignored linting rules. Duplicate business logic charging the wrong prices. Three problems. One codebase. One root cause.

Nobody read the code. They generated it. They accepted it. They shipped it. And they moved on to the next prompt.

These weren't juniors. Experienced engineers. But the AI made them fast, and fast felt like productive. So nobody stopped to ask: do we understand what we're shipping?


The New Principle: 3 Levels of AI Maturity

So what's the answer? Is it "stop using AI"? No. That's stupid. That's like saying "stop using power tools because someone cut their finger." The answer is what Karpathy named. Agentic engineering. Let me break down what that actually means.

There are three levels.

The Vibe Coder prompts and prays. They accept whatever the AI generates. They don't read the output carefully. They don't understand the architecture. They don't write tests for edge cases. They ship fast. They ship blind.

This is a kid with a machine gun. A veteran with a bayonet is dangerous because they know where to strike. They've seen the battlefield. They know what good looks like. They can spot when AI goes off the rails. A kid with a machine gun? They'll spray in every direction. They'll hit something. They just won't know what. And they won't know the difference between a hit and a disaster until it blows up. That's Level 1. Dangerous.

The Tool User employs AI tactically. They generate boilerplate. They get unstuck on syntax. They use it for refactoring. But they still make all the architectural decisions themselves. They still review every line. They still write the tests. They still own the system.

This is a carpenter with a power saw. More productive. Still skilled. Most good engineers are here. And honestly? Level 2 is fine. You'll ship. You'll be productive. You won't blow things up. But you're leaving performance on the table. The Tool User treats AI as a typing accelerator. They haven't yet learned to use it as a thinking partner.

The Agentic Engineer is the new thing. This is what Karpathy is pointing at. This is where AI becomes a genuine collaborator, but only because the human brings genuine judgment.

The agentic engineer doesn't prompt for code. They prompt for options. They define the architecture first. They set constraints. They write specs. Then they let AI generate within those constraints. And they verify everything against the spec. It's the difference between "build me an auth system" and "implement PKCE auth using this exact flow, these exact token lifetimes, these exact error states, verified against these exact test cases." The first gives you vibe code. The second gives you engineered software.

This is a general with a battle plan and an army of specialists. The AI is the army. But the general picks the battles, draws the lines, and knows when to retreat.

And yes, you might vibe-code a throwaway prototype. That's fine. The problem is when Level 1 becomes your production default.

At Modh, when we build client systems, we write the architecture spec before a single AI prompt fires. We define the data model, the permission boundaries, the error states. Then we let AI generate within those guardrails. The discipline we practiced at Shopify and Brex before AI existed is exactly what agentic engineering formalizes for the AI era. Same principles. New context. At Brex, we handled billions in transactions. You know what happens when your auth flow fails silently at a financial services company? You don't lose customers. You lose your banking license. That's why judgment isn't optional. It's the whole job.

Now let's talk about what happens when this spreads. Because the broken auth flow? That's a symptom. The disease is deeper.

Tailwind's founder publicly attributed layoffs to AI's impact on their business. Seventy-five percent of their engineering team. Gone. Their documentation traffic dropped 40%. Not because fewer people were using Tailwind. Because developers were asking the AI instead of reading the docs. And here's the kicker: documentation was also Tailwind's monetization surface. Fewer eyeballs on docs meant less revenue. The business model got hollowed out from underneath them. Not by a competitor. By the workflow shift itself. That's a second-order effect nobody predicted. AI didn't just change how code gets written. It changed which businesses survive.

At the team level, the cascade is just as brutal. Code review becomes theatre. Nobody's reading the AI output carefully. They're scanning it. "Looks right" becomes the new standard. And "looks right" is how bugs hide.

I've seen this play out. Pull request comes in. 400 lines of AI-generated code. The reviewer opens it, scrolls for about eight seconds, leaves a comment: "LGTM." Looks good to me. Four letters. Eight seconds. 400 lines of code that now lives in production. Nobody read it. Nobody questioned it. Nobody tested the edge cases. And three weeks later, when that code breaks at 2 AM, the person who approved it says: "I thought the AI handled that." No. You handled that. You clicked approve.

Test coverage drops. Not because people stop writing tests. Because the AI doesn't write tests for the edge cases it doesn't know about. And the humans stopped thinking about edge cases because "the AI handles it." Trust erodes between founder and team, between engineering and product, between company and customers.

I'll say it again. The speed was an illusion. You felt fast. Your sprint velocity looked great. Your Jira board was green. But under the surface, every shortcut was compounding. And compound interest doesn't care about your dashboard.

Stack Overflow's 2025 Developer Survey: only 29% of developers trust AI-generated code. Down from 40%. That's engineers telling you the code AI writes is not trustworthy without human judgment.

And if your team doesn't have that judgment? You're compounding technical debt at 10x speed.


The Framework: How To Do It

Here's what this looks like in practice. Because I'm not going to preach without showing you how.

Three practices. Five steps. Memorize three. Implement five.

Practice one: Define before you generate.

Step one: Architecture before AI. Before you let AI generate a single line of code, you need to know what you're building. Data model. Permission model. Core user journeys. System boundaries. Write it down. I'm not talking about a 50-page architecture doc that takes a quarter. I'm talking about one page. One diagram. Thirty minutes of clarity that saves thirty days of rework.

Simple rule: if you can't explain the architecture in five minutes to another engineer, you don't understand it well enough to let AI code for it.

Step two: Constraints, not prompts. Stop asking AI open-ended questions. "Build me a dashboard" is vibe coding. "Build a dashboard showing these three metrics, pulling from this API, refreshing every 30 seconds, handling loading and error states, matching this component library." That's agentic engineering.

The more constraints you give, the better the output. Because constraints are judgment. And judgment is what makes the code work. Every constraint you add is a decision you've already made, which means the AI can't make it wrong. Open-ended prompts are invitations for AI to make architectural decisions it has no business making.
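One way to make "constraints, not prompts" concrete is to write the constraints down as executable acceptance checks before prompting, then run the AI's output against them. A hedged sketch, using the dashboard example above (metric names and config shape are invented):

```python
# Hypothetical: constraints from the dashboard prompt, encoded as checks
# the generated config must pass before anyone hits "approve".
REFRESH_SECONDS = 30
REQUIRED_METRICS = {"signups", "revenue", "churn"}
REQUIRED_STATES = {"loading", "error", "ready"}

def check_dashboard(config: dict) -> list[str]:
    """Return the list of constraint violations (empty list = spec met)."""
    problems = []
    if set(config.get("metrics", [])) != REQUIRED_METRICS:
        problems.append("metrics must be exactly: " + ", ".join(sorted(REQUIRED_METRICS)))
    if config.get("refresh_seconds") != REFRESH_SECONDS:
        problems.append(f"refresh interval must be {REFRESH_SECONDS}s")
    missing = REQUIRED_STATES - set(config.get("states", []))
    if missing:
        problems.append("unhandled UI states: " + ", ".join(sorted(missing)))
    return problems

# An AI-generated draft that forgot the error state fails the spec:
draft = {"metrics": ["signups", "revenue", "churn"],
         "refresh_seconds": 30, "states": ["loading", "ready"]}
assert check_dashboard(draft) == ["unhandled UI states: error"]
```

Every check in that list is a decision already made, which means it's a decision the AI can't make wrong.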

Practice two: Own what you ship.

Step three: Review like you wrote it. Every line AI generates, review it as if you wrote it yourself. Because you're shipping it. Your name is on it. Your team is maintaining it.

I watched an engineer accept 200 lines of AI-generated auth code with a two-second glance. Three weeks later. 2 AM. His phone buzzes. Production is down. Auth is broken. Users locked out. Three hours to diagnose code he never actually read.

"The AI wrote it" is not a root cause. You accepted it. You own it.

Step four: Test the edges, not the happy path. AI is phenomenal at the happy path. User logs in, clicks the button, sees the result. Beautiful. But what happens when the network drops mid-request? What happens when the user double-clicks? What happens when the input is 10,000 characters instead of 100? The AI doesn't think about that. You do. That's your job now. The happy path is the easy part. The edges are where production systems live and die.

Here's the proof this works. One team I worked with adopted this approach. Review time went up 20%. Bug reports dropped 60% in the first sprint. Net velocity: faster, not slower. That's not theory. That's what I measured.

Practice three: Measure the debt.

Step five: Track it or drown in it. If you're not tracking code quality metrics (duplication, test coverage, lint compliance, review thoroughness), you're flying blind. You'll feel fast. Your dashboard will look green. And underneath, the technical debt is compounding like credit card interest.

And I mean that literally. Technical debt works exactly like financial debt. You're paying interest whether you see the invoice or not. Every feature that takes longer than it should? That's interest. Every recurring bug your team keeps patching? That's interest. Every developer who quits because the codebase is a nightmare? That's interest. You didn't see a bill. But you paid it. Every single sprint.

The teams that track it catch it at 5%. Manageable. The teams that don't find out at 50%. By then, you're not iterating. You're rebuilding.
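Tracking doesn't require heavy tooling to start. As a rough sketch of the duplication metric (real teams would reach for a linter or a clone detector; this only shows the principle), you can fingerprint function bodies and flag the ones that appear more than once:

```python
# Minimal duplication tracker: hash each function's normalized body and
# report fingerprints shared by more than one function. Illustrative
# only; production tooling does this far more robustly.
import ast
import hashlib

def duplicate_functions(source: str) -> dict[str, list[str]]:
    """Map body fingerprint -> names of functions sharing that body."""
    seen: dict[str, list[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            # ast.dump normalizes away whitespace and comments
            body = ast.dump(ast.Module(body=node.body, type_ignores=[]))
            key = hashlib.sha256(body.encode()).hexdigest()
            seen.setdefault(key, []).append(node.name)
    return {k: v for k, v in seen.items() if len(v) > 1}

code = """
def price_checkout(n): return n * 25 * 0.9
def price_invoice(n): return n * 25 * 0.9
def tax(n): return n * 0.08
"""
assert list(duplicate_functions(code).values()) == [["price_checkout", "price_invoice"]]
```

Run something like this in CI and the four-copies-of-pricing problem shows up as a failing check, not as a wrong invoice two weeks later.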

Velocity goes up because you stop spending 40% of your sprint debugging AI-generated code nobody understood. That's not a productivity hack. That's removing waste.


The Close

Here's what's happening right now.

The best teams, the ones that were disciplined before AI, are becoming unstoppable. AI amplified what was already there. Good architecture. Fast feedback loops. Clear ownership. Strong code review culture. Add AI to that foundation? Devastating output. They were fast before. Now they're unreachable.

The worst teams, the ones that were chaotic before AI, are in freefall. More code generated faster with less understanding. More debt compounding faster with less visibility. They were slow before. Now they're slow and buried under code nobody can maintain.

That's the part most people miss. AI doesn't make bad teams good. It makes bad teams worse, faster. It doesn't create discipline where there was none. It amplifies the chaos that was already there. The teams I'm working with who've adopted this approach are already shipping faster. Not a prediction. It's what I'm seeing right now in client codebases.

So here's the framework. Three levels.

Level 1: Vibe Coder. Prompt and pray. This is where the damage happens. The kid with a machine gun. Fast output, zero understanding, compounding debt.

Level 2: Tool User. AI as a tactical tool. Better. Still limited. The carpenter with a power saw. Productive, skilled, but not leveraging AI's full potential.

Level 3: Agentic Engineer. Spec first. Constraints first. Judgment first. AI generates within guardrails. The general with a battle plan. This is the future.

If your team needs help getting to Level 3, strategy session link is in the description.

Because vibe coding is dead. Karpathy killed it. The data killed it. The production incidents killed it.

What comes next is agentic engineering. Not less human involvement. More human judgment. Amplified by AI.



Key Takeaways

  • Karpathy killed his own creation. The person who coined "vibe coding" declared it passé and named "agentic engineering" as the replacement. This isn't a fringe opinion. It's the inventor of the term acknowledging that the approach he popularized is causing real damage in production codebases. When the creator walks it back, the trend is over.

  • The data is brutal and getting worse. Stack Overflow's 2025 Developer Survey, with 49,000 respondents, found 66% of developers cite "almost right" AI code as their number one frustration. Trust in AI-generated code dropped from 40% to 29% in a single year. The industry is learning through painful experience that speed without understanding creates more problems than it solves.

  • Vibe coding causes specific, measurable damage. From a single client codebase, I found a broken PKCE auth flow silently failing 2% of logins for three months, ESLint rules completely ignored by AI-generated code, and the same pricing logic duplicated in four places with subtle differences. These aren't hypothetical risks. They're production incidents that cost real money and real customers.

  • The three levels define your team's AI maturity. Level 1 (Vibe Coder) prompts and prays, creating compounding technical debt. Level 2 (Tool User) employs AI tactically with human oversight, which is productive but limited. Level 3 (Agentic Engineer) defines architecture first, sets constraints, writes specs, and lets AI generate within guardrails. The jump from Level 1 to Level 3 is the difference between a kid with a machine gun and a general with a battle plan.

  • Discipline is the fastest path, not the slowest. One team that adopted the define-review-measure approach saw review time increase 20% but bug reports drop 60% in the first sprint. Net velocity went up, not down. The 40% of sprint time previously wasted debugging AI-generated code nobody understood got redirected to actual feature work. Discipline doesn't slow you down. Chaos slows you down. Discipline removes the waste.


Frequently Asked Questions

What is agentic engineering and how is it different from vibe coding?

Agentic engineering is the approach Andrej Karpathy named as the replacement for vibe coding. Where vibe coding means prompting AI and shipping whatever comes out, agentic engineering means defining architecture first, setting constraints, writing specs, and then letting AI generate within those guardrails. The key difference is judgment: an agentic engineer prompts for options rather than code, verifies output against specifications, and maintains full understanding of what they're shipping. It's the difference between "build me an auth system" and "implement PKCE auth using this exact flow, these exact token lifetimes, verified against these exact test cases."

Why did Karpathy say vibe coding is dead when he coined the term?

Karpathy witnessed the same pattern the rest of the industry is now seeing: vibe coding works for throwaway prototypes but destroys production codebases. The Stack Overflow 2025 Developer Survey confirmed it with hard data. 66% of developers say "almost right" AI code is their top frustration, and trust in AI output dropped 11 points in one year. Meanwhile, research papers like "How Vibe Coding Is Killing Open Source" and reports from Red Hat documented the specific damage. Karpathy didn't walk it back reluctantly. He named the replacement, which signals he sees the evolution as necessary and inevitable.

How do I know if my team is doing vibe coding without realizing it?

Look for these warning signs: code reviews that take less than a minute for hundreds of lines of AI-generated code, linting rules being suppressed rather than addressed, the same business logic appearing in multiple places with slight variations, and bug reports increasing even though sprint velocity looks healthy. The most telling symptom is the phrase "the AI handled that." If your team uses that as an explanation for how something works, nobody actually understands the code. Pull request approval times under 30 seconds for non-trivial changes is another red flag. That's not reviewing. That's rubber-stamping.

Does adopting agentic engineering practices actually slow down development?

The opposite. One team I measured saw review time increase by 20%, but bug reports dropped 60% in the first sprint. Net velocity increased because the team stopped spending 40% of their sprint time debugging AI-generated code nobody understood. The "slowness" of vibe coding is invisible. It shows up as recurring bugs, mysterious production failures, and features that take three times longer than estimated because the codebase is full of duplicated logic and hidden dependencies. Discipline removes waste. Chaos creates it. The fastest teams are the most disciplined.

Can vibe coding ever be appropriate, or should it always be avoided?

Vibe coding is fine for exactly one thing: throwaway prototypes that will never see production. If you're testing whether an idea is worth pursuing, a vibe-coded proof of concept built in an afternoon is a valid approach. The problem is when Level 1 becomes your production default, when the prototype mindset bleeds into real systems that handle real users, real money, and real data. The line is clear: if the code is going to production, it needs architecture, constraints, review, and testing. If it's going in the trash after a demo, vibe away. Most teams that get into trouble started with a "quick prototype" that somehow became the production codebase without anyone deciding it should.