Getting Started with Vibe Coding

December 1, 2025 · 8 min read
#vibe-coding #ai-development #developer-tools #productivity #tutorial

You've heard about AI coding tools. Maybe you tried Copilot and it felt like autocomplete with delusions of grandeur. Maybe Claude generated code that worked perfectly, and then code that broke in weird ways, and you couldn't predict which you'd get. That unpredictability is the thing nobody talks about, and it's the thing that matters most.

Andrej Karpathy coined the term "vibe coding" for letting AI write code while you direct. Gene Kim and Steve Yegge's methodology builds on it with a key insight: AI reliability varies dramatically by task type. Boilerplate config? Nearly perfect. Novel architecture decisions? Needs verification at every step. Most people apply the same level of trust to everything, and that's where things go sideways.

> INFO:

This post is about calibration. The personal story is at /principles, and the tooling deep-dive is at /builds/vibe-check.


The Vibe Levels

The solution is simple: declare how much you'll trust AI before you start working, then verify accordingly.

| Level | Trust | How to Verify | Example Tasks |
| --- | --- | --- | --- |
| L5 | 95% | Final result only | npm install, formatting, linting |
| L4 | 80% | Spot check major outputs | Boilerplate, config files, CRUD endpoints |
| L3 | 60% | Verify key outputs | Features with known patterns, tests |
| L2 | 40% | Check every change | New integrations, unfamiliar APIs |
| L1 | 20% | Review every line | Architecture, security, core business logic |
| L0 | 0% | AI for research only | Novel problems, exploration, spikes |

The levels measure your ability to evaluate AI output. If you don't know TypeScript well enough to spot a subtle type error, that's L2 work for you even if it's L4 for someone else.
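
If you want the ladder in script-usable form, here's a minimal sketch that encodes the table as data. The array name and wording are mine, not part of vibe-check or the book:

bash

# A minimal sketch (bash 4+) encoding the table above as data.
# VIBE_VERIFY is my name for it; it's not part of any tool.
declare -A VIBE_VERIFY=(
  [L5]="final result only"
  [L4]="spot check major outputs"
  [L3]="verify key outputs"
  [L2]="check every change"
  [L1]="review every line"
  [L0]="AI for research only"
)
echo "L3 => ${VIBE_VERIFY[L3]}"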


Your First Session

Here's how to try this today.

Step 1: Install vibe-check

bash

npm install -g @boshu2/vibe-check

This gives you metrics on your actual work patterns, measured from git commits.

Step 2: Pick a Small Task

Choose something you'd normally complete in 30-60 minutes:

  • Add a new API endpoint
  • Create a UI component
  • Write tests for existing code
  • Fix a specific bug

Starting small matters. You need one session's worth of data before the metrics mean anything.

Step 3: Declare the Vibe Level

Before you start, ask: "How much should I trust AI for this specific task?"

Look at the table above and pick a level. Then write it down somewhere, even if it's just a comment in your terminal:

bash

# L3: Adding auth middleware - known pattern, verify key outputs

The act of declaring forces you to think about what you're doing before you do it.
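
If a terminal comment feels too ephemeral, a tiny shell helper can keep a running log. This is a hypothetical convenience, not part of vibe-check:

bash

# Hypothetical helper (not part of vibe-check): timestamp each
# declaration so you can compare it against the metrics later.
vibe() {
  echo "$(date -u +%FT%TZ) $*" >> ~/.vibe-log
}

vibe "L3: Adding auth middleware - known pattern, verify key outputs"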

Step 4: Work According to Your Level

Match your verification to your declared level (a git-based sketch follows the tip below):

  • L4-L5: Let the AI generate, review at the end
  • L3: Check after each logical chunk (one function, one component)
  • L2: Verify every change before accepting
  • L1: Review line by line, question everything

> TIP:

If you find yourself constantly checking at a lower level than you declared, that's signal. You underestimated the task complexity.
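
Stock git covers most of this. One way to pace the reviews, assuming a trunk branch named main; the mapping is my suggestion, not prescribed by the methodology:

bash

# Pacing reviews with stock git (assumes your trunk is named main).
git add -p              # L2: stage hunk by hunk, reading every change
git diff --stat         # L3: skim the shape of each logical chunk
git diff main...HEAD    # L4-L5: one full review pass at the end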

Step 5: Measure

After your session, run:

bash

vibe-check --since "1 hour ago"

Compare the output to your declared level. Did you spiral (lots of fix commits)? Did it go smoothly? The metrics tell you whether your calibration was right.
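
You can cross-check the spiral question against raw git, assuming your fix commits say "fix" somewhere in the subject line:

bash

# Rough cross-check: how many commits in the last hour mention "fix"?
# Assumes fix commits say so in the subject (e.g. "fix: ...").
git log --oneline --since="1 hour ago" | grep -ci "fix"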


Common Mistakes

Four patterns that trip people up:

1. Running L4 on L2 tasks. You trust the AI to generate a feature you don't fully understand, accept the output without deep review, and spend the next hour debugging subtle issues. The time you saved up front gets paid back with interest.

2. Running L1 on L5 tasks. You review every line of a package.json update or a formatting change. The work is trivially correct, but you've burned mental energy on low-stakes decisions.

3. Not declaring a level at all. You work reactively, trusting when things feel right and doubting when they don't. Without a declared level, you can't calibrate, and without calibration, you can't improve.

4. Ignoring debug spirals. You commit fix after fix after fix, each one addressing the side effects of the last. Three consecutive fix commits is a signal to stop, reassess, and possibly drop to a lower level.
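
The fourth pattern is the easiest to automate. Here's a sketch that counts consecutive fix commits at the tip of your branch, assuming conventional subjects that start with "fix":

bash

# Count consecutive "fix" commits at the branch tip (newest first).
# Three or more is the stop-and-reassess signal described above.
streak=$(git log -n 10 --pretty=%s | awk '/^fix/ {n++; next} {exit} END {print n+0}')
if [ "$streak" -ge 3 ]; then
  echo "Debug spiral: $streak consecutive fix commits. Drop a level."
fi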


The Feedback Loop

After a few sessions, you'll notice patterns:

// The Loop

Declare Level → Work → Measure → Adjust

Some things you'll discover:

  • Which tasks are actually L4 vs L2 for you specifically
  • When to escalate trust ("this is easier than expected") or de-escalate ("I'm out of my depth")
  • Your personal weak spots (CSS? Types? Database queries?)

The loop compounds. Each session teaches you something about your own reliability patterns, and that knowledge carries forward.


The Bigger Picture

This methodology comes from Gene Kim and Steve Yegge's book Vibe Coding. Their framing: you're not a line cook typing every character anymore. You're the head chef, directing AI sous chefs, tasting results, responsible for every dish that leaves the kitchen.

The research behind this shows a concrete threshold. Context utilization above 40% degrades AI performance dramatically:

| Context Used | Success Rate |
| --- | --- |
| Under 35% | 98% |
| 35-40% | 95% |
| 40-60% | 72% |
| Over 60% | 24% |

That's why calibration matters. You're managing cognitive load, not just code.

The book documents 12 failure patterns where vibe coding destroys work in minutes, from "Tests Passing Lie" (AI claims tests pass but never ran them) to "Eldritch Code Horror" (3,000-line functions where everything connects to everything). The vibe levels and calibration loop are how you avoid them.

> INFO:

Dario Amodei (Anthropic CEO) wrote the foreword: "We are probably going to be the last generation of developers to write code by hand."


Real Example: This Website

I built this site in 48 hours without prior Next.js experience. Here's the vibe-check output from that work:

// vibe-check --since 2025-11-28

Period:  Nov 29, 2025 - Dec 1, 2025 (24.6h active over 3 days)
Commits: 109 total (52 feat, 8 fix, 4 docs, 45 other)

METRIC                  VALUE     RATING
Iteration Velocity      4.4/hour  HIGH
Rework Ratio            7%        ELITE
Trust Pass Rate         99%       ELITE
Debug Spiral Duration   0min      ELITE
Flow Efficiency         100%      ELITE

OVERALL: ELITE

The key numbers:

  • 109 commits in 24.6 hours: tight feedback loops
  • 7% rework ratio (8 fix commits of 109): not constantly fixing mistakes
  • 0 debug spiral duration: never got stuck in a fix loop

I ran most of the work at L3-L4 (features with known patterns, verify key outputs). When I hit unfamiliar territory like OpenGraph images or Playwright tests, I dropped to L2 and verified every change. The metrics reflect that calibration.


Tools That Help

vibe-check: Metrics from your git history

bash

npm install -g @boshu2/vibe-check
vibe-check --since "today"

npm: @boshu2/vibe-check | GitHub

Claude Code: AI pair programmer with context awareness

Declare your level in the prompt:

// Example prompt

This is L2 work (new integration). Verify each change with me before moving on. Stop if anything seems off.

claude.com/claude-code

Git hooks: Automatic checks

Add vibe-check to your pre-push hook to catch spirals before they hit the remote.
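
A minimal sketch of that hook, assuming a plain .git/hooks setup. The push gate reuses the git streak check from earlier, because vibe-check's exit codes aren't documented in this post:

bash

#!/bin/sh
# .git/hooks/pre-push (make it executable with chmod +x).
# Show recent metrics, then block the push on an obvious spiral.
vibe-check --since "1 day ago"

streak=$(git log -n 10 --pretty=%s | awk '/^fix/ {n++; next} {exit} END {print n+0}')
if [ "$streak" -ge 3 ]; then
  echo "pre-push: $streak consecutive fix commits look like a spiral." >&2
  exit 1
fi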


What's Next

| When you're comfortable with... | Try... |
| --- | --- |
| Single sessions | Multi-session projects with progress tracking |
| Basic vibe levels | Tracer tests for L1-L2 work |
| Individual work | Context bundles for resuming work |

The full methodology is at 12factoragentops.com. Start with single sessions and expand from there.


Try It

bash

# Install
npm install -g @boshu2/vibe-check

# Do whatever you were going to do today
# (just declare a vibe level before you start)

# Check
vibe-check --since "today"

The first session won't tell you much. By the third session, you'll start seeing patterns. By the tenth, you'll have calibrated intuition about what works for you and what doesn't.