Getting Started with Vibe Coding
You've heard about AI coding tools. Maybe you tried Copilot and it felt like autocomplete with delusions of grandeur. Maybe Claude generated code that worked perfectly, and then code that broke in weird ways, and you couldn't predict which you'd get. That unpredictability is the thing nobody talks about, and it's the thing that matters most.
Andrej Karpathy coined the term "vibe coding" for letting AI write code while you direct. Gene Kim and Steve Yegge's methodology builds on that with a key insight: AI reliability varies dramatically by task type. Boilerplate config? Nearly perfect. Novel architecture decisions? Needs verification at every step. Most people use the same level of trust for everything, and that's where things go sideways.
This post is about calibration. The personal story is at /principles, and the tooling deep-dive is at /builds/vibe-check.
The Vibe Levels
The solution is simple: declare how much you'll trust AI before you start working, then verify accordingly.
| Level | Trust | How to Verify | Example Tasks |
|---|---|---|---|
| L5 | 95% | Final result only | npm install, formatting, linting |
| L4 | 80% | Spot check major outputs | Boilerplate, config files, CRUD endpoints |
| L3 | 60% | Verify key outputs | Features with known patterns, tests |
| L2 | 40% | Check every change | New integrations, unfamiliar APIs |
| L1 | 20% | Review every line | Architecture, security, core business logic |
| L0 | 0% | AI for research only | Novel problems, exploration, spikes |
The levels measure your ability to evaluate AI output. If you don't know TypeScript well enough to spot a subtle type error, that's L2 work for you even if it's L4 for someone else.
Your First Session
Here's how to try this today.
Step 1: Install vibe-check
npm install -g @boshu2/vibe-check
This gives you metrics on your actual work patterns, measured from git commits.
Step 2: Pick a Small Task
Choose something you'd normally complete in 30-60 minutes:
- Add a new API endpoint
- Create a UI component
- Write tests for existing code
- Fix a specific bug
Starting small matters. You need one session's worth of data before the metrics mean anything.
Step 3: Declare the Vibe Level
Before you start, ask: "How much should I trust AI for this specific task?"
Look at the table above and pick a level. Then write it down somewhere, even if it's just a comment in your terminal:
L3: Adding auth middleware - known pattern, verify key outputs
The act of declaring forces you to think about what you're doing before you do it.
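If you want the declaration to outlive your terminal scrollback, a one-liner into a scratch file does the job. A minimal sketch, assuming a file called .vibe-log (an arbitrary name; vibe-check doesn't read it):

```bash
# Append a timestamped level declaration to a scratch file.
# .vibe-log is an arbitrary name; nothing reads it but you.
echo "$(date -Iseconds) L3: Adding auth middleware - known pattern, verify key outputs" >> .vibe-log
```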
Step 4: Work According to Your Level
Match your verification to your declared level:
- L4-L5: Let the AI generate, review at the end
- L3: Check after each logical chunk (one function, one component)
- L2: Verify every change before accepting
- L1: Review line by line, question everything
If you find yourself constantly checking at a lower level than you declared, that's a signal: you underestimated the task's complexity.
Step 5: Measure
After your session, run:
vibe-check --since "1 hour ago"
Compare the output to your declared level. Did you spiral (lots of fix commits)? Did it go smoothly? The metrics tell you whether your calibration was right.
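If you want a quick gut check without the tool, plain git approximates the spiral signal. A rough sketch, not how vibe-check actually computes its metrics:

```bash
# Count fix-prefixed commit subjects from the last hour (a crude spiral indicator).
# This is a plain-git approximation, not vibe-check's calculation.
git log --since="1 hour ago" --pretty=format:%s | grep -ci '^fix' || true
```

Three or more is the same stop-and-reassess signal called out under Common Mistakes below.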
Common Mistakes
Four patterns that trip people up:
1. Running L4 on L2 tasks. You trust the AI to generate a feature you don't fully understand, accept the output without deep review, and spend the next hour debugging subtle issues. The time you saved up front gets paid back with interest.
2. Running L1 on L5 tasks. You review every line of a package.json update or a formatting change. The work is trivially correct, but you've burned mental energy on low-stakes decisions.
3. Not declaring a level at all. You work reactively, trusting when things feel right and doubting when they don't. Without a declared level, you can't calibrate, and without calibration, you can't improve.
4. Ignoring debug spirals. You commit fix after fix after fix, each one addressing the side effects of the last. Three consecutive fix commits is a signal to stop, reassess, and possibly drop to a lower level.
The Feedback Loop
After a few sessions, you'll notice patterns:
Declare Level → Work → Measure → Adjust
Some things you'll discover:
- Which tasks are actually L4 vs L2 for you specifically
- When to escalate trust ("this is easier than expected") or de-escalate ("I'm out of my depth")
- Your personal weak spots (CSS? Types? Database queries?)
The loop compounds. Each session teaches you something about your own reliability patterns, and that knowledge carries forward.
The Bigger Picture
This methodology comes from Gene Kim and Steve Yegge's book Vibe Coding. Their framing: you're not a line cook typing every character anymore. You're the head chef, directing AI sous chefs, tasting results, responsible for every dish that leaves the kitchen.
The research behind this shows a concrete threshold. Context utilization above 40% degrades AI performance dramatically:
| Context Used | Success Rate |
|---|---|
| Under 35% | 98% |
| 35-40% | 95% |
| 40-60% | 72% |
| Over 60% | 24% |
That's why calibration matters. You're managing cognitive load, not just code.
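To put the threshold in concrete terms: window sizes vary by model, but assuming a hypothetical 200,000-token window, the arithmetic is simple:

```bash
# Back-of-envelope arithmetic for a hypothetical 200K-token context window.
awk 'BEGIN { window = 200000; printf "40%% of %d tokens = %d tokens\n", window, window * 0.40 }'
# prints: 40% of 200000 tokens = 80000 tokens
```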
The book documents 12 failure patterns where vibe coding destroys work in minutes, from "Tests Passing Lie" (AI claims tests pass but never ran them) to "Eldritch Code Horror" (3,000-line functions where everything connects to everything). The vibe levels and calibration loop are how you avoid them.
Dario Amodei (Anthropic CEO) wrote the foreword: "We are probably going to be the last generation of developers to write code by hand."
Real Example: This Website
I built this site in 48 hours without prior Next.js experience. Here's the vibe-check output from that work:
Period: Nov 29, 2025 - Dec 1, 2025 (24.6h active over 3 days)
Commits: 109 total (52 feat, 8 fix, 4 docs, 45 other)

| METRIC | VALUE | RATING |
|---|---|---|
| Iteration Velocity | 4.4/hour | HIGH |
| Rework Ratio | 7% | ELITE |
| Trust Pass Rate | 99% | ELITE |
| Debug Spiral Duration | 0min | ELITE |
| Flow Efficiency | 100% | ELITE |

OVERALL: ELITE
The key numbers:
- 109 commits in 24.6 hours: tight feedback loops
- 7% rework ratio: not constantly fixing mistakes
- 0 debug spiral duration: never got stuck in a fix loop
I ran most of the work at L3-L4 (features with known patterns, verify key outputs). When I hit unfamiliar territory like OpenGraph images or Playwright tests, I dropped to L2 and verified every change. The metrics reflect that calibration.
Tools That Help
vibe-check: Metrics from your git history
npm install -g @boshu2/vibe-check
vibe-check --since "today"
npm: @boshu2/vibe-check | GitHub
Claude Code: AI pair programmer with context awareness
Declare your level in the prompt:
This is L2 work (new integration). Verify each change with me before moving on. Stop if anything seems off.
Git hooks: Automatic checks
Add vibe-check to your pre-push hook to catch spirals before they hit the remote.
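A minimal sketch of that hook; this version only prints the report before the push rather than blocking it, since it makes no assumptions about vibe-check's exit codes:

```bash
#!/bin/sh
# .git/hooks/pre-push: print vibe-check metrics before every push.
# Informational only; the explicit exit 0 keeps the hook from blocking the push.
vibe-check --since "today"
exit 0
```

Make it executable with chmod +x .git/hooks/pre-push and git runs it on every push.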
What's Next
| When you're comfortable with... | Try... |
|---|---|
| Single sessions | Multi-session projects with progress tracking |
| Basic vibe levels | Tracer tests for L1-L2 work |
| Individual work | Context bundles for resuming work |
The full methodology is at 12factoragentops.com. Start with single sessions and expand from there.
Try It
Install
npm install -g @boshu2/vibe-check
Do whatever you were going to do today
(just declare a vibe level before you start)
Check
vibe-check --since "today"
The first session won't tell you much. By the third session, you'll start seeing patterns. By the tenth, you'll have calibrated intuition about what works for you and what doesn't.