/builds/vibe-check

// damage meters for AI-assisted development

In WoW, I ran damage meters obsessively: DPS, interrupts, and deaths for every M+ run. You can't improve what you can't measure.

vibe-check does the same thing for AI-assisted coding sessions. It analyzes git history and tells you if that session was actually productive or just busy.

5 core metrics · published on npm · git-history based

// the 5 core metrics

Metric                  Good     Bad      Question
Iteration Velocity      >5/hr    <3/hr    How tight are feedback loops?
Rework Ratio            <30%     >50%     Building or debugging?
Trust Pass Rate ← KEY   >95%     <80%     Does code stick?
Debug Spiral Duration   <15m     >45m     How long stuck?
Flow Efficiency         >90%     <70%     What % productive?

Trust Pass Rate is THE key metric—it measures whether you vibed at the right level.
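
The real scoring lives in the npm package; the sketch below is only an illustration of the idea, not @boshu2/vibe-check's actual implementation. It pulls recent commits out of `git log` and derives two of the metrics; the 4-hour session window, the log format string, and the fix-keyword heuristic are all assumptions.

```typescript
// sketch.ts: illustrative only, not the @boshu2/vibe-check source.
// Assumes a git repo in the current directory and Node >= 18.
import { execSync } from "node:child_process";

interface Commit { timestamp: number; subject: string; }

// Read commits from the last session window (assumed: last 4 hours).
function recentCommits(hours = 4): Commit[] {
  const raw = execSync(
    `git log --since="${hours} hours ago" --pretty=format:"%ct|%s"`,
    { encoding: "utf8" }
  ).trim();
  if (!raw) return [];
  return raw.split("\n").map((line) => {
    const [ts, ...rest] = line.split("|");
    return { timestamp: Number(ts) * 1000, subject: rest.join("|") };
  });
}

// Iteration velocity: commits per hour across the session.
function iterationVelocity(commits: Commit[]): number {
  if (commits.length < 2) return commits.length;
  const spanMs =
    Math.max(...commits.map((c) => c.timestamp)) -
    Math.min(...commits.map((c) => c.timestamp));
  return commits.length / Math.max(spanMs / 3_600_000, 1 / 60);
}

// Rework ratio: share of commits whose message looks like a fix or revert
// rather than forward progress (keyword heuristic is an assumption).
function reworkRatio(commits: Commit[]): number {
  if (commits.length === 0) return 0;
  const rework = commits.filter((c) =>
    /\b(fix|revert|oops|again|retry|typo)\b/i.test(c.subject)
  ).length;
  return rework / commits.length;
}

const commits = recentCommits();
console.log(`iteration velocity: ${iterationVelocity(commits).toFixed(1)}/hr`);
console.log(`rework ratio: ${(reworkRatio(commits) * 100).toFixed(0)}%`);
```

A fuller version would also need file-level data (e.g. `git log --name-only`) to score Trust Pass Rate and spiral duration.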

// vibe_check_output.log
   Metric                WoW analogy        Threshold
   Iteration Velocity    DPS uptime         >3/hr
🔄 Rework Ratio          Wipe count         <50%
   Trust Pass Rate       First try kills    >80%
🌀 Debug Spiral          Time to reset      <30m
🎯 Flow Efficiency       Boss uptime        >75%

OVERALL: ELITE

npx @boshu2/vibe-check

// why this matters

AI reliability varies by task type. Formatting is nearly always correct. Architecture needs line-by-line verification. The vibe levels answer the question: when can you trust AI output, and when do you need to verify every line?

L5 (95% trust): formatting, linting. Run it and move on.
L4 (80% trust): boilerplate, config files. Spot-check the output.
L3 (60% trust): standard features, CRUD. Verify the key parts work.
L2 (40% trust): new features, integrations. Check every change before committing.
L1 (20% trust): architecture, security. Read every line the AI writes.
L0 (0% trust): novel research where the AI has no training data.

Declaring the level upfront forces you to think about what kind of task you're doing. After the session, compare what actually happened to what you expected.
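
Declaring a level is easier when it's written down somewhere the session can see it. A hypothetical shape for that in TypeScript (this is not part of the vibe-check API; the level names and trust percentages come from the list above):

```typescript
// Hypothetical session-calibration helper, not part of @boshu2/vibe-check's API.
type VibeLevel = "L0" | "L1" | "L2" | "L3" | "L4" | "L5";

// Trust you expect at each level, taken from the list above.
const expectedTrust: Record<VibeLevel, number> = {
  L5: 0.95, L4: 0.80, L3: 0.60, L2: 0.40, L1: 0.20, L0: 0.0,
};

interface Session {
  declaredLevel: VibeLevel;   // declared before the session starts
  trustPassRate: number;      // measured after: share of AI output kept unchanged
}

// Compare what actually happened to what you expected when you declared the level.
function calibration(s: Session): string {
  const expected = expectedTrust[s.declaredLevel];
  const delta = s.trustPassRate - expected;
  if (Math.abs(delta) <= 0.10) return "well calibrated";
  return delta > 0
    ? "over-verified: could have vibed at a higher level"
    : "under-verified: drop a level and review more";
}

console.log(calibration({ declaredLevel: "L3", trustPassRate: 0.82 }));
```

The point is the post-session comparison: if you declared L3 and the output stuck 95% of the time, you over-verified; if it stuck 50% of the time, you picked the wrong level.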

// the 40% rule

Gene Kim and Steve Yegge found a hard threshold in their research. When context utilization stays under 40%, success rate is 98%. Above 60%, it drops to 24%. The AI starts forgetting instructions and contradicting itself.

<40% context → 98% success
>60% context → 24% success

This is why spiral detection matters. When you're stuck in a fix loop, context fills up fast.
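
vibe-check only sees git history, but if your tooling reports token counts you can check the 40% rule directly. A minimal sketch, assuming you can supply the used tokens and the context window size; the thresholds are the ones quoted above:

```typescript
// Minimal sketch of the 40% rule. Token counts must come from your own
// tooling; this function just applies the thresholds quoted above.
function contextHealth(usedTokens: number, contextWindow: number): string {
  const utilization = usedTokens / contextWindow;
  if (utilization < 0.40) return "green: ~98% success band";
  if (utilization <= 0.60) return "yellow: degrading, consider a fresh session";
  return "red: ~24% success band; reset context before continuing";
}

console.log(contextHealth(52_000, 200_000)); // "green: ~98% success band"
```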

// the insight

Git history doesn't lie. It's easy to feel productive, but are you actually shipping or just spinning? The commits tell the truth.

vibe-check analyzes your commit patterns to detect debug spirals before they consume your whole session. If you're stuck for 30 minutes on the same thing, that's a wipe—reset, do some research, and come back with a plan.

"Last week, the CLI flagged a spiral at 18 minutes. I realized I was arguing with the LLM about a circular dependency. I stepped away, drew the schema on paper, and fixed it in one commit. Without the alert, I would have wasted two hours."

// results

I've run this methodology across 200+ sessions over the past year. When I follow the discipline, it works. When I skip calibration because I'm in a hurry, I pay for it in rework.

95%   success rate
34x   commit throughput
2x    code quality
10:1  ROI on time

// for autonomous agents

vibe-check measures human-AI collaboration sessions. For autonomous agents that run without a human in the loop, 12-Factor AgentOps applies DevOps and SRE patterns to AI systems.

12-Factor AgentOps →
npm →
GitHub →