Devlog #2: How to Actually Use Your .claude/ Directory
Devlog 1 was the story of how I spent $10K on AI tokens in 90 days and watched my setup collapse under its own weight. This is what I did about it.
This post is for anyone who's looked at their .claude/ directory and thought "what the hell am I doing?" If you've got 20+ agents and can't remember what half of them do — this is the cleanup guide I wish I had.
The Discovery
Around October, I fell down a rabbit hole. Dexter from HumanLayer had this talk about "Advanced Context Engineering" — the idea that you can get way more out of today's models if you're deliberate about what goes into the context window. Sean Grove was saying "specs are the new code." The Stanford study showed AI tools actually make developers less productive in brownfield codebases.
I was hooked. I started collecting everything — transcripts from AI Engineering talks, the Vibe Coding book excerpts, blog posts, my own experiments. My Obsidian vault turned into what I now call "Mount Markdown" — just this growing mountain of research, indexed and searchable through Smart Connections.
The core insight I kept seeing: keep context utilization low, with 40–60% as the ceiling depending on who you asked. Compact frequently. Don't let the window fill up with garbage.
Simple enough, right?
The Explosion
Here's where I went wrong.
Everyone was saying "you need specialized agents." Data scientist agent. Product person agent. Security reviewer. Architecture expert. The reasoning made sense — different tasks need different expertise.
So I built 60+ agents. Each one a "specialist."
Dexter literally said "don't anthropomorphize roles — subagents are about context control, not playing house." I read that. I nodded. I did it anyway.
The result? My context window was 50% full before I even started working. Just from loading agent definitions. 100k tokens sitting there, wasted, because I had convinced myself that having a prompt-engineer-agent.md and a debug-agent.md and a deploy-agent.md was somehow making me more productive.
It wasn't.

I was the problem. I had built this elaborate system of specialists that couldn't actually specialize because they were all fighting for the same context window.
Commands vs Skills vs Agents
Here's what nobody explains clearly, because honestly it keeps changing.
Claude Code has three main extension points: commands, skills, and agents. They sound similar. They live in similar directories. The APIs have merged and shifted. If you're confused about what goes where, you're not alone.
Let me try to explain what I think I understand (subject to being wrong again).

Commands are for workflows you consciously invoke. You type a slash, it runs. Use these when you want explicit control — "I am now doing research" or "I am now implementing this issue." You remember the name, you type it, it executes.
Skills are for context you need but don't want to think about. They auto-trigger based on phrases in your conversation. You say "what's ready to work on" and the relevant skill loads without you asking. The magic is that you don't have to remember anything — the system recognizes intent and loads what you need.
Agents are for parallel specialist review — and almost nothing else. You run them when you want multiple perspectives on the same code simultaneously. Security expert, architecture expert, code quality expert, all reviewing the same PR at once.
The decision tree:
- Will you invoke it by name? → Command
- Should it load automatically based on context? → Skill
- Do you need parallel specialist perspectives? → Agent
- Everything else? → You probably don't need it
If you're loading 60 agents at startup, you're doing it wrong. Ask me how I know.
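If the three-way split still feels abstract, here's the shape of it on disk. Paths follow current conventions as I understand them, and given how often this has shifted, double-check against your version's docs:

```bash
# One home per extension point. Skills conventionally live in a
# subdirectory per skill, with a SKILL.md inside.
mkdir -p .claude/commands .claude/skills .claude/agents

# .claude/CLAUDE.md              always loaded, so keep it small
# .claude/commands/foo.md        slash command: you type /foo
# .claude/skills/bar/SKILL.md    auto-triggers off conversational intent
# .claude/agents/baz.md          subagent definition for parallel review
```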
The Consolidation
Then Claude Code 2.1 shipped on January 7th, and everything broke.
Not broke-broke. But broken enough that I had to rethink the whole setup. Commands became proper slash commands. Skills got a new auto-trigger system. The frontmatter schema changed.
I took it as an opportunity to clean house.

January 11: The great consolidation. I archived 11 agents that were duplicates or replaced by commands. Went from ~60 agents to 4 domain specialists.
January 15: Further trimmed. Archived 6 more commands that were redundant:
- /autopilot and /autopilot-polecat → both replaced by /crank
- /doc-coverage → just use /doc coverage subcommand
- /load-epic → bd show does this
- /plan-to-beads → integrated into /plan
- /synthesis → nobody was using it
Current state:
- 25 commands (down from 40+)
- 36 skills (down from 39)
- 4 agents (down from 60+)
The impact: Sessions that used to hit context limits after 20 minutes now run for hours. Hallucinations dropped noticeably — the model actually remembers what it's doing because it's not drowning in agent definitions. And startup is instant instead of that 3-second pause while everything loads.
It felt like finally deleting all those low-level alts you created "just in case." Same character, cleaner inventory.
The Walkthrough: How to Set Up Your .claude/
If I were starting over today, here's what I'd do.

Step 1: Start with CLAUDE.md
This is your universal config. It loads for every project. Put things here that apply everywhere:
- Your workflow preferences
- Session protocol (what to do at start/end)
- Links to governance docs
Keep it under 200 lines. Anything longer and you're wasting context.
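For illustration, a stripped-down version might look like this. The section names and file paths here are mine (notes/session-log.md and docs/standards.md are placeholders), not anything Claude Code mandates:

```bash
# Sketch of a minimal global config; adapt the sections to your workflow
cat > ~/.claude/CLAUDE.md <<'EOF'
# Workflow preferences
- Prefer small, reviewable diffs. Ask before large refactors.

# Session protocol
- Start: check in-progress work before picking up anything new.
- End: write a short state summary to notes/session-log.md.

# Governance
- Standards and conventions: see docs/standards.md
EOF
```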
Step 2: Add commands for workflows you repeat
Don't go crazy. Ask yourself: "Do I type this sequence of steps more than twice a week?"
My core commands:
- /research - Deep codebase exploration
- /crank - Autonomous epic execution
- /implement - Single issue work
- /retro - Extract learnings
That's it. Four commands handle 80% of my work.
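A command is just a markdown file whose body becomes the prompt when you type the slash. Here's a sketch of what /retro could look like; the body is illustrative rather than my actual file, and the frontmatter fields (description, argument-hint) plus the $ARGUMENTS placeholder follow the conventions I'm on today, which have changed before:

```bash
mkdir -p .claude/commands
cat > .claude/commands/retro.md <<'EOF'
---
description: Extract learnings from the current session
argument-hint: [optional focus area]
---
Review this session and extract learnings about: $ARGUMENTS

1. What worked, what didn't, and why.
2. Recurring friction that might deserve a new command or skill.
3. Append a short summary to notes/retros.md.
EOF
```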
Step 3: Let skills handle the implicit stuff
Skills should fire without you thinking about them. Good skill triggers:
- "what's ready" → shows available work
- "create a task" → creates a beads issue
- "research this" → runs exploration
Don't create skills for things you'll invoke explicitly. That's what commands are for.
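Structurally a skill is markdown too, but the description does the heavy lifting: the model matches your phrasing against it and loads the body just in time. A sketch of the "what's ready" skill, assuming the beads CLI's ready listing (swap in whatever your tracker uses):

```bash
mkdir -p .claude/skills/whats-ready
cat > .claude/skills/whats-ready/SKILL.md <<'EOF'
---
name: whats-ready
description: Use when the user asks what's ready to work on, what's
  next, or what's available to pick up. Lists unblocked issues.
---
Run `bd ready` and summarize the unblocked issues, highest priority
first. Flag anything blocked and name the blocker.
EOF
```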
Step 4: Only add agents for parallel specialist review
You probably don't need custom agents. The built-in Task() subagent types handle most exploration and implementation.
I only have 4 agents, and they're all for the same use case: running parallel code review before a merge. Security expert, architecture expert, code quality expert, UX expert. They each look at the same PR from a different angle.
If you're not doing parallel specialist review, skip custom agents entirely.
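For completeness, here's roughly what one of those reviewers looks like as a file. The field names and tools list follow current conventions and the prompt is a sketch; note the last instruction, which matters for the context budget in the next step:

```bash
mkdir -p .claude/agents
cat > .claude/agents/security-reviewer.md <<'EOF'
---
name: security-reviewer
description: Reviews a diff for security issues. Run in parallel with
  the other reviewers before merge, not during normal work.
tools: Read, Grep, Glob
---
Review the pull request for security problems only: injection, auth
gaps, secrets in code, unsafe deserialization. Report findings as
file:line plus a one-sentence risk statement. Return a short summary
to the coordinator, never the full diff.
EOF
```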
Step 5: The 40% Rule
This is the most important thing I learned.

Stay under 40% context utilization. I tracked this obsessively for a month. Below 40%, tasks complete reliably. Above 60%, the model starts hallucinating, forgetting what it was doing, confidently producing garbage. The failure rate isn't linear — it's a cliff.
How do you stay under 40%?
- Don't load everything at startup
- Use skills with JIT (just-in-time) loading
- Compact frequently — write summaries to files, start fresh sessions
- Kill agents that return too much context to the coordinator
Complexity is where tokens go to die.
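To put numbers on it: assuming a 200k-token window (which is what makes 50% equal 100k, as above) and the rough 4-characters-per-token heuristic, 40% is an 80k budget, and everything that loads at startup spends from it. A quick way to estimate what your always-loaded config costs:

```bash
# Rough startup cost: CLAUDE.md plus agent definitions are the usual
# always-loaded suspects; commands and skills load just in time.
# Assumes ~4 characters per token, which is a heuristic, not a promise.
chars=$(cat ~/.claude/CLAUDE.md ~/.claude/agents/*.md 2>/dev/null | wc -c)
echo "~$((chars / 4)) tokens at startup (40% of 200k = 80000 budget)"
```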
What I Think I Learned
Not that I know. This is just what I tried.
Simple beats clever. I spent weeks building elaborate coordination systems. Turns out dumb isolation wins. Separate copies of the code mean no merge conflicts. Failures don't cascade. Kill and restart without affecting others.
Skills replaced most of my agents. The stuff I thought needed a "specialist agent" actually just needed a skill that auto-loads relevant context. Less overhead, same capability. Auto-trigger beats explicit invocation.
The 40% rule is real. Every time I've ignored it, I've regretted it. Context overflow is the silent killer of AI productivity. Above 40%, the model doesn't degrade. It lies.
Consolidation is ongoing. I archived 6 commands while writing this post. The setup I have today will probably look different in a month. That's fine. The goal isn't perfection — it's not wasting tokens on stuff that doesn't help.
Try It
Want to audit your own setup? Open a terminal and list what's in your Claude config directory. Count how many markdown files you have. Check the total size. If you're surprised by what you find, you're not alone.
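Concretely, assuming the default ~/.claude location (add any per-project .claude/ directories too):

```bash
ls ~/.claude                                   # what's actually in there?
find ~/.claude -name '*.md' | wc -l            # how many markdown files?
du -sh ~/.claude                               # total size on disk
for d in commands skills agents; do            # count per extension point
  printf '%-10s %s\n' "$d" \
    "$(find ~/.claude/$d -name '*.md' 2>/dev/null | wc -l)"
done
```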
Want This Setup?
The workflow I landed on is available as a plugin. Check out vibe-kit on GitHub in the plugins folder, or read about 12-Factor AgentOps for the methodology behind it.
Or just grab the config template and adapt it yourself.
What's Next
Devlog #3: Getting Gas Town to run on open source models.
Right now, Gas Town is Claude-native. But what if you want to run workers on Ollama? What if your company can't send code to Anthropic's API? What if you just want options?
I'm working on native support for open source agents and models. The goal: same orchestration, your choice of brain.
Stay tuned.
Devlog #2. 25 commands. 36 skills. 4 agents. Down from 60+. This is the second in a series documenting my vibe coding journey. Start with devlog 1 if you haven't already.