How I stopped hitting Claude Code limits

The 12 things I changed

1. Use Graphify for large codebases

Without Graphify, Claude reads through your entire repo whenever a query requires it to understand the structure. With Graphify, it builds a persistent knowledge graph of your codebase once (code, docs, PDFs, diagrams) and queries the graph instead of re-reading files.

Reported result in real-world testing: 71.5x fewer tokens per query on mixed corpora.

If you're working on anything larger than a few files, this is probably the highest-leverage tool on this list.

Repo: github.com/safishamsi/graphify

2. Use Caveman to compress responses

Caveman forces Claude to respond in ultra-compressed language — no filler phrases, no verbose explanations — while keeping full technical accuracy. The style changes. The reasoning doesn't.

Numbers: ~75% fewer output tokens. It also ships a caveman-compress tool that rewrites your CLAUDE.md into terse shorthand, cutting ~46% of input tokens per session. A March 2026 study found brevity constraints actually improved benchmark accuracy by 26 percentage points on certain tasks.

The repo also has a classical Chinese / 文言文 mode that compresses even harder. The idea isn't "talk dumb"; it's to compress meaning more tightly so you get the same answer with fewer tokens.

Works across Claude Code, Codex, Cursor, Windsurf, and Copilot.

Repo: github.com/juliusbrussee/caveman

3. Use claude-mem

This is the engineering version of #7, but for code sessions.

claude-mem gives Claude Code persistent cross-session memory. It captures everything Claude works on during a session, compresses it using Claude's agent SDK, and injects the relevant context back in on your next session. You open a new chat and Claude already knows your motor configs, your PCB layout decisions, the bug you spent two hours on yesterday. No re-explaining.

The architecture: embeddings, RAG, SQLite, and vector search via ChromaDB and Mem0. It pulls only what's relevant — not the whole history dump.

It's free, written in TypeScript, and sits at ~61K GitHub stars. Two commands to install.

Repo: github.com/thedotmack/claude-mem

4. Edit your prompt instead of sending a follow-up

When Claude gets it wrong, don't reply. Hit Edit on your original message, fix the prompt, regenerate. A follow-up stacks onto context permanently. An edit replaces it. This one change alone cuts a surprising amount of waste.

5. Start a fresh chat every 15–20 messages

When a thread gets long: ask Claude to summarize everything → copy the summary → open a new chat → paste it as context. You keep the knowledge. You don't carry the token debt.
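
Here's a rough sketch of that token debt. It assumes every turn re-sends the whole thread as input and that each turn adds a fixed number of tokens. Both are simplifications, and `TOKENS_PER_TURN` and `SUMMARY_TOKENS` are made-up values for illustration, not measurements.

```python
def thread_input_tokens(turns: int, tokens_per_turn: int, starting_context: int = 0) -> int:
    """Total input tokens over a thread, assuming each turn re-sends all prior history."""
    total = 0
    context = starting_context
    for _ in range(turns):
        context += tokens_per_turn  # the new exchange joins the history
        total += context            # the whole history is sent as input for this turn
    return total


# Hypothetical numbers, chosen only for illustration.
TOKENS_PER_TURN = 500
SUMMARY_TOKENS = 1_000

one_long_thread = thread_input_tokens(40, TOKENS_PER_TURN)
two_short_threads = (
    thread_input_tokens(20, TOKENS_PER_TURN)
    + thread_input_tokens(20, TOKENS_PER_TURN, starting_context=SUMMARY_TOKENS)
)

print(one_long_thread, two_short_threads)
```

Under these assumptions, one 40-turn thread costs 410,000 input tokens, while two 20-turn threads joined by a 1,000-token summary cost 230,000. The gap widens the longer the thread runs, because the cost grows with the square of the turn count.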

6. Batch your questions into one message

Three separate prompts trigger three full context reloads. One message with three questions triggers one. The answers are usually better too — Claude sees the full picture upfront instead of reasoning piecemeal.
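
The arithmetic behind this can be sketched as below. It assumes the existing context is re-sent with every message. The function names and token counts are hypothetical, picked only to make the comparison concrete.

```python
def input_tokens_separate(context: int, question_sizes: list[int], answer_size: int) -> int:
    """Input tokens when each question is sent as its own message."""
    total = 0
    for question in question_sizes:
        context += question    # the question joins the history
        total += context       # the full history is sent as input
        context += answer_size # the answer joins the history for the next turn
    return total


def input_tokens_batched(context: int, question_sizes: list[int]) -> int:
    """Input tokens when all questions go out in a single message."""
    return context + sum(question_sizes)


# Hypothetical sizes: a 10,000-token thread, three 100-token questions, 400-token answers.
separate = input_tokens_separate(10_000, [100, 100, 100], answer_size=400)
batched = input_tokens_batched(10_000, [100, 100, 100])

print(separate, batched)
```

With these numbers, the three separate prompts cost 31,800 input tokens and the batched message costs 10,300. Almost all of the difference is the 10,000-token context being sent three times instead of once.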

7. Upload recurring files to Projects

If you're attaching the same document across multiple chats, Claude re-tokenizes it every single time. Upload once to a Project and it's cached. Every conversation in that project references it for free.

8. Turn off features you're not actively using

Web search, connectors, and extended thinking all add tokens to every response whether you need them or not. Keep them off by default. Turn on extended thinking only after a first attempt fails.

9. Spread work across the day

Claude uses a rolling 5-hour window, not a midnight reset. Messages sent at 9am stop counting toward your limit by 2pm. If you burn through your limit in one morning session, most of your daily allocation is wasted.

Split work into 2–3 sessions: morning, afternoon, evening. By the time you return, your earlier usage has cleared.
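
A toy model of the rolling window as described above. It assumes usage simply drops out exactly five hours after it was sent. The real accounting is Anthropic's and may differ, and the timestamps and token counts are invented for the example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)  # rolling window length, as stated above


def usage_in_window(events: list[tuple[datetime, int]], now: datetime) -> int:
    """Sum the tokens of events that are still inside the rolling window at `now`."""
    return sum(
        tokens
        for sent_at, tokens in events
        if timedelta(0) <= now - sent_at < WINDOW
    )


# Hypothetical usage log: (time sent, tokens used).
day = datetime(2026, 4, 1)
events = [
    (day.replace(hour=9), 30_000),
    (day.replace(hour=9, minute=30), 20_000),
    (day.replace(hour=13), 10_000),
]

for hour, minute in [(13, 30), (14, 0), (14, 30)]:
    now = day.replace(hour=hour, minute=minute)
    print(now.strftime("%H:%M"), usage_in_window(events, now))
```

In this model the counted usage is 60,000 tokens at 13:30, 30,000 at 14:00 once the 9am message ages out, and 10,000 at 14:30. That is why coming back in the afternoon gives you most of your headroom again.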

10. Work during off-peak hours

Since March 26, 2026, Anthropic has been counting usage against session limits faster during peak hours: 5am–11am PT on weekdays. The same query in the same chat uses up more of your limit during peak. Move compute-heavy tasks to evenings or weekends.

If you're outside the US, check the PT conversion for your timezone — peak hours may land in the middle of your afternoon.
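
One way to do that conversion is with Python's standard `zoneinfo` module. This is a small sketch: `peak_window_local` is a helper name I made up, and the 5am–11am PT window is the one quoted above. On systems without a timezone database (some Windows setups), `zoneinfo` needs the `tzdata` package installed.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")
PEAK_START = time(5, 0)   # 5am PT, per the peak window described above
PEAK_END = time(11, 0)    # 11am PT


def peak_window_local(day: date, local_tz: str) -> tuple[datetime, datetime]:
    """Return the start and end of the PT peak window for `day` in `local_tz`."""
    tz = ZoneInfo(local_tz)
    start = datetime.combine(day, PEAK_START, tzinfo=PACIFIC)
    end = datetime.combine(day, PEAK_END, tzinfo=PACIFIC)
    return start.astimezone(tz), end.astimezone(tz)


# Example: a weekday in April, viewed from Berlin.
start, end = peak_window_local(date(2026, 4, 1), "Europe/Berlin")
print(start.strftime("%H:%M"), end.strftime("%H:%M"))
```

For Berlin on that date the window comes out as 14:00 to 20:00 local time. Because the conversion goes through real timezone rules, it also handles the weeks when US and European daylight saving time are out of step.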

11. Use Haiku for simple tasks

Not everything needs Sonnet.

  • Haiku: grammar checks, brainstorming, formatting, quick translations

  • Sonnet: actual development work

  • Opus: complex reasoning, architecture decisions

Using Haiku for chores cuts costs significantly and keeps your Sonnet budget for the work that actually requires it.
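
If you script your own workflows, the split above can be written down as a simple lookup. This is only a sketch: the task labels and the lowercase tier names are placeholders I chose, not real model identifiers, and falling back to Sonnet for unknown tasks is my own assumption.

```python
# Hypothetical task labels mapped to model tiers, following the split above.
TIER_BY_TASK = {
    "grammar": "haiku",
    "brainstorm": "haiku",
    "format": "haiku",
    "translate": "haiku",
    "develop": "sonnet",
    "reason": "opus",
    "architecture": "opus",
}


def pick_tier(task_kind: str) -> str:
    """Return the tier for a task label, defaulting to the middle tier."""
    return TIER_BY_TASK.get(task_kind.lower(), "sonnet")


print(pick_tier("grammar"), pick_tier("architecture"), pick_tier("unknown"))
```

The point of writing it down is that the cheap tier becomes the default for chores, so you don't have to remember to switch.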

12. Enable Extra Usage as a safety net

Pro, Max 5x, and Max 20x subscribers can turn on Extra Usage in Settings → Usage. When you hit your session limit, Claude switches to pay-as-you-go at API rates instead of cutting you off. Set a monthly spending cap and you won't get caught mid-session.


What did the most?

Habits 4–6 made a noticeable difference immediately. The plugins (#1, #2, #3) eliminated probably 80% of what was left.

The rest is discipline: session splits, off-peak timing, keeping features off until you need them.

Claude doesn't count messages. It counts tokens. Once you build around that, the limits mostly stop mattering.

