Reducing Token Usage
in AI-Assisted Development
Dr. Farshid Pirahansiah
The Problem
Full .claude folder
Token cost
Response time
API cost
1 token ≈ 4 chars • Every file = tokens
Token Cost by Component
Core files
✅ Always needed
Skills (32 files)
Select per project
Agents (7 files)
Select per domain
Minimal config
Best savings
Strategy 1: .cursorignore
Minimal
Core only
Web
Python/backend
CV/ML
YOLO, SAM2
Edge/C++
Inference
Copy template → project/.cursorignore
Strategy 2: Selective Loading
✅ INCLUDE
! .claude/skills/cv-pipeline/
! .claude/agents/debugger.md
! .claude/CLAUDE.md
❌ EXCLUDE
.claude/skills/portfolio/
.claude/workflows/
.claude/agent-memory/
38-45% instead of 100%
Strategy 3: Remove Files
Delete
DIRECTORY-TREE.md
FILE-INVENTORY.md
PROJECT_PORTFOLIO (4084 lines!)
Also remove
7 .cursorignore variants
agent-memory/ (other projects)
myWebsite/ (duplicate)
-4,967 lines removed
Before & After
BEFORE
60 files • 162 KB
4084-line portfolio file
7 .cursorignore variants
AFTER
12 files • ~30 KB
73-line focused CLAUDE.md
81% fewer files
Strategy 4: codebase-memory
Stop "read this file" • Stop grep the repo
Fewer tokens
Answer quality
Fewer tool calls
Linux kernel index
One graph query replaces dozens of grep/read
Impact Analysis
BEFORE
grep → scan → read → repeat
~412,000 tokens per session
AFTER
Index once → one graph query
~3,400 tokens per session
99% token reduction
Advanced: AST Skeletonization
Hide implementation, keep structure
BEFORE
Send full 1000-line file
~50,000 tokens
AFTER (Skeleton)
Signatures + imports only
~2,000 tokens
Tree-sitter parses AST → strips function bodies → keeps class/method signatures
Advanced: Prompt Caching
The game changer from Anthropic & OpenAI
Cost reduction
Cache checkpoint
Accuracy
Cache hits
Freeze .claude/skills + CLAUDE.md in cache block
Advanced: LLMLingua Compression
Microsoft Research • 7B model pre-processor
Compression ratio
Accuracy drop
Pre-processor model
Token savings
AI doesn't need "the, and, a" or verbose boilerplate
Advanced: Multi-Model Routing
Don't use Sonnet for everything
Step 1: Haiku (cheap)
"Which 3 files are relevant?"
Returns file paths only
Step 2: Sonnet (expensive)
Receives only 3 files
Does the actual coding
80% cost reduction • High accuracy
Advanced: Unified Diff Output
Stop asking AI to rewrite entire files
BEFORE
Rewrite entire 1000-line file
Full re-stream every time
AFTER (Diff)
Output 10 changed lines only
60-90% token savings
Tools: Aider search/replace blocks, unified diff format
Advanced: Open Source Tools
Aider
Repository map with ctags
Strip comments & whitespace
Repomix
Pack repo → single file
Token counting + secrets scrub
grep-ast
AST-aware grep
Signatures only, skip bodies
codebase-memory
Knowledge graph index
158 languages, sub-ms queries
What We Added to .claude
New rule: rules/token-reduction.md
Added
token-reduction.md (8 strategies)
Quick reference table
Always-loaded coding rule
Result
.claude folder: cleaned + enhanced
Agent knows all 8 techniques
Applied automatically per session
Technique Comparison
AST Skeleton
High accuracy
Prompt Caching
Perfect accuracy
LLMLingua
Medium-high
Unified Diff
High accuracy
Results
Files
Size
Lines
Tokens saved
Summary
→ 10 strategies from file cleanup to AI compression
→ 60 files → 12, 162KB → 30KB
→ AST skeleton: 70% savings, full structure
→ Prompt caching: 90% cost reduction
→ LLMLingua: 20x compression, <1% accuracy loss
pirahansiah.com