Reducing Token Usage

in AI-Assisted Development

Dr. Farshid Pirahansiah

The Problem

162 KB

Full .claude folder

100%

Token cost

Slow

Response time

$$$

API cost

1 token ≈ 4 chars • Every file = tokens

Token Cost by Component

~10 KB

Core files
✅ Always needed

~90 KB

Skills (32 files)
Select per project

~14 KB

Agents (7 files)
Select per domain

6-10%

Minimal config
Best savings

Strategy 1: .cursorignore

5-10%

Minimal
Core only

20-30%

Web
Python/backend

35-45%

CV/ML
YOLO, SAM2

40-50%

Edge/C++
Inference

Copy template → project/.cursorignore

Strategy 2: Selective Loading

✅ INCLUDE

! .claude/skills/cv-pipeline/

! .claude/agents/debugger.md

! .claude/CLAUDE.md

❌ EXCLUDE

.claude/skills/portfolio/

.claude/workflows/

.claude/agent-memory/

38-45% instead of 100%

Strategy 3: Remove Files

Delete

DIRECTORY-TREE.md

FILE-INVENTORY.md

PROJECT_PORTFOLIO (4084 lines!)

Also remove

7 .cursorignore variants

agent-memory/ (other projects)

myWebsite/ (duplicate)

-4,967 lines removed

Before & After

BEFORE

60 files • 162 KB

4084-line portfolio file

7 .cursorignore variants

AFTER

12 files • ~30 KB

73-line focused CLAUDE.md

81% fewer files

Strategy 4: codebase-memory

Stop "read this file" • Stop grep the repo

10x

Fewer tokens

83%

Answer quality

2.1x

Fewer tool calls

3 min

Linux kernel index

One graph query replaces dozens of grep/read

Impact Analysis

BEFORE

grep → scan → read → repeat

~412,000 tokens per session

AFTER

Index once → one graph query

~3,400 tokens per session

99% token reduction

Advanced: AST Skeletonization

Hide implementation, keep structure

BEFORE

Send full 1000-line file

~50,000 tokens

AFTER (Skeleton)

Signatures + imports only

~2,000 tokens

Tree-sitter parses AST → strips function bodies → keeps class/method signatures

Advanced: Prompt Caching

The game changer from Anthropic & OpenAI

90%

Cost reduction

100KB

Cache checkpoint

Perfect

Accuracy

Free

Cache hits

Freeze .claude/skills + CLAUDE.md in cache block

Advanced: LLMLingua Compression

Microsoft Research • 7B model pre-processor

20x

Compression ratio

<1%

Accuracy drop

7B

Pre-processor model

90-95%

Token savings

AI doesn't need "the, and, a" or verbose boilerplate

Advanced: Multi-Model Routing

Don't use Sonnet for everything

Step 1: Haiku (cheap)

"Which 3 files are relevant?"

Returns file paths only

Step 2: Sonnet (expensive)

Receives only 3 files

Does the actual coding

80% cost reduction • High accuracy

Advanced: Unified Diff Output

Stop asking AI to rewrite entire files

BEFORE

Rewrite entire 1000-line file

Full re-stream every time

AFTER (Diff)

Output 10 changed lines only

60-90% token savings

Tools: Aider search/replace blocks, unified diff format

Advanced: Open Source Tools

Aider

Repository map with ctags

Strip comments & whitespace

Repomix

Pack repo → single file

Token counting + secrets scrub

grep-ast

AST-aware grep

Signatures only, skip bodies

codebase-memory

Knowledge graph index

158 languages, sub-ms queries

What We Added to .claude

New rule: rules/token-reduction.md

Added

token-reduction.md (8 strategies)

Quick reference table

Always-loaded coding rule

Result

.claude folder: cleaned + enhanced

Agent knows all 8 techniques

Applied automatically per session

Technique Comparison

70-80%

AST Skeleton
High accuracy

90%

Prompt Caching
Perfect accuracy

90-95%

LLMLingua
Medium-high

60-90%

Unified Diff
High accuracy

Results

-81%

Files

-82%

Size

-88%

Lines

90-99%

Tokens saved

Summary

→ 10 strategies from file cleanup to AI compression

→ 60 files → 12, 162KB → 30KB

→ AST skeleton: 70% savings, full structure

→ Prompt caching: 90% cost reduction

→ LLMLingua: 20x compression, <1% accuracy loss

pirahansiah.com

Thank You