LLM Prompts for Knowledge Graph
Summarize Note
Write ONE short sentence (max 20 words) summarising a technical note.
Title: {title}
Key terms: {top_terms}
Headings: {headings}
Reply with the sentence only, no labels.
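The placeholders in the prompt above can be filled with ordinary string formatting. What follows is a minimal sketch in Python, assuming the template is stored as a `str.format`-style string. The names `SUMMARIZE_NOTE_PROMPT`, `build_summary_prompt`, and `within_word_limit` are hypothetical, and joining terms with commas and headings with pipes is an assumption; the prompt itself does not say how the lists are serialised.

```python
# Hypothetical storage of the Summarize Note prompt as a str.format template.
SUMMARIZE_NOTE_PROMPT = (
    "Write ONE short sentence (max 20 words) summarising a technical note.\n"
    "Title: {title}\n"
    "Key terms: {top_terms}\n"
    "Headings: {headings}\n"
    "Reply with the sentence only, no labels."
)


def build_summary_prompt(title, top_terms, headings):
    """Fill the template; list separators are an assumption of this sketch."""
    return SUMMARIZE_NOTE_PROMPT.format(
        title=title,
        top_terms=", ".join(top_terms),
        headings=" | ".join(headings),
    )


def within_word_limit(sentence, max_words=20):
    """Check a model reply against the 20-word cap stated in the prompt."""
    return len(sentence.split()) <= max_words


prompt = build_summary_prompt(
    "Graph Indexing", ["node", "edge"], ["Overview", "Design"]
)
```

Because a model may ignore the word cap, a check such as `within_word_limit` on the reply is one cheap way to decide whether to retry or truncate.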
Rank Knowledge Files
Select the smallest, highest-value set of files needed to understand this project’s knowledge, domain model, architecture, and core behavior.
Goal:
- Keep files that explain WHAT the project does, HOW the main system works, and WHERE the important domain logic lives.
- Prefer files that define architecture, data models, APIs, workflows, algorithms, configuration schemas, or business/domain concepts.
- Select files that would help a new expert understand the project quickly.
Strongly prefer:
- Core source files: .py, .cpp, .hpp, .h, .cc, .cxx, .ts, .js, .go, .rs
- Entry points, main services, routers/controllers, orchestrators, pipelines, engines, managers
- Domain models, schemas, interfaces, abstractions, protocol definitions
- Important config files only if they define project behavior, architecture, or domain concepts
- README or docs only if they contain substantial project-specific architecture/domain knowledge
Deprioritize or exclude:
- Environment/tooling files: .conda, venv, .venv, node_modules, __pycache__, .idea, .vscode
- Lockfiles and dependency manifests unless uniquely important to understanding architecture
- Build outputs, generated files, caches, logs, binaries, datasets, notebooks with only experiments
- Generic CI/CD, formatting, linting, launch settings, editor settings
- Tests unless they are the clearest documentation of critical behavior or domain rules
Selection rules:
- Return at most {max_files} file indices.
- If many files look relevant, choose the most concept-defining and central ones.
- Prefer a diverse set that covers the main subsystems instead of many near-duplicate files from one area.
- Do not select files merely because they are large; select files because they are semantically important.
- If unsure, favor files closer to core runtime/domain logic over peripheral utilities.
Output requirements:
- Return valid JSON only.
- Do not include explanations, markdown, comments, or trailing text.
- Use exactly this schema: a JSON object with a single key "keep" whose value is an array of file indices.
- Indices must be integers from the provided file list.
- Do not invent indices.
- The keep array may be empty if no files are relevant.
- Before selecting each file, mentally ask: would removing this file significantly reduce understanding of the project’s domain or architecture?
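The output requirements above are easiest to enforce on the caller's side by validating the reply before using it. This is a minimal sketch in Python, assuming the reply is a JSON object with a `keep` array of integers as the requirements suggest. The function name `parse_keep_response` is hypothetical. Zero-based indices, dropping duplicate indices, and truncating an over-long list to `max_files` (instead of rejecting it) are assumptions of this sketch, not rules stated in the prompt.

```python
import json


def parse_keep_response(raw, num_files, max_files):
    """Return validated file indices from a model reply, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    if not isinstance(data, dict) or not isinstance(data.get("keep"), list):
        raise ValueError('reply must be a JSON object with a "keep" array')

    seen = set()
    kept = []
    for idx in data["keep"]:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(idx, bool) or not isinstance(idx, int):
            raise ValueError(f"non-integer index: {idx!r}")
        # Assumption: indices are zero-based positions in the file list.
        if not 0 <= idx < num_files:
            raise ValueError(f"index not in provided file list: {idx}")
        if idx not in seen:
            seen.add(idx)
            kept.append(idx)

    # Assumption: enforce the cap by truncation, preserving the model's order.
    return kept[:max_files]
```

Raising `ValueError` on any malformed reply keeps the decision about retrying or falling back in the calling code.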