Open Source · AI Coding Assistant Skill

Reduce AI Token Use in Large Codebases

~50% token reduction validated on large codebases
25 languages via tree-sitter AST
3-pass extraction pipeline
Knowledge Graph · AST / tree-sitter · Leiden Clustering · NetworkX · Multi-modal · Token Reduction · MCP Server · Python · Claude / Cursor / Codex

Overview

Graphify is an open-source AI coding assistant skill that reads your files, builds a knowledge graph, and gives you back structure you didn't know was there. Type /graphify in Claude Code, Cursor, Codex, Gemini CLI, GitHub Copilot CLI, or any supported assistant, and it extracts concepts, relationships, and design rationale from your entire codebase — letting your AI navigate by graph structure instead of grepping through every file.

Personally tested and validated on large codebases: Graphify reduces token usage by approximately 50% compared with feeding raw files into your AI assistant. The savings compound: the first run builds the graph, and every subsequent query reads the compact representation instead of re-scanning source. A SHA256 cache means re-runs only re-process changed files.
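A minimal sketch of how such an incremental cache can work, assuming a JSON file that maps each relative path to its last-seen SHA256 digest. The function names and the cache layout are hypothetical, not Graphify's actual on-disk format.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA256 hex digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(root: Path, cache_path: Path) -> list[Path]:
    """Return files under root whose digest differs from the cached one (hypothetical helper).

    The cache is updated in place, so a second call with no edits returns [].
    """
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    changed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = str(path.relative_to(root))
        digest = file_digest(path)
        if cache.get(key) != digest:
            changed.append(path)  # new or modified: needs re-extraction
            cache[key] = digest
    cache_path.write_text(json.dumps(cache, indent=2))
    return changed
```

Keep the cache file outside the scanned folder; otherwise the cache itself would be hashed and reported as changed on every run.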

Fully multimodal — drop in code, PDFs, markdown, screenshots, diagrams, whiteboard photos, or video files. Graphify extracts concepts from all of them and merges them into one unified graph. 25 programming languages supported via tree-sitter AST parsing.

Quick Start

Install the package and run one command inside your AI coding assistant:

Install: pip install graphifyy && graphify install
Run: /graphify .
Query: /graphify query "what connects X to Y?"

Works in: Claude Code · Cursor · Codex · Gemini CLI · GitHub Copilot CLI · VS Code Copilot · Aider · and more.

Use Cases

Navigate large codebases

Type /graphify . in your AI assistant. Instead of grepping hundreds of files, your assistant reads GRAPH_REPORT.md and navigates by community structure: god nodes (the most highly connected concepts), surprising cross-file connections, and suggested questions.

~50% token reduction in large codebases

Personally tested and validated: Graphify reduces token usage by approximately 50% in large codebases by letting your AI assistant query a compact knowledge graph instead of re-reading raw source files. Savings compound on every subsequent query.

Cross-file call graph & lineage

AST extraction via tree-sitter builds cross-file call graphs, import chains, class hierarchies, and docstring rationale for 25 languages — all deterministically, with no LLM needed for code.
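Graphify does this with tree-sitter across 25 languages. As a rough, Python-only illustration of the kind of edges an AST pass produces, the sketch below swaps in Python's built-in `ast` module instead of tree-sitter. The `extract_structure` function and its (source, relation, target) triple format are hypothetical, not Graphify's internals.

```python
import ast


def extract_structure(files: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (source, relation, target) triples for imports, classes, functions and calls.

    Deterministic: the same input always yields the same edges, no LLM involved.
    """
    edges = []
    for filename, source in files.items():
        module = filename.removesuffix(".py")
        tree = ast.parse(source, filename=filename)
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    edges.append((module, "imports", alias.name))
            elif isinstance(node, ast.ImportFrom) and node.module:
                edges.append((module, "imports", node.module))
            elif isinstance(node, ast.ClassDef):
                edges.append((module, "defines", node.name))
                for base in node.bases:  # class hierarchy
                    if isinstance(base, ast.Name):
                        edges.append((node.name, "inherits", base.id))
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Methods are also recorded as defined by the module in this simplified sketch.
                edges.append((module, "defines", node.name))
                for inner in ast.walk(node):
                    # Only plain-name calls like load(); attribute calls are skipped.
                    if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                        edges.append((node.name, "calls", inner.func.id))
    return edges
```

Calls are recorded by bare name, so linking `run` in b.py to `load` in a.py relies on the accompanying `imports` edge; a real resolver would also follow aliases and attribute calls such as `self.save()`, which this sketch ignores.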

Multi-modal corpus

Drop code, PDFs, markdown, screenshots, diagrams, whiteboard photos, or video files into one folder. Graphify extracts concepts from everything and merges them into a single unified graph with confidence-tagged edges.
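A sketch of the merge step, assuming each extractor (code, PDF, image, transcript) emits a list of edge dicts with `source`, `relation`, `target`, `tag`, and `confidence` keys. The schema, the case-insensitive label matching, and the keep-the-highest-confidence rule are hypothetical choices for illustration, not Graphify's documented behavior.

```python
def merge_results(*results: list[dict]) -> dict[tuple[str, str, str], dict]:
    """Merge edge lists from several extractors into one deduplicated edge map.

    Node labels are normalized (case and surrounding whitespace) so that
    "GraphBuilder" from code and "graphbuilder " from a PDF become one node.
    When the same edge is reported more than once, the higher-confidence copy wins.
    """
    merged: dict[tuple[str, str, str], dict] = {}
    for edges in results:
        for edge in edges:
            key = (
                edge["source"].strip().lower(),
                edge["relation"],
                edge["target"].strip().lower(),
            )
            if key not in merged or edge["confidence"] > merged[key]["confidence"]:
                merged[key] = edge
    return merged
```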

3-Pass Extraction Pipeline

01

AST Pass — no LLM needed

tree-sitter deterministically parses 25 languages and extracts classes, functions, imports, call graphs, docstrings, and rationale comments (# WHY:, # NOTE:, # HACK:). All structural edges are tagged EXTRACTED with confidence 1.0.
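The rationale-comment part of this pass can be approximated with Python's standard `tokenize` module (again a Python-only stand-in for tree-sitter). Tokenizing, rather than running a regex over raw text, means a string literal that happens to contain `# WHY:` is not mistaken for a comment. `extract_rationale` and its record fields are hypothetical.

```python
import io
import re
import tokenize

# Matches "# WHY: ...", "# NOTE: ...", "# HACK: ..." at the start of a comment token.
RATIONALE = re.compile(r"#\s*(WHY|NOTE|HACK):\s*(.*)")


def extract_rationale(source: str) -> list[dict]:
    """Return one record per rationale comment, tagged as a deterministic EXTRACTED fact."""
    records = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue  # strings, names, etc. are never treated as comments
        match = RATIONALE.match(tok.string)
        if match:
            records.append({
                "kind": match.group(1),
                "text": match.group(2).strip(),
                "line": tok.start[0],
                "tag": "EXTRACTED",
                "confidence": 1.0,
            })
    return records
```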

02

Media Transcription — local Whisper

Video and audio files are transcribed locally with faster-whisper, using a domain-aware prompt derived from the corpus's god nodes. Transcripts are SHA256-cached, so re-runs skip files that have already been transcribed. Audio never leaves your machine.
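The sketch below illustrates both ideas under stated assumptions: god-node labels are turned into a short vocabulary hint, and transcripts are memoized by the SHA256 of the audio bytes. `domain_prompt`, `transcribe_cached`, and the prompt wording are hypothetical; the `transcribe` argument stands in for whatever speech-to-text call is used.

```python
import hashlib
from typing import Callable


def domain_prompt(god_nodes: list[str], max_terms: int = 10) -> str:
    """Turn the corpus's most connected concept labels into a short vocabulary hint."""
    terms = list(dict.fromkeys(god_nodes))[:max_terms]  # drop duplicates, keep order
    return "Technical terms used in this recording: " + ", ".join(terms) + "."


def transcribe_cached(
    audio: bytes,
    cache: dict[str, str],
    transcribe: Callable[[bytes], str],
) -> str:
    """Run `transcribe` only if this exact audio has not been seen before."""
    key = hashlib.sha256(audio).hexdigest()
    if key not in cache:
        cache[key] = transcribe(audio)
    return cache[key]
```

With faster-whisper, the callable might wrap `WhisperModel.transcribe`, which accepts an `initial_prompt` argument for this kind of vocabulary hint; check the signature of your installed version. Keying on the audio bytes alone means a transcript is reused even if the prompt later changes; include the prompt in the hash if that matters.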

03

Parallel Semantic Extraction

Claude subagents run in parallel over docs, papers, and images to extract concepts, relationships, and design rationale. Every edge is tagged EXTRACTED (found directly), INFERRED (a reasonable inference, with a confidence score between 0 and 1), or AMBIGUOUS (the relationship is unclear).
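One way to enforce this tagging scheme is to validate it when an edge is created. In the sketch below the `Edge` class is hypothetical, and the rule that EXTRACTED edges always carry confidence 1.0 is an assumption generalized from the AST pass description, not a documented Graphify invariant.

```python
from dataclasses import dataclass

VALID_TAGS = {"EXTRACTED", "INFERRED", "AMBIGUOUS"}


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str
    tag: str
    confidence: float

    def __post_init__(self) -> None:
        if self.tag not in VALID_TAGS:
            raise ValueError(f"unknown tag {self.tag!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        # Assumption: an edge read directly from the source is fully certain.
        if self.tag == "EXTRACTED" and self.confidence != 1.0:
            raise ValueError("EXTRACTED edges must have confidence 1.0")
```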

04

Graph Build + Leiden Clustering

Results merge into a NetworkX graph. Leiden community detection clusters nodes by edge density — no embeddings or vector DB required. Outputs: graph.html (interactive vis.js), graph.json (queryable), GRAPH_REPORT.md (god nodes, surprises, questions).
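Leiden itself is too involved for a short sketch. The block below shows only the local-moving phase of the Louvain method, which Leiden builds on and then extends with a refinement step; it is a swapped-in simplification, not Graphify's clustering code. Each node repeatedly moves to the neighboring community with the largest modularity gain until nothing moves, using only edge structure, with no embeddings.

```python
from collections import Counter, defaultdict


def local_moving(edges: list[tuple[str, str]], max_passes: int = 10) -> dict[str, int]:
    """Greedy modularity optimization (Louvain phase 1) on a simple undirected graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    nodes = sorted(adj)
    degree = {n: len(adj[n]) for n in nodes}
    two_m = sum(degree.values())  # 2m: twice the number of distinct edges
    comm = {n: i for i, n in enumerate(nodes)}  # start with one community per node
    tot = {i: degree[n] for i, n in enumerate(nodes)}  # sum of degrees per community
    for _ in range(max_passes):
        moved = False
        for n in nodes:
            old = comm[n]
            tot[old] -= degree[n]  # take n out of its community
            links = Counter(comm[v] for v in adj[n])  # edges from n into each community
            # Modularity gain of joining community c, up to a constant factor:
            # k_in(c) - tot(c) * k_n / 2m. Rejoining `old` is the baseline.
            best = old
            best_gain = links.get(old, 0) - tot[old] * degree[n] / two_m
            for c, k_in in sorted(links.items()):
                gain = k_in - tot[c] * degree[n] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            tot[best] += degree[n]
            comm[n] = best
            moved = moved or best != old
        if not moved:
            break
    # Renumber communities 0, 1, 2, ... in order of first appearance.
    ids: dict[int, int] = {}
    return {n: ids.setdefault(comm[n], len(ids)) for n in nodes}
```

For two triangles joined by a single bridge edge, this assigns each triangle its own community. In practice you would call a library implementation; NetworkX, for example, ships `nx.community.louvain_communities`.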

Output Artifacts

graph.html

Interactive vis.js graph — click nodes, filter by community, search by label. Open in any browser.

GRAPH_REPORT.md

God nodes, surprising cross-file connections (ranked by score), and 4–5 questions the graph is uniquely positioned to answer.
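The report's scoring is not documented here, so the sketch below uses a hypothetical heuristic: god nodes are the highest-degree nodes, and a cross-community edge counts as more surprising the fewer other edges connect the same pair of communities. Function names are illustrative only.

```python
from collections import Counter


def god_nodes(edges: list[tuple[str, str]], top_k: int = 3) -> list[str]:
    """Return the top_k highest-degree nodes, breaking ties alphabetically."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree, key=lambda n: (-degree[n], n))[:top_k]


def surprising_edges(
    edges: list[tuple[str, str]], community: dict[str, int]
) -> list[tuple[str, str]]:
    """Cross-community edges, rarest community pair first (hypothetical score)."""
    def pair(e: tuple[str, str]) -> frozenset:
        return frozenset((community[e[0]], community[e[1]]))

    cross = [e for e in edges if community[e[0]] != community[e[1]]]
    pair_count = Counter(pair(e) for e in cross)
    return sorted(cross, key=lambda e: (pair_count[pair(e)], e))
```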

graph.json

Persistent queryable graph. Re-query weeks later via CLI (graphify query) or MCP server without re-reading source.
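Because graph.json is plain data, a question like "what connects X to Y?" can reduce to a graph search without touching source files. The sketch assumes a NetworkX-style node-link layout with `nodes` and `links` arrays, which may not match Graphify's actual schema, and uses breadth-first search to return the shortest chain of concepts. `load_graph` and `connection_path` are hypothetical names.

```python
import json
from collections import deque
from typing import Optional


def load_graph(text: str) -> dict[str, set[str]]:
    """Build an undirected adjacency map from node-link JSON."""
    data = json.loads(text)
    adj: dict[str, set[str]] = {node["id"]: set() for node in data["nodes"]}
    for link in data["links"]:
        adj[link["source"]].add(link["target"])
        adj[link["target"]].add(link["source"])
    return adj


def connection_path(adj: dict[str, set[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search: shortest chain of nodes from start to goal, or None."""
    if start not in adj or goal not in adj:
        return None
    parent: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parents back to start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):  # sorted for deterministic results
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
```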

Not practically useful for Data Engineering projects

Validated through direct testing: Graphify is not practically useful for Data Engineering projects where data mappings are the core concern. It tracks code-level lineage — class hierarchies, function call chains, import graphs — using AST analysis, but it has no understanding of data flow (sources → transformations → sinks). In DE workflows, what matters is how data moves and transforms across systems, not how functions call each other. For that, dedicated data-lineage tools such as OpenLineage, Marquez, or Unity Catalog Lineage are the right fit.
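A small experiment illustrates the gap. An AST call graph of the toy ETL function below (Python's `ast` module standing in for tree-sitter) records that `run` calls `read_table`, `dedupe`, and `write_table`. The datasets `raw.orders` and `clean.orders` appear only as string arguments, so no edge says that one table feeds the other. The ETL code, function names, and table names are invented for illustration.

```python
import ast

# Toy ETL job: data flows raw.orders -> dedupe -> clean.orders.
ETL = '''
def run():
    df = read_table("raw.orders")
    df = dedupe(df)
    write_table(df, "clean.orders")
'''


def call_edges(source: str) -> list[tuple[str, str]]:
    """Code-level lineage: (caller, callee) pairs, which is what an AST call graph sees."""
    edges = []
    for fn in ast.walk(ast.parse(source)):
        if isinstance(fn, ast.FunctionDef):
            for node in ast.walk(fn):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    edges.append((fn.name, node.func.id))
    return edges
```

Every edge connects two functions; the source-to-sink relationship between the two tables, which is what a data-lineage tool tracks, never appears.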