Kompyla
Autonomous research agent that builds and evolves a structured knowledge base using LLMs as a compiler.
Overview
Kompyla treats an LLM like a compiler: raw documents go in, structured wiki pages come out. It extends Andrej Karpathy’s LLM Knowledge Base pattern with an active retrieval layer that fetches from the web, arXiv, GitHub, RSS feeds, and YouTube automatically. A self-evolving feedback loop detects knowledge gaps, flags stale pages, and applies confidence scoring — while a full presentation pipeline exports to HTML, DOCX, PPTX, Marp slide decks, and charts.
Key Features
- Agentic retrieval — Searches web (Tavily), arXiv, GitHub, RSS, and YouTube transcripts; deduplicates with SHA-256 + MinHash LSH
- Incremental compilation — LLM transforms raw markdown into structured wiki pages; existing pages get a merge pass rather than overwrite
- Health & gap detection — Finds broken links, stale pages, orphans, and LLM-suggested missing topics
- Natural-language Q&A — Answers questions from the wiki with citations; saves answers as synthesis pages
- Multi-format export — HTML, Markdown bundle, DOCX, PPTX, Marp slides, PDF, and chart PNGs
- Streamlit web UI — Browse, Search, Ask, and Stats tabs; also runs as a scheduled background daemon
- Offline-capable — Works entirely locally via Ollama; Anthropic API is optional
Technical Decisions
The project uses Markdown files + SQLite as its storage substrate — plain files give portability and git-diffable history while SQLite adds queryable metadata without requiring a server. The LLM abstraction (LLMProvider ABC with OllamaProvider and AnthropicProvider) keeps the pipeline provider-agnostic. Incremental compilation (only touching the relevant wiki slice per run) was chosen deliberately over one-shot builds to keep large knowledge bases manageable and resume-friendly.