Jun 16, 2026

Runbook GraphRAG

An open-source, fully local graph-RAG engine for incident investigation. It turns runbooks and troubleshooting guides (TSGs) into a queryable knowledge graph that surfaces the right guide, the telemetry to query, and linked failures, with zero external services.

Open Source · Graph RAG · Python · Incident Response · RAG

The problem

During an incident, responders burn time hunting across dozens of runbooks and troubleshooting guides — and the ones that matter are often linked in ways plain keyword search can't see. An auth-throttling event shows up as payment timeouts; two guides are related because they query the same telemetry table, not because they share words. Flat similarity search misses exactly those cascade paths.

What I built

A small, hackable graph-RAG engine that turns a folder of runbooks into a queryable knowledge graph. Given an incident description, it returns the most relevant guides plus the graph-linked siblings flat search would miss — along with the exact telemetry queries to run and the documented mitigations, each attributed to its source runbook.

The reasoning layer is your own coding assistant; the engine does fast, explainable, local retrieval and graph traversal. It never calls an LLM and never touches the network.
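
To make the retrieval idea concrete, here is a minimal sketch of lexical scoring fused with one hop of graph expansion, using only the standard library. It is not the project's code: the runbook names, texts, table names, the `expand_weight` parameter, and the "take the larger score" fusion rule are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical corpus: runbook text plus the telemetry tables each one queries.
DOCS = {
    "auth-throttling": ("auth service throttling returns 429 errors", {"AuthRequests"}),
    "payment-timeouts": ("payment checkout timeouts and slow gateway",
                         {"AuthRequests", "PaymentLatency"}),
    "disk-pressure": ("node disk pressure and pod eviction", {"NodeMetrics"}),
}

def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokens = {name: tokenize(text) for name, (text, _) in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    doc_freq = Counter(term for t in tokens.values() for term in set(t))
    query_terms = set(tokenize(query))
    scores = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return scores

def hybrid_search(query, docs, expand_weight=0.5):
    """Return (runbook, (score, reason)) pairs, best first.

    Assumed fusion rule: a runbook sharing a telemetry table with a lexical
    hit inherits a discounted copy of that hit's score, unless its own
    score is already higher.
    """
    lexical = bm25_scores(query, docs)
    results = {name: (s, "lexical match") for name, s in lexical.items() if s > 0}
    for hit, hit_score in lexical.items():
        if hit_score <= 0:
            continue
        for other, (_, tables) in docs.items():
            shared = docs[hit][1] & tables
            if other == hit or not shared:
                continue
            boosted = expand_weight * hit_score
            if boosted > results.get(other, (0.0, ""))[0]:
                results[other] = (boosted, f"shares {sorted(shared)} with {hit}")
    return sorted(results.items(), key=lambda item: item[1][0], reverse=True)

for name, (score, reason) in hybrid_search("auth throttling 429", DOCS):
    print(f"{name}: {score:.2f} ({reason})")
```

In this toy corpus the query shares no words with the payment runbook, so lexical scoring alone gives it zero. It still appears in the results, with the shared table as the stated reason, because both runbooks query `AuthRequests`.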

How it works

  • Entity-aware ingestion — parses runbooks and extracts the nouns that connect them: services, symptoms, mitigations, and telemetry queries (KQL table/cluster/function extraction).
  • Typed knowledge graph — runbooks ↔ tables ↔ services ↔ symptoms, with shortest-path traversal (to reason about cascades) and "god-node" detection (recurring failure hotspots).
  • Hybrid retrieval — BM25 lexical scoring fused with graph expansion, so siblings sharing a telemetry table get surfaced and explained.
  • Assistant-ready — a CLI (query / explain / path / godnodes) and an optional MCP server expose the graph as tools.
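
The graph operations named above can be sketched in a few lines. This is an illustration under stated assumptions rather than the engine's implementation: the edges are invented, breadth-first search stands in for shortest-path traversal, and a "god node" is assumed here to mean any node whose degree meets a hypothetical `min_degree` threshold.

```python
from collections import defaultdict, deque

# Hypothetical typed edges between (node_type, name) pairs.
EDGES = [
    (("runbook", "auth-throttling"), ("table", "AuthRequests")),
    (("runbook", "auth-throttling"), ("service", "auth")),
    (("runbook", "payment-timeouts"), ("table", "AuthRequests")),
    (("runbook", "payment-timeouts"), ("symptom", "checkout latency")),
    (("runbook", "login-failures"), ("table", "AuthRequests")),
    (("runbook", "disk-pressure"), ("table", "NodeMetrics")),
]

def build_graph(edges):
    """Build an undirected adjacency map over typed nodes."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def god_nodes(graph, min_degree=3):
    """Nodes with at least min_degree neighbours, highest degree first."""
    hubs = [(node, len(nbrs)) for node, nbrs in graph.items() if len(nbrs) >= min_degree]
    return sorted(hubs, key=lambda item: item[1], reverse=True)

graph = build_graph(EDGES)
path = shortest_path(graph, ("service", "auth"), ("symptom", "checkout latency"))
print(" -> ".join(name for _, name in path))
print(god_nodes(graph))
```

The path from the auth service to the checkout-latency symptom runs through the shared `AuthRequests` table, which is the kind of cascade link that keyword search has no way to see. The same table is the only node that three runbooks touch, so it is the one reported as a hotspot.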

Design constraints

Constraint | Why
Standard library only | Runs in locked-down environments; no pip required
No external services | No vector DB, graph DB, cloud, or LLM API
No data egress | Runbooks never leave the machine
~600 lines, 6 modules | Easy to read, fork, and adapt to any team

Stack

Python (standard library) · BM25 · in-process knowledge graph · KQL parsing · SQLite (optional) · MCP (optional) · MIT-licensed.