I didn’t build this because I thought the world needed another RAG framework.
I built it because I didn’t trust the answers I was getting—and I didn’t trust my own understanding of why those answers existed.

Reading about knowledge graphs and retrieval-augmented generation is easy. Nodding along to architecture diagrams is easy. Believing that “this reduces hallucinations” is easy.
Understanding where trust actually comes from is not.
So I built KnowGraphRAG, not as a product, but as an experiment: What happens if you stop treating the LLM as the center of intelligence, and instead force it to speak only from a structure you can inspect?
Why Chunk-Based RAG Breaks Down in Real Work
Traditional RAG systems tend to look like this:
- Break documents into chunks
- Embed those chunks
- Retrieve “similar” chunks at query time
- Hand them to an LLM and hope it behaves
This works surprisingly well—until it doesn’t.
The failure modes show up fast when:
- you’re using smaller local models
- your data isn’t clean prose (logs, configs, dumps, CSVs)
- you care why an answer exists, not just what it says
Similarity search alone doesn’t understand structure, relationships, or provenance. Two chunks can be “similar” and still be misleading when taken together. And once the LLM starts bridging gaps on its own, hallucinations creep in—especially on constrained hardware.
I wasn’t interested in making the model smarter.
I was interested in making it more constrained.
Flipping the Model: The Graph Comes First
The key architectural shift in KnowGraphRAG is simple to state and hard to internalize:
The knowledge graph is the system of record.
The LLM is just a renderer.
Under the hood, ingestion looks roughly like this:
- Documents are ingested whole, regardless of format
  - PDFs, DOCX, CSV, JSON, XML, network configs, logs
- They are chunked, but chunks are not treated as isolated facts
- Entities are extracted (IPs, orgs, people, hosts, dates, etc.)
- Relationships are created
  - document → chunk
  - chunk → chunk (sequence)
  - document → entity
  - entity → entity (when relationships can be inferred)
- Everything is stored in a graph, not a vector index
Embeddings still exist—but they’re just one signal, not the organizing principle.
The result is a graph where:
- documents know what they contain
- chunks know where they came from
- entities know who mentions them
- relationships are explicit, not inferred on the fly
That structure turns out to matter a lot.
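The ingestion flow above can be sketched with a minimal in-memory graph. This is an illustration under my own assumptions — the class, the `CONTAINS`/`NEXT`/`MENTIONS` relation names, and the toy IP-only entity extractor are mine, not KnowGraphRAG's actual API:

```python
# Sketch of graph-first ingestion: documents, chunks, and entities become
# nodes; provenance becomes explicit edges. Names are hypothetical.
import re
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}                # node_id -> {"type": ..., "text": ...}
        self.edges = defaultdict(set)  # node_id -> {(relation, target_id), ...}

    def add_node(self, node_id, node_type, text=""):
        self.nodes[node_id] = {"type": node_type, "text": text}

    def add_edge(self, src, relation, dst):
        self.edges[src].add((relation, dst))

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")  # toy entity extractor: IPs only

def ingest(graph, doc_id, text, chunk_size=80):
    graph.add_node(doc_id, "document")
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    prev = None
    for n, chunk in enumerate(chunks):
        cid = f"{doc_id}:chunk{n}"
        graph.add_node(cid, "chunk", chunk)
        graph.add_edge(doc_id, "CONTAINS", cid)      # document -> chunk
        if prev:
            graph.add_edge(prev, "NEXT", cid)        # chunk -> chunk (sequence)
        prev = cid
        for ip in IP_RE.findall(chunk):
            graph.add_node(ip, "entity")
            graph.add_edge(doc_id, "MENTIONS", ip)   # document -> entity
            graph.add_edge(cid, "MENTIONS", ip)      # chunk -> entity

g = KnowledgeGraph()
ingest(g, "fw-log", "Blocked traffic from 10.0.0.5 to 192.168.1.9 at 03:14.")
```

Even at this toy scale, the point holds: every fact has an address, and every connection is an edge you stored on purpose rather than a similarity the system guessed at query time.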
What “Retrieval” Means in a Graph-Based RAG
When you ask a question, KnowGraphRAG doesn’t just do “top-k similarity search.”
Instead, it roughly follows this flow:
- Extract entities from the query
  - Not embeddings yet—actual concepts
- Anchor the search in the graph
  - Find documents, chunks, and entities already connected
- Traverse outward
  - Follow relationships to build a connected subgraph
- Use embeddings to rank, not invent
  - Similarity helps order candidates, not define truth
- Expand context deliberately
  - Adjacent chunks, related entities, structural neighbors
Only after that context is assembled does the LLM get involved.
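The anchor-then-traverse-then-rank flow can be sketched as a few small functions over an adjacency map. The graph contents, function names, and the trivial scoring lambda are my illustration, not the project's implementation:

```python
# Sketch of graph-first retrieval: anchor on query entities, breadth-first
# expand the neighborhood, then let a (toy) similarity score ORDER the
# candidates -- it never adds new ones. Structures are hypothetical.
from collections import deque

# adjacency: node -> neighbors (documents, chunks, entities all live here)
GRAPH = {
    "10.0.0.5": ["fw-log", "asset-db"],
    "fw-log": ["fw-log:chunk0", "10.0.0.5"],
    "asset-db": ["asset-db:chunk2", "10.0.0.5"],
    "fw-log:chunk0": [],
    "asset-db:chunk2": [],
}

def anchor(query_entities):
    """Steps 1-2: keep only entities that already exist in the graph."""
    return [e for e in query_entities if e in GRAPH]

def traverse(seeds, max_hops=2):
    """Step 3: breadth-first expansion to a connected subgraph."""
    seen, frontier = set(seeds), deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nbr in GRAPH.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

def rank(candidates, score):
    """Step 4: similarity orders candidates; it cannot invent them."""
    return sorted(candidates, key=score, reverse=True)

subgraph = traverse(anchor(["10.0.0.5"]))
ordered = rank(subgraph, score=lambda n: 1.0 if "chunk" in n else 0.0)
```

The design choice worth noticing: an entity the graph has never seen anchors nothing, so the system returns "I don't know" territory instead of the nearest-sounding chunk.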
And when it does, it gets a very constrained prompt:
- Here is the context
- Here are the citations
- Do not answer outside of this
This is how hallucinations get contained—not eliminated, but starved.
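A constrained prompt of that shape might be assembled like this — the template wording and field names are my sketch, not the project's actual prompt:

```python
# Sketch of the constrained prompt: the system assembles context and
# citations; the model is told to render, not to reason past the evidence.
def build_prompt(question, passages):
    # Each passage carries its graph provenance so the model can cite it.
    context = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}" for i, p in enumerate(passages, 1)
    )
    return (
        "Answer using ONLY the numbered context below.\n"
        "Cite passages as [n]. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "Which host was blocked?",
    [{"source": "fw-log:chunk0", "text": "Blocked traffic from 10.0.0.5."}],
)
```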
Why This Works Especially Well with Local LLMs
One of my hard constraints was that this needed to run locally—slowly if necessary—on limited hardware. Even something like a Raspberry Pi.
That constraint forced an architectural honesty check.
Small, non-reasoning models are actually very good at:
- summarizing known facts
- rephrasing structured input
- correlating already-adjacent information
They are terrible at inventing missing links responsibly.
By moving correlation, traversal, and selection into the graph layer, the LLM no longer has to “figure things out.” It just has to talk.
That shift made local models dramatically more useful—and far more predictable.
The Part I Didn’t Expect: Auditability Becomes the Feature
The biggest surprise wasn’t retrieval quality.
It was auditability.
Because every answer is derived from:
- specific graph nodes
- specific relationships
- specific documents and chunks
…it becomes possible to see how an answer was constructed even when the model itself doesn’t expose reasoning.
That turns out to be incredibly valuable for:
- compliance work
- risk analysis
- explaining decisions to humans who don’t care about embeddings
Instead of saying “the model thinks,” you can say:
- these entities were involved
- these documents contributed
- this is the retrieval path
That’s not explainable AI in the academic sense—but it’s operationally defensible.
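An audit record for one answer might look like this. The field names and the entity/chunk split are illustrative assumptions, not KnowGraphRAG's actual schema:

```python
# Sketch of an audit record: every answer carries the nodes, documents, and
# retrieval path that produced it, so "why does this answer exist" has a
# concrete reply. Field names are hypothetical.
import datetime

def audit_record(question, answer, subgraph_nodes, retrieval_path):
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "question": question,
        "answer": answer,
        # toy convention: chunk ids contain a ":", entity ids do not
        "entities": [n for n in subgraph_nodes if ":" not in n],
        "chunks": [n for n in subgraph_nodes if ":" in n],
        "retrieval_path": retrieval_path,  # ordered edges the traversal walked
    }

rec = audit_record(
    "Which host was blocked?",
    "10.0.0.5, per the firewall log.",
    ["10.0.0.5", "fw-log:chunk0"],
    ["10.0.0.5 -> fw-log", "fw-log -> fw-log:chunk0"],
)
```

A record like this is what you hand to the compliance reviewer instead of a shrug about model internals.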
What KnowGraphRAG Actually Is (and Isn’t)
KnowGraphRAG ended up being a full system, not a demo:
- Graph-backed storage (in-memory + persistent)
- Entity and relationship extraction
- Hybrid retrieval (graph-first, embeddings second)
- Document versioning and change tracking
- Query history and audit trails
- Batch ingestion with guardrails
- Visualization so you can see the graph
- Support for local and remote LLM backends
- An MCP interface so other tools can drive it
But it’s not a silver bullet.
It won’t magically make bad data good.
It won’t remove all hallucinations.
It won’t replace judgment.
What it does do is move responsibility out of the model and back into the system you control.
The Mindset Shift That Matters
If there’s one lesson I’d pass on, it’s this:
Don’t ask LLMs to be trustworthy.
Architect systems where trust is unavoidable.
Knowledge graphs and RAG aren’t a panacea—but together, they create boundaries. And boundaries are what make local LLMs useful for serious work.
I didn’t fully understand that until I built it.
And now that I have, I don’t think I could go back.
Support My Work
Support the creation of high-impact content and research. Sponsorship opportunities are available for specific topics, whitepapers, tools, or advisory insights. Learn more or contribute here: Buy Me A Coffee
Shout-out to my friend and brother, Riangelo, for talking with me about the approach and for helping me make sense of it. He is building an enterprise version with much more capability.