CogniVault Backend Explained, Part 1 · Meet the Backend: Three Processes, Four Layers

All abbreviations are fully explained in the appendix at the bottom of the page.

When people first open the CogniVault repository, the question I hear most is some version of: “Where do I even start?” There’s a RAG agent, a FAISS index, a DBOS workflow, an Ollama host — and if you’re transitioning into tech, every one of those words is a closed door.

This series opens the doors one at a time. No prior RAG knowledge assumed, every abbreviation spelled out, and every claim checkable against the source code. If you’ve already read my architecture deep dives, think of this series as the guided tour that should have come first.

Let’s map this out.

The whole app is three processes

CogniVault lets you chat with your own documents and turn them into quizzes, workshops, flashcards, and mindmaps — and nothing ever leaves your machine. (The why behind that constraint is its own story: Why I Built a Local-First RAG.)

You might expect an app like that to be a sprawl of microservices. It’s three processes:

Process	What it does
The Python backend	One FastAPI app on port 8000 — it also serves the compiled React frontend as static files
Ollama	The local model server on port 11434, running the AI models
PostgreSQL	One Docker container, used only for workflow checkpoints — never for your documents

Everything else — your files, the search index, your chat history, your quiz scores — is a plain file on disk. That’s not laziness; it’s the privacy argument made physical. You can open every byte the app stores with a text editor and a SQLite browser.

The four layers

Before we name technologies, here’s the mental model I want you to keep for the whole series. The backend is four layers, top to bottom:

Layer 1 — the web layer. A FastAPI application receives every HTTP request and routes it to one of six routers: chat (/rag), knowledge management (/upload, /ingest), study tools (/api/study/*), progress (/api/progress/*), voice (/api/transcribe), and chat history (/api/history). FastAPI (a modern Python web framework) also auto-generates interactive API documentation at /api/docs, which is the best way to explore the backend without reading a line of code.

Layer 2 — the intelligence layer. Two AI models with two different jobs. gemma4:e4b generates: chat answers, reasoning, image analysis, and tool calls. embeddinggemma embeds: it turns text into vectors (lists of numbers that capture meaning) so similar ideas can be found mathematically. Both run inside Ollama — think of Ollama as Docker, but for AI models.

Layer 3 — the retrieval layer. A search engine over your documents that combines semantic search (find things that mean the same) with keyword search (find the exact string). Part 3 of this series is entirely about this layer.

Layer 4 — the persistence layer. Four storage systems, each picked for one job: a FAISS index plus a JSON file for searchable knowledge, SQLite for study data, PostgreSQL for workflow checkpoints, and plain JSON files for chat history.

One diagram, every major piece

flowchart TB subgraph CLIENT["Browser"] UI["React Frontend
(compiled, served by FastAPI)"] end subgraph SERVER["FastAPI Backend — port 8000"] ROUTERS["6 Routers
rag · knowledge · study ·
progress · audio · history"] AGENT["RAG Agent
(Strands SDK, 6 tools)"] VDB["VectorDB
FAISS + BM25 + RRF"] INGEST["Ingestion
(DBOS durable workflow)"] GEN["Study generators
quiz · workshop · cards · mindmap"] PROG["Progress tracker
+ 25 achievements"] end subgraph OLLAMA["Ollama — port 11434"] GEMMA["gemma4:e4b
chat · thinking · vision · tools"] EMBED["embeddinggemma
text to vectors"] end subgraph STORAGE["Local storage"] FAISSF["vector_store.faiss + .json"] SQLITE["progress.db (SQLite)"] PG["PostgreSQL
workflow state only"] DOCS["docs/ folder + chat_history.json"] end UI --> ROUTERS ROUTERS --> AGENT --> VDB AGENT --> GEMMA VDB --> EMBED ROUTERS --> INGEST --> EMBED INGEST --> PG INGEST --> FAISSF VDB --- FAISSF ROUTERS --> GEN --> GEMMA GEN --> SQLITE ROUTERS --> PROG --> SQLITE ROUTERS --> DOCS

Keep this picture handy — Parts 2, 3, and 4 each zoom into one region of it.

The tech stack, and why each piece earned its place

The full dependency list lives in requirements.txt. Here’s what matters, grouped by job:

Serving requests. FastAPI defines the endpoints and validates every request and response with Pydantic (a data-validation library — think of it as a strict customs officer for JSON). Uvicorn is the ASGI server (Asynchronous Server Gateway Interface — the Python standard that lets one process juggle many simultaneous requests) that actually runs it.

Thinking. Ollama serves gemma4:e4b — the e4b tag is the roughly four-billion effective-parameter variant, about a 9.6 GB download — and embeddinggemma (about 622 MB). The agent behaviour is built with the Strands Agents SDK, which wraps the model in a loop where it can call tools, read the results, and only then answer. (Where I run Ollama relative to Docker is a deliberate choice with a story behind it: Why We Keep Ollama Out of Docker.)

Finding things. FAISS (Facebook AI Similarity Search — Meta’s vector search library) handles semantic lookups; rank-bm25 handles keyword lookups; a formula called Reciprocal Rank Fusion merges the two. Part 3 unpacks all of this.

Reading documents. pypdf for PDFs, with an OCR fallback (Optical Character Recognition — turning pictures of text into actual text) for scanned pages via pymupdf and Tesseract. Word, PowerPoint, and Excel each get their own extractor. trafilatura pulls clean article text out of web pages.

Not losing work. DBOS makes the ingestion pipeline durable — every step is checkpointed in PostgreSQL so a crash resumes instead of restarting. Part 2 shows this in action.

Remembering. SQLite — a complete database engine that lives in a single file, progress.db — holds your study sessions, achievements, quizzes, workshops, flashcard decks, and mindmaps.

Appendix: Abbreviations in this post

This series’ promise is “no unexplained abbreviations,” so here is the table I wish every technical tutorial shipped with.

Abbreviation	Full form	Plain-English meaning
LLM	Large Language Model	A neural network trained on huge amounts of text that can read and generate language
RAG	Retrieval-Augmented Generation	Fetch relevant passages from your documents first, then let the model answer from them — instead of from its training memory
API	Application Programming Interface	The set of URLs the frontend calls to talk to the backend
ASGI	Asynchronous Server Gateway Interface	The Python standard that lets the server handle many requests concurrently
JSON	JavaScript Object Notation	The universal text format for structured data
NDJSON	Newline-Delimited JSON	A stream where each line is its own JSON object — ideal for streaming AI answers chunk by chunk
FAISS	Facebook AI Similarity Search	Meta’s library for storing vectors and finding the most similar ones fast
BM25	Best Match 25	A classic keyword-ranking formula — the 25th ranking function developed in the Okapi information-retrieval system
RRF	Reciprocal Rank Fusion	A formula for merging multiple ranked result lists using only the ranks
ANN	Approximate Nearest Neighbour	A speed shortcut many vector databases take. CogniVault deliberately uses an exact index instead — precise, and plenty fast at personal-library scale
DBOS	Database-Oriented Operating System (the research project it grew from)	A library that checkpoints workflow steps in a database so crashed jobs resume
SQL / SQLite	Structured Query Language / SQLite	The language of relational databases / a tiny database that lives in one file
OCR	Optical Character Recognition	Turning pictures of text (scans) into machine-readable text
SHA-256	Secure Hash Algorithm, 256-bit	A fingerprint function — any file maps to a unique hash, used to detect changed files
CORS	Cross-Origin Resource Sharing	Browser rules controlling which websites may call the API
SSRF	Server-Side Request Forgery	An attack where a server is tricked into fetching internal URLs — the URL-import endpoint guards against it
MCQ	Multiple-Choice Question	One of the two quiz question types
KB	Knowledge Base	All your ingested, searchable documents

(Every claim in this series can be checked directly against the CogniVault source code — the relevant file is named whenever it matters, and the repository README maps the full architecture.)

The takeaway

Strip away the abbreviations and CogniVault is a small system: one web server, one model runtime, one durability database, and a handful of files. The sophistication isn’t in the part count — it’s in how a few well-chosen pieces cooperate. That cooperation is what the next three parts are about.

Next up: Part 2 · From File to Searchable Knowledge — how a 1,000-page scanned PDF becomes something the AI can search in seconds, and why the pipeline survives a crash at page 800.

No results found