CogniVault Backend Explained, Part 1 · Meet the Backend: Three Processes, Four Layers
All abbreviations are fully explained in the appendix near the end of this post.
When people first open the CogniVault repository, the question I hear most is some version of: “Where do I even start?” There’s a RAG agent, a FAISS index, a DBOS workflow, an Ollama host — and if you’re transitioning into tech, every one of those words is a closed door.
This series opens the doors one at a time. No prior RAG knowledge assumed, every abbreviation spelled out, and every claim checkable against the source code. If you’ve already read my architecture deep dives, think of this series as the guided tour that should have come first.
Let’s map this out.
The whole app is three processes
CogniVault lets you chat with your own documents and turn them into quizzes, workshops, flashcards, and mindmaps — and nothing ever leaves your machine. (The why behind that constraint is its own story: Why I Built a Local-First RAG.)
You might expect an app like that to be a sprawl of microservices. It’s three processes:
| Process | What it does |
|---|---|
| The Python backend | One FastAPI app on port 8000 — it also serves the compiled React frontend as static files |
| Ollama | The local model server on port 11434, running the AI models |
| PostgreSQL | One Docker container, used only for workflow checkpoints — never for your documents |
Everything else — your files, the search index, your chat history, your quiz scores — is a plain file on disk. That’s not laziness; it’s the privacy argument made physical. You can open every byte the app stores with a text editor and a SQLite browser.
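If you want to see the three processes for yourself, a quick port check is enough. The sketch below uses only the standard library. Ports 8000 and 11434 come from the table above; 5432 is PostgreSQL's default port and is my assumption here, since your Docker setup may map a different one. The function names are illustrative and not taken from the CogniVault source.

```python
import socket

# 8000 and 11434 are stated in the table above. 5432 is PostgreSQL's
# default port and is an assumption; check your own Docker port mapping.
PROCESSES = {
    "FastAPI backend": 8000,
    "Ollama": 11434,
    "PostgreSQL (assumed default port)": 5432,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report() -> dict:
    """Map each process name to whether its port is currently reachable."""
    return {name: is_listening(port) for name, port in PROCESSES.items()}
```

Calling `report()` while the app is running should show all three entries as `True`; any `False` tells you which process to start first.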
The four layers
Before we name technologies, here’s the mental model I want you to keep for the whole series. The backend is four layers, top to bottom:
Layer 1 — the web layer. A FastAPI application receives every HTTP request and routes it to one of six routers: chat (/rag), knowledge management (/upload, /ingest), study tools (/api/study/*), progress (/api/progress/*), voice (/api/transcribe), and chat history (/api/history). FastAPI (a modern Python web framework) also auto-generates interactive API documentation at /api/docs, which is the best way to explore the backend without reading a line of code.
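To make "routes it to one of six routers" concrete, here is a minimal, standard-library sketch of prefix-based routing. The prefixes come from the paragraph above and the router labels follow the diagram further down. The `ROUTES` list and `resolve_router` function are hypothetical names for illustration. FastAPI does this work for you when routers are attached with `include_router`, so treat this as a picture of the idea rather than the project's code.

```python
from typing import Optional

# (path prefix, router label). More specific /api/... prefixes are listed
# first so they win over shorter ones. Labels are illustrative.
ROUTES = [
    ("/api/study/", "study"),
    ("/api/progress/", "progress"),
    ("/api/transcribe", "audio"),
    ("/api/history", "history"),
    ("/upload", "knowledge"),
    ("/ingest", "knowledge"),
    ("/rag", "rag"),
]


def resolve_router(path: str) -> Optional[str]:
    """Return the label of the first router whose prefix matches the path."""
    for prefix, router in ROUTES:
        if path.startswith(prefix):
            return router
    return None  # a real web framework would answer 404 here
```

For example, `resolve_router("/api/study/quiz")` returns `"study"`, and an unknown path returns `None`.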
Layer 2 — the intelligence layer. Two AI models with two different jobs. gemma4:e4b generates: chat answers, reasoning, image analysis, and tool calls. embeddinggemma embeds: it turns text into vectors (lists of numbers that capture meaning) so similar ideas can be found mathematically. Both run inside Ollama — think of Ollama as Docker, but for AI models.
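"Found mathematically" usually means comparing the angle between two vectors. Here is a small sketch of cosine similarity, one common way to do that. The three-number vectors are made up for illustration; real embedding vectors are far longer, and I am not claiming this is the exact metric CogniVault configures.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" with invented values, purely for illustration.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # close to 1: similar meaning
print(cosine_similarity(cat, invoice))  # close to 0: unrelated meaning
```

The point to take away: once text is a vector, "similar ideas" becomes a number you can sort by.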
Layer 3 — the retrieval layer. A search engine over your documents that combines semantic search (find things that mean the same) with keyword search (find the exact string). Part 3 of this series is entirely about this layer.
Layer 4 — the persistence layer. Four storage systems, each picked for one job: a FAISS index plus a JSON file for searchable knowledge, SQLite for study data, PostgreSQL for workflow checkpoints, and plain JSON files for chat history.
One diagram, every major piece
Components, grouped by region:

- UI: the frontend (compiled, served by FastAPI)
- FastAPI Backend (port 8000): 6 Routers (rag · knowledge · study · progress · audio · history); RAG Agent (Strands SDK, 6 tools); VectorDB (FAISS + BM25 + RRF); Ingestion (DBOS durable workflow); Study generators (quiz · workshop · cards · mindmap); Progress tracker (+ 25 achievements)
- Ollama (port 11434): gemma4:e4b (chat · thinking · vision · tools); embeddinggemma (text to vectors)
- Local storage: vector_store.faiss + .json; progress.db (SQLite); PostgreSQL (workflow state only); docs/ folder + chat_history.json

Connections:

- UI → Routers
- Routers → RAG Agent → VectorDB
- RAG Agent → gemma4:e4b
- VectorDB → embeddinggemma
- Routers → Ingestion → embeddinggemma
- Ingestion → PostgreSQL
- Ingestion → vector_store.faiss + .json
- VectorDB is linked to vector_store.faiss + .json
- Routers → Study generators → gemma4:e4b
- Study generators → progress.db (SQLite)
- Routers → Progress tracker → progress.db (SQLite)
- Routers → docs/ folder + chat_history.json
Keep this picture handy — Parts 2, 3, and 4 each zoom into one region of it.
The tech stack, and why each piece earned its place
The full dependency list lives in requirements.txt. Here’s what matters, grouped by job:
Serving requests. FastAPI defines the endpoints and validates every request and response with Pydantic (a data-validation library — think of it as a strict customs officer for JSON). Uvicorn is the ASGI server (Asynchronous Server Gateway Interface — the Python standard that lets one process juggle many simultaneous requests) that actually runs it.
Thinking. Ollama serves gemma4:e4b — the e4b tag is the roughly four-billion effective-parameter variant, about a 9.6 GB download — and embeddinggemma (about 622 MB). The agent behaviour is built with the Strands Agents SDK, which wraps the model in a loop where it can call tools, read the results, and only then answer. (Where I run Ollama relative to Docker is a deliberate choice with a story behind it: Why We Keep Ollama Out of Docker.)
Finding things. FAISS (Facebook AI Similarity Search — Meta’s vector search library) handles semantic lookups; rank-bm25 handles keyword lookups; a formula called Reciprocal Rank Fusion merges the two. Part 3 unpacks all of this.
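Reciprocal Rank Fusion is short enough to sketch right away. Each result earns 1 / (k + rank) from every list it appears in, and the sums decide the final order. In the sketch below, k = 60 is the value commonly used in the RRF literature and is an assumption on my part, not a setting confirmed from the CogniVault source. The chunk IDs are invented.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Merge ranked lists of IDs using only their ranks (1 = best)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["c2", "c7", "c4"]  # hypothetical semantic-search ranking
keyword_hits = ["c7", "c9", "c2"]   # hypothetical keyword-search ranking

print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))
# → ['c7', 'c2', 'c9', 'c4']
```

Notice that `c7` wins even though it was only second in the semantic list: appearing near the top of both lists beats topping just one. The raw similarity and keyword scores never enter the formula, which is why two very different search methods can be merged without rescaling anything.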
Reading documents. pypdf for PDFs, with an OCR fallback (Optical Character Recognition — turning pictures of text into actual text) for scanned pages via pymupdf and Tesseract. Word, PowerPoint, and Excel each get their own extractor. trafilatura pulls clean article text out of web pages.
Not losing work. DBOS makes the ingestion pipeline durable — every step is checkpointed in PostgreSQL so a crash resumes instead of restarting. Part 2 shows this in action.
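The idea behind "a crash resumes instead of restarting" fits in a few lines. This is a stand-in sketch, not the DBOS API: it uses SQLite from the standard library instead of PostgreSQL so it runs anywhere, and the step names (`extract`, `chunk`, `embed`) and every function name are hypothetical.

```python
import sqlite3

STEPS = ("extract", "chunk", "embed")  # hypothetical step names
calls = []  # records which steps actually did work


def open_checkpoints(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the table that remembers finished steps."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "workflow_id TEXT, step TEXT, output TEXT, "
        "PRIMARY KEY (workflow_id, step))"
    )
    return conn


def run_step(conn, workflow_id, step, fn):
    """Run fn at most once per (workflow, step); replay its stored output later."""
    row = conn.execute(
        "SELECT output FROM checkpoints WHERE workflow_id = ? AND step = ?",
        (workflow_id, step),
    ).fetchone()
    if row is not None:
        return row[0]  # finished before the crash, so skip the work
    output = fn()
    conn.execute("INSERT INTO checkpoints VALUES (?, ?, ?)", (workflow_id, step, output))
    conn.commit()
    return output


def ingest(conn, workflow_id, fail_at=None):
    """Run all steps in order; fail_at simulates a crash inside one step."""
    results = []
    for name in STEPS:
        def work(name=name):
            if name == fail_at:
                raise RuntimeError(f"simulated crash during {name}")
            calls.append(name)
            return f"{name} done"
        results.append(run_step(conn, workflow_id, name, work))
    return results
```

Run `ingest(conn, "doc-1", fail_at="chunk")` and it raises after `extract` has been checkpointed. Run `ingest(conn, "doc-1")` again with the same connection and `extract` is replayed from the table while only `chunk` and `embed` do real work.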
Remembering. SQLite — a complete database engine that lives in a single file, progress.db — holds your study sessions, achievements, quizzes, workshops, flashcard decks, and mindmaps.
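"A complete database engine in a single file" is easy to try with Python's built-in `sqlite3` module. The table and column names below are hypothetical, chosen for illustration. The real schema of progress.db lives in the source code.

```python
import sqlite3
from contextlib import closing


def record_session(db_path: str, tool: str, score: float) -> None:
    """Append one study session; the file and table are created on first use."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS study_sessions ("
            "id INTEGER PRIMARY KEY, tool TEXT NOT NULL, score REAL NOT NULL)"
        )
        conn.execute(
            "INSERT INTO study_sessions (tool, score) VALUES (?, ?)",
            (tool, score),
        )
        conn.commit()


def average_score(db_path: str, tool: str):
    """Average score for one tool, or None if it has no sessions yet."""
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT AVG(score) FROM study_sessions WHERE tool = ?", (tool,)
        ).fetchone()
    return row[0]
```

There is no server to start and no port to open. Every call opens the file, runs plain SQL, and closes it again, which is why a SQLite browser can read the same file directly.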
Appendix: Abbreviations in this post
This series’ promise is “no unexplained abbreviations,” so here is the table I wish every technical tutorial shipped with.
| Abbreviation | Full form | Plain-English meaning |
|---|---|---|
| LLM | Large Language Model | A neural network trained on huge amounts of text that can read and generate language |
| RAG | Retrieval-Augmented Generation | Fetch relevant passages from your documents first, then let the model answer from them — instead of from its training memory |
| API | Application Programming Interface | The set of URLs the frontend calls to talk to the backend |
| ASGI | Asynchronous Server Gateway Interface | The Python standard that lets the server handle many requests concurrently |
| JSON | JavaScript Object Notation | The universal text format for structured data |
| NDJSON | Newline-Delimited JSON | A stream where each line is its own JSON object — ideal for streaming AI answers chunk by chunk |
| FAISS | Facebook AI Similarity Search | Meta’s library for storing vectors and finding the most similar ones fast |
| BM25 | Best Matching 25 | A classic keyword-ranking formula, named as the 25th in the series of ranking functions developed for the Okapi information-retrieval system |
| RRF | Reciprocal Rank Fusion | A formula for merging multiple ranked result lists using only the ranks |
| ANN | Approximate Nearest Neighbour | A speed shortcut many vector databases take. CogniVault deliberately uses an exact index instead — precise, and plenty fast at personal-library scale |
| DBOS | Database-Oriented Operating System (the research project it grew from) | A library that checkpoints workflow steps in a database so crashed jobs resume |
| SQL / SQLite | Structured Query Language / SQLite | The language of relational databases / a tiny database that lives in one file |
| OCR | Optical Character Recognition | Turning pictures of text (scans) into machine-readable text |
| SHA-256 | Secure Hash Algorithm, 256-bit | A fingerprint function: every file maps to a fixed 256-bit hash that is unique for all practical purposes, used to detect changed files |
| CORS | Cross-Origin Resource Sharing | Browser rules controlling which websites may call the API |
| SSRF | Server-Side Request Forgery | An attack where a server is tricked into fetching internal URLs — the URL-import endpoint guards against it |
| MCQ | Multiple-Choice Question | One of the two quiz question types |
| KB | Knowledge Base | All your ingested, searchable documents |
(Every claim in this series can be checked directly against the CogniVault source code — the relevant file is named whenever it matters, and the repository README maps the full architecture.)
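The SHA-256 row in the table describes detecting changed files by fingerprint. A minimal sketch of that idea with the standard `hashlib` module follows. The function names and the in-memory `known_hashes` dictionary are assumptions for illustration, not how CogniVault stores its hashes.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large files stay cheap."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def has_changed(path: str, known_hashes: dict) -> bool:
    """True if the file is new or its content differs from the last check."""
    current = file_sha256(path)
    changed = known_hashes.get(path) != current
    known_hashes[path] = current  # remember the fingerprint for next time
    return changed
```

The first call for a file returns `True`, a second call on the untouched file returns `False`, and any edit to the content flips it back to `True`. Renaming the file without editing it changes the dictionary key but not the hash.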
The takeaway
Strip away the abbreviations and CogniVault is a small system: one web server, one model runtime, one durability database, and a handful of files. The sophistication isn’t in the part count — it’s in how a few well-chosen pieces cooperate. That cooperation is what the next three parts are about.
Next up: Part 2 · From File to Searchable Knowledge — how a 1,000-page scanned PDF becomes something the AI can search in seconds, and why the pipeline survives a crash at page 800.
