CogniVault Backend Explained, Part 4 · Study Tools, Progress, and the Privacy Receipts

Jun 12, 2026 · Ndimofor Aretas · 7 min read

All abbreviations are fully explained in the appendix at the bottom of the page.

In Part 3 we followed a question through hybrid retrieval and the agent loop to a cited answer. In this final part, the same machinery gets pointed at a different goal: teaching you — and then we close the series by auditing the project’s central promise: nothing leaves your machine.

One recipe, four study tools

CogniVault generates quizzes, multi-lesson workshops, flashcard decks, and mindmaps from your documents. Four different outputs — but under the hood, one shared five-step recipe:

  1. Retrieve. The same hybrid search from Part 3, but instead of your question, the probe is a broad query like “key concepts, definitions, important facts, main ideas”, scoped to the documents you selected. Up to 15 representative chunks come back.

  2. Prompt from a template. The instructions sent to Gemma are not buried in Python — they’re editable Markdown files in backend/prompts/ (quiz.md, flashcards.md, and so on). Drop a modified copy into backend/prompts/custom/ and it overrides the shipped version on the very next request. No restart, no code change. Prompt engineering as configuration.

  3. Constrain the output. Asking a small local model to “please return JSON” works most of the time, and most of the time is a production bug. CogniVault uses Ollama’s grammar-constrained generation (format="json"), which makes syntactically invalid JSON impossible rather than unlikely, plus low temperature for consistency. The grammar guarantees well-formed JSON, not the right fields, which is why step 4 exists. The full saga of getting reliable structure out of a 4-billion-parameter model is in Getting Reliable JSON Out of a Local LLM.

  4. Validate defensively. Every generated item is checked field by field, and malformed items are dropped rather than failing the whole batch. Small models occasionally fumble one question out of ten; a product shouldn’t collapse because of it.

  5. Persist. Everything lands in SQLite, so quizzes are resumable, workshop progress survives restarts, and flashcard statuses are remembered per deck.
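Step 4 is the easiest one to picture in code. The following is a minimal sketch, not CogniVault's actual validator: the field names (question, options, answer), the top-level "questions" key, and both function names are assumptions made for illustration.

```python
import json
from typing import Any


def is_valid_question(item: Any) -> bool:
    """Check one generated quiz item field by field (hypothetical schema)."""
    if not isinstance(item, dict):
        return False
    question = item.get("question")
    options = item.get("options")
    answer = item.get("answer")
    if not isinstance(question, str) or not question.strip():
        return False
    if not isinstance(options, list) or len(options) < 2:
        return False
    if not all(isinstance(o, str) and o.strip() for o in options):
        return False
    # The answer must be one of the offered options.
    return isinstance(answer, str) and answer in options


def keep_valid_questions(raw_json: str) -> list[dict]:
    """Parse model output and drop malformed items instead of failing the batch."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError:
        return []
    items = payload.get("questions", []) if isinstance(payload, dict) else []
    if not isinstance(items, list):
        return []
    return [item for item in items if is_valid_question(item)]
```

With this shape, one fumbled question costs one question: a batch of ten with a single bad item still yields nine usable ones.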

Here’s the recipe in motion for a quiz:

%%{init: {'sequence': {'actorFontSize': 28, 'messageFontSize': 24, 'loopTextFontSize': 22, 'noteFontSize': 22}}}%%
sequenceDiagram
    actor U as You
    participant F as Study Hub UI
    participant B as FastAPI
    participant V as VectorDB
    participant O as Ollama (gemma4:e4b)
    participant S as SQLite
    U->>F: Pick scope, difficulty, question count
    F->>B: POST /api/study/quiz/generate
    B->>V: Hybrid search, scoped to your documents
    V-->>B: Up to 15 representative chunks
    B->>B: Render the quiz.md prompt template
    B->>O: chat(format="json", low temperature)
    O-->>B: Grammar-constrained JSON
    B->>B: Validate each question, drop bad ones
    B->>S: Save quiz (resumable later)
    B-->>F: Typed response
    F-->>U: Play, submit, score, and maybe a new badge

The four tools differ only in their template and their shape: quizzes produce multiple-choice and true/false questions with explanations; workshops produce an outline first and then write each lesson on demand when you open it; flashcards produce front/back pairs; mindmaps produce a topic tree that the frontend renders as an interactive diagram. (That renderer is its own adventure: Hand-Rolling an SVG Mindmap.)
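Since each tool is mostly a template, the override rule from step 2 of the recipe is worth seeing in miniature. This is a hedged sketch of how such a lookup could work, assuming templates are named after the tool (quiz.md, flashcards.md) and that a file under custom/ takes priority; the function names resolve_prompt and load_prompt are hypothetical.

```python
from pathlib import Path


def resolve_prompt(name: str, prompts_dir: Path) -> Path:
    """Return the custom template if one exists, else the shipped one."""
    custom = prompts_dir / "custom" / f"{name}.md"
    if custom.is_file():
        return custom
    return prompts_dir / f"{name}.md"


def load_prompt(name: str, prompts_dir: Path) -> str:
    # Read from disk on every call so an edited file takes effect
    # on the next request, with no restart and no cache to clear.
    return resolve_prompt(name, prompts_dir).read_text(encoding="utf-8")
```

Reading the file on each request trades a negligible amount of disk I/O for the "no restart" behaviour; with a local model doing the heavy lifting, the file read is unlikely to be the slow part.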

Sessions that track themselves

Most study apps make you press a start button, and most people forget. CogniVault takes a different stance: study sessions are inferred, not declared.

Every chat message either extends the current session or — after a 15-minute idle gap — quietly starts a new one. Walk away for coffee, come back, keep working: same session. Come back tomorrow: new session. No buttons, no forgetting.
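The idle-gap rule fits in a few lines. The sketch below is an illustration rather than the project's code: the function name is hypothetical, and it assumes a gap strictly longer than 15 minutes starts a new session (the post does not say how an exact 15-minute gap is treated).

```python
from datetime import datetime, timedelta

IDLE_GAP = timedelta(minutes=15)


def assign_sessions(timestamps: list[datetime]) -> list[int]:
    """Give each message a session number; a gap over IDLE_GAP starts a new one."""
    session_ids: list[int] = []
    current = 0
    previous = None
    for ts in sorted(timestamps):
        # The first message, or any message after a long pause, opens a session.
        if previous is None or ts - previous > IDLE_GAP:
            current += 1
        session_ids.append(current)
        previous = ts
    return session_ids
```

Note that the gap is measured from the previous message, not from the start of the session, so a long working stretch with short pauses stays a single session.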

Each message also records a tiny event (timestamp, whether you used a scope filter or attachments) into progress.db — a SQLite database, which is a complete relational database living in a single file. Eleven tables hold everything: sessions, message events, earned badges, quiz attempts and saved quizzes, workshops and lessons, decks and cards, and mindmaps.
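To make "a tiny event" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name message_events and its columns are assumptions for the example, not the real progress.db schema, and the default in-memory path is only there so the snippet runs anywhere.

```python
import sqlite3
from datetime import datetime, timezone


def open_progress_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the (hypothetical) events table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS message_events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            created_at TEXT NOT NULL,
            used_scope INTEGER NOT NULL,
            used_attachments INTEGER NOT NULL
        )
        """
    )
    return conn


def record_message_event(
    conn: sqlite3.Connection, used_scope: bool, used_attachments: bool
) -> None:
    """Insert one event row with a UTC timestamp."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO message_events (created_at, used_scope, used_attachments) "
            "VALUES (?, ?, ?)",
            (
                datetime.now(timezone.utc).isoformat(),
                int(used_scope),
                int(used_attachments),
            ),
        )
```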

One engineering note worth copying: the tracking call inside the chat endpoint is wrapped so that it can never block or break the chat. Analytics must be a passenger, never a driver.
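One way to get the "never break the chat" half of that guarantee is a wrapper that logs and swallows every tracking failure. This is a simplified sketch with hypothetical names (track_safely, handle_chat) and a stand-in reply; it does not show the "never block" half, which would typically need a background task or thread.

```python
import logging
from typing import Callable

logger = logging.getLogger("tracking")


def track_safely(track: Callable[[], None]) -> None:
    """Run a tracking callback; log and swallow any failure."""
    try:
        track()
    except Exception:
        logger.exception("progress tracking failed; chat continues")


def handle_chat(message: str, track: Callable[[], None]) -> str:
    reply = f"echo: {message}"  # stand-in for the real answer pipeline
    track_safely(track)  # a tracking error can no longer reach the caller
    return reply
```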

25 badges, defined as data

The achievements aren’t scattered through the code as if statements. They live in one JSON file — 25 entries, each with a code, a name, an icon, the metric it watches, and a target. After each relevant action, an evaluator checks every definition against the database and persists anything newly earned. Some badges form ladders, each pointing to its next level.
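A data-driven evaluator can be very small. In the sketch below, the three badge entries, the "next" key used for ladders, and the metric names are all invented for illustration; only the general shape (code, name, icon, metric, target) comes from the description above.

```python
import json

# Hypothetical badge definitions; the real file has 25 entries.
BADGES_JSON = """
[
  {"code": "quiz_1", "name": "First Quiz", "icon": "target",
   "metric": "quizzes_completed", "target": 1, "next": "quiz_10"},
  {"code": "quiz_10", "name": "Quiz Regular", "icon": "trophy",
   "metric": "quizzes_completed", "target": 10, "next": null},
  {"code": "streak_7", "name": "Week Streak", "icon": "flame",
   "metric": "streak_days", "target": 7, "next": null}
]
"""


def newly_earned(definitions: list[dict], metrics: dict, already_earned: set) -> list[str]:
    """Return codes of badges whose target is met and that are not yet stored."""
    earned = []
    for badge in definitions:
        value = metrics.get(badge["metric"], 0)
        if value >= badge["target"] and badge["code"] not in already_earned:
            earned.append(badge["code"])
    return earned
```

The evaluator never mentions a specific badge, which is the whole point: new entries in the data need no new branches in the code.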

Declarative beats imperative here for a simple reason: adding badge number 26 means adding a JSON entry, not writing new logic. The design behind the streaks, the idle-gap rule, and the 90-day heatmap got its own post: Gamifying Learning.

Voice input, without a cloud microphone

The microphone button is powered by faster-whisper — OpenAI’s Whisper speech-recognition model re-implemented on a faster inference engine — running on your CPU with int8 quantisation (8-bit numbers instead of 32-bit: smaller, faster, accurate enough). No audio ever leaves the machine.

The model is lazy-loaded on the first transcription so app startup stays instant, and if faster-whisper isn’t installed at all, the frontend simply hides the mic button. Features should degrade, not detonate.
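Both behaviours, lazy loading and graceful absence, can be sketched with the standard library plus faster-whisper's WhisperModel class. The function names are hypothetical, and "base" is an assumed model size, since the post does not say which one CogniVault loads.

```python
import importlib.util
from functools import lru_cache


def voice_available() -> bool:
    """True only if the optional faster-whisper package is installed."""
    return importlib.util.find_spec("faster_whisper") is not None


@lru_cache(maxsize=1)
def get_model():
    """Load the model on first use, then reuse it; nothing loads at startup."""
    from faster_whisper import WhisperModel  # imported lazily on purpose

    # "base" is an assumed model size, chosen only for this example.
    return WhisperModel("base", device="cpu", compute_type="int8")
```

A backend could expose voice_available() through a small status endpoint, and the frontend could use that answer to decide whether to render the mic button at all.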

The privacy receipts

The series began with a promise: nothing leaves your machine. Promises are cheap — here’s the audit. Every byte CogniVault stores, and where it lives:

| Data | Location | Format |
| --- | --- | --- |
| Your uploaded files | docs/ folder | The original files |
| Search vectors | vector_store.faiss | FAISS binary index |
| Chunk text and metadata | vector_store.json | JSON |
| File-to-category map | categories.json | JSON |
| Chat sessions | chat_history.json | JSON |
| Sessions, badges, quizzes, workshops, decks, mindmaps | progress.db | SQLite |
| Ingestion checkpoints | PostgreSQL (local Docker volume) | DBOS system tables |
| The AI models themselves | Ollama’s local model store | Model weights |

Nothing in that table is on someone else’s computer. Inference goes to localhost. Embeddings go to localhost. The only outbound requests the backend ever makes come from the URL-import feature: they happen at your explicit request and are guarded against fetching private addresses. The app even surfaces these stats live in its Privacy Vault Audit panel.
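A guard against fetching private addresses usually means resolving the host and refusing anything that is not a public IP. The following is a simplified sketch of that idea using only the standard library; the function names are hypothetical and this is not CogniVault's actual check.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_address(ip_text: str) -> bool:
    """Reject loopback, private, link-local, and other non-public ranges."""
    ip = ipaddress.ip_address(ip_text)
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def is_safe_url(url: str) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(
            parsed.hostname, parsed.port or 443, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return False
    # Strip any IPv6 zone suffix ("%eth0") before parsing the address.
    addresses = [info[4][0].split("%")[0] for info in infos]
    return bool(addresses) and all(is_public_address(a) for a in addresses)
```

A production guard would also need to connect to the exact address it validated and re-check every redirect, since a hostname can resolve differently between the check and the fetch.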

And because trust needs more than a table: the whole backend is covered by a pytest suite you can run yourself — the approach is documented in Testing a Local-AI App: 312 Tests, Zero Infrastructure.

Series wrap-up

Four parts, one architecture:

  1. Part 1 — three processes, four layers, and a decoder ring for the jargon
  2. Part 2 — a durable, format-aware pipeline that turns any document into searchable vectors
  3. Part 3 — two retrievers covering each other’s blind spots, fused by rank, driven by an agent
  4. Part 4 — the same machinery generating study materials, tracking progress without buttons, and a storage map with no cloud rows in it

If there’s one theme, it’s this: boring, verifiable choices in service of privacy. Exact search instead of approximate. SQLite files instead of hosted databases. Grammar-constrained JSON instead of hopeful parsing. Soft deletes instead of clever index surgery. Every piece is something you can open, read, and check — which is exactly the point.


Appendix: Abbreviations in this post

| Abbreviation | Full form | Meaning |
| --- | --- | --- |
| JSON | JavaScript Object Notation | The structured format the generators force the model to produce |
| SQLite / SQL | (SQL = Structured Query Language) | A complete relational database living in one file, progress.db |
| MCQ | Multiple-Choice Question | One of the two quiz question types (the other is true/false) |
| CPU | Central Processing Unit | Where Whisper runs; no graphics card required |
| int8 | 8-bit integer (quantisation) | Storing model weights as small integers: smaller, faster, accurate enough |
| AI | Artificial Intelligence | Software performing tasks that normally need human intelligence |
| API | Application Programming Interface | The endpoints the Study Hub and dashboard call |
| FAISS | Facebook AI Similarity Search | The vector index in the privacy-receipts table |
| DBOS | Database-Oriented Operating System | The durable-workflow library whose checkpoints live in PostgreSQL |
| SSRF | Server-Side Request Forgery | The attack class the URL importer guards against |
| PNG / PDF | Portable Network Graphics / Portable Document Format | Two of the mindmap export formats (plus Markdown) |
| SVG | Scalable Vector Graphics | The browser drawing format behind the interactive mindmap rendering |

Next steps: clone the repository and read along — the README maps the full architecture, and every claim in this series can be checked directly against the code in backend/. And if you want the deep-dive versions of these topics, the architecture series picks up where this tour ends.