<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Beginner Guides |</title><link>https://aretascodes.dev/categories/beginner-guides/</link><atom:link href="https://aretascodes.dev/categories/beginner-guides/index.xml" rel="self" type="application/rss+xml"/><description>Beginner Guides</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Fri, 12 Jun 2026 00:00:00 +0000</lastBuildDate><image><url>https://aretascodes.dev/media/icon_hu_2ab4f4763b27c75b.png</url><title>Beginner Guides</title><link>https://aretascodes.dev/categories/beginner-guides/</link></image><item><title>CogniVault Backend Explained, Part 1 · Meet the Backend: Three Processes, Four Layers</title><link>https://aretascodes.dev/blog/backend-explained-meet-the-backend/</link><pubDate>Fri, 12 Jun 2026 00:00:00 +0000</pubDate><guid>https://aretascodes.dev/blog/backend-explained-meet-the-backend/</guid><description>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;All abbreviations are fully explained in the appendix at the bottom of the page.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;When people first open the CogniVault repository, the question I hear most is some version of: &lt;em&gt;&amp;ldquo;Where do I even start?&amp;rdquo;&lt;/em&gt; There&amp;rsquo;s a RAG agent, a FAISS index, a DBOS workflow, an Ollama host — and if you&amp;rsquo;re transitioning into tech, every one of those words is a closed door.&lt;/p&gt;
&lt;p&gt;This series opens the doors one at a time. No prior RAG knowledge assumed, every abbreviation spelled out, and every claim checkable against the project&amp;rsquo;s source code. If you&amp;rsquo;ve already read my earlier CogniVault posts, think of this series as the guided tour that should have come first.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s map this out.&lt;/p&gt;
&lt;h2 id="the-whole-app-is-three-processes"&gt;The whole app is three processes&lt;/h2&gt;
&lt;p&gt;CogniVault lets you chat with your own documents and turn them into quizzes, workshops, flashcards, and mindmaps — and nothing ever leaves your machine. (The &lt;em&gt;why&lt;/em&gt; behind that constraint is its own story:
.)&lt;/p&gt;
&lt;p&gt;You might expect an app like that to be a sprawl of microservices. It&amp;rsquo;s three processes:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Process&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;The Python backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One FastAPI app on port 8000 — it also serves the compiled React frontend as static files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ollama&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The local model server on port 11434, running the AI models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One Docker container, used &lt;em&gt;only&lt;/em&gt; for workflow checkpoints — never for your documents&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Everything else — your files, the search index, your chat history, your quiz scores — is a plain file on disk. That&amp;rsquo;s not laziness; it&amp;rsquo;s the privacy argument made physical. You can open every byte the app stores with a text editor and a SQLite browser.&lt;/p&gt;
&lt;h2 id="the-four-layers"&gt;The four layers&lt;/h2&gt;
&lt;p&gt;Before we name technologies, here&amp;rsquo;s the mental model I want you to keep for the whole series. The backend is four layers, top to bottom:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Layer 1 — the web layer.&lt;/strong&gt; A FastAPI application receives every HTTP request and routes it to one of six routers: chat (&lt;code&gt;/rag&lt;/code&gt;), knowledge management (&lt;code&gt;/upload&lt;/code&gt;, &lt;code&gt;/ingest&lt;/code&gt;), study tools (&lt;code&gt;/api/study/*&lt;/code&gt;), progress (&lt;code&gt;/api/progress/*&lt;/code&gt;), voice (&lt;code&gt;/api/transcribe&lt;/code&gt;), and chat history (&lt;code&gt;/api/history&lt;/code&gt;). FastAPI (a modern Python web framework) also auto-generates interactive API documentation at &lt;code&gt;/api/docs&lt;/code&gt;, which is the best way to explore the backend without reading a line of code.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Layer 2 — the intelligence layer.&lt;/strong&gt; Two AI models with two different jobs. &lt;code&gt;gemma4:e4b&lt;/code&gt; &lt;em&gt;generates&lt;/em&gt;: chat answers, reasoning, image analysis, and tool calls. &lt;code&gt;embeddinggemma&lt;/code&gt; &lt;em&gt;embeds&lt;/em&gt;: it turns text into vectors (lists of numbers that capture meaning) so similar ideas can be found mathematically. Both run inside Ollama — think of Ollama as Docker, but for AI models.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Layer 3 — the retrieval layer.&lt;/strong&gt; A search engine over your documents that combines &lt;em&gt;semantic&lt;/em&gt; search (find things that mean the same) with &lt;em&gt;keyword&lt;/em&gt; search (find the exact string). Part 3 of this series is entirely about this layer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Layer 4 — the persistence layer.&lt;/strong&gt; Four storage systems, each picked for one job: a FAISS index plus a JSON file for searchable knowledge, SQLite for study data, PostgreSQL for workflow checkpoints, and plain JSON files for chat history.&lt;/p&gt;
&lt;h2 id="one-diagram-every-major-piece"&gt;One diagram, every major piece&lt;/h2&gt;
&lt;div class="mermaid"&gt;flowchart TB
subgraph CLIENT["Browser"]
UI["React Frontend&lt;br/&gt;(compiled, served by FastAPI)"]
end
subgraph SERVER["FastAPI Backend — port 8000"]
ROUTERS["6 Routers&lt;br/&gt;rag · knowledge · study ·&lt;br/&gt;progress · audio · history"]
AGENT["RAG Agent&lt;br/&gt;(Strands SDK, 6 tools)"]
VDB["VectorDB&lt;br/&gt;FAISS + BM25 + RRF"]
INGEST["Ingestion&lt;br/&gt;(DBOS durable workflow)"]
GEN["Study generators&lt;br/&gt;quiz · workshop · cards · mindmap"]
PROG["Progress tracker&lt;br/&gt;+ 25 achievements"]
end
subgraph OLLAMA["Ollama — port 11434"]
GEMMA["gemma4:e4b&lt;br/&gt;chat · thinking · vision · tools"]
EMBED["embeddinggemma&lt;br/&gt;text to vectors"]
end
subgraph STORAGE["Local storage"]
FAISSF["vector_store.faiss + .json"]
SQLITE["progress.db (SQLite)"]
PG["PostgreSQL&lt;br/&gt;workflow state only"]
DOCS["docs/ folder + chat_history.json"]
end
UI --&gt; ROUTERS
ROUTERS --&gt; AGENT --&gt; VDB
AGENT --&gt; GEMMA
VDB --&gt; EMBED
ROUTERS --&gt; INGEST --&gt; EMBED
INGEST --&gt; PG
INGEST --&gt; FAISSF
VDB --- FAISSF
ROUTERS --&gt; GEN --&gt; GEMMA
GEN --&gt; SQLITE
ROUTERS --&gt; PROG --&gt; SQLITE
ROUTERS --&gt; DOCS
&lt;/div&gt;
&lt;p&gt;Keep this picture handy — Parts 2, 3, and 4 each zoom into one region of it.&lt;/p&gt;
&lt;h2 id="the-tech-stack-and-why-each-piece-earned-its-place"&gt;The tech stack, and why each piece earned its place&lt;/h2&gt;
&lt;p&gt;The full dependency list lives in &lt;code&gt;requirements.txt&lt;/code&gt;. Here&amp;rsquo;s what matters, grouped by job:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Serving requests.&lt;/strong&gt; FastAPI defines the endpoints and validates every request and response with Pydantic (a data-validation library — think of it as a strict customs officer for JSON). Uvicorn is the ASGI server (Asynchronous Server Gateway Interface — the Python standard that lets one process juggle many simultaneous requests) that actually runs it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thinking.&lt;/strong&gt; Ollama serves &lt;code&gt;gemma4:e4b&lt;/code&gt; — the &lt;code&gt;e4b&lt;/code&gt; tag is the roughly four-billion effective-parameter variant, about a 9.6 GB download — and &lt;code&gt;embeddinggemma&lt;/code&gt; (about 622 MB). The agent behaviour is built with the Strands Agents SDK, which wraps the model in a loop where it can call tools, read the results, and only then answer. (Where I run Ollama relative to Docker is a deliberate choice with a story behind it:
.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Finding things.&lt;/strong&gt; FAISS (Facebook AI Similarity Search — Meta&amp;rsquo;s vector search library) handles semantic lookups; &lt;code&gt;rank-bm25&lt;/code&gt; handles keyword lookups; a formula called Reciprocal Rank Fusion merges the two. Part 3 unpacks all of this.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reading documents.&lt;/strong&gt; &lt;code&gt;pypdf&lt;/code&gt; for PDFs, with an OCR fallback (Optical Character Recognition — turning pictures of text into actual text) for scanned pages via &lt;code&gt;pymupdf&lt;/code&gt; and Tesseract. Word, PowerPoint, and Excel each get their own extractor. &lt;code&gt;trafilatura&lt;/code&gt; pulls clean article text out of web pages.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Not losing work.&lt;/strong&gt; DBOS makes the ingestion pipeline durable — every step is checkpointed in PostgreSQL so a crash resumes instead of restarting. Part 2 shows this in action.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Remembering.&lt;/strong&gt; SQLite — a complete database engine that lives in a single file, &lt;code&gt;progress.db&lt;/code&gt; — holds your study sessions, achievements, quizzes, workshops, flashcard decks, and mindmaps.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="appendix-abbreviations-in-this-post"&gt;Appendix: Abbreviations in this post&lt;/h2&gt;
&lt;p&gt;This series&amp;rsquo; promise is &amp;ldquo;no unexplained abbreviations,&amp;rdquo; so here is the table I wish every technical tutorial shipped with.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Abbreviation&lt;/th&gt;
&lt;th&gt;Full form&lt;/th&gt;
&lt;th&gt;Plain-English meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large Language Model&lt;/td&gt;
&lt;td&gt;A neural network trained on huge amounts of text that can read and generate language&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAG&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Retrieval-Augmented Generation&lt;/td&gt;
&lt;td&gt;Fetch relevant passages from &lt;em&gt;your&lt;/em&gt; documents first, then let the model answer from them — instead of from its training memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Application Programming Interface&lt;/td&gt;
&lt;td&gt;The set of URLs the frontend calls to talk to the backend&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ASGI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Asynchronous Server Gateway Interface&lt;/td&gt;
&lt;td&gt;The Python standard that lets the server handle many requests concurrently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JavaScript Object Notation&lt;/td&gt;
&lt;td&gt;The universal text format for structured data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NDJSON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Newline-Delimited JSON&lt;/td&gt;
&lt;td&gt;A stream where each line is its own JSON object — ideal for streaming AI answers chunk by chunk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FAISS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Facebook AI Similarity Search&lt;/td&gt;
&lt;td&gt;Meta&amp;rsquo;s library for storing vectors and finding the most similar ones fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BM25&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Best Match 25&lt;/td&gt;
&lt;td&gt;A classic keyword-ranking formula — the 25th ranking function developed in the Okapi information-retrieval system&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RRF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reciprocal Rank Fusion&lt;/td&gt;
&lt;td&gt;A formula for merging multiple ranked result lists using only the ranks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ANN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Approximate Nearest Neighbour&lt;/td&gt;
&lt;td&gt;A speed shortcut many vector databases take. CogniVault deliberately uses an &lt;em&gt;exact&lt;/em&gt; index instead — precise, and plenty fast at personal-library scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DBOS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Database-Oriented Operating System (the research project it grew from)&lt;/td&gt;
&lt;td&gt;A library that checkpoints workflow steps in a database so crashed jobs resume&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQL / SQLite&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Structured Query Language / SQLite&lt;/td&gt;
&lt;td&gt;The language of relational databases / a tiny database that lives in one file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OCR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optical Character Recognition&lt;/td&gt;
&lt;td&gt;Turning pictures of text (scans) into machine-readable text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SHA-256&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Secure Hash Algorithm, 256-bit&lt;/td&gt;
&lt;td&gt;A fingerprint function: each file maps to a hash that is unique for all practical purposes, used to detect changed files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CORS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cross-Origin Resource Sharing&lt;/td&gt;
&lt;td&gt;Browser rules controlling which websites may call the API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSRF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Server-Side Request Forgery&lt;/td&gt;
&lt;td&gt;An attack where a server is tricked into fetching internal URLs — the URL-import endpoint guards against it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multiple-Choice Question&lt;/td&gt;
&lt;td&gt;One of the two quiz question types&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Knowledge Base&lt;/td&gt;
&lt;td&gt;All your ingested, searchable documents&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;(Every claim in this series can be checked directly against the source code: the relevant file is named whenever it matters, and the repository README maps the full architecture.)&lt;/p&gt;
&lt;h2 id="the-takeaway"&gt;The takeaway&lt;/h2&gt;
&lt;p&gt;Strip away the abbreviations and CogniVault is a small system: one web server, one model runtime, one durability database, and a handful of files. The sophistication isn&amp;rsquo;t in the part count — it&amp;rsquo;s in how a few well-chosen pieces cooperate. That cooperation is what the next three parts are about.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Next up:&lt;/strong&gt; Part 2, &lt;em&gt;From File to Searchable Knowledge&lt;/em&gt;, on how a 1,000-page scanned PDF becomes something the AI can search in seconds, and why the pipeline survives a crash at page 800.&lt;/p&gt;</description></item><item><title>CogniVault Backend Explained, Part 2 · From File to Searchable Knowledge</title><link>https://aretascodes.dev/blog/backend-explained-ingestion/</link><pubDate>Fri, 12 Jun 2026 00:00:00 +0000</pubDate><guid>https://aretascodes.dev/blog/backend-explained-ingestion/</guid><description>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;All abbreviations are fully explained in the appendix at the bottom of the page.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;An LLM cannot &amp;ldquo;open&amp;rdquo; your PDF. That sentence surprises a lot of newcomers, so let&amp;rsquo;s sit with it for a second: when you chat with your documents in CogniVault, the model never touches the original files. Something has to happen &lt;em&gt;between&lt;/em&gt; &amp;ldquo;I dropped a file into the browser&amp;rdquo; and &amp;ldquo;the AI just quoted page 47 back at me.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;That something is &lt;strong&gt;ingestion&lt;/strong&gt;, and it&amp;rsquo;s the subject of this part. In Part 1 we drew the whole map; today we zoom into one region: the conveyor belt that turns files into searchable knowledge.&lt;/p&gt;
&lt;h2 id="the-conveyor-belt"&gt;The conveyor belt&lt;/h2&gt;
&lt;p&gt;Think of ingestion as a four-station assembly line:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Extract&lt;/strong&gt; the text out of each file — even scanned ones.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Chunk&lt;/strong&gt; it into pieces small enough to fit into a prompt.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Embed&lt;/strong&gt; each chunk — turn it into a vector (a list of numbers that captures its meaning) so similar ideas land near each other in vector space.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Store&lt;/strong&gt; vectors and metadata so they can be searched later.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="mermaid"&gt;flowchart TD
A["Upload&lt;br/&gt;POST /upload&lt;br/&gt;saved to docs/"] --&gt; B
subgraph WF["DBOS durable workflow"]
B["Step 1&lt;br/&gt;Which files changed?&lt;br/&gt;SHA-256 fingerprints"] --&gt; C["Step 2&lt;br/&gt;Extract text&lt;br/&gt;per-format + OCR fallback"]
C --&gt; D["Chunk&lt;br/&gt;1000 chars, 100 overlap"]
D --&gt; E["Step 3&lt;br/&gt;Embed&lt;br/&gt;embeddinggemma, batches of 5"]
E --&gt; F["Step 4&lt;br/&gt;Save&lt;br/&gt;FAISS index + metadata JSON"]
end
F --&gt; G["Reload in-memory index&lt;br/&gt;instantly searchable"]
&lt;/div&gt;
&lt;p&gt;Simple enough. The interesting engineering is in the failure cases — so let&amp;rsquo;s start there.&lt;/p&gt;
&lt;h2 id="the-factory-ledger-why-the-pipeline-cant-lose-work"&gt;The factory ledger: why the pipeline can&amp;rsquo;t lose work&lt;/h2&gt;
&lt;p&gt;Embedding a large library takes minutes. What happens when your laptop goes to sleep at page 800 of a 1,000-page manual? With a plain Python script: everything restarts from page 1.&lt;/p&gt;
&lt;p&gt;CogniVault instead writes the pipeline as a &lt;strong&gt;DBOS durable workflow&lt;/strong&gt;. Picture a factory where every station stamps a permanent ledger the moment it finishes a box. If the power cuts out, nobody rebuilds finished boxes — the workers read the ledger and resume at the first unstamped entry.&lt;/p&gt;
&lt;p&gt;DBOS is that ledger, and PostgreSQL is the book it&amp;rsquo;s written in. Each pipeline station is a checkpointed step; on restart, completed steps return their recorded results instantly and execution continues from the first unfinished one. A failed embedding batch is simply retried.&lt;/p&gt;
&lt;p&gt;This is also what powers the live progress timeline in the UI: starting an ingestion returns a &lt;code&gt;workflow_id&lt;/code&gt;, and the frontend polls a status endpoint that reports which steps have completed, which are running, and which are still waiting.&lt;/p&gt;
&lt;p&gt;I wrote a whole deep dive on this mechanism — including what happens when you &lt;code&gt;kill -9&lt;/code&gt; the process mid-ingest — in
.&lt;/p&gt;
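&lt;p&gt;To make the ledger idea concrete, here is a minimal sketch in plain Python. It is not the DBOS API: the &lt;code&gt;Ledger&lt;/code&gt; class, the &lt;code&gt;run_step&lt;/code&gt; method, and the JSON file standing in for PostgreSQL are all assumptions made for illustration. It only shows the principle that a finished step returns its recorded result instead of running again.&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path


class Ledger:
    """Toy checkpoint store: a JSON file stands in for PostgreSQL (illustration only)."""

    def __init__(self, path):
        self.path = Path(path)
        # Load previously stamped results, if any survive from an earlier run.
        self.done = json.loads(self.path.read_text()) if self.path.exists() else {}

    def run_step(self, name, func, *args):
        # A step that already finished returns its recorded result without re-running.
        if name in self.done:
            return self.done[name]
        result = func(*args)
        self.done[name] = result
        self.path.write_text(json.dumps(self.done))  # stamp the ledger
        return result


calls = []  # records which steps really executed


def extract(filename):
    calls.append("extract")
    return f"text of {filename}"


def chunk(text):
    calls.append("chunk")
    return [text[:7], text[7:]]


ledger_file = Path(tempfile.mkdtemp()) / "ledger.json"

# First run: both steps execute and are stamped.
first = Ledger(ledger_file)
text = first.run_step("extract", extract, "manual.pdf")
pieces = first.run_step("chunk", chunk, text)

# Simulated restart: a fresh Ledger reads the file and replays recorded results.
second = Ledger(ledger_file)
replayed_text = second.run_step("extract", extract, "manual.pdf")
replayed_pieces = second.run_step("chunk", chunk, replayed_text)

print(calls)  # each step ran exactly once across both runs
```

&lt;p&gt;The second &lt;code&gt;Ledger&lt;/code&gt; never calls &lt;code&gt;extract&lt;/code&gt; or &lt;code&gt;chunk&lt;/code&gt;; it simply hands back what the first run wrote down. A real durable-workflow library adds retries, step ordering, and a proper database on top of this same idea.&lt;/p&gt;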
&lt;h2 id="fingerprints-not-faith-sha-256-change-detection"&gt;Fingerprints, not faith: SHA-256 change detection&lt;/h2&gt;
&lt;p&gt;Re-embedding your whole library every time you add one file would be wasteful. So before any work happens, the pipeline computes each file&amp;rsquo;s &lt;strong&gt;SHA-256 hash&lt;/strong&gt; (a content fingerprint — change one character in the file and the fingerprint changes completely) and compares it to the fingerprint stored with the file&amp;rsquo;s existing chunks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Never seen before&lt;/strong&gt; → ingest it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fingerprint changed&lt;/strong&gt; → the old chunks are &lt;em&gt;soft-deleted&lt;/em&gt; and the file is re-ingested.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fingerprint identical&lt;/strong&gt; → skip it entirely.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Why &amp;ldquo;soft&amp;rdquo;-deleted? Because the FAISS index type CogniVault uses cannot remove individual vectors. Stale chunks are just marked &lt;code&gt;deleted: true&lt;/code&gt; in the metadata; their vectors stay in the index but every search filters them out. It&amp;rsquo;s an honest, boring solution — and it never corrupts the index.&lt;/p&gt;
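&lt;p&gt;The three-way decision above, plus the soft delete, can be sketched with Python&amp;rsquo;s standard &lt;code&gt;hashlib&lt;/code&gt;. The function names and the shape of the metadata records (&lt;code&gt;source&lt;/code&gt;, &lt;code&gt;sha256&lt;/code&gt;, &lt;code&gt;deleted&lt;/code&gt;) are assumptions for this example, not the layout of CogniVault&amp;rsquo;s actual metadata file.&lt;/p&gt;

```python
import hashlib


def fingerprint(data):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def plan_ingest(filename, data, chunks):
    """Decide what to do with a file; `chunks` is a hypothetical metadata list."""
    new_hash = fingerprint(data)
    live = [c for c in chunks if c["source"] == filename and not c["deleted"]]
    if not live:
        return "ingest"  # never seen before
    if all(c["sha256"] == new_hash for c in live):
        return "skip"  # fingerprint identical
    for c in live:
        c["deleted"] = True  # soft delete: the vector stays, searches filter it out
    return "reingest"  # fingerprint changed


metadata = [
    {"source": "notes.md", "sha256": fingerprint(b"v1"), "deleted": False},
]

decisions = [
    plan_ingest("new.pdf", b"hello", metadata),  # unknown file
    plan_ingest("notes.md", b"v1", metadata),    # same bytes as before
    plan_ingest("notes.md", b"v2", metadata),    # one character changed
]
print(decisions)
```

&lt;p&gt;Note that the changed file does not remove anything from the list: the old record is only flagged, which mirrors how stale vectors stay in the index while searches skip them.&lt;/p&gt;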
&lt;h2 id="every-format-gets-its-own-treatment"&gt;Every format gets its own treatment&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s a detail that separates a demo from a product. A naive pipeline extracts &amp;ldquo;all the text&amp;rdquo; and calls it a day. CogniVault gives each format an extractor that preserves the &lt;em&gt;structure&lt;/em&gt; that retrieval will need later:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PDF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Page by page, keeping page numbers (those become citations later). Any page yielding fewer than 50 characters is presumed scanned and sent to OCR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scanned page&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The page is rendered to an image at roughly 144 dpi, then Tesseract OCR (Optical Character Recognition — reading text out of images) extracts the words&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Markdown&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Split on headings; each section chunk gets a breadcrumb prefix like &lt;code&gt;[Section: Intro &amp;gt; Setup]&lt;/code&gt; so its embedding carries the document hierarchy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CSV&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rows grouped 20 per chunk — and &lt;em&gt;every&lt;/em&gt; chunk is prefixed with the header row, so the model always knows the column names&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Excel&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Same row-group idea per sheet, prefixed &lt;code&gt;[Sheet: name]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PowerPoint&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One chunk per slide&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Word&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Paragraphs plus table cells&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Web pages&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fetched on request and stripped to clean article text — behind an SSRF guard (Server-Side Request Forgery protection: the server refuses to fetch private or internal addresses)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Ask yourself why the CSV detail matters. If chunk 14 of a spreadsheet is just twenty naked rows of numbers, no search will ever connect it to the question &amp;ldquo;what was the Q3 budget?&amp;rdquo; Prefix it with the header row, and the chunk &lt;em&gt;knows&lt;/em&gt; it contains budget columns. Structure is retrieval fuel.&lt;/p&gt;
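&lt;p&gt;Here is a small sketch of that CSV strategy using only the standard library. The function name &lt;code&gt;chunk_csv&lt;/code&gt; and its parameter are my own for this example; the real extractor may differ in details such as how it re-quotes cells.&lt;/p&gt;

```python
import csv
import io


def chunk_csv(text, rows_per_chunk=20):
    """Group data rows into chunks and repeat the header row at the top of each."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(body), rows_per_chunk):
        group = [header] + body[start:start + rows_per_chunk]
        # Simplification: cells are joined with commas and not re-quoted.
        chunks.append("\n".join(",".join(row) for row in group))
    return chunks


# 45 data rows under one header row.
sample = "quarter,budget\n" + "\n".join(f"Q{i},{i * 100}" for i in range(1, 46))
csv_chunks = chunk_csv(sample)

print(len(csv_chunks))                 # number of chunks produced
print(csv_chunks[2].splitlines()[0])   # first line of the last chunk
```

&lt;p&gt;Forty-five rows become three chunks of 20, 20, and 5 rows, and every one of them opens with &lt;code&gt;quarter,budget&lt;/code&gt;, so even the last chunk still says what its numbers mean.&lt;/p&gt;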
&lt;h2 id="chunking-1000-characters-with-a-100-character-safety-overlap"&gt;Chunking: 1,000 characters with a 100-character safety overlap&lt;/h2&gt;
&lt;p&gt;Long text is split into pieces of about 1,000 characters, with neighbouring pieces overlapping by 100. The overlap is insurance: a short sentence sliced at a chunk boundary still appears whole in the next chunk, provided it starts inside the 100-character overlap, so fewer ideas fall into the gap between chunks.&lt;/p&gt;
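&lt;p&gt;A sliding window is one simple way to get this behaviour. The sketch below assumes plain character counts and a hypothetical function name &lt;code&gt;chunk_text&lt;/code&gt;; the real chunker may also respect sentence or paragraph boundaries.&lt;/p&gt;

```python
def chunk_text(text, size=1000, overlap=100):
    """Split text into windows of `size` characters; each window starts `size - overlap` after the previous one."""
    if not size > overlap >= 0:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


document = "".join(str(i % 10) for i in range(2500))
parts = chunk_text(document)

print([len(p) for p in parts])
# The tail of one chunk is repeated as the head of the next.
print(parts[0][-100:] == parts[1][:100])
```

&lt;p&gt;With 2,500 characters the windows start at positions 0, 900, and 1,800, so each neighbouring pair shares exactly 100 characters.&lt;/p&gt;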
&lt;h2 id="embedding-and-saving"&gt;Embedding and saving&lt;/h2&gt;
&lt;p&gt;Chunks are embedded by &lt;code&gt;embeddinggemma&lt;/code&gt; (via Ollama) in batches of five — each chunk becomes one vector. The vectors are normalised and appended to a FAISS index; alongside it, a JSON file records each chunk&amp;rsquo;s source filename, page number, category, fingerprint, and the text itself. The index holds the numbers; the JSON holds the meaning.&lt;/p&gt;
&lt;p&gt;One choice worth highlighting for beginners: this is an &lt;strong&gt;exact&lt;/strong&gt; index, not an approximate one. Many vector databases use ANN (Approximate Nearest Neighbour) shortcuts that trade a little accuracy for speed at massive scale. At personal-library scale you don&amp;rsquo;t need the trade — CogniVault checks every vector on every search and is still fast.&lt;/p&gt;
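&lt;p&gt;To show what &amp;ldquo;normalised&amp;rdquo;, &amp;ldquo;batches of five&amp;rdquo;, and &amp;ldquo;exact&amp;rdquo; mean in practice, here is a pure-Python stand-in. It does not use FAISS or Ollama; the function names and the tiny two-dimensional vectors are assumptions chosen so the arithmetic stays readable.&lt;/p&gt;

```python
import math


def normalise(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec] if length else list(vec)


def batches(items, size=5):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def exact_search(query, index, k=3):
    """Score the query against every stored vector and return the top-k (position, score) pairs."""
    q = normalise(query)
    scores = [(i, sum(a * b for a, b in zip(q, vec))) for i, vec in enumerate(index)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Pretend these came back from the embedding model.
raw = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [-1.0, 0.0]]
index = [normalise(v) for v in raw]

hits = exact_search([2.0, 0.1], index, k=2)
print([position for position, _ in hits])

# Twelve chunks would be sent to the embedder in groups of 5, 5, and 2.
print([len(group) for group in batches(list(range(12)))])
```

&lt;p&gt;The search loop touches every vector, which is exactly what makes it exact. That cost grows linearly with the number of chunks, which is why the approach suits a personal library and why very large systems reach for approximate shortcuts instead.&lt;/p&gt;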
&lt;h2 id="the-whole-journey-end-to-end"&gt;The whole journey, end to end&lt;/h2&gt;
&lt;div class="mermaid"&gt;%%{init: {'sequence': {'actorFontSize': 28, 'messageFontSize': 24, 'loopTextFontSize': 22, 'noteFontSize': 22}}}%%
sequenceDiagram
actor U as You
participant F as Frontend
participant B as FastAPI
participant W as DBOS Workflow
participant O as Ollama (embeddinggemma)
participant V as FAISS + metadata
U-&gt;&gt;F: Drag and drop a file, pick a category
F-&gt;&gt;B: POST /upload
B-&gt;&gt;B: Validate type and size, save to docs/
F-&gt;&gt;B: POST /ingest
B-&gt;&gt;W: Start durable workflow
B--&gt;&gt;F: workflow_id
loop Poll status
F-&gt;&gt;B: GET /ingest/status/{workflow_id}
B--&gt;&gt;F: Step list (drives the progress timeline)
end
W-&gt;&gt;W: SHA-256 change detection
W-&gt;&gt;W: Extract text (per format, OCR if scanned)
W-&gt;&gt;W: Chunk (1000 chars / 100 overlap)
W-&gt;&gt;O: Embed in batches of 5
O--&gt;&gt;W: Vectors
W-&gt;&gt;V: Append vectors + metadata
B--&gt;&gt;F: SUCCESS — index reloaded
F--&gt;&gt;U: "Knowledge Sync Complete"
&lt;/div&gt;
&lt;h2 id="the-takeaway"&gt;The takeaway&lt;/h2&gt;
&lt;p&gt;Ingestion is where most RAG quality is actually won or lost — long before any clever prompting. Page numbers preserved, headers carried into every spreadsheet chunk, scans rescued by OCR, and a ledger that makes the whole thing crash-proof: none of it is glamorous, all of it shows up later as answers that cite the right page.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="appendix-abbreviations-in-this-post"&gt;Appendix: Abbreviations in this post&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Abbreviation&lt;/th&gt;
&lt;th&gt;Full form&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large Language Model&lt;/td&gt;
&lt;td&gt;A neural network trained on huge amounts of text that can read and generate language&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DBOS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Database-Oriented Operating System&lt;/td&gt;
&lt;td&gt;The library that checkpoints workflow steps in PostgreSQL so crashed jobs resume&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SHA-256&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Secure Hash Algorithm, 256-bit&lt;/td&gt;
&lt;td&gt;A content fingerprint — change one byte of a file and the hash changes completely&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OCR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optical Character Recognition&lt;/td&gt;
&lt;td&gt;Reading text out of images — the rescue path for scanned PDF pages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSRF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Server-Side Request Forgery&lt;/td&gt;
&lt;td&gt;An attack where a server is tricked into fetching internal URLs; the URL importer blocks it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FAISS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Facebook AI Similarity Search&lt;/td&gt;
&lt;td&gt;The vector index the embeddings are appended to&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ANN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Approximate Nearest Neighbour&lt;/td&gt;
&lt;td&gt;The accuracy-for-speed shortcut CogniVault deliberately does &lt;em&gt;not&lt;/em&gt; take&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;dpi&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dots Per Inch&lt;/td&gt;
&lt;td&gt;Image resolution — scanned pages are rendered at ~144 dpi before OCR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JavaScript Object Notation&lt;/td&gt;
&lt;td&gt;The format of the chunk-metadata file beside the FAISS index&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PDF / CSV&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Portable Document Format / Comma-Separated Values&lt;/td&gt;
&lt;td&gt;Two of the eight-plus supported file formats&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Application Programming Interface&lt;/td&gt;
&lt;td&gt;The endpoints (&lt;code&gt;/upload&lt;/code&gt;, &lt;code&gt;/ingest&lt;/code&gt;, &lt;code&gt;/ingest/status/…&lt;/code&gt;) driving the flow&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Next up:&lt;/strong&gt; Part 3, &lt;em&gt;How a Question Becomes a Cited Answer&lt;/em&gt;, covering hybrid retrieval, the six-tool agent, and the two-phase stream that shows the model thinking before it answers.&lt;/p&gt;</description></item><item><title>CogniVault Backend Explained, Part 3 · How a Question Becomes a Cited Answer</title><link>https://aretascodes.dev/blog/backend-explained-rag-agent/</link><pubDate>Fri, 12 Jun 2026 00:00:00 +0000</pubDate><guid>https://aretascodes.dev/blog/backend-explained-rag-agent/</guid><description>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;All abbreviations are fully explained in the appendix at the bottom of the page.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;You type a question. A few seconds later you get an answer with footnotes — the exact documents and pages it came from. This part walks through everything that happens in between.&lt;/p&gt;
&lt;p&gt;In Part 2 we built the knowledge base: every document chunked, embedded, and indexed. Now we get to &lt;em&gt;use&lt;/em&gt; it, and this is where CogniVault stops being a pipeline and starts being interesting.&lt;/p&gt;
&lt;h2 id="two-librarians-because-one-keeps-failing-you"&gt;Two librarians, because one keeps failing you&lt;/h2&gt;
&lt;p&gt;Imagine a library with one librarian who organises everything by &lt;em&gt;vibe&lt;/em&gt;. Ask her about &amp;ldquo;server downtime procedures&amp;rdquo; and she&amp;rsquo;s brilliant — she understands what you mean and finds documents that discuss the concept, whatever words they use. But ask her for &amp;ldquo;Error Code 404B&amp;rdquo; and she shrugs, handing you general networking guides. She doesn&amp;rsquo;t do exact strings.&lt;/p&gt;
&lt;p&gt;Down the hall is a second librarian with a card catalogue. He finds the exact string &amp;ldquo;404B&amp;rdquo; instantly — but ask him a conceptual question phrased differently from the source text, and he finds nothing at all.&lt;/p&gt;
&lt;p&gt;These are the two halves of search:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Semantic search (FAISS)&lt;/strong&gt; — your question is embedded into a vector, and the index finds chunks whose vectors point the same way (technically: cosine similarity — how closely two arrows align). Great for meaning, blind to exact identifiers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keyword search (BM25)&lt;/strong&gt; — a scoring formula that rewards chunks containing your &lt;em&gt;exact&lt;/em&gt; words, weighted by how distinctive those words are. Great for identifiers, blind to synonyms.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;CogniVault asks &lt;strong&gt;both librarians every time&lt;/strong&gt;, then merges their answers with &lt;strong&gt;Reciprocal Rank Fusion (RRF)&lt;/strong&gt; — a formula that combines ranked lists using only the &lt;em&gt;positions&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-fallback" data-lang="fallback"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;score(chunk) = sum over both lists of 1 / (60 + rank)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A chunk ranked highly by either librarian scores well; a chunk both of them liked floats to the top. The elegance is what&amp;rsquo;s &lt;em&gt;missing&lt;/em&gt;: you never have to reconcile FAISS&amp;rsquo;s similarity scores with BM25&amp;rsquo;s completely different scale, because ranks are the only input. The constant 60 comes straight from the original 2009 research paper, and yes, it&amp;rsquo;s cited in the code.&lt;/p&gt;
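&lt;p&gt;The formula is short enough to run by hand. The sketch below assumes ranks start at 1 and uses made-up chunk ids; the function name &lt;code&gt;rrf&lt;/code&gt; is mine, not necessarily what the codebase calls it.&lt;/p&gt;

```python
def rrf(rankings, k=60):
    """Fuse ranked lists of ids: each list contributes 1 / (k + rank), with ranks starting at 1."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


semantic = ["c1", "c2", "c3"]  # order returned by the vector search
keyword = ["c4", "c2", "c1"]   # order returned by the keyword search

fused = rrf([semantic, keyword])
print(fused)
```

&lt;p&gt;Chunks &lt;code&gt;c1&lt;/code&gt; and &lt;code&gt;c2&lt;/code&gt; appear in both lists, so they outrank &lt;code&gt;c4&lt;/code&gt; even though the keyword search put &lt;code&gt;c4&lt;/code&gt; first. No similarity score from either search is ever looked at, only the positions.&lt;/p&gt;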
&lt;p&gt;A few implementation details worth knowing: both searches deliberately over-fetch (at least 20 candidates each) so the fusion has material to work with; very weak semantic matches are dropped, but a keyword-perfect chunk can still be rescued through fusion; and the final answer uses the top 7 chunks. I benchmarked this whole setup against pure vector search in a separate post, if you want the war stories.&lt;/p&gt;
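&lt;p&gt;To make the formula concrete, here is a minimal sketch of rank fusion in Python. It is not CogniVault&amp;rsquo;s actual code: the function name &lt;code&gt;rrf_fuse&lt;/code&gt; and the chunk ids are made up for illustration, and only the constant 60 and the top-7 cut-off come from the description above.&lt;/p&gt;

```python
from collections import defaultdict

RRF_K = 60  # the constant quoted above


def rrf_fuse(ranked_lists, top_n=7, k=RRF_K):
    """Fuse ranked lists of chunk ids using only each id's position."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based: the best hit in a list contributes 1 / (k + 1).
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:top_n]


semantic = ["c3", "c1", "c7"]  # hypothetical FAISS ranking
keyword = ["c9", "c3", "c1"]   # hypothetical BM25 ranking
print(rrf_fuse([semantic, keyword]))  # ['c3', 'c1', 'c9', 'c7']
```

&lt;p&gt;Chunks &lt;code&gt;c3&lt;/code&gt; and &lt;code&gt;c1&lt;/code&gt; appear in both lists, so they outrank &lt;code&gt;c9&lt;/code&gt; even though &lt;code&gt;c9&lt;/code&gt; was the keyword librarian&amp;rsquo;s first pick.&lt;/p&gt;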
&lt;h2 id="the-agent-a-model-that-decides-for-itself"&gt;The agent: a model that decides for itself&lt;/h2&gt;
&lt;p&gt;Here&amp;rsquo;s the second idea that trips up beginners: CogniVault&amp;rsquo;s chat is not &amp;ldquo;paste chunks into a prompt, get an answer.&amp;rdquo; It&amp;rsquo;s an &lt;strong&gt;agent&lt;/strong&gt; — a model running in a loop where it can &lt;em&gt;choose&lt;/em&gt; to call tools, read their results, and only then answer.&lt;/p&gt;
&lt;p&gt;Built with the Strands Agents SDK, the agent gets six tools:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Job&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_knowledge_base&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The core RAG tool — runs the hybrid search above, returns chunks with source and page&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list_documents&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;See what&amp;rsquo;s in the vault&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;analyze_document&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Structured analysis of one document: topics, entities, facts, summary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;compare_documents&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Answer a question by comparing two documents side by side&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;calculator&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Safe maths — the expression is parsed into a syntax tree and only whitelisted operators run. No &lt;code&gt;eval()&lt;/code&gt;, ever&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;current_time&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The date and time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;There is no hard-coded routing. The &lt;em&gt;model&lt;/em&gt; reads your question and decides which tools to call, guided by its system prompt. Ask &amp;ldquo;compare the two contracts on termination clauses&amp;rdquo; and it reaches for &lt;code&gt;compare_documents&lt;/code&gt;; ask &amp;ldquo;what&amp;rsquo;s 15% of 2,340&amp;rdquo; and it uses the calculator instead of hallucinating arithmetic.&lt;/p&gt;
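&lt;p&gt;The calculator row in the table deserves a closer look. The sketch below shows the general technique of evaluating arithmetic by walking a syntax tree with Python&amp;rsquo;s standard &lt;code&gt;ast&lt;/code&gt; module. The whitelist and the name &lt;code&gt;safe_eval&lt;/code&gt; are assumptions for illustration, not the operators CogniVault actually allows.&lt;/p&gt;

```python
import ast
import operator

# Whitelisted operators for this sketch (an assumption, not the real list).
ALLOWED_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
ALLOWED_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression):
    """Evaluate basic arithmetic by walking the syntax tree; eval() is never called."""
    def walk(node):
        # Plain numbers only: strings, booleans and names are rejected.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in ALLOWED_BINARY:
            return ALLOWED_BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in ALLOWED_UNARY:
            return ALLOWED_UNARY[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


print(safe_eval("2 + 3 * 4"))  # 14
```

&lt;p&gt;Anything outside the whitelist, such as a function call or an attribute lookup, reaches the final &lt;code&gt;raise&lt;/code&gt; and is refused before any of it runs.&lt;/p&gt;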
&lt;p&gt;Two safety details I want beginners to notice, because they&amp;rsquo;re the difference between a toy and a product: a &lt;strong&gt;fresh agent is constructed for every request&lt;/strong&gt; (no shared state bleeding between concurrent chats), and the document-analysis tools call the model &lt;em&gt;directly&lt;/em&gt; rather than through the agent — otherwise an agent calling a tool that calls the agent could recurse forever.&lt;/p&gt;
&lt;h2 id="watching-the-model-think"&gt;Watching the model think&lt;/h2&gt;
&lt;p&gt;When you send a message, the response streams back as &lt;strong&gt;NDJSON&lt;/strong&gt; (Newline-Delimited JSON — each line of the stream is its own small JSON object). And it arrives in two phases:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Phase 1 — thinking.&lt;/strong&gt; Gemma&amp;rsquo;s reasoning chain streams first, rendered in the collapsible panel above the answer. It&amp;rsquo;s deliberately best-effort: if it fails for any reason, the answer still comes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Phase 2 — the agent answer.&lt;/strong&gt; Tools run, citations appear in the Sources panel the moment the search completes — &lt;em&gt;before&lt;/em&gt; the answer finishes writing — and the answer text streams in.&lt;/p&gt;
&lt;div class="mermaid"&gt;flowchart TB
Q["Your question&lt;br/&gt;(plus optional images, files, scope)"] --&gt; P1
subgraph STREAM["POST /rag — one NDJSON stream"]
P1["Phase 1: Thinking&lt;br/&gt;reasoning chunks stream first"]
P1 --&gt; P2["Phase 2: Agent&lt;br/&gt;fresh per request, history restored"]
P2 --&gt;|"decides to call"| T["search_knowledge_base"]
T --&gt; D["FAISS&lt;br/&gt;semantic"]
T --&gt; S["BM25&lt;br/&gt;keywords"]
D --&gt; RRF["RRF fusion — top 7 chunks"]
S --&gt; RRF
RRF --&gt;|"chunks + citations"| P2
P2 --&gt; OUT["citations, then answer text,&lt;br/&gt;then a memory-usage report"]
end
&lt;/div&gt;
&lt;p&gt;Each line in the stream is typed: &lt;code&gt;thinking&lt;/code&gt;, &lt;code&gt;metadata&lt;/code&gt; (a citation), &lt;code&gt;text&lt;/code&gt; (answer), &lt;code&gt;memory&lt;/code&gt; (how full the conversation budget is), or &lt;code&gt;error&lt;/code&gt;. The frontend just reads lines and routes them to the right panel. I dissected this design, and why thinking comes &lt;em&gt;before&lt;/em&gt; the tool calls, in a separate post.&lt;/p&gt;
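&lt;p&gt;Reading a typed NDJSON stream takes very little code. The sketch below is in Python rather than the frontend&amp;rsquo;s own language, and it assumes each event carries a &lt;code&gt;type&lt;/code&gt; and a &lt;code&gt;content&lt;/code&gt; field. The type names come from the list above; the &lt;code&gt;content&lt;/code&gt; field name and the sample payloads are invented for illustration.&lt;/p&gt;

```python
import json

# Hypothetical stream: one JSON object per line.
raw_stream = "\n".join([
    json.dumps({"type": "thinking", "content": "User asks about clause 4."}),
    json.dumps({"type": "metadata", "content": {"source": "contract.pdf", "page": 3}}),
    json.dumps({"type": "text", "content": "Clause 4 covers "}),
    json.dumps({"type": "text", "content": "termination."}),
    json.dumps({"type": "memory", "content": {"used": 1200, "budget": 48000}}),
])


def route_stream(stream_text):
    """Group NDJSON events by their type, parsing one line at a time."""
    panels = {"thinking": [], "metadata": [], "text": [], "memory": [], "error": []}
    for line in stream_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between events
        event = json.loads(line)
        panels.setdefault(event["type"], []).append(event["content"])
    return panels


panels = route_stream(raw_stream)
print("".join(panels["text"]))  # Clause 4 covers termination.
```

&lt;p&gt;Because every line is a complete JSON object, the reader never has to wait for the end of the response before it can act on an event.&lt;/p&gt;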
&lt;h2 id="a-memory-budget-not-a-bottomless-pit"&gt;A memory budget, not a bottomless pit&lt;/h2&gt;
&lt;p&gt;Gemma&amp;rsquo;s context window (the amount of text the model can consider at once) is 128K tokens, but CogniVault doesn&amp;rsquo;t let conversation history sprawl across all of it. Each chat session gets a budget of 48,000 characters — roughly 12,000 tokens. Exceed it, and the &lt;em&gt;oldest&lt;/em&gt; question-answer pair quietly drops out first, keeping the bulk of the window free for what matters: your current question and the retrieved chunks.&lt;/p&gt;
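&lt;p&gt;A character budget like this can be enforced with a short loop. The sketch below assumes history is stored as a list of (question, answer) string pairs, oldest first; that layout and the name &lt;code&gt;trim_history&lt;/code&gt; are assumptions, and only the 48,000-character figure comes from the text.&lt;/p&gt;

```python
HISTORY_BUDGET_CHARS = 48_000  # figure quoted above


def history_size(pairs):
    """Total characters across all question-answer pairs."""
    return sum(len(question) + len(answer) for question, answer in pairs)


def trim_history(pairs, budget=HISTORY_BUDGET_CHARS):
    """Drop the oldest pairs until the remaining history fits the budget."""
    kept = list(pairs)  # copy, so the caller's list is not mutated
    while kept and history_size(kept) > budget:
        kept.pop(0)  # index 0 is the oldest pair
    return kept


history = [("q1" * 5, "a1" * 5), ("q2" * 5, "a2" * 5), ("q3" * 5, "a3" * 5)]
print(len(trim_history(history, budget=45)))  # 2
```

&lt;p&gt;Each pair in the example is 20 characters, so three pairs overshoot a budget of 45 and the oldest one is dropped.&lt;/p&gt;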
&lt;p&gt;Two resilience touches worth stealing for your own projects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Restart survival.&lt;/strong&gt; In-memory history dies with the process. So the first message in a session after a backend restart rebuilds its history from the chat log the frontend persists. Multi-turn memory survives reboots.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Edit and regenerate.&lt;/strong&gt; Editing an earlier message rewinds the stored history to that point before re-asking — the model genuinely forgets the timeline that no longer exists.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="scope-pinning-the-ai-to-specific-documents"&gt;Scope: pinning the AI to specific documents&lt;/h2&gt;
&lt;p&gt;One last feature, and a lesson about small local models. You can pin a chat to specific files or a category. The filter travels with the request &lt;em&gt;and&lt;/em&gt; a mandatory-search instruction is injected into both the system prompt and the user message itself.&lt;/p&gt;
&lt;p&gt;Why both? Because small models sometimes skip instructions that live only in the system prompt — but they can&amp;rsquo;t ignore what&amp;rsquo;s inside the question. Belt and braces. When you work with 4-billion-parameter models instead of frontier ones, you learn to make instructions impossible to miss rather than hoping they&amp;rsquo;re followed.&lt;/p&gt;
&lt;h2 id="the-takeaway"&gt;The takeaway&lt;/h2&gt;
&lt;p&gt;A cited answer is four systems cooperating: two retrievers covering each other&amp;rsquo;s blind spots, a fusion formula that needs nothing but ranks, an agent that picks its own tools, and a stream that shows its work. None of the four is exotic on its own — the product is the cooperation.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="appendix-abbreviations-in-this-post"&gt;Appendix: Abbreviations in this post&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Abbreviation&lt;/th&gt;
&lt;th&gt;Full form&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAG&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Retrieval-Augmented Generation&lt;/td&gt;
&lt;td&gt;Retrieve relevant passages from your own documents first; let the model answer from them&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FAISS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Facebook AI Similarity Search&lt;/td&gt;
&lt;td&gt;The semantic (meaning-based) half of hybrid search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BM25&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Best Matching 25&lt;/td&gt;
&lt;td&gt;The keyword half — a classic ranking formula from the Okapi information-retrieval system&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RRF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reciprocal Rank Fusion&lt;/td&gt;
&lt;td&gt;Merges the two ranked lists using only each chunk&amp;rsquo;s rank: &lt;code&gt;score = Σ 1/(60 + rank)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NDJSON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Newline-Delimited JSON&lt;/td&gt;
&lt;td&gt;A stream where each line is its own complete JSON object — the chat response format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JavaScript Object Notation&lt;/td&gt;
&lt;td&gt;The universal text format for structured data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AST&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Abstract Syntax Tree&lt;/td&gt;
&lt;td&gt;The parsed form of an expression — how the calculator does maths without &lt;code&gt;eval()&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large Language Model&lt;/td&gt;
&lt;td&gt;A neural network trained on huge amounts of text that can read and generate language&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SDK&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Software Development Kit&lt;/td&gt;
&lt;td&gt;A library of building blocks — here, Strands, which provides the agent loop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;K&lt;/strong&gt; (in 128K)&lt;/td&gt;
&lt;td&gt;Kilo (thousand)&lt;/td&gt;
&lt;td&gt;128K tokens ≈ 128,000 tokens — Gemma&amp;rsquo;s context window&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Next up:&lt;/strong&gt; Part 4, where the same machinery is pointed at generating quizzes, workshops, flashcards, and mindmaps, plus a table of every byte the app stores and exactly where it lives.&lt;/p&gt;</description></item><item><title>CogniVault Backend Explained, Part 4 · Study Tools, Progress, and the Privacy Receipts</title><link>https://aretascodes.dev/blog/backend-explained-study-hub-privacy/</link><pubDate>Fri, 12 Jun 2026 00:00:00 +0000</pubDate><guid>https://aretascodes.dev/blog/backend-explained-study-hub-privacy/</guid><description>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;All abbreviations are fully explained in the appendix at the bottom of the page.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In Part 3 we followed a question through hybrid retrieval and the agent loop to a cited answer. In this final part, the same machinery gets pointed at a different goal: &lt;em&gt;teaching you&lt;/em&gt;. Then we close the series by auditing the project&amp;rsquo;s central promise: nothing leaves your machine.&lt;/p&gt;
&lt;h2 id="one-recipe-four-study-tools"&gt;One recipe, four study tools&lt;/h2&gt;
&lt;p&gt;CogniVault generates quizzes, multi-lesson workshops, flashcard decks, and mindmaps from your documents. Four different outputs — but under the hood, one shared five-step recipe:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Retrieve.&lt;/strong&gt; The same hybrid search from Part 3, but instead of your question, the probe is a broad query like &lt;em&gt;&amp;ldquo;key concepts, definitions, important facts, main ideas&amp;rdquo;&lt;/em&gt;, scoped to the documents you selected. Up to 15 representative chunks come back.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Prompt from a template.&lt;/strong&gt; The instructions sent to Gemma are not buried in Python — they&amp;rsquo;re editable Markdown files in &lt;code&gt;backend/prompts/&lt;/code&gt; (&lt;code&gt;quiz.md&lt;/code&gt;, &lt;code&gt;flashcards.md&lt;/code&gt;, and so on). Drop a modified copy into &lt;code&gt;backend/prompts/custom/&lt;/code&gt; and it overrides the shipped version on the very next request. No restart, no code change. Prompt engineering as configuration.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Constrain the output.&lt;/strong&gt; Asking a small local model to &amp;ldquo;please return JSON&amp;rdquo; works most of the time, and &lt;em&gt;most of the time&lt;/em&gt; is a production bug. CogniVault uses Ollama&amp;rsquo;s grammar-constrained generation (&lt;code&gt;format=&amp;quot;json&amp;quot;&lt;/code&gt;), which makes invalid JSON impossible rather than unlikely, plus low temperature for consistency. The full saga of getting reliable structure out of a 4-billion-parameter model is in a separate post.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Validate defensively.&lt;/strong&gt; Every generated item is checked field by field, and malformed items are &lt;em&gt;dropped&lt;/em&gt; rather than failing the whole batch. Small models occasionally fumble one question out of ten; a product shouldn&amp;rsquo;t collapse because of it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Persist.&lt;/strong&gt; Everything lands in SQLite, so quizzes are resumable, workshop progress survives restarts, and flashcard statuses are remembered per deck.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
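&lt;p&gt;Step 2 of the recipe, the override folder, can be sketched in a few lines. The directory names come from the text; the function name &lt;code&gt;load_prompt&lt;/code&gt; and the exact lookup order are assumptions about how such an override could work, not a copy of the real loader.&lt;/p&gt;

```python
from pathlib import Path


def load_prompt(name, prompts_dir="backend/prompts"):
    """Return the custom template if one exists, otherwise the shipped one."""
    base = Path(prompts_dir)
    custom = base / "custom" / name
    chosen = custom if custom.is_file() else base / name
    # Reading from disk on every call is what lets an edit apply
    # to the next request without a restart.
    return chosen.read_text(encoding="utf-8")
```

&lt;p&gt;Calling &lt;code&gt;load_prompt(&amp;quot;quiz.md&amp;quot;)&lt;/code&gt; would return the shipped template until a file of the same name appears under &lt;code&gt;custom/&lt;/code&gt;. The trade-off is one small file read per request in exchange for never caching a stale prompt.&lt;/p&gt;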
&lt;p&gt;Here&amp;rsquo;s the recipe in motion for a quiz:&lt;/p&gt;
&lt;div class="mermaid"&gt;%%{init: {'sequence': {'actorFontSize': 28, 'messageFontSize': 24, 'loopTextFontSize': 22, 'noteFontSize': 22}}}%%
sequenceDiagram
actor U as You
participant F as Study Hub UI
participant B as FastAPI
participant V as VectorDB
participant O as Ollama (gemma4:e4b)
participant S as SQLite
U-&gt;&gt;F: Pick scope, difficulty, question count
F-&gt;&gt;B: POST /api/study/quiz/generate
B-&gt;&gt;V: Hybrid search, scoped to your documents
V--&gt;&gt;B: Up to 15 representative chunks
B-&gt;&gt;B: Render the quiz.md prompt template
B-&gt;&gt;O: chat(format="json", low temperature)
O--&gt;&gt;B: Grammar-constrained JSON
B-&gt;&gt;B: Validate each question, drop bad ones
B-&gt;&gt;S: Save quiz (resumable later)
B--&gt;&gt;F: Typed response
F--&gt;&gt;U: Play, submit, score — and maybe a new badge
&lt;/div&gt;
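&lt;p&gt;The &amp;ldquo;validate each question, drop bad ones&amp;rdquo; step might look something like the sketch below. The field names (&lt;code&gt;question&lt;/code&gt;, &lt;code&gt;options&lt;/code&gt;, &lt;code&gt;answer&lt;/code&gt;) and the allowed option counts are assumptions made for the example; the real schema may differ.&lt;/p&gt;

```python
def validate_questions(raw_items):
    """Keep well-formed quiz items and drop the rest instead of failing the batch."""
    valid = []
    for item in raw_items:
        if not isinstance(item, dict):
            continue  # the model returned something that is not an object
        question = item.get("question")
        options = item.get("options")
        answer = item.get("answer")
        if not isinstance(question, str) or not question.strip():
            continue
        # Assumption: 2 options for true/false, 4 for multiple choice.
        if not isinstance(options, list) or len(options) not in (2, 4):
            continue
        if answer not in options:
            continue  # the marked answer must be one of the choices
        valid.append(item)
    return valid


batch = [
    {"question": "RRF uses scores?", "options": ["True", "False"], "answer": "False"},
    {"question": "", "options": ["True", "False"], "answer": "True"},
    {"question": "What is BM25?", "options": ["A", "B", "C", "D"], "answer": "E"},
    "not even an object",
]
print(len(validate_questions(batch)))  # 1
```

&lt;p&gt;One good question out of four survives, and the caller still gets a usable quiz rather than an error.&lt;/p&gt;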
&lt;p&gt;The four tools differ only in their template and their shape: quizzes produce multiple-choice and true/false questions with explanations; workshops produce an outline first and then write each lesson &lt;em&gt;on demand&lt;/em&gt; when you open it; flashcards produce front/back pairs; mindmaps produce a topic tree that the frontend renders as an interactive diagram. (That renderer is its own adventure, covered in a separate post.)&lt;/p&gt;
&lt;h2 id="sessions-that-track-themselves"&gt;Sessions that track themselves&lt;/h2&gt;
&lt;p&gt;Most study apps make you press a start button, and most people forget. CogniVault takes a different stance: &lt;strong&gt;study sessions are inferred, not declared&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Every chat message either extends the current session or — after a 15-minute idle gap — quietly starts a new one. Walk away for coffee, come back, keep working: same session. Come back tomorrow: new session. No buttons, no forgetting.&lt;/p&gt;
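&lt;p&gt;The idle-gap rule is easy to express in code. This sketch assumes message timestamps arrive in ascending order and that a gap strictly longer than 15 minutes starts a new session; whether the real boundary is inclusive is not stated, so treat that detail as a guess.&lt;/p&gt;

```python
from datetime import datetime, timedelta

IDLE_GAP = timedelta(minutes=15)  # idle gap quoted above


def assign_sessions(timestamps, gap=IDLE_GAP):
    """Label each message timestamp with an inferred session number."""
    labels = []
    session = 0
    previous = None
    for ts in timestamps:
        # The first message, or one arriving after a long pause, opens a session.
        if previous is None or ts - previous > gap:
            session += 1
        labels.append(session)
        previous = ts
    return labels


start = datetime(2026, 6, 12, 9, 0)
messages = [start, start + timedelta(minutes=5), start + timedelta(minutes=30)]
print(assign_sessions(messages))  # [1, 1, 2]
```

&lt;p&gt;The second message lands 5 minutes after the first and stays in session 1. The third arrives 25 minutes after the second, so it opens session 2.&lt;/p&gt;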
&lt;p&gt;Each message also records a tiny event (timestamp, whether you used a scope filter or attachments) into &lt;code&gt;progress.db&lt;/code&gt; — a SQLite database, which is a complete relational database living in a single file. Eleven tables hold everything: sessions, message events, earned badges, quiz attempts and saved quizzes, workshops and lessons, decks and cards, and mindmaps.&lt;/p&gt;
&lt;p&gt;One engineering note worth copying: the tracking call inside the chat endpoint is wrapped so that it can &lt;em&gt;never&lt;/em&gt; block or break the chat. Analytics must be a passenger, never a driver.&lt;/p&gt;
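&lt;p&gt;A minimal sketch of that &amp;ldquo;passenger&amp;rdquo; idea is a wrapper that swallows and logs any tracking failure. The name &lt;code&gt;track_safely&lt;/code&gt; is hypothetical, and this version only covers the &amp;ldquo;never break&amp;rdquo; half; keeping tracking from blocking would additionally need a timeout or a background task.&lt;/p&gt;

```python
import logging

logger = logging.getLogger("progress")


def track_safely(track, *args, **kwargs):
    """Run a tracking callable; report failure instead of raising it."""
    try:
        track(*args, **kwargs)
        return True
    except Exception:
        # Log with traceback, then let the chat request carry on.
        logger.exception("progress tracking failed; chat continues")
        return False


def broken_tracker():
    raise RuntimeError("database is locked")


print(track_safely(broken_tracker))  # False
```

&lt;p&gt;The chat endpoint would call the wrapper and ignore the return value, so a locked database costs a log line rather than a failed answer.&lt;/p&gt;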
&lt;h2 id="25-badges-defined-as-data"&gt;25 badges, defined as data&lt;/h2&gt;
&lt;p&gt;The achievements aren&amp;rsquo;t scattered through the code as &lt;code&gt;if&lt;/code&gt; statements. They live in one JSON file — 25 entries, each with a code, a name, an icon, the metric it watches, and a target. After each relevant action, an evaluator checks every definition against the database and persists anything newly earned. Some badges form ladders, each pointing to its next level.&lt;/p&gt;
&lt;p&gt;Declarative beats imperative here for a simple reason: adding badge number 26 means adding a JSON entry, not writing new logic. The design behind the streaks, the idle-gap rule, and the 90-day heatmap got its own post.&lt;/p&gt;
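&lt;p&gt;Here is a small sketch of what &amp;ldquo;badges as data&amp;rdquo; can look like. The three definitions, the metric names, and the &lt;code&gt;next&lt;/code&gt; field are invented for the example; only the general shape (a code, a name, a metric, a target) follows the description above.&lt;/p&gt;

```python
import json

# Hypothetical badge definitions, as they might appear in a JSON file.
BADGES_JSON = """
[
  {"code": "first_quiz", "name": "First Quiz", "metric": "quizzes_completed", "target": 1, "next": "quiz_10"},
  {"code": "quiz_10", "name": "Quiz Regular", "metric": "quizzes_completed", "target": 10, "next": null},
  {"code": "streak_7", "name": "Week Streak", "metric": "streak_days", "target": 7, "next": null}
]
"""


def newly_earned(definitions, metrics, already_earned):
    """Return codes whose target is met and that have not been awarded before."""
    earned = []
    for badge in definitions:
        value = metrics.get(badge["metric"], 0)
        if value >= badge["target"] and badge["code"] not in already_earned:
            earned.append(badge["code"])
    return earned


definitions = json.loads(BADGES_JSON)
current = {"quizzes_completed": 3, "streak_days": 7}
print(newly_earned(definitions, current, {"first_quiz"}))  # ['streak_7']
```

&lt;p&gt;The evaluator contains no badge-specific logic at all, which is why a new badge only needs a new entry in the data.&lt;/p&gt;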
&lt;h2 id="voice-input-without-a-cloud-microphone"&gt;Voice input, without a cloud microphone&lt;/h2&gt;
&lt;p&gt;The microphone button is powered by &lt;strong&gt;faster-whisper&lt;/strong&gt; — OpenAI&amp;rsquo;s Whisper speech-recognition model re-implemented on a faster inference engine — running on your CPU with int8 quantisation (8-bit numbers instead of 32-bit: smaller, faster, accurate enough). No audio ever leaves the machine.&lt;/p&gt;
&lt;p&gt;The model is lazy-loaded on the first transcription so app startup stays instant, and if faster-whisper isn&amp;rsquo;t installed at all, the frontend simply hides the mic button. Features should degrade, not detonate.&lt;/p&gt;
&lt;h2 id="the-privacy-receipts"&gt;The privacy receipts&lt;/h2&gt;
&lt;p&gt;The series began with a promise: &lt;em&gt;nothing leaves your machine.&lt;/em&gt; Promises are cheap — here&amp;rsquo;s the audit. Every byte CogniVault stores, and where it lives:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data&lt;/th&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Your uploaded files&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docs/&lt;/code&gt; folder&lt;/td&gt;
&lt;td&gt;The original files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search vectors&lt;/td&gt;
&lt;td&gt;&lt;code&gt;vector_store.faiss&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;FAISS binary index&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chunk text and metadata&lt;/td&gt;
&lt;td&gt;&lt;code&gt;vector_store.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;File-to-category map&lt;/td&gt;
&lt;td&gt;&lt;code&gt;categories.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chat sessions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;chat_history.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sessions, badges, quizzes, workshops, decks, mindmaps&lt;/td&gt;
&lt;td&gt;&lt;code&gt;progress.db&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SQLite&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ingestion checkpoints&lt;/td&gt;
&lt;td&gt;PostgreSQL (local Docker volume)&lt;/td&gt;
&lt;td&gt;DBOS system tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The AI models themselves&lt;/td&gt;
&lt;td&gt;Ollama&amp;rsquo;s local model store&lt;/td&gt;
&lt;td&gt;Model weights&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Nothing in that table is on someone else&amp;rsquo;s computer. Inference goes to &lt;code&gt;localhost&lt;/code&gt;. Embeddings go to &lt;code&gt;localhost&lt;/code&gt;. The only outbound request the backend ever makes is the URL-import feature — at your explicit request, and guarded against fetching private addresses. The app even surfaces these stats live in its Privacy Vault Audit panel.&lt;/p&gt;
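&lt;p&gt;For readers curious what &amp;ldquo;guarded against fetching private addresses&amp;rdquo; can mean in practice, here is one common approach using only the Python standard library. It is a sketch of the general technique, not CogniVault&amp;rsquo;s importer: both function names are hypothetical, and a production guard would also need to pin the resolved address for the actual request.&lt;/p&gt;

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_ip(address):
    """True only for globally routable addresses (not loopback, private, link-local)."""
    try:
        return ipaddress.ip_address(address).is_global
    except ValueError:
        return False  # not a parseable IP address: refuse


def url_is_safe_to_fetch(url):
    """Resolve the host and refuse the URL if any resolved address is not public."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, parsed.port or 443)
    except socket.gaierror:
        return False
    # Each entry ends with a socket address whose first element is the IP.
    return all(is_public_ip(info[4][0]) for info in infos)


print(url_is_safe_to_fetch("http://127.0.0.1:8000/admin"))  # False
```

&lt;p&gt;Checking the resolved addresses rather than the hostname string matters, because a harmless-looking domain can point at an internal address.&lt;/p&gt;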
&lt;p&gt;And because trust needs more than a table: the whole backend is covered by a pytest suite you can run yourself, and the approach is documented in a separate post.&lt;/p&gt;
&lt;h2 id="series-wrap-up"&gt;Series wrap-up&lt;/h2&gt;
&lt;p&gt;Four parts, one architecture:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Part 1&lt;/strong&gt;: three processes, four layers, and a decoder ring for the jargon&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Part 2&lt;/strong&gt;: a durable, format-aware pipeline that turns any document into searchable vectors&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Part 3&lt;/strong&gt;: two retrievers covering each other&amp;rsquo;s blind spots, fused by rank, driven by an agent&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Part 4&lt;/strong&gt;: the same machinery generating study materials, tracking progress without buttons, and a storage map with no cloud rows in it&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If there&amp;rsquo;s one theme, it&amp;rsquo;s this: &lt;strong&gt;boring, verifiable choices in service of privacy&lt;/strong&gt;. Exact search instead of approximate. SQLite files instead of hosted databases. Grammar-constrained JSON instead of hopeful parsing. Soft deletes instead of clever index surgery. Every piece is something you can open, read, and check — which is exactly the point.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="appendix-abbreviations-in-this-post"&gt;Appendix: Abbreviations in this post&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Abbreviation&lt;/th&gt;
&lt;th&gt;Full form&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JavaScript Object Notation&lt;/td&gt;
&lt;td&gt;The structured format the generators force the model to produce&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQLite / SQL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;(SQL = Structured Query Language)&lt;/td&gt;
&lt;td&gt;A complete relational database living in one file, &lt;code&gt;progress.db&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multiple-Choice Question&lt;/td&gt;
&lt;td&gt;One of the two quiz question types (the other is true/false)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Central Processing Unit&lt;/td&gt;
&lt;td&gt;Where Whisper runs — no graphics card required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;int8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8-bit integer (quantisation)&lt;/td&gt;
&lt;td&gt;Storing model weights as small integers: smaller, faster, accurate enough&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Artificial Intelligence&lt;/td&gt;
&lt;td&gt;Software performing tasks that normally need human intelligence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Application Programming Interface&lt;/td&gt;
&lt;td&gt;The endpoints the Study Hub and dashboard call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FAISS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Facebook AI Similarity Search&lt;/td&gt;
&lt;td&gt;The vector index in the privacy-receipts table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DBOS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Database-Oriented Operating System&lt;/td&gt;
&lt;td&gt;The durable-workflow library whose checkpoints live in PostgreSQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSRF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Server-Side Request Forgery&lt;/td&gt;
&lt;td&gt;The attack class the URL importer guards against&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PNG / PDF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Portable Network Graphics / Portable Document Format&lt;/td&gt;
&lt;td&gt;Two of the mindmap export formats (plus Markdown)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SVG&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scalable Vector Graphics&lt;/td&gt;
&lt;td&gt;The browser drawing format behind the interactive mindmap rendering&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Next steps:&lt;/strong&gt; clone the repository and read along. The README maps the full architecture, and every claim in this series can be checked directly against the code in &lt;code&gt;backend/&lt;/code&gt;. And if you want the deep-dive versions of these topics, the
picks up where this tour ends.&lt;/p&gt;</description></item></channel></rss>