<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Engineering |</title><link>https://aretascodes.dev/categories/engineering/</link><atom:link href="https://aretascodes.dev/categories/engineering/index.xml" rel="self" type="application/rss+xml"/><description>Engineering</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Mon, 25 May 2026 00:00:00 +0000</lastBuildDate><image><url>https://aretascodes.dev/media/icon_hu_2ab4f4763b27c75b.png</url><title>Engineering</title><link>https://aretascodes.dev/categories/engineering/</link></image><item><title>Part 8 · Testing a Local-AI App: 351 Tests, Zero Infrastructure</title><link>https://aretascodes.dev/blog/testing-local-ai-app/</link><pubDate>Mon, 25 May 2026 00:00:00 +0000</pubDate><guid>https://aretascodes.dev/blog/testing-local-ai-app/</guid><description>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;Part of a series on building
. Previously:
.
All abbreviations are fully explained in the appendix at the bottom of the page.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;CogniVault has &lt;strong&gt;351 tests across 22 files&lt;/strong&gt; (at the time of writing — the suite grows with the app). None of them need Ollama. None of them need Postgres. None of them need a real PDF, a microphone, or an internet connection. The whole suite runs in &lt;strong&gt;about three seconds&lt;/strong&gt; on my laptop.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s not because there isn&amp;rsquo;t much to test — the surface is wide. It&amp;rsquo;s because the test suite is built around one principle: &lt;strong&gt;mock at the edge, real everywhere else.&lt;/strong&gt; This post is about what &amp;ldquo;the edge&amp;rdquo; means in a local-AI app, and how to draw the line so the suite stays useful instead of decorative.&lt;/p&gt;
&lt;h2 id="the-22-test-files"&gt;The 22 test files&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;What it covers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_api.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The HTTP endpoints (upload, ingest, RAG, history, KB browsing)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_tools.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Calculator, clock, KB search tool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_thinking.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Two-phase stream, thinking tokens, session isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_chat_attachments.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Multi-file attach, PDF/DOCX extraction, size limits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_chat_memory.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Session history budget, trimming, restart rebuild&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_doc_scope_filter.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Per-request ContextVar isolation, search filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_doc_tools.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;list_documents&lt;/code&gt;, &lt;code&gt;analyze_document&lt;/code&gt;, &lt;code&gt;compare_documents&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_edit_regenerate.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;History rewind, trim_history_to_turns validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_structure_chunking.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Markdown header splits, CSV row batches, doc types&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_ocr_fallback.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OCR trigger threshold, graceful degradation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_new_formats.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;PPTX, XLSX, HTML extractors, extension routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_docx_url.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;DOCX ingestion and URL import (with the SSRF guard)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_reingest.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SHA-256 change detection, idempotency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_vector_db.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;BM25, FAISS, RRF fusion, hybrid search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_audio.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Whisper transcription endpoint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_progress.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sessions, daily aggregation, achievement criteria&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_prompts.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The prompt-template loader and custom overrides&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_vault_stats.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The Privacy Vault Audit numbers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;test_quiz.py&lt;/code&gt; / &lt;code&gt;test_workshop.py&lt;/code&gt; / &lt;code&gt;test_flashcards.py&lt;/code&gt; / &lt;code&gt;test_mindmaps.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Per-mode parsing, endpoints, achievements&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Everything that &lt;em&gt;can&lt;/em&gt; be tested in isolation is tested in isolation. Everything that needs to be tested through the FastAPI layer is, but the &lt;em&gt;only&lt;/em&gt; things mocked are the calls that cross the process boundary.&lt;/p&gt;
&lt;h2 id="what-gets-mocked-what-doesnt"&gt;What gets mocked, what doesn&amp;rsquo;t&lt;/h2&gt;
&lt;p&gt;The single most important question in a project like this: where do you stub?&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-gdscript3" data-lang="gdscript3"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="n"&gt;React&lt;/span&gt; &lt;span class="n"&gt;frontend&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="err"&gt;←─&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;backend&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="err"&gt;▼&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt; &lt;span class="n"&gt;handlers&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="err"&gt;←─&lt;/span&gt; &lt;span class="n"&gt;tested&lt;/span&gt; &lt;span class="n"&gt;directly&lt;/span&gt; &lt;span class="n"&gt;with&lt;/span&gt; &lt;span class="n"&gt;TestClient&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="err"&gt;▼&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="n"&gt;services&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="err"&gt;←─&lt;/span&gt; &lt;span class="n"&gt;tested&lt;/span&gt; &lt;span class="n"&gt;directly&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector_db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rag_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;generators&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="err"&gt;│&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="err"&gt;├─►&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;BM25&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="err"&gt;←─&lt;/span&gt; &lt;span class="n"&gt;real&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fast&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="err"&gt;├─►&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="n"&gt;SQLite&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="err"&gt;←─&lt;/span&gt; &lt;span class="n"&gt;real&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;against&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;tmp_path&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="err"&gt;├─►&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="n"&gt;DBOS&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="err"&gt;←─&lt;/span&gt; &lt;span class="n"&gt;patched&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt; &lt;span class="n"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;no&lt;/span&gt; &lt;span class="n"&gt;Postgres&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="err"&gt;├─►&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="n"&gt;Ollama&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="err"&gt;←─&lt;/span&gt; &lt;span class="n"&gt;patched&lt;/span&gt; &lt;span class="n"&gt;at&lt;/span&gt; &lt;span class="n"&gt;each&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;s import site&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="err"&gt;└─►&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="n"&gt;Whisper&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="err"&gt;←─&lt;/span&gt; &lt;span class="n"&gt;stubbed&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt; &lt;span class="mi"&gt;145&lt;/span&gt; &lt;span class="n"&gt;MB&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="nb"&gt;load&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The rule of thumb: &lt;strong&gt;anything that crosses a process or network boundary, mock. Anything in-process, run for real.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;FAISS and BM25 are real because they&amp;rsquo;re libraries we link into the test process. SQLite is real because it&amp;rsquo;s a file. DBOS is patched because launching it expects a Postgres connection, and that&amp;rsquo;s network. Ollama is patched because it&amp;rsquo;s HTTP. Whisper is stubbed because loading a 145 MB model in a unit test is silly.&lt;/p&gt;
&lt;p&gt;That principle keeps the test suite fast (no I/O the OS can&amp;rsquo;t handle in milliseconds) and meaningful (the real code paths through retrieval, chunking, parsing, scope filtering all execute).&lt;/p&gt;
&lt;h2 id="mocking-ollama"&gt;Mocking Ollama&lt;/h2&gt;
&lt;p&gt;Most CogniVault tests need &lt;em&gt;some&lt;/em&gt; model output, but they don&amp;rsquo;t care what model produced it. Each service imports the &lt;code&gt;ollama&lt;/code&gt; module directly, so the tests patch that reference &lt;strong&gt;at the service&amp;rsquo;s own import site&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Real pattern from test_quiz.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;unittest.mock&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;patch&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;backend.services&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;quiz_generator&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_quiz_parses_questions&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;fake&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;message&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;content&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;questions&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;VALID_MCQ&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;})}}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;patch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quiz_generator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;ollama&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;mock_ollama&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;mock_ollama&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;return_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fake&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;quiz_generator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate_quiz&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;difficulty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;beginner&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_questions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question_types&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;mcq&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A streaming variant feeds chunk sequences instead of a single response, used by the RAG and thinking tests. The key property: one &lt;code&gt;patch.object&lt;/code&gt; against the module the service actually uses. No deep mock hierarchies, no fragile string paths into third-party internals. Easy to read in a code review, easy to debug when a test fails.&lt;/p&gt;
&lt;h2 id="mocking-dbos"&gt;Mocking DBOS&lt;/h2&gt;
&lt;p&gt;DBOS expects &lt;code&gt;launch()&lt;/code&gt; to connect to Postgres. The shared &lt;code&gt;client&lt;/code&gt; fixture in &lt;code&gt;conftest.py&lt;/code&gt; simply patches the &lt;code&gt;dbos&lt;/code&gt; instance before the app is exercised:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Real pattern from conftest.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nd"&gt;@pytest.fixture&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;&amp;#34;&amp;#34;A FastAPI TestClient with DBOS launch mocked out — no Postgres needed.&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;patch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;backend.services.ingest.dbos&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;mock_dbos&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;mock_dbos&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;launch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MagicMock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;backend.main&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;TestClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The decorated workflow steps still execute as ordinary Python functions — we lose the durability semantics, but the tests aren&amp;rsquo;t testing durability, they&amp;rsquo;re testing the &lt;em&gt;business logic inside the steps&lt;/em&gt; (hash detection, extraction, chunking). The durability layer has its own tests upstream, in DBOS&amp;rsquo;s own suite.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a second isolation layer that runs on &lt;strong&gt;every&lt;/strong&gt; test automatically: an autouse fixture points the docs folder, FAISS index, and metadata file at a per-test &lt;code&gt;tmp_path&lt;/code&gt; via environment variables, so no test can ever touch real data on disk.&lt;/p&gt;
&lt;h2 id="real-sqlite-with-one-override"&gt;Real SQLite, with one override&lt;/h2&gt;
&lt;p&gt;Progress tracking, achievements, quiz storage, deck CRUD — all SQLite. The progress tracker exposes a single test seam: a module-level path override.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Real pattern from test_quiz.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nd"&gt;@pytest.fixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;autouse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_isolate_progress_db&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monkeypatch&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;monkeypatch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;setattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;progress_tracker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;_db_path_override&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tmp_path&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;progress_test.db&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Every test gets a fresh database file; the schema auto-creates on first use. No connection pooling drama, no leaked state between tests, no in-memory &lt;code&gt;:memory:&lt;/code&gt; gymnastics. Just a temp file per test.&lt;/p&gt;
&lt;p&gt;This is the kind of test that catches bugs an SQL-level mock would never see — a missing index, a botched migration, a constraint that doesn&amp;rsquo;t fire. SQLite is fast enough on every machine I&amp;rsquo;ve ever owned that &amp;ldquo;use the real database&amp;rdquo; isn&amp;rsquo;t even a trade-off.&lt;/p&gt;
&lt;h2 id="the-testclient-pattern"&gt;The TestClient pattern&lt;/h2&gt;
&lt;p&gt;For HTTP tests, FastAPI&amp;rsquo;s &lt;code&gt;TestClient&lt;/code&gt; runs the app in-process. The upload, the validation, the chunking, the vector-store update, the response serialisation — every layer runs for real. Only the calls that would leave the process (the Ollama embedding call inside ingestion, the model call inside generation) are patched. That&amp;rsquo;s the right line: the test verifies the &lt;em&gt;integration&lt;/em&gt; of those layers, but doesn&amp;rsquo;t depend on an external service.&lt;/p&gt;
&lt;p&gt;The streaming endpoint tests use a slightly different style — they iterate the response body and parse each &lt;strong&gt;NDJSON&lt;/strong&gt; line (one JSON envelope per line, as described in
) — but the principle is identical.&lt;/p&gt;
&lt;h2 id="coverage-gaps-i-accept"&gt;Coverage gaps I accept&lt;/h2&gt;
&lt;p&gt;Three things the test suite &lt;em&gt;doesn&amp;rsquo;t&lt;/em&gt; cover:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;The frontend.&lt;/strong&gt; No React testing in this suite — that&amp;rsquo;s a separate concern. Most failures show up in API tests anyway, because the frontend is a thin client over a typed API.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real Ollama prompt quality.&lt;/strong&gt; Whether &lt;code&gt;gemma4:e4b&lt;/code&gt; actually produces &lt;em&gt;useful&lt;/em&gt; quiz questions is not a thing tests can answer. That&amp;rsquo;s evaluation, not testing. It belongs in a separate harness with a real model running.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Race conditions across DBOS workflow restarts.&lt;/strong&gt; The resume path is exercised at the logic level, but the full state space of &amp;ldquo;what happens if Postgres goes away at this exact instant&amp;rdquo; is too large to enumerate.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These are conscious gaps. The test suite is for catching regressions in code I wrote; it&amp;rsquo;s not a replacement for evaluation, integration testing, or actual chaos engineering.&lt;/p&gt;
&lt;h2 id="what-the-suite-is-actually-for"&gt;What the suite is actually for&lt;/h2&gt;
&lt;p&gt;Two things, in order:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Refactor confidence.&lt;/strong&gt; When I rip out the agent loop and put a new one in, do the tests still pass? If yes, the API contracts I care about haven&amp;rsquo;t drifted.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PR review surface.&lt;/strong&gt; Every PR runs the suite in CI. A green run is a precondition for merge. The suite is loud enough that a real regression makes the noise.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Notice what it &lt;em&gt;isn&amp;rsquo;t&lt;/em&gt; for: proving the model works. It can&amp;rsquo;t. Tests can pin behaviour but they can&amp;rsquo;t pin quality. That&amp;rsquo;s a different muscle, and it belongs in a different harness.&lt;/p&gt;
&lt;h2 id="whats-worth-borrowing"&gt;What&amp;rsquo;s worth borrowing&lt;/h2&gt;
&lt;p&gt;If you&amp;rsquo;re building a local-AI app and your tests need Ollama running:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Patch the &lt;code&gt;ollama&lt;/code&gt; module at each service&amp;rsquo;s import site with &lt;code&gt;patch.object(service_module, &amp;quot;ollama&amp;quot;)&lt;/code&gt; — one seam per service, no shims required.&lt;/li&gt;
&lt;li&gt;Give your DB layer a path override and run against a &lt;code&gt;tmp_path&lt;/code&gt; SQLite file.&lt;/li&gt;
&lt;li&gt;Use an autouse fixture to redirect every on-disk artefact (docs folder, index files) to &lt;code&gt;tmp_path&lt;/code&gt;, so no test can touch real data even by accident.&lt;/li&gt;
&lt;li&gt;For each external service (model, audio, workflow engine), draw the seam at the process boundary. Test everything above it with real code.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The result is a suite where every test runs in any environment, finishes in milliseconds, and exercises the actual integration of every layer of code you wrote. 351 tests in about three seconds isn&amp;rsquo;t an optimisation, it&amp;rsquo;s a side-effect of mocking only at the edges.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="appendix-abbreviations-in-this-post"&gt;Appendix: Abbreviations in this post&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Abbreviation&lt;/th&gt;
&lt;th&gt;Full form&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Continuous Integration&lt;/td&gt;
&lt;td&gt;Automatically running the test suite on every push/PR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pull Request&lt;/td&gt;
&lt;td&gt;A proposed code change — merged only when the suite is green&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Application Programming Interface&lt;/td&gt;
&lt;td&gt;The HTTP surface the TestClient exercises in-process&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HTTP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;HyperText Transfer Protocol&lt;/td&gt;
&lt;td&gt;The protocol the (in-process) endpoint tests speak&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAG&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Retrieval-Augmented Generation&lt;/td&gt;
&lt;td&gt;The retrieval-then-answer pipeline under test&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Knowledge Base&lt;/td&gt;
&lt;td&gt;The indexed document collection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FAISS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Facebook AI Similarity Search&lt;/td&gt;
&lt;td&gt;Real in tests — it&amp;rsquo;s an in-process library&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BM25&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Best Match 25&lt;/td&gt;
&lt;td&gt;The keyword index — also real in tests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RRF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reciprocal Rank Fusion&lt;/td&gt;
&lt;td&gt;The rank-merging formula covered by &lt;code&gt;test_vector_db.py&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQLite / SQL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;(SQL = Structured Query Language)&lt;/td&gt;
&lt;td&gt;The real, file-based database every progress test runs against&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DBOS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Database-Oriented Operating System&lt;/td&gt;
&lt;td&gt;The durable-workflow library — patched so no Postgres is needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OCR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optical Character Recognition&lt;/td&gt;
&lt;td&gt;The scanned-PDF fallback with its own trigger-threshold tests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSRF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Server-Side Request Forgery&lt;/td&gt;
&lt;td&gt;The URL-import attack class covered in &lt;code&gt;test_docx_url.py&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NDJSON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Newline-Delimited JSON&lt;/td&gt;
&lt;td&gt;The streaming format the endpoint tests parse line by line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SHA-256&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Secure Hash Algorithm, 256-bit&lt;/td&gt;
&lt;td&gt;The content fingerprint behind the re-ingest tests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CRUD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Create, Read, Update, Delete&lt;/td&gt;
&lt;td&gt;The basic storage operations for decks, quizzes, and maps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PDF / DOCX / PPTX / XLSX / HTML&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Portable Document Format / Word / PowerPoint / Excel / HyperText Markup Language&lt;/td&gt;
&lt;td&gt;The extractor formats with dedicated tests&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;p&gt;That&amp;rsquo;s the series. Eight posts on the parts of
I&amp;rsquo;m most proud of — and a handful I&amp;rsquo;d build differently. If any of it was useful to you, the code is open source at
, and the
is on YouTube.&lt;/p&gt;
&lt;p&gt;Your data. Your hardware. Your AI. Your vault.&lt;/p&gt;</description></item></channel></rss>