<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Docker |</title><link>https://aretascodes.dev/tags/docker/</link><atom:link href="https://aretascodes.dev/tags/docker/index.xml" rel="self" type="application/rss+xml"/><description>Docker</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Wed, 03 Jun 2026 00:00:00 +0000</lastBuildDate><image><url>https://aretascodes.dev/media/icon_hu_2ab4f4763b27c75b.png</url><title>Docker</title><link>https://aretascodes.dev/tags/docker/</link></image><item><title>Part 3 · CogniVault Architecture: Why We Keep Ollama Out of Docker</title><link>https://aretascodes.dev/blog/cognivault-deployment-architecture/</link><pubDate>Wed, 03 Jun 2026 00:00:00 +0000</pubDate><guid>https://aretascodes.dev/blog/cognivault-deployment-architecture/</guid><description>
&lt;blockquote class="border-l-4 border-neutral-300 dark:border-neutral-600 pl-4 italic text-neutral-600 dark:text-neutral-400 my-6"&gt;
&lt;p&gt;All abbreviations are fully explained in the appendix at the bottom of the page.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The golden rule of modern software deployment is containerization. Put everything in Docker to isolate the dependencies, and it will run the exact same way on every machine.&lt;/p&gt;
&lt;p&gt;When we first designed CogniVault, the impulse was to put the FastAPI server, the PostgreSQL database, and the Ollama LLM engine all inside a single, secure Docker network.&lt;/p&gt;
&lt;p&gt;But we didn&amp;rsquo;t. We left Ollama running natively on the host machine. Let&amp;rsquo;s break down why.&lt;/p&gt;
&lt;h2 id="the-gpu-passthrough-problem"&gt;The GPU Passthrough Problem&lt;/h2&gt;
&lt;p&gt;Think of your GPU like the kitchen in a restaurant. The chefs (your AI models) need to &lt;em&gt;be in the kitchen&lt;/em&gt; — standing at the stove, hands on the equipment. Now imagine telling the chefs they must cook from a sealed meeting room down the hall, passing instructions through a serving hatch. Technically food might still come out. It will not come out fast.&lt;/p&gt;
&lt;p&gt;That sealed room is a container. Large Language Models like Gemma 4 need direct, unhindered access to your machine&amp;rsquo;s GPU (an Apple Silicon GPU with its unified memory, or a dedicated NVIDIA card) to generate text fast enough for a real-time chat interface. And the picture varies by platform:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;On macOS&lt;/strong&gt;, Docker runs containers inside a lightweight virtual machine — and there is currently &lt;strong&gt;no GPU (Metal) passthrough at all&lt;/strong&gt;. An Ollama container on a Mac runs CPU-only. For a chat app, that&amp;rsquo;s disqualifying on its own.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;On Linux&lt;/strong&gt;, NVIDIA GPU passthrough exists and works, but it requires installing and configuring the NVIDIA Container Toolkit on the host, which breaks the &amp;ldquo;it just works&amp;rdquo; philosophy of local development.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Running Ollama natively sidesteps the whole category of problems.&lt;/p&gt;
&lt;h2 id="the-bridge-solution"&gt;The Bridge Solution&lt;/h2&gt;
&lt;p&gt;CogniVault uses a split deployment model, separating the application logic from the heavy AI processing.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;The Secure Rooms (Docker):&lt;/strong&gt; PostgreSQL, which holds the DBOS workflow ledger from Part 2, lives in a &lt;strong&gt;Docker Bridge Network&lt;/strong&gt; (a private virtual network). Isolated, clean, reproducible.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Main Building (Native Host):&lt;/strong&gt; Ollama runs directly on your Mac, Windows, or Linux host OS, giving it direct, bare-metal access to your GPU.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;CogniVault actually ships &lt;strong&gt;two run modes&lt;/strong&gt;, and it&amp;rsquo;s worth being precise about them:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The default (&lt;code&gt;scripts/start.sh&lt;/code&gt;):&lt;/strong&gt; only PostgreSQL runs in Docker. The FastAPI backend runs natively too (&lt;code&gt;python -m backend.main&lt;/code&gt;), right next to Ollama. Simplest possible loop for local development.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The fully containerized mode (&lt;code&gt;docker-compose.yaml&lt;/code&gt;):&lt;/strong&gt; the FastAPI app joins Postgres inside the compose network. In this mode the app container reaches the native Ollama engine through a special Docker routing address: &lt;code&gt;host.docker.internal:11434&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Either way, the rule stays the same: &lt;strong&gt;the model never goes in the box.&lt;/strong&gt;&lt;/p&gt;
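&lt;p&gt;The difference between the two run modes boils down to which address the backend uses to find Ollama. Here is a minimal sketch of that choice, not CogniVault&amp;rsquo;s actual code: the function names and the &lt;code&gt;OLLAMA_BASE_URL&lt;/code&gt; override variable are hypothetical, and the &lt;code&gt;/.dockerenv&lt;/code&gt; check is only a common heuristic for detecting a container.&lt;/p&gt;

```python
import os
from pathlib import Path

OLLAMA_PORT = 11434  # the port quoted in this post


def running_in_docker(marker=Path("/.dockerenv")):
    """Heuristic: Docker normally creates /.dockerenv inside a container."""
    return marker.exists()


def ollama_base_url(in_container, env=None):
    """Pick the Ollama address for the current run mode.

    OLLAMA_BASE_URL is a hypothetical override variable used for
    illustration; it is not a documented CogniVault setting.
    """
    env = os.environ if env is None else env
    override = env.get("OLLAMA_BASE_URL")
    if override:
        return override.rstrip("/")
    # Native mode: Ollama is a neighbour on localhost.
    # Containerized mode: go through Docker's routing address to the host.
    host = "host.docker.internal" if in_container else "localhost"
    return f"http://{host}:{OLLAMA_PORT}"


if __name__ == "__main__":
    print(ollama_base_url(running_in_docker()))
```

&lt;p&gt;One caveat if you try this on Linux: Docker Desktop defines &lt;code&gt;host.docker.internal&lt;/code&gt; for you, but plain Docker Engine usually needs an &lt;code&gt;extra_hosts: ["host.docker.internal:host-gateway"]&lt;/code&gt; entry on the app service before that name resolves.&lt;/p&gt;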
&lt;div class="mermaid"&gt;graph TD
Client[📱 Browser / User] --&gt;|HTTP: 8000| App
subgraph Host Machine [Host OS: Native GPU Access]
Ollama[🧠 Ollama Engine]
Models[(gemma4:e4b)]
Ollama &lt;--&gt; Models
subgraph Docker Compose Network
App[🖥️ FastAPI App Container]
Postgres[(🐘 PostgreSQL)]
App &lt;--&gt;|Internal Port 5432| Postgres
end
App &lt;--&gt;|host.docker.internal:11434| Ollama
end
&lt;/div&gt;
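&lt;p&gt;A quick way to confirm the bridge in the diagram actually works is to ask Ollama, from inside the app container, which models it has pulled. The sketch below uses Ollama&amp;rsquo;s &lt;code&gt;/api/tags&lt;/code&gt; endpoint and only the Python standard library; the helper names are made up for illustration and are not part of CogniVault.&lt;/p&gt;

```python
import json
import urllib.request


def model_names(payload):
    """Extract model names from the JSON body of an Ollama /api/tags reply."""
    return [model["name"] for model in payload.get("models", [])]


def list_models(base_url, timeout=5.0):
    """Ask the Ollama engine at base_url which models it has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))


def has_model(names, wanted="gemma4:e4b"):
    """True if the model tag shown in the diagram is among the pulled models."""
    return wanted in names
```

&lt;p&gt;From the containerized app you would call &lt;code&gt;list_models("http://host.docker.internal:11434")&lt;/code&gt;; in the default native mode, the same call against &lt;code&gt;http://localhost:11434&lt;/code&gt; should behave identically. A connection error here usually means the container cannot see the host, not that the model is missing.&lt;/p&gt;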
&lt;h3 id="what-about-the-vector-database"&gt;What about the Vector Database?&lt;/h3&gt;
&lt;p&gt;You might notice FAISS isn&amp;rsquo;t a container here. Unlike massive SQL databases, FAISS is extremely lightweight. In CogniVault, FAISS runs directly inside the FastAPI Python process&amp;rsquo;s memory and saves its data to a local folder. It doesn&amp;rsquo;t need its own container.&lt;/p&gt;
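&lt;p&gt;To make &amp;ldquo;in-process&amp;rdquo; concrete, here is a toy stand-in written in plain Python. It is deliberately &lt;em&gt;not&lt;/em&gt; FAISS and not CogniVault&amp;rsquo;s code: the &lt;code&gt;TinyIndex&lt;/code&gt; class and the &lt;code&gt;index.json&lt;/code&gt; file name are invented for illustration, and the search is brute force. What it shares with the real setup is the shape: the index is an ordinary object in the server&amp;rsquo;s memory, and persistence is just a file in a local folder.&lt;/p&gt;

```python
import json
import math
from pathlib import Path


class TinyIndex:
    """Brute-force cosine-similarity index held in process memory."""

    def __init__(self):
        self.labels = []
        self.vectors = []

    def add(self, label, vector):
        self.labels.append(label)
        self.vectors.append(list(vector))

    def search(self, query, k=1):
        """Return the k labels whose vectors are most similar to query."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.hypot(*a) * math.hypot(*b)
            return dot / norm if norm else 0.0

        scored = sorted(
            zip(self.labels, self.vectors),
            key=lambda pair: cosine(query, pair[1]),
            reverse=True,
        )
        return [label for label, _ in scored[:k]]

    def save(self, folder):
        """Persist the index as a single JSON file in a local folder."""
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        data = {"labels": self.labels, "vectors": self.vectors}
        (folder / "index.json").write_text(json.dumps(data))

    @classmethod
    def load(cls, folder):
        """Rebuild the in-memory index from the saved JSON file."""
        data = json.loads((Path(folder) / "index.json").read_text())
        index = cls()
        index.labels, index.vectors = data["labels"], data["vectors"]
        return index
```

&lt;p&gt;There is no port, no network hop, and no service to orchestrate, which is why a separate container would add moving parts without adding isolation that matters here.&lt;/p&gt;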
&lt;p&gt;By keeping the heavy LLM lifting on the metal and the bookkeeping in containers, we strike a balance that local AI setups notoriously struggle with: the supporting services stay isolated and free of dependency conflicts, while the model keeps full access to the GPU.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="see-it-in-action"&gt;See It In Action&lt;/h3&gt;
&lt;p&gt;That wraps up the CogniVault architecture series! If you want to run this 100% local, privacy-first Study Companion on your own hardware:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Grab the code:&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Watch the walkthrough:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="appendix-abbreviations-in-this-post"&gt;Appendix: Abbreviations in this post&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Abbreviation&lt;/th&gt;
&lt;th&gt;Full form&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Graphics Processing Unit&lt;/td&gt;
&lt;td&gt;The hardware that makes local model inference fast; containers struggle to reach it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large Language Model&lt;/td&gt;
&lt;td&gt;A neural network trained on huge amounts of text that can read and generate language&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Artificial Intelligence&lt;/td&gt;
&lt;td&gt;Software performing tasks that normally need human intelligence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Application Programming Interface&lt;/td&gt;
&lt;td&gt;The set of URLs the frontend calls to talk to the backend&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HTTP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;HyperText Transfer Protocol&lt;/td&gt;
&lt;td&gt;The protocol browsers and APIs use to exchange requests and responses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Operating System&lt;/td&gt;
&lt;td&gt;macOS, Windows, or Linux — where Ollama runs natively&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DBOS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Database-Oriented Operating System&lt;/td&gt;
&lt;td&gt;The durable-workflow library whose ledger lives in the Postgres container (see Part 2)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Structured Query Language&lt;/td&gt;
&lt;td&gt;The language of relational databases like PostgreSQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FAISS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Facebook AI Similarity Search&lt;/td&gt;
&lt;td&gt;The in-process vector index — deliberately &lt;em&gt;not&lt;/em&gt; a separate container&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Virtual Machine&lt;/td&gt;
&lt;td&gt;The hidden layer Docker uses on macOS — and the reason Mac containers can&amp;rsquo;t reach the GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;</description></item></channel></rss>