Building AI agents sounds fun until you actually build one. Then a different set of problems shows up — ones nobody wrote a blog post about yet.
This is a summary of a conversation between developers actively running agent systems in production or near-production. The topics: how self-improvement conflicts with git, what to use for a knowledge base, Andrej Karpathy’s Obsidian approach, and why adding more agents rarely helps.
The self-improvement problem
One of the selling points of agents like Hermes is that they can self-reflect and improve — updating their own rules based on experience. That sounds great until you think about what that means in a real development workflow.
You have git. You have CI pipelines. You have deploys. If an agent updates its rules on a running server and you then deploy from your repo, those changes get overwritten. The agent learns, you deploy, it forgets.
The practical workaround that emerged: teach the agent to push its own changes to a separate branch — a learning branch — via a PR. A human reviews it and merges it. That keeps the learning in version control and under review, instead of leaving it to live only on the server where the next deploy wipes it out.
It is not elegant, but it works.
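For concreteness, here is a minimal sketch of that flow, assuming the rules live in a single AGENT_RULES.md and the GitHub CLI (gh) is installed and authenticated. The file name and branch naming are illustrative, not something from the original discussion.

```python
# Sketch: an agent persists a rule update to a "learning" branch and opens a PR
# for human review, instead of editing files on the running server.
# Assumptions (illustrative): rules live in AGENT_RULES.md; `gh` is available.
import subprocess
from datetime import date
from pathlib import Path

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

def propose_rule_update(new_rules: str) -> None:
    branch = f"learning/{date.today().isoformat()}"
    run("git", "checkout", "-b", branch)
    Path("AGENT_RULES.md").write_text(new_rules, encoding="utf-8")
    run("git", "add", "AGENT_RULES.md")
    run("git", "commit", "-m", "agent: proposed rule update from self-reflection")
    run("git", "push", "-u", "origin", branch)
    # A human reviews the learned change before it can reach main.
    run("gh", "pr", "create",
        "--title", "Agent rule update",
        "--body", "Proposed by the agent after self-reflection. Review before merging.")
```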
Knowledge base: markdown is fine until it isn’t
Most people start with markdown files and a set of instructions for how to index and cross-validate them. It works early on. The problem is that as the knowledge base grows, more and more tokens go toward retrieving the right information — and fewer toward actually solving the problem. Context windows fill up with retrieval noise.
This is when people start looking at vector databases.
Andrej Karpathy’s approach: interconnected markdown in Obsidian
Before reaching for a vector database, it is worth looking at what Karpathy demonstrated: a graph of interconnected markdown files in Obsidian, where the graph view makes the relationships between notes visible. He has been using this alongside Claude Code as a knowledge layer for his work.
The idea is simple — notes link to other notes, creating a navigable knowledge graph. An agent can traverse it the same way a human would. No embeddings, no vector index, no retrieval pipeline. Just files and links.
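As a rough illustration of what that traversal can look like, here is a sketch that follows Obsidian-style [[wikilinks]] breadth-first from a starting note. The flat vault layout and the regex are simplifying assumptions.

```python
# Sketch: follow Obsidian-style [[wikilinks]] outward from a starting note.
# Assumes a flat vault directory of .md files named after their note titles.
import re
from collections import deque
from pathlib import Path

LINK = re.compile(r"\[\[([^\]|#]+)")  # target of [[Note]], [[Note|alias]], [[Note#section]]

def neighbours(vault: Path, note: str) -> list[str]:
    path = vault / f"{note}.md"
    if not path.exists():
        return []
    return LINK.findall(path.read_text(encoding="utf-8"))

def traverse(vault: Path, start: str, max_notes: int = 20) -> list[str]:
    """Breadth-first walk of the link graph, the way a reader might click through."""
    seen, queue, order = {start}, deque([start]), []
    while queue and len(order) < max_notes:
        note = queue.popleft()
        order.append(note)
        for target in neighbours(vault, note):
            target = target.strip()
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order
```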
One developer in the discussion took this further and built what they called a “librarian” agent — a dedicated sub-agent responsible only for managing a Personal Knowledge Management (PKM) vault in Obsidian:
- It indexes notes automatically
- Suggests fixes for orphaned notes (notes with no incoming links; a minimal version of this check is sketched below)
- Knows how to do research across the knowledge base
- Links related notes and updates adjacent ones when something changes
- Rejects low-quality content that other agents try to push in
After two months: 138 pull requests, 1,766 notes, 791,086 words, 12,688 links. No RAG, no vector database.
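The orphaned-note check from that list is the easiest piece to make concrete. A minimal version, under the same flat-vault assumption as the traversal sketch above:

```python
# Sketch: find orphaned notes (no incoming [[wikilinks]]) across a flat vault.
import re
from pathlib import Path

LINK = re.compile(r"\[\[([^\]|#]+)")

def orphaned_notes(vault: Path) -> list[str]:
    notes = {p.stem for p in vault.glob("*.md")}
    linked_to = set()
    for page in vault.glob("*.md"):
        for target in LINK.findall(page.read_text(encoding="utf-8")):
            linked_to.add(target.strip())
    # Candidates for the librarian to link into the graph or prune.
    return sorted(notes - linked_to)
```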
The librarian needs a capable model — Sonnet 4.5 or equivalent. A small local model will not hold up for this role.
Vector databases: when you actually need one
If your knowledge base outgrows plain markdown traversal, here is what the community is actually using:
Qdrant — works well at scale, good performance on large datasets. Recommended for production use cases where you have significant data volume.
ChromaDB — good for in-memory work and smaller setups. Easier to get running locally.
pgvector — a PostgreSQL extension that adds vector search. If you already have Postgres, this is worth trying before adopting a dedicated vector database. Less operational overhead. (A minimal sketch follows this list.)
Pinecone — managed, popular, the one most people have heard of. Worth evaluating if you want to avoid running your own infrastructure.
VoyageAI — embedding-focused, less of a full database, more of a retrieval API layer.
FAISS / Neo4j — FAISS for pure similarity search at scale; Neo4j if your knowledge actually has graph structure (relationships matter, not just content similarity).
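To make the pgvector option concrete, a small sketch using the psycopg driver. The table, the toy 3-dimensional embeddings, and the connection string are placeholders; a real setup would store vectors of whatever dimension your embedding model produces.

```python
# Sketch: nearest-neighbour lookup with pgvector from Python (psycopg 3).
# Assumes Postgres with the pgvector extension installed; names are placeholders.
import psycopg

with psycopg.connect("postgresql://localhost/knowledge") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes ("
        " id bigserial PRIMARY KEY, body text, embedding vector(3))"
    )
    # Toy 3-dimensional embedding for illustration only.
    conn.execute(
        "INSERT INTO notes (body, embedding) VALUES (%s, %s::vector)",
        ("rotate API keys quarterly", "[0.1, 0.9, 0.2]"),
    )
    # <-> is pgvector's distance operator; closest rows come first.
    rows = conn.execute(
        "SELECT body FROM notes ORDER BY embedding <-> %s::vector LIMIT 5",
        ("[0.1, 0.8, 0.3]",),
    ).fetchall()
```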
One dissenting view worth noting: some argue that file-based search with grep is actually better than vector search for code and documentation. LLMs know how to write good grep queries. They are less good at formulating vector search queries and interpreting ranked results. The recommendation: use vector databases for accumulated personal content and media; use file-based search for code and structured documentation.
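The grep side of that recommendation is easy to sketch as a single agent tool that shells out to grep and returns matches with a little surrounding context. The file globs and context size here are arbitrary choices.

```python
# Sketch: file-based code/doc search as an agent tool, using plain grep.
# The agent writes the pattern; results come back as matching lines with context.
import subprocess

def search(pattern: str, path: str = ".") -> str:
    result = subprocess.run(
        ["grep", "-rn", "-C", "2", "--include=*.md", "--include=*.py", pattern, path],
        capture_output=True, text=True,
    )
    return result.stdout or "no matches"
```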
Local LLMs: viable, with trade-offs
Several people in the discussion are running fully local models to avoid subscription costs and keep everything on their own infrastructure.
Qwen 3.6 35B a3b is a current favourite — runs on 8GB VRAM, manageable RAM requirements. Capable enough for agent work, though slower (one person mentioned 40-minute task runs as acceptable).
Google recently released turboquant, a technique that significantly reduces VRAM/RAM usage for KV cache — worth watching if you are running larger models locally.
The honest trade-off: local models are cheaper and private, but you will spend time finding one that meets your quality bar. The model that works for one task may not work for the librarian role described above.
Memory and predictability
One developer raised something that does not get discussed enough: adding vector-based memory to agents makes them harder to debug.
When something goes wrong with a stateless agent, you can look at the request, trace the reasoning, and understand why it made the decision it made. When the agent has memory — especially in a vector database — the “why” becomes opaque. Some retrieved context influenced the output, but which context, and why did it rank that way?
The more conservative position: if you need agent memory, use structured wiki markdown. One top-level document with a table of contents, linking to topic pages. Re-index daily. Keeps everything human-readable and auditable.
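A sketch of what that daily re-index could look like, assuming topic pages sit in one directory and the top-level document is regenerated rather than edited by hand. File and directory names are illustrative.

```python
# Sketch: nightly re-index for wiki-style agent memory.
# Rebuilds a top-level index.md that links to every topic page, so the
# entry point stays human-readable and auditable.
from datetime import date
from pathlib import Path

def rebuild_index(memory_dir: Path) -> None:
    pages = sorted(p for p in memory_dir.glob("*.md") if p.name != "index.md")
    lines = [f"# Agent memory index (rebuilt {date.today().isoformat()})", ""]
    for page in pages:
        first = page.read_text(encoding="utf-8").splitlines()
        title = first[0].lstrip("# ").strip() if first else page.stem
        lines.append(f"- [{title or page.stem}]({page.name})")
    (memory_dir / "index.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
```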
Fewer agents, better prompts
The most experienced person in the conversation had a clear conclusion after building their own orchestrator (orqestra):
Adding more agents does not help. A pipeline where one LLM monitors another LLM which corrects another LLM sounds robust. In practice it compounds errors and makes the system harder to reason about.
Their direction:
- Reduce the number of agents
- Hard-code the pipeline and agent personas in code — do not let the system configure itself dynamically (a sketch follows below)
- Focus on one conversation at a time and build a labelled set of test requests (valid, ambiguous, absurd)
- Optimise at the token level — think in terms of tokens, not sentences
The prompt is the product. Everything else is plumbing.
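To make the hard-coded pipeline idea concrete, here is a sketch. The personas, the three steps, and call_llm are illustrative stand-ins, not anything from orqestra.

```python
# Sketch: a fixed, hard-coded pipeline instead of a self-configuring agent swarm.
# Personas and the order of steps are constants, reviewed like any other code.
# call_llm() stands in for whatever model client you use; it is not a real API.
from typing import Callable

PERSONAS = {
    "triage":  "Classify the request as valid, ambiguous, or absurd. Answer with one word.",
    "drafter": "Write a direct, minimal answer to the request.",
    "checker": "Verify the draft answers the request and flag anything unsupported.",
}

# A labelled test set of the kind described above (valid / ambiguous / absurd).
TEST_REQUESTS = [
    ("reset my password", "valid"),
    ("fix it", "ambiguous"),
    ("translate this invoice into whale song", "absurd"),
]

def call_llm(system: str, user: str) -> str:
    raise NotImplementedError("plug in your model client here")

def run_pipeline(request: str, llm: Callable[[str, str], str] = call_llm) -> str:
    label = llm(PERSONAS["triage"], request).strip().lower()
    if label != "valid":
        return f"rejected ({label})"
    draft = llm(PERSONAS["drafter"], request)
    review = llm(PERSONAS["checker"], f"Request:\n{request}\n\nDraft:\n{draft}")
    return f"{draft}\n\n[checker notes]\n{review}"
```

The point of writing it this way is that the pipeline shape and the personas live in version control, so changing them goes through the same review process as the learning branch described earlier.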
Summary
| Approach | When it works |
|---|---|
| Interconnected markdown + Obsidian | Starting point for most knowledge bases, scales further than expected |
| “Librarian” agent over markdown | When you have a large PKM and want automated maintenance without RAG |
| pgvector | Already have Postgres, want vector search without extra infrastructure |
| Qdrant | Large data volume, production use case |
| ChromaDB | Local/in-memory prototyping |
| File-based search (grep) | Code and structured documentation |
| Local LLMs (Qwen 3.6) | Cost control, privacy, acceptable latency trade-off |
| Hard-coded pipeline + fewer agents | When your multi-agent system is producing unpredictable results |
The pattern that keeps coming up: the teams making the most progress are not the ones with the most agents or the most sophisticated retrieval pipelines. They are the ones who reduced complexity, kept humans in the loop for learning/memory updates, and spent time on the prompt.