← Blog

Build Log #001: Our RAG Demo Returned Garbage (And Why)

Three bugs that nearly killed our live demo — an indexing catastrophe, a silent query planner issue, and a regex that split itself across two lines. This is what building production AI actually looks like.

This is the first entry in our build log — an honest journal of building AI systems for real businesses. Not the polished case study. The actual messy, frustrating, "why is this broken" engineering reality.

The Setup

We were building a live RAG demo for our website — something prospects could actually use to see retrieval-augmented generation in action. Nine documents. Data center SOPs, incident reports, a real estate feasibility analysis. Sixty chunks, 384-dimensional embeddings, PostgreSQL with pgvector, Claude for generation.

Simple, right? We've built these before. Chunk the docs, embed them, store the vectors, search on query, pass context to the LLM. The pipeline was up in a couple hours.
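The ingestion half of that pipeline is mostly glue. As a rough sketch of the first step only — the function name and the fixed-size-with-overlap strategy are illustrative assumptions, not necessarily what the demo uses — chunking looks something like this:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks that overlap, so a sentence
    straddling a boundary still appears whole in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = size - overlap  # advance less than a full chunk each time
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk then gets embedded and inserted as one row in the chunks table, vector column included.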

Then we tested it.

Bug #1: The IVFFlat Catastrophe

The search results were bad. Not "slightly off" bad. "Returning completely irrelevant chunks" bad. We'd ask about CRAC unit failure procedures and get back a chunk about real estate zoning classifications.

We checked the embeddings. Fine. We checked the chunking. Fine. We checked the similarity function. Cosine distance, standard stuff. Everything looked correct.

The problem was the index.

We'd created an IVFFlat index with 10 lists — a reasonable default if you have tens of thousands of vectors. But we had 39. Thirty-nine vectors split across 10 lists means each list has ~4 vectors. IVFFlat's default probes setting is 1, meaning it only searches the single closest list to the query vector.

We were searching approximately 10% of our data.

The fix was embarrassingly simple: drop the IVFFlat index entirely. For a dataset this small, exact brute-force search is faster and, unlike the approximate index, guaranteed to find the true nearest neighbors. The index was making things worse, not better.

Lesson: IVFFlat indexes are designed for large datasets (100K+ vectors). On small datasets, they're catastrophically bad. If you have fewer than 10,000 vectors, don't use an approximate index at all. Exact search is faster and doesn't miss anything. This isn't in most tutorials.
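The failure mode is easy to reproduce without a database. Here's a toy pure-Python sketch (not pgvector's actual implementation — just the clustering idea, in one dimension) of how probes = 1 misses: the query lands closest to one list's centroid while the true nearest vector lives in another list.

```python
# Toy IVFFlat: 1-D "vectors" partitioned into lists by nearest centroid.
centroids = {0: 0.0, 1: 10.0}
lists = {0: [-2.0, 1.0, 3.0], 1: [5.5, 9.0, 11.0]}  # 5.5 clusters with the 10.0 list

def ivfflat_search(query, probes=1):
    # Search only the `probes` lists whose centroids are nearest the query.
    nearest = sorted(centroids, key=lambda k: abs(centroids[k] - query))[:probes]
    candidates = [v for k in nearest for v in lists[k]]
    return min(candidates, key=lambda v: abs(v - query))

def exact_search(query):
    # Brute force over everything: no lists, nothing to miss.
    everything = [v for vs in lists.values() for v in vs]
    return min(everything, key=lambda v: abs(v - query))

# For query 4.9, probes=1 picks list 0 (centroid 0.0 is nearer than 10.0)
# and returns 3.0 — but the true nearest neighbor is 5.5, sitting in list 1.
```

With probes raised to 2 the toy index searches both lists and agrees with exact search, which is exactly why the probes knob exists — and why it can't save you when the real problem is having 39 vectors in the first place.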

Bug #2: The Silent JOIN Truncation

With the index fixed, search quality improved dramatically. But we noticed something weird: we were asking for the top 5 results and consistently getting back 3 or 4. Sometimes 2.

Our query looked like this:

SELECT c.content, c.chunk_index, d.title, d.doc_type
FROM chunks c
JOIN documents d ON c.document_id = d.id
ORDER BY c.embedding <=> query_vector
LIMIT 5;

Standard stuff. JOIN chunks to documents to get metadata, order by vector similarity, take the top 5.

Except pgvector 0.8.1 has a query planner bug. When you combine a JOIN with ORDER BY on a vector column and LIMIT, the planner sometimes short-circuits the vector scan. It doesn't error. It doesn't warn. It just silently returns fewer results than you asked for.

The fix: subquery pattern. Do the vector search in an inner query, then JOIN metadata after.

SELECT sub.content, sub.chunk_index, d.title, d.doc_type
FROM (
    SELECT c.content, c.chunk_index, c.document_id
    FROM chunks c
    ORDER BY c.embedding <=> query_vector
    LIMIT 5
) sub
JOIN documents d ON sub.document_id = d.id;

Same results, but now we actually get all 5. The inner query does the vector search without any JOIN interference, and the outer query decorates with metadata.

Lesson: pgvector + JOIN + ORDER BY + LIMIT is a known combination that can silently truncate results. Always do vector search in a subquery, then JOIN metadata after. This bug cost us hours because the results looked reasonable — just incomplete. The most dangerous bugs are the ones that look almost right.
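In application code, the workaround is easy to package with a sanity check so a short result set can never again pass silently. A hedged Python sketch — the table and column names match the queries above, but the injected execute callable and the psycopg-style %s placeholders are our assumptions:

```python
TOP_K_SQL = """
SELECT sub.content, sub.chunk_index, d.title, d.doc_type
FROM (
    SELECT c.content, c.chunk_index, c.document_id
    FROM chunks c
    ORDER BY c.embedding <=> %s::vector
    LIMIT %s
) sub
JOIN documents d ON sub.document_id = d.id;
"""

def search_top_k(execute, query_vector, k=5):
    """Run the vector search in a subquery so the JOIN can't interfere,
    then verify we actually received k rows instead of trusting the LIMIT."""
    rows = execute(TOP_K_SQL, (query_vector, k))
    if len(rows) < k:
        # With the subquery pattern this shouldn't happen on a table with
        # at least k chunks; treat it as a bug, not a quiet degradation.
        raise RuntimeError(f"expected {k} results, got {len(rows)}")
    return rows
```

The assertion is the important part: it converts "silently incomplete" into "loudly broken," which is the failure mode you actually want.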

Bug #3: The Regex That Split Itself

This one almost broke us. The chat UI was rendering markdown in the AI's responses — bold text, code blocks, lists. We had a JavaScript function called renderMarkdown that used regex to convert markdown syntax to HTML.

It worked in development. It broke in production.

The regex pattern for detecting newlines used \n. But the JavaScript was being served from a Python backend inside a triple-quoted string, and Python's escape processing converted \n into an actual newline character before the code ever reached the browser — which split the regex across two lines in the generated JavaScript.

In the browser console: SyntaxError: Invalid regular expression. The regex was literally broken in half.

The fix: use String.fromCharCode(10) instead of \n in the regex. It's uglier, but it's immune to string escaping issues across language boundaries.
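You can reproduce the failure in plain Python, no browser needed. The snippet below is a minimal reconstruction, not our actual template:

```python
# Broken: inside an ordinary triple-quoted string, Python turns the escape
# sequence \n into a real newline before the JS is ever served.
js_broken = """text.split(/\n/)"""
# The regex literal is now cut in half across two lines of generated JS.

# Fixed: build the newline at runtime in JavaScript instead, so there is
# no escape sequence for Python to interpret.
js_fixed = """text.split(new RegExp(String.fromCharCode(10)))"""

# (A raw string, r"""...""", would also work here, but a stray refactor can
# drop the r prefix; the character code survives any number of escaping layers.)
```

Print both strings and the difference is obvious: the broken one spans two lines, the fixed one stays on one.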

Lesson: When you're generating JavaScript from Python (or any cross-language string embedding), escape sequences will betray you. \n means different things to different languages at different stages of the pipeline. Use character codes when in doubt, and always test the actual served output, not just the source file.

The Bigger Lesson

None of these bugs were in the AI. Not one. The LLM worked perfectly every time we gave it the right context. The embedding model produced good vectors. The chunking was solid.

Every single failure was infrastructure. An index that shouldn't have been there. A query planner quirk. A string escaping issue. This is the stuff that AI tutorials skip and production systems die from.

This is why we call ourselves "AI integration engineers," not "AI consultants." The hard part isn't choosing a model or writing a prompt. The hard part is making all the pieces work together reliably, at scale, in production, where the bugs are silent and the data is messy.

The demo works now. You can try it yourself. Ask it about CRAC unit failure procedures, or land acquisition due diligence, and watch it retrieve the right chunks, cite its sources, and give you a grounded answer.

But behind that clean interface are three bugs that each took longer to diagnose than the entire rest of the pipeline took to build. That's the job.


This is Build Log #001. We'll keep publishing these as we build — the real engineering stories behind production AI systems. If you want to follow along or talk about building something similar for your business, get in touch.