
Build Log #006: Why We Chose pgvector Over Pinecone (And Saved $400/Month)

Every RAG tutorial pushes you toward a dedicated vector database. We put our vectors in plain Postgres and it handles everything we need. Here's the reasoning, the tradeoffs, and the query planner bug that almost made us regret it.

Build Log is our honest engineering journal. Not the polished case study. The real stuff — including the parts where we look dumb.

The Decision

When we started building our RAG demo and client knowledge base systems, the first architectural question was where to store embeddings. If you read any RAG tutorial from 2024-2025, the answer is always the same: Pinecone. Maybe Weaviate. Maybe Qdrant if the author is feeling spicy. The message is clear — you need a purpose-built vector database.

We went a different direction. We chose pgvector — a Postgres extension that adds vector similarity search to the database you're probably already running. Nine documents, 60 chunks, 384-dimension embeddings, all sitting in the same Postgres instance that stores our users, sessions, and application data.

That decision saved us roughly $400/month in infrastructure costs. But it also introduced a bug that took two days to diagnose. This is that story.

Why Not Pinecone?

Let's be clear about what Pinecone is good at: if you have 50 million vectors and need sub-10ms query times at thousands of QPS, Pinecone (or Weaviate, or Qdrant, or Milvus) is the right tool. Purpose-built vector databases are optimized for exactly that workload.

But that's not our workload. And it's not most companies' workload.

Here's what a typical SMB RAG deployment looks like:

  • Document count: 50-5,000 documents (SOPs, policies, contracts, manuals)
  • Chunk count: 500-50,000 chunks after splitting
  • Query volume: 10-500 queries per day
  • Latency requirement: Under 500ms is fine (the LLM call takes 2-5 seconds anyway)

For that workload, a dedicated vector database is like renting a Ferrari to drive to the grocery store. It works, but you're paying for capabilities you'll never use.

The Cost Math

Pinecone's Starter plan is free but limited. Once you need more than 100K vectors or want production features (backups, monitoring, SLA), you're on the Standard plan at $70/month minimum. For a real deployment with metadata filtering, namespaces, and enough pods, you're looking at $200-$400/month.

Weaviate Cloud starts at $25/month for a sandbox, $95/month for production. Qdrant Cloud is similar.

pgvector costs nothing. It's a Postgres extension. You install it with CREATE EXTENSION vector; and you're done. The Postgres instance you're already paying for — or the one included in your $16/month Hetzner VPS — now does vector search.
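Setup is genuinely that small. As a sketch, assuming a `chunks` table like the one our queries below run against (table and column names are illustrative):

```sql
-- One-time, per database
CREATE EXTENSION IF NOT EXISTS vector;

-- Chunk text, metadata, and the embedding live in one row
CREATE TABLE chunks (
    id          bigserial PRIMARY KEY,
    document_id bigint NOT NULL,
    department  text,
    created_at  timestamptz NOT NULL DEFAULT now(),
    chunk_text  text NOT NULL,
    embedding   vector(384) NOT NULL  -- matches our 384-dimension model
);
```

The `vector(384)` column type is all pgvector adds to the schema; everything else is plain Postgres.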

For our typical client deployment (Hetzner CX32, $16-35/month), the entire stack — Postgres with pgvector, the application server, Nginx, Redis — runs on a single box. Total infrastructure cost for a production RAG system serving 100-500 users: $16-35/month. With Pinecone bolted on, that becomes $216-435/month. For a small business, that difference matters.

The Technical Case for pgvector

Beyond cost, there are real engineering reasons to keep vectors in Postgres:

1. No Data Sync Problem

The moment you put vectors in a separate database, you create a consistency problem. Your metadata (document titles, access permissions, tags, departments) lives in Postgres. Your vectors live in Pinecone. Now you need to keep them in sync. Every document upload writes to two databases. Every deletion needs to hit two databases. Every permission change needs to propagate.

With pgvector, your vectors are just another column. A single SQL transaction inserts the chunk text, metadata, and embedding atomically. No sync jobs. No eventual consistency. No orphaned vectors.
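A sketch of that atomic write (names and values are illustrative, and the embedding literal is truncated for readability):

```sql
BEGIN;

-- Document metadata and its chunks commit (or roll back) together
INSERT INTO documents (id, title, department)
VALUES (42, 'Onboarding Handbook', 'hr');

INSERT INTO chunks (document_id, department, chunk_text, embedding)
VALUES (42, 'hr', 'New hires must complete...', '[0.012, -0.034, 0.051]'::vector);

COMMIT;
-- If either insert fails, neither row exists: no orphaned vectors
```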

2. Filtering Is Just SQL

Need to search only within a specific department's documents? With Pinecone, you learn their metadata filtering syntax. With pgvector, it's a WHERE clause:

-- query_vec is the embedding of the user's query, bound as a parameter;
-- <=> is pgvector's cosine distance operator, so 1 - distance = similarity
SELECT chunk_text, 1 - (embedding <=> query_vec) AS similarity
FROM chunks
WHERE department = 'engineering'
  AND created_at > '2025-01-01'
ORDER BY embedding <=> query_vec
LIMIT 5;

You already know SQL. Your team already knows SQL. Your ORM already speaks SQL. There's no new query language to learn, no new client library to install, no new API to paginate.

3. Backups Are Just Postgres Backups

pg_dump captures everything — your application data AND your vectors. One backup strategy. One restore procedure. One disaster recovery plan. With a separate vector database, you need to coordinate backups across two systems and handle the case where one restores successfully and the other doesn't.
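In practice that's just the standard Postgres tooling; a sketch, with the database name assumed:

```shell
# One dump captures application tables and vector columns alike
pg_dump -Fc mydb > backup.dump

# One restore rebuilds everything, pgvector indexes included
pg_restore -d mydb --clean backup.dump
```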

4. pgvector Is Surprisingly Fast

For datasets under 1 million vectors, pgvector with an HNSW index is fast enough that you won't notice the difference. Our benchmarks on a 4-core Hetzner VPS:

  • 60 chunks (our demo): 2ms query time, exact search (no index needed)
  • 10,000 chunks: 8ms with HNSW index
  • 100,000 chunks: 15ms with HNSW index
  • 500,000 chunks: 35ms with HNSW index

At 500K chunks, you're talking about a company with roughly 50,000 documents — a large enterprise knowledge base. And the query time is still 35 milliseconds. The LLM inference that follows will take 100x longer.

The Bug That Almost Killed It

Two weeks in, we hit a problem that had us questioning the entire decision. Our similarity search was returning wrong results — not bad results, wrong results. A query about "employee onboarding" would return chunks about "quarterly financial reports." The similarity scores looked fine (0.7-0.8), but the content was nonsensical.

We spent a day checking embeddings. Rebuilt the entire chunk index. Verified the embedding model was deterministic. Everything checked out — the embeddings were correct.

The problem was pgvector's query planner.

The IVFFlat Disaster

We had initially created an IVFFlat index (the default recommendation in most pgvector tutorials) with 10 lists on our 60-chunk dataset. IVFFlat works by partitioning vectors into clusters (lists) and only searching the closest clusters at query time.

Here's the problem: with 10 lists and 60 chunks, each list contains roughly 6 vectors. IVFFlat's probes parameter defaults to 1, meaning it only searches the single closest list — 6 vectors out of 60. If your query vector happens to land near a cluster centroid that doesn't contain the relevant chunks, you get garbage results with high confidence scores.

The fix was embarrassingly simple: don't use IVFFlat on small datasets. We switched to HNSW (which doesn't have this problem) and for datasets under 10K chunks, we just do exact nearest-neighbor search with no index at all. At 60 chunks, exact search takes 2ms. The index was solving a problem we didn't have.
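Our resulting rule of thumb, sketched in SQL (the thresholds are ours, not pgvector's):

```sql
-- Under ~10K chunks: no index at all. A sequential scan is exact,
-- and at our 60 chunks it finishes in 2ms.

-- Above that: HNSW, which has no lists/probes partitioning to mis-tune
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);

-- If you must keep IVFFlat, at least search more than 1 list per query
SET ivfflat.probes = 10;
```

`vector_cosine_ops` matches the `<=>` cosine-distance operator our queries use; pgvector ships separate operator classes for L2 and inner-product distance.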

The JOIN Bug

A week later, we hit a second pgvector-specific gotcha. We needed to join the chunks table with a documents table to get metadata:

SELECT c.chunk_text, d.title, 1 - (c.embedding <=> query_vec) AS similarity
FROM chunks c
JOIN documents d ON c.document_id = d.id
ORDER BY c.embedding <=> query_vec
LIMIT 5;

This query returned only 1-2 results instead of 5, even though there were dozens of matching chunks. The similarity scores were correct for the results it did return — it just silently dropped results.

This is a known bug in pgvector 0.8.1. The Postgres query planner, when combining a JOIN with an ORDER BY on a vector operator and a LIMIT, sometimes pushes the LIMIT into the index scan before the JOIN, truncating results.

The fix is a subquery pattern:

SELECT sub.chunk_text, d.title, sub.similarity
FROM (
  SELECT id, chunk_text, document_id, 1 - (embedding <=> query_vec) AS similarity
  FROM chunks
  ORDER BY embedding <=> query_vec
  LIMIT 5
) sub
JOIN documents d ON sub.document_id = d.id;

Force the vector search to complete in a subquery, then join. This is not documented anywhere obvious. We found it by reading pgvector GitHub issues at 2am.

When You Should NOT Use pgvector

We're not zealots. There are cases where a dedicated vector database is the right call:

  • More than 5 million vectors — pgvector works but dedicated systems optimize memory management and sharding at this scale
  • Sub-millisecond latency requirements — if you're building real-time recommendation systems serving thousands of requests per second
  • Multi-tenant isolation at scale — managing hundreds of isolated namespaces across enterprise clients
  • Your team doesn't know Postgres — managed vector databases are simpler to operate if SQL isn't your thing

But for 90% of business RAG deployments — the kind we build for clients — pgvector handles the workload at a fraction of the cost with zero operational complexity beyond what you already manage.

The Lesson

The AI tooling ecosystem has a marketing problem. Every new tool positions itself as essential, every tutorial assumes you need the most powerful option, and the cost of over-engineering is hidden because it's a monthly subscription, not a one-time bill.

Our job as engineers is to match the tool to the problem. A 500-document knowledge base for a 50-person company does not need Pinecone. It needs Postgres with an extension, running on a VPS that costs less than a Netflix subscription.

pgvector isn't perfect. The IVFFlat behavior on small datasets is a trap. The JOIN bug cost us two days. But those are solvable problems with known workarounds. The alternative — paying $400/month for infrastructure you don't need and managing a data sync layer you didn't have to build — is a permanent tax on every deployment.

We'll keep using pgvector for every client project until the data tells us otherwise. So far, it hasn't.

Want to see pgvector in action? Our live RAG demo runs on pgvector with 60 chunks, returning results in under 50ms. Or book a call to talk about building a knowledge base for your team.