Skip to main content

Command Palette

Search for a command to run...

Do You Actually Need a Vector Store?

Every RAG tutorial reaches for Pinecone. Almost none of them should. Here is the honest decision tree for where to put your vectors, and why boring infrastructure wins the first 90% of real projects.

Updated
12 min read
Do You Actually Need a Vector Store?

Every RAG tutorial on the internet opens with pip install pinecone-client. The tutorial author picked a managed vector service, wired it up in fifteen minutes, and showed you a retrieval demo that worked. You read the tutorial and thought "OK, I need a vector database."

You probably don't.

The honest version of this story is that most real projects run fine on a plain Postgres table with the pgvector extension, and a surprising number of them would run fine on a pickle file or a SQLite BLOB column with no extension at all. The decision to reach for a specialised vector database is one you should make once you have specific, measured reasons, not one you make on day one because a tutorial told you to.

This post is the decision tree I actually use. It's going to sound boring, because the correct answer for most projects is "the thing you already have." Boring infrastructure is usually the right call for the first 90% of real projects. The moment your retrieval becomes the bottleneck in your product, you'll know — and you'll know what to reach for next.


The four storage options, honestly

For any non-trivial RAG system, your options are roughly these four:

OptionGood forFree until
In-memory listPrototyping, tiny corpora, unit tests~10,000 vectors
SQLite BLOBSingle-machine apps, edge deploys, hobby projects~100,000 vectors
Postgres + pgvectorMost production apps that already use Postgres~1–10 million vectors
Dedicated vector DBVery large corpora, multi-tenant, exotic query patternsthe sky

The numbers are approximate and depend on your query latency budget, how you index, and how much RAM/disk you give the thing. But the rough order of magnitude is right: each step up adds an order of magnitude of scale, and each step adds operational complexity proportional to the jump.

Let me walk through each, with the code you'd actually write.


Option 1: in-memory list

The simplest thing that could possibly work. You have a few hundred documents, you embed them at startup, you hold them in a Python list, you compute dot products at query time. Total storage: RAM. Total query latency: microseconds.

# pip install openai numpy
import numpy as np
from openai import OpenAI

client = OpenAI()
DOCS = [...]  # your documents as strings

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Embed once at startup, normalise, cache in module state.
DOC_VECS = embed(DOCS)
DOC_VECS = DOC_VECS / np.linalg.norm(DOC_VECS, axis=1, keepdims=True)

def search(query: str, k: int = 5) -> list[tuple[str, float]]:
    q = embed([query])[0]
    q = q / np.linalg.norm(q)
    scores = DOC_VECS @ q  # dot product, one-vs-many
    top = np.argsort(-scores)[:k]
    return [(DOCS[i], float(scores[i])) for i in top]

For up to about 10,000 vectors at 1,536 dimensions (roughly 60 MB of floats), this is genuinely fine. One matrix multiply against all your vectors is a few milliseconds on a laptop CPU, and you can actually use this in production as long as you can tolerate the full set living in every process's memory.

When this is the right answer: prototype demos, side projects, unit tests, internal tools whose corpora rarely change, small-corpus features (like "search over our 300-page docs"). Do not skip this option. It is the correct starting point for more projects than the tutorial industry wants to admit.

When it stops being the right answer: you can't load the corpus at startup, the corpus updates frequently, you have multiple server processes that each need a copy, or your memory pressure from holding the vectors starts hurting other parts of the app.


Option 2: SQLite with a BLOB column

One step up. You store each vector as a binary blob in a local SQLite database. No extensions, no dependencies beyond Python's stdlib. Queries are a full table scan — you read every vector and compute similarity in application code — which sounds bad but is fine until about 100,000 vectors.

# pip install openai numpy
import sqlite3
import numpy as np
from openai import OpenAI

client = OpenAI()
db = sqlite3.connect("corpus.db")
db.execute("CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, text TEXT, vec BLOB)")

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    arr = np.array([d.embedding for d in resp.data], dtype=np.float32)
    return arr / np.linalg.norm(arr, axis=1, keepdims=True)

def add(texts: list[str]) -> None:
    vecs = embed(texts)
    for text, vec in zip(texts, vecs):
        db.execute("INSERT INTO docs (text, vec) VALUES (?, ?)", (text, vec.tobytes()))
    db.commit()

def search(query: str, k: int = 5) -> list[tuple[str, float]]:
    q = embed([query])[0]
    rows = db.execute("SELECT id, text, vec FROM docs").fetchall()
    scored = []
    for row_id, text, blob in rows:
        vec = np.frombuffer(blob, dtype=np.float32)
        scored.append((text, float(np.dot(q, vec))))
    scored.sort(key=lambda x: -x[1])
    return scored[:k]

The full scan is the embarrassing-but-honest part. You read every row, hydrate every vector, compute every dot product. On 50,000 vectors at 1,536 dimensions, this takes around 100–300 ms in Python. Fast enough for most user-facing queries. Not fast enough if you're scaling to millions.

When this is the right answer: edge deploys, desktop apps, serverless functions with their own storage, projects where you want a single-file deployable. SQLite is shockingly competent and shockingly overlooked.

When it stops being the right answer: you outgrow the 100,000-vector ballpark, you need multi-writer access, or query latency becomes user-visible.

For a middle ground, there's sqlite-vec — a lightweight SQLite extension that gives you approximate-nearest-neighbor indexes inside SQLite. Drops latency by an order of magnitude without leaving the single-file world. Worth knowing about if you like SQLite but need the speed.


Option 3: Postgres with pgvector

This is the move for most production apps. pgvector is a Postgres extension that adds a vector column type and nearest-neighbor operators. If you already have a Postgres database in your stack — and most apps do — this is where your vectors should live.

-- Run once in your database
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE docs (
  id SERIAL PRIMARY KEY,
  text TEXT NOT NULL,
  vec vector(1536) NOT NULL
);

CREATE INDEX docs_vec_idx ON docs USING hnsw (vec vector_cosine_ops);

The hnsw index is an approximate-nearest-neighbor index that gives you sub-linear query time. You trade a tiny amount of recall (not always the exact top-k, usually the approximate top-k) for queries that run in milliseconds on millions of vectors. The trade-off is almost always worth it.

The query side, in Python:

# pip install psycopg openai numpy
import os
import numpy as np
import psycopg
from openai import OpenAI

llm = OpenAI()
conn = psycopg.connect(os.environ["DATABASE_URL"])

def embed(texts: list[str]) -> np.ndarray:
    resp = llm.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data], dtype=np.float32)

def search(query: str, k: int = 5) -> list[tuple[str, float]]:
    q = embed([query])[0]
    with conn.cursor() as cur:
        cur.execute(
            "SELECT text, 1 - (vec <=> %s::vector) AS similarity "
            "FROM docs ORDER BY vec <=> %s::vector LIMIT %s",
            (q.tolist(), q.tolist(), k),
        )
        return [(text, float(sim)) for text, sim in cur.fetchall()]

The <=> operator is pgvector's cosine distance. 1 - distance gives you cosine similarity. The query uses the HNSW index automatically, so even on millions of rows it's milliseconds-fast.

Why is this the right answer for most production apps?

  1. You already have Postgres. It's already backed up. Already monitored. Already has a replica. Already has access control. Already has a migration story. You don't add an operational surface, you use one you already maintain.
  2. Transactions. Your vectors and your business data are in the same database. You can atomically add a document, its metadata, its embeddings, and its audit log in one transaction. Dedicated vector DBs almost never give you this.
  3. Filters. The query "give me the top 5 vectors where the document's author_id = 42 and published_at > '2026-01-01'" is one SQL query in Postgres. In a dedicated vector DB it's typically a worse-performing afterthought, or a separate query-and-filter dance.
  4. Scale is surprisingly good. pgvector with HNSW handles tens of millions of vectors on a properly-sized instance. That's more than most projects will ever need.

When this is the right answer: almost all production apps that already use Postgres. Pick this by default. Switch if and only if you have measured evidence that you need something else.

When it stops being the right answer: corpora of hundreds of millions of vectors, very high QPS requirements, multi-region read replicas for vectors that don't match your regular read-replica setup, or you don't have Postgres and don't want to run one.


Option 4: the dedicated vector database

Pinecone, Weaviate, Qdrant, Milvus, Chroma, LanceDB. The managed and self-hosted vector databases. Each has its own query language, its own operational model, its own cost curve, its own strengths.

When any of these is the right answer, you will know because you have a specific problem one of them solves better than Postgres. Examples of specific problems:

  • You have 100+ million vectors and query latency is user-visible. Dedicated systems are optimised for this; pgvector can struggle at the extreme end.
  • You need multi-tenant isolation with strong performance guarantees per tenant. Some vector DBs handle this well; rolling it in Postgres is doable but non-trivial.
  • You need features pgvector doesn't have yet: exotic index types, GPU-accelerated search, hybrid search primitives built in, etc.
  • You specifically want a managed service and are willing to pay for it. Pinecone, Zilliz, and others let you offload operations entirely.
  • Your workload is read-heavy and latency-critical and you want the vendor to worry about tuning HNSW parameters, autoscaling, and replication.

What you don't want is to pick one of these because a blog post told you vectors live in vector databases. That's selection by vocabulary, not by problem.

A good rule: don't adopt a dedicated vector DB until you can articulate, in one sentence, what problem Postgres failed to solve. "It seems more professional" is not a problem.


The decision tree

Notice how large the "Postgres with pgvector" box is. That is not because Postgres is magic. It's because the range of project sizes where Postgres is the right answer is enormous — from tens of thousands to tens of millions of vectors — and most real projects never leave that range.


The cost question

Dedicated vector DBs have three cost surfaces: compute, storage, and query volume. Managed ones (Pinecone, Zilliz Cloud) charge you monthly regardless of usage. Self-hosted ones (Qdrant, Weaviate, Milvus on your own infra) shift the cost to ops.

Postgres with pgvector has, for most teams, no additional cost surface at all — you're already paying for Postgres, and vectors are just more rows. For a team that already runs Postgres, adding pgvector is free compared to standing up a new managed service. This is the part the "compare the vector DBs" articles never quite acknowledge: the correct baseline isn't "which vector DB is cheapest," it's "what does the marginal cost look like versus using what I already have."

The break-even is somewhere past a few million vectors with demanding latency requirements. Below that, using Postgres saves money and operational complexity. Above that, you might save money by moving off Postgres, depending on workload.


Two anti-patterns

Anti-pattern 1: the "future-proofing" pick. "We might have 10 million vectors in two years, so let's use Pinecone now." This is how you pay Pinecone for two years to store 5,000 vectors that would fit in a pickle file. Optimise for the next six months, not the imaginary future. Migrating from Postgres to a dedicated vector DB when you actually need it is a weekend of work, not a quarter.

Anti-pattern 2: the "one tool per job" split-brain. "We'll put the text in Postgres and the vectors in Pinecone." Now you have two databases to keep in sync, two query paths, two failure modes, and no transactions. Unless you have a specific reason, co-locate the vectors with the thing they describe. pgvector lets you do this in one place.


Admit what breaks

  • HNSW is approximate. The top-k results from an HNSW index are usually the same as the true top-k, but not always. If your product cares about exact recall (rare), you need IVFFlat or a sequential scan. For most apps, "approximate top 5" is identical to "exact top 5" in user perception.
  • Postgres indexes on vector columns have build time. Building an HNSW index over several million vectors can take minutes to hours. Build it offline or during a maintenance window; don't do it in a blocking transaction.
  • pgvector dimension limits. As of 2026, pgvector supports up to 16,000 dimensions. Most embedding models fit comfortably. Exotic large ones may not.
  • Re-embedding is expensive. Changing embedding models means re-embedding everything. Budget for the API cost and the index rebuild time. This is true for every storage option, not just pgvector.
  • Concurrent writes to HNSW. Postgres serialises HNSW index updates. If your workload is heavy on concurrent inserts, expect some contention. For most apps, this is a non-issue.
  • Dedicated vector DBs move faster than Postgres. The features you want (filtering, hybrid search, metadata search) may arrive in Pinecone or Qdrant six months before pgvector. If you're on the edge of what's possible today, the dedicated option may be unavoidable.

What just changed in your code

  • Default to Postgres with pgvector. If your app already has Postgres, add the extension and move on with your life. Do not go shopping for a vector database.
  • If you don't have Postgres, use SQLite with BLOBs or sqlite-vec for single-machine apps. You'll be surprised how far it goes.
  • If your corpus is under 10,000 vectors, use an in-memory list. No infrastructure at all. Load at startup, query in microseconds. Done.
  • Only adopt a dedicated vector database when you can state, in one sentence, the specific problem it solves that your current infrastructure doesn't. "Scale" is not a specific problem. "Our HNSW index is too slow on 50M vectors and we've tried pgvector tuning" is.
  • Co-locate your vectors with your business data whenever possible. Transactions, filters, and backups are all cheaper when they live together.

Next post, B3.3, is the one I most wish someone had told me early: chunking is the whole game. Your embedding model is probably fine. Your vector store is probably fine. The thing that's making your RAG retrieval bad is how you're cutting your documents up before you embed them. We'll go through three chunking strategies and the ones that actually earn their complexity.


Course navigation

⬅️ Previous📍 You are hereNext ➡️
⬅️ Previous
B3.1 · Embeddings in 30 Minutes of Code
B3.2 of B6.4Next ➡️
B3.3 · Chunking Is the Whole Game

📚 AI for Builders · Course Home — 28 posts, six modules.


Cover photo via Unsplash. This post is part of the AI for Builders series.

More from this blog

Learn AI - Zero to Hero

111 posts