Best Vector Databases in 2026: Pinecone vs Weaviate vs Qdrant vs pgvector

The four vector databases builders actually shortlist in 2026 — Pinecone, Weaviate, Qdrant, and pgvector — compared on real pricing, latency, scale limits, and production failure modes from our own shipped LLM features.

Ashish Pandey · Published May 18, 2026 · Updated Jul 19, 2026 · 6 min read

At Make An App Like, we are a US-based app development agency, and over the past three years our team has shipped 26+ production marketplace and AI platforms — including our Candy AI Clone with persona-memory vector search, our Carbon Credit Marketplace build with embedding-based project-similarity search, our Pocket FM clone with audio-series recommendation embeddings, plus Carvana, Zillow, Whatnot, Uber, Revolut, and Zepto across consumer marketplaces. We have run Pinecone, Weaviate, Qdrant, and pgvector in production at scale. In this guide, we compare the four vector databases builders actually shortlist in 2026 — real pricing, p50 and p95 latency from our own benchmarks, scale limits, metadata-filtering behavior, and the production failure modes you only discover after the bills land.

What is a vector database and when do you need one?

A vector database stores high-dimensional numerical embeddings (typically 384, 768, 1,536, or 3,072 dimensions) and runs approximate nearest neighbor (ANN) search across them in sub-100-millisecond time. Embeddings are the output of an embedding model (OpenAI's text-embedding-3-small, Cohere embed-v3, BGE, Voyage, Jina) that turns text, images, or audio into a fixed-length numerical fingerprint where semantically similar inputs sit close together in vector space.

You need a vector database when you are building one of four patterns.

Retrieval-augmented generation (RAG). Your corpus is larger than roughly 100,000 chunks and you need semantic search over your own data before sending it to an LLM.
Semantic search. Keyword search returns shallow results and meaning-aware ranking would convert better.
Recommendations. Collaborative-filtering signals are sparse and content-similarity embeddings carry more signal.
AI agent memory. A conversational product needs to recall past sessions, user preferences, or learned facts across days or weeks.

You do NOT need a vector database when your corpus is under 10,000 documents (just keep the embeddings in memory or in a regular database column and brute-force the cosine similarity), when keyword search via Elasticsearch or Postgres full-text already converts well, or when your latency tolerance is in the seconds-range rather than tens-of-milliseconds.
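
For the small-corpus case, brute force is only a few lines. The following is a minimal sketch using the standard library only; the document IDs and the 3-dimensional toy vectors are made up for illustration, and real embeddings would come from an embedding model.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query: List[float], corpus: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Score every stored embedding against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical data).
corpus = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-howto": [0.8, 0.2, 0.1],
}
results = top_k([1.0, 0.0, 0.0], corpus, k=2)
print(results)
```

The scan is linear in corpus size, which is why it stops being attractive as the corpus grows and an ANN index starts to pay for itself.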

The four options that actually matter in 2026

Pinecone, Weaviate, Qdrant, and pgvector are the four options on every serious shortlist. Chroma is fine for prototyping but has not yet proven itself at production scale. Milvus and Vespa are powerful but operationally heavy for most teams. Elasticsearch with its vector field works but lags the dedicated options on ANN performance. MongoDB Atlas Vector Search is a reasonable choice if you are already on Atlas. Redis with RediSearch handles vector search adequately for small workloads.

The four that survive serious production evaluation each have a distinct strength. Pinecone is the managed-first option with the lowest operational overhead. Weaviate has the strongest hybrid-search story (vector plus BM25) and a built-in module ecosystem. Qdrant is the open-source performance leader, written in Rust. pgvector is the Postgres extension that lets you skip the new-database conversation entirely if your data already lives in Postgres.

Pinecone — the managed-first option

Pinecone is the easiest path to a working vector index. You create an index from a dashboard, get a managed endpoint, and start writing vectors. The serverless tier handles autoscaling automatically and bills per read, write, and storage.

Pricing in 2026. Serverless starts at $0.33 per GB-month of storage plus $0.04 per million read units and $4 per million write units (the per-unit pricing varies slightly by region). A typical index of 10 million 1,536-dimension vectors runs roughly $200 to $400 per month at moderate read volume. The pod-based tier is more expensive at small scale (around $70 per pod-month for a starter pod) but converges with serverless above 100 million vectors.

Latency from our benchmarks. On a 5 million-vector index with 1,536-dimension embeddings and metadata filtering on a single field: p50 around 30 milliseconds, p95 around 80 milliseconds, p99 around 150 milliseconds. Cross-region adds 30 to 80 milliseconds.

Where Pinecone wins. Fast to launch, no infrastructure to run, generous free tier for prototyping, mature SDK in every major language, predictable scaling, strong serverless story for spiky workloads.

Where Pinecone loses. Vendor lock-in (no self-hosting option), aggressive pricing curve as you scale into hundreds of millions of vectors, limited control over the index parameters compared to the open-source options, US-and-EU-only data residency (a problem for India and APAC compliance), no native hybrid (BM25 + vector) search inside the engine.

Pick Pinecone when you want to ship the LLM feature this week, you are at single-digit-million vector scale, the team is small, and operational overhead is the constraint.

Weaviate — hybrid search and multi-tenancy

Weaviate is the strongest choice for products that need both semantic and keyword search in the same query. Its hybrid search blends BM25 lexical ranking with vector cosine similarity using a tunable alpha parameter, and the result is consistently better than either alone for product search, knowledge-base lookup, and recommendation.
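
To make the alpha parameter concrete, here is a generic sketch of alpha-weighted score blending. It is an illustration of the idea, not Weaviate's internal fusion code, and the scores and SKU names are hypothetical. It assumes the convention that alpha = 1.0 means pure vector search and alpha = 0.0 means pure keyword search, and it min-max normalizes both score sets so they are comparable.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(bm25: Dict[str, float], vector: Dict[str, float], alpha: float) -> Dict[str, float]:
    """Blend the two signals: alpha=1.0 is pure vector, alpha=0.0 is pure keyword."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    bm25_n, vector_n = min_max_normalize(bm25), min_max_normalize(vector)
    doc_ids = set(bm25_n) | set(vector_n)
    # A document missing from one result list contributes 0 for that signal.
    return {
        doc_id: alpha * vector_n.get(doc_id, 0.0) + (1.0 - alpha) * bm25_n.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


bm25 = {"sku-1": 12.0, "sku-2": 4.0, "sku-3": 8.0}      # hypothetical BM25 scores
vector = {"sku-1": 0.70, "sku-2": 0.90, "sku-3": 0.60}   # hypothetical cosine similarities

for alpha in (0.0, 0.5, 1.0):
    blended = hybrid_scores(bm25, vector, alpha)
    ranking = sorted(blended, key=blended.get, reverse=True)
    print(alpha, ranking)
```

Sweeping alpha on a labeled sample of real queries is the practical way to pick a value; the two extremes in this toy data produce opposite top results.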

Pricing in 2026. Weaviate Cloud Services (WCS) starts at $25 per month for a serverless tier and scales to roughly $295 per month for the standard production tier. Enterprise pricing for self-managed Kubernetes deployments lands in the $25,000 to $200,000 annual range depending on cluster size. Self-hosted on your own AWS or GCP is free (the software is open source under BSD-3) — you pay only for compute and storage.

Latency from our benchmarks. On the same 5 million-vector test: p50 around 25 milliseconds, p95 around 60 milliseconds, p99 around 120 milliseconds. Hybrid queries add roughly 15 to 25 milliseconds compared to pure vector.

Where Weaviate wins. Built-in hybrid search, the deepest module ecosystem (OpenAI, Cohere, HuggingFace, Voyage, Jina all wired in with one-line config), strong multi-tenancy for SaaS products serving thousands of separate customers, GraphQL plus REST API, open-source self-host option, multimodal vector support (CLIP for image-text), built-in cross-encoder reranking.

Where Weaviate loses. Heavier operations if you self-host, the GraphQL API has a learning curve, less mature serverless story than Pinecone, vertical scaling is more limited than horizontal.

Pick Weaviate when your product needs hybrid search out of the box, you are building a multi-tenant SaaS where every customer needs an isolated namespace, or you want the option to self-host later without rewriting the query layer.

Qdrant — the Rust-based open-source performance leader

Qdrant is written in Rust and tends to win raw-throughput benchmarks against the competition. The team focuses heavily on metadata filtering performance, which is the part of vector search most teams underestimate during evaluation and where most production systems fall over.

Pricing in 2026. Qdrant Cloud starts at $25 per month for a starter cluster and scales linearly with cluster size — a 16 GB managed cluster runs roughly $150 per month. Self-hosted is free under the Apache 2.0 license. Enterprise pricing for managed deployments at scale typically lands in the $15,000 to $100,000 annual range.

Latency from our benchmarks. On the 5 million-vector test: p50 around 15 milliseconds, p95 around 40 milliseconds, p99 around 80 milliseconds. Metadata-filtered queries hold p95 under 60 milliseconds where Pinecone and pgvector both slow down meaningfully.

Where Qdrant wins. Fastest open-source option in production, payload filtering performance is the best in the category, scalar quantization and binary quantization cut memory use 4x to 32x with small accuracy trade-off, Rust-based reliability under load, simple REST and gRPC APIs.
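
The 4x and 32x figures follow from bits per dimension: 32-bit floats versus 8-bit codes versus 1-bit signs. The sketch below works through that arithmetic and shows a deliberately simplified per-vector quantizer. It is not Qdrant's implementation (a production engine chooses its quantization bounds differently), and it counts only raw vector payload, not index graph links or metadata.

```python
from typing import List, Tuple


def index_size_gb(num_vectors: int, dims: int, bits_per_dim: int) -> float:
    """Raw vector payload in decimal GB, ignoring graph links and metadata."""
    return num_vectors * dims * bits_per_dim / 8 / 1e9


def scalar_quantize(vec: List[float]) -> Tuple[List[int], float, float]:
    """Map each float onto one of 256 buckets (one byte per dimension)."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    return [round((x - lo) / scale) for x in vec], lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Approximate reconstruction of the original floats."""
    return [lo + c * scale for c in codes]


def binary_quantize(vec: List[float]) -> List[int]:
    """Keep only the sign of each dimension (one bit per dimension)."""
    return [1 if x > 0 else 0 for x in vec]


for label, bits in (("float32", 32), ("int8 scalar", 8), ("binary", 1)):
    print(label, round(index_size_gb(10_000_000, 1536, bits), 2), "GB")

codes, lo, scale = scalar_quantize([-0.5, 0.1, 0.25, 0.5])
restored = dequantize(codes, lo, scale)
```

The reconstruction error per dimension is bounded by half a bucket width, which is the "small accuracy trade-off" in concrete terms.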

Where Qdrant loses. Smaller integration ecosystem than Weaviate, fewer language SDKs (Python, JavaScript, Go, Rust, Java are well supported; others lag), less polished managed offering than Pinecone, multi-tenancy story is less mature than Weaviate.

Pick Qdrant when raw query latency at scale is the constraint, when metadata filtering on multiple fields is core to the product (which it usually is), or when you need to compress vector memory aggressively for cost reasons.

pgvector — the Postgres extension

pgvector turns any Postgres database into a vector store by adding a single extension. The advantage is enormous if your transactional data already lives in Postgres — you keep one database, one connection pool, one backup strategy, and you can JOIN vector results against your existing tables in a single SQL query.
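
A sketch of what that single-query JOIN can look like, written as SQL strings plus a small parameter builder so it can be handed to a Postgres driver such as psycopg. The table and column names (products, product_embeddings, price) are hypothetical. The `hnsw` index method, the `vector_cosine_ops` operator class, and the `<=>` cosine-distance operator are pgvector features.

```python
from typing import List, Sequence, Tuple

# Hypothetical schema: products(id, name, price) and
# product_embeddings(product_id, embedding vector(1536)).
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS product_embeddings_hnsw
ON product_embeddings USING hnsw (embedding vector_cosine_ops);
"""

# <=> is pgvector's cosine-distance operator (smaller means more similar).
SEARCH_SQL = """
SELECT p.id, p.name, p.price, e.embedding <=> %s::vector AS distance
FROM product_embeddings AS e
JOIN products AS p ON p.id = e.product_id
WHERE p.price <= %s
ORDER BY e.embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Format floats as the '[x,y,z]' text form that pgvector accepts."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def build_search(query_embedding: Sequence[float], max_price: float, limit: int = 10) -> Tuple[str, List[object]]:
    """Return the SQL and its parameters, in placeholder order."""
    vec = to_vector_literal(query_embedding)
    return SEARCH_SQL, [vec, max_price, vec, limit]


sql, params = build_search([0.1, 0.2, 0.3], max_price=50.0, limit=5)
print(sql)
print(params)
```

With a driver, the pair would be passed as `cursor.execute(sql, params)`. Keeping the vector as a bound parameter rather than string-formatting it into the query avoids quoting problems.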

Pricing in 2026. Free — pgvector is an open-source extension under the PostgreSQL license. You pay only for your existing Postgres hosting. On Neon, Supabase, AWS RDS, or Google Cloud SQL, a Postgres instance handling 1 million vectors typically lands in the $50 to $300 per month range depending on memory and IOPS tier.

Latency from our benchmarks. Latency is sensitive to index type and scale. With the HNSW index (added in pgvector 0.5): p50 around 50 milliseconds at 1 million vectors, p95 around 150 milliseconds. At 10 million vectors with HNSW: p50 around 120 milliseconds, p95 around 400 milliseconds. The IVFFlat index is faster to build but slower to query than HNSW at scale.

Where pgvector wins. Zero new infrastructure, free, transactional consistency with the rest of your data (you can JOIN vector matches against orders, users, products in one query), works with every Postgres tool you already use (pg_dump, pgbouncer, psql, Prisma, Drizzle), HNSW index makes it competitive with dedicated options at moderate scale.

Where pgvector loses. Latency degrades faster than the dedicated options past 10 million vectors, no built-in hybrid search (you wire it together with Postgres tsvector manually), memory pressure on the Postgres instance increases as the HNSW index grows, ANN parameter tuning is less mature than the dedicated options.
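
One common way to wire the manual hybrid together is to run the tsvector query and the vector query separately and merge the two ranked ID lists in application code. The sketch below uses reciprocal rank fusion (RRF), which is a general-purpose merging technique chosen here for illustration, not something pgvector provides. The document IDs are hypothetical and k = 60 is a conventional default.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists; each appearance adds 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc-7", "doc-2", "doc-9"]  # e.g. ordered by ts_rank
vector_hits = ["doc-2", "doc-4", "doc-7"]   # e.g. ordered by embedding distance
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

RRF needs only ranks, not scores, so it sidesteps the problem that ts_rank values and vector distances live on different scales.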

Pick pgvector when you have fewer than 5 million vectors, you are already on Postgres, and the ability to JOIN vector results against transactional data is worth more than the last 20 milliseconds of latency.

Pricing and latency comparison

Vector DB	Pricing (10M vectors)	p50 latency	p95 latency	Hybrid search	Self-host
Pinecone	$200 - $400 / mo	30 ms	80 ms	No	No
Weaviate	$295 - $600 / mo (WCS)	25 ms	60 ms	Yes (built-in)	Yes (BSD-3)
Qdrant	$150 - $300 / mo (Cloud)	15 ms	40 ms	Limited	Yes (Apache 2.0)
pgvector	$200 - $500 / mo (your Postgres)	120 ms	400 ms	Manual (tsvector + vector)	Yes (PostgreSQL license)

Benchmark caveat: these numbers came from our own internal load tests against 1,536-dimension embeddings with a single metadata filter, run from US-East-1 against the same region. The Pinecone, Weaviate, and Qdrant latencies are from the 5 million-vector test; the pgvector latencies are from the 10 million-vector HNSW run. Your numbers will vary with embedding dimension, filter complexity, cluster region, and concurrent load. Always benchmark on your own workload before committing.
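
For running that kind of benchmark yourself, a small harness is enough to get p50, p95, and p99. This is a minimal sketch, not the harness behind the numbers above: `measure` times any zero-argument callable (a stand-in for your real query function), and `percentile` uses the nearest-rank method.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of data at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Report the three percentiles used throughout this comparison."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}


def measure(query_fn: Callable[[], object], runs: int) -> List[float]:
    """Time repeated calls and return each latency in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        query_fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


# Stand-in workload; replace the lambda with a real search call.
timings = measure(lambda: sum(range(1000)), runs=200)
print(summarize(timings))
```

Run it with the filter shape and concurrency you expect in production, and discard the first few warm-up calls before summarizing.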

Production gotchas you only learn after launch

Five sharp edges that have bitten our team or our clients in production.

Embedding model drift. OpenAI released text-embedding-3-small and 3-large in early 2024 as successors to text-embedding-ada-002. Vectors from different models are not comparable, so an index built on the old model has to be re-embedded in full before it can use the new one. Always plan for a full re-embed every 18 to 24 months. Budget the compute (text-embedding-3-small is $0.02 per million tokens, so re-embedding a 10 million-document corpus runs roughly $400 to $800, assuming documents average 2,000 to 4,000 tokens).
Metadata filter performance cliffs. Filtering by a single high-cardinality field is fast on every option. Filtering by 3 to 5 fields with AND logic is where most options slow down 5x to 50x. Qdrant handles this best; pgvector and Pinecone degrade fastest. Test the filter shape you will actually use, not a single-field benchmark.
Reindexing cost during schema changes. Adding a new metadata field or changing the index type often requires a full reindex. On a 50 million-vector index, this can take 12 to 48 hours and lock writes during the rebuild. Plan for shadow indexes or rolling reindex via tags.
Oversized embedding dimensions. A 3,072-dimension OpenAI embedding costs roughly 2x the storage and memory of a 1,536-dimension one but rarely delivers 2x better retrieval. We default to 1,536 (text-embedding-3-small) for cost and only move to 3,072 (text-embedding-3-large) when retrieval accuracy is materially below business need.
Cosine vs dot-product vs L2 confusion. Most embedding models are trained for cosine similarity. Some are trained for dot product. Mixing distance metrics silently degrades retrieval quality. Always match the distance metric to the embedding model's published convention.
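
The distance-metric point is easy to demonstrate with two toy vectors. In this sketch (made-up 2-dimensional data, standard library only), cosine and L2 pick the vector that points the same way as the query, while dot product is pulled toward the longer vector. Normalizing to unit length removes the disagreement.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def negative_l2(a: Vector, b: Vector) -> float:
    """Negated Euclidean distance, so that larger still means closer."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v: Vector) -> Vector:
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def best_match(score: Callable[[Vector, Vector], float], query: Vector, docs: Dict[str, Vector]) -> str:
    return max(docs, key=lambda doc_id: score(query, docs[doc_id]))


docs = {"aligned-short": [1.0, 0.0], "off-axis-long": [3.0, 3.0]}
query = [1.0, 0.0]

for name, fn in (("cosine", cosine), ("dot", dot), ("l2", negative_l2)):
    print(name, best_match(fn, query, docs))

# On unit-length vectors, dot product ranks the same way cosine does.
unit_docs = {doc_id: normalize(v) for doc_id, v in docs.items()}
print("dot on unit vectors", best_match(dot, query, unit_docs))
```

If a model's documentation says its vectors are unit-normalized, the three metrics rank identically; if not, the index metric has to match what the model was trained for.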

Frequently Asked Questions

Which vector database is the best in 2026?

There is no single best: the right choice depends on scale, ops appetite, and whether you need hybrid search. At under 5 million vectors with an existing Postgres database, use pgvector. At 1 to 50 million vectors with hybrid search and multi-tenancy needs, use Weaviate. At 1 to 100 million vectors where raw latency and metadata filtering matter most, use Qdrant. At any scale where operational overhead is the binding constraint, use Pinecone.

How much do vector databases cost at 1M, 10M, and 100M vectors?

At 1 million 1,536-dimension vectors, all four options run $50 to $200 per month. At 10 million vectors, Pinecone serverless and Weaviate Cloud land at $200 to $600 per month; Qdrant Cloud is $150 to $300 per month; pgvector adds $100 to $400 to your Postgres bill. At 100 million vectors, Pinecone and Weaviate jump to $2,000 to $8,000 per month; Qdrant runs $1,000 to $4,000; pgvector becomes impractical and forces a migration. Self-hosted Qdrant or Weaviate on your own AWS at 100 million vectors typically runs $400 to $1,200 per month for the compute alone, before engineering operations time.

Should I use pgvector or a dedicated vector database?

Start with pgvector if your data is already in Postgres and you have fewer than 5 million vectors. The JOIN-with-transactional-data benefit is genuinely large for most product use cases, and HNSW makes pgvector competitive with dedicated options at this scale. Move to a dedicated vector database when you cross 10 million vectors, when query latency budget falls below 100 milliseconds p95, when you need hybrid search, or when the Postgres instance starts running out of memory under index pressure.

What latency should I expect from a vector database?

For a 1 to 10 million-vector index with 1,536-dimension embeddings and a single metadata filter: p50 of 15 to 50 milliseconds and p95 of 40 to 150 milliseconds is the realistic range across the four options. Cross-region adds 30 to 80 milliseconds. Hybrid search adds 15 to 30 milliseconds. Heavy metadata filtering on 3+ fields can add 50 to 200 milliseconds depending on which engine you picked. If you need p95 under 50 milliseconds, Qdrant is the only one of the four that delivered it in our benchmarks.

Can vector databases handle metadata filtering at scale?

All four can; the performance varies wildly. Qdrant is built around fast payload filtering and handles multi-field AND filters at 5 to 50 millisecond p95 even at 50 million vectors. Weaviate's filtering is solid up to 10 million vectors and degrades past that. Pinecone has improved filtering significantly in 2024 to 2025 but still slows down on multi-field combinations. pgvector with btree indexes on metadata fields plus an HNSW vector index handles modest filtering fine but is the slowest of the four at high cardinality.

What embedding model should I use?

For most English-language text retrieval workloads in 2026, OpenAI text-embedding-3-small (1,536 dimensions, $0.02 per million tokens) is the default — strong retrieval quality at low cost. For multilingual workloads, Cohere embed-multilingual-v3 or BGE-multilingual ship better recall. For domain-specific work (legal, medical, code), Voyage AI and Jina ship specialized models that beat the general-purpose options. For self-hosted, BGE-large, GTE-large, and E5-large are strong open-source choices that you can run on a single GPU. Always evaluate on your own retrieval task — leaderboard rankings rarely map to your specific corpus.
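
The per-token price makes the cost of a full embedding pass (or a later re-embed) simple to estimate. A minimal sketch follows; the average document lengths of 2,000 and 4,000 tokens are assumptions chosen for illustration, and the $0.02 default is the text-embedding-3-small price quoted above.

```python
def embedding_cost_usd(
    num_documents: int,
    avg_tokens_per_doc: int,
    usd_per_million_tokens: float = 0.02,
) -> float:
    """Cost of one full embedding pass over the corpus."""
    total_tokens = num_documents * avg_tokens_per_doc
    return total_tokens / 1_000_000 * usd_per_million_tokens


for avg_tokens in (2_000, 4_000):  # assumed average document lengths
    cost = embedding_cost_usd(10_000_000, avg_tokens)
    print(f"{avg_tokens} tokens/doc -> ${cost:,.0f}")
```

Measure your real average token count on a sample before budgeting, since it drives the result linearly.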

How do I migrate between vector databases?

The vectors and metadata themselves are portable — every option lets you export a JSON or Parquet dump. The harder migration work is the query API (each engine has a different filter syntax and SDK), the index parameter tuning (HNSW vs IVFFlat vs proprietary), and re-tuning the recall vs latency tradeoff on the new engine. Budget 2 to 4 weeks of engineering time for a non-trivial migration plus a dual-write period where you write to both old and new while you validate the new index. Avoid migrating during a product launch.
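
The dual-write period can be sketched as a thin wrapper that sends every write to both engines, plus an agreement check on the results. Everything here is illustrative: `InMemoryIndex` is a hypothetical stand-in for real database clients, and overlap at k is one simple validation measure among several.

```python
from typing import Dict, List, Sequence


class InMemoryIndex:
    """Stand-in for a vector database client; the interface is hypothetical."""

    def __init__(self) -> None:
        self.vectors: Dict[str, List[float]] = {}

    def upsert(self, doc_id: str, vector: Sequence[float]) -> None:
        self.vectors[doc_id] = list(vector)

    def search(self, query: Sequence[float], k: int) -> List[str]:
        def score(doc_id: str) -> float:
            return sum(q * v for q, v in zip(query, self.vectors[doc_id]))

        return sorted(self.vectors, key=score, reverse=True)[:k]


class DualWriter:
    """Send every write to both engines during the migration window."""

    def __init__(self, old: InMemoryIndex, new: InMemoryIndex) -> None:
        self.old, self.new = old, new

    def upsert(self, doc_id: str, vector: Sequence[float]) -> None:
        self.old.upsert(doc_id, vector)
        self.new.upsert(doc_id, vector)


def overlap_at_k(old_ids: Sequence[str], new_ids: Sequence[str]) -> float:
    """Fraction of the old engine's top-k that the new engine also returns."""
    if not old_ids:
        return 1.0
    return len(set(old_ids) & set(new_ids)) / len(old_ids)


old_index, new_index = InMemoryIndex(), InMemoryIndex()
writer = DualWriter(old_index, new_index)
for doc_id, vec in {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}.items():
    writer.upsert(doc_id, vec)

query = [1.0, 0.2]
agreement = overlap_at_k(old_index.search(query, k=2), new_index.search(query, k=2))
print(agreement)
```

In a real migration the two engines use different ANN settings, so agreement below 1.0 is expected; tracking it over a sample of production queries shows whether the new index is tuned well enough to cut over.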
