Qdrant vs Weaviate vs Milvus: Which Vector Database for Your RAG Pipeline?

If you are building anything with retrieval-augmented generation right now, you have probably hit the same fork in the road I did: which vector database do you actually run? Qdrant, Weaviate, and Milvus are the three open-source heavyweights, all self-hostable, all popular, and all happy to tell you they are the best fit for your RAG pipeline. I have spent a good chunk of the last few months running all three with real data, and the honest answer is that they win in different places. Here is how they compare in mid-2026, with current versions and the trade-offs that actually matter.

What they actually are

The three share the same core job (store embeddings, find the nearest neighbors fast) but they come from very different design philosophies.

Qdrant (written in Rust) is built around fast filtered search. If your queries combine vector similarity with a pile of metadata constraints (in stock, under fifty dollars, in this category), Qdrant is engineered for exactly that.

Weaviate (written in Go) is more than a vector store. It bundles vectorization, so you can send raw text and let built-in modules call OpenAI, Cohere, or HuggingFace to generate embeddings at ingest time, and it doubles as a lightweight knowledge graph for connected data.

Milvus (Go and C++) was designed for scale from day one. Its fully disaggregated architecture separates compute and storage so you can scale reads, writes, and indexing independently into the billions of vectors.

The comparison at a glance

Feature                    | Qdrant                 | Weaviate              | Milvus
Latest version (June 2026) | 1.18.2                 | 1.38.1                | 2.6.18
Language                   | Rust                   | Go                    | Go + C++
Indexing                   | HNSW + ACORN filtering | HNSW                  | HNSW, IVF, SCANN, DiskANN
Hybrid search              | Dense + sparse vectors | Vector + BM25 keyword | Vector + full-text
Built-in vectorizers       | No                     | Yes                   | No
Knowledge graph            | No                     | Yes                   | No
GPU acceleration           | Yes                    | No                    | Yes
License                    | Apache 2.0             | BSD 3-Clause          | Apache 2.0
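The "Hybrid search" row hides a practical detail: a vector ranking and a keyword ranking have to be merged into one list. Below is a minimal sketch of reciprocal rank fusion (RRF), one widely used merging method. It illustrates the idea and does not describe any engine's internals. The document IDs are made up, and k=60 is assumed here because it is the constant commonly used with RRF. Qdrant and Milvus both expose an RRF option, while Weaviate offers its own ranked and relative-score fusion modes.

```python
# Reciprocal rank fusion (RRF): merge several rankings (best first) into one.
# Each document scores sum(1 / (k + rank)) over every list it appears in.

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    vector_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical semantic ranking
    keyword_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical BM25 ranking
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

Documents that rank well in both lists rise to the top. With the sample lists above, the fused order is doc_a, doc_c, doc_b, doc_d.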

What changed in 2026

All three shipped meaningful upgrades this year, so if you tested them in 2025 it is worth a second look.

  • Qdrant 1.18 introduced TurboQuant, a quantization mode that delivers roughly 8x vector compression without the usual recall penalty, plus a low-memory mode and dynamic CPU pooling for search.
  • Weaviate 1.38 continued hardening for production: a rate limiter on batch operations, async replication that auto-enables when your replication factors line up, and debug endpoints now disabled by default.
  • Milvus 2.6 added element-level search on Struct fields, nullable vector support, and steady improvements to query-node scheduling and compaction for high-delete workloads.

Where each one wins

Qdrant: the filtering champion

Qdrant applies metadata filters during HNSW graph traversal instead of after the search. Its ACORN mode goes further: it also explores second-hop neighbors when a strict filter would otherwise leave too few reachable matches. That keeps queries fast and accurate even when a filter eliminates 99% of candidates, which is exactly the scenario that makes naive vector search fall apart. If you post-filter a top-100 candidate list at a 1% match rate, only about one usable result survives. The Rust foundation is also frugal: with quantization enabled, it is realistic to run millions of vectors on a 4 GB instance. If your retrieval mixes similarity with structured constraints (recommendations, e-commerce, personalization), Qdrant is the natural pick.
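To make the post-filtering failure concrete, here is a brute-force toy in Python. It is not Qdrant's ACORN or HNSW implementation. It only contrasts two approaches: filtering after a fixed top-100 candidate list, and filtering as part of the search. The catalog, the `in_stock` field, and the 1% match rate are all made up for illustration.

```python
import random

# Toy brute-force illustration, NOT Qdrant's ACORN or HNSW: it only contrasts
# filtering after a fixed candidate list with filtering during the search.
random.seed(0)
DIM, N = 8, 10_000

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Hypothetical catalog: random vectors, roughly 1% of items are in stock.
items = [
    {"id": i,
     "vec": [random.gauss(0, 1) for _ in range(DIM)],
     "in_stock": random.random() < 0.01}
    for i in range(N)
]
query = [random.gauss(0, 1) for _ in range(DIM)]

def post_filter_search(k: int, candidates: int = 100) -> list[int]:
    # Naive: take the top `candidates` by similarity, THEN drop non-matches.
    top = sorted(items, key=lambda it: dot(it["vec"], query), reverse=True)[:candidates]
    return [it["id"] for it in top if it["in_stock"]][:k]

def filtered_search(k: int) -> list[int]:
    # Filter-aware: only matching items compete for the top k.
    matching = [it for it in items if it["in_stock"]]
    top = sorted(matching, key=lambda it: dot(it["vec"], query), reverse=True)[:k]
    return [it["id"] for it in top]

if __name__ == "__main__":
    print("post-filter:", len(post_filter_search(10)), "of 10 requested")
    print("filtered:   ", len(filtered_search(10)), "of 10 requested")
```

The filter-aware search returns the full 10 results. The post-filtered search returns only a handful at most (about one on average), because roughly 1% of the 100 candidates pass the filter.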

Weaviate: the all-in-one

Weaviate's headline benefit is that it removes a moving part. Configure a vectorizer module and you can push raw text straight in, with embeddings generated at ingest time, so there is no separate embedding service to deploy and babysit. Add the knowledge-graph side, where objects reference other objects, and Weaviate becomes a strong base for GraphRAG and any data model where relationships matter as much as similarity.
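To show where the embedding step moves when the database owns it, here is a toy Python store. It accepts raw text, embeds it internally at insert and query time, and adds a one-hop reference lookup for the graph side. This is not Weaviate's API. `toy_embed` is a hypothetical hashing embedder standing in for a real vectorizer module (such as one that calls OpenAI or Cohere). The class and method names are also invented for illustration; `near_text` only echoes the name of Weaviate's nearText search.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

def toy_embed(text: str, dim: int = 256) -> List[float]:
    # Hypothetical stand-in for a vectorizer module: hashes each word into one
    # of `dim` buckets and L2-normalizes. It captures word overlap, not meaning.
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

@dataclass
class VectorizingStore:
    """Toy store that accepts raw text and embeds it itself."""
    embed: Callable[[str], List[float]]
    objects: Dict[str, dict] = field(default_factory=dict)

    def insert(self, obj_id: str, text: str, refs: Optional[List[str]] = None) -> None:
        # Embedding happens inside the store at ingest time, not in the caller.
        self.objects[obj_id] = {"text": text, "vec": self.embed(text), "refs": refs or []}

    def near_text(self, query: str, limit: int = 3) -> List[str]:
        # The query is embedded with the same function. Non-empty texts embed to
        # unit vectors, so the dot product below equals cosine similarity.
        q = self.embed(query)
        def score(obj_id: str) -> float:
            return sum(a * b for a, b in zip(q, self.objects[obj_id]["vec"]))
        return sorted(self.objects, key=score, reverse=True)[:limit]

    def follow_refs(self, obj_id: str) -> List[str]:
        # Graph-style hop: objects can reference related objects by ID.
        return self.objects[obj_id]["refs"]

if __name__ == "__main__":
    store = VectorizingStore(embed=toy_embed)
    store.insert("doc1", "qdrant is written in rust", refs=["doc2"])
    store.insert("doc2", "rust offers memory safety")
    store.insert("doc3", "weaviate vectorizer modules")
    print(store.near_text("vectorizer modules", limit=1))
    print(store.follow_refs("doc1"))
```

The design point is that the caller never handles vectors. Swap `toy_embed` for a real model and neither the insert path nor the query path changes.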

Milvus: the scale monster

Milvus is the one you reach for at genuine billion-vector scale. Its layered design (access, coordinator, worker, storage) lets you scale query throughput, write capacity, and indexing independently. It also offers the widest index selection. Beyond HNSW, DiskANN keeps the graph and full-precision vectors on NVMe SSD while holding only compressed vectors in RAM, which changes the cost math entirely once you are past a few hundred million vectors. The price is more moving parts to operate.

Cost reality check

Self-hosting any of the three on Elestio means you pay for infrastructure, not per-vector or per-query fees. Rough sizing:

Scale               | Recommended config | Elestio cost
Up to 1M vectors    | 2 CPU / 4 GB RAM   | ~$16/month
1M to 10M vectors   | 4 CPU / 8 GB RAM   | ~$29/month
10M to 100M vectors | 8 CPU / 16 GB RAM  | ~$59/month
100M+ vectors       | 16 CPU / 32 GB RAM | ~$119/month
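The upper tiers of this sizing only work with compression or disk-backed storage, and a quick calculation shows why. The sketch below estimates RAM for the raw vectors alone. It assumes 768-dimensional embeddings (use your model's actual dimension) and ignores the HNSW graph, payloads, and replicas, all of which add more.

```python
# Back-of-envelope RAM for the raw vectors only: no HNSW graph links,
# no payload or metadata, no replicas. 768 dimensions is an assumption.
BYTES_PER_DIM = {
    "float32": 4.0,        # uncompressed
    "int8 scalar": 1.0,    # scalar quantization: 4x smaller
    "binary": 1.0 / 8.0,   # binary quantization: 1 bit per dimension, 32x smaller
}

def vector_gb(num_vectors: int, dim: int, encoding: str) -> float:
    """Decimal gigabytes needed to hold `num_vectors` vectors of size `dim`."""
    return num_vectors * dim * BYTES_PER_DIM[encoding] / 1e9

if __name__ == "__main__":
    for n in (1_000_000, 10_000_000, 100_000_000):
        sizes = ", ".join(f"{enc} {vector_gb(n, 768, enc):.2f} GB" for enc in BYTES_PER_DIM)
        print(f"{n:>11,} vectors: {sizes}")
```

At 768 dimensions, 1M vectors take 3.07 GB as float32, 0.77 GB as int8, and 0.10 GB as binary. At 10M vectors the figures are 30.72, 7.68, and 0.96 GB. The 8 GB tier therefore holds 10M vectors only with quantization or memory-mapped storage. At 100M vectors, float32 alone needs 307.2 GB, so the top tier depends on aggressive quantization with the originals kept on disk.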

For comparison, managed services like Pinecone or Zilliz Cloud climb into the hundreds of dollars per month for the same 10M-plus workloads with replication. Self-hosting at a flat infrastructure cost adds up to real savings over a year.

How to choose

  • Pick Qdrant when queries combine similarity with heavy metadata filtering and you want the lightest resource footprint.
  • Pick Weaviate when built-in vectorization or entity relationships (GraphRAG) make your life easier.
  • Pick Milvus when you are genuinely at hundreds of millions of vectors and need its scaling and indexing levers, and you can absorb the extra operational complexity.

Troubleshooting common issues

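  • Queries slow down as data grows. Turn on quantization so more of the index fits in RAM. Qdrant offers scalar, product, binary, and now TurboQuant. Weaviate has product, binary, scalar, and rotational quantization. Milvus has quantized IVF indexes such as IVF_SQ8 and IVF_PQ.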
  • Memory usage too high. Switch to memory-mapped storage so cold vectors live on disk while hot data stays in RAM. Qdrant and Milvus both support mmap.
  • Data disappears or results change after a restart. Make sure data persists to a mounted volume rather than ephemeral container storage. On Elestio this is handled for you.
  • Timeouts during big imports. Batch your inserts at 500 to 1,000 vectors per call instead of one at a time. It is the difference between minutes and hours.
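The batching advice in the last bullet can be sketched in plain Python. `upsert` is a hypothetical callback, not a real client method. In practice you would pass your client's batch insert call, and it is worth checking first whether the client already ships its own batching helper.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items (Python 3.12+ also
    ships itertools.batched, which yields tuples)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

def bulk_insert(points: Iterable[T], upsert: Callable[[List[T]], None],
                batch_size: int = 500) -> int:
    """Send `points` in batches through `upsert`, a hypothetical stand-in for
    your client's batch insert call. Returns the number of requests made."""
    requests = 0
    for chunk in batched(points, batch_size):
        upsert(chunk)
        requests += 1
    return requests

if __name__ == "__main__":
    received: List[List[int]] = []
    calls = bulk_insert(range(2_300), received.append, batch_size=500)
    print(calls, [len(b) for b in received])  # 5 [500, 500, 500, 500, 300]
```

Batch size is a trade-off. Larger batches mean fewer round trips, but each request carries more data and takes longer. Very large batches can therefore hit request-size limits or timeouts of their own.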

The bottom line

There is no universally best vector database here, just the right one for your workload. Qdrant wins on filtered search efficiency, Weaviate on developer experience and built-in intelligence, and Milvus on scale and flexibility. The good news: all three are open-source and deploy in a few minutes on Elestio, so you can test each against your own data before committing. Spin one up from the Elestio catalog (Qdrant, Weaviate, Milvus) and let your latency numbers make the call.

Thanks for reading ❤️ See you in the next one 👋