Qdrant vs Weaviate vs Milvus: Which Vector Database for Your RAG Pipeline?
You've decided your AI app needs a vector database. Great. Now you're staring at three open-source options — Qdrant, Weaviate, and Milvus — and they all claim to be the fastest, most scalable, most production-ready choice. I've spent the last few months testing all three, and here's what I wish someone had told me before I started.
What They Actually Are
All three store high-dimensional vectors and let you search by similarity. That's where the similarities end.
Qdrant is written in Rust and obsesses over one thing: fast filtered search. If your queries look like "find similar products, but only in stock, under $50, in the electronics category," Qdrant was built for you.
Weaviate is written in Go and thinks of itself as more than a vector database — it's a vector database plus a knowledge graph. It ships with built-in vectorizers for OpenAI, Cohere, and HuggingFace, so you can throw raw text at it and skip the embedding step entirely.
Milvus is written in Go and C++ and was designed for scale from day one. Its fully disaggregated architecture separates compute and storage, which means you can independently scale reads, writes, and indexing. If you're working with billions of vectors, Milvus is the one that won't flinch.
The Comparison That Matters
| Feature | Qdrant | Weaviate | Milvus |
|---|---|---|---|
| Language | Rust | Go | Go + C++ |
| Indexing | Filterable HNSW | HNSW + ACORN | HNSW, IVF, SCANN, DiskANN |
| Hybrid Search | Vector + payload filters | Vector + BM25 keyword | Vector + BM25 full-text |
| Built-in Vectorizers | No | Yes (OpenAI, Cohere, HuggingFace) | No |
| Multi-tenancy | Yes | Yes | Yes |
| Knowledge Graph | No | Yes | No |
| GPU Acceleration | Yes (NVIDIA, AMD, Intel) | No | Yes |
| API | REST + gRPC | REST + gRPC + GraphQL | REST + gRPC |
| SDKs | Python, JS, Rust, Go, Java, .NET | Python, JS, Go, Java, C# | Python, Java, Go, Node.js |
| License | Apache 2.0 | BSD 3-Clause | Apache 2.0 |
| GitHub Stars | ~29,000 | ~14,000 | ~35,000+ |
Where Each One Wins
Qdrant: The Filtering Champion
Look, if your RAG pipeline involves complex metadata filtering — and most production ones do — Qdrant handles this better than anyone. Its filterable HNSW index weaves filter conditions directly into the graph traversal instead of treating them as a post-processing step. The difference is measurable: filtered queries stay fast even when filters eliminate 99% of candidates.
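As a rough sketch, here's what a filtered search request against Qdrant's REST API looks like. The collection name, payload field names (`in_stock`, `category`, `price`), and the query vector are all placeholders for your own schema:

```python
import json

# Hypothetical body for POST /collections/products/points/search.
# The field names and vector values are illustrative, not real data.
search_request = {
    "vector": [0.12, -0.08, 0.33],  # query embedding (truncated for illustration)
    "limit": 10,
    "filter": {
        "must": [  # every condition must match (logical AND)
            {"key": "in_stock", "match": {"value": True}},
            {"key": "category", "match": {"value": "electronics"}},
            {"key": "price", "range": {"lt": 50}},
        ]
    },
}

print(json.dumps(search_request, indent=2))
```

The point is that the filter rides along with the vector query in a single request — Qdrant evaluates it during traversal, not after.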
The Rust foundation also means consistently low memory overhead. I've seen Qdrant run comfortably on a 4 GB instance handling millions of vectors with aggressive quantization.
Weaviate: The All-in-One
Weaviate's killer feature is that it removes an entire step from your pipeline. Instead of running a separate embedding service, you configure a vectorizer module and Weaviate handles embedding at ingest time. For teams that don't want to manage embedding infrastructure, this is genuinely compelling.
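A minimal sketch of what that configuration looks like as a schema payload (the class name, property names, and model name are placeholders — check the Weaviate module docs for the exact options your version supports):

```python
import json

# Hypothetical payload for POST /v1/schema: a class whose text properties
# Weaviate embeds automatically at ingest time via the OpenAI module.
article_class = {
    "class": "Article",
    "vectorizer": "text2vec-openai",  # Weaviate calls the embedding API for you
    "moduleConfig": {
        "text2vec-openai": {"model": "text-embedding-3-small"}  # assumed model name
    },
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "body", "dataType": ["text"]},
    ],
}

print(json.dumps(article_class, indent=2))
```

After this, you insert plain text objects and query with `nearText` — no embedding calls in your own code.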
The GraphQL API is another differentiator. If your data has entity relationships — articles linked to authors linked to topics — Weaviate lets you query across those connections in ways that pure vector databases can't.
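To make that concrete, here's a sketch of a GraphQL query that follows a cross-reference during retrieval. `Article`, `ofAuthor`, and `Author` are placeholder class and reference names from an assumed schema:

```python
# Hypothetical Weaviate GraphQL query: semantic search over articles,
# pulling in the linked author through a cross-reference.
query = """
{
  Get {
    Article(nearText: {concepts: ["vector databases"]}, limit: 5) {
      title
      ofAuthor {
        ... on Author {
          name
        }
      }
    }
  }
}
"""

print(query)
```

A pure vector database would return article IDs and leave the author join to your application; here the graph hop happens in one query.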
Milvus: The Scale Monster
Milvus is the only one here that was architected from scratch for billion-scale deployments. Its four-layer design (access, coordinator, worker, storage) means you can scale each layer independently. Need more query throughput? Add query nodes. Write-heavy workload? Scale data nodes.
It also offers the widest range of indexing algorithms. HNSW is great for most cases, but when you're working with 10 billion vectors, DiskANN lets you keep the index on NVMe storage instead of RAM — which changes the economics entirely.
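For a sense of how that choice surfaces in practice, here are index parameter dictionaries in the pymilvus style — the metric and tuning values are illustrative assumptions, not recommendations:

```python
# Hypothetical Milvus index parameters. DiskANN keeps the graph on NVMe and
# caches hot nodes in RAM, so RAM no longer bounds your collection size.
diskann_index = {
    "index_type": "DISKANN",
    "metric_type": "COSINE",
    "params": {},  # DiskANN search tunables are typically set at query time
}

# The in-memory alternative: HNSW with its build-time graph parameters.
hnsw_index = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {"M": 16, "efConstruction": 200},  # illustrative values
}

print(diskann_index["index_type"], hnsw_index["index_type"])
```

Swapping one dict for the other is essentially the whole migration — the query API stays the same.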
Cost Reality Check
Here's what running each one actually costs on Elestio:
| Scale | Recommended Config | Elestio Cost |
|---|---|---|
| Up to 1M vectors | 2 CPU / 4 GB RAM (NC-MEDIUM) | $16/month |
| 1M–10M vectors | 4 CPU / 8 GB RAM (NC-LARGE) | $29/month |
| 10M–100M vectors | 8 CPU / 16 GB RAM (NC-XLARGE) | $59/month |
| 100M+ vectors | 16 CPU / 32 GB RAM (NC-2XLARGE) | $119/month |
Compare that to managed vector database services where costs scale with query volume, storage, and compute. Pinecone's pod-based plans start at $70/month for a single pod and climb quickly — a production setup handling 10M+ vectors with replication easily reaches $300–$500/month. Zilliz Cloud (managed Milvus) charges per compute unit, adding up fast at scale. Self-hosting on Elestio at $59/month for the same workload saves you thousands annually.
All three — Qdrant, Weaviate, and Milvus — are available on Elestio with automated backups, updates, and monitoring included.
The Decision Framework
Pick Qdrant if your queries combine vector similarity with complex metadata filters. Recommendation engines, e-commerce search, content personalization — anywhere filtering performance matters as much as recall. It's also the lightest on resources, which makes it ideal if you're cost-conscious.
Pick Weaviate if you want built-in vectorization and don't want to manage a separate embedding service. Also the right call if your data model has entity relationships that matter for retrieval (GraphRAG use cases). The GraphQL API is a bonus if your team already thinks in graphs.
Pick Milvus if you're operating at genuine scale — hundreds of millions to billions of vectors. Its disaggregated architecture and multiple indexing options give you optimization levers that the others simply don't have. The trade-off is complexity: Milvus has more moving parts to configure.
Troubleshooting Common Issues
Slow queries after data growth? Enable quantization. All three support it — Qdrant offers scalar, product, and binary quantization, Weaviate offers product, scalar, and binary quantization, and Milvus supports quantization-based indexes such as IVF_SQ8 and IVF_PQ. This trades a small amount of recall for dramatically lower memory usage.
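As one concrete example, here's a sketch of a Qdrant collection payload with scalar quantization enabled (vector size and distance are placeholders for your embedding model):

```python
# Hypothetical body for PUT /collections/docs enabling scalar quantization.
collection_config = {
    "vectors": {"size": 768, "distance": "Cosine"},  # assumed embedding dimensions
    "quantization_config": {
        "scalar": {
            "type": "int8",       # 1 byte per dimension instead of 4 (float32)
            "quantile": 0.99,     # clip outliers before quantizing
            "always_ram": True,   # quantized vectors in RAM, originals on disk
        }
    },
}

print(collection_config["quantization_config"])
```

That's roughly a 4x reduction in hot memory for the vectors themselves, which is usually the difference between fitting on your current instance and not.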
High memory usage? Switch to memory-mapped storage. Qdrant and Milvus both support mmap, which offloads vector storage to disk while keeping hot data in RAM.
Inconsistent results between restarts? Make sure your data is persisted to mounted volumes, not ephemeral container storage. On Elestio, this is handled automatically — but if you're running Docker Compose manually, double-check your volume mounts.
Connection timeouts on large imports? Use batch inserts (500–1,000 vectors per batch) instead of inserting one at a time. All three SDKs support batch operations, and it's the difference between minutes and hours for large datasets.
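The batching itself is trivial to do client-side. A sketch, where `records` is placeholder data and the actual insert call depends on which SDK you're using:

```python
def batched(items, batch_size=1000):
    """Yield successive fixed-size batches from a list of records."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Placeholder records; in practice each batch goes to your client's batch
# insert call (Qdrant upsert, Weaviate batch, or Milvus insert).
records = [{"id": i, "vector": [0.0, 0.1]} for i in range(2500)]
batches = list(batched(records, batch_size=1000))

print(len(batches))  # 3 batches: 1000 + 1000 + 500
```

One request per thousand records instead of one per record is where the minutes-versus-hours difference comes from.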
The Bottom Line
There's no universally "best" vector database here — just the right one for your specific workload. Qdrant wins on filtered search and efficiency, Weaviate wins on developer experience and built-in intelligence, and Milvus wins on raw scale and flexibility.
The good news? All three are open-source, all three run on Elestio, and you can spin up any of them in under three minutes to test with your actual data. That's worth more than any benchmark.
Thanks for reading. See you in the next one.