Dripshop Infrastructure Case Study
90% Cost Reduction
How Dripshop Migrated from GCP Kubernetes to Dedicated Servers
$20K Before (GKE) → $2K After (Dedicated)
🎯 The Result
52,000 TPS at the database layer. 10,000+ concurrent users. 90% cost reduction.
This is the story of how we did it—and why managed infrastructure isn't always the answer.
The Problem: GKE at Scale Gets Expensive
Dripshop started on Google Kubernetes Engine—the standard choice for modern containerized applications. But as traffic grew, so did the bill.
💸 $20,000/month
Monthly GKE infrastructure spend including compute, managed databases, and egress costs for global users.
📦 20 Instances at Peak
During NFT drops, we scaled to 20 instances (16 vCPU, 32GB RAM each). Total: 320 vCPUs, 640GB RAM.
⏱️ 75ms Latency
PgBouncer connection pooling became a bottleneck, adding roughly 75ms of latency to database operations.
📈 Overprovisioning
Node pools required overprovisioning to handle traffic spikes, meaning we paid for resources we rarely used.
🐘 AlloyDB: "Managed" But Not Really
GCP's AlloyDB promised a fully managed PostgreSQL experience, but reality was different. We still had to manually tune work_mem, shared_buffers, and other parameters. It only supported PgBouncer for connection pooling—no Odyssey option. Managed in name, DIY in practice.
The Real Problem: We were paying premium prices for "managed" services while still doing the optimization work ourselves—and hitting the limits of single-threaded tools.
🤔 The Decision: Going Against the Grain
The cloud-native playbook says use managed Kubernetes.
It's "easier to operate." It "scales automatically." It's "the modern way." But we asked ourselves: what are we actually paying for?
Auto-scaling → We could handle this ourselves with PM2 cluster mode (see the config sketch after this list)
Managed control planes → We didn't need them for a stable workload
Premium pricing → We were paying cloud tax on commodity compute
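To make that concrete, here is a minimal PM2 cluster-mode sketch. The file name, app name, entry point, and memory threshold are illustrative, not our actual values; the worker count mirrors the 200 Node.js workers described in the architecture below.

```javascript
// ecosystem.config.js: minimal PM2 cluster-mode sketch (app name, entry point, and limits are illustrative)
module.exports = {
  apps: [
    {
      name: 'dripshop-api',         // hypothetical app name
      script: './dist/server.js',   // hypothetical entry point
      exec_mode: 'cluster',         // PM2 forks workers and load-balances incoming connections across them
      instances: 200,               // mirrors the 200 Node.js workers on the backend server
      max_memory_restart: '2G',     // recycle any worker that grows past 2GB (illustrative threshold)
      env: { NODE_ENV: 'production' },
    },
  ],
};
```

Started with `pm2 start ecosystem.config.js`, PM2 spreads the workers across cores, restarts any that crash, and lets us resize the fleet with `pm2 scale`, which covers what we previously leaned on GKE autoscaling for.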
The New Infrastructure
Three dedicated servers from ServerBasket, each a bare-metal machine with 256 cores and 768GB of RAM:
GCP Kubernetes $20K → Dedicated Servers $2K
🏗️ The Architecture: Three Servers, Zero Compromise
We split responsibilities cleanly across three dedicated servers, each with 256 cores and 768GB RAM.
Server 1: Backend
256 cores • 768GB RAM
  • 🔄 PM2 Cluster Mode
  • 200 Node.js Workers
  • 🌐 API + Business Logic
  • 📊 Load Balancing
Server 2: Cache + Search
256 cores • 768GB RAM
  • 🐉 DragonflyDB (Redis-compatible)
  • 🔍 OpenSearch
  • 🔐 Session Store
  • 📦 Queue Management
Server 3: Database
256 cores • 768GB RAM
  • 🐘 PostgreSQL 18
  • 🌊 Odyssey Pooler
  • 📍 PostGIS Extension
  • 🧩 19 Extensions
🔗 Why This Separation Works
  • Backend: pure compute, with no contention from I/O-heavy services.
  • Cache + Search: memory-bound DragonflyDB and heavier OpenSearch queries are isolated on their own machine.
  • Database: the full server is available to PostgreSQL, including a very large shared_buffers allocation.
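As a rough sketch of how the backend tier is wired to the other two machines (all hostnames and ports here are placeholders, not our actual values):

```javascript
// config.js: how the backend addresses the other two servers (hostnames and ports are placeholders)
module.exports = {
  // Server 3: PostgreSQL, reached through the Odyssey pooler rather than directly
  database: { host: 'db.internal', port: 6432, database: 'dripshop' },

  // Server 2: DragonflyDB speaks the Redis protocol, so the standard Redis port applies
  cache: { host: 'cache.internal', port: 6379 },

  // Server 2: OpenSearch for search-heavy queries, on its default HTTP port
  search: { node: 'http://cache.internal:9200' },
};
```

The point of the split is visible in the config: the backend only ever talks to the other two boxes over the network, so CPU-bound request handling never competes locally with cache, search, or database I/O.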
🧩 PostgreSQL Extensions: Supercharging the Database
We leverage 19 PostgreSQL extensions to optimize performance, enable advanced queries, and add specialized functionality.
🚀 Performance Extensions
  • pg_stat_statements for query analysis
  • pg_buffercache for buffer inspection
  • pg_prewarm for cache warming
  • pg_visibility for visibility maps
🔍 Search & Indexing
  • pg_trgm for fuzzy text search
  • fuzzystrmatch for string matching
  • bloom for bloom filter indexes
  • btree_gin / btree_gist for composite indexes
🌍 Geospatial (PostGIS)
  • postgis for spatial data
  • postgis_raster for raster data
  • postgis_topology for topological models
  • postgis_tiger_geocoder for US geocoding
🛠️ Data Types & Utilities
  • uuid-ossp for UUID generation
  • pgcrypto for encryption
  • hstore for key-value storage
  • citext / intarray / tablefunc
💡 Why Extensions Matter
These extensions let us handle fuzzy search, geospatial queries, advanced indexing, and performance monitoring inside PostgreSQL—eliminating the need for external services and reducing network round-trips.
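As a sketch of what this looks like in practice, here is how a few of these extensions might be enabled and queried from the Node.js backend; the products table, column names, and search term are made up for illustration.

```javascript
// extensions.js: enabling and using a few extensions from Node.js (table and column names are illustrative)
const { Pool } = require('pg');   // node-postgres
const pool = new Pool();          // connection settings come from PG* environment variables

async function main() {
  // Extensions are enabled once per database; IF NOT EXISTS makes this safe to re-run.
  // (pg_stat_statements additionally needs shared_preload_libraries set in postgresql.conf.)
  await pool.query('CREATE EXTENSION IF NOT EXISTS pg_trgm');
  await pool.query('CREATE EXTENSION IF NOT EXISTS pg_stat_statements');
  await pool.query('CREATE EXTENSION IF NOT EXISTS postgis');

  // pg_trgm fuzzy search: a misspelled query still matches, with no external search service involved.
  const { rows } = await pool.query(
    `SELECT name, similarity(name, $1) AS score
       FROM products                -- hypothetical table
      WHERE name % $1               -- % is the pg_trgm similarity operator
      ORDER BY score DESC
      LIMIT 10`,
    ['air jordann']
  );
  console.log(rows);

  await pool.end();
}

main().catch(console.error);
```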
🧪 The Experimental Stack
Instead of just migrating our existing stack, we re-evaluated every component with modern multi-core hardware in mind.
Connection Pooling: Odyssey
PgBouncer is single-threaded—on a 256-core machine, that's a bottleneck. Odyssey from Yandex is multi-threaded, using 64 worker threads to handle connections.
20x Faster
vs PgBouncer (2,650 → 51,807 TPS)
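On the application side the swap is invisible: the Node.js connection pool simply points at Odyssey's listen port instead of PostgreSQL's. The hostname, port, and pool size below are assumptions for illustration.

```javascript
// db.js: node-postgres pointed at the Odyssey pooler instead of PostgreSQL directly (values are illustrative)
const { Pool } = require('pg');

const pool = new Pool({
  host: 'db.internal',            // hypothetical database server hostname
  port: 6432,                     // assumed Odyssey listen port; PostgreSQL itself stays on 5432
  database: 'dripshop',
  user: 'app',
  password: process.env.PGPASSWORD,
  max: 50,                        // per-worker pool; Odyssey multiplexes these onto far fewer server connections
});

module.exports = pool;
```

With 200 PM2 workers, an illustrative per-worker pool of 50 is how client connections can climb past 10,000 while Odyssey keeps the number of real PostgreSQL backends small.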
In-Memory Cache: DragonflyDB
Redis is single-threaded by design. DragonflyDB is a drop-in replacement that's multi-threaded from the ground up. Same API, dramatically better throughput on multi-core machines.
25x Faster
vs Redis on 256-core machine
Zero Code Changes Required: We exported Redis data as an RDB file and imported it directly into DragonflyDB. All existing Redis clients work without any modification—true drop-in replacement.
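Because the wire protocol is the same, the existing client code keeps working; only the connection target changes. A minimal sketch with ioredis, using placeholder host and key names:

```javascript
// cache.js: the same ioredis client, now pointed at DragonflyDB (host and key names are placeholders)
const Redis = require('ioredis');

// Only the endpoint changed after the RDB import; commands, pipelines, and pub/sub behave as before.
const cache = new Redis({ host: 'cache.internal', port: 6379 });

async function getSession(sessionId) {
  const cached = await cache.get(`session:${sessionId}`);
  return cached ? JSON.parse(cached) : null;
}

async function setSession(sessionId, session) {
  // 30-minute TTL, the same pattern we used against Redis
  await cache.set(`session:${sessionId}`, JSON.stringify(session), 'EX', 60 * 30);
}

module.exports = { cache, getSession, setSession };
```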
📊 Benchmark Results: Odyssey vs PgBouncer
We ran comprehensive benchmarks with 200 clients and 100 threads over 60 seconds. The results speak for themselves.
💡 Key Insight
Odyssey with 64 worker threads nearly matches direct PostgreSQL performance while still providing connection pooling, transaction management, and 10,000+ client connection support.
🐉 DragonflyDB vs Redis
🏆 The Results: Numbers Don't Lie
After completing the migration from GKE + AlloyDB to dedicated servers, here's where we landed:
52K
TPS
Database transactions per second, a level the AlloyDB + PgBouncer setup couldn't reach.
10K+
Users
Concurrent users handled comfortably with room to spare.
45ms
P95
API response time at the 95th percentile—down from 150ms.
90%
Savings
Monthly infrastructure cost reduced from $20,000 to $2,000.
Resource Utilization
15-25%
CPU Average
30-35%
Memory Usage
3-4x
Growth Headroom
💡 The Irony
With dedicated servers, we have full control over PostgreSQL tuning—work_mem, shared_buffers, connection pooling—and achieved 52K TPS. The "managed" AlloyDB couldn't come close, yet cost 10x more.
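As an illustration of the tuning knobs we now own outright, here is a sketch of setting a few of them via ALTER SYSTEM from Node.js. The values are assumptions roughly sized for a 768GB dedicated database host, not our exact production settings.

```javascript
// tune.js: illustrative PostgreSQL tuning via ALTER SYSTEM (values are assumptions, not exact production settings)
const { Client } = require('pg');

async function tune() {
  const client = new Client();    // connection details come from PG* environment variables
  await client.connect();

  // shared_buffers: ~25% of RAM is a common starting point on a dedicated database machine.
  await client.query(`ALTER SYSTEM SET shared_buffers = '192GB'`);
  // work_mem applies per sort/hash per connection, so it stays modest even on a huge box.
  await client.query(`ALTER SYSTEM SET work_mem = '64MB'`);
  // Tell the planner how much data the OS page cache can realistically hold.
  await client.query(`ALTER SYSTEM SET effective_cache_size = '512GB'`);

  // work_mem and effective_cache_size reload live; shared_buffers only takes effect after a restart.
  await client.query('SELECT pg_reload_conf()');
  await client.end();
}

tune().catch(console.error);
```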