Database Design for Scale: Lessons from 10 Million Rows
Val Chahul
The Operator
When your database grows beyond toy scale, everything you thought you knew stops working. Here is what I learned scaling PostgreSQL from a prototype to a production system handling millions of records.
The Inflection Point
Somewhere between 100K and 1M rows, your queries start slowing down. Somewhere between 1M and 10M, they start failing. This is the inflection point where casual database design becomes a liability.
The Index Hierarchy
Not all indexes are created equal. Here is my mental model:
| Index Type | Use Case | Write Overhead |
|---|---|---|
| B-tree (default) | Equality, range queries | Medium |
| GIN | Full-text, JSONB, arrays | High |
| BRIN | Large, naturally ordered data | Very Low |
| Partial | Filtered subsets | Low |
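To make the hierarchy concrete, here is a hedged sketch of each index type on a hypothetical append-heavy `events` table (the table, its columns, and the index names are illustrative assumptions, not from a real schema):

```sql
-- B-tree (default): equality and range lookups on a column.
CREATE INDEX idx_events_user ON events (user_id);

-- GIN: full-text search, JSONB containment, array membership.
-- Assumes a JSONB column named "payload".
CREATE INDEX idx_events_payload ON events USING GIN (payload);

-- BRIN: very large tables whose physical order tracks a column
-- (e.g. insertion time); tiny on disk, cheap to maintain.
CREATE INDEX idx_events_created_brin ON events USING BRIN (created_at);

-- Partial: index only the subset of rows you actually query.
CREATE INDEX idx_events_unprocessed ON events (created_at)
WHERE processed = false;
```

The write-overhead column in the table falls out of these definitions: a GIN index may write many entries per row, while a BRIN index stores only one summary per block range.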
The Query to Rule Them All: EXPLAIN ANALYZE
```sql
-- Before optimization: 2.3 seconds
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 123
  AND status = 'pending'
ORDER BY created_at DESC
LIMIT 10;

-- After adding composite index: 4ms
CREATE INDEX CONCURRENTLY idx_orders_user_status_created
ON orders (user_id, status, created_at DESC)
WHERE status IN ('pending', 'processing');
```

The Five Rules of Scale
1. Index what you query, not what you think you might query.
2. Partition by time for anything with a natural date dimension.
3. Denormalize reads, normalize writes.
4. Cache aggressively at the application layer.
5. Measure before optimizing; intuition lies.
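Rule 2 in practice: PostgreSQL's declarative partitioning can split a table by month so old data can be detached or dropped cheaply. This is a minimal sketch assuming a hypothetical `orders` table keyed by `created_at`; your column names and ranges will differ.

```sql
-- Parent table: rows are routed to children by created_at range.
CREATE TABLE orders (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    user_id    bigint NOT NULL,
    status     text NOT NULL,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- One child table per month (upper bound is exclusive).
CREATE TABLE orders_2024_01 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_2024_02 PARTITION OF orders
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
```

Queries that filter on `created_at` touch only the relevant partitions, and retiring a month becomes a fast `DETACH PARTITION` instead of a bulk `DELETE`.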
The fastest query is the one you never run. Cache invalidation is hard, but not as hard as explaining to users why the page takes 10 seconds to load.
Scaling a database? I have optimized systems from thousands to billions of rows. Let us talk.