Inside Postgres pg_stat_statements: Find Slow Queries Without an APM
Production Postgres falls over for one of three reasons: a missing index, the same query running ten thousand times when it should run ten, or a join that decided today is the day to seq-scan a 50-million-row table. The trick is knowing which one. Without an APM like DataDog, most teams find out by reading the deploy chat the morning after.
pg_stat_statements is the answer hiding in your database. It is a Postgres extension that tracks every distinct query pattern (calls, total execution time, mean time, rows returned, buffer hits) and lets you query it like any other table. It ships with Postgres, costs nothing extra, and turns "why is the app slow" into a SQL question with an answer.
Here is how to enable it, what the columns actually mean, and the three queries that surface 90 percent of real bottlenecks.
Enable the Extension
pg_stat_statements ships with Postgres but is not loaded by default because it allocates shared memory. Three steps to turn it on.
1. Add to shared_preload_libraries. Edit postgresql.conf:
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
max is how many distinct query patterns Postgres will keep stats for (10,000 is plenty for most apps). track = all includes nested statements inside functions, which you almost always want.
2. Restart Postgres. The module is loaded at server start. There is no way around the restart.
sudo systemctl restart postgresql
3. Create the extension in your target database.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
You only need to run this in the databases you want to monitor. The view pg_stat_statements is now available.
The Columns That Matter
The view has 40+ columns. You will use six of them most days:
query: the normalized query text (literals replaced with$1,$2, etc.)calls: how many times this query pattern has been executedtotal_exec_time: cumulative wall time in milliseconds across all callsmean_exec_time: average time per call in millisecondsrows: total rows returned across all callsshared_blks_hit/shared_blks_read: shared buffer cache hits vs disk reads (highreadversushitmeans cold cache or scan-heavy queries)
Two columns are tempting but misleading on their own: total_exec_time favors slow queries, calls favors hot ones. Use both, plus mean_exec_time, to triangulate.
The Three Queries That Surface Real Bottlenecks
1. Slowest single calls (the "what is killing us" query)
SELECT
substring(query, 1, 100) AS query,
calls,
round(mean_exec_time::numeric, 2) AS mean_ms,
round(total_exec_time::numeric, 2) AS total_ms,
rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
Surfaces queries that are individually expensive. Prime candidates for index additions or query rewrites. The ones that take 800ms per call and run twenty times an hour are usually the painful ones.
2. Worst total time consumers (the "where does the day go" query)
SELECT
substring(query, 1, 100) AS query,
calls,
round(total_exec_time::numeric / 1000, 2) AS total_sec,
round(mean_exec_time::numeric, 2) AS mean_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
Surfaces queries that are cheap individually but run a million times. These often hide N+1 problems in your ORM. A query that takes 2ms but runs 500,000 times an hour is a bigger CPU sink than one that takes 800ms but runs twenty times.
3. Cache miss offenders (the "why is disk on fire" query)
SELECT
substring(query, 1, 100) AS query,
calls,
shared_blks_hit,
shared_blks_read,
round(100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_pct
FROM pg_stat_statements
WHERE shared_blks_read > 0
ORDER BY shared_blks_read DESC
LIMIT 20;
Surfaces queries doing the most physical disk reads. Low hit_pct means the data is not in shared buffers, usually because the working set exceeds shared_buffers or because a scan-heavy query is blowing the cache.
Resetting and Snapshotting
pg_stat_statements stats accumulate from the last reset. To start fresh:
SELECT pg_stat_statements_reset();
Useful before benchmarking a specific feature or migration. For ongoing observability, snapshot the view to a regular table on a schedule (INSERT INTO pg_stat_snapshots SELECT now(), * FROM pg_stat_statements) and graph the deltas. That gives you trend data without buying an APM.
Gotchas
Permissions. Non-superusers see only their own query stats by default. Either grant pg_read_all_stats to your monitoring user or query as a superuser. Letting your app role read its own stats is fine and useful.
Query normalization is not perfect. Two semantically identical queries with different whitespace or comment blocks count as different patterns. Use pg_stat_statements.track = all and accept some noise, or normalize via your ORM.
The view is in-memory. Stats are lost on Postgres restart unless you snapshot them first. If you care about historical trends, set up a cron to snapshot every minute or hour.
max is a hard cap. Once you hit pg_stat_statements.max, the least-used entries get evicted. If your app has more than 10,000 distinct query patterns (unusual but possible with dynamic SQL), bump it.
Wrapping Up
pg_stat_statements is the most valuable Postgres extension nobody talks about. Enable it, run the three queries above once a week, and you will catch performance regressions before your users do. The whole thing takes five minutes to set up and costs nothing.
If you want managed Postgres with pg_stat_statements already enabled, automatic backups, and version updates handled for you, you can deploy PostgreSQL on Elestio in one click. We handle the operational side so you can focus on the queries that actually matter.
Thanks for reading. See you in the next one.