PgBouncer Metrics Monitoring
This guide is part of Connection Pool Observability, and it focuses on the one source of truth every PgBouncer operator must master: the admin console. PgBouncer is a thin, single-threaded multiplexer between thousands of client connections and a small set of backend PostgreSQL connections, and its admin console exposes the exact counters that reveal whether that multiplexing is healthy or quietly queueing requests. Unlike an application pool, PgBouncer maintains two distinct connection populations — client-side and server-side — and almost every production incident comes down to misreading the relationship between them.
The admin console is a virtual database named pgbouncer. You reach it with any libpq client by connecting to PgBouncer’s listen port with dbname=pgbouncer as a user listed in admin_users or stats_users. From there a family of SHOW commands returns one row per pool, per database, per client, or per server, plus rolling traffic statistics. Reading those rows correctly is the difference between scaling default_pool_size when you actually have a slow backend, and chasing a phantom while clients pile up in the wait queue.
Key operational takeaways:
cl_waitingandmaxwaitare the saturation signal — any sustainedcl_waiting > 0means clients are blocked waiting for a server connection.SHOW POOLSis per database-plus-user pool;SHOW STATSis per database traffic;SHOW LISTSis the global capacity summary.sv_activeneardefault_pool_sizewithcl_waiting > 0means the backend pool is the bottleneck; raisedefault_pool_sizeor fix slow queries.avg_query_timerising alongsidemaxwaitpoints at the database, not at PgBouncer.- Scrape the console with the PgBouncer Prometheus exporter to turn point-in-time
SHOWoutput into time series and alert rules.
Foundational mechanics
PgBouncer keeps two sides of every connection separate. On the client side, applications open connections to PgBouncer’s listen port; each is cheap because it is just a socket plus a small client struct. On the server side, PgBouncer opens real PostgreSQL backend connections — expensive, capped, and shared. A pool is the set of server connections dedicated to one (database, user) pair, sized by default_pool_size. The whole point of PgBouncer is that the client population can be far larger than the server population, with the proxy time-slicing server connections across clients according to the configured PgBouncer Transaction vs Statement Pooling mode.
When a client issues a query (in transaction mode, when it opens a transaction), PgBouncer must hand it a server connection. If a free server connection exists in the pool, the client borrows it immediately and the client moves to active. If every server connection is busy, the client is parked in the pool’s wait queue and counted in cl_waiting. It stays there — accumulating maxwait time — until a server connection is released back to the pool. This queue is invisible to the application except as latency, which is why reading it directly from the admin console is the only reliable way to diagnose saturation.
The diagram below maps the SHOW POOLS columns onto this client/server split and the wait queue in between.
The admin console: command by command
Every diagnostic begins by connecting to the console. Use the standard psql invocation against the PgBouncer port, naming the virtual pgbouncer database:
psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer
The console does not accept WHERE, ORDER BY, or joins. You get raw rows; filtering happens downstream in grep or your monitoring layer. The commands divide cleanly by what they describe: pools, traffic, capacity, databases, and individual connections.
A note on cadence: every SHOW is a point-in-time read against PgBouncer’s single event loop. It is cheap — microseconds — but it is also a sample, not a stream. Two SHOW POOLS calls a second apart can disagree wildly during a burst because the wait queue drains and refills between samples. This is precisely why manual console reads are adequate for confirming an incident but inadequate for catching one: you only see the queue if you happen to be looking when it forms. Continuous scraping by the exporter closes that gap by sampling on a fixed interval and persisting the result.
SHOW POOLS — per-pool live state
SHOW POOLS returns one row per active (database, user) pool. It is the first command in any saturation investigation because it shows both connection populations side by side.
SHOW POOLS;
| Column | Meaning |
|---|---|
cl_active |
Client connections currently bound to a server connection and running or between queries inside an open transaction. |
cl_waiting |
Client connections that have sent a query but have no server connection yet — they are queued. The primary saturation signal. |
sv_active |
Server connections currently lent to a client. |
sv_idle |
Server connections free and immediately available to lend. |
sv_used |
Server connections used recently but not yet returned to the idle set (awaiting server_check_query or reset). |
sv_login |
Server connections in the process of logging in to PostgreSQL. |
maxwait |
Whole seconds the oldest currently-waiting client has been queued. |
maxwait_us |
The microsecond fraction of maxwait. Total wait = maxwait + maxwait_us / 1e6. |
pool_mode |
Effective pooling mode for this pool (session, transaction, or statement). |
The relationship to watch: when sv_active equals default_pool_size and sv_idle is 0, the pool is fully lent out. If cl_waiting is also above zero at that moment, you are saturated — clients want server connections that do not exist.
SHOW STATS — per-database traffic and timing
SHOW STATS reports cumulative and averaged traffic per database. It answers “is the backend slow?” where SHOW POOLS answers “are clients waiting?”.
SHOW STATS;
Key columns include total_xact_count, total_query_count, total_received/total_sent (bytes), and the averaged-over-the-last-period figures: avg_xact_time, avg_query_time, avg_recv, and avg_sent. avg_query_time and avg_xact_time are reported in microseconds. A climbing avg_query_time that coincides with rising maxwait is the signature of a slow database: queries hold server connections longer, so the pool drains and clients queue. In that case raising default_pool_size only adds load to an already-strained backend — the fix is the query, not the proxy.
The distinction between avg_xact_time and avg_query_time matters under transaction pooling. avg_query_time measures the wall-clock time of individual statements; avg_xact_time measures the full transaction span, including client think-time between statements while the transaction is open. A transaction that runs three 2 ms queries but sits idle for 200 ms between them holds a server connection for the whole 200-plus milliseconds. That gap — a transaction that is open but not executing — is the leading cause of pool drain that avg_query_time alone will not explain. When avg_xact_time dwarfs avg_query_time, the application is holding transactions open across application logic or network round-trips, and the remedy is tightening transaction boundaries, not resizing the pool. The total_* counters are monotonic and reset only on restart, so they are best consumed as Prometheus counters and graphed as rates rather than read raw.
SHOW LISTS — global capacity summary
SHOW LISTS is a flat key/value snapshot of global counts: total databases, users, pools, free_clients, used_clients, login_clients, free_servers, and used_servers, plus dns_names and dns_queries. Use it to confirm you have not hit max_client_conn (compare used_clients against the configured ceiling) and to see aggregate server connection usage across all pools at once.
SHOW DATABASES — configured routing and limits
SHOW DATABASES lists each configured database entry with its resolved host, port, database (the real backend name), pool_size, reserve_pool, pool_mode, max_connections, and current_connections. It is how you verify that an override (a per-database pool_size or pool_mode) is actually in effect rather than falling back to the [pgbouncer] defaults.
SHOW CLIENTS and SHOW SERVERS — per-connection detail
SHOW CLIENTS and SHOW SERVERS drop to one row per individual connection. Each row carries state (active, waiting, idle, used, login), the addr/port, connect_time, request_time, and wait/wait_us for clients. These are for forensic work: identifying which client address is hoarding connections, or which server connection has been used without being released — the fingerprint of a state-leak or a long-running transaction.
The server-side state vocabulary is worth memorizing because it maps directly onto the lifecycle of a backend connection. A server in login is mid-handshake with PostgreSQL and not yet usable. idle means it is parked in the pool waiting to be lent. active means a client is running a query on it right now. used is the transient state after a client releases the connection but before PgBouncer runs server_check_query or server_reset_query to make it idle again. The pathological reading is a server stuck in active with a request_time minutes in the past: that is a single client holding one backend connection hostage, which under transaction pooling almost always means an application transaction that never committed. Correlating the addr from SHOW SERVERS back to the client in SHOW CLIENTS ties the leaking backend to the offending application instance.
Because the console output is plain text with a stable column order, lightweight forensic filtering is a one-liner even without WHERE:
# server connections busy for more than ~60s (request_time column)
psql -At -h 127.0.0.1 -p 6432 -U pgbstats pgbouncer -c "SHOW SERVERS;" \
| awk -F'|' '$3=="active"'
Precision sizing & interpreting the saturation signal
The single most important reading discipline is treating cl_waiting and maxwait as a paired saturation signal and everything else as context that explains why the signal fired. The matrix below maps the live SHOW POOLS state to a diagnosis.
cl_waiting |
sv_idle |
sv_active vs pool size |
avg_query_time trend |
Diagnosis |
|---|---|---|---|---|
| 0 | > 0 | below limit | flat | Healthy — spare capacity. |
| > 0 | 0 | at default_pool_size |
flat | Pool too small for concurrency; raise default_pool_size. |
| > 0 | 0 | at default_pool_size |
rising | Slow backend; fix queries before resizing. |
| 0 | high | well below limit | flat | Over-provisioned; consider lowering default_pool_size. |
| > 0 | > 0 | below limit | flat | Per-user pool limit or max_db_connections cap; check SHOW DATABASES. |
maxwait converts the abstract cl_waiting count into a latency budget. A cl_waiting of 5 with maxwait = 0 and a large maxwait_us is a transient micro-burst; a cl_waiting of 5 with maxwait = 3 means a client has been blocked three full seconds and your application is almost certainly seeing acquisition timeouts. Alert on maxwait, not just on cl_waiting, so brief spikes do not page anyone. For the full treatment of distinguishing pool exhaustion from backend slowness, see Detecting Connection Pool Saturation.
There is a sizing relationship that closes the loop between these counters and default_pool_size. By Little’s Law, the number of server connections a pool needs in steady state equals the arrival rate of transactions multiplied by their mean hold time. PgBouncer hands you both inputs: total_xact_count over an interval gives the arrival rate, and avg_xact_time gives the hold time. A pool serving 500 transactions per second at a 40 ms mean transaction time needs roughly 500 * 0.040 = 20 concurrent server connections to avoid queuing. If default_pool_size is below that and cl_waiting is persistently positive, the pool is mathematically undersized regardless of how fast individual queries run. If it is well above that and sv_idle stays high, you are holding idle backend connections that consume PostgreSQL max_connections headroom for nothing. This is the same arithmetic applied to application pools in the architecture fundamentals, applied here to the proxy’s server-side population.
The reserve_pool_size lever is the burst-absorption complement to default_pool_size. Reserve connections sit unused until a pool has had waiting clients for longer than reserve_pool_timeout, at which point PgBouncer opens them to drain the queue. Watching cl_waiting clear faster than maxwait grows is the signature of the reserve pool kicking in; if maxwait keeps climbing past reserve_pool_timeout with reserve connections configured, the reserve is too small for the burst shape.
Production configuration examples
Two settings make the admin console usable for monitoring: a stats user and a clean listen configuration. Grant a dedicated read-only console identity through stats_users so your scraper never needs admin_users privileges (which can issue RELOAD, PAUSE, and KILL).
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
admin_users = pgbadmin
stats_users = pgbstats
pool_mode = transaction
default_pool_size = 50
reserve_pool_size = 10
max_client_conn = 2000
; surface query timing in SHOW STATS without flooding the log
stats_period = 60
stats_users grants the listed roles permission to run every SHOW command and read the console, but not to mutate state. stats_period controls the averaging window for the avg_* columns in SHOW STATS. With this in place a scraper authenticates as pgbstats and pulls metrics safely.
The separation between admin_users and stats_users is a real security boundary, not a formality. A role in admin_users can run PAUSE, RESUME, RELOAD, KILL, and SHUTDOWN — enough to take the proxy down or sever every live connection. A monitoring scraper never needs any of that; it only reads. Granting it stats_users and nothing more means a leaked scrape credential cannot be turned into a denial-of-service. Keep the two role lists disjoint, and store the stats credential the same way you store any other service secret rather than inlining it where it can land in a process list or a container image layer.
Two further parameters shape what the counters mean. server_check_query (default SELECT 1) and server_check_delay govern how long a released connection lingers in the used state before becoming idle again; a long server_check_delay inflates sv_used without indicating a problem. server_idle_timeout controls how aggressively idle backend connections are closed, which directly determines how often you will see sv_idle shrink to zero between bursts. Reading the pool counters without knowing these settings leads to misdiagnosis — a healthy pool with conservative timeouts can look busier than it is.
The console can be sampled from a shell when you need a quick look without a dashboard:
# one-shot pool snapshot, machine-readable
psql -At -h 127.0.0.1 -p 6432 -U pgbstats pgbouncer -c "SHOW POOLS;"
Diagnostics & telemetry: exporting to Prometheus
Manual SHOW POOLS is fine for incident response but useless for trend detection and alerting. The standard path is to scrape the admin console on an interval and convert the rows into Prometheus time series. The community PgBouncer exporter connects to the console as a stats_users role, runs SHOW POOLS, SHOW STATS, SHOW LISTS, and SHOW DATABASES on each scrape, and exposes them as pgbouncer_* gauges and counters labelled by database and user.
Once those series exist you get the time-series equivalents of the console columns — pgbouncer_pools_client_waiting_connections for cl_waiting, pgbouncer_pools_server_active_connections for sv_active, and a counter for query time that lets you graph avg_query_time over hours rather than the last stats_period. The full deployment, scrape config, and alert rules live in the dedicated guide on Setting Up the PgBouncer Prometheus Exporter. For the dashboard and alert-routing layer that consumes those series alongside application-pool metrics, see Prometheus and Grafana Pool Metrics.
Integration & proxy compatibility
PgBouncer’s counters describe only the proxy layer. The full request path usually has a second pool in front of it — the application connection pool — and the two must be read together. When an application thread blocks acquiring a connection, the wait can be in either pool. If HikariCP Configuration Deep Dive reports pending threads while PgBouncer shows cl_waiting = 0, the bottleneck is the application pool size, not PgBouncer. If HikariCP is calm but PgBouncer shows sustained cl_waiting, the proxy or backend is the constraint. Reading both sides prevents resizing the wrong layer.
In transaction mode the cl_active count includes clients sitting inside an open transaction between statements, which is why a leaky application that holds transactions open inflates sv_active without doing useful work. That pattern is identical to a server-side connection leak and shows up in SHOW SERVERS as connections stuck in active with an old request_time. Aligning the proxy mode with the application’s transaction behavior — covered in PgBouncer Transaction vs Statement Pooling — keeps these counters meaningful.
Common failure patterns & remediation
| Symptom | Root cause | Exact fix | Validation command |
|---|---|---|---|
Sustained cl_waiting > 0, sv_idle = 0, avg_query_time flat |
Pool too small for client concurrency | Raise default_pool_size; add reserve_pool_size for bursts |
SHOW POOLS; — confirm cl_waiting returns to 0 |
cl_waiting > 0 with avg_query_time climbing |
Slow backend queries draining the pool | Fix the slow query; do not resize the pool first | SHOW STATS; — watch avg_query_time |
cl_waiting > 0 but sv_active below default_pool_size |
Per-user or per-database cap (max_db_connections) reached |
Raise the per-database limit, not the global pool | SHOW DATABASES; — compare current_connections to max_connections |
| Console connection refused for scraper | Role missing from stats_users |
Add the role to stats_users, then RELOAD |
psql ... pgbouncer -c "SHOW STATS;" succeeds |
used_clients near max_client_conn |
Client-side fd exhaustion approaching | Raise max_client_conn and OS ulimit, or cut app pool sizes |
SHOW LISTS; — read used_clients |
Server rows stuck active with old request_time |
Application leaking open transactions | Fix transaction handling; set server_reset_query = DISCARD ALL |
SHOW SERVERS; — inspect state and request_time |
Related
- Connection Pool Observability — the parent overview tying proxy, application, and cloud pool telemetry together.
- Setting Up the PgBouncer Prometheus Exporter — deploy the exporter, scrape config, and alert rules for these columns.
- Detecting Connection Pool Saturation — turning
cl_waitingandmaxwaitinto a reliable saturation alert. - PgBouncer Transaction vs Statement Pooling — how pool mode changes what
cl_activeandsv_activemean. - Prometheus and Grafana Pool Metrics — dashboards and routing for the exported PgBouncer series.