Handling node-postgres Pool Errors and Reconnection

This guide is part of Express.js Connection Pool Middleware. It addresses a failure mode that crashes Node processes that otherwise look healthy: an idle pg.Pool client emits an error that nobody is listening for, and the unhandled error event takes down the whole process. The classic crash log is:

events.js:292
      throw er; // Unhandled 'error' event
      ^
Error: Connection terminated unexpectedly
    at Connection.<anonymous> (/app/node_modules/pg/lib/client.js:132:73)
    at Object.onceWrapper (events.js:421:28)
    at Connection.emit (events.js:315:20)

This is not a query failing. It is a connection that was sitting idle in the pool when the database closed it: a failover, an idle_session_timeout (PostgreSQL 14 and later), an RDS Proxy recycle, a network blip. Note that idle_in_transaction_session_timeout applies only to sessions idle inside an open transaction, so it affects checked-out clients rather than clients resting in the pool. node-postgres surfaces the closed connection as an error event on the pool, and in Node an error event with no listener is thrown as an exception and terminates the process. The fix is a small set of disciplines: always attach pool.on('error'), understand client.release(err) semantics so a poisoned connection is destroyed rather than reused, and let the pool’s own lazy reconnection plus a bounded retry/backoff ride out transient failover instead of taking the service down. This guide covers all three.

Key operational takeaways:

pool.on('error', handler) is mandatory. Without it, an idle-client error becomes an unhandled error event and the process exits. The handler should log, not crash.
A pg.Pool recovers on its own: dead idle clients are removed and new ones are created lazily on the next pool.connect()/pool.query(). You do not rebuild the pool.
client.release(true) (or client.release(err)) destroys the client instead of returning it to the pool — use it whenever a query error might have left the connection in a bad state.
Wrap acquisition + query in bounded retry with exponential backoff for transient codes (ECONNRESET, 57P01, 08006), but never retry non-idempotent writes blindly.
Graceful shutdown still matters; combine this with Implementing Graceful Connection Pool Shutdown in Express.

Rapid incident diagnosis

When the service crashes or sheds errors, first classify which connection threw and whether it was idle or in-use. The two have different handlers and different fixes.

Symptom	Where it surfaces	Cause
`Unhandled 'error' event` + process exit	Pool-level `error` event with no `pool.on('error')` listener	Idle client died (failover/timeout)
`Connection terminated unexpectedly` on a query	Inside `await pool.query()`	In-use client lost connection mid-statement
`terminating connection due to administrator command` (`57P01`)	Query or idle	Database failover/restart killed the backend
`Client has encountered a connection error and is not queryable`	Reused client	A poisoned client was `release()`d back to the pool, not destroyed
Slow ramp of errors after deploy	Acquisition	Pool size too small or stuck clients not released

The decisive question is idle vs in-use. An idle client’s death is asynchronous and only the pool-level error event sees it — that is the crash path. An in-use client’s death rejects the in-flight query() promise, which your request handler’s try/catch can see. You must handle both: the pool listener for idle deaths, the per-query catch for in-use deaths.
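
The triage in the table can be partly automated in log processing. The sketch below is a hypothetical helper (classifyPgError is not part of node-postgres) that maps an error object to a row of the symptom table. It matches on the SQLSTATE code and on the message strings quoted in this guide, and it assumes those strings are unchanged in your pg version.

```javascript
// Hypothetical triage helper: maps an error to a row of the symptom table.
// Not part of node-postgres; the message strings are the ones quoted in this guide.
function classifyPgError(err) {
  const message = (err && err.message) || '';
  if (err && err.code === '57P01') {
    return 'failover-or-restart';      // backend killed by admin command
  }
  if (message.includes('not queryable')) {
    return 'poisoned-client-reused';   // a dead client was recycled
  }
  if (message.includes('Connection terminated unexpectedly')) {
    return 'connection-lost';          // socket closed under the client
  }
  return 'unclassified';
}
```

Calling it from both the pool listener and the per-query catch, and logging which of the two called it, gives you the idle vs in-use split alongside the category.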

Confirm failovers from the database side so you do not misattribute them to application bugs:

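-- PostgreSQL: backends terminated by admin/failover show in the server log.
-- Check the server-side idle timeouts that may be closing your connections.
SHOW idle_session_timeout;                 -- PostgreSQL 14+: sessions idle outside a transaction
SHOW idle_in_transaction_session_timeout;  -- sessions idle inside an open transaction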
SELECT pid, state, state_change, application_name
FROM pg_stat_activity WHERE application_name = 'express-app';

Why an idle-client error crashes the process

node-postgres Pool is an EventEmitter. When a client sitting idle in the pool emits an error (the socket closed under it), the pool re-emits that as a pool-level error event. Node’s EventEmitter has a special rule: an error event with no registered listener is thrown as an exception. In an async context that becomes an uncaught exception and the default behavior is to terminate the process.

There is no formula here, just a hard invariant: the number of error listeners on the pool must be at least one for the lifetime of the pool. The cost of forgetting it is total process death on the first failover. The remediation is one listener attached at pool creation, before any traffic.
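
The EventEmitter rule can be reproduced without a database. The following is a minimal sketch in which a plain EventEmitter stands in for pg.Pool; emitIdleError and fakePool are illustrative names, and the try/catch exists only so the demo can report the throw instead of dying from it.

```javascript
const { EventEmitter } = require('events');

// A plain EventEmitter stands in for pg.Pool; no database is needed
// to show the rule that makes a listener-less 'error' event fatal.
function emitIdleError(attachListener) {
  const fakePool = new EventEmitter();
  if (attachListener) {
    fakePool.on('error', (err) => {
      console.error('handled:', err.message);
    });
  }
  try {
    fakePool.emit('error', new Error('Connection terminated unexpectedly'));
    return { threw: false, listeners: fakePool.listenerCount('error') };
  } catch (err) {
    // With zero listeners, emit() throws the error object itself.
    return { threw: true, listeners: fakePool.listenerCount('error') };
  }
}

console.log(emitIdleError(true));  // { threw: false, listeners: 1 }
console.log(emitIdleError(false)); // { threw: true, listeners: 0 }
```

In a real process the emit happens inside a socket callback with no surrounding try/catch, so the throw in the second case becomes an uncaught exception.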

Exact remediation & configuration

Attach the pool error handler at creation, handle in-use failures with retry/backoff, and destroy poisoned clients on release.

// db.js — one pool per process, with a mandatory error listener.
const { Pool } = require('pg');

const pool = new Pool({
  host: process.env.PGHOST,
  port: 5432,
  database: 'appdb',
  max: 10,                       // max clients in the pool
  idleTimeoutMillis: 30000,      // close idle clients after 30s
  connectionTimeoutMillis: 5000, // fail acquisition after 5s instead of hanging
  application_name: 'express-app',
});

// MANDATORY: without this, an idle-client error crashes the process.
// Do NOT rethrow here. Log and let the pool recover lazily.
pool.on('error', (err, client) => {
  console.error('idle pg client error (pool will recover):', err.code, err.message);
  // No action needed: the pool already removed the dead client.
  // A new client is created on the next acquisition.
});

module.exports = { pool };

Bounded retry with backoff for acquisition + query, scoped to transient errors only:

// withRetry.js — retry transient connection failures, not logic errors.
const TRANSIENT = new Set([
  'ECONNRESET',   // socket reset
  'ETIMEDOUT',    // network timeout
  '57P01',        // admin shutdown / failover (terminating connection)
  '08006',        // connection failure
  '08003',        // connection does not exist
]);

async function withRetry(fn, { retries = 3, baseMs = 100 } = {}) {
  let attempt = 0;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      attempt += 1;
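      // pg's "Connection terminated unexpectedly" error carries no code, so it is
      // matched by message (an assumption; verify against your pg version).
      const transient = TRANSIENT.has(err.code) ||
        err.message === 'Connection terminated unexpectedly';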
      if (!transient || attempt > retries) throw err;
      // Exponential backoff with jitter: nominal 100, 200, 400 ms,
      // each scaled by a random factor in [0.5, 1.5).
      const delay = baseMs * 2 ** (attempt - 1) * (0.5 + Math.random());
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}

module.exports = { withRetry, TRANSIENT };
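
To judge how long a request can stall inside withRetry, it helps to work out the backoff schedule explicitly. The sketch below uses two illustrative helpers (nominalDelays and totalWaitBounds are not part of the guide's modules) and assumes the defaults above: retries = 3, baseMs = 100, and a jitter factor in [0.5, 1.5).

```javascript
// Nominal (jitter-free) delay before each retry: baseMs * 2^(attempt - 1).
function nominalDelays(retries = 3, baseMs = 100) {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

// Jitter multiplies each delay by a factor in [0.5, 1.5), so the total time
// spent sleeping lies between 0.5x and 1.5x the nominal sum.
function totalWaitBounds(retries = 3, baseMs = 100) {
  const sum = nominalDelays(retries, baseMs).reduce((a, b) => a + b, 0);
  return { minMs: sum * 0.5, maxMs: sum * 1.5 };
}

console.log(nominalDelays());   // [ 100, 200, 400 ]
console.log(totalWaitBounds()); // { minMs: 350, maxMs: 1050 }
```

With the defaults, a request that exhausts its retries sleeps for roughly 0.35 to 1.05 seconds in total, on top of the time spent in each of its four attempts (including up to connectionTimeoutMillis per acquisition). Keep that sum below your upstream request timeout.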

Manual client checkout with correct release semantics — this is where most reuse bugs live:

const { pool } = require('./db');
const { withRetry } = require('./withRetry');

async function runTransaction(work) {
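  // A retry re-runs the whole transaction from BEGIN, so `work` must be
  // safe to execute more than once (see the FAQ on idempotent writes).
  return withRetry(async () => {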
    const client = await pool.connect();
    try {
      await client.query('BEGIN');
      const result = await work(client);
      await client.query('COMMIT');
      client.release();           // healthy: return to pool for reuse
      return result;
    } catch (err) {
      try { await client.query('ROLLBACK'); } catch (_) { /* connection may be dead */ }
      // Destroy the client: passing a truthy arg removes it from the pool
      // instead of recycling a possibly-corrupted connection.
      client.release(err);
      throw err;
    }
  });
}

The client.release() semantics are the crux:

client.release() with no argument returns the client to the pool for reuse. Use only when the connection is known healthy.
client.release(true) or client.release(err) (any truthy value) destroys the client — closes the socket and removes it from the pool. The pool creates a fresh client on the next acquisition. Use this whenever an error occurred, because a connection that errored mid-transaction may be in an unknown protocol state, and recycling it produces Client has encountered a connection error and is not queryable on the next unlucky request.
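
For non-transactional checkouts, the two release rules can be folded into one wrapper so that callers cannot forget them. This is a minimal sketch, not a node-postgres API: withClient is a hypothetical helper, and the pool is passed in as a parameter so the function only assumes an object with a connect() method that resolves to a client with release().

```javascript
// Hypothetical helper (not part of node-postgres): check out a client,
// run `fn`, and release with the error, if any, so a failed client is destroyed.
async function withClient(pool, fn) {
  const client = await pool.connect();
  let failure; // stays undefined on success
  try {
    return await fn(client);
  } catch (err) {
    failure = err || true; // keep it truthy even if something falsy was thrown
    throw err;
  } finally {
    // release(undefined) recycles the client; release(truthy) destroys it.
    client.release(failure);
  }
}
```

A call site then reads `await withClient(pool, (client) => client.query('SELECT 1'))`. The trade-off is that any error, including a plain constraint violation on a healthy connection, costs one reconnect; that is the conservative choice this guide recommends.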

Never call process.exit() from the pool.on('error') handler or from a query catch. A transient failover should degrade a few requests, not the process. The pool’s lazy reconnection — it creates new clients on demand up to max — is what carries you through a failover without a restart. For serverless or autoscaled deployments where pool sizing interacts with connection limits, see Sizing the node-postgres Pool for Serverless.

Validation & verification

Prove three behaviors: the process survives an idle-client kill, in-flight queries retry through a failover, and poisoned clients are not reused.

Kill an idle backend from the database and confirm the process stays up:

-- Terminate one of the app's idle backends; node-postgres should log,
-- not crash, and recover on the next query.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE application_name = 'express-app' AND state = 'idle'
LIMIT 1;

With the pool.on('error') listener attached, you see the log line from the handler and the next request succeeds. Without it, the process exits — run this once on a staging box without the listener to internalize the failure.

Simulate failover under load: terminate all app backends mid-test and confirm the retry layer absorbs it.

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity WHERE application_name = 'express-app';

A load generator hitting runTransaction should show a brief spike of 57P01/ECONNRESET that the backoff retries, then full recovery, with the process never restarting.

Confirm clients are not reused after error. After forcing query failures, inspect that the pool created fresh clients rather than reusing a poisoned one:

// expose for a debug endpoint
console.log({ total: pool.totalCount, idle: pool.idleCount, waiting: pool.waitingCount });

waitingCount should return to 0 after the burst, and you should not see repeated not queryable errors — that signals release(err) is being used correctly.

Frequently Asked Questions

Do I need to recreate the pg.Pool after a database failover?

No. A pg.Pool is self-healing: when a pooled client’s connection dies, the pool removes that client, and the next pool.connect() or pool.query() lazily creates a new client against the (now-promoted) database. Recreating the pool throws away healthy idle clients and complicates shutdown. Just keep the pool.on('error') listener attached and let bounded retry cover the brief window during the failover.

What exactly does client.release(err) do differently from client.release()?

Passing any truthy value to release tells the pool to destroy the client — close its socket and discard it — instead of returning it to the idle set for reuse. This matters after an error because the connection may be mid-protocol or in an aborted transaction; reusing it yields Client has encountered a connection error and is not queryable. After a successful query, call release() with no argument so the connection is recycled.

Why does my process crash with “Unhandled ‘error’ event” even though I catch errors in every route?

Your route-level try/catch only sees errors on connections you are actively querying. An idle client that dies in the pool emits a separate pool-level error event, and Node rethrows error events that have no listener. The route catch never runs because no query was in flight. Attach pool.on('error', ...) so that idle-client failure is handled instead of fatal.

Should I retry every failed query automatically?

Only idempotent operations, and only on transient connection error codes like ECONNRESET, 57P01, and 08006. Retrying a non-idempotent write (an INSERT without a unique guard, a balance increment) risks double-applying it if the first attempt actually committed before the connection dropped. Make writes idempotent (unique keys, upserts) before enabling retry on them, or wrap them so the retry checks for prior success.
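
One way to make a write safe to retry is an idempotency key enforced by a unique constraint. The sketch below is illustrative only: the payments table, its columns, and recordPaymentOnce are hypothetical names, and it assumes the caller generates the key once per logical operation and reuses it on every retry.

```javascript
// Hypothetical schema: payments(idempotency_key text PRIMARY KEY, amount_cents bigint).
// The caller generates `key` once per logical operation and reuses it on every retry.
async function recordPaymentOnce(client, key, amountCents) {
  const result = await client.query(
    `INSERT INTO payments (idempotency_key, amount_cents)
     VALUES ($1, $2)
     ON CONFLICT (idempotency_key) DO NOTHING`,
    [key, amountCents]
  );
  // rowCount is 1 when this call inserted the row, and 0 when an earlier
  // attempt had already committed it before the connection dropped.
  return { applied: result.rowCount === 1 };
}
```

If the first attempt committed and then the connection was lost, the retry inserts nothing and reports applied: false, so the payment is recorded exactly once.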

How do I keep idle clients from being killed by the database in the first place?

Set idleTimeoutMillis on the pool shorter than the database’s idle-connection killer (PostgreSQL idle_session_timeout on version 14 and later, or a proxy’s idle timeout) so the pool retires idle clients before the server does. idle_in_transaction_session_timeout is a different setting: it only ends sessions left idle inside an open transaction. A shorter pool timeout reduces, but never eliminates, idle-client errors, because failovers and network events still happen, so the pool.on('error') listener and retry layer remain required.
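
The comparison can be turned into a startup check. The following is a rough sketch under stated assumptions: poolRetiresFirst and its 5-second safety margin are illustrative choices, both timeouts are given in milliseconds, and a value of 0 means "disabled" on either side (the convention used by both the pool's idleTimeoutMillis and PostgreSQL's timeout settings).

```javascript
// Returns true when the pool will retire an idle client before the
// server or proxy would close it. Both arguments are in milliseconds.
function poolRetiresFirst(poolIdleMs, serverIdleMs, marginMs = 5000) {
  if (serverIdleMs === 0) return true;  // server-side idle timeout disabled
  if (poolIdleMs === 0) return false;   // pool never retires idle clients
  return poolIdleMs + marginMs <= serverIdleMs;
}

console.log(poolRetiresFirst(30000, 60000)); // true
console.log(poolRetiresFirst(30000, 30000)); // false
```

Logging a warning at boot when this returns false is cheaper than discovering the mismatch from a burst of idle-client errors.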

Express.js Connection Pool Middleware — the parent topic covering pool wiring and middleware in Express.
Implementing Graceful Connection Pool Shutdown in Express — draining the pool cleanly on SIGTERM so reconnection logic isn’t fighting shutdown.
Sizing the node-postgres Pool for Serverless — pool sizing where reconnection and connection limits interact under autoscaling.