Diagnosing RDS Proxy Borrow Timeouts

This guide is part of AWS RDS Proxy Connection Pooling. A borrow timeout is the proxy telling you it could not obtain a backend connection within ConnectionBorrowTimeout. The application sees a connection acquisition failure that looks like database slowness but is not — the database may be nearly idle while the proxy’s backend pool sits at its ceiling. The client-side symptom is a stalled acquire followed by an error such as:

java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not
available, request timed out after 120000ms

or, surfaced directly by the proxy when the backend pool is exhausted:

ERROR: request to borrow a connection from the pool timed out

The wait of roughly 120 seconds is the giveaway: that is the RDS Proxy default ConnectionBorrowTimeout. The proxy held the request, waited for a backend connection to free up, none did, and it failed the borrow. This guide isolates whether the cause is a true ceiling, excessive pinning, or slow-returning transactions, then applies the ceiling math and remediation.

[Figure: Borrow timeout sequence. Client borrow (begins transaction) → proxy queue (no idle backend, request waits) → ConnectionBorrowTimeout (120 s) elapses with the backend pool at ceiling → borrow fails, timeout error returned to client.]
When every backend connection is in use, a borrow request waits in the proxy queue until ConnectionBorrowTimeout, then fails.

Rapid incident diagnosis

The failure has three candidate causes. Distinguish them with three CloudWatch metrics before changing anything.

  1. DatabaseConnectionsBorrowLatency: the time requests wait to borrow a backend connection. A baseline of microseconds spiking into seconds is the direct signature of a borrow timeout. This metric rising is your confirmation; everything else explains why.

  2. DatabaseConnections: backend connections currently open. Compare it against the ceiling max_connections × MaxConnectionsPercent ÷ 100. If DatabaseConnections is sitting at that ceiling, the pool is genuinely exhausted.

  3. DatabaseConnectionsCurrentlySessionPinned: backend connections locked 1:1 by session pinning. If this is high, pinning is consuming the pool and the real fix is to eliminate the state, per Resolving RDS Proxy Session Pinning.
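
To pull the three metrics side by side, the request can be sketched as a CloudWatch GetMetricData query list. This is a minimal sketch, assuming the proxy publishes its metrics in the AWS/RDS namespace under a ProxyName dimension; the proxy name app-proxy, the 60-second period, the choice of the Maximum statistic, and the helper name build_proxy_metric_queries are illustrative assumptions, not part of this guide.

```python
# Query id -> (CloudWatch metric name, statistic). Maximum is an assumed choice
# so that short spikes are not averaged away.
PROXY_METRICS = {
    "borrow_latency": ("DatabaseConnectionsBorrowLatency", "Maximum"),
    "db_connections": ("DatabaseConnections", "Maximum"),
    "session_pinned": ("DatabaseConnectionsCurrentlySessionPinned", "Maximum"),
}


def build_proxy_metric_queries(proxy_name, period_seconds=60):
    """Build a MetricDataQueries list covering the three diagnostic metrics."""
    queries = []
    for query_id, (metric_name, stat) in PROXY_METRICS.items():
        queries.append({
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/RDS",  # assumed namespace for proxy metrics
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "ProxyName", "Value": proxy_name}],
                },
                "Period": period_seconds,
                "Stat": stat,
            },
        })
    return queries


# "app-proxy" is a hypothetical proxy name.
queries = build_proxy_metric_queries("app-proxy")
```

The resulting list can be passed as the MetricDataQueries argument of a CloudWatch get_metric_data call, together with a start and end time that bracket the incident.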

The decision tree:

DatabaseConnections vs ceiling | ...SessionPinned | Diagnosis
At ceiling                     | Low              | True undersizing: raise MaxConnectionsPercent or instance max_connections
At ceiling                     | High             | Pinning exhaustion: fix the session state, not the ceiling
Below ceiling                  | Low              | Slow-returning transactions holding connections too long
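
The decision tree can be expressed as a small classifier. This is a sketch only: the source does not define what counts as "at ceiling" or "high" pinning, so the 95 percent and 50 percent thresholds and the function name diagnose_borrow_timeout are assumptions to tune against your own baselines.

```python
def diagnose_borrow_timeout(db_connections, ceiling, session_pinned,
                            at_ceiling_ratio=0.95, high_pinned_ratio=0.5):
    """Map the three metric readings onto the decision tree's diagnoses.

    at_ceiling_ratio and high_pinned_ratio are assumed thresholds, not
    values taken from RDS Proxy documentation.
    """
    at_ceiling = db_connections >= at_ceiling_ratio * ceiling
    pinned_high = session_pinned >= high_pinned_ratio * max(db_connections, 1)

    if at_ceiling and pinned_high:
        return "pinning exhaustion"
    if at_ceiling:
        return "true undersizing"
    if not pinned_high:
        return "slow-returning transactions"
    # Below ceiling with high pinning is not a row in the decision tree.
    return "not covered by the decision tree"


print(diagnose_borrow_timeout(db_connections=1228, ceiling=1228, session_pinned=20))
```

The fourth branch is deliberate: a reading that matches none of the three rows is reported as such rather than forced into the nearest diagnosis.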

A fourth check tests whether the backend itself is holding connections: if ConnectionRequestsBorrowed (the rate of successful borrows) has flatlined while DatabaseConnectionsBorrowLatency climbs, connections are not being returned, because long-running transactions or a lock wait on the backend are holding them. Confirm by querying pg_stat_activity for sessions that have sat in state 'active' or 'idle in transaction' for a long time.

Do not be misled by client-side metrics in isolation. A HikariCP pool reporting high PendingThreads and connectionTimeout exceptions points at the proxy as the bottleneck, but the same client symptom appears whether the cause is the proxy ceiling, pinning, or a slow backend query. The proxy-side metrics above are authoritative; the client metrics only tell you the wait is happening upstream of the application. Correlate the two: a client timeout that lines up with a DatabaseConnectionsBorrowLatency spike confirms the proxy is the wait point, while a client timeout with flat proxy borrow latency points back into the client pool itself — the distinction drawn in the HikariCP Configuration Deep Dive.

One more distinction matters for triage: borrow timeouts are not acquisition timeouts on the client pool. The client connection-timeout governs how long the application waits for a slot in its own pool; ConnectionBorrowTimeout governs how long the proxy waits for a backend connection. When both clocks are running, the shorter one fires first and produces the error you see. Whether the client timeout is at its default of 30 s or has been raised toward the proxy's 120 s, the error message and stack trace tell you which layer gave up first: a HikariCP SQLTransientConnectionException comes from the client pool, while a borrow timeout error comes from the proxy.

Mathematical sizing / ceiling formula

The backend pool ceiling is fixed and computable:

backend_ceiling = floor(instance_max_connections × MaxConnectionsPercent / 100)

For a db.r6g.large with max_connections = 1365 (RDS derives the default max_connections from instance memory) and MaxConnectionsPercent = 90:

backend_ceiling = floor(1365 × 90 / 100) = 1228
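
The same calculation as a helper, written with integer arithmetic so the floor is exact. The function name backend_ceiling mirrors the formula above; it is an illustrative helper, not an AWS API.

```python
def backend_ceiling(instance_max_connections, max_connections_percent):
    """floor(instance_max_connections * MaxConnectionsPercent / 100)."""
    # Integer floor division avoids floating-point rounding at the boundary.
    return (instance_max_connections * max_connections_percent) // 100


print(backend_ceiling(1365, 90))  # prints 1228
```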

Now apply Little’s Law to find whether that ceiling can sustain the offered load. Let:

  • λ = transaction arrival rate (transactions/second)
  • T = mean transaction hold time on a backend connection (seconds), including query execution and the borrow round trip

The number of backend connections the workload needs concurrently is:

required = λ × T

Worked example: 4,000 transactions/second, each holding a backend connection for 8 ms:

required = 4000 × 0.008 = 32 backend connections

That fits comfortably under the 1228 ceiling — so if you are seeing borrow timeouts at this load, the cause is not the ceiling; it is pinning or long-held transactions inflating T. Conversely, if T balloons to 400 ms (a slow query or lock wait):

required = 4000 × 0.4 = 1600 > 1228 → borrow timeouts
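
Both worked examples, plus the break-even hold time at which demand reaches the ceiling, can be checked with a short script. This is a sketch under the same numbers used above (4,000 transactions/second, ceiling of 1228); the function names are illustrative.

```python
def required_connections(arrival_rate_per_s, hold_time_s):
    """Little's Law: concurrent backend connections = lambda * T."""
    return arrival_rate_per_s * hold_time_s


def max_hold_time_s(ceiling, arrival_rate_per_s):
    """Largest mean hold time T the ceiling can sustain at this arrival rate."""
    return ceiling / arrival_rate_per_s


CEILING = 1228        # backend ceiling from the formula above
ARRIVAL_RATE = 4000   # transactions per second

for hold_time_s in (0.008, 0.4):
    needed = required_connections(ARRIVAL_RATE, hold_time_s)
    verdict = "fits" if needed <= CEILING else "exceeds ceiling"
    print(f"T={hold_time_s}s -> {needed:.0f} connections ({verdict})")

break_even_ms = max_hold_time_s(CEILING, ARRIVAL_RATE) * 1000
print(f"break-even T = {break_even_ms:.0f} ms")
```

At this arrival rate the ceiling is reached once mean hold time passes roughly 307 ms (1228 ÷ 4000), which is why a 400 ms lock wait is enough to trigger borrow timeouts.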

The lever is almost always T, not the arrival rate. Halving query latency halves required connections. This is the same Little’s Law sizing applied to client pools in Optimizing HikariCP maximumPoolSize for High Concurrency; the only difference is that the proxy ceiling is expressed as a percentage of the instance limit rather than an absolute.

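Also account for non-proxy consumers. The proxy's percentage is taken from the instance's max_connections, but admin tools, replication, and direct app connections draw on the same limit. If they hold 150 backend slots, only 1365 − 150 = 1215 slots remain for the proxy, so the effective ceiling is min(1228, 1215) = 1215. Setting MaxConnectionsPercent to 100 on a shared instance invites FATAL: remaining connection slots are reserved.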

Exact remediation & configuration

Match the remediation to the diagnosis from the decision tree.

If genuinely undersized (at ceiling, low pinning): raise MaxConnectionsPercent, or if already near 100 on a dedicated instance, scale the instance class to raise max_connections.

connection_pool_config {
  max_connections_percent      = 95
  max_idle_connections_percent = 30
  connection_borrow_timeout    = 120
}

max_idle_connections_percent = 30 lets the proxy keep idle backend connections open, up to 30 percent of max_connections, so bursts do not pay cold-connection latency on top of the borrow.

If pinning exhaustion (at ceiling, high pinning): do not raise the ceiling — fix the state. Move SET to SET LOCAL or role defaults and disable named prepared statements, then rotate connections. Full procedure in Resolving RDS Proxy Session Pinning.

If slow-returning transactions (below ceiling): the proxy is fine; the backend is holding connections. Add a backend statement_timeout and find the long transactions.

ALTER ROLE app_user SET statement_timeout = '10s';
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '15s';

idle_in_transaction_session_timeout reaps sessions that opened a transaction and stalled, which would otherwise hold a backend connection indefinitely.

Tune the client pool to fail faster than the proxy so you get a clean error instead of a 120-second hang. Set the client connection-timeout (HikariCP) or connectionTimeoutMillis (node-postgres) to a few seconds — well under ConnectionBorrowTimeout:

spring:
  datasource:
    hikari:
      connection-timeout: 5000
      maximum-pool-size: 15

A 5-second client timeout fails fast and lets a circuit breaker or retry kick in, rather than blocking an application thread for two minutes. The trade-offs of acquisition timeout under bursty load are covered in Tuning Connection Acquisition Timeout Under Burst Load.

Apply all of these without downtime: proxy pool config and role defaults take effect on the next borrow/connection, so a rolling restart (or natural max-lifetime rotation) propagates them.

Validation & verification

Confirm DatabaseConnectionsBorrowLatency falls back to its microsecond baseline and ConnectionRequestsBorrowed resumes its normal rate.

On the database, verify backend connections are well under the computed ceiling and not stuck:

SELECT state, count(*)
FROM pg_stat_activity
WHERE usename = 'app_secrets_user'
GROUP BY state;

Healthy output shows most connections idle (returned to the proxy pool) and few active. A pile of idle in transaction rows means the slow-transaction remediation has not taken hold.

Run a synthetic load test at peak arrival rate and assert borrow latency stays flat:

pgbench -c 200 -j 8 -T 120 -h app-proxy.proxy-abc123.us-east-1.rds.amazonaws.com app

During the run, DatabaseConnections should plateau below the ceiling and DatabaseConnectionsBorrowLatency should remain in the sub-millisecond range. Any climb toward ConnectionBorrowTimeout means the connections the workload requires (λ × T) still exceed the ceiling.
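
The pass criteria for the load test can be turned into an explicit assertion over the samples collected during the run. This is a minimal sketch, assuming borrow latency samples are expressed in microseconds and that "sub-millisecond" means below 1000 µs; the function name load_test_passed and the sample values are hypothetical.

```python
def load_test_passed(borrow_latency_us, db_connections, ceiling,
                     max_latency_us=1000):
    """True if every sample stays sub-millisecond and below the ceiling."""
    latency_ok = all(sample < max_latency_us for sample in borrow_latency_us)
    below_ceiling = all(sample < ceiling for sample in db_connections)
    return latency_ok and below_ceiling


# Hypothetical per-minute samples from a two-minute run.
latency_samples_us = [180, 220]
connection_samples = [190, 205]
assert load_test_passed(latency_samples_us, connection_samples, ceiling=1228)
```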

Frequently Asked Questions

Why is the timeout almost exactly 120 seconds?
120 seconds is the default ConnectionBorrowTimeout on RDS Proxy. The proxy holds the borrow request for that long waiting for a backend connection to free up, then fails it. Lower the value if you would rather fail fast, but the better fix is usually to relieve whatever is exhausting the pool.
Is a borrow timeout the same as the database being out of connections?
Not necessarily. The proxy enforces its own ceiling at max_connections × MaxConnectionsPercent ÷ 100. You can hit a borrow timeout while the database still has free slots, because the proxy refuses to exceed its configured percentage. Check DatabaseConnections against the computed ceiling to tell which limit you hit.
Should I raise MaxConnectionsPercent to fix borrow timeouts?
Only if the diagnosis is true undersizing (at ceiling, low pinning) and the instance has headroom for non-proxy consumers. If pinning or slow transactions are inflating demand, raising the percentage just moves the database closer to FATAL: remaining connection slots are reserved without fixing the root cause.
Why do borrow timeouts appear under load but not in steady state?
Required backend connections scale as λ × T. A latency spike that increases T (a slow query, a lock wait, or a flood of pinned sessions) multiplies demand at the same arrival rate, pushing it past the ceiling only during the spike. Sustained borrow latency that tracks query latency confirms T is the lever.
Can the client pool mask or worsen borrow timeouts?
Yes. A long client connection-timeout makes the application hang instead of failing cleanly: the HikariCP default is 30 s, and a client timeout set at or above the proxy's 120 s leaves threads blocked for the full borrow wait. Set the client timeout well below ConnectionBorrowTimeout so the application fails fast and a circuit breaker can shed load.