Port Exhaustion

Have you ever run out of TCP ports?

In large-scale services it is not unusual. Port numbers are 16 bits, so the upper limit is 65,535, and the ephemeral range used for outbound connections carves that down further: the Linux default, 32768 to 60999, leaves about 28,000. You can calculate connection counts at design time. Workers times destinations. That part is predictable. The problem is release.
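The design-time arithmetic can be sketched in a few lines of Python. This is a rough illustration, not a sizing tool: the function names are made up for this example, the fallback range is the Linux default for net.ipv4.ip_local_port_range, and the per_pair parameter (connections each worker holds per destination) is an assumption added here.

```python
from pathlib import Path

# Linux default for net.ipv4.ip_local_port_range; other systems differ.
DEFAULT_RANGE = (32768, 60999)


def ephemeral_port_count(low, high):
    """Number of ports in an inclusive ephemeral range."""
    return high - low + 1


def read_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the live range on Linux; fall back to the default if unreadable."""
    try:
        low, high = Path(path).read_text().split()
        return int(low), int(high)
    except (OSError, ValueError):
        return DEFAULT_RANGE


def steady_connections(workers, destinations, per_pair=1):
    """Connections held open at steady state: workers times destinations."""
    return workers * destinations * per_pair


if __name__ == "__main__":
    low, high = read_port_range()
    print("ephemeral ports available:", ephemeral_port_count(low, high))
    print("steady connections:", steady_connections(workers=64, destinations=12))
```

With the default range the budget is 28,232 ports, well under 65,535 before a single connection has been closed.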

TCP holds a closed connection in TIME_WAIT for a fixed interval on the side that closed first. This prevents late-arriving packets from a previous connection being confused with a new one on the same combination of source and destination address and port. On Linux: 60 seconds. The port stays locked for that destination. Open a new connection for each of a few hundred requests per second and tens of thousands of ports sink into TIME_WAIT within a minute.
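The release-side math is just rate times hold time. A minimal sketch, assuming every request opens and then actively closes its own connection, and that all of them go to a single destination address and port (so they compete for the same pool of source ports); the function names are illustrative.

```python
TIME_WAIT_SECONDS = 60  # Linux hold time, as stated above


def ports_in_time_wait(closes_per_second, hold=TIME_WAIT_SECONDS):
    """Ports parked in TIME_WAIT at steady state (rate times hold time)."""
    return closes_per_second * hold


def max_sustainable_rate(port_count, hold=TIME_WAIT_SECONDS):
    """Highest connection rate a port pool can absorb without running dry."""
    return port_count / hold


if __name__ == "__main__":
    for rate in (100, 300, 500):
        print(rate, "conn/s ->", ports_in_time_wait(rate), "ports in TIME_WAIT")
    # 28232 is the size of the Linux default ephemeral range (32768-60999).
    print("ceiling:", round(max_sustainable_rate(28232), 1), "conn/s")
```

At 300 connections per second, 18,000 ports are locked at any moment. With the Linux default ephemeral range, the pool runs dry somewhere around 470 connections per second to one destination.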

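The classic fix was tcp_tw_recycle. Let the kernel reclaim TIME_WAIT sockets early, using TCP timestamps to reject stale packets. Problem solved. Except behind NAT it broke. Multiple clients behind NAT share a single IP, but each keeps its own timestamp clock. Timestamp consistency collapses. SYNs that look older than the last one seen from that IP get silently dropped. Plenty of engineers enabled it behind a load balancer and learned the hard way.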

Then Linux 4.12 removed tcp_tw_recycle entirely. A kernel upgrade, and the parameter you depended on no longer exists. You carried your sysctl config forward, confident it would apply. On the next kernel it pointed at nothing. Infrastructure strata shift like this, quietly, without warning.
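One way to catch a parameter that has vanished is to compare the keys in a sysctl config against what the running kernel exposes under /proc/sys. The sketch below is a hypothetical helper, not part of any existing tool. It assumes plain "key = value" lines, maps dots to slashes, and ignores edge cases such as a leading "-" on the key or interface names that contain dots.

```python
from pathlib import Path


def sysctl_key_to_path(key, root="/proc/sys"):
    """Map a dotted key such as net.ipv4.tcp_tw_reuse to its /proc/sys path."""
    return Path(root) / key.replace(".", "/")


def parse_sysctl_keys(config_text):
    """Extract keys from 'key = value' lines, skipping blanks and comments."""
    keys = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        keys.append(line.split("=", 1)[0].strip())
    return keys


def missing_keys(config_text, root="/proc/sys"):
    """Return configured keys that the running kernel does not expose."""
    return [
        key
        for key in parse_sysctl_keys(config_text)
        if not sysctl_key_to_path(key, root).exists()
    ]


if __name__ == "__main__":
    config = "net.ipv4.tcp_tw_reuse = 1\nnet.ipv4.tcp_tw_recycle = 1\n"
    for key in missing_keys(config):
        print("not present on this kernel:", key)
```

Run against a carried-forward config on a 4.12 or later kernel, a check like this would list net.ipv4.tcp_tw_recycle as missing.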

Today containers and managed services hide the problem. Port exhaustion rarely surfaces in personal projects. Connect over a Unix domain socket and the concept of ports disappears altogether. As long as you run on localhost, you never learn how small 65,535 really is. Everything works without your knowing. But carry that ignorance into production traffic and there comes a moment when the math stops adding up.