
Tolerance

On one operations project, I was told to lower the MySQL slow query threshold every week in the name of improving availability. It started at 5 seconds and ended at 1. Catching missing indexes and inefficient queries made sense. But at 1 second, queries started tripping the threshold on storage latency alone.

I told a friend who works in architectural design. He laughed. Then he told me about screws.

Screws have tolerances. Under JIS standards, an M8 bolt has a nominal outer diameter of 8mm, but the actual acceptable range is 7.760mm to 7.972mm. A perfect 8.000mm screw doesn't exist. Can't be made. And even if it could, you couldn't tighten it. A screw turns because there's a slight gap between it and the nut. Tolerance isn't a defect. It's a feature.
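As a rough sketch, the check behind those numbers is nothing more than a range comparison. The function name and the Python framing are my own illustration, not part of any standard; the limits are the ones quoted above.

```python
# Limits quoted above for an M8 bolt's outer diameter, in millimetres.
M8_MIN_MM = 7.760
M8_MAX_MM = 7.972


def within_tolerance(measured_mm: float,
                     low: float = M8_MIN_MM,
                     high: float = M8_MAX_MM) -> bool:
    """Return True if the measured diameter falls inside the permitted band."""
    return low <= measured_mm <= high


print(within_tolerance(7.900))  # True: inside the band
print(within_tolerance(8.000))  # False: the "perfect" nominal size is out of spec
```

The point of the sketch is the second line of output: the nominal 8.000mm is the one value the band deliberately excludes.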

A screw from a hardware store and the same screw made for aircraft are priced worlds apart. The tolerance isn't much tighter. What differs is the acceptable defect rate. Every unit is inspected, every part is traceable. The required reliability determines how deep quality control goes.

What's even more interesting is that sometimes you need the gap. Bridges have expansion joints. Steel expands with temperature. Without gaps, the bridge warps or, worse, breaks. Same with railroad tracks. That rhythmic clatter is designed clearance.

In IT, you're conditioned to treat deviation as failure. Types must match. Response times must stay under threshold. Data must be perfectly consistent. That's not wrong. But carry that mindset into the physical world and everything looks broken. Reality runs on accumulated, permitted tolerances.

That project where they kept demanding sub-1-second — maybe it was like requesting a zero-tolerance screw. I didn't have the nerve to say so, though.