
Things Break

Data centers don't go down. I believed that.

Redundant power feeds, uninterruptible power supplies, diesel generators. "Even in a disaster, service continues." The sales brochure said so. I believed it and took on hosting contracts for major enterprises. This was about ten years ago, at an internet data center (IDC) in Tokyo run by a large conglomerate.

It went down.

During the switchover from commercial power to emergency power, a fuse blew. The very mechanism designed to perform the switch was the thing that broke. The junction that made the power redundant was itself a single point of failure. All power died.

When I arrived on the IDC floor, the scene was strange. People in suits had converged on the server room. Normally the only people here wore T-shirts and cargo pants. Suits meant "people who have to explain things," and this many suits meant that many people needed explanations. You can read a crisis by the dress code.

Power gets duplicated. Servers get duplicated. Spanning Tree Protocol (STP) lets switches have redundant paths. But all of it sits inside the same building. This time it was only the power, but if the whole building goes down, redundancy means nothing.

I knew this in theory. The disaster recovery textbook says "span your sites." But until I lived through a data center actually going down, those were just words on a page.

Since that day, I haven't been able to take "it never goes down" at face value. It's not that it doesn't go down. The probability is just low. And low-probability events, given enough time, almost surely happen.
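To put rough numbers on that intuition, here is a minimal sketch. It assumes each year carries the same independent chance of an outage, and the 1% figure is purely hypothetical, not a measurement from the incident above. The function name `prob_at_least_one` is made up for illustration.

```python
def prob_at_least_one(p_per_year: float, years: int) -> float:
    """Chance of at least one outage over `years` years.

    Assumes every year is independent and has the same outage
    probability `p_per_year` (a simplifying assumption).
    """
    # Probability of surviving every single year is (1 - p) ** years;
    # the complement is the chance that at least one outage occurs.
    return 1.0 - (1.0 - p_per_year) ** years


if __name__ == "__main__":
    # Hypothetical 1% yearly outage chance, viewed over longer horizons.
    for years in (1, 10, 30, 100):
        print(f"{years:>3} years: {prob_at_least_one(0.01, years):.1%}")
```

Under these assumptions the figure only ever climbs as the horizon grows. It never reaches exactly 100%, but it gets close enough that planning around "never" stops making sense.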

Cross-site disaster recovery costs money. It's tedious. Testing is hard. But we live in a world where one fuse stops everything.