Dress Rehearsal
I once stared at an ALTER TABLE progress bar for four hours.
It was a payment system. The maintenance window was four hours. Adding a column to a table with a few hundred million rows. I assumed it would finish instantly.
It did not.
Whether ALTER TABLE finishes as a quick metadata change or physically rebuilds the entire table — the line between the two is less obvious than it looks. MySQL version, storage engine, column type, DEFAULT values, index structure. The outcome depends on the combination. The boundary defies intuition. Hit the wrong side on a table with a few hundred million rows, and all you can do is pray.
I prayed.
It had finished in reasonable time on a local setup. So I got careless. Production was on the cloud.
One hour. Still running. Two hours. Three. It blew past the maintenance window. This was a system where ten minutes of downtime meant more than just an incident report. I sat there watching the screen, confronting the fact that there was nothing I could do. The most powerless I've ever felt as an engineer.
The cloud's disk I/O was unpredictable. AWS or GCP would have had more community knowledge to draw from, but the production environment was on a domestic Japanese cloud. Almost no published benchmarks. No war stories. Just because it passed on a local rehearsal didn't mean production would finish in the same time.
What I should have done was keep the ALTER minimal — no rebuild — and backfill the data with a batch job. Run it asynchronously. No downtime needed. I knew this. I knew, and I cut corners.
YAGNI. You Ain't Gonna Need It. A fundamental rule of software development, and I mostly agree. But database design is the exception. The cost of changing the schema on a table that's grown to hundreds of millions of rows is orders of magnitude higher than refactoring application code. You have to read ahead. Sometimes you read wrong. Still better than not reading at all.
As it turned out, the cloud provider's performance was deemed the problem, and their team took the blame. My company wasn't held responsible. Even though I was the one who wrote the ALTER.
Rehearsals matter, but they're not free. Preparing production-equivalent data, standing up a production-equivalent environment. It eats time and infrastructure. Offloading it to AI saves labor costs, but the disk I/O bill stays the same.
So now I create columns ahead of time. Even when I'm not sure they'll be needed. YAGNI would disapprove. But it beats running ALTER TABLE on a few hundred million rows.