Batch Night
I try not to write batch jobs.
More than ten years ago, a monthly batch job failed. On New Year's Eve. So I rang in the new year on my laptop. Best start to a year you could ask for. I don't remember the exact cause anymore. Recent maintenance had changed the table structure, the query plan shifted, and the job timed out. Something like that. It had run fine every month until the one month it didn't. Which is why I spent New Year's Day working remotely, with a call to explain it all to the client as a bonus.
Batch jobs are considered simple. Smaller than the main service, straightforward in scope. So they get handed to junior engineers. This is where things go wrong.
Consider what a batch job actually needs. Can you dry-run it and verify the results before the real run? If it dies midway, can you rerun it without double-processing anything? That's idempotency. Does it shut down gracefully on SIGTERM? Can it resume from where it stopped? Does it retry when a connection drops? Do the logs record what was processed and what was skipped? A bulk update batch needs throttling so it doesn't crush the database or downstream systems. Conversely, if throughput matters, you need multithreading or worker pools, and then you also need the concurrency control they bring.
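To make that checklist concrete, here is a minimal Python sketch of a single-threaded batch loop that covers dry run, checkpoint-based resume, graceful stop on SIGTERM, retry on a transient failure, logging of processed and skipped items, and simple throttling. Everything named here (`run_batch`, `Checkpoint`, `StopFlag`, `TransientError`, the `apply` callback) is hypothetical and not from any library. The checkpoint lives in memory for illustration; a real job would persist it (a table or a file) so that a rerun after a crash can actually resume.

```python
import logging
import signal
import time
from dataclasses import dataclass, field

log = logging.getLogger("batch")


class TransientError(Exception):
    """Hypothetical stand-in for a dropped connection or other retryable failure."""


@dataclass
class Checkpoint:
    # IDs already applied. In memory here; a real job would persist this durably.
    done: set = field(default_factory=set)


class StopFlag:
    """Set by the SIGTERM handler; the loop checks it between items."""

    def __init__(self):
        self.stop = False

    def request(self, signum, frame):
        self.stop = True


def install_sigterm_handler(stop):
    # signal.signal must be called from the main thread (it raises ValueError otherwise).
    signal.signal(signal.SIGTERM, stop.request)


def with_retry(fn, attempts=3, base_delay=0.05):
    # Retry only TransientError, with exponential backoff; re-raise after the last attempt.
    for i in range(attempts):
        try:
            return fn()
        except TransientError:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)


def run_batch(items, apply, checkpoint, *, dry_run=True, rate_per_sec=50.0, stop=None):
    """Apply each (item_id, payload) once. Defaults to dry run as a safety measure."""
    processed, skipped, planned = [], [], []
    interval = 1.0 / rate_per_sec
    for item_id, payload in items:
        # Graceful stop: finish the current item, then exit before starting the next.
        if stop is not None and stop.stop:
            log.info("stop requested; exiting after %d items", len(processed))
            break
        # Resume: anything recorded in the checkpoint is skipped, and logged as such.
        if item_id in checkpoint.done:
            skipped.append(item_id)
            log.info("skip %s (already done)", item_id)
            continue
        # Dry run: report what would change without calling apply().
        if dry_run:
            planned.append(item_id)
            continue
        with_retry(lambda: apply(item_id, payload))
        # If the process dies between apply() and this line, the item is redone on
        # rerun, so apply() itself should also be idempotent (e.g. an upsert).
        checkpoint.done.add(item_id)
        processed.append(item_id)
        log.info("processed %s", item_id)
        # Crude throttling: a fixed pause caps the write rate at rate_per_sec.
        time.sleep(interval)
    return {"processed": processed, "skipped": skipped, "planned": planned}
```

A first call with `dry_run=True` lists what would change without touching anything; a second call with `dry_run=False` applies it, and any rerun skips whatever the checkpoint already holds. Worker pools are deliberately left out: adding them means the checkpoint and the rate limit have to be shared safely across threads, which is exactly the concurrency control mentioned above.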
Line all that up, and it's clear no one should hand this to a junior just because it looks simple. Batch jobs are senior territory.
One more thing. Outside of integrations with external systems, needing a batch job is often a sign of an architecture problem: something that could be processed in real time is deferred to a batch because the design didn't account for it. Before writing a batch, check whether a batch-free design is possible.
Having fewer batches is the best design. Getting batches right is the next best. Handing batches to juniors is the worst call. Whichever way it goes, cleaning up the mess ends up as a senior's job.