Memory Warp

The first time I saw memory shared between servers for redundancy was with Veritas.

I had been dispatched to support infrastructure buildouts at a large enterprise. The client had a brilliant hands-on manager on their side. A decision-maker in a suit, hooked up to the servers, hammering away at a keyboard. Building a Veritas Cluster Server setup with his own hands. Working under him, I saw Veritas redundancy in action for the first time.

Shared disk as a given. Failover at the application level. One server dies, the other picks up the service. That much I could grasp. What I could not grasp was the idea of synchronizing memory contents between servers — contents rewriting at microsecond intervals. Are they insane, I thought. It worked, though. Seemed to hold together. Maybe differential updates optimized it somehow. The internals remained opaque to me.

Years passed. I ended up working on a certain company's cloud platform. Under the hood, it ran vSphere.

VMware has a mechanism called vMotion. It migrates a virtual machine between physical hosts while it is still running. Memory contents are copied to the destination, deltas chased continuously, and only at the final moment does it pause for a split second to switch over. The VM sits on shared storage, so the disk stays put. Only memory warps across chassis.
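For readers who want to see the shape of that loop, here is a toy simulation of iterative pre-copy as described above. It is only a sketch under loud assumptions: memory is a Python list of "pages", the running guest is faked with random writes, and the names `live_migrate`, `write_rate`, and `threshold` are made up for illustration. It is not how VMware actually implements vMotion.

```python
import random


def live_migrate(source, write_rate, threshold=4, max_rounds=20, seed=0):
    """Toy pre-copy migration (hypothetical model, not VMware's code).

    source      -- list of ints standing in for memory pages on the source host
    write_rate  -- assumed guest writes per page copied in a round
    threshold   -- once this few pages are dirty, pause the guest and finish
    Returns (destination_pages, precopy_rounds, pages_copied_while_paused).
    """
    rng = random.Random(seed)
    dest = [None] * len(source)
    dirty = set(range(len(source)))  # first pass: every page still needs copying
    rounds = 0

    # Pre-copy phase: the guest keeps running, so each pass leaves new deltas.
    while len(dirty) > threshold and rounds < max_rounds:
        to_copy, dirty = dirty, set()
        for page in to_copy:
            dest[page] = source[page]
        # While those pages were in flight, the guest kept writing.
        for _ in range(int(len(to_copy) * write_rate)):
            page = rng.randrange(len(source))
            source[page] += 1
            dirty.add(page)
        rounds += 1

    # Stop-and-copy phase: the guest is paused, so nothing changes underneath us.
    paused_pages = len(dirty)
    for page in dirty:
        dest[page] = source[page]
    return dest, rounds, paused_pages


if __name__ == "__main__":
    memory = list(range(1024))
    dest, rounds, paused_pages = live_migrate(memory, write_rate=0.25)
    print(rounds, paused_pages, dest == memory)
```

As long as the guest dirties pages more slowly than they can be copied (here, `write_rate` below 1), each pass has less to send than the last, and the final pause only has to cover a handful of pages. If the guest writes faster than that, the loop gives up after `max_rounds` and the pause gets longer.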

Veritas was application-level failover. vMotion works at the level of the virtual machine itself. The guest OS, the entire memory space, everything moves wholesale. Different scale entirely.

vMotion made me suffer plenty. Micro-outages triggered by host maintenance or resource rebalancing. The network blinks. Applications time out. It takes time to realize vMotion is the cause. You chase logs until you finally land on the vCenter event.

Still. Synchronizing memory contents — rewriting at microsecond intervals — across the network, moving a virtual machine while it is still alive. The technology I once thought insane now runs behind every cloud as if it were nothing. As an engineer, I can only stand in awe.
