The MySQL Binlog
I wrote a MySQL binlog parser from scratch. When I built MygramDB — an in-memory full-text search engine — I needed to ingest MySQL data in real time. Reading the binlog seemed like the obvious path. I started down that path and found a hole about ten times deeper than expected.
What is the binlog?
MySQL's binary log records every data change (INSERT, UPDATE, DELETE). It serves three main purposes: replication (copying data to replicas), point-in-time recovery (restoring to a specific moment), and CDC (Change Data Capture — streaming changes to external systems in real time). MygramDB uses it for CDC.
Three Recording Formats — And History
The binlog has three recording formats: Statement-based (SBR), Row-based (RBR), and Mixed. This isn't a feature list — it's a timeline.
Statement-based is the oldest. It logs the SQL statement itself. UPDATE users SET active = 0 WHERE last_login < '2025-01-01' gets recorded as-is. Compact and human-readable.
The problem was nondeterminism. Replay a statement containing NOW() on a replica and the timestamp differs. A DELETE ... LIMIT without ORDER BY removes different rows. A trigger generating UUIDs produces different values. Primary and replica silently diverge.
Row-based (MySQL 5.1+) records which rows changed and how. For an UPDATE, the before and after images of each row are written in binary. Nondeterminism doesn't apply — results are recorded, not instructions. NOW(), RAND(), whatever — the rows are the rows.
The trade-off is log size. A single SQL updating 10,000 rows is one statement in SBR. In RBR, it's 10,000 rows of before/after data.
Mixed lets MySQL auto-switch based on context. Safe statements use SBR; nondeterministic ones fall back to RBR. Clever. But predicting which mode gets chosen is hard. For anyone writing a CDC parser, having everything in Row-based is far easier to deal with. There's almost no reason to use anything other than binlog_format=ROW in production today.
The arrival of GTIDs (MySQL 5.6)
GTID (Global Transaction Identifier) assigns each transaction a unique ID: server UUID + sequence number. Before GTIDs, replication tracked position by binlog filename + byte offset. When the primary changed, position mappings broke and manual reconfiguration was required. With GTIDs, "applied up to this ID" is the only state needed. Failover became dramatically simpler. MygramDB's replication is GTID-based.
The Binary Format Abyss
The first thing I thought while writing the parser: "What was the person who designed this fighting against?"
A binlog event starts with a 19-byte fixed header. Timestamp (4 bytes), event type (1 byte), server ID (4 bytes), event length (4 bytes), next event position (4 bytes), flags (2 bytes). All little-endian. So far, so normal.
The hell is what comes after.
TABLE_MAP → ROWS: A Two-Stage Protocol
Row-based events come in two stages. First, a TABLE_MAP_EVENT announces the table schema — table name, column types, nullable flags, and so on. Immediately after, a WRITE_ROWS_EVENT (INSERT), UPDATE_ROWS_EVENT (UPDATE), or DELETE_ROWS_EVENT (DELETE) carries the actual row data.
A ROWS event alone doesn't say "this column is a VARCHAR." Without the preceding TABLE_MAP_EVENT, decoding is impossible. The parser must be stateful.
The Packed Integer Trap
MySQL's binary protocol encodes integers differently depending on context. Small values use 1 byte, larger ones use 3 or 8 — the packed integer (length-encoded integer). The first byte's value determines how many bytes follow.
| First byte | Meaning |
|---|---|
| 0–250 | The value itself (1 byte) |
| 252 | Next 2 bytes hold the value |
| 253 | Next 3 bytes |
| 254 | Next 8 bytes |
Looks simple. But this encoding appears everywhere — VARCHAR lengths, BLOB sizes, row data after the NULL bitmap. Miss one and you're off by a byte. Everything downstream breaks. Debugging is not fun.
DECIMAL Binary Encoding
The most bewildering format in the binlog was DECIMAL.
To store "12345.67" in a DECIMAL(10,2), MySQL splits the integer and fractional parts, groups each into chunks of up to 9 digits, and packs each chunk into a 4-byte big-endian integer. Leftover digits use 1–4 bytes depending on the count. The sign is encoded by XOR-flipping the MSB of the first byte. Negative numbers additionally bit-invert every byte.
DECIMAL binary format in detail
For example, storing "-12345.67" in DECIMAL(10,2):
- Integer part "12345" → split into 9-digit groups: "12345" (5 digits, 3 bytes)
- Fractional part "67" → split into 9-digit groups: "67" (2 digits, 1 byte)
- Write integer part as big-endian:
0x00 0x30 0x39 - Write fractional part as big-endian:
0x43 - Negative, so bit-invert all bytes:
0xFF 0xCF 0xC6 0xBC - XOR the MSB (0x80) of the first byte:
0x7F 0xCF 0xC6 0xBC
Why big-endian when everything else is little-endian? Why XOR? Because it makes memcmp produce correct sort order. This encoding lets byte-level lexicographic comparison equal numeric comparison. Rational. Not kind to the person writing the parser.
When I wrote this decoder in mysql-event-stream's rows_parser, I prepared 37 test cases. Positive, negative, zero, leading zeros, fractional-only, integer-only, precision edge cases. Every case was verified against MySQL's actual output. One mismatch means corrupted data.
The Anatomy of Replication Lag
Replication lag — the gap between primary and replica — is unavoidable with binlog replication.
The structural cause is straightforward. The primary processes transactions in parallel across many threads and writes to the binlog. The replica traditionally had a single-threaded SQL Applier that read these events and reapplied them serially. Changes written in parallel on the primary, processed on a single thread on the replica. Under load, it can't keep up.
MySQL 5.6 introduced database-level parallel apply. MySQL 5.7 evolved this into MTS (Multi-Threaded Slave) with logical-clock-based parallelism — "if transactions committed simultaneously on the primary, they can be applied in parallel on the replica." MySQL 8.0.27 added Writeset-based dependency tracking, pushing parallelism further.
The MygramDB experience
MygramDB acts as a MySQL replica but doesn't replay SQL. It reads ROWS events and directly updates its n-gram index. Standard MTS doesn't apply here — I had to design event parallelism from scratch.
MygramDB's BinlogReader uses two threads with a 10,000-event queue between them. The Reader thread pulls binlog events and pushes them into the queue. The Worker thread pops events and updates the index. When the queue fills, the Reader blocks, creating natural backpressure. The MySQL server pauses binlog transmission and waits.
binlog.cc — 10,771 Lines Deep
I opened MySQL's source code when a parser bug refused to yield. sql/binlog.cc — 380KB, 10,771 lines. Group commit, rotation, purging, recovery — all in one file. Even with AI summarizing sections for me, reading it end to end took an entire day. Without that help, I'd have lost my mind.
But reading it revealed why the binlog format is the way it is. Compatibility constraints. When adding a new event type, old MySQL versions must not crash. Unknown events must be skippable — hence the event length in the header. V2 rows events use separate type codes (30 vs. 23) to maintain backward compatibility.
It's a format with layers of history stacked on top of each other. Elegant? Honestly, debatable. But as a protocol that has maintained backward compatibility while evolving for over 20 years, it's remarkably well-built. Casually criticizing something that hasn't broken is easy when you've never had to keep it running.
Writing a Parser
From starting MygramDB's binlog parser to reaching stability took months. When I extracted it into mysql-event-stream, I redesigned from scratch. Two passes to finally understand how it should have been written.
Writing a binlog parser is an exercise in retracing MySQL's design decisions. Why packed integers? (Bandwidth savings.) Why are TABLE_MAP and ROWS separate? (To avoid resending schema for consecutive operations on the same table.) Why is DECIMAL big-endian when everything else is little-endian? (For memcmp sort order.)
Each choice has a reason. A reason not in the documentation. A reason you only find in the source code.
MySQL's binlog is a binary stream compressed from 20+ years of production history. It's not clean. But clean protocols don't always last 20 years.