MySQL vs MariaDB Binlog: The Fork That Split the Stream

I started adding MariaDB support to MygramDB. MygramDB ingests data via MySQL binlog replication and parses the binary format itself. "MariaDB is compatible with MySQL" — I'd heard that enough times to assume a few tweaks would do it.

I opened the MariaDB 13.0.1 source code and learned that assumption was wrong.

Row events are byte-for-byte identical. NULL bitmaps, packed integers, TABLE_MAP — all the same. But GTIDs are a completely different animal. Different event type numbers, different binary layout, different replication protocol. Zero changes to the row parser. A full rewrite of the GTID layer. That's what the work turned out to be.

Background for this article

The basics of binlog structure (event headers, the TABLE_MAP → ROWS two-stage protocol, packed integers, etc.) are covered in "The MySQL Binlog." Changes in MySQL 9.x are in "MySQL 9 Binlog: What's Actually Different." This article is the third in the series, focused on the MariaDB divergence.

What's the Same, What Isn't

The big picture first.

| Area | Same? |
| --- | --- |
| Magic bytes (\xfebin) | Identical |
| Binlog version | Still 4 |
| Event header (19 bytes, fixed) | Identical |
| CRC32 checksum | Identical |
| TABLE_MAP_EVENT (type 19) | Identical |
| WRITE/UPDATE/DELETE_ROWS (type 30-32) | Identical |
| NULL bitmaps, packed integers | Identical |
| DECIMAL, VARCHAR, BLOB encoding | Identical |
| GTID event type numbers | Different |
| GTID binary format | Different |
| GTID string representation | Different |
| Replication dump protocol | Different |
| MariaDB-specific events (160+) | Don't exist in MySQL |

If you've written a binlog parser, this table is half reassuring, half alarming. The row data layer — the most complex, most bug-prone part of the parser — doesn't need to be touched. But GTIDs are the backbone of replication. "Partially different" can be harder than "completely different."

2013: Two Answers to the Same Problem

Why does the binlog diverge this much between two projects that share the same ancestor? The source code's comments and copyright headers tell a surprisingly clear story.

MariaDB forked from MySQL in 2009 and tracked it closely through the 5.5 series. MySQL 5.5 had no GTIDs. GTIDs arrived in MySQL 5.6, released in 2013.

The MariaDB fork

In 2008, Sun Microsystems acquired MySQL AB. Oracle's acquisition of Sun, announced in 2009, then put MySQL under Oracle's control. Monty Widenius (Michael Widenius), MySQL's original creator, was concerned about MySQL's future under Oracle and forked it in 2009 as MariaDB, named after his daughter. MariaDB kept tracking MySQL through the 5.5 series, and the two have diverged ever since.

MariaDB was building its own GTID implementation independently. The copyright header in include/rpl_gtid_base.h reads "Kristian Nielsen and MariaDB Services Ab, 2013, 2024." At roughly the same time MySQL 5.6 shipped, a different team solved the same problem with a different design.

MySQL went with UUID:sequence. Each server gets a UUID, each transaction a sequence number. Globally unique. Simple.

MariaDB chose a different path. domain_id-server_id-seq_no — three numbers joined by dashes. The key is that leading domain_id.

domain_id — Why MariaDB Needed It

From sql/rpl_gtid.h:

For every independent replication stream (identified by domain_id), this remembers the last gtid applied on the slave within this domain. Since events are always committed in-order within a single domain, this is sufficient to maintain the state of the replication slave.

"Independent replication streams." That's the design principle behind MariaDB's GTIDs.

MySQL's GTIDs assume a single global stream. One primary to one replica, one flow. For tracking "how far have I applied" during failover, that's enough.

MariaDB was solving a different problem. Multi-source replication — multiple masters feeding into one slave — where each master's stream needs to be tracked independently.

What is multi-source replication?

Normal replication is one-to-one (primary → replica). Multi-source aggregates changes from multiple primaries into a single replica. Think: merging changes from several sharded databases into one analytics server. You need to track each primary's changes without mixing them up.

With domain_id, events within each domain are guaranteed to be committed in order. Domains are independent. If one domain's replication falls behind, the others are unaffected.
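Because each domain commits in order, a replica only has to remember one GTID per domain. Here is a minimal Python sketch of that bookkeeping. The class and method names (`MariaDBGtid`, `DomainPositions`, `connect_state`) are made up for illustration and are not taken from MariaDB or MygramDB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MariaDBGtid:
    domain_id: int
    server_id: int
    seq_no: int

    @classmethod
    def parse(cls, text: str) -> "MariaDBGtid":
        # "0-1-42" -> domain_id=0, server_id=1, seq_no=42
        domain_id, server_id, seq_no = (int(part) for part in text.split("-"))
        return cls(domain_id, server_id, seq_no)

    def __str__(self) -> str:
        return f"{self.domain_id}-{self.server_id}-{self.seq_no}"


class DomainPositions:
    """Remembers the last applied GTID for each replication domain."""

    def __init__(self) -> None:
        self._last: dict[int, MariaDBGtid] = {}

    def record(self, gtid: MariaDBGtid) -> None:
        # Events commit in order within a domain, so the most recent
        # GTID is all that has to be kept for that domain.
        self._last[gtid.domain_id] = gtid

    def connect_state(self) -> str:
        # One GTID per domain, comma separated, ordered by domain_id.
        return ",".join(str(self._last[d]) for d in sorted(self._last))


positions = DomainPositions()
for text in ("0-1-41", "1-1-100", "0-1-42"):
    positions.record(MariaDBGtid.parse(text))
print(positions.connect_state())  # 0-1-42,1-1-100
```

A slow domain 1 never blocks progress in domain 0 here. Each key in the dictionary advances on its own.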

sql/rpl_gtid.cc contains the control logic for when the same GTID arrives from multiple master connections in a multi-source setup. The --gtid-ignore-duplicates option prevents duplicate application of the same GTID. Only one Relay_log_info at a time can "own" a domain; other connections wait until ownership is released.

This design also feeds into parallel replication. MariaDB's GTID event carries an FL_ALLOW_PARALLEL flag (bit 3). Event groups from different domains can be applied in parallel as long as there are no dependencies. The FL_WAITED flag (bit 4) indicates that a row lock wait was detected during execution — useful metadata for parallel apply decisions.

MySQL later implemented logical-clock-based parallel apply via MTS (Multi-Threaded Slave). Different approach, same problem.

The Binary Format in Practice

Comparing the binary layouts as read from the source code.

MySQL GTID_LOG_EVENT (type 33)

MySQL's GTID event starts with a 1-byte flags field, followed by a 16-byte binary UUID and an 8-byte sequence number.

| Field | Size | Content |
| --- | --- | --- |
| flags | 1 byte | Commit flag |
| UUID | 16 bytes | Server UUID (binary) |
| gno | 8 bytes | Transaction number |

String representation: 550e8400-e29b-41d4-a716-446655440000:42
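A rough Python sketch of decoding those three fields, assuming the body passed in starts right after the 19-byte common header and that gno is little-endian like the other binlog integers. The function name is hypothetical, and any fields after the first 25 bytes are ignored.

```python
import struct
import uuid


def parse_mysql_gtid_body(body: bytes) -> str:
    """Decode the first 25 bytes of a GTID_LOG_EVENT body into 'uuid:gno'."""
    if len(body) < 25:
        raise ValueError("GTID_LOG_EVENT body is shorter than 25 bytes")
    _flags = body[0]                              # commit flag, unused here
    sid = uuid.UUID(bytes=body[1:17])             # 16-byte server UUID
    (gno,) = struct.unpack_from("<q", body, 17)   # 8 bytes, assumed little-endian
    return f"{sid}:{gno}"


sample = (
    bytes([1])
    + uuid.UUID("550e8400-e29b-41d4-a716-446655440000").bytes
    + struct.pack("<q", 42)
)
print(parse_mysql_gtid_body(sample))  # 550e8400-e29b-41d4-a716-446655440000:42
```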

MariaDB GTID_EVENT (type 162)

MariaDB's GTID event has a 19-byte post-header. The comment in log_event.h explains why:

The binary format for Gtid_log_event has 6 extra reserved bytes to make the length a total of 19 byte (+ 19 bytes of header in common with all events). This is just the minimal size for a BEGIN query event, which makes it easy to replace this event with such BEGIN event to remain compatible with old slave servers.

The 19-byte size matches the minimum size of a BEGIN query event. When sending to old slaves, the GTID event can be rewritten in-place as a BEGIN query event. The size was chosen deliberately.

| Field | Size | Content |
| --- | --- | --- |
| seq_no | 8 bytes | Transaction sequence number (LE) |
| domain_id | 4 bytes | Replication domain ID (LE) |
| flags2 | 1 byte | FL_STANDALONE, FL_GROUP_COMMIT_ID, etc. |
| reserved / commit_id | 6 / 8 bytes | Reserved or group commit ID |

server_id comes from the common event header (zero-based byte offsets 5–8 of the 19-byte header). It's not in the post-header.

String representation: 0-1-42 (domain_id=0, server_id=1, seq_no=42)
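Putting the layout together, here is a minimal Python sketch that decodes a full GTID_EVENT, common header included. It assumes the header offsets described above and that commit_id is an 8-byte little-endian integer present only when FL_GROUP_COMMIT_ID (bit 1 of flags2) is set. The function name and the returned dictionary shape are illustrative only.

```python
import struct

HEADER_LEN = 19
FL_GROUP_COMMIT_ID = 0x02  # bit 1 of flags2


def parse_mariadb_gtid_event(event: bytes) -> dict:
    """Decode a MariaDB GTID_EVENT (type 162), common header included."""
    if len(event) < HEADER_LEN + 13:
        raise ValueError("event too short for a GTID_EVENT")
    # server_id lives in the common header at byte offsets 5..8.
    (server_id,) = struct.unpack_from("<I", event, 5)
    # Post-header: seq_no (8), domain_id (4), flags2 (1), all little-endian.
    seq_no, domain_id, flags2 = struct.unpack_from("<QIB", event, HEADER_LEN)
    commit_id = None
    if flags2 & FL_GROUP_COMMIT_ID:
        # Assumption: the commit ID replaces the 6 reserved bytes.
        (commit_id,) = struct.unpack_from("<Q", event, HEADER_LEN + 13)
    return {
        "gtid": f"{domain_id}-{server_id}-{seq_no}",
        "flags2": flags2,
        "commit_id": commit_id,
    }


# Build a synthetic event: header fields are timestamp, type, server_id,
# event_length, next_position, flags.
header = struct.pack("<IBIIIH", 0, 162, 1, 38, 0, 0)
post_header = struct.pack("<QIB", 42, 0, 0) + bytes(6)
print(parse_mariadb_gtid_event(header + post_header)["gtid"])  # 0-1-42
```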

The flags2 bitfield is interesting.

| Bit | Name | Meaning |
| --- | --- | --- |
| 0 | FL_STANDALONE | Standalone event, no terminating COMMIT |
| 1 | FL_GROUP_COMMIT_ID | Part of a group commit |
| 2 | FL_TRANSACTIONAL | Can be safely rolled back |
| 3 | FL_ALLOW_PARALLEL | Parallel replication allowed |
| 4 | FL_WAITED | Row lock wait detected |
| 5 | FL_DDL | Contains DDL |
| 6 | FL_PREPARED_XA | XA transaction prepared |
| 7 | FL_COMPLETED_XA | XA transaction completed |

MySQL's GTID event has nothing like this level of parallel replication control. MariaDB embeds replication optimization metadata directly in the GTID event itself.
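The bit table maps naturally onto a flag enum. A small Python sketch using the standard `enum.IntFlag`: the flag names and bit positions come from the table above, while the class name and helper are made up for illustration.

```python
from enum import IntFlag


class GtidFlags2(IntFlag):
    FL_STANDALONE = 1 << 0
    FL_GROUP_COMMIT_ID = 1 << 1
    FL_TRANSACTIONAL = 1 << 2
    FL_ALLOW_PARALLEL = 1 << 3
    FL_WAITED = 1 << 4
    FL_DDL = 1 << 5
    FL_PREPARED_XA = 1 << 6
    FL_COMPLETED_XA = 1 << 7


def describe_flags2(value: int) -> list[str]:
    """Return the names of all bits set in a flags2 byte, lowest bit first."""
    return [flag.name for flag in GtidFlags2 if value & flag]


print(describe_flags2(0x0C))  # ['FL_TRANSACTIONAL', 'FL_ALLOW_PARALLEL']
```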

MariaDB GTID_LIST_EVENT (type 163)

The equivalent of MySQL's PREVIOUS_GTIDS_LOG_EVENT (type 35). Logged at the start of each binlog file, it records the current replication state.

count   4 bytes (lower 28 bits = count, upper 4 bits = flags)
entry × count:
  domain_id  4 bytes
  server_id  4 bytes
  seq_no     8 bytes

16 bytes per entry. element_size = 4+4+8 in the source code. Using the upper 4 bits of count for flags — extending functionality without adding fields.
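A minimal Python sketch of reading that layout, assuming the body starts right after the 19-byte common header and every integer is little-endian. The function name is hypothetical.

```python
import struct


def parse_gtid_list_body(body: bytes) -> tuple[int, list[str]]:
    """Return (flags, gtid_strings) from a GTID_LIST_EVENT body."""
    (raw,) = struct.unpack_from("<I", body, 0)
    count = raw & 0x0FFFFFFF   # lower 28 bits
    flags = raw >> 28          # upper 4 bits
    if len(body) < 4 + count * 16:
        raise ValueError("GTID list is truncated")
    gtids = []
    for i in range(count):
        # Each entry: domain_id (4), server_id (4), seq_no (8) = 16 bytes.
        domain_id, server_id, seq_no = struct.unpack_from("<IIQ", body, 4 + i * 16)
        gtids.append(f"{domain_id}-{server_id}-{seq_no}")
    return flags, gtids


sample = (
    struct.pack("<I", 2)
    + struct.pack("<IIQ", 0, 1, 42)
    + struct.pack("<IIQ", 1, 1, 100)
)
print(parse_gtid_list_body(sample))  # (0, ['0-1-42', '1-1-100'])
```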

The 160s — MariaDB's Own Territory

MariaDB uses its own event type numbers starting from 160. The source code defines MARIA_EVENTS_BEGIN= 160.

| Type | Name | Purpose |
| --- | --- | --- |
| 160 | ANNOTATE_ROWS_EVENT | SQL annotation for ROWS events |
| 161 | BINLOG_CHECKPOINT_EVENT | Binlog checkpoint |
| 162 | GTID_EVENT | MariaDB GTID |
| 163 | GTID_LIST_EVENT | GTID list at binlog start |
| 164 | START_ENCRYPTION_EVENT | Binlog encryption |
| 165 | QUERY_COMPRESSED_EVENT | Compressed query event |
| 166-168 | ROWS_COMPRESSED_EVENT_V1 | Compressed V1 row events |
| 169-171 | ROWS_COMPRESSED_EVENT | Compressed V2 row events |
| 172 | PARTIAL_ROW_DATA_EVENT | Partial row data |

MySQL's types 33–35 (GTID_LOG_EVENT, ANONYMOUS_GTID, PREVIOUS_GTIDS) are explicitly commented in MariaDB's source:

```c
/* MySQL 5.6 GTID events, ignored by MariaDB */
GTID_LOG_EVENT= 33,
ANONYMOUS_GTID_LOG_EVENT= 34,
PREVIOUS_GTIDS_LOG_EVENT= 35,
```

In MariaDB's parser implementation, these are handled as Ignorable_log_event and skipped.

From a parser writer's perspective, the 19-byte event header with its event length field is what makes this work. Unknown event types can be skipped by length. A MySQL parser reading a MariaDB binlog can skip type 162 as "unknown event." The reverse works too. Credit to whoever designed that header 20 years ago.
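Skipping by length takes only a few lines. Here is a Python sketch that walks a binlog file image and dispatches only the event types it has handlers for. It assumes the usual 19-byte header field order (timestamp, type, server_id, event_length, next_position, flags). The function names and the handler dictionary are illustrative.

```python
import struct
from typing import Iterator

MAGIC = b"\xfebin"
# timestamp, type_code, server_id, event_length, next_position, flags
HEADER = struct.Struct("<IBIIIH")  # 19 bytes


def iter_events(data: bytes) -> Iterator[tuple[int, bytes]]:
    """Yield (type_code, raw_event) for each event in a binlog file image."""
    if data[:4] != MAGIC:
        raise ValueError("bad magic bytes")
    pos = len(MAGIC)
    while pos + HEADER.size <= len(data):
        _ts, type_code, _sid, length, _next, _flags = HEADER.unpack_from(data, pos)
        if length < HEADER.size or pos + length > len(data):
            raise ValueError(f"corrupt event length at offset {pos}")
        yield type_code, data[pos:pos + length]
        pos += length  # the length field is all that is needed to skip


def handled_types(data: bytes, handlers: dict) -> list[int]:
    """Dispatch known events and silently skip everything else."""
    seen = []
    for type_code, raw in iter_events(data):
        handler = handlers.get(type_code)
        if handler is None:
            continue  # unknown to this parser, e.g. 162 for a MySQL-only one
        handler(raw)
        seen.append(type_code)
    return seen
```

A parser that only registers handlers for types 19 and 30-32 walks straight past 160+ events without knowing what they contain.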

Telling Them Apart

The magic bytes can't distinguish MySQL from MariaDB. Both are \xfebin.

There are three ways to tell.

1. Version string in FORMAT_DESCRIPTION_EVENT

The FORMAT_DESCRIPTION_EVENT at the start of every binlog file contains a 50-byte server version string. MariaDB includes "MariaDB" — e.g., "13.0.1-MariaDB". The Format_description_log_event class in the source code has a master_version_split parser that distinguishes KIND_MYSQL from KIND_MARIADB.

2. SELECT VERSION()

Run SELECT VERSION() at connection time and check for "MariaDB" in the result. This works for replication connections, not for reading binlog files directly.

3. Event type numbers

If events with type 160+ appear, it's MariaDB. Type 162 (GTID_EVENT) appears at the start of every transaction, so a few events into the stream is all it takes.

For MygramDB, I went with SELECT VERSION() at connection time, storing the flavor in the Connection object.
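All three checks are short. A Python sketch, with the caveat that the FORMAT_DESCRIPTION_EVENT offsets are my assumption rather than something spelled out above: a 2-byte binlog version right after the 19-byte header, then the 50-byte NUL-padded version string. The function names and the "mysql"/"mariadb" labels are illustrative.

```python
import struct

HEADER_LEN = 19
FIRST_MARIADB_TYPE = 160


def flavor_from_version(version: str) -> str:
    """Classify a version string from SELECT VERSION() or the binlog."""
    return "mariadb" if "mariadb" in version.lower() else "mysql"


def version_from_format_description(event: bytes) -> str:
    """Extract the server version from a FORMAT_DESCRIPTION_EVENT."""
    # Assumed layout: binlog_version (2 bytes LE), server_version (50 bytes).
    binlog_version, raw = struct.unpack_from("<H50s", event, HEADER_LEN)
    if binlog_version != 4:
        raise ValueError(f"unexpected binlog version {binlog_version}")
    return raw.split(b"\x00", 1)[0].decode("ascii", "replace")


def is_mariadb_event_type(type_code: int) -> bool:
    """Event types from 160 upward only exist in MariaDB."""
    return type_code >= FIRST_MARIADB_TYPE


print(flavor_from_version("13.0.1-MariaDB"))  # mariadb
```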

Replication Protocol Differences

The protocol for starting GTID-based replication is fundamentally different between MySQL and MariaDB.

MySQL: COM_BINLOG_DUMP_GTID

MySQL uses a dedicated COM_BINLOG_DUMP_GTID command. The GTID set is binary-encoded and packed into the request packet.

MariaDB: SQL variables, then COM_BINLOG_DUMP

MariaDB is simpler. GTID position is set via SQL session variables, then the old COM_BINLOG_DUMP command is issued.

```sql
SET @slave_connect_state='0-1-42,1-1-100';  -- GTID position
SET @slave_gtid_strict_mode=1;
SET @master_heartbeat_period=3000000000;     -- 3 seconds
SET @master_binlog_checksum=@@global.binlog_checksum;
```

Then a COM_BINLOG_DUMP packet is constructed and sent:

binlog_pos    4 bytes   Start position (usually 4)
binlog_flags  2 bytes   Flags
server_id     4 bytes   Replica server ID
filename      variable  Binlog filename

Where MySQL's COM_BINLOG_DUMP_GTID packs the GTID set into a single binary packet, MariaDB combines SQL and the binary protocol. From an implementer's perspective, MariaDB is easier to debug. You can inspect the state in SQL.
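For reference, the COM_BINLOG_DUMP argument bytes can be assembled as below. This Python sketch covers only the four fields listed above. The command byte (0x12 in the MySQL protocol) and the packet framing are left to the client library, and the function name and default values are assumptions.

```python
import struct

COM_BINLOG_DUMP = 0x12  # command byte, sent separately by the client library


def build_binlog_dump_args(server_id: int, filename: str = "",
                           position: int = 4, flags: int = 0) -> bytes:
    """Pack binlog_pos (4), binlog_flags (2), server_id (4), filename."""
    return struct.pack("<IHI", position, flags, server_id) + filename.encode("ascii")


args = build_binlog_dump_args(1001, "mariadb-bin.000001")
print(args.hex())
```

The filename is not NUL-terminated here. It simply runs to the end of the packet.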

The catch is that sending the COM_BINLOG_DUMP packet requires simple_command() and reading responses requires cli_safe_read() — both internal client library functions. They need extern "C" linkage.

COM_BINLOG_DUMP vs COM_BINLOG_DUMP_GTID

Both are commands defined in MySQL's binary protocol. COM_BINLOG_DUMP dates back to MySQL 4.0, specifying stream position by binlog filename + offset. COM_BINLOG_DUMP_GTID was added in MySQL 5.6, with the GTID set binary-encoded in the packet. MariaDB uses the older COM_BINLOG_DUMP but pre-sets the GTID position via SQL session variables.

Reading the Source

MariaDB's sql/log.cc is 15,000 lines. sql/log_event.h is 6,200. sql/log_event_server.cc is 8,900. sql/sql_repl.cc is 5,800. sql/rpl_gtid.cc is 4,100. About 40,000 lines of binlog-related code. I didn't read all of it. I traced the parts related to GTIDs and event definitions.

What stood out was the obsession with backward compatibility embedded throughout. Sizing the GTID event to match a BEGIN query event. Jumping event type numbers to the 160s to avoid collisions with MySQL. Making each other's GTID events safely skippable as Ignorable_log_event. Two projects that forked from the same codebase, still careful not to break each other's streams.

MariaDB's build configuration

The default build type is RelWithDebInfo — release optimization with debug info. C++17 standard. -fno-omit-frame-pointer preserves frame pointers, -D_FORTIFY_SOURCE=2 catches buffer overflows. ASAN, TSAN, UBSAN, and MSAN are all supported. For a codebase this size, the debugging experience is surprisingly good.

Impact on MygramDB

The practical breakdown:

| Area | Change needed |
| --- | --- |
| Row event parser (rows_parser) | None |
| NULL bitmaps, packed integers | None |
| TABLE_MAP cache | None |
| GTID parsing | New implementation |
| Replication protocol | New implementation |
| Server flavor detection | New implementation |
| Connection validation | Flavor-aware branching |

I introduced an IBinlogStream interface and use the Strategy Pattern to switch between MySQL and MariaDB stream implementations. Row event processing stays shared. Only the GTID layer branches by flavor.
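The shape of that split, sketched in Python rather than MygramDB's own code: the abstract class stands in for IBinlogStream, and every method name here is hypothetical. Only the flavor-specific pieces live in the subclasses, and row handling would sit outside them.

```python
from abc import ABC, abstractmethod


class BinlogStream(ABC):
    """Stand-in for an IBinlogStream-style interface (names are hypothetical)."""

    @abstractmethod
    def gtid_event_type(self) -> int:
        """Event type number that opens a transaction for this flavor."""

    @abstractmethod
    def start_commands(self, position: str) -> list[str]:
        """Steps needed to start streaming from a GTID position."""


class MySQLStream(BinlogStream):
    def gtid_event_type(self) -> int:
        return 33  # GTID_LOG_EVENT

    def start_commands(self, position: str) -> list[str]:
        # The GTID set travels inside the binary packet itself.
        return ["COM_BINLOG_DUMP_GTID"]


class MariaDBStream(BinlogStream):
    def gtid_event_type(self) -> int:
        return 162  # GTID_EVENT

    def start_commands(self, position: str) -> list[str]:
        # Position goes through a session variable, then the old command.
        return [f"SET @slave_connect_state='{position}'", "COM_BINLOG_DUMP"]


def make_stream(flavor: str) -> BinlogStream:
    """Pick the strategy once, at connection time."""
    streams = {"mysql": MySQLStream, "mariadb": MariaDBStream}
    return streams[flavor]()


stream = make_stream("mariadb")
print(stream.gtid_event_type())  # 162
```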

The design cut cleanly because the divergence point is clear. The boundary between "compatible" and "completely different" isn't vague — it's precise at the binary level.

Same Face, Different Personality

MySQL and MariaDB binlogs are like twins. Same magic bytes, same headers, same row data. They look identical, and that's where the "compatible" reputation comes from.

But when it comes to GTIDs — the heart of replication — their personalities diverge. MySQL chose UUID-based global uniqueness. MariaDB chose domain-based independent streams. In 2013, two teams answered the same question differently.

For someone who writes binlog parsers, "row data is identical" is the thing that matters most. The most complex, most bug-prone part of the parser is shared. GTIDs need a redesign, but it's a well-defined scope. Not an unbounded compatibility problem.

Same face doesn't mean same person. But there's no question they share a parent.