Why WASM Lost to js-yaml

I once plotted to replace js-yaml with a Rust + WebAssembly rewrite. Make it fast. Bury the benchmark. Partly because people around me kept saying I should try Rust for once. I named it "fast-yaml." I ran the benchmarks. It was slower.

I rebuilt it in C++. Slapped a green "20× faster" badge on the README. Before recording any benchmarks. Never even pushed the repo.

This is about where the line falls between the workloads WASM makes faster and the ones it doesn't.

js-yaml Is Harder to Beat Than It Looks

js-yaml is a pure JavaScript YAML parser. V8's JIT compiler turns JavaScript into machine code, detects hot paths, and optimizes them. String operations and object creation are what V8 does best — decades of tuning behind it.

On YAML 1.2 spec compliance, js-yaml isn't perfect. About 89 of the 351 YAML Test Suite cases show non-compliant behavior. It cuts corners on the spec.

YAML 1.2 is more complex than it looks

YAML looks like "a readable config file format." The spec is 84 pages. Aliases, anchors, tags, multi-document streams, mixed flow and block styles, five scalar styles (plain, single-quoted, double-quoted, literal, folded). An unquoted yes is a boolean in YAML 1.1 and a plain string under the YAML 1.2 core schema.
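To make the yes difference concrete, here is a minimal sketch of boolean resolution for an unquoted scalar under each version. The function name and the version switch are made up for illustration and are not any library's API. The patterns follow the YAML 1.1 bool type and the YAML 1.2 core schema.

```typescript
// Sketch: how an unquoted (plain) scalar resolves to a boolean under each spec version.
// resolvePlainScalar and YamlVersion are illustrative names, not a real library API.
type YamlVersion = "1.1" | "1.2";

// YAML 1.1 bool type: y/n, yes/no, on/off, true/false in lower, Title, and UPPER case.
const BOOL_1_1 =
  /^(?:y|Y|yes|Yes|YES|n|N|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF)$/;
// YAML 1.2 core schema: only true/false.
const BOOL_1_2 = /^(?:true|True|TRUE|false|False|FALSE)$/;
// Which of the accepted spellings mean "true".
const TRUTHY = /^(?:y|yes|true|on)$/i;

function resolvePlainScalar(text: string, version: YamlVersion): boolean | string {
  const pattern = version === "1.1" ? BOOL_1_1 : BOOL_1_2;
  if (!pattern.test(text)) return text; // not a boolean: stays a string in this sketch
  return TRUTHY.test(text);
}

console.log(resolvePlainScalar("yes", "1.1")); // boolean
console.log(resolvePlainScalar("yes", "1.2")); // string
```

A real resolver also handles null, integers, and floats. Only the boolean rule is shown here because that is where the two versions visibly disagree.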

What is the YAML Test Suite?

The official test suite for mechanically verifying YAML spec compliance. Each case includes YAML input, an expected event tree (+STR +DOC +SEQ +MAP =VAL token sequences), and a JSON equivalent. It covers 351 cases — from spec examples (Spec Example 2.4: Sequence of Mappings) to edge cases like circular alias references, mixed flow/block styles, and implicit keys. The de facto standard for measuring parser correctness.
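For a feel of the event notation, the sketch below turns a small value tree into the same kind of +STR/+DOC/+SEQ/+MAP/=VAL lines. It is a simplification under stated assumptions: one document, plain scalars only, no anchors or tags. The toEvents helper is hypothetical and is not part of the test suite's tooling.

```typescript
// Simplified emitter for the event notation used by the YAML Test Suite.
// Handles only plain scalars, sequences, and mappings in a single document.
type Value = string | Value[] | { [key: string]: Value };

function toEvents(root: Value): string[] {
  const out: string[] = ["+STR", "+DOC"];
  const visit = (v: Value): void => {
    if (typeof v === "string") {
      out.push(`=VAL :${v}`); // ":" marks a plain (unquoted) scalar
    } else if (Array.isArray(v)) {
      out.push("+SEQ");
      for (const child of v) visit(child);
      out.push("-SEQ");
    } else {
      out.push("+MAP");
      for (const key of Object.keys(v)) {
        out.push(`=VAL :${key}`); // keys are scalar events too
        visit(v[key]);
      }
      out.push("-MAP");
    }
  };
  visit(root);
  out.push("-DOC", "-STR");
  return out;
}

// A one-item "sequence of mappings".
const events = toEvents([{ name: "Mark McGwire", hr: "65" }]);
console.log(events.join("\n"));
```

A parser passes a case when the events it emits for the input match the expected list line by line.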

fast-yaml used Rust's yaml-rust2 internally and passed all 351 YAML Test Suite cases. It was more spec-compliant than js-yaml. The problem was speed.

Rust + WASM: The Optimization Record

Initial Implementation

The initial design was naive. Parse YAML with yaml-rust2, receive the AST in Rust, convert it to JavaScript objects, return them.

The bottleneck is obvious. Every YAML node crosses the WASM↔JS boundary to create a JavaScript object. A 1MB YAML file (~10,000 items) triggers tens of thousands of boundary crossings.
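A rough count shows where "tens of thousands" comes from. The sketch assumes one WASM-to-JS call per node created (map, sequence, or scalar) and a hypothetical document of 10,000 items with three scalar fields each. The real fast-yaml data and call pattern may have differed.

```typescript
// Counts how many host calls a per-node bridge would make for a parsed document.
// Assumption: exactly one boundary crossing per node; keys are not counted separately.
type YamlNode = string | number | YamlNode[] | { [key: string]: YamlNode };

function countNodes(node: YamlNode): number {
  if (typeof node === "string" || typeof node === "number") return 1;
  let total = 1; // the container itself
  if (Array.isArray(node)) {
    for (const child of node) total += countNodes(child);
    return total;
  }
  for (const key of Object.keys(node)) total += countNodes(node[key]);
  return total;
}

// Hypothetical document: 10,000 items, each a mapping with three scalar fields.
const items: YamlNode[] = [];
for (let i = 0; i < 10000; i++) {
  items.push({ id: i, label: `item-${i}`, kind: "a" });
}

const perNodeCrossings = countNodes(items);
console.log(`per-node bridge: ${perNodeCrossings} crossings`);
```

Counting a separate call for every property assignment would push the number higher still.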

On top of that, the initial implementation had console.log in the hot path. Debug logging leaking into benchmarks. A rookie mistake.

Optimization Attempts

Three things I tried.

1. Remove console.log

Stripped all logging from the hot path. Some improvement, but not fundamental.

2. Enable wasm-opt

Turned on the WASM binary optimizer wasm-opt with -O3 (speed) and -Oz (size). LTO, codegen-units = 1, and panic = 'abort' were already enabled from the start.

3. JSON String Bridge

The biggest design change. Instead of creating JS objects per node, convert YAML to a JSON string inside WASM and pass that single string to the JS side for JSON.parse() to reconstruct.

Boundary crossings dropped from tens of thousands to one. The implementation got simpler too.
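The JS side of such a bridge can be sketched roughly as follows. Everything here is a stand-in: linearMemory is a plain byte buffer instead of the module's exported memory, and parseYamlToJsonBytes is a hypothetical export that returns fixed JSON instead of parsing real YAML.

```typescript
// Stand-in for WASM linear memory: a plain byte buffer the "WASM side" writes into.
const linearMemory = new Uint8Array(64 * 1024);

// Hypothetical WASM export: writes the JSON text as UTF-8 and returns its location.
function parseYamlToJsonBytes(): { ptr: number; len: number } {
  const json = '{"items":[{"id":1,"label":"first"},{"id":2,"label":"second"}]}';
  const bytes = new TextEncoder().encode(json);
  linearMemory.set(bytes, 0);
  return { ptr: 0, len: bytes.length };
}

// JS side of the bridge: one call, one copy out of linear memory, one JSON.parse.
function loadViaJsonBridge(): unknown {
  const { ptr, len } = parseYamlToJsonBytes(); // the single boundary crossing
  const view = linearMemory.subarray(ptr, ptr + len); // a view, no copy yet
  const text = new TextDecoder("utf-8").decode(view); // copy: bytes become a JS string
  return JSON.parse(text); // second scan: JSON text becomes JS objects
}

const doc = loadViaJsonBridge() as { items: { id: number; label: string }[] };
console.log(doc.items.length);
```

The decode step and the JSON.parse step are work a pure JavaScript parser never has to do.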

Benchmark Results

Measured with hyperfine (3 warmup runs, 10 measurement runs, averaged).

Size	fast-yaml (Rust/WASM)	js-yaml	Factor
10KB	36.4ms	30.3ms	0.83× (20% slower)
100KB	38.0ms	36.3ms	0.95× (5% slower)
1MB	90.4ms	69.4ms	0.77× (30% slower)

The 1MB case was 106.5ms before optimization, so the 15% improvement was real. At 100KB, it reached 95% of js-yaml's speed. But it lost at every size.

The telling number is the largest file. The gap narrows to 5% at 100KB, then widens to 30% at 1MB, the worst of the three. Once fixed startup cost stops dominating, WASM overhead scales with data size.
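The percentages can be rechecked from the raw timings. This is plain arithmetic on the numbers in the table; the helper names are mine.

```typescript
// Timings in milliseconds, copied from the benchmark table.
const timings = [
  { size: "10KB", fastYaml: 36.4, jsYaml: 30.3 },
  { size: "100KB", fastYaml: 38.0, jsYaml: 36.3 },
  { size: "1MB", fastYaml: 90.4, jsYaml: 69.4 },
];

// How much longer fast-yaml takes, as a percentage of js-yaml's time.
function percentSlower(fastYamlMs: number, jsYamlMs: number): number {
  return (fastYamlMs / jsYamlMs - 1) * 100;
}

// Reduction in run time relative to the unoptimized build.
function percentImprovement(beforeMs: number, afterMs: number): number {
  return ((beforeMs - afterMs) / beforeMs) * 100;
}

for (const row of timings) {
  const slower = percentSlower(row.fastYaml, row.jsYaml).toFixed(1);
  const extraMs = (row.fastYaml - row.jsYaml).toFixed(1);
  console.log(`${row.size}: ${slower}% slower (+${extraMs}ms)`);
}
console.log(`1MB, before vs after: ${percentImprovement(106.5, 90.4).toFixed(1)}% less time`);
```

Printing the absolute difference next to the percentage makes it easier to separate fixed startup cost from cost that grows with input size.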

Benchmark conditions

Test data was synthetic YAML (nested object arrays). hyperfine launches a fresh Node.js process per run, so WASM module initialization cost is included. Hot-loop comparisons would narrow the gap, but in real-world use, initialization cost is real cost.
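For contrast, a hot-loop measurement looks roughly like this. It is a sketch, not the harness used for the numbers above: timeHotLoop is a hypothetical helper, JSON.parse stands in for the parser under test, and the warmup and run counts simply mirror the hyperfine settings.

```typescript
// In-process timing: module load and JIT warmup are paid before the clock starts.
// `parse` is whatever function is under test.
function timeHotLoop(
  parse: (input: string) => unknown,
  input: string,
  warmup = 3,
  runs = 10,
): number {
  for (let i = 0; i < warmup; i++) parse(input); // untimed warmup calls
  const start = performance.now();
  for (let i = 0; i < runs; i++) parse(input);
  return (performance.now() - start) / runs; // mean milliseconds per call
}

// Synthetic input: an array of small objects, serialized once up front.
const rows: { id: number; label: string }[] = [];
for (let i = 0; i < 1000; i++) rows.push({ id: i, label: `item-${i}` });
const sampleInput = JSON.stringify(rows);

const meanMs = timeHotLoop((text) => JSON.parse(text), sampleInput);
console.log(`mean per call: ${meanMs.toFixed(3)}ms`);
```

Because process startup and module initialization fall outside the timed region, this style of measurement answers a different question than a fresh-process run.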

Why WASM Lost

Boundary Cost

Crossing the WASM↔JS boundary always costs something. Even the JSON string bridge involves copying. The string sitting in WASM's linear memory gets copied to the JS heap, then JSON.parse() scans it again to build JS objects. Parsing happens twice.

Those two extra steps, the copy out of linear memory and the second pass by JSON.parse(), don't exist in js-yaml. js-yaml converts YAML strings directly into JS objects. No intermediate representation.

V8's JIT Optimization

V8's JIT compiler is heavily optimized for JavaScript string operations and object creation. Hidden Classes for fast property access, inlining, Escape Analysis to eliminate unnecessary allocations — none of these are available through WASM.

js-yaml tends to generate objects with the same shape every time, so V8 reuses Hidden Classes efficiently. Objects created via JSON.parse() from WASM get the same benefit, but the cost of building and transferring the string is added on top.

The Nature of YAML Parsing

A YAML parser takes a string as input and produces JS objects as output. Both the input and output are JS-side data types. WASM excels when it receives input, does heavy computation internally, and returns results — audio analysis, image processing, numerical simulation. WASM is fast when computation cost dwarfs I/O cost.

YAML parsing is computationally light. Most of the time goes to token reading and object construction. When computation is light, boundary cost dominates.

The C++ Version: One More Time

After the Rust version lost, I built a C++ version. This time not WASM but Node.js N-API — a native addon.

N-API is a different approach. C++ code runs directly inside the Node.js process, creating JS objects through V8's API. No linear memory copies like WASM, so boundary overhead is smaller.

I chose rapidyaml (ryml) as the internal parser. A C++-based high-performance YAML parser, faster than yaml-rust2. Bundled RapidJSON for JSON Schema validation.

I wrote target numbers in the README. 1MB load: js-yaml 40ms down to under 2ms. 20× faster. Put a GitHub badge on it too. A green "20× faster" shining bright. Before recording any benchmarks.

N-API vs WASM

WASM: Compile a language to WASM bytecode, run it in V8's WASM runtime. Sandboxed — explicit boundary crossings required to talk to JS. Cross-platform with a single .wasm file.

N-API: Compile C/C++ as a Node.js native addon. Direct access to V8's API, so JS object creation cost is lower. But you need separate binaries per platform (Windows x64, macOS arm64, Linux glibc, etc.).

Design

The C++ version went in with a serious architecture, informed by the Rust version's lessons.

Dependency injection for Scanner/Emitter/Validator separation
Both sync and async N-API (libuv worker threads)
Full js-yaml API compatibility (load, dump, safeLoad, safeDump, loadAll)
JSON Schema validation
Comment preservation
CLI tools (lint, fmt, bench)

Fifteen C++ source files, fifteen TypeScript source files. Tests in both GoogleTest and Jest. Test fixtures exhaustively extracted from js-yaml's edge cases.

I didn't keep the benchmark results. The source code sits on a local drive, never published to GitHub. Draw your own conclusions.

Why C++ Couldn't Win Either

N-API's boundary cost is lower than WASM's, but not zero. Converting parsed results to JS objects still means setting properties one by one through V8's API.

At the end of the day, js-yaml creates JavaScript objects directly from JavaScript. No middle layer. No matter how fast the parser sitting in between, the cost of constructing JS objects stays the same. The faster the parser gets, the larger the share of total time that object construction takes.
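The last point is simple arithmetic. The sketch below uses a hypothetical 50ms/50ms split between parsing and object construction. These are not measured numbers, only an illustration of how the share shifts when the parser alone speeds up.

```typescript
// Fraction of total time spent building JS objects when only the parser gets faster.
function constructionShare(
  parseMs: number,
  constructMs: number,
  parserSpeedup: number,
): number {
  const fasterParseMs = parseMs / parserSpeedup;
  return constructMs / (fasterParseMs + constructMs);
}

// Hypothetical split: 50ms parsing, 50ms object construction.
const shareBaseline = constructionShare(50, 50, 1);
const shareTenX = constructionShare(50, 50, 10);
const totalTenXms = 50 / 10 + 50; // total time with a 10x faster parser

console.log(`baseline share: ${(shareBaseline * 100).toFixed(0)}%`);
console.log(`with a 10x parser: ${(shareTenX * 100).toFixed(0)}% of ${totalTenXms}ms`);
```

Under that assumed split, a parser ten times faster takes the total from 100ms to 55ms, and object construction becomes about nine tenths of what remains.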

When WASM Wins

I've built several other WASM/C++ projects. A full-text search engine, a Japanese tokenizer, a music generator, an audio analysis library. They're all fast in WASM. Only fast-yaml lost.

What's different?

Project	Input	Output	WASM-internal computation	Result
Full-text search engine	Query string	Document ID list	Inverted index scan, scoring	✔
Japanese tokenizer	Text	Token array	Lattice construction, Viterbi search	✔
Music generation	Seed parameters	MIDI data	Counterpoint constraints, Markov chains	✔
Audio analysis	Audio data	BPM/key/chords	FFT, HPSS, chroma	✔
YAML parser	YAML string	JS objects	Token reading	×

A pattern emerges. WASM wins when internal computation is heavy and data round-trips to JS are few.

Audio analysis does massive floating-point computation — FFT, HPSS — entirely inside WASM. Output is small values like BPM and key. Boundary cost is a tiny fraction. The Japanese tokenizer runs lattice search inside WASM and returns only the token array.

The YAML parser is the opposite. Input is a JS string, output is JS objects. The computation in between — token reading, syntax tree construction — is light as parsers go. Right in the sweet spot of what V8's JIT compiler does best.

Lessons

WASM runs at "near-native speed." That's true — for computation inside WASM. Crossing the JS boundary has a cost. When input and output are both JS types and the internal computation is light — when you're challenging the JS runtime in the territory it's most optimized for — WASM loses.

Neither the Rust version nor the C++ version of fast-yaml ever made it to npm. They passed all 351 YAML Test Suite cases. More spec-compliant than js-yaml. Just not faster.

A project named "fast" that turned out slow. It taught me where to use WASM and where not to. The "20× faster" badge is still sitting in a local repo.
