Preamble

Comparing Gleam on the BEAM with Rust and Tokio only works if the workload is the same story told twice. Too many benchmark posts swap queue depths, thread counts, and random seeds between runs and still draw sweeping conclusions. This post is the boring anchor on purpose: it pins inputs, semantics, and metrics before “Gleam on the BEAM: Actors, Types, and OTP Primitives” spins up BEAM processes and “Rust and Tokio: The Same Concurrent Workload in Type-Safe Threads” mirrors the shape in async Rust.


Producers, bounded queues, and backpressure

Producers emit tagged jobs into a bounded queue or mailbox. The bound is not optional: an unbounded queue makes overload look like “infinite throughput” until memory explodes. Backpressure should be visible. When the queue is full, producers either block, drop the job and count the drop in the metrics, or apply some other policy that you document.

Tags on jobs matter when you later slice metrics (e.g. “small compute” vs “simulated slow I/O”) or when you assert fairness across job classes.
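As a language-neutral illustration of the two simplest policies above (block, or drop and count), here is a minimal Python sketch. It is not either harness; `Job`, `BoundedSubmitter`, and the policy names are hypothetical, and the standard-library `queue.Queue` stands in for a BEAM mailbox or a Tokio bounded channel.

```python
import queue
import threading
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    id: int
    tag: str  # e.g. "cpu" or "io"; used later to slice metrics


class BoundedSubmitter:
    """Hypothetical wrapper that makes the overload policy explicit."""

    def __init__(self, capacity, policy="block"):
        # queue.Queue treats maxsize <= 0 as unbounded, so reject it here.
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        if policy not in ("block", "drop"):
            raise ValueError("policy must be 'block' or 'drop'")
        self.queue = queue.Queue(maxsize=capacity)
        self.policy = policy
        self._lock = threading.Lock()
        self.accepted = Counter()  # per-tag count of queued jobs
        self.dropped = Counter()   # per-tag count of rejected jobs

    def _count(self, counter, tag):
        with self._lock:
            counter[tag] += 1

    def submit(self, job):
        """Return True if the job was queued, False if it was dropped."""
        if self.policy == "block":
            self.queue.put(job)  # producer waits until a worker frees a slot
            self._count(self.accepted, job.tag)
            return True
        try:
            self.queue.put_nowait(job)
        except queue.Full:
            self._count(self.dropped, job.tag)
            return False
        self._count(self.accepted, job.tag)
        return True
```

Keeping the drop counter per tag means the later fairness check can tell whether one job class absorbs most of the rejections.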


Workers and simulated I/O

Each worker pulls a job, performs a small deterministic compute step, then optionally sleeps for a configurable duration to stand in for disk or network latency. Determinism keeps runs diffable; the sleep distribution keeps the scheduler honest—pure CPU micro-benchmarks lie about real services.
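The worker step described above could look roughly like the following Python sketch. The function names, the multiplier, and the modulus are illustrative assumptions, not part of the spec; the only properties that matter are integer-only arithmetic and a sleep drawn from a seeded generator.

```python
import random
import time

MOD = 2**61 - 1  # prime modulus; every intermediate value stays an exact integer


def compute_step(job_id, work_units):
    """Integer-only mixing loop: the same inputs always give the same checksum."""
    acc = job_id % MOD
    for i in range(work_units):
        acc = (acc * 6364136223846793005 + i + 1) % MOD
    return acc


def pick_sleep_ms(rng, choices=(10, 50)):
    """Draw a simulated I/O latency from a seeded generator."""
    return rng.choice(choices)


def run_job(job_id, tag, work_units, sleep_ms, sleep=time.sleep):
    """Compute first, then sleep only for jobs tagged as I/O."""
    checksum = compute_step(job_id, work_units)
    if tag == "io" and sleep_ms > 0:
        sleep(sleep_ms / 1000.0)
    return checksum
```

One caveat: `random.Random` is deterministic for a given seed within Python, but its stream will not match a generator in Gleam or Rust. If the two harnesses must draw identical sleep sequences, they would need to share an explicitly specified generator.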


Failure injection

Real systems lose workers. The spec includes random worker crashes (or kills) with a documented probability and seed. Recovery semantics must be spelled out: the BEAM implementation will lean on supervision; the Rust implementation will use panic hooks, task joins, or explicit restart loops. “Supervision Trees and Rust Task Hierarchies” compares those operational stories.
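One way to make seeded crash injection diffable is to decide per job rather than per draw, so the outcome does not depend on which worker happens to pull from a shared generator first. The Python sketch below assumes that design; it is not something the spec mandates. It uses the SplitMix64 mixing function because it is integer-only and short enough to port; `should_crash` and `crash_schedule` are hypothetical names.

```python
MASK64 = (1 << 64) - 1


def splitmix64(x):
    """One SplitMix64 output for a 64-bit input; integer-only, easy to port."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def should_crash(seed, job_id, probability):
    """Decide from (seed, job_id) alone, independent of scheduling order."""
    # Scale the probability to a 64-bit integer threshold once, then
    # compare in integer space.
    threshold = int(probability * (1 << 64))
    return splitmix64(seed ^ splitmix64(job_id)) < threshold


def crash_schedule(seed, job_count, probability):
    """List the job ids that will trigger an injected crash."""
    return [j for j in range(job_count) if should_crash(seed, j, probability)]
```

Because the schedule is a pure function of the seed and job id, it can be computed ahead of time and committed next to the config, which gives both harnesses the same list of failing jobs to check against.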


Collectors and aggregation

A collector (or shared aggregator) records completion counts, error counts, and timing. Whether aggregation is a single process or a mutex-protected structure changes contention; the important part is that both language implementations expose the same externally visible metrics, not that they use identical internal tricks.
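For the mutex-protected variant mentioned above, a minimal Python sketch might look like this. `Collector` and its method names are hypothetical; a single-process aggregator on the BEAM would expose the same snapshot through message passing instead of a lock.

```python
import threading
from collections import Counter


class Collector:
    """Shared aggregator guarded by one lock (the contended design)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = Counter()     # completions per tag
        self._errors = Counter()   # errors per tag
        self._latencies_ms = []    # end-to-end latency of each completed job

    def record_done(self, tag, latency_ms):
        with self._lock:
            self._done[tag] += 1
            self._latencies_ms.append(latency_ms)

    def record_error(self, tag):
        with self._lock:
            self._errors[tag] += 1

    def snapshot(self):
        """Copy the state so callers can read it without holding the lock."""
        with self._lock:
            return {
                "completed": dict(self._done),
                "errors": dict(self._errors),
                "latencies_ms": list(self._latencies_ms),
            }
```

Whatever the internals, the snapshot is the externally visible contract: both implementations should be able to produce the same three fields.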


Metrics to capture

  • Throughput (jobs completed per second, steady state).
  • Latency percentiles for end-to-end job completion—p95 and p99 at minimum.
  • Error rates and recovery time after injected failures (wall-clock until throughput returns within X% of baseline).
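The metrics listed above can be computed in a few lines. The Python sketch below uses the nearest-rank percentile definition, which is one common choice among several; whichever definition you pick, use the same one for both implementations. The recovery tolerance is left as a required parameter because the spec leaves X open, and all function names are hypothetical.

```python
import math


def percentile(latencies_ms, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def throughput(jobs_completed, window_secs):
    """Jobs per second over a steady-state window."""
    if window_secs <= 0:
        raise ValueError("window must be positive")
    return jobs_completed / window_secs


def recovery_time(samples, crash_ts, baseline_rps, tolerance_pct):
    """Seconds from the crash until throughput is back within tolerance_pct of baseline.

    samples is an ordered list of (timestamp_secs, throughput_rps) pairs.
    Returns None if throughput never recovers inside the sampled window.
    """
    floor = baseline_rps * (1 - tolerance_pct / 100)
    for ts, rps in samples:
        if ts >= crash_ts and rps >= floor:
            return ts - crash_ts
    return None
```

Note that a sample taken at the crash timestamp itself may still show pre-crash throughput; sampling strictly after the crash, or requiring a dip before counting recovery, would be a reasonable refinement.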

Raw numbers without methodology are anecdotes. Configuration files (queue bounds, producer count, seeds) should be checked into the repo beside the prose.


Concrete parameters (copy into bench.toml or env)

These defaults are illustrative—tune to your hardware class, but commit whatever you run:

Knob              | Example value                        | Why it matters
------------------+--------------------------------------+----------------------------------------------------------------
producer_count    | 8                                    | Scales offered load; pair with queue depth to see backpressure.
worker_count      | 32                                   | BEAM: process count; Tokio: task count.
queue_capacity    | 256                                  | Bounded channel / mailbox depth; unbounded hides overload.
job_tags          | cpu, io                              | Lets you assert fairness across classes.
work_units        | 1_000–50_000 deterministic ops       | Keeps CPU work reproducible (no FP).
io_sleep_ms       | {10, 50} drawn from seed             | Simulates latency; mix with CPU jobs.
crash_probability | 0.001 per job                        | Triggers supervision / restart paths.
rng_seed          | 0xC0FFEE                             | Makes crash injection diffable across runs.
steady_state_secs | 120                                  | Soak before recording p99; ignore first 30s warm-up.
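To keep the knobs from drifting between runs, it can help to load them through one validated structure. The Python sketch below mirrors the example values in the table. The split of `work_units` into `work_units_min` and `work_units_max`, the separate `warmup_secs` field, and the `BENCH_` environment prefix are assumptions made for illustration.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BenchConfig:
    producer_count: int = 8
    worker_count: int = 32
    queue_capacity: int = 256
    job_tags: tuple = ("cpu", "io")
    work_units_min: int = 1_000    # assumed lower end of the work_units range
    work_units_max: int = 50_000   # assumed upper end of the work_units range
    io_sleep_ms: tuple = (10, 50)
    crash_probability: float = 0.001
    rng_seed: int = 0xC0FFEE
    steady_state_secs: int = 120
    warmup_secs: int = 30          # assumed name for the ignored warm-up window

    def validate(self):
        """Raise ValueError on settings that would make a run meaningless."""
        if min(self.producer_count, self.worker_count, self.queue_capacity) <= 0:
            raise ValueError("counts and queue_capacity must be positive")
        if not 0.0 <= self.crash_probability <= 1.0:
            raise ValueError("crash_probability must be in [0, 1]")
        if self.work_units_min > self.work_units_max:
            raise ValueError("work_units_min exceeds work_units_max")
        if self.warmup_secs >= self.steady_state_secs:
            raise ValueError("warm-up must be shorter than the soak")
        return self

    @classmethod
    def from_env(cls, env=os.environ, prefix="BENCH_"):
        """Override numeric knobs from variables such as BENCH_QUEUE_CAPACITY."""
        overrides = {}
        for f in fields(cls):
            raw = env.get(prefix + f.name.upper())
            if raw is None:
                continue
            if f.type is int:
                overrides[f.name] = int(raw, 0)  # base 0 accepts 0xC0FFEE
            elif f.type is float:
                overrides[f.name] = float(raw)
            # Tuple-valued knobs are left at their defaults in this sketch.
        return cls(**overrides).validate()
```

Logging the resolved config at the start of each run, alongside the binary or OTP build identifier, makes it easier to confirm later that two result sets are actually comparable.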

Reproducible result definition: given the same seed + config + binary/OTP build, throughput, p95/p99 completion time, and time-to-recover-after-crash fall within a tight band (≤5% relative drift is a reasonable sanity gate before publishing charts).
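The sanity gate above can be automated as a small check between two runs of the same configuration. This Python sketch assumes each run has been reduced to a dictionary of headline metrics; apart from `throughput_rps`, the metric names in the usage note are hypothetical.

```python
def relative_drift(baseline, candidate):
    """Absolute difference as a fraction of the baseline value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return abs(candidate - baseline) / abs(baseline)


def drift_report(run_a, run_b, max_drift=0.05):
    """Compare two runs metric by metric; return (passed, failing_metric_names)."""
    failing = []
    for name in sorted(run_a):
        if name not in run_b:
            failing.append(name)  # a missing metric counts as a failure
        elif relative_drift(run_a[name], run_b[name]) > max_drift:
            failing.append(name)
    return (not failing, failing)
```

For example, comparing `{"throughput_rps": 1000, "p95_ms": 40, "p99_ms": 80}` with `{"throughput_rps": 1030, "p95_ms": 41, "p99_ms": 90}` passes on throughput and p95 but fails on p99, whose drift is 12.5%.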


Metrics schema (both implementations log the same JSON lines)

{"ts":"…","event":"job_done","id":4812,"tag":"io","latency_ms":42}
{"ts":"…","event":"worker_crash","worker":7,"reason":"injected"}
{"ts":"…","event":"recovered","throughput_rps":910.2}

The Gleam and Tokio harnesses should emit the same event names and field keys so the side-by-side read in “Rust versus Gleam on the Same Bench: What the Numbers Suggest” is a join on keys, not a narrative.
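With a shared schema, the comparison step reduces to parsing two log files and joining on the tag. The Python sketch below assumes only the event names and fields shown in the sample lines above; `summarize` and `join_on_tag` are hypothetical helpers, and the median is used purely as an example statistic.

```python
import json
from collections import defaultdict
from statistics import median


def summarize(lines):
    """Per-tag job count and median latency from one harness's JSON lines."""
    by_tag = defaultdict(list)
    crashes = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event["event"] == "job_done":
            by_tag[event["tag"]].append(event["latency_ms"])
        elif event["event"] == "worker_crash":
            crashes += 1
    tags = {
        tag: {"jobs": len(values), "median_latency_ms": median(values)}
        for tag, values in by_tag.items()
    }
    return {"crashes": crashes, "tags": tags}


def join_on_tag(gleam_summary, rust_summary):
    """Rows of (tag, gleam_stats, rust_stats); None marks a tag one side never logged."""
    all_tags = sorted(set(gleam_summary["tags"]) | set(rust_summary["tags"]))
    return [
        (tag, gleam_summary["tags"].get(tag), rust_summary["tags"].get(tag))
        for tag in all_tags
    ]
```

A `None` on either side of a row is itself a finding: it means one harness never completed a job of that class, which should be investigated before any latency numbers are compared.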


Conclusion

“Gleam on the BEAM: Actors, Types, and OTP Primitives” implements this specification on the BEAM with Gleam. “Gleam and the BEAM Scheduler Under Load” measures scheduler behavior under that load. “Rust and Tokio: The Same Concurrent Workload in Type-Safe Threads” runs the same workload in Rust/Tokio so that “Rust versus Gleam on the Same Bench: What the Numbers Suggest” can compare apples to apples, not fan fiction.