Preamble
Gleam and the BEAM Scheduler Under Load ran the workload on the BEAM; Rust and Tokio: The Same Concurrent Workload in Type-Safe Threads ran it on Tokio. This post is the side-by-side read: same hardware class, same seeds, same queue bounds, with the configuration checked into the repo next to the charts. The aim is not to crown a single winner but to see which trade-offs show up for this workload, and how fragile those conclusions are if the workload drifts.
Throughput and scaling producers
Producer count is swept against fixed worker pools and channel/mailbox bounds. BEAM may shine when tasks are massively concurrent and mixed with sleeps; Rust may shine when CPU efficiency and predictable native codegen dominate. The crossover depends on message sizes, allocation rates, and whether anyone blocked a scheduler thread.
When charts diverge, the question is whether divergence is fundamental (scheduler design) or accidental (tuning mistake, blocking call, logging overhead).
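The sweep itself is simple to reproduce. Below is a minimal sketch, not the repo's actual harness: it drives a single consumer through a bounded std channel while varying producer count, using threads as a runtime-neutral stand-in. The names (`run_sweep`, the bound of 64) are illustrative assumptions.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Hypothetical sweep harness: for each producer count, push jobs through
// a bounded channel (standing in for the real queue/mailbox bound) and
// count how many jobs the single consumer drains.
fn run_sweep(producer_counts: &[usize], jobs_per_producer: usize) -> Vec<(usize, usize)> {
    producer_counts
        .iter()
        .map(|&producers| {
            let (tx, rx) = sync_channel::<u64>(64); // bounded, like the benchmarked queues
            let handles: Vec<_> = (0..producers)
                .map(|_| {
                    let tx = tx.clone();
                    thread::spawn(move || {
                        for i in 0..jobs_per_producer {
                            tx.send(i as u64).unwrap(); // blocks when the bound is hit
                        }
                    })
                })
                .collect();
            drop(tx); // close the channel once every producer clone is done
            let done = rx.iter().count(); // consumer drains until all senders drop
            for h in handles {
                h.join().unwrap();
            }
            (producers, done)
        })
        .collect()
}
```

In the real comparison the consumer side is a worker pool and the timing matters; this skeleton only shows where producer count, pool size, and the channel bound enter the experiment.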
Memory residency
Process heaps on the BEAM versus Rust allocator behavior tell different stories under load. RSS snapshots during steady state and immediately after recovery runs belong in the appendix. Micro-benchmarks that ignore allocator fragmentation still teach—but they do not replace long-soak tests.
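Taking those RSS snapshots is mostly string parsing. A minimal sketch, assuming Linux `/proc/self/status` formatting; the helper name is hypothetical:

```rust
// Hypothetical helper: extract resident set size in kB from
// /proc/self/status-style text (Linux "VmRSS:  123456 kB" lines assumed).
fn parse_vmrss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find(|l| l.starts_with("VmRSS:"))
        .and_then(|l| l.split_whitespace().nth(1))
        .and_then(|v| v.parse().ok())
}
```

Sampling this at steady state and again right after a recovery run gives the two numbers the appendix plots; it says nothing about fragmentation inside the allocator, which is exactly why the long-soak tests still matter.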
Recovery after injected failures
A Language-Agnostic Concurrent Workload for 2025 Comparisons required random worker crashes. Wall-clock time is measured until throughput returns within a negotiated band of baseline. BEAM supervision often makes this story short if restart policies match the failure mode. Rust typically composes panic handling, JoinHandle inspection, and application-level watchdogs—Supervision Trees and Rust Task Hierarchies names those patterns explicitly.
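The recovery metric reduces to post-processing a throughput time series. A minimal sketch under the stated definition (first sample at or after the crash that re-enters the band); the function name and band convention are assumptions, not the repo's code:

```rust
// Hypothetical post-processing: given per-second throughput samples, the
// index of the injected crash, the pre-crash baseline, and the negotiated
// band (e.g. 0.9 = 90% of baseline), return seconds until recovery.
fn recovery_seconds(samples: &[f64], crash_idx: usize, baseline: f64, band: f64) -> Option<usize> {
    samples
        .iter()
        .enumerate()
        .skip(crash_idx)
        .find(|&(_, &s)| s >= baseline * band)
        .map(|(i, _)| i - crash_idx)
}
```

A stricter variant would require the throughput to stay inside the band for several consecutive samples; the single-sample version here is the simplest thing that matches the wall-clock definition above.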
Developer ergonomics
Typing Send/Sync in Rust is work up front; writing supervisor trees in OTP is also work, just distributed across config and process design. Neither side is “free” in caricature terms. The question is which failure modes your team can operationalize at 3 a.m.
Polyglot services
When neither runtime alone fits, split services behind HTTP/gRPC (see Polyglot Interop: HTTP and gRPC Between Python and Java) and own the integration tax consciously. Benchmarks inform boundaries; they do not replace domain decomposition.
Side-by-side code: same job, two runtimes
BEAM (Erlang) — preemptive scheduling + process isolation; long CPU without yields can still hurt, but reduction counting bounds greedy processes in the common case:
```erlang
-module(demo).
-export([worker/0]).

%% Receive a job, burn Work reductions, sleep, emit a JSONL line, recurse.
worker() ->
    receive
        {job, Id, Work, Sleep} ->
            run(Work),
            timer:sleep(Sleep),
            io:format("{\"event\":\"job_done\",\"id\":~p}~n", [Id]),
            worker()
    end.

%% Pure CPU burn: the scheduler preempts on reduction count, not on yields.
run(0) -> ok;
run(N) -> run(N - 1).
```
Rust / Tokio — cooperative tasks; fairness depends on not blocking the executor; sleep yields; CPU work must stay bounded per .await boundary or move to spawn_blocking:
```rust
use std::time::Duration;
use tokio::sync::mpsc;

struct Job { id: u64, work_units: u64, sleep_ms: u64 }

async fn worker(mut rx: mpsc::Receiver<Job>) {
    while let Some(job) = rx.recv().await {
        // Bounded CPU burn per message; a long loop here would starve
        // peers and belongs in spawn_blocking instead.
        for _ in 0..job.work_units {
            std::hint::black_box(());
        }
        tokio::time::sleep(Duration::from_millis(job.sleep_ms)).await;
        // emit same JSONL { "event":"job_done", "id": job.id }
        let _ = job.id;
    }
}
```
How to read the comparison: the JSONL lines should match shape across runs so plots in this comparison are apples-to-apples. Divergence in p99 with similar mean throughput often signals scheduler blocking (NIF/driver on BEAM; blocking syscalls on Tokio) or contention on a shared aggregator—fix those before declaring one runtime “faster.”
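Checking that divergence starts with computing both statistics the same way on both sides. A minimal sketch using the nearest-rank percentile definition; `mean_and_p99` is an illustrative name, not part of either harness:

```rust
// Hypothetical analysis step: mean and p99 latency from one run's samples,
// using the nearest-rank percentile definition.
fn mean_and_p99(mut samples: Vec<f64>) -> (f64, f64) {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap()); // finite floats assumed
    let mean = samples.iter().sum::<f64>() / samples.len() as f64;
    let rank = ((samples.len() as f64) * 0.99).ceil() as usize;
    let p99 = samples[rank.saturating_sub(1)];
    (mean, p99)
}
```

A handful of multi-second outliers barely moves the mean but dominates p99, which is why similar means with diverging tails point at blocking rather than raw speed.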
Reading results without fooling yourself
| Symptom | Likely cause | First experiment |
|---|---|---|
| BEAM mean OK, p99 awful | long NIF, port driver, or huge messages | isolate with erlang:process_info/2, reduce payload |
| Tokio mean OK, p99 awful | blocking in async task, lock contention | tokio-console, move blocking work off runtime |
| Both degrade together | OS / disk / logging storm | control hardware, mute hot logs, retest |
Conclusion
Micro-benchmarks favor narratives—it helps to mark which results are workload-specific and which illustrate scheduler fundamentals. This post is the comparison pass: same seeds, same bounds, same metrics schema. Supervision Trees and Rust Task Hierarchies generalizes recovery semantics—supervision versus explicit restart loops—once the raw numbers are trustworthy.