🦀 Scientific Computing Benchmark: Rust 🦀 vs Zig ⚡ vs the father of them all, C 👴

by Daniel Boros

Jul 7


I've always been a big fan of low-level, hardware-oriented programming languages — especially when it comes to scientific computing. As the maintainer of stochastic-rs, I’ve long wanted to try writing similar code in Zig and C.

I have to admit: I'm not an expert in either Zig or C. That’s exactly why this benchmark is meant to show what happens if someone (like me) who’s used to Rust goes online, reads a few docs, browses some StackOverflow threads, and just tries to implement a basic algorithm "quick and dirty."

Instead of yet another trivial million-times for loop, I picked something more "scientific": a stochastic process that’s widely known in stochastic calculus and mathematical finance — specifically, the Ornstein–Uhlenbeck (OU) process.

We generate 1,000 trajectories, each of length 500,000. The process is not too complex: it is a simple diffusive stochastic process where each increment comes from a normal distribution and is accumulated using a basic Euler scheme.

Mathematically, the OU process is defined as:

dX_t = \theta(\mu - X_t)\,dt + \sigma\,dW_t

where:

  • θ is the speed of mean reversion,
  • μ is the long-term mean,
  • σ is the volatility,
  • dW_t is the standard Brownian motion increment.
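
All three implementations below discretize this with the same explicit Euler (Euler–Maruyama) step, starting from X_0 = 0 and using \Delta t = T/N:

X_{i+1} = X_i + \theta(\mu - X_i)\,\Delta t + \sigma\,\Delta W_i, \qquad \Delta W_i \sim \mathcal{N}(0, \Delta t)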

🚀 Let's see the code

🦀 Rust solution

Before diving in, a quick note: all tests were run on a MacBook Pro M1. I deliberately didn't use stochastic-rs (which already contains a lot of optimizations), but instead wrote a straightforward, iterative generator.

All compiler settings were left at their defaults, and I used the rand and rand_distr crates. The code was run with:

cargo run --release
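
For reference, a minimal Cargo.toml matching the imports below could look like this. The post doesn't pin crate versions, so the ones here are my assumption: rand 0.9 (which provides rand::rng() and gates SmallRng behind the small_rng feature) and the matching rand_distr 0.5. The package name is just a placeholder.

[package]
name = "ou-benchmark" # hypothetical project name
version = "0.1.0"
edition = "2021"

[dependencies]
rand = { version = "0.9", features = ["small_rng"] }
rand_distr = "0.5"

And the main.rs itself: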
use rand::SeedableRng;
use rand_distr::{Distribution, Normal};
use std::time::Instant;

const N: usize = 500_000;
const T: f64 = 1.0;
const DT: f64 = T / N as f64;
const THETA: f64 = 1.0;
const MU: f64 = 1.0;
const SIGMA: f64 = 1.0;
const NUM_RUNS: usize = 1000;

fn run_once(rng: &mut rand::rngs::SmallRng) -> u128 {
    let start = Instant::now();

    // Gaussian increments with standard deviation sqrt(dt)
    let mut gn = [0.0; N - 1];
    let normal = Normal::new(0.0, DT.sqrt()).unwrap();

    for i in 0..N - 1 {
        gn[i] = normal.sample(rng);
    }

    // Euler scheme for the OU process, starting from X_0 = 0
    let mut ou = [0.0; N];
    for i in 1..N {
        ou[i] = ou[i - 1] + THETA * (MU - ou[i - 1]) * DT + SIGMA * gn[i - 1];
    }
    std::hint::black_box(&ou);

    start.elapsed().as_nanos()
}

fn main() {
    let mut total_ns: u128 = 0;
    let mut rng = rand::rngs::SmallRng::from_rng(&mut rand::rng());

    for _ in 0..NUM_RUNS {
        total_ns += run_once(&mut rng);
    }

    let avg_ns = total_ns / NUM_RUNS as u128;
    println!("Average elapsed time over {} runs: {} ns", NUM_RUNS, avg_ns);
    println!("Total: {} ns", total_ns);
}

⚡ Zig solution

In Zig, as a "complete beginner," I quickly ran into some module system struggles before I even got the code to compile.

For random number generation, I used zprob. The following code was run with:

zig build run -Doptimize=ReleaseFast
const std = @import("std");
const zprob = @import("zprob");

const num_runs = 1000;

fn run_once(allocator: *std.mem.Allocator) !u64 {
    const n = 500_000;
    const t = 1.0;
    const dt = t / @as(f64, n);

    var gn = try allocator.alloc(f64, n - 1);
    defer allocator.free(gn);

    var ou = try allocator.alloc(f64, n);
    defer allocator.free(ou);

    const theta = 1.0;
    const mu = 1.0;
    const sigma = 1.0;

    const seed: u64 = @intCast(std.time.microTimestamp());
    var prng = std.Random.DefaultPrng.init(seed);
    var rand = prng.random();
    var normal = zprob.Normal(f64).init(&rand);

    var timer = try std.time.Timer.start();

    // Gaussian increments with standard deviation sqrt(dt)
    for (0..n - 1) |i| {
        const sample = try normal.sample(0.0, std.math.sqrt(dt));
        gn[i] = sample;
    }

    // Euler scheme for the OU process, starting from X_0 = 0
    ou[0] = 0.0;
    for (1..n) |i| {
        ou[i] = ou[i - 1] + theta * (mu - ou[i - 1]) * dt + sigma * gn[i - 1];
    }

    return timer.read();
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    var allocator = gpa.allocator();

    var total_ns: u64 = 0;

    for (0..num_runs) |_| {
        const elapsed_ns = try run_once(&allocator);
        total_ns += elapsed_ns;
    }

    const avg_ns = total_ns / num_runs;
    std.debug.print("Average elapsed time over {} runs: {d} ns\n", .{ num_runs, avg_ns });
    std.debug.print("Total: {d} ns\n", .{total_ns});
}

👴 C solution

Surprisingly, my C solution went more smoothly than Zig. For sanity (and simplicity), I implemented my own normal distribution generator using the classic Box-Muller method.
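
As a quick reminder, Box-Muller takes two independent uniforms u_1, u_2 \in (0, 1] and produces two independent standard normals:

z_0 = \sqrt{-2 \ln u_1}\,\cos(2\pi u_2), \qquad z_1 = \sqrt{-2 \ln u_1}\,\sin(2\pi u_2)

In the code below, both are scaled by \sqrt{\Delta t} to turn them into Brownian increments.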

The code was compiled with:

gcc -O3 -march=native -o main main.c -lm
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define N 500000
#define T 1.0
#define DT (T / (double)N)
#define THETA 1.0
#define MU 1.0
#define SIGMA 1.0
#define NUM_RUNS 1000

// Uniform on (0, 1]; the +1.0 shift avoids returning 0, so log(u1) stays finite
double rand_uniform() {
    return ((double)rand() + 1.0) / ((double)RAND_MAX + 1.0);
}

long run_once() {
    double *gn = malloc((N - 1) * sizeof(double));
    double *ou = malloc(N * sizeof(double));

    if (gn == NULL || ou == NULL) {
        printf("Memory allocation failed!\n");
        exit(1);
    }

    const double dt_sqrt = sqrt(DT);

    struct timespec start_time, end_time;
    clock_gettime(CLOCK_MONOTONIC, &start_time);

    // Box-Muller: each pair of uniforms yields two independent standard normals
    int i = 0;
    while (i < N - 1) {
        double u1 = rand_uniform();
        double u2 = rand_uniform();
        double r = sqrt(-2.0 * log(u1));
        double theta = 2.0 * M_PI * u2;

        double z0 = r * cos(theta);
        double z1 = r * sin(theta);

        gn[i] = z0 * dt_sqrt;
        if (i + 1 < N - 1) {
            gn[i + 1] = z1 * dt_sqrt;
        }
        i += 2;
    }

    // Euler scheme for the OU process, starting from X_0 = 0
    ou[0] = 0.0;
    for (int j = 1; j < N; j++) {
        ou[j] = ou[j - 1] + THETA * (MU - ou[j - 1]) * DT + SIGMA * gn[j - 1];
    }

    clock_gettime(CLOCK_MONOTONIC, &end_time);

    long seconds = end_time.tv_sec - start_time.tv_sec;
    long nanoseconds = end_time.tv_nsec - start_time.tv_nsec;
    long elapsed_ns = seconds * 1000000000L + nanoseconds;

    free(gn);
    free(ou);

    return elapsed_ns;
}

int main() {
    // Seed the C library PRNG once; re-seeding inside run_once with time(NULL)
    // would repeat the same sequence for runs starting within the same second.
    srand((unsigned int)time(NULL));

    long total_ns = 0;

    for (int run = 0; run < NUM_RUNS; run++) {
        total_ns += run_once();
    }

    long avg_ns = total_ns / NUM_RUNS;
    printf("Average elapsed time over %d runs: %ld ns\n", NUM_RUNS, avg_ns);
    printf("Total: %ld ns\n", total_ns);

    return 0;
}

⚔️ The ugly truth

Somewhere deep down, I was hoping Rust would be the fastest — and it actually was! Of course, this result is likely a skill issue on my part, due to my lack of deep knowledge in Zig and C. But again, this benchmark was meant to illustrate what happens when a "regular guy" jumps online and tries to code something after a few Google searches and some docs reading.


📊 Results

| Language | Avg (ms) | Total (ms) |
| --- | --- | --- |
| Rust 🦀 | 3.08 | 3077 (~3.1 sec) |
| Zig ⚡ | 3.58 | 3584 (~3.6 sec) |
| C 👴 | 3.61 | 3611 (~3.6 sec) |

As you can see, Rust came out on top here, finishing about half a second faster in total than either Zig or C in this particular setup.
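
To put that in relative terms, using the totals from the table: 3584 / 3077 ≈ 1.165 and 3611 / 3077 ≈ 1.174, so Zig and C needed roughly 16% and 17% more time than Rust for the same workload.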


💬 Let's discuss!


All code is available on GitHub: rust-dd/probability-benchmark.

Feel free to join in and let's talk about what I could (or should) have done differently, possible optimizations, or other interesting approaches! The discussion is available here: Discussion #1

Conclusion

Although I'm a Rust user (so I clearly can’t claim to be fully unbiased), if I put on my poker face and imagine this was my first time ever writing code, Rust still gave me by far the best developer experience. Next would be C, where — surprisingly — after a bit of Googling and some help from ChatGPT, I was able to compile the code relatively easily. However, this was definitely not the case with Zig: module installation was quite a struggle at first, and since Zig is still a young language, even large language models don’t provide very relevant support yet.