Profile and benchmark#
rustmatrix is mostly Rust under the hood but is called from Python.
The speedup over pytmatrix (roughly 6× for orientation averaging, 10×
for PSD tabulation, 430× for orient_averaged_adaptive) depends on which
call you make. This page tells you where the Rust kernels apply,
where you’re paying Python tax, and how to measure the difference.
Where Rust runs#
| Operation | Rust? |
|---|---|
|  | Python (cheap) |
|  | Python dispatch + Rust rotation |
|  | Rust T-matrix build + rotation |
|  | Rust, GIL released, parallel |
|  | Rust, GIL released, parallel; this is where you see 100× speedups |
|  | Rust, parallel across diameters |
| PSD integration against the cached table | NumPy on the Python side (cheap) |
|  | NumPy / Python (cheap; inputs are the pre-integrated bulks) |
|  | Python loop over velocity bins + NumPy; the T-matrix work is already done |
Rule of thumb: anything that evaluates the T-matrix at many diameters × orientations lives in Rust and parallelises across cores. Everything downstream (polarimetric algebra, PSD weighting, spectral assembly) is NumPy-speed.
Quick benchmarks#
The benches/ directory ships a pytmatrix head-to-head:
uv pip install pytmatrix
python benches/bench_vs_pytmatrix.py
That script runs each of the hot operations on the same problem with both backends and prints wall-time ratios. Don’t benchmark on a cold import: the first T-matrix call pays a one-time JIT-like cost for the pyo3 bindings, so make one untimed warm-up call before you start the clock.
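One way to keep the cold-start cost out of your own measurements is to discard a warm-up call and report the median of several timed repeats. The sketch below uses only the standard library; `time_call` and `workload` are hypothetical names, and `workload` is a stand-in you would replace with the rustmatrix call you care about.

```python
import statistics
import time


def time_call(fn, *, warmup=1, repeats=5):
    """Return the median wall time of fn() in seconds, after warm-up calls."""
    for _ in range(warmup):
        fn()  # discarded: absorbs one-time import / initialisation cost
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def workload():
    # Hypothetical stand-in: replace with the call you want to time.
    return sum(i * i for i in range(50_000))


if __name__ == "__main__":
    print(f"median wall time: {time_call(workload):.6f} s")
```

The median is used rather than the mean so that a single slow repeat (for example, from another process grabbing a core) does not dominate the result.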
Profiling your own code#
The usual Python profilers work, but they can’t descend into the Rust side. What they can do is tell you whether you’re spending time in the Rust kernel or in your own glue:
python -m cProfile -s cumulative my_script.py | head -30
If the hot function is rustmatrix._core.calctmat or
init_scatter_table, you’re in Rust — any further speedup needs
fewer calls, not faster ones. If the hot function is somewhere in
spectra.py or in your own code, you have Python-side work to cut.
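If you would rather profile one function than a whole script, the same check can be done from inside Python with the standard `cProfile` and `pstats` modules. This is a minimal sketch: `profile_top`, `main` and `glue_code` are hypothetical names, and `glue_code` stands in for whatever Python-side work your script does.

```python
import cProfile
import io
import pstats


def glue_code():
    # Hypothetical stand-in for Python-side work (loops, bookkeeping).
    return sum(i % 7 for i in range(200_000))


def main():
    # Replace with your own entry point; here it only calls the stand-in.
    return glue_code()


def profile_top(fn, limit=10):
    """Run fn under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_top(main))
```

Read the report the same way as the command-line output: look at which function names sit at the top of the cumulative column.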
Common speed traps#
Rebuilding the scatter table per PSD. init_scatter_table is expensive, whereas assigning s.psd = new_psd and integrating again is nearly free. Build the table once per (shape, refractive-index, wavelength) tuple and swap PSDs against it.
Calling set_geometry on every drop. Geometry switches are cheap (no T-matrix re-compute) but still Python-side; call once per back / forward sweep, not in an inner loop.
num_points too large. 64 quadrature diameters is almost always enough; 256 rarely changes the answer past the third digit and quadruples the Rust cost.
Single-geometry calls in a loop instead of one vectorised PSD integration. If you’re iterating over raindrop diameters in Python and calling Scatterer per drop, stop: build a PSDIntegrator and let the Rust kernel batch.
When to reach for cargo bench#
The Rust crate has its own microbenchmarks (cargo bench). You
want them when:
you’re modifying the Rust kernels themselves;
you want to isolate Rust-only cost from Python overhead;
you care about per-diameter T-matrix timings, not end-to-end polarimetrics.
For application-level work (“does my spectrum run fast enough?”),
the Python benchmarks in benches/ are the right altitude.