# Profile and benchmark

`rustmatrix` is mostly-Rust under the hood but called from Python.
The speedup vs `pytmatrix` (~6× orientation averaging, ~10× PSD
tabulation, ~430× `orient_averaged_adaptive`) depends on which
call you make. This page tells you where the Rust kernels apply,
where you're paying Python tax, and how to measure.

## Where Rust runs

| Operation | Rust? |
|---|---|
| `Scatterer(...)` construction | Python (cheap) |
| `scatterer.set_geometry(...)` | Python dispatch + **Rust** rotation |
| `scatter.amplitude_matrix`, single geometry | **Rust** T-matrix build + rotation |
| `orientation.orient_averaged_fixed` | **Rust**, GIL released, parallel |
| `orientation.orient_averaged_adaptive` | **Rust**, GIL released, parallel — this is where you see 100× speedups |
| `psd.PSDIntegrator.init_scatter_table` | **Rust**, parallel across diameters |
| PSD integration against the cached table | NumPy on the Python side (cheap) |
| `radar.refl`, `Zdr`, `Kdp`, …  | NumPy / Python (cheap; inputs are the pre-integrated bulks) |
| `spectra.SpectralIntegrator.compute` | Python loop over velocity bins + NumPy; the T-matrix work is already done |

Rule of thumb: **anything that evaluates the T-matrix at many
diameters × orientations** lives in Rust and parallelises across
cores. Everything downstream (polarimetric algebra, PSD weighting,
spectral assembly) is NumPy-speed.

## Quick benchmarks

The `benches/` directory ships a pytmatrix head-to-head:

```bash
uv pip install pytmatrix
python benches/bench_vs_pytmatrix.py
```

That script runs each of the hot operations against the same problem
on both backends and prints wall-time ratios. Don't benchmark on a
cold import — the first T-matrix call pays a one-time JIT-like cost
for the pyo3 bindings.

## Profiling your own code

The usual Python profilers work, but they can't descend into the
Rust side. What they *can* do is tell you whether you're spending
time in the Rust kernel or in your own glue:

```bash
python -m cProfile -s cumulative my_script.py | head -30
```

If the hot function is `rustmatrix._core.calctmat` or
`init_scatter_table`, you're in Rust — any further speedup needs
fewer calls, not faster ones. If the hot function is somewhere in
`spectra.py` or in your own code, you have Python-side work to cut.

## Common speed traps

* **Rebuilding the scatter table per PSD.** `init_scatter_table`
  is expensive; `s.psd = new_psd; integrate` is nearly free.
  Build the table once per (shape, refractive-index, wavelength)
  tuple and swap PSDs against it.
* **Calling `set_geometry` on every drop.** Geometry switches are
  cheap (no T-matrix re-compute) but still Python-side; call once
  per back / forward sweep, not in an inner loop.
* **`num_points` too large.** 64 quadrature diameters is almost
  always enough; 256 rarely changes the answer past the third
  digit and quadruples the Rust cost.
* **Single-geometry calls in a loop instead of one vectorised
  PSD integration.** If you're iterating over raindrop diameters
  in Python and calling `Scatterer` per drop, stop — build a
  `PSDIntegrator` and let the Rust kernel batch.

## When to reach for `cargo bench`

The Rust crate has its own microbenchmarks (`cargo bench`). You
want them when:

* you're modifying the Rust kernels themselves;
* you want to isolate Rust-only cost from Python overhead;
* you care about per-diameter T-matrix timings, not end-to-end
  polarimetrics.

For application-level work ("does my spectrum run fast enough?"),
the Python benchmarks in `benches/` are the right altitude.