Benchmarks¶
XAD is benchmarked against CppAD, Adept 2, autodiff, and finite differences on four quantitative-finance workloads.
All results can be reproduced from the source code and methodology published in the ad-benchmarks repository.
Configuration: -O3 -mavx2 -mfma, 10K MC paths.
Results¶
Gradient time measures the cost of computing all sensitivities in a single pass. The Primal column is the cost of evaluating the same workload once with plain double (no AD machinery), so the gap between Primal and each AAD library shows the recording-and-reverse-sweep overhead. FD (finite differences) costs approximately (N + 1) x Primal, where N is the number of sensitivities: one base evaluation plus one bumped evaluation per input.
| # | Benchmark | Sensitivities | Primal | FD | XAD | XAD‑Codegen | CppAD | Adept | autodiff |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Heston MC | 8 | 9.3 ms | 83 ms | 40 ms | 21 ms | 268 ms | 91 ms | DNF |
| 2 | SABR Calibration | 15 | 2.1 ms | 33 ms | 8.4 ms | 4.9 ms | 32 ms | 19 ms | 38 ms |
| 3 | XVA CVA | 40 | 591 ms | 24.1 s | 2.6 s | 0.57 s | 8.0 s | 7.1 s | DNF |
| 4 | LIBOR Swaption | 161 | 138 ms | 21.6 s | 1.00 s | 0.31 s | 4.57 s | 1.15 s | DNF |
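Each figure is the median of 10 measured iterations after warmup. The AAD libraries run in reverse mode, except autodiff, whose SABR result uses forward mode. DNF marks a benchmark that the library did not complete.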
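The (N + 1) x Primal rule of thumb can be checked against the table with a few lines of arithmetic. The Python sketch below is only an illustration: the timings are copied from the table above, and the helper name `predicted_fd_ms` is hypothetical rather than part of the benchmark suite.

```python
# Timings copied from the results table, in milliseconds.
# name: (sensitivities N, Primal ms, measured FD ms)
RESULTS = {
    "Heston MC": (8, 9.3, 83.0),
    "SABR Calibration": (15, 2.1, 33.0),
    "XVA CVA": (40, 591.0, 24100.0),
    "LIBOR Swaption": (161, 138.0, 21600.0),
}


def predicted_fd_ms(n_inputs, primal_ms):
    """One-sided finite differences: one base run plus one bumped run per input."""
    return (n_inputs + 1) * primal_ms


for name, (n, primal, measured) in RESULTS.items():
    predicted = predicted_fd_ms(n, primal)
    print(f"{name}: predicted {predicted:.0f} ms, measured {measured:.0f} ms, "
          f"ratio {measured / predicted:.2f}")
```

On these figures the measured FD time stays within a few percent of the prediction on all four workloads.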
Key Observations¶
- Adjoint AD vs finite differences. FD scales O(N) with input count, so its gap to XAD-Codegen widens from roughly 4x on Heston (8 inputs) to roughly 70x on LIBOR (161 inputs). FD is fine for spot checks but becomes the bottleneck once you need more than a handful of sensitivities.
- Tape libraries cluster within an order of magnitude. XAD's tape mode is the fastest tape library on every benchmark, by margins over the next-fastest library of roughly 1.1x (LIBOR vs Adept) up to 2.7x (XVA CVA vs Adept). CppAD is consistently slowest of the three on the MC benchmarks but stays within an order of magnitude.
- XAD-Codegen compiles the recorded graph to AVX2 native code at runtime and is roughly 2x to 5x faster than XAD's own tape mode on these benchmarks. See the JIT tutorial for how to use this feature.
- autodiff completes only 1 of 4 benchmarks (SABR, using its forward mode, dual). Forward mode is O(N) in input count, and its alternative reverse mode (var) does not scale to MC pricing or larger calibrations.
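To make the contrast between finite differences and a tape-based adjoint sweep concrete, here is a minimal Python sketch. It is not how XAD or any benchmarked library is implemented: the `Tape` and `Var` classes, the toy function `f`, and the bump size are all assumptions chosen for illustration. It shows that FD needs N + 1 evaluations of the function, while the adjoint approach records one evaluation and recovers every sensitivity in one reverse sweep.

```python
class Tape:
    """Hypothetical minimal tape: one entry per recorded value."""

    def __init__(self):
        self.parents = []  # per node: list of (parent_index, local_partial)

    def new_node(self, parents):
        self.parents.append(parents)
        return len(self.parents) - 1


class Var:
    """Active value that records each operation on the tape."""

    def __init__(self, tape, value, parents=()):
        self.tape = tape
        self.value = value
        self.index = tape.new_node(list(parents))

    def __add__(self, other):
        return Var(self.tape, self.value + other.value,
                   [(self.index, 1.0), (other.index, 1.0)])

    def __mul__(self, other):
        return Var(self.tape, self.value * other.value,
                   [(self.index, other.value), (other.index, self.value)])


def f(xs):
    """Toy many-inputs/one-output function; works on floats and on Var."""
    total = xs[0] * xs[0] * xs[1]
    for i in range(1, len(xs)):
        total = total + xs[i] * xs[i] * xs[(i + 1) % len(xs)]
    return total


def gradient_fd(func, point, bump=1e-6):
    """One-sided finite differences: N + 1 evaluations of func."""
    base = func(point)
    evaluations = 1
    grad = []
    for i in range(len(point)):
        bumped = list(point)
        bumped[i] += bump
        grad.append((func(bumped) - base) / bump)
        evaluations += 1
    return grad, evaluations


def gradient_reverse(func, point):
    """One recorded evaluation followed by one reverse sweep over the tape."""
    tape = Tape()
    xs = [Var(tape, v) for v in point]
    y = func(xs)
    adjoint = [0.0] * len(tape.parents)
    adjoint[y.index] = 1.0
    # Nodes are created in evaluation order, so walking backwards visits
    # every node after all of its consumers.
    for node in range(len(tape.parents) - 1, -1, -1):
        for parent, partial in tape.parents[node]:
            adjoint[parent] += partial * adjoint[node]
    return [adjoint[x.index] for x in xs]


point = [0.5 + 0.1 * i for i in range(8)]
fd_grad, fd_evals = gradient_fd(f, point)
ad_grad = gradient_reverse(f, point)
print("FD evaluations:", fd_evals)
print("max |FD - adjoint|:", max(abs(a - b) for a, b in zip(fd_grad, ad_grad)))
```

The reverse sweep costs a small constant multiple of one evaluation regardless of the number of inputs, which is why the gap to FD grows with the sensitivity count.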
Benchmarked Libraries¶
| Library | Modes | Recording approach |
|---|---|---|
| XAD | Forward and Adjoint, higher-order | Tape-based; optional xad-codegen backend compiles the recorded graph to AVX2 native code |
| CppAD | Forward and Reverse, higher-order | Tape-based ADFun record/replay |
| Adept 2 | Forward and Reverse | Expression templates with stack recording |
| autodiff | Forward (dual) and Reverse (var) | Compile-time dual numbers / runtime expression tree |
All four libraries support both forward and reverse modes. The suite exercises reverse mode, the standard choice for many-inputs/one-output workloads such as risk and pricing.
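The reason reverse mode suits many-inputs/one-output workloads is easiest to see from what forward mode has to do instead. The sketch below uses a hypothetical `Dual` class (a value plus one directional derivative) and a toy function; it is not autodiff's API. It shows that forward mode needs one full pass per input to assemble a gradient.

```python
class Dual:
    """Value plus the derivative along one seeded input direction."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # Product rule
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)


def f(xs):
    """Toy function: sum of products of neighbouring inputs."""
    total = xs[0] * xs[1]
    for i in range(1, len(xs) - 1):
        total = total + xs[i] * xs[i + 1]
    return total


def gradient_forward(func, point):
    """Forward mode: seed one input at a time, so N passes for N inputs."""
    grad = []
    passes = 0
    for i in range(len(point)):
        seeded = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(point)]
        grad.append(func(seeded).deriv)
        passes += 1
    return grad, passes


grad, passes = gradient_forward(f, [1.0, 2.0, 3.0, 4.0])
print(grad)    # [2.0, 4.0, 6.0, 3.0]
print(passes)  # 4
```

With 161 inputs, as in the LIBOR workload, the same scheme would need 161 passes, whereas a reverse sweep produces all sensitivities from a single recorded evaluation.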
Workloads¶
| Benchmark | Inputs | Description |
|---|---|---|
| Heston MC | 8 | Monte Carlo option pricing under the Heston stochastic-volatility model |
| SABR Calibration | 15 | SABR volatility surface calibration |
| XVA CVA | 40 | Credit Valuation Adjustment on a swap portfolio |
| LIBOR Swaption | 161 | LIBOR-based swaption portfolio pricing (adapted from Prof. Mike Giles) |
Methodology¶
- Identical compiler flags across libraries: -O3 -mavx2 -mfma (GCC/Clang) or /O2 /arch:AVX2 /fp:fast (MSVC).
- Idiomatic APIs. Each library uses its own recommended pattern; no micro-optimisations applied to one library that wouldn't be applied to another.
- Median of 10 measured iterations; warmup iterations are excluded from the timings.
- Gradient correctness verified against finite differences within numerical tolerance.
- Same machine, same run. Re-running on a different machine scales all rows by roughly the same factor.
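The timing protocol (warm up, then take the median of the measured iterations) can be sketched in a few lines. This is an illustration in Python, not the suite's actual harness: the function name `median_runtime_ms`, the warmup count of 2, and the toy workload are all assumptions. Only the 10 measured iterations and the use of the median come from the description above.

```python
import statistics
import time


def median_runtime_ms(workload, warmup=2, iterations=10):
    """Median wall-clock time of workload() in ms, excluding warmup runs."""
    for _ in range(warmup):
        workload()  # warmup: executed but not timed
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def toy_workload():
    # Stand-in for a pricing run; not one of the benchmark workloads.
    return sum(i * i for i in range(10_000))


print(f"median: {median_runtime_ms(toy_workload):.3f} ms")
```

The median is less sensitive than the mean to an occasional slow iteration caused by scheduling or cache effects, which is a common reason for preferring it in benchmark reporting.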
Full source code, raw CSV data, and build instructions: auto-differentiation/ad-benchmarks.