Benchmarks

XAD is benchmarked against CppAD, Adept 2, autodiff, and finite differences on four quantitative-finance workloads.

All results, source code, and methodology are fully reproducible from the ad-benchmarks repository.

[Figure: benchmark chart comparing AD library performance]
GCC 13.3, Intel Xeon Platinum 8488C, Ubuntu 24.04, -O3 -mavx2 -mfma, 10K MC paths.

Results

Gradient time measures the cost of computing all sensitivities in a single pass. The Primal column is the cost of evaluating the same workload once with plain double (no AD machinery), so the gap between Primal and each AAD (adjoint AD) library shows the recording-and-reverse-sweep overhead. FD (finite differences) scales as approximately (N + 1) x Primal, where N is the number of sensitivities: one base evaluation plus one bumped re-evaluation per input.
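The (N + 1) x Primal cost of finite differences follows directly from how bump-and-revalue works. The sketch below is a minimal, self-contained illustration, not the code used in the benchmark suite: the names FdResult and fdGradient, the one-sided scheme, and the relative bump size of 1e-6 are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Result of a one-sided (forward) finite-difference gradient.
// FdResult and fdGradient are hypothetical names used for illustration only.
struct FdResult {
    std::vector<double> gradient;  // approximate d f / d x_i for each input
    std::size_t evaluations;       // number of calls made to f
};

// Bump-and-revalue: one base evaluation plus one bumped evaluation per input,
// so N inputs cost N + 1 evaluations of f.
inline FdResult fdGradient(const std::function<double(const std::vector<double>&)>& f,
                           std::vector<double> x,
                           double relBump = 1e-6)  // assumed bump size
{
    assert(relBump > 0.0);
    FdResult r{std::vector<double>(x.size()), 0};

    const double base = f(x);  // the single unbumped (primal) evaluation
    ++r.evaluations;

    for (std::size_t i = 0; i < x.size(); ++i) {
        const double saved = x[i];
        const double h = relBump * std::max(1.0, std::abs(saved));
        x[i] = saved + h;                   // bump one input
        r.gradient[i] = (f(x) - base) / h;  // re-evaluate the whole model
        ++r.evaluations;
        x[i] = saved;                       // restore before the next bump
    }
    return r;
}
```

Calling fdGradient on a model with N inputs leaves evaluations equal to N + 1, which is why the FD column tracks (N + 1) x Primal so closely. A central-difference scheme would be more accurate but would need 2N evaluations instead.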

#  Benchmark         Sensitivities  Primal  FD      XAD     XAD-Codegen  CppAD   Adept   autodiff
1  Heston MC         8              9.3 ms  83 ms   40 ms   21 ms        268 ms  91 ms   DNF
2  SABR Calibration  15             2.1 ms  33 ms   8.4 ms  4.9 ms       32 ms   19 ms   38 ms
3  XVA CVA           40             591 ms  24.1 s  2.6 s   0.57 s       8.0 s   7.1 s   DNF
4  LIBOR Swaption    161            138 ms  21.6 s  1.00 s  0.31 s       4.57 s  1.15 s  DNF

Median of 10 measured iterations after warmup. The AAD libraries run in reverse mode, except autodiff, whose SABR result uses forward (dual) mode. DNF = did not finish.

Key Observations

  • Adjoint AD vs finite differences. FD cost grows as O(N) with input count, so its gap to the adjoint results widens with N. Relative to XAD-Codegen, it grows from roughly 4x on Heston (8 inputs) to roughly 70x on LIBOR (161 inputs); relative to XAD's tape mode, from roughly 2x to roughly 22x. FD is fine for spot checks but becomes the bottleneck once you need more than a handful of sensitivities.

  • Tape libraries cluster within an order of magnitude. XAD's tape mode is the fastest tape library on every benchmark, ahead of the runner-up Adept by margins of roughly 1.15x (LIBOR) up to 2.7x (XVA). CppAD is the slowest of the three on all four benchmarks, at roughly 3x to 7x XAD's time, but stays within an order of magnitude.

  • XAD-Codegen compiles the recorded graph to AVX2 native code at runtime and is roughly 2x to 5x faster than XAD's own tape mode on these benchmarks (1.7x on SABR up to 4.6x on XVA). See the JIT tutorial for how to use this feature.

  • autodiff completes only 1 of 4 benchmarks (SABR, using forward dual mode). Forward mode costs O(N) in the number of inputs, and autodiff's alternative, the var reverse mode, does not scale to MC pricing or larger calibrations.
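The O(N) cost of forward mode mentioned in the last point can be illustrated with a toy dual-number type. This is a hand-rolled sketch under simplifying assumptions (only + and * are supported), not autodiff's implementation; Dual and forwardGradient are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy dual number: a value and its derivative along one seeded direction.
// Hypothetical type for illustration; not taken from any benchmarked library.
struct Dual {
    double val;
    double dot;
};

inline Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }
inline Dual operator*(Dual a, Dual b)
{
    // Product rule: (ab)' = a'b + ab'
    return {a.val * b.val, a.dot * b.val + a.val * b.dot};
}

// Full gradient in forward mode: f is evaluated once per input, each time
// with a different input seeded (dot = 1). `passes` reports how many
// evaluations of f were needed.
template <class F>
std::vector<double> forwardGradient(F f, const std::vector<double>& x, std::size_t& passes)
{
    assert(!x.empty());
    std::vector<double> grad(x.size());
    passes = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        std::vector<Dual> xd(x.size());
        for (std::size_t j = 0; j < x.size(); ++j)
            xd[j] = Dual{x[j], j == i ? 1.0 : 0.0};  // seed input i only
        grad[i] = f(xd).dot;                         // one full pass per input
        ++passes;
    }
    return grad;
}
```

For a model with N inputs, passes ends up equal to N, so each extra sensitivity costs one more evaluation of the model. That is the same linear growth as finite differences, only with exact derivatives.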

Benchmarked Libraries

Library   Modes                              Recording approach
XAD       Forward and Adjoint, higher-order  Tape-based; optional xad-codegen backend compiles the recorded graph to AVX2 native code
CppAD     Forward and Reverse, higher-order  Tape-based ADFun record/replay
Adept 2   Forward and Reverse                Expression templates with stack recording
autodiff  Forward (dual) and Reverse (var)   Compile-time dual numbers / runtime expression tree

All four libraries support both forward and reverse modes. The suite exercises reverse mode, the standard choice for many-inputs/one-output workloads such as risk and pricing; the one exception is autodiff's SABR result, which uses forward (dual) mode.
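Why reverse mode suits many-inputs/one-output workloads can be seen in a toy tape. The sketch below records each operation with its local partial derivatives, then walks the record backwards once to obtain the sensitivity to every input. It is a deliberately simplified illustration: Tape, Var, and makeInput are hypothetical names, and none of this reflects the actual API or internals of XAD, CppAD, Adept, or autodiff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy tape: every recorded node stores two parent indices and the local
// partial derivatives of the node with respect to those parents.
struct Tape {
    struct Node {
        std::size_t parent[2];
        double partial[2];
    };
    std::vector<Node> nodes;

    std::size_t push(std::size_t p0, double d0, std::size_t p1, double d1)
    {
        Node n{{p0, p1}, {d0, d1}};
        nodes.push_back(n);
        return nodes.size() - 1;
    }

    // An input has no real parents: it points at itself with zero partials.
    std::size_t input()
    {
        const std::size_t id = nodes.size();
        return push(id, 0.0, id, 0.0);
    }

    // Single reverse sweep: seed the output adjoint with 1 and propagate it
    // back through the tape. Returns the adjoint of every node.
    std::vector<double> adjoints(std::size_t output) const
    {
        assert(output < nodes.size());
        std::vector<double> adj(nodes.size(), 0.0);
        adj[output] = 1.0;
        for (std::size_t i = output + 1; i-- > 0;) {
            const Node& n = nodes[i];
            adj[n.parent[0]] += n.partial[0] * adj[i];
            adj[n.parent[1]] += n.partial[1] * adj[i];
        }
        return adj;
    }
};

// Active variable: a value plus its position on the tape.
struct Var {
    Tape* tape;
    std::size_t id;
    double val;
};

inline Var makeInput(Tape& t, double v) { return Var{&t, t.input(), v}; }

inline Var operator+(Var a, Var b)
{
    return Var{a.tape, a.tape->push(a.id, 1.0, b.id, 1.0), a.val + b.val};
}

inline Var operator*(Var a, Var b)
{
    return Var{a.tape, a.tape->push(a.id, b.val, b.id, a.val), a.val * b.val};
}
```

After building y from inputs created with makeInput, one call to adjoints(y.id) yields the derivative of y with respect to every input at once, read off at each input's id. The sweep is linear in the number of recorded operations and does not repeat per input, which is the property the benchmark columns for the tape libraries rely on.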

Workloads

Benchmark         Inputs  Description
Heston MC         8       Monte Carlo option pricing under the Heston stochastic-volatility model
SABR Calibration  15      SABR volatility surface calibration
XVA CVA           40      Credit Valuation Adjustment on a swap portfolio
LIBOR Swaption    161     LIBOR-based swaption portfolio pricing (adapted from Prof. Mike Giles)

Methodology

  • Identical compiler flags across libraries: -O3 -mavx2 -mfma (GCC/Clang) or /O2 /arch:AVX2 /fp:fast (MSVC).
  • Idiomatic APIs. Each library uses its own recommended pattern; no micro-optimisation is applied to one library that would not be applied to another.
  • Median of the measured iterations; warmup iterations run first and are excluded from timing.
  • Gradient correctness verified against finite differences within numerical tolerance.
  • Same machine, same run. Re-running on a different machine scales all rows by roughly the same factor.
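The warmup-then-median timing scheme described above can be sketched as a small harness. This is an assumption about the general shape of such a harness, not the code in the ad-benchmarks repository: median and medianMillis are hypothetical names, the warmup count is left to the caller, and averaging the two middle samples for an even count is one common convention.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Median of a set of samples; for an even count, the two middle values are
// averaged (an assumed convention, not confirmed by the benchmark suite).
inline double median(std::vector<double> samples)
{
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 ? samples[mid]
                              : 0.5 * (samples[mid - 1] + samples[mid]);
}

// Runs `work` `warmup` times untimed, then `measured` times timed, and
// returns the median wall-clock time of the timed runs in milliseconds.
template <class Work>
double medianMillis(Work work, std::size_t warmup, std::size_t measured)
{
    assert(measured > 0);
    for (std::size_t i = 0; i < warmup; ++i)
        work();  // warm caches and allocators; not timed

    std::vector<double> ms;
    ms.reserve(measured);
    for (std::size_t i = 0; i < measured; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return median(ms);
}
```

A call such as medianMillis(runGradient, 2, 10), with runGradient standing in for one full gradient computation, would report a median over 10 timed iterations. The median is less sensitive than the mean to an occasional slow iteration caused by scheduling noise.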

Full source code, raw CSV data, and build instructions: auto-differentiation/ad-benchmarks.