Benchmarks

XAD is benchmarked against CppAD, Adept 2, autodiff, and finite differences on four quantitative-finance workloads.

All results can be reproduced from the source code and methodology published in the ad-benchmarks repository.

Benchmark chart comparing AD library performance (GCC 13.3, Intel Xeon Platinum 8488C, Ubuntu 24.04, `-O3 -mavx2 -mfma`, 10K MC paths).

Results

Gradient time measures the cost of computing all sensitivities in a single pass. The Primal column is the cost of evaluating the same workload once with plain `double` (no AD machinery), so the gap between Primal and each AAD library shows the overhead of recording and the reverse sweep. FD (finite differences) costs approximately (N + 1) x Primal, where N is the number of sensitivities: one base evaluation plus one bumped evaluation per input.
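To illustrate where the (N + 1) factor comes from, here is a minimal Python sketch of a one-sided finite-difference gradient that counts its own function evaluations. `fd_gradient` and `toy_price` are hypothetical names used only for this illustration and are not part of the benchmark suite; the bump size is an arbitrary choice.

```python
from typing import Callable, List, Sequence, Tuple


def fd_gradient(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    bump: float = 1e-6,
) -> Tuple[List[float], int]:
    """One-sided finite-difference gradient; also returns the evaluation count."""
    evaluations = 0

    def counted(point: Sequence[float]) -> float:
        nonlocal evaluations
        evaluations += 1
        return f(point)

    base = counted(x)  # one base evaluation
    grad = []
    for i in range(len(x)):  # one bumped evaluation per input
        bumped = list(x)
        bumped[i] += bump
        grad.append((counted(bumped) - base) / bump)
    return grad, evaluations


def toy_price(x: Sequence[float]) -> float:
    # Hypothetical stand-in for a pricing function with len(x) inputs.
    return sum((i + 1) * v * v for i, v in enumerate(x))


grad, n_evals = fd_gradient(toy_price, [1.0] * 8)
print(n_evals)  # 9 evaluations for 8 inputs
```

With 8 inputs the function is evaluated 9 times, which matches the pattern in the table: the Heston FD time is roughly 9 times its Primal time.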

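| # | Benchmark | Sensitivities | Primal | FD | XAD | XAD-Codegen | CppAD | Adept | autodiff |
|---|-----------|---------------|--------|----|-----|-------------|-------|-------|----------|
| 1 | Heston MC | 8 | 9.3 ms | 83 ms | 40 ms | 21 ms | 268 ms | 91 ms | DNF |
| 2 | SABR Calibration | 15 | 2.1 ms | 33 ms | 8.4 ms | 4.9 ms | 32 ms | 19 ms | 38 ms |
| 3 | XVA CVA | 40 | 591 ms | 24.1 s | 2.6 s | 0.57 s | 8.0 s | 7.1 s | DNF |
| 4 | LIBOR Swaption | 161 | 138 ms | 21.6 s | 1.00 s | 0.31 s | 4.57 s | 1.15 s | DNF |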

Timings are the median of 10 measured iterations after warmup; the AAD libraries run in reverse mode unless noted otherwise. DNF means the library did not finish the benchmark.
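The measurement scheme can be sketched in a few lines of Python. This is only an illustration of "median of measured iterations after warmup", not the harness used in the repository; `median_runtime_ms` is a hypothetical helper, and the warmup count of 2 is an assumption, since the number of warmup runs is not stated here.

```python
import statistics
import time


def median_runtime_ms(workload, warmup=2, iterations=10):
    """Return (median, samples) in milliseconds for a zero-argument callable."""
    for _ in range(warmup):
        workload()  # warmup runs are executed but never timed

    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples), samples


# Usage with a trivial placeholder workload.
median_ms, samples = median_runtime_ms(lambda: sum(range(10_000)))
```

Reporting the median rather than the mean keeps a single slow iteration, for example one interrupted by the operating system, from skewing the result.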

Key Observations

Adjoint AD vs finite differences. FD cost grows as O(N) in the number of inputs, so its gap to XAD-Codegen widens from roughly 4x on Heston (8 inputs) to roughly 70x on LIBOR (161 inputs); against XAD's tape mode the gap grows from roughly 2x to roughly 22x. FD is fine for spot checks but becomes the bottleneck once you need more than a handful of sensitivities.
Tape libraries cluster within an order of magnitude. XAD's tape mode is the fastest tape library on every benchmark, ahead of the runner-up (Adept in every case) by margins of roughly 1.15x on LIBOR up to roughly 2.7x on XVA CVA. CppAD is the slowest of the three on every benchmark but stays within an order of magnitude of XAD.
XAD-Codegen compiles the recorded graph to AVX2 native code at runtime and is roughly 1.7x to 4.6x faster than XAD's own tape mode on these benchmarks. See the JIT tutorial for how to use this feature.
autodiff completes only 1 of 4 benchmarks (SABR, using forward `dual` mode). Forward mode costs O(N) in the number of inputs, and autodiff's alternative, the reverse-mode `var` type, does not scale to MC pricing or the larger calibrations.
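The ratios behind these observations can be recomputed directly from the results table. The sketch below copies the FD, XAD, and XAD-Codegen columns (converted to milliseconds) and assumes the FD gap is measured against the XAD-Codegen column; the `speedup` helper is a hypothetical name for this illustration.

```python
# Gradient times in milliseconds, copied from the results table.
timings_ms = {
    "Heston MC": {"FD": 83.0, "XAD": 40.0, "XAD-Codegen": 21.0},
    "SABR Calibration": {"FD": 33.0, "XAD": 8.4, "XAD-Codegen": 4.9},
    "XVA CVA": {"FD": 24100.0, "XAD": 2600.0, "XAD-Codegen": 570.0},
    "LIBOR Swaption": {"FD": 21600.0, "XAD": 1000.0, "XAD-Codegen": 310.0},
}


def speedup(benchmark: str, slow: str, fast: str) -> float:
    """How many times faster the `fast` column is than the `slow` column."""
    row = timings_ms[benchmark]
    return row[slow] / row[fast]


# FD versus XAD-Codegen on the smallest and largest workloads.
fd_gap_heston = speedup("Heston MC", "FD", "XAD-Codegen")
fd_gap_libor = speedup("LIBOR Swaption", "FD", "XAD-Codegen")

# XAD-Codegen versus XAD's tape mode across all four benchmarks.
codegen_gains = [speedup(name, "XAD", "XAD-Codegen") for name in timings_ms]

for name in timings_ms:
    print(f"{name}: FD/Codegen = {speedup(name, 'FD', 'XAD-Codegen'):.1f}x")
```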

Benchmarked Libraries

| Library | Modes | Recording approach |
|---------|-------|--------------------|
| XAD | Forward and Adjoint, higher-order | Tape-based; optional xad-codegen backend compiles the recorded graph to AVX2 native code |
| CppAD | Forward and Reverse, higher-order | Tape-based `ADFun` record/replay |
| Adept 2 | Forward and Reverse | Expression templates with stack recording |
| autodiff | Forward (`dual`) and Reverse (`var`) | Compile-time dual numbers / runtime expression tree |

All four libraries support both forward and reverse modes. The suite exercises reverse mode, the standard choice for many-inputs/one-output workloads such as risk and pricing, with one exception: autodiff's SABR result uses forward `dual` mode.
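For readers new to reverse mode, the following minimal Python sketch shows why it suits many-inputs/one-output workloads: operations are recorded on a tape during the forward evaluation, and a single backward sweep then yields the derivative with respect to every input. This is a toy illustration only and does not reflect the internals or API of XAD or any other benchmarked library; `Tape`, `Var`, `exp`, and `gradient` are hypothetical names.

```python
import math


class Tape:
    """Records one entry per variable: (index, [(input index, local partial), ...])."""

    def __init__(self):
        self.ops = []


class Var:
    def __init__(self, tape, value, parents=()):
        self.tape = tape
        self.value = value
        self.index = len(tape.ops)  # position of this variable on the tape
        tape.ops.append((self.index, list(parents)))

    def __add__(self, other):
        return Var(self.tape, self.value + other.value,
                   [(self.index, 1.0), (other.index, 1.0)])

    def __mul__(self, other):
        return Var(self.tape, self.value * other.value,
                   [(self.index, other.value), (other.index, self.value)])


def exp(v):
    e = math.exp(v.value)
    return Var(v.tape, e, [(v.index, e)])  # d/dv exp(v) = exp(v)


def gradient(output, inputs):
    """One reverse sweep over the tape gives all sensitivities of `output`."""
    adjoints = [0.0] * len(output.tape.ops)
    adjoints[output.index] = 1.0
    for out_idx, parents in reversed(output.tape.ops):
        for in_idx, partial in parents:
            adjoints[in_idx] += partial * adjoints[out_idx]
    return [adjoints[v.index] for v in inputs]


tape = Tape()
xs = [Var(tape, v) for v in (1.0, 2.0, 3.0)]
y = xs[0] * xs[1] + exp(xs[2]) * xs[0]  # y = x0*x1 + exp(x2)*x0
grads = gradient(y, xs)                 # [x1 + exp(x2), x0, exp(x2)*x0]
```

The backward sweep visits each recorded operation once, so its cost depends on the size of the tape rather than on the number of inputs, in contrast to finite differences.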

Workloads

| Benchmark | Inputs | Description |
|-----------|--------|-------------|
| Heston MC | 8 | Monte Carlo option pricing under the Heston stochastic-volatility model |
| SABR Calibration | 15 | SABR volatility surface calibration |
| XVA CVA | 40 | Credit Valuation Adjustment on a swap portfolio |
| LIBOR Swaption | 161 | LIBOR-based swaption portfolio pricing (adapted from Prof. Mike Giles) |

Methodology

Identical compiler flags across libraries: `-O3 -mavx2 -mfma` (GCC/Clang) or `/O2 /arch:AVX2 /fp:fast` (MSVC).
Idiomatic APIs. Each library uses its own recommended pattern; no micro-optimisation is applied to one library that is not applied to the others.
Median of 10 measured iterations; warmup iterations are excluded from the timings.
Gradient correctness verified against finite differences within numerical tolerance.
Same machine, same run. All libraries are timed together on one machine; re-running on a different machine scales all rows by roughly the same factor.
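The gradient correctness check can be sketched as follows. This is a hedged illustration rather than the repository's actual test: the central-difference scheme, the bump size, and the tolerance values are all assumptions, and `central_difference`, `gradients_match`, `f`, and `analytic_grad` are hypothetical names, with `analytic_grad` standing in for the gradient returned by an AD library.

```python
import math


def central_difference(f, x, bump=1e-5):
    """Reference gradient from central finite differences."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += bump
        down[i] -= bump
        grad.append((f(up) - f(down)) / (2.0 * bump))
    return grad


def gradients_match(candidate, reference, rel_tol=1e-6, abs_tol=1e-8):
    """True if every component agrees within the given tolerances."""
    return len(candidate) == len(reference) and all(
        math.isclose(c, r, rel_tol=rel_tol, abs_tol=abs_tol)
        for c, r in zip(candidate, reference)
    )


def f(x):
    return x[0] * x[0] * x[1] + math.sin(x[1])


def analytic_grad(x):
    # Stand-in for the gradient an AD library would return.
    return [2.0 * x[0] * x[1], x[0] * x[0] + math.cos(x[1])]


x = [1.5, 0.7]
ok = gradients_match(analytic_grad(x), central_difference(f, x))
```

Combining a relative and an absolute tolerance avoids spurious failures for sensitivities that are close to zero.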

Full source code, raw CSV data, and build instructions: auto-differentiation/ad-benchmarks.