As explained in Algorithmic Differentiation Background: Higher Orders, higher order derivatives can be computed by nesting first order algorithmic differentiation techniques. For example, one can obtain second order by computing forward mode over adjoint mode. With XAD, this technique can be used directly to compute higher order derivatives.
XAD's automatic differentiation interface structures (see AD Mode Interface) define second order mode data types for easy access. Types for third or higher orders need to defined manually from the basic first-order types.
We will demonstrate second-order derivatives using forward-over-adjoint mode in the following.
For demonstration purposes, we use the same algorithm from Basic Usage:
We are interested in derivatives at the point:
Forward Over Adjoint¶
In this mode, we can compute all first-order derivatives (as a single output function derived with adjoints gives all first order derivatives), and the first row of the Hessian matrix of second order derivatives. The full Hessian is defined as:
Note that the Hessian matrix is typically symmetric, which can be used to reduce the amount of computation needed for the full Hessian.
The first step is to set up the tape and active data types needed for this computation:
Note that the active type for this mode is actually
Now we need to setup the independent variables and register them:
As we compute the second order using forward mode, we need to seed the initial derivative for the second order before running the algorithm:
The inner call to
value takes the value of the outer type, i.e. it returns the value as the type
FReal<double>, of which we set the derivative to
Now we can start recording derivatives on the tape and run the algorithm:
For the inner adjoint mode, we need to register the output and seed the initial adjoint with 1:
Here, the inner call to
derivative gives the derivative of the outer type, i.e. the derivative of the adjoint-mode active type. This is of type
FReal<double>, for which we set the value to
Next we compute the adjoints, which computes both the first and second order derivatives:
We can now output the result:
And the first order derivatives:
Note again that the inner call to
derivative obtains the derivative of the outer active data type, hence it gives a
FReal<double> reference that represents the first order adjoint value. We can get this value as a
double using the
The second order derivatives w.r.t.
x0 can be obtained as:
which 'unwraps' the derivatives of the first and second order active types.
The result of the running the application for the given inputs is:
Forward over adjoint is the recommended mode for second-order derivatives.
This example is included with XAD (
Other Second-Order Modes¶
Other second-order modes work in a similar fashion. They are briefly described in the following.
Forward Over Forward¶
With forward-over-forward mode, there is no tape needed and the derivatives of both orders need to be seeded before running the algorithm. One element of the Hessian and one first-order derivative can be computed with this method, if the function has one output. The derivative initialization sequence in this mode is typically:
After the computation, the first order derivative can be retrieved as:
And the second order derivative as:
With different initial seeding, different elements of the Hessian can be obtained.
Adjoint Over Forward¶
Here the inner mode is forward, computing one derivative in a tape-less fashion, and the outer mode is adjoint, requiring a tape. With this mode, we need to initialize the forward-mode derivative with:
As the derivative of the output corresponds to the first order result, we need to seed its derivative (i.e. the adjoint) after running the algorithm:
After tape interpretation, we can now obtain the first-order derivative as:
Due to the symmetries in this mode of operation, the same first-order derivatives can also be obtained as:
Which allows to get all first-order derivatives w.r.t. to all inputs in this mode, similar to the forward-over-adjoint mode.
The second-order derivatives can be obtained as:
Adjoint Over Adjoint¶
As both nested modes are adjoint, this mode needs to two tapes for both orders. Hence the types defined in the interface structure
adj_adj need an inner and an outer tape type:
In this mode, no initial derivatives need to be set, but it is important that both tapes are initialized and a new recording is started on both before running the algorithm.
After the execution, the outer derivative needs to be seeded as:
And then the outer tape needs to compute the adjoints. This computes the
value(derivative(x)) as an output, and the derivative of this needs to be set before interpreting the inner tape:
computeAdjoints() on the inner tape, we can read the first-order derivatives as:
And the second-order derivatives as: