In this section, we illustrate how to use XAD to compute first-order derivatives in both forward and adjoint mode.
As an example, we choose a simple function with 4 inputs and 1 output variable, defined as:
We will compute derivatives of this function at the point:
Prerequisite: Replace Active Variables
To differentiate this function with XAD, we must first replace the data types of all independent variables, and of all values that depend on them, with an active data type provided by XAD. In the above function, all variables depend on the inputs and thus all occurrences of double must be replaced.
This can be done in one of two ways:
- The variables can be replaced directly, given the desired mode of differentiation. For example, for forward mode double is replaced by the type FReal and for adjoint mode the type
- The function is made a template, so that it can be called with any data type, including the original
We choose the second approach for this tutorial, thus the function becomes:
This means we can use the same definition with both forward and adjoint modes.
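As a rough illustration of the template approach, the sketch below uses a hypothetical four-input function g (a stand-in, not the tutorial's own function) and arbitrary input values. The math functions are called unqualified after using-declarations, which is one common way to let the same definition pick up overloads for an active type as well as for built-in types.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in function with four inputs and one output.
// sin and cos are called unqualified after using-declarations, so built-in
// types resolve to the std:: overloads while a user-defined active type
// could supply its own overloads through argument-dependent lookup.
template <class T>
T g(T x0, T x1, T x2, T x3)
{
    using std::cos;
    using std::sin;
    T a = sin(x0) * cos(x1);
    T b = x2 * x3;
    return a + b;
}

// The same definition instantiated with two different scalar types.
inline double g_as_double() { return g<double>(1.0, 1.5, 1.3, 1.2); }
inline long double g_as_long_double() { return g<long double>(1.0L, 1.5L, 1.3L, 1.2L); }

// Small self-check: both instantiations agree to double precision.
inline void demo()
{
    assert(std::fabs(g_as_double() - static_cast<double>(g_as_long_double())) < 1e-12);
}
```

Calling g with an active type instead of double then requires no change to the function body, only to the types at the call site.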
As illustrated in Algorithmic Differentiation Background: Forward Mode, the forward mode of algorithmic differentiation, when applied to a function with a single output, computes one derivative at a time. For illustration, we choose to differentiate the function with respect to the input variable
To initiate the forward mode, we must first declare active variables with the appropriate type. XAD provides convenience typedefs to select the mode of differentiation, illustrated in detail in AD Mode Interface. For forward mode, we can declare the types needed as:
We can then use the AD typedef for our variables.
The next step is to initialize the independent variables, which is done simply by assigning the input values to new variables of type
For forward mode, we must now seed the initial derivative for the variable we are interested in with the value 1 (as described in Algorithmic Differentiation Background: Forward Mode), as:
At this point we are ready to call our function and it will compute the function value as well as the derivative we are interested in:
We can now access the results using the derivative functions on the output (or the member functions FReal::getValue). For example, the following code outputs them to the console:
This example is included with XAD (
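To make the forward-mode steps above concrete without depending on the library, here is a minimal self-contained sketch. The Dual type is a hand-written toy, not XAD's FReal, and the function g and the input values are hypothetical. It follows the same sequence: assign the inputs to active variables, seed one derivative with 1, call the function, then read the value and derivative from the output.

```cpp
#include <cassert>
#include <cmath>

// Toy forward-mode active type (an assumption for illustration, not XAD's
// FReal): it carries a value together with one derivative.
struct Dual
{
    double val;
    double der;
    Dual(double v = 0.0, double d = 0.0) : val(v), der(d) {}
};

inline Dual operator+(const Dual& a, const Dual& b)
{
    return Dual(a.val + b.val, a.der + b.der);
}
inline Dual operator*(const Dual& a, const Dual& b)
{
    // Product rule.
    return Dual(a.val * b.val, a.der * b.val + a.val * b.der);
}
inline Dual sin(const Dual& a) { return Dual(std::sin(a.val), std::cos(a.val) * a.der); }
inline Dual cos(const Dual& a) { return Dual(std::cos(a.val), -std::sin(a.val) * a.der); }

// Hypothetical templated function with four inputs and one output.
template <class T>
T g(T x0, T x1, T x2, T x3)
{
    using std::cos;
    using std::sin;
    return sin(x0) * cos(x1) + x2 * x3;
}

// One forward pass: seeds the derivative of input seedIndex with 1 and
// returns the function value together with dg/dx_seedIndex.
inline Dual forward_pass(int seedIndex)
{
    assert(seedIndex >= 0 && seedIndex < 4);
    Dual x[4] = {Dual(1.0), Dual(1.5), Dual(1.3), Dual(1.2)};
    x[seedIndex].der = 1.0;  // seed
    return g(x[0], x[1], x[2], x[3]);
}
```

Each call to forward_pass yields one derivative, so obtaining all four in this sketch takes four passes, one per seeded input.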
The adjoint mode of algorithmic differentiation is the natural choice for the function at hand, as it has a single output and multiple inputs. We can get all four derivatives in one execution.
Adjoint mode needs a tape to record the operations and their values during the evaluation. After setting the adjoints of the outputs, this tape can then be rolled back to compute the adjoints of the inputs.
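In generic notation (the symbols here are assumptions, not taken from the XAD documentation), suppose the tape holds the elementary operations $v_j = \varphi_j(v_i, \dots)$ for $j = 1, \dots, N$, with the output $y = v_N$. Rolling the tape back then amounts to the following sweep, started from the seed $\bar{y} = 1$ and all other adjoints set to zero:

$$
\bar{v}_i \mathrel{+}= \bar{v}_j \, \frac{\partial \varphi_j}{\partial v_i}
\qquad \text{for } j = N, N-1, \dots, 1 \text{ and every argument } v_i \text{ of } \varphi_j .
$$

After the sweep, the adjoint of each input $x_k$ holds $\bar{x}_k = \partial y / \partial x_k$, which is why a single rollback delivers the derivatives with respect to all inputs.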
Both the active data type and the tape type can be obtained from the interface structure
The first step for computing adjoints is to initialize the tape:
This calls the default constructor Tape::Tape, which creates the tape and activates it.
Next, we create the input variables and register them with the tape:
Note that only variables registered as inputs with the tape, and all variables dependent on them, are recorded. Also note that the current thread needs to have an active tape before active variables can be registered. To ensure thread safety, every thread of the application can have its own active tape.
Once the independent variables are set, we can start recording derivatives on tape and run the algorithm:
At this stage, all operations have been recorded and the function value has been computed. We now need to register the outputs with the tape as well, before we can seed the initial adjoint of the output with 1, as explained in Algorithmic Differentiation Background: Adjoint Mode:
This uses the global function derivative, which returns a reference to the stored derivative (or adjoint) of the given parameter. Alternatively, the member functions AReal::setDerivative can be used for the same purpose.
What is left is interpreting the tape to compute the adjoints of the independent variables:
This example is included with XAD (
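The adjoint workflow described above can be sketched with a hand-written toy tape. ToyTape, Var, the method names addInput and rollBack, the function g and the input values are all hypothetical and much simpler than XAD's actual Tape and AReal. The sketch only mirrors the sequence of steps: create the tape, register the inputs, record the function evaluation, seed the output adjoint with 1, and roll the tape back.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy tape (an assumption for illustration): one node per recorded variable,
// storing up to two argument indices and the matching local partials.
struct ToyTape
{
    struct Node { std::size_t arg[2]; double partial[2]; };
    std::vector<Node> nodes;
    std::vector<double> adjoint;

    std::size_t push(std::size_t a0, double p0, std::size_t a1, double p1)
    {
        Node n = {{a0, a1}, {p0, p1}};
        nodes.push_back(n);
        return nodes.size() - 1;
    }
    // Inputs have no arguments; zero partials make their node a no-op.
    std::size_t addInput() { return push(0, 0.0, 0, 0.0); }

    // Seeds the output adjoint with 1 and sweeps the tape in reverse order.
    void rollBack(std::size_t output)
    {
        assert(output < nodes.size());
        adjoint.assign(nodes.size(), 0.0);
        adjoint[output] = 1.0;
        for (std::size_t i = nodes.size(); i-- > 0;)
            for (int k = 0; k < 2; ++k)
                adjoint[nodes[i].arg[k]] += nodes[i].partial[k] * adjoint[i];
    }
};

// Toy adjoint active type: a value plus its slot on the tape.
struct Var { ToyTape* tape; std::size_t idx; double val; };

inline Var operator+(const Var& a, const Var& b)
{ return Var{a.tape, a.tape->push(a.idx, 1.0, b.idx, 1.0), a.val + b.val}; }
inline Var operator*(const Var& a, const Var& b)
{ return Var{a.tape, a.tape->push(a.idx, b.val, b.idx, a.val), a.val * b.val}; }
inline Var sin(const Var& a)
{ return Var{a.tape, a.tape->push(a.idx, std::cos(a.val), 0, 0.0), std::sin(a.val)}; }
inline Var cos(const Var& a)
{ return Var{a.tape, a.tape->push(a.idx, -std::sin(a.val), 0, 0.0), std::cos(a.val)}; }

// Hypothetical templated function with four inputs and one output.
template <class T>
T g(T x0, T x1, T x2, T x3)
{
    using std::cos;
    using std::sin;
    return sin(x0) * cos(x1) + x2 * x3;
}

// Records g at an arbitrary point and returns all four derivatives.
inline std::vector<double> gradient_at_point()
{
    ToyTape tape;
    const double in[4] = {1.0, 1.5, 1.3, 1.2};
    Var x[4];
    for (int i = 0; i < 4; ++i)
        x[i] = Var{&tape, tape.addInput(), in[i]};  // register inputs
    Var y = g(x[0], x[1], x[2], x[3]);              // record operations
    tape.rollBack(y.idx);                           // seed and roll back
    std::vector<double> grad(4);
    for (int i = 0; i < 4; ++i)
        grad[i] = tape.adjoint[x[i].idx];
    return grad;
}
```

In contrast to a forward-mode pass, one recording and one rollback produce the full gradient here, at the price of storing every operation on the tape.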
When the algorithm to be evaluated has fewer outputs than inputs, adjoint mode should be preferred. However, when only a small number of derivatives are needed (e.g. fewer than 5), the memory for the tape can be avoided by using forward mode. Experimentation is advised to find the optimal mode for the given algorithm.
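As a rough rule of thumb behind this advice, consider a function $f$ with $n$ inputs and a single output. The constants $c_F$ and $c_A$ below are assumed, implementation-dependent overhead factors, not measured XAD figures:

$$
\text{cost}_{\text{forward}}(\nabla f) \approx n \cdot c_F \cdot \text{cost}(f),
\qquad
\text{cost}_{\text{adjoint}}(\nabla f) \approx c_A \cdot \text{cost}(f).
$$

The adjoint cost does not grow with $n$, but the tape requires memory proportional to the number of recorded operations. For small $n$ the forward-mode total can therefore be competitive, which is why measuring both modes on the actual algorithm is worthwhile.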