(11.36)
The error covariance matrix is then constructed from the error correlation matrix by multiplying the terms by the error variances of the corresponding elements (square roots for the off-diagonal terms).
Eigenanalysis of an error covariance matrix can be useful for identifying the dominant error patterns. The matrix has full rank n, since otherwise would imply that an element (or combination of elements) is perfectly known. It therefore has n orthonormal eigenvectors ei with eigenvalues λI, and can be decomposed along its eigenvectors as follows:
(11.37)
where E is the matrix of eigenvectors arranged by columns and Λ is the diagonal matrix of eigenvalues:
(11.38)
In the base for x defined by the eigenvectors, eigenvector ei has a value of 1 for its ith element and a value of zero for all its other elements; we then see from (11.37) that the error covariance matrix in that base is Λ. The eigenvalue λi thus represents the error variance associated with the error pattern ei. Eigendecomposition of S and ranking of eigenvalues identifies the dominant orthogonal error patterns and their contributions to the overall error. Box 11.1 gives an example. The eigenvalues of an error covariance matrix are all positive since they represent error variances. It follows that any error covariance matrix S is positive definite, a condition defined by the property that xTSx ≥ 0 for any vector x of real numbers.
Box 11.1 Eigendecomposition of an Error Covariance Matrix
To illustrate the eigendecomposition of an error covariance matrix, consider the matrix S:
(11.39)
Its eigenvectors and eigenvalues are
(11.40)
The four eigenvectors define four orthogonal error patterns with error variances given by the eigenvalues. The total error variance is 3.3 + 1.2 + 1.0 + 0.5 = 6.0. The first error pattern defined by e1 contributes an error variance of 3.3, more than half of the total error variance. This error pattern is dominated by the first two elements 1 and 2, as would be expected since they contribute most of the error variance in the diagonal of S. The error pattern has opposite dependences for elements 1 and 2, as would be expected from the negative error correlation between the two (). The second error pattern as defined by e2 amounts to 20% of the total error variance and accounts for the error patterns associated with elements 3 and 4, again with opposite dependences reflecting their negative error covariance. The third and fourth error patterns are less straightforward to interpret but account together for only 25% of the total error variance.
Bayesian solution to the inverse problem requires construction of the prior error covariance matrix and of the observational error covariance matrix as input to the problem. The observational error vector εO is the sum of the instrument error vector εI, the representation error vector εR, and the forward model error vector εM, in the same way as for the scalar problem (11.12). These errors are generally uncorrelated so that SO is the sum of the instrument error covariance matrix , the representation error covariance matrix , and the forward model error covariance matrix :
SO = SI + SR + SM
(11.41)
Note the similarity to the addition of variances in the scalar problem (11.13).
It is generally difficult to go beyond rough estimates in specifying the error covariance matrices SA and SO. Box 11.2 gives some simple construction procedures. Particular uncertainty applies to constructing the off-diagonal terms (covariance structure). Simple assumptions are usually made, such as an error correlation length scale that relates adjacent vector elements and populates the off-diagonals nearest to the diagonal, producing a band matrix (Box 11.2). However, there is no guarantee that such an ad-hoc construction will yield a bona fide error covariance matrix, as the assumed error correlations between different elements may not be consistent across the whole vector. This problem is more likely to arise if the covariance structure is extensive. The validity of the construction can be checked by computing the eigenvalues and verifying that they are all positive. If they are not then the matrix needs to be corrected.
Box 11.2 Construction of Prior and Observational Error Covariance Matrices
Accurate knowledge of the prior and observational error covariance matrices SA and SO is in general not available and rough estimates are often used. It is good practice in those cases to repeat the inversion with a range of estimates of SA and SO – for example, changing their magnitudes by a factor of 2 – to assess the implied uncertainty on inversion results.
Estimating SA often relies on expert judgment regarding the quality of the prior information. In the absence of better knowledge, simple estimates are often used. For example, one might assume a uniform 50% error on the individual components of xA with no error correlation between the components. In that case, SA is a diagonal matrix with elements 0.25. Error correlation between adjacent components of xA is often approximated by an e-folding length scale, populating the off-diagonals of SA adjacent to the diagonal and with a cut-off beyond which the off-diagonal terms are zero. This produces a band matrix where the presence of a large population of zero elements (sparse matrix) allows the use of fast algorithms for matrix inversion. For example, let us assume 50% error on the individual components of xA, an error correlation coefficient r = 0.5 for adjacent components, and zero error correlation for non-adjacent components. The resulting prior error covariance matrix is given by
(11.42)
The observational error covariance matrix SO can be constructed by adding the contributions from the instrument error (SI), representation error (SR), and forward model error (SM) estimated independently. SI is typically a diagonal matrix that can be obtained from knowledge of the instrument precision relative to calibration standards. SR can be constructed from knowledge of the subgrid variability of observations and can also in general be assumed diagonal. Construction of SM is more difficult as calibration data for the forward model are generally not available. An estimate can be made by comparing different independent forward models.
An alternate approach for constructing SO is the residual error method (Heald et al., 2004). In this method, we conduct a forward model simulation using the prior estimate of the state vector, compare to observations, and subtract the mean bias to obtain the observational error:
(11.43)
where the averaging can be done over the ensemble of observations or just a subset (for example, the observation time series at a given location). Here we assume that the systematic component of the error in y – F(xA) is due to error in the state vector x to be corrected through the inversion, while the random component is the observational error. The statistics of εO are then used to construct SO. An example is shown in Box 11.2 Figure 1. The assumption that the systematic error is due solely to x may not be correct, as there may also be bias in the observing system; however, it is consistent with the premise of the inverse analysis that errors be random. From independent knowledge of SI and SR one can infer the forward model error covariance matrix as SM = SO – SI – SR, and from there diagnose the dominant terms contributing to the observational error.
Box 11.2 Figure 1 Diagonal terms of the observational error covariance matrix constructed for an inversion of carbon monoxide (CO) sources in East Asia in March–April 2001 using MOPITT satellite observations of CO columns and a CTM as forward model. The daily observations are averaged over 2° × 2.5° CTM grid squares and compared to the CTM simulation using the prior estimate of sources, producing a time series of CTM-MOPITT differences in each grid square. The mean of that time series is subtracted and the residual difference defines the observational error for that grid square. The resulting error variance is normalized to the mean CO column for the grid square, thus defining a relative error expressed as percentage. The off-diagonal terms of the observational error covariance matrix are derived from an estimated 180-km error correlation length scale.
From Heald et al. (2004).
11.4.2 Gaussian Probability Density Function for Vectors
Application o
f Bayes’ theorem to the vector-matrix formalism requires formulation of PDFs for vectors. We derive here the general Gaussian PDF formulation for a vector x of dimension n with expected value E[x] and error covariance matrix S. If the errors xi – E[xi] for the individual elements of x are uncorrelated (i.e., if S is diagonal), then the PDF of the vector is simply the product of the PDFs for the individual elements. This simple solution can be obtained by projecting x on the basis of eigenvectors ei of S with i = [1, … n]. The error variances in that base are the eigenvalues λi of S (see derivation in Section 11.4.1). Let z = ET (x – E[x]) be the value of x – E[x] projected on the eigenvector basis, where E is the matrix of eigenvectors arranged by columns. The PDF of z is then
(11.44)
which can be rewritten as
(11.45)
Here |S| is the determinant of S, equal to the product of its eigenvalues:
(11.46)
and Λ is the diagonal matrix of eigenvalues (11.38). Replacing z in (11.45), we obtain:
(11.47)
Recall the matrix spectral decomposition S = EΛET (11.37). A matrix and its inverse have the same eigenvectors and inverse eigenvalues so that S–1 = EΛ–1ET. Replacing into (11.47) we obtain the general PDF expression for the vector x:
(11.48)
11.4.3 Jacobian Matrix
The Jacobian matrix is the derivative of the forward model. We denote it K in this chapter to avoid confusion with the standard notation for the cost function (J). The Jacobian gives the local sensitivity of the observation variables y to the state variables x as described by the forward model:
(11.49)
with individual elements kij = ∂yi/∂xj. It is used in inverse modeling to compute the minimum of the Bayesian cost function (see (11.27) for application to the simple scalar problem). It is also used in the analytical solution to the inverse problem as a linearization of the forward model (see (11.28) for application to the simple scalar problem). If the forward model is linear, then K does not depend on x and fully describes the forward model for the purpose of the inversion. If the forward model is nonlinear, then K needs to be calculated initially for the prior estimate xA, representing the best initial guess for x, and then re-calculated as needed for updated values of x during iterative convergence to the solution.
Construction of the Jacobian matrix may be done analytically if the forward model is simple, as for example in a 0-D chemical model where the evolution of concentrations for the n different species is determined by first-order kinetic rate expressions. If the forward model is complicated, such as a 3-D CTM, then the Jacobian must be constructed numerically. This can be done column by column if the dimension of the state vector is not so large as to make it computationally prohibitive. The task involves first conducting a base forward model calculation using the prior estimate xA over the observation period, and then successively perturbing the individual elements xi of the state vector by small increments Δxi to calculate the resulting perturbation Δy. This yields the sensitivity vector Δy/Δxi ≈ ∂y/∂xi, which is the ith column of the Jacobian. A total of n + 1 forward model calculations are required to fully construct the Jacobian matrix.
If the observations are sparse and the state vector is large, such as in a receptor-oriented problem where we wish to determine the sensitivity of concentrations at a few selected locations to a large array of surface fluxes, then a more effective way to construct the Jacobian is row by row using the adjoint of the forward model; this is described below. If both the state vector and the observation vector are large, then one can bypass the calculation of the Jacobian matrix in the minimization of the cost function by using the adjoint of the forward model; this will be described in Section 11.6.
11.4.4 Adjoint
The adjoint of a forward model is the transpose KT of its Jacobian matrix (Section 11.4.3). It turns out to be very useful in inverse modeling applications for atmospheric chemistry where observed concentrations are used to constrain a state vector of emissions or concentrations at previous times. In that case, the adjoint model does not necessarily involve explicit construction of KT, but instead the application of KT to vectors called adjoint forcings. We will discuss this in Section 11.6. The adjoint model can also be useful for numerical construction of the Jacobian matrix when dim(y) ≪ dim(x). As we will see, by using the adjoint we can construct the Jacobian matrix row by row, instead of column by column, and the number of model simulations needed for that purpose is dim(y) rather than dim(x). A common application is in receptor-oriented problems where we seek, for example, to determine the sensitivity of the model concentration at a particular point to the ensemble of concentrations or emissions at previous times over the 3-D model domain. In that example, dim(y) = 1 but dim(x) can be very large, and a single pass of the adjoint model delivers the full vector of sensitivities. Box 11.3 illustrates the construction of the adjoint in a simple case.
Box 11.3 Simple Adjoint Construction
Consider a three-element state vector (x0, y0 z0)T on which an operation x = y2 + z is applied. The resulting vector (x1, y1, z1)T is
The Jacobian matrix for that operation is given by
and the adjoint is then
The null value of the first row of KT means that (x1, y1, z1)T has no sensitivity to x0.
To understand how the adjoint works, consider a CTM discretized over time steps [t0, … ti, … tp]. Let y(p) represent the vector of gridded concentrations of dimension m at time tp. We wish to determine its sensitivity to some state vector x(0) of dimension n at time t0. For example, x could be the gridded emissions. The corresponding Jacobian matrix is Κ = ∂y(p)/∂x(0). By the chain rule,
(11.50)
where the right-hand side is a product of matrices. The adjoint model applies the transpose:
(11.51)
where we have made use of the property that the transpose of a product of matrices is equal to the product of the transposed matrices in reverse order: (AB)T = BTAT.
Consider now the application of KT as expressed by (11.51) to a unit vector v = (1, 0, … 0)T taken as adjoint forcing. Following (11.51), we begin by applying matrix (∂y(p)/∂y(p − 1))T to v:
(11.52)
This yields a vector of adjoint variables ∂y(p) , 1/∂y(p − 1) that represents the sensitivity of y(p),1 to y(p–1). Let us now apply the next matrix (∂y(p − 1)/∂y(p − 2))T in (11.51) to this vector of adjoint variables:
(11.53)
where we have made use of
(11.54)
We thus obtain ∂y(p) , 1/∂y(p − 2). Application of the next matrix (∂y(p − 2)/∂y(p − 3))Tto this vector yields ∂y(p) , 1/∂y(p − 3) and so on. By sequential application of the suite of matrices in (11.51) we thus obtain ∂y(p) , 1/∂x(0), which is a row of the Jacobian matrix. Repeating this exercise for the m unit vectors v representing the different elements of y yields the full matrix K = ∂y(p)/∂x(0).
Notice from the above description that a single pass with the adjoint yields the sensitivity vectors ∂y(p) , 1/∂y(p − 1), ∂y(p) , 1/∂y(p − 2), … ∂y(p) , 1/∂y(0). This effectively integrates the CTM back in time, providing the sensitivity of the concentration at a given location and time (here y(p),1) to the complete field of concentrations at prior times, i.e., the backward influence function. The same single pass with the adjoint can also provide the sensitivities of y(p),1 to the state vector at any prior time; thus:
(11.55)
(11.56)
(11.57)
and so on. For example, if the state vector represents the emission field, we can obtain in this manner the sensitivity of the concentration y(p),1 to the emissions at all prior time steps. Box 11.4 illustrates such an application.
Box 11.4 Computing Adjoint Sensitivities
We illustrate the computation of adjoint sensitivities with an example from Kim et al. (2015), shown in Box 11.4 Figure 1. Here the adjoint of a CTM is used to compute the sensitivity of mean smoke particle concentrations in Singapore in July–November 2006 to fire
s in different locations of equatorial Asia. The CTM has 0.5° × 0.67° horizontal grid resolution and simulates smoke concentrations on the basis of a fire emission inventory that has the same grid resolution as the CTM and daily temporal resolution. The fires emit smoke particles that are transported by the model winds and are eventually removed by wet and dry deposition. Panel 1 shows the mean emissions and winds used in the CTM, and Panel 2 shows the resulting distribution of smoke concentrations in surface air.
Box 11.4 Figure 1 Application of the adjoint method to determine the sensitivity of smoke concentrations in Singapore in July–November 2006 to fire emissions across equatorial Asia. Panel 1 shows mean July-November fire emissions and 0–1 km winds. Panel 2 shows the mean smoke concentrations simulated by the CTM (circles show observations). Panel 3 shows the sensitivity of smoke concentrations in Singapore to fire emissions in different regions. Panel 4 shows the contributions of different fire regions to the smoke concentrations in Singapore.
Adapted from Kim et al. (2015).
We now want to use the CTM adjoint to determine the emissions contributing to the mean smoke concentrations in Singapore in July–November 2006. Fire emissions were limited to that period (dry season). We define x(i) as the vector of 2-D gridded fire emissions at CTM time ti ∈ [t1, tp] where t1 refers to 00:00 local time on July 1 and tp refers to 00:00 on December 1. We define y(i) as the vector of 3-D smoke concentrations simulated by the model at time step i, and choose the first element of that vector y(i),1 to represent the smoke concentration in surface air at Singapore.
Modeling of Atmospheric Chemistry Page 61