Following (11.52), we apply the model adjoint over one time step $[t_p, t_{p-1}]$ to a unit forcing $\mathbf{v} = (1, 0, \ldots, 0)^T$ at time $t_p$. This yields the sensitivity vector $\partial y_{p,1}/\partial \mathbf{y}_{p-1}$ that describes the sensitivity of concentrations at Singapore at time $t_p$ to the 3-D field of concentrations at time $t_{p-1}$. It also yields the sensitivity vector $\partial y_{p,1}/\partial \mathbf{x}_p$ that describes the sensitivity of concentrations at Singapore to the 2-D field of emissions at time $t_p$. We archive $\partial y_{p,1}/\partial \mathbf{x}_p$, add a unit forcing $\mathbf{v}$ to the sensitivity vector $\partial y_{p,1}/\partial \mathbf{y}_{p-1}$, and apply the adjoint over the next time step $[t_{p-1}, t_{p-2}]$. From there we get $\partial(y_{p,1} + y_{p-1,1})/\partial \mathbf{y}_{p-2}$ and $\partial(y_{p,1} + y_{p-1,1})/\partial \mathbf{x}_{p-1}$. We archive $\partial(y_{p,1} + y_{p-1,1})/\partial \mathbf{x}_{p-1}$, which is the sensitivity of concentrations at Singapore to the emission field at time $t_{p-1}$, and proceed in this manner backward in time until time step 1.
A single pass of the adjoint simulation over the time interval $[t_p, t_1]$ thus yields sensitivities of the mean concentration in Singapore over the period $[t_1, t_p]$ to the emission field at every time step over that period:
\[
\frac{\partial}{\partial \mathbf{x}_k}\left(\sum_{q=k}^{p} y_{q,1}\right), \qquad k = 1, \ldots, p \tag{11.58}
\]
where we recognize that emissions at a given time can only affect concentrations after that time. Panel 3 shows the mean adjoint sensitivities. These indicate the potential of fires occurring in different locations to affect Singapore. We can express the smoke concentration at Singapore in July–November 2006 as the sum of these adjoint sensitivities weighted by the actual emissions:
\[
\overline{y}_1 = \sum_{k=1}^{p}\sum_{j}\frac{\partial \overline{y}_1}{\partial x_{k,j}}\, x_{k,j} \tag{11.59}
\]
where the index j refers to the 2-D emission grid. Panel 4 shows the contributions of emissions in individual model grid squares j to the mean smoke concentration at Singapore.
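The bookkeeping of this single backward pass is compact enough to sketch in code. The following Python fragment is a minimal sketch, with all values illustrative; `M_adj` is a hypothetical stand-in for the one-step adjoint operator, whose construction from actual model operators is shown later in this section.

```python
import numpy as np

# Sketch of the single backward pass described above on a 3-box domain;
# element 0 is the receptor ("Singapore"). M_adj is an arbitrary stand-in
# for the one-step adjoint operator.
p, dt = 5, 1.0
M_adj = np.array([[0.90, 0.05, 0.00],
                  [0.05, 0.90, 0.05],
                  [0.00, 0.05, 0.90]])

v = np.zeros(3)                  # adjoint variable
emission_sens = {}               # archived sensitivities to emissions
for k in range(p, 0, -1):
    v[0] += 1.0                  # re-inject the unit forcing at each step
    v = M_adj @ v                # march the adjoint back one time step
    emission_sens[k] = v * dt    # d(sum_q y_{q,0}) / d x_k, archived

# Weighting the archived sensitivities by the actual emissions x_k and
# summing over k reconstructs the receptor quantity, as in (11.59).
```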
The sensitivities computed by the adjoint method are true local derivatives; for a nonlinear problem they are sometimes called adjoint sensitivities to contrast them with the sensitivities obtained by finite-difference calculations, i.e., by perturbing the state vector elements by $\Delta x_i$ and diagnosing the difference in output $\Delta y$. Finite differencing entails some effect of nonlinearity, which can be reduced by selecting a small $\Delta x_i$ but at the cost of numerical noise.
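This trade-off is easy to demonstrate. A minimal sketch, assuming a hypothetical scalar model $y = x^2$ so that the true local derivative (what the adjoint method delivers) is known exactly:

```python
import numpy as np

# Hypothetical nonlinear scalar model y = F(x) = x**2, with exact local
# derivative dy/dx = 2x.
def forward(x):
    return x**2

x0 = 3.0
exact = 2.0 * x0

# Finite-difference sensitivities for a range of perturbations dx:
# a large dx incurs nonlinearity error, a tiny dx incurs round-off noise.
for dx in [1.0, 1e-2, 1e-8, 1e-12]:
    fd = (forward(x0 + dx) - forward(x0)) / dx
    print(f"dx={dx:.0e}  finite-difference={fd:.10f}  error={fd - exact:.2e}")
```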
Henze et al. (2007) describe in detail the steps involved in constructing the adjoint of a CTM. The main difficulty is the linearization needed to express the CTM as a product of matrices. Linearization involves differentiation of the model (cf. (11.2)). One can differentiate either the model equations (continuous adjoint) or the model code (discrete adjoint); the discrete adjoint is more consistent with the actual CTM. The differentiated model is the Jacobian of the CTM and is called the tangent linear model (TLM). Construction of the TLM can be an arduous task, and commercial software packages are available for this purpose.
We present here an elementary example of CTM adjoint construction to illustrate the basic tasks involved. The CTM calculates the evolution of concentrations over a time step [ti, ti+1] by successive application of operators describing the different model processes. Consider a CTM including 3-D advection (operator A), chemistry (operator C), and emissions (operator E), with operator splitting described by
\[
\mathbf{y}_{i+1} = A \bullet C \bullet E\,(\mathbf{y}_i) \tag{11.60}
\]
where the • symbol means “applied to.” The operators may or may not be linear in y. If not, they need to be linearized by differentiation, as we did with the forward model in (11.49). Let A, C, E be the matrices of the linear operators. We have:
\[
\mathbf{y}_{i+1} = \mathbf{A}\,\mathbf{C}\,\mathbf{E}\,\mathbf{y}_i \tag{11.61}
\]
so that
\[
\frac{\partial \mathbf{y}_{i+1}}{\partial \mathbf{y}_i} = \mathbf{A}\,\mathbf{C}\,\mathbf{E} \tag{11.62}
\]
The transpose is given by
\[
\left(\mathbf{A}\,\mathbf{C}\,\mathbf{E}\right)^T = \mathbf{E}^T\,\mathbf{C}^T\,\mathbf{A}^T \tag{11.63}
\]
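This reversal of operator order is easy to verify numerically. A quick sketch with random matrices standing in for the linearized operators (any conformable matrices will do):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
A, C, E = (rng.standard_normal((m, m)) for _ in range(3))

# Forward Jacobian of one operator-split step (11.62) and its transpose
# (11.63): the adjoint applies the transposed operators in reverse order.
forward_jac = A @ C @ E
adjoint = E.T @ C.T @ A.T
assert np.allclose(forward_jac.T, adjoint)
```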
The case of linear operators offers insight into the physical meaning of the adjoint. Let us begin with the advection operator. Three-dimensional advection is generally described by operator splitting with 1-D operators. Consider then a 1-D advection algorithm using a linear upstream scheme on an Eulerian grid (see Section 7.3.2):
\[
y_{i+1,\,j} = c\, y_{i,\,j-1} + (1 - c)\, y_{i,\,j} \tag{11.64}
\]
where $c$ is the Courant number, $i$ is the time index, and the flow is from gridbox $j-1$ to $j$. Let us take as an example a uniform cyclic flow over a domain $j = [1, 3]$. The advection operator is written in matrix form as
\[
\mathbf{A} = \begin{pmatrix} 1-c & 0 & c \\ c & 1-c & 0 \\ 0 & c & 1-c \end{pmatrix} \tag{11.65}
\]
and its transpose is
\[
\mathbf{A}^T = \begin{pmatrix} 1-c & c & 0 \\ 0 & 1-c & c \\ c & 0 & 1-c \end{pmatrix} \tag{11.66}
\]
We see that the transpose describes the reverse of the actual flow:
\[
y_{i+1,\,j} = c\, y_{i,\,j+1} + (1 - c)\, y_{i,\,j} \tag{11.67}
\]
This result is readily generalizable to any number of gridboxes and non-uniform flow. Thus the adjoint of a linear transport operator is simply the reverse flow, and this is also found in the continuous adjoint by differentiating the advection equation (Henze et al., 2007). Advection operators may not be exactly linear because of safeguards for stability, positivity, or mass conservation. Nevertheless, the approximation of reverse flow is frequently used to construct the adjoint because of its simplicity.
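A short sketch using the 3-box cyclic domain of (11.65)–(11.66), with an illustrative Courant number, confirms that the transposed advection matrix is exactly the matrix of the reverse flow:

```python
import numpy as np

c = 0.4                                  # illustrative Courant number
# Cyclic upstream advection operator (11.65), flow from box j-1 to box j:
A = np.array([[1 - c, 0,     c    ],
              [c,     1 - c, 0    ],
              [0,     c,     1 - c]])

# Reverse-flow operator (11.67): the same scheme with flow from j+1 to j.
A_rev = np.array([[1 - c, c,     0    ],
                  [0,     1 - c, c    ],
                  [c,     0,     1 - c]])

# The adjoint (transpose) of the advection operator is the reverse flow:
assert np.allclose(A.T, A_rev)
```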
Consider now a first-order loss chemistry operator $d\mathbf{y}/dt = -k\mathbf{y}$, where $k$ is a loss rate constant. Application of this operator over a time step $\Delta t$ is expressed in matrix form as

\[
\mathbf{C} = e^{-k\Delta t}\, \mathbf{I}_m \tag{11.68}
\]
which is a diagonal matrix. In this case the transpose operator is the same as the original operator; we refer to the operator as self-adjoint. It makes sense that the sensitivity going back in time should decay with the same time constant as the first-order chemical loss. From a coding standpoint, it means that the adjoint can use the same chemical operator as the forward model.
Finally, consider the emission operator E. Its application over a time step $\Delta t$ modifies the concentration field as $\mathbf{y}_{i+1} = \mathbf{y}_i + \mathbf{x}_{i+1}\Delta t$, where $\mathbf{x}$ is an emission flux vector that is non-zero only for gridboxes receiving emissions. In terms of sensitivity to concentrations at the previous time step, the emission operator is the identity matrix $\mathbf{I}_m$ and thus self-adjoint:
\[
\frac{\partial \mathbf{y}_{i+1}}{\partial \mathbf{y}_i} = \mathbf{I}_m \tag{11.69}
\]
while the sensitivity to emissions is also self-adjoint:
\[
\frac{\partial \mathbf{y}_{i+1}}{\partial \mathbf{x}_{i+1}} = \mathbf{I}_m\,\Delta t \tag{11.70}
\]
We have thus shown how the matrices $\mathbf{A}^T$, $\mathbf{C}^T$, $\mathbf{E}^T$ can be computed in simple cases to define the adjoint model. In this manner the adjoint model marches back in time to describe the sensitivity of concentrations to concentrations and emissions at prior times. See Box 11.4 for an example application.
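Putting the pieces together, here is a minimal sketch of such a backward march for a hypothetical 3-box model with the operators constructed above; the receptor location, Courant number, and loss rate are illustrative choices:

```python
import numpy as np

p, dt = 10, 1.0
c, k = 0.4, 0.05                               # illustrative Courant number, loss rate

# One forward step is y_i = A C (y_{i-1} + x_i dt), per (11.60)-(11.70):
A = np.array([[1 - c, 0,     c    ],
              [c,     1 - c, 0    ],
              [0,     c,     1 - c]])          # cyclic upstream advection (11.65)
C = np.exp(-k * dt) * np.eye(3)                # first-order chemical loss (11.68)

# March the adjoint backward from a unit forcing on the receptor (box 0)
# at time t_p. After each transposed step, v holds the sensitivity of
# y_{p,0} to concentrations one step earlier, and v*dt its sensitivity to
# emissions at that step (emissions enter before chemistry and transport).
v = np.array([1.0, 0.0, 0.0])
for i in range(p, 0, -1):
    v = C.T @ A.T @ v
    print(f"d y_p[0] / d x_{i} =", np.round(v * dt, 4))
```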
Another simple application of the adjoint is to linear multi-box models, often used in geochemical modeling to simulate the evolution of concentrations in m different coupled reservoirs (boxes). The model is described by
\[
\frac{d\mathbf{y}}{dt} = \mathbf{K}\,\mathbf{y} + \mathbf{s} \tag{11.71}
\]
where $\mathbf{y}$ is the vector of concentrations or masses in the different boxes, $\mathbf{K}$ is a Jacobian matrix of transfer coefficients $k_{ij}$ describing the transfer between boxes, and $\mathbf{s}$ is a source vector. Starting from initial conditions at time $t_0$, the evolution of the system for one time step $\Delta t = t_1 - t_0$ is given in forward finite-difference form by
\[
\mathbf{y}_1 = \mathbf{M}\,\mathbf{y}_0 + \mathbf{s}_0\,\Delta t \tag{11.72}
\]
where $\mathbf{M} = \mathbf{I}_m + \mathbf{K}\Delta t$. We see that $\partial\mathbf{y}_1/\partial\mathbf{y}_0 = \mathbf{M}$ and $\partial\mathbf{y}_1/\partial\mathbf{s}_0 = \mathbf{I}_m\Delta t$; the corresponding adjoint operators are $\mathbf{M}^T$ and $\mathbf{I}_m\Delta t$ (the source operator is self-adjoint). Consider a time period of interest $[t_0, t_p]$ (say, from pre-industrial to present time). A single pass of the adjoint backward in time over $[t_p, t_0]$ yields the sensitivity of the concentration in a given box at a given time to the concentrations and sources in all boxes at previous times.
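A sketch of such a pass for a hypothetical two-box model (all transfer coefficients and sources illustrative), with a consistency check against the forward model:

```python
import numpy as np

# Two-box model dy/dt = K y + s; box 1 receives the source, box 2 also has
# a small first-order loss. All values illustrative.
K = np.array([[-0.10,  0.10],
              [ 0.10, -0.11]])
s = np.array([1.0, 0.0])
dt, p = 0.1, 1000
M = np.eye(2) + K * dt                  # forward-difference propagator (11.72)

# Adjoint pass: a unit forcing on box 2 at t_p, marched backward with M^T,
# yields d y_p[1] / d s_i for every earlier step i (source operator is I dt).
v = np.array([0.0, 1.0])
sens_to_source = [v * dt]               # sensitivity to the source at step p-1
for _ in range(p - 1):
    v = M.T @ v
    sens_to_source.append(v * dt)

# Consistency check against the forward model started from y_0 = 0:
y = np.zeros(2)
for _ in range(p):
    y = M @ y + s * dt
assert np.isclose(y[1], sum(g @ s for g in sens_to_source))
```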
11.5 Analytical Inversion
The vector-matrix tools presented in Section 11.4 allow us to apply Bayes’ theorem (Section 11.2) to obtain an optimal estimate of a state vector $\mathbf{x}$ (dim $n$) on the basis of the observation vector $\mathbf{y}$ (dim $m$), the prior information $\mathbf{x}_A$, the forward model $F$, and the error covariance matrices $\mathbf{S}_A$ and $\mathbf{S}_O$. This is done by finding the minimum of a cost function describing the observational and prior constraints. Here we present the analytical solution to this minimization problem assuming Gaussian errors. A major advantage of the analytical approach, as we will see, is that it provides complete error characterization as part of the solution. It can also be fast and well suited for conducting an ensemble of inversions with varying assumptions. But it has three limitations:
1. It requires construction of the Jacobian $\mathbf{K} = \nabla_{\mathbf{x}} F$, which may be computationally impractical for a very large state vector or for a nonlinear problem where the Jacobian would have to be reconstructed at each iteration toward the solution.
2. It requires the assumption of Gaussian errors, which may not always be appropriate and in particular does not guarantee positivity of the solution.
3. It does not accommodate prior constraints other than those specified through Bayes’ theorem.
Other approaches to solving the inverse problem that lift these limitations will be presented in subsequent sections. The reader is encouraged to consult the simple scalar example of Section 11.3 in order to develop intuition for the material presented here. Many of the equations derived here have scalar equivalents in Section 11.3 that are easier to parse and understand.
11.5.1 Optimal Estimate
Assuming Gaussian distribution of errors, the PDFs to be used for application of Bayes’ theorem are given by (11.48):
\[
-2\ln P(\mathbf{x}) = (\mathbf{x} - \mathbf{x}_A)^T\,\mathbf{S}_A^{-1}\,(\mathbf{x} - \mathbf{x}_A) + c_1 \tag{11.73}
\]
\[
-2\ln P(\mathbf{y}|\mathbf{x}) = (\mathbf{y} - F(\mathbf{x}))^T\,\mathbf{S}_O^{-1}\,(\mathbf{y} - F(\mathbf{x})) + c_2 \tag{11.74}
\]
from which we obtain by application of Bayes’ theorem, P(x| y) ∝ P(x)P(y| x):
\[
-2\ln P(\mathbf{x}|\mathbf{y}) = (\mathbf{x} - \mathbf{x}_A)^T\,\mathbf{S}_A^{-1}\,(\mathbf{x} - \mathbf{x}_A) + (\mathbf{y} - F(\mathbf{x}))^T\,\mathbf{S}_O^{-1}\,(\mathbf{y} - F(\mathbf{x})) + c_3 \tag{11.75}
\]
Here $c_1$, $c_2$, $c_3$ are constants. The optimal estimate is defined by the maximum of $P(\mathbf{x}|\mathbf{y})$, or equivalently by the minimum of the scalar-valued $\chi^2$ cost function $J(\mathbf{x})$:
\[
J(\mathbf{x}) = (\mathbf{x} - \mathbf{x}_A)^T\,\mathbf{S}_A^{-1}\,(\mathbf{x} - \mathbf{x}_A) + (\mathbf{y} - F(\mathbf{x}))^T\,\mathbf{S}_O^{-1}\,(\mathbf{y} - F(\mathbf{x})) \tag{11.76}
\]
We find this minimum by solving $\nabla_{\mathbf{x}} J(\mathbf{x}) = \mathbf{0}$:

\[
\nabla_{\mathbf{x}} J(\mathbf{x}) = 2\,\mathbf{S}_A^{-1}(\mathbf{x} - \mathbf{x}_A) - 2\,\mathbf{K}^T\,\mathbf{S}_O^{-1}\,(\mathbf{y} - F(\mathbf{x})) = \mathbf{0} \tag{11.77}
\]
where $\mathbf{K}^T = (\nabla_{\mathbf{x}} F)^T$ is the transpose of the Jacobian matrix. Equations (11.26) and (11.27) in Section 11.3 are the scalar analogues.
Let us assume that F(x) is linear or can be linearized as given by (11.2), i.e., F(x) = Kx + c where c is a constant, and for simplicity of notation let c = 0 (this can always be enforced by replacing y by y – c). We then have
\[
\mathbf{S}_A^{-1}(\mathbf{x} - \mathbf{x}_A) - \mathbf{K}^T\,\mathbf{S}_O^{-1}\,(\mathbf{y} - \mathbf{K}\mathbf{x}) = \mathbf{0} \tag{11.78}
\]
The solution of (11.78) is straightforward and can be expressed in compact form as
\[
\hat{\mathbf{x}} = \mathbf{x}_A + \mathbf{G}\,(\mathbf{y} - \mathbf{K}\mathbf{x}_A) \tag{11.79}
\]
where G is the gain matrix given by
\[
\mathbf{G} = \mathbf{S}_A\,\mathbf{K}^T\left(\mathbf{K}\,\mathbf{S}_A\,\mathbf{K}^T + \mathbf{S}_O\right)^{-1} \tag{11.80}
\]
$\mathbf{G}$ describes the sensitivity of the optimal estimate to the observations, i.e., $\mathbf{G} = \partial\hat{\mathbf{x}}/\partial\mathbf{y}$. It is a valuable diagnostic for the inversion as it tells us which observations contribute most to constraining specific components of the optimal estimate. Equations (11.78), (11.79), and (11.80) have scalar analogues (11.17), (11.19), and (11.20) in Section 11.3.
The posterior error covariance matrix $\hat{\mathbf{S}}$ of $\hat{\mathbf{x}}$ can be calculated in the same manner as in Section 11.3, by rearranging the right-hand side of (11.75) with $F(\mathbf{x}) = \mathbf{K}\mathbf{x}$ to be of the form $(\mathbf{x} - \hat{\mathbf{x}})^T\,\hat{\mathbf{S}}^{-1}\,(\mathbf{x} - \hat{\mathbf{x}})$. This yields

\[
\hat{\mathbf{S}} = \left(\mathbf{K}^T\,\mathbf{S}_O^{-1}\,\mathbf{K} + \mathbf{S}_A^{-1}\right)^{-1} \tag{11.81}
\]
Again, note the similarity of this equation to its simple scalar equivalent (11.24) in Section 11.3. An important feature of the analytical solution to the inverse problem is that it provides a full characterization of errors on the optimal estimate through $\hat{\mathbf{S}}$, as well as a diagnostic of the influence of different observations through $\mathbf{G}$.
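The complete analytical machinery fits in a few lines. A minimal sketch on a synthetic linear problem, with state and observation dimensions, Jacobian, and covariances all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 8                            # illustrative state and observation sizes

# Synthetic linear problem y = K x_true + noise, with diagonal covariances.
K = rng.standard_normal((m, n))
x_true = np.array([2.0, -1.0, 0.5])
S_A = 1.0 * np.eye(n)                  # prior error covariance
S_O = 0.1 * np.eye(m)                  # observational error covariance
x_A = np.zeros(n)                      # prior estimate
y = K @ x_true + rng.multivariate_normal(np.zeros(m), S_O)

# Gain matrix (11.80), optimal estimate (11.79), posterior covariance (11.81).
G = S_A @ K.T @ np.linalg.inv(K @ S_A @ K.T + S_O)
x_hat = x_A + G @ (y - K @ x_A)
S_hat = np.linalg.inv(K.T @ np.linalg.inv(S_O) @ K + np.linalg.inv(S_A))

print("optimal estimate:", np.round(x_hat, 3))
print("posterior std dev:", np.round(np.sqrt(np.diag(S_hat)), 3))
```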
11.5.2 Averaging Kernel Matrix
Error characterization in the analytical solution to the inverse problem allows us to measure the capability of the observing system to constrain the true value of the state vector. This is done with the averaging kernel matrix $\mathbf{A} = \partial\hat{\mathbf{x}}/\partial\mathbf{x}$, representing the sensitivity of the optimal estimate to the true state $\mathbf{x}$. $\mathbf{A}$ is the product of the gain matrix and the Jacobian matrix $\mathbf{K} = \partial\mathbf{y}/\partial\mathbf{x}$:
\[
\mathbf{A} = \mathbf{G}\,\mathbf{K} \tag{11.82}
\]
Replacing (11.82) and $\mathbf{y} = \mathbf{K}\mathbf{x} + \boldsymbol{\varepsilon}_O$ into (11.79), we obtain an alternate form for $\hat{\mathbf{x}}$:
\[
\hat{\mathbf{x}} = \mathbf{A}\,\mathbf{x} + (\mathbf{I}_n - \mathbf{A})\,\mathbf{x}_A + \mathbf{G}\,\boldsymbol{\varepsilon}_O \tag{11.83}
\]
or equivalently
\[
\hat{\mathbf{x}} - \mathbf{x} = (\mathbf{I}_n - \mathbf{A})(\mathbf{x}_A - \mathbf{x}) + \mathbf{G}\,\boldsymbol{\varepsilon}_O \tag{11.84}
\]
where $\mathbf{I}_n$ is the identity matrix of dimension $n$. Equations (11.21) and (11.22) in Section 11.3 are scalar analogues. $\mathbf{A}$ is a weighting factor for the relative contribution to the optimal estimate from the true state vs. the prior estimate: $\mathbf{A}\mathbf{x}$ represents the contribution of the true state to the solution, $(\mathbf{I}_n - \mathbf{A})\mathbf{x}_A$ represents the contribution from the prior estimate, and $\mathbf{G}\boldsymbol{\varepsilon}_O$ represents the contribution from the random observational error mapped onto state space by the gain matrix $\mathbf{G}$. A perfect observing system would have $\mathbf{A} = \mathbf{I}_n$. $(\mathbf{I}_n - \mathbf{A})(\mathbf{x}_A - \mathbf{x})$ is called the smoothing error. From (11.84) we can also derive an alternate expression for the error covariance matrix $\hat{\mathbf{S}}$:
\[
\hat{\mathbf{S}} = (\mathbf{I}_n - \mathbf{A})\,\mathbf{S}_A\,(\mathbf{I}_n - \mathbf{A})^T + \mathbf{G}\,\mathbf{S}_O\,\mathbf{G}^T \tag{11.85}
\]
from which we see that $\hat{\mathbf{S}}$ can be decomposed into the sum of a smoothing error covariance matrix $(\mathbf{I}_n - \mathbf{A})\mathbf{S}_A(\mathbf{I}_n - \mathbf{A})^T$ and an observational error covariance matrix in state space $\mathbf{G}\mathbf{S}_O\mathbf{G}^T$. The smoothing error covariance matrix describes the smoothing of the solution by the prior constraints. The observational error covariance matrix describes the noise in the observing system.
Algebraic manipulation yields an alternate form of the averaging kernel matrix as
\[
\mathbf{A} = \mathbf{I}_n - \hat{\mathbf{S}}\,\mathbf{S}_A^{-1} \tag{11.86}
\]
which relates the improved knowledge of the state vector measured by $\mathbf{A}$ to the variance reduction previously discussed for the scalar problem (see (11.25) for the scalar analogue). This is a convenient way to derive $\mathbf{A}$ from knowledge of $\hat{\mathbf{S}}$.
The averaging kernel matrix, constructed from knowledge of $\mathbf{S}_A$, $\mathbf{S}_O$, and $\mathbf{K}$, is a highly informative diagnostic of an observing system. When designing the observing system it can be used to evaluate and compare the merits of different designs for quantifying $\mathbf{x}$. By relating the observed state to the true state, it also enables comparison of data from different instruments (Rodgers and Connor, 2003; Zhang et al., 2010). Box 11.5 illustrates the utility of the averaging kernel matrix in interpreting satellite data.
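Continuing the synthetic inversion sketch at the end of Section 11.5.1 (same variables), the averaging kernel, the error decomposition, and the degrees of freedom for signal (DOFS, the trace of $\mathbf{A}$) follow directly:

```python
# Appended to the inversion sketch above: averaging kernel (11.82), its
# equivalent form (11.86), the error decomposition (11.85), and the DOFS.
A = G @ K                                            # averaging kernel matrix
assert np.allclose(A, np.eye(n) - S_hat @ np.linalg.inv(S_A))    # (11.86)

smoothing = (np.eye(n) - A) @ S_A @ (np.eye(n) - A).T    # smoothing error
observational = G @ S_O @ G.T                 # observational error, state space
assert np.allclose(S_hat, smoothing + observational)             # (11.85)

print("DOFS =", np.trace(A))       # independent pieces of information retrieved
```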
Box 11.5 Averaging Kernel Matrix for an Observing System
We illustrate the analytical solution to the inverse problem with the retrieval of carbon monoxide (CO) vertical profiles from the MOPITT satellite instrument (Deeter et al., 2003). The instrument makes nadir measurements of the temperature-dependent IR terrestrial emission at and around the 4.6 μm CO absorption band. Atmospheric CO is detected by its temperature contrast with the surface. The radiances measured at different wavelengths constitute the observation vector for the inverse problem. The state vector is chosen to include CO mixing ratios at seven different vertical levels from the surface to 150 hPa, plus surface temperature and surface emissivity. The forward model is a radiative transfer model (RTM) computing the radiances as a function of the state vector values. The observational error covariance matrix is constructed by summing the instrument error covariance matrix (obtained from knowledge of instrument noise) and the forward model error covariance matrix (obtained from comparison of the RTM to a highly accurate but computationally prohibitive line-by-line model). The prior CO vertical profile and its error covariance matrix are climatological values derived from a worldwide compilation of aircraft measurements. The prior surface temperature is specified from local assimilated meteorological data and the prior surface emissivity is taken from a geographical database. The forward model is nonlinear, in particular because the sensitivity to the CO vertical profile depends greatly on the surface temperature; thus a local Jacobian matrix needs to be computed for each scene.
Box 11.5 Figure 1(a) shows the averaging kernel matrix $\mathbf{A}$ constructed in this manner for a typical ocean scene. Here, $\mathbf{A}$ is plotted row by row for the CO vertical profile elements only, with each line corresponding to a given vertical level indicated in the legend. The line for level $i$ gives $\partial\hat{x}_i/\partial x_j$, the sensitivity of the retrieval at that level to the true CO mixing ratios at the different levels $j$. A perfect observing system ($\mathbf{A} = \mathbf{I}_n$) would show unit sensitivity to that level ($\partial\hat{x}_i/\partial x_i = 1$) and zero sensitivity to other levels. However, we see from Box 11.5 Figure 1 that the averaging kernel elements are much less than 1 and that the information is smoothed across vertical levels.
Box 11.5 Figure 1 Retrieval of CO mixing ratios by the MOPITT satellite instrument for a scene over the North Pacific. Lines with different colors in (a) show the rows of the averaging kernel matrix for seven vertical levels from the surface to 150 hPa. (b) shows the MOPITT retrieval (solid line with symbols and posterior error standard deviations) together with a validation profile measured coincidently from aircraft. The dashed line represents the smoothing of the aircraft profile by the MOPITT averaging kernel matrix. From Jacob et al. (2003).
Consider the retrieval of the CO mixing ratio at 700 hPa (blue line). We see that the retrieved value at 700 hPa is actually sensitive to CO at all altitudes, so that it is not possible from the retrieval to narrowly identify the CO mixing ratio at 700 hPa (or at any other specific altitude). The temperature contrast between vertical levels is not sufficient. We retrieve instead a broad CO column weighted toward the middle troposphere (700–500 hPa). In fact, the retrieval at 700 hPa is more sensitive to the CO mixing ratio at 500 hPa than at 700 hPa. Physically, this means that a given mixing ratio of CO at 500 hPa will give a spectral response similar to a larger mixing ratio at 700 hPa, because 500 hPa has greater temperature contrast with the surface.
Consider now the retrieval of CO in surface air (black line). There is some thermal contrast between surface air and the surface itself, but the signal is very faint. If we had a perfect observing system we could retrieve it; because of observational error, however, the sensitivity of the surface air retrieval to surface air concentrations is close to zero. In fact, the surface air retrieval is very similar to that at 700 hPa and exhibits the same maximum sensitivity at 700–500 hPa.
Inspection of the averaging kernel matrix in Box 11.5 Figure 1 suggests that the retrieval provides only two independent pieces of information on the vertical profile, one for 700–500 hPa (from the retrievals up to 500 hPa) and one for the upper troposphere above 300 hPa (from the retrievals at 350 hPa and above). We can quantify the degrees of freedom for signal (DOFS) by the trace of the averaging kernel matrix, reading and adding up from the figure the diagonal values for the seven vertical levels. Starting from the lowest level, we find a DOFS of 0.09 + 0.23 + 0.33 + 0.21 + 0.20 + 0.22 + 0.20 ≈ 1.5. Thus the retrieval mostly provides a CO column weighted toward the middle troposphere, with a smaller additional piece of information in the upper troposphere.