Modeling of Atmospheric Chemistry
Intercomparisons of different models provide yet another way of estimating model error. This is done regularly in community assessments to determine how well different state-of-science models can reproduce specific aspects of atmospheric composition, or to estimate errors associated with future projections. It involves the comparison of simulations conducted with different models for the same conditions. For example, Figure 10.11 shows surface concentrations of particulate matter (PM10) forecast for the same day by five different regional air quality models using the same chemical and meteorological initial and boundary conditions. On this particular day, several models show the formation of a dust layer over the Sahara, and two of them predict intense transport of dust particles toward Eastern Europe. Other models do not capture this event and predict low concentrations of particles in most areas of Europe. The differences between these forecasts reflect the choices made in the different model formulations. In the absence of better information, a “wisdom of crowds” assumption is often made that the average of the different models (also shown) is better than any single model.
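The “wisdom of crowds” assumption can be made concrete with a small numerical sketch. Assuming each model has already been sampled at the observation sites, the root-mean-square error (RMSE) of the ensemble mean can never exceed the average RMSE of the individual members (a consequence of the triangle inequality), although nothing guarantees that it beats the best member. The function names and the PM10 values below are hypothetical and chosen only for illustration.

```python
import numpy as np


def rmse(pred, obs):
    """Root-mean-square error between predicted and observed values."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))


def ensemble_scores(members, obs):
    """RMSE of each ensemble member and of the ensemble mean.

    members: shape (n_models, n_sites), model values sampled at the sites
    obs:     shape (n_sites,), observed values at the same sites and time
    """
    members = np.asarray(members, dtype=float)
    obs = np.asarray(obs, dtype=float)
    member_rmse = [rmse(m, obs) for m in members]
    ensemble_rmse = rmse(members.mean(axis=0), obs)
    return member_rmse, ensemble_rmse


# Hypothetical PM10 values [ug m-3] at four sites
obs = np.array([20.0, 35.0, 50.0, 15.0])
members = np.array([
    [25.0, 30.0, 80.0, 10.0],  # overestimates a dust event at site 3
    [15.0, 40.0, 30.0, 20.0],  # largely misses it
    [22.0, 33.0, 55.0, 18.0],
])
member_rmse, ensemble_rmse = ensemble_scores(members, obs)
print([round(r, 2) for r in member_rmse], round(ensemble_rmse, 2))
```

In this made-up case the ensemble mean happens to outperform every member, but only the bound against the average member holds in general.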
Figure 10.11 Simulated surface concentration [μg m–3] of particulate matter (PM10) on September 1, 2015, 20 UTC by five regional models (a–e) contributing to a multi-model ensemble prediction system for regional air pollution in Europe. Two of these models clearly show an intrusion of dust-rich air from the Sahara toward Southern and Eastern Europe with high dust concentrations extending from the Baltic to the Black Sea. Other models do not reproduce such a strong intrusion. The average of the six models involved in this air quality prediction is shown in (f) and is compared with observations (small color dots). The ensemble simulation is in rather good agreement with the data in the western and northern parts of Europe, but with the lack of measurements in Eastern Europe no conclusion can be drawn regarding the intensity of the Saharan dust intrusion. The color scale is identical for all graphs.
From the Copernicus Atmosphere Monitoring Service (CAMS) coordinated by ECMWF and supported by the European Commission.
10.4 General Considerations for Model Evaluation
10.4.1 Selection of Observations
Model evaluation generally relies on the one-to-one comparison of observations to model values sampled at the same location and time. It is important to recognize that simulated and observed fields may not be exactly comparable. The model may simulate a spatial average over a grid cell while the observations are from a particular location that may not reflect the grid cell average. This is called representation error and is discussed in Chapter 11 in the context of inverse modeling. Spatial interpolation of the observations (see Section 4.16) may help to reduce representation error, but the error is often not random. For example, sites from surface pollution networks are often concentrated in urban areas or in the vicinity of point sources, introducing bias when comparing to a coarse-resolution model that simulates the broader regional atmosphere. It may be necessary to exclude such sites from the comparison as non-representative.
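A minimal sketch of the one-to-one pairing step, assuming a regular latitude-longitude model grid described by its cell edges and a per-site flag (a hypothetical input) marking sites judged non-representative, for example urban sites or sites near point sources. Each retained site is compared with the grid cell that contains it; this nearest-cell sampling is the simplest choice and does not remove representation error, it only makes the comparison explicit. All names and values are illustrative.

```python
import numpy as np


def sample_model_at_sites(field, lat_edges, lon_edges, site_lat, site_lon):
    """Value of the model grid cell containing each site.

    field has shape (n_lat, n_lon); lat_edges and lon_edges are the n + 1
    increasing cell boundaries of a regular grid. Sites are assumed to lie
    inside the domain; values outside are clipped to the edge cells.
    """
    i = np.searchsorted(lat_edges, site_lat, side="right") - 1
    j = np.searchsorted(lon_edges, site_lon, side="right") - 1
    i = np.clip(i, 0, field.shape[0] - 1)
    j = np.clip(j, 0, field.shape[1] - 1)
    return field[i, j]


def paired_values(field, lat_edges, lon_edges, sites, obs, representative):
    """Pair model and observed values, dropping non-representative sites.

    sites: array of (lat, lon) rows; representative: boolean flag per site
    (hypothetical, e.g. False for urban or near-source sites).
    """
    keep = np.asarray(representative, dtype=bool)
    sites = np.asarray(sites, dtype=float)
    model = sample_model_at_sites(field, lat_edges, lon_edges,
                                  sites[keep, 0], sites[keep, 1])
    return model, np.asarray(obs, dtype=float)[keep]


# Hypothetical 2 x 2 grid and three sites, the second flagged as urban
lat_edges = np.array([40.0, 45.0, 50.0])
lon_edges = np.array([0.0, 5.0, 10.0])
field = np.array([[1.0, 2.0], [3.0, 4.0]])
sites = np.array([[42.0, 2.0], [47.0, 7.0], [48.0, 1.0]])
obs = np.array([1.1, 9.0, 2.9])
model, obs_kept = paired_values(field, lat_edges, lon_edges, sites, obs,
                                [True, False, True])
```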
Representation error applies to temporal variability as well. Time series measured at surface sites or from aircraft often show high-frequency anomalies such as spikes driven by concentrated plumes or local meteorological conditions. The model may not be designed to capture these anomalies, either because of grid averaging or because of temporal averaging of the input data. In addition, small transport errors may cause the model to slightly misplace plumes in a way that may not be relevant for general model evaluation but weighs heavily in model comparison statistics. Such statistical outliers in the distribution of observations can be illuminating in terms of understanding processes, and often deserve attention on a case-by-case basis. However, they should be excluded from a general model evaluation data set.
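Outliers can be screened in many ways, and the text does not prescribe one. The sketch below uses one common robust choice (an assumption, not the book's method): flag observations farther from the median than k robust standard deviations, estimated from the median absolute deviation (MAD). The threshold k = 3.5 and the sample values are illustrative. Flagged points are returned as a mask rather than deleted, so they remain available for case-by-case analysis.

```python
import numpy as np


def robust_outlier_mask(values, k=3.5):
    """Return True for values to keep, False for statistical outliers.

    A value is flagged when it lies more than k robust standard deviations
    from the median, with the robust standard deviation estimated as
    1.4826 * MAD (equal to the standard deviation for normal data).
    """
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0.0:
        # No spread in the bulk of the data: flag nothing
        return np.ones(x.shape, dtype=bool)
    return np.abs(x - median) <= k * 1.4826 * mad


# Hypothetical hourly CO [ppbv] with one plume spike
co = np.array([30.0, 32.0, 31.0, 29.0, 30.0, 120.0, 31.0, 33.0])
keep = robust_outlier_mask(co)
outliers = co[~keep]  # set aside for case-by-case analysis
```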
Surface air observations over land often show a large diurnal cycle driven by suppressed vertical mixing in the shallow stratified surface layer at night. Nighttime concentrations may thus be very low for species taken up by the surface, and very high for species emitted at the surface. Coarse-resolution models typically cannot capture this nighttime stratification, which may not be relevant for broader model evaluation since it affects only a small volume of atmosphere and may be viewed again as a representation error. In such cases, the nighttime values must be excluded from the statistical data used for model evaluation and the focus must be on simulating daytime values, when the surface measurements are more representative of a deep mixed layer that can be captured by the model.
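Selecting daytime values requires converting observation times to local time. A minimal sketch, assuming hourly UTC time stamps, longitude in degrees east, and an afternoon window of 12:00 to 16:00 local solar time (a common but illustrative choice). The equation of time is neglected, so local solar time is approximated as UTC plus longitude/15; the function names are hypothetical.

```python
import numpy as np


def local_solar_hour(utc_hour, lon_deg):
    """Approximate local solar time [h]: UTC + longitude/15, wrapped to [0, 24).

    Neglects the equation of time (an error of up to about 16 minutes).
    """
    utc_hour = np.asarray(utc_hour, dtype=float)
    lon_deg = np.asarray(lon_deg, dtype=float)
    return (utc_hour + lon_deg / 15.0) % 24.0


def daytime_mask(utc_hour, lon_deg, start=12.0, end=16.0):
    """True where local solar time falls in [start, end).

    The default 12-16 h window is an illustrative assumption.
    """
    lst = local_solar_hour(utc_hour, lon_deg)
    return (lst >= start) & (lst < end)


# Hypothetical site at 116.4 E sampled at 05, 10, and 20 UTC
keep = daytime_mask([5, 10, 20], 116.4)
```

The same mask should be applied to the model values sampled at those times, so that both sides of the comparison cover the same hours.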
10.4.2 Use of Satellite Observations
Satellites provide observations with coarse spatial resolution in the vertical (nadir view) or in the horizontal (limb view), and the model fields must be correspondingly averaged for comparison. A difficulty is that the satellite retrievals make assumptions about the atmosphere (prior information) that may be inconsistent with the model atmosphere. In that case, a direct comparison between model and observed fields can be very misleading. It is essential to re-process the model or observed fields to simulate what the satellite would see if it were observing the model atmosphere, rather than the true atmosphere with the assumed prior information. In the case of an optimal estimation satellite retrieval for gases (Rodgers, 2000; Chapter 11), the satellite reports vertical concentration profiles as
$$\hat{\mathbf{x}} = \mathbf{A}\mathbf{x} + (\mathbf{I} - \mathbf{A})\mathbf{x}_A \qquad (10.9)$$

where the vector $\hat{\mathbf{x}}$ of dimension n is the retrieved profile consisting of concentrations at n vertical levels, $\mathbf{x}$ is the true vertical profile, $\mathbf{x}_A$ is the prior estimate, $\mathbf{A}$ is the averaging kernel matrix (see Chapter 11), and $\mathbf{I}$ is the identity matrix. The satellite data set provides not only $\hat{\mathbf{x}}$ but also $\mathbf{A}$ and $\mathbf{x}_A$. One can then compare the observations to the corresponding model fields computed as

$$\hat{\mathbf{x}}_M = \mathbf{A}\mathbf{x}_M + (\mathbf{I} - \mathbf{A})\mathbf{x}_A \qquad (10.10)$$

where $\mathbf{x}_M$ is the actual model vertical profile (which would be the true profile if the model were perfect). It is important to recognize that the prior term $(\mathbf{I} - \mathbf{A})\mathbf{x}_A$ is common to $\hat{\mathbf{x}}$ and $\hat{\mathbf{x}}_M$, and may give the illusion of better agreement between model and observations than is actually the case. See Zhang et al. (2010) for methods to address this issue.
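Equation (10.10) is a one-line computation once the model profile has been interpolated to the retrieval's n vertical levels (an assumption of this sketch; the interpolation itself is not shown). The version below uses the algebraically equivalent form $\mathbf{x}_A + \mathbf{A}(\mathbf{x}_M - \mathbf{x}_A)$. The function name, the diagonal averaging kernel, and the profiles in the example are hypothetical.

```python
import numpy as np


def apply_averaging_kernel(x_model, A, x_prior):
    """Model profile as the retrieval would see it (Eq. 10.10).

    x_model: model profile on the retrieval's n levels, shape (n,)
    A:       averaging kernel matrix, shape (n, n)
    x_prior: retrieval prior profile x_A, shape (n,)
    """
    x_model = np.asarray(x_model, dtype=float)
    x_prior = np.asarray(x_prior, dtype=float)
    A = np.asarray(A, dtype=float)
    # A x_M + (I - A) x_A rewritten as x_A + A (x_M - x_A)
    return x_prior + A @ (x_model - x_prior)


# Hypothetical 3-level example: sensitivity decreases toward the last level
x_model = np.array([50.0, 60.0, 70.0])
x_prior = np.array([40.0, 40.0, 40.0])
A = np.diag([0.8, 0.5, 0.1])
x_hat_model = apply_averaging_kernel(x_model, A, x_prior)
```

Where A is close to the identity the result approaches the model profile; where A is close to zero it collapses to the prior, which is where the shared prior term can create spurious agreement.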
As another example, the column concentration Ω of a gas reported by a solar backscatter instrument is often retrieved as
$$\Omega = \frac{\Omega_s}{\mathrm{AMF}} \qquad (10.11)$$

where Ωs is the slant column measured by the satellite along its line of sight, and AMF is the air mass factor that converts the slant column to the actual vertical column. The AMF was introduced in Section 5.2.4 for a non-scattering atmosphere; in that case, it was a simple geometric conversion factor. For the actual case of a scattering atmosphere, the AMF must be computed with a radiative transfer model that accounts for the scattering properties of the surface and the atmosphere, and for the assumed relative vertical concentration profile (shape factor) of the gas being measured (Palmer et al., 2001). The shape factor assumed in the retrieval may be inconsistent with that in the model, which then biases the comparison of model and observed Ω. The satellite data set generally includes not only Ω but also the corresponding Ωs (or the AMF, from which Ωs can be obtained). For the purpose of model evaluation one must discard the reported Ω, recompute the AMF by using the local shape factor from the model, and apply it to the measured slant column Ωs. See González Abad et al. (2015) for simple methods to do this.
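A minimal sketch of the AMF recalculation, assuming that the satellite product supplies, for each scene, the reported column Ω, the retrieval AMF, and layer scattering weights w_i that already include the geometric AMF, so that the AMF is the weighted sum of the shape factor, AMF = Σ w_i S_i (a discretized form of the Palmer et al. (2001) formulation). Products differ in what they provide and in how the weights are normalized, so the function names and inputs here are illustrative.

```python
import numpy as np


def model_amf(scattering_weights, model_partial_columns):
    """AMF recomputed with the model's vertical shape factor.

    Assumes the scattering weights w_i already include the geometric AMF,
    so AMF = sum_i w_i * S_i, where S_i is the fraction of the model's
    total column in layer i (the discretized shape factor).
    """
    w = np.asarray(scattering_weights, dtype=float)
    n = np.asarray(model_partial_columns, dtype=float)
    shape_factor = n / n.sum()
    return float(np.sum(w * shape_factor))


def recompute_vertical_column(omega_reported, amf_retrieval,
                              scattering_weights, model_partial_columns):
    """Replace the retrieval AMF by the model-based AMF (Eq. 10.11)."""
    omega_slant = omega_reported * amf_retrieval  # recover the slant column
    return omega_slant / model_amf(scattering_weights, model_partial_columns)


# Hypothetical 3-layer scene (surface first): low sensitivity near the
# surface, model column concentrated near the surface
weights = [0.5, 1.0, 1.5]
partial_columns = [6.0e15, 3.0e15, 1.0e15]  # molecules cm-2
omega_new = recompute_vertical_column(1.0e16, 1.2, weights, partial_columns)
```

Because the hypothetical model places most of the gas where the instrument is least sensitive, its AMF is smaller than the retrieval AMF and the recomputed column is larger than the reported one.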
As yet another example, satellite data for aerosol optical depth (AOD) are generally retrieved from nadir measurements of the top-of-atmosphere reflectance of the Earth’s surface and atmosphere. The aerosol contribution to this reflectance, from which the AOD is derived, is obtained with a radiative transfer model including assumed aerosol size distributions and refractive indices. These assumed aerosol characteristics generally differ from those simulated locally in the chemical transport model and used to compute the model AOD. One-to-one comparison of model to observed AODs is still valid inasmuch as the AOD is a physical diagnostic quantity. The comparison is difficult to interpret, however, because differences in AODs may be attributable to model errors in either aerosol mass concentrations or aerosol optical properties, and the assumed aerosol optical properties in the satellite retrieval are also subject to error. This is a problem in particular for data assimilation, as there are multiple ways to correct a model-observation difference in AODs.
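The ambiguity can be illustrated with the simplest possible optical model, in which AOD is the sum over aerosol species of a mass extinction efficiency (MEE) times the species column mass. The function name and the two-species values below are illustrative assumptions; they show that a simulation with half the sulfate mass but twice its assumed MEE yields the same AOD, so an AOD mismatch alone cannot tell which quantity is wrong.

```python
import numpy as np


def aod(column_mass, mass_extinction_efficiency):
    """AOD = sum over species of MEE [m2 g-1] * column mass [g m-2]."""
    return float(np.dot(np.asarray(mass_extinction_efficiency, dtype=float),
                        np.asarray(column_mass, dtype=float)))


# Illustrative two-species column: (sulfate, dust)
aod_a = aod([0.05, 0.10], [8.0, 0.6])    # reference mass and optics
aod_b = aod([0.025, 0.10], [16.0, 0.6])  # half the sulfate, twice its MEE
```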
10.4.3 Preliminary Evaluation and Temporal Scales
Section 10.5 presents different statistical metrics for evaluating the ability of a model to fit large observational data sets. A first step in model evaluation should be to visually inspect the simulated and observed fields for any prominent features that need to be better understood. This visual inspection should encompass as many of the variables as possible, for different spatial domains and temporal scales, as the different perspectives can provide unique information on the driving processes and the ability of the model to simulate them. For example, examination of mean vertical profiles in an aircraft data set offers quick information on the ability of the model to simulate boundary layer mixing, planetary boundary layer (PBL) depth, ventilation to the free troposphere, and any large-scale free tropospheric bias. A large contrast in observations over land and ocean may point to the need for separate statistical evaluation of both. An inability of the model to pick up this contrast may call into question the simulation of transport or chemical loss. For time series of large data sets it can be insightful to identify dominant patterns in the observations using empirical orthogonal functions (EOFs) and diagnose the ability of the model to reproduce these patterns. Calculation of EOFs is described in Appendix E.
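Appendix E describes the calculation of EOFs; the sketch below uses one standard route (a swap-in, not necessarily the procedure in the appendix), the singular value decomposition of the matrix of anomalies from the time mean, which is equivalent to diagonalizing the covariance matrix. The synthetic “observed” and “model” data sets share one oscillating spatial pattern plus noise and are hypothetical. Because the sign of an EOF is arbitrary, patterns are compared through the absolute value of their dot product.

```python
import numpy as np


def eofs(data, n_modes=2):
    """EOF analysis of a (n_times, n_sites) data matrix via SVD.

    Returns the spatial patterns (n_modes, n_sites, unit norm), the
    principal-component time series (n_times, n_modes), and the fraction
    of total variance explained by each returned mode.
    """
    X = np.asarray(data, dtype=float)
    anomalies = X - X.mean(axis=0)  # remove the time mean at each site
    U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)
    return Vt[:n_modes], U[:, :n_modes] * s[:n_modes], variance_fraction[:n_modes]


# Synthetic example: one oscillating pattern at four sites plus noise
rng = np.random.default_rng(0)
t = np.arange(100)
pattern = np.array([1.0, 0.5, -0.5, -1.0])
signal = np.outer(np.sin(2 * np.pi * t / 25), pattern)
obs = signal + 0.05 * rng.standard_normal(signal.shape)
model = 0.8 * signal + 0.05 * rng.standard_normal(signal.shape)

obs_patterns, obs_pcs, obs_frac = eofs(obs)
mod_patterns, mod_pcs, mod_frac = eofs(model)
# The sign of an EOF is arbitrary, so compare patterns via |dot product|
pattern_similarity = abs(float(obs_patterns[0] @ mod_patterns[0]))
```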
We elaborate here on the consideration of different timescales when comparing model to observations. These timescales can be usefully separated as intra-day (diurnal), day-to-day (synoptic), seasonal, and interannual (or long-term trends). Concentrations at surface sites often show large diurnal variations due to mixed layer growth and decay, surface sources and sinks, and photochemistry. Comparison of mean diurnal variations between model and observations can test the model representation of these processes. An example is given in Figure 10.12 for the marine boundary layer.
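Mean diurnal cycles such as those in Figure 10.12 are composites: all values observed at a given hour of the day are averaged over the record. A minimal sketch, assuming integer hour-of-day stamps (local or UTC, as long as model and observations use the same convention); the function name and sample values are hypothetical. Applying the same function to model output sampled at the observation times gives a directly comparable model cycle.

```python
import numpy as np


def mean_diurnal_cycle(hours, values):
    """Mean value for each hour of the day (0-23); NaN where there is no data."""
    hours = np.asarray(hours, dtype=int) % 24
    values = np.asarray(values, dtype=float)
    sums = np.bincount(hours, weights=values, minlength=24)
    counts = np.bincount(hours, minlength=24)
    with np.errstate(invalid="ignore"):
        return sums / counts  # 0/0 gives NaN for hours without data


# Hypothetical observations at hours 0, 0, 12, 12, 13
cycle = mean_diurnal_cycle([0, 0, 12, 12, 13], [1.0, 3.0, 10.0, 20.0, 5.0])
```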
Figure 10.12 Mean diurnal cycle of reactive gaseous mercury (RGM) in surface air over the Pacific. Observations from ship cruises (black lines, interquartile range in shading) are compared to model results with two different mechanisms for photochemical oxidation of elemental mercury (Hg0) to RGM. The model with halogen oxidants (red line) features a steeper morning rise than the model with oxidation by OH (blue line) and is more consistent with observations.
Reproduced with permission from Holmes et al. (2009).
Model evaluation on a day-to-day scale is useful to assess the capability of the model to account for synoptic-scale variations in chemistry, boundary layer dynamics, and the advection of different air masses. Figure 10.13 shows the complexity of the day-to-day variation of species concentrations at the surface. In this particular example, which compares calculated and measured concentrations of carbon monoxide (CO) and ozone, the model slightly underestimates the mixing ratio of both species (mean bias) as well as the amplitude of the fluctuations. Further analysis would quantify these differences and assess the overall skill of the model.
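The two features noted for Figure 10.13, a mean bias and an underestimated amplitude of fluctuations, can be quantified as the mean of model minus observation and the ratio of model to observed standard deviation. This is a minimal sketch with a hypothetical function name and hypothetical daily CO values [ppbv]; Section 10.5 discusses a fuller set of metrics.

```python
import numpy as np


def bias_and_amplitude(model, obs):
    """Mean bias (model minus observation) and model/observed std ratio.

    A std ratio below 1 indicates that the model underestimates the
    amplitude of the fluctuations.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    mean_bias = float(np.mean(model - obs))
    std_ratio = float(np.std(model) / np.std(obs))
    return mean_bias, std_ratio


# Hypothetical daily CO [ppbv]: model damps fluctuations by half and is 20 ppbv low
obs_co = np.array([100.0, 200.0, 150.0, 300.0, 250.0])
model_co = 0.5 * (obs_co - obs_co.mean()) + obs_co.mean() - 20.0
mean_bias, std_ratio = bias_and_amplitude(model_co, obs_co)
```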
Figure 10.13 Time evolution of the surface mixing ratios [ppbv] of carbon monoxide (a) and ozone (b) in Shangdianzi, close to Beijing, China in January 2010. The values (red line) provided by the coupled meteorological and chemical regional model (WRF-Chem) are compared with surface measurements (black line). The comparison suggests that the model captures most high-pollution events (high CO concentrations) when the direction of the winds favors transport from pollution sources in the urban and industrial regions of China. During these events, ozone concentrations are generally low, presumably as a result of ozone titration by high concentrations of nitrogen oxides (not shown). During periods characterized by clean air, the model underestimates background carbon monoxide and ozone. Fluctuations associated with the diurnal cycle of boundary layer height are clearly visible in the ozone signal.
Results provided by Idir Bouarar, Max Planck Institute for Meteorology (MPI-M). Measurements are from the Global Atmosphere Watch (GAW).
Seasonal variations provide information on influences from different climatological regimes, photochemistry, and emissions. Plotting simulated and observed mean seasonal cycles can provide a quick and revealing diagnosis of simulation bias. Seasonal amplitude is generally much larger than interannual amplitude for species with lifetimes less than a few months, so that one can usefully compare seasonal cycles in models and observations from different years. As an example, Figure 10.14 compares the seasonal variation of simulated ozone with ozonesonde data for different latitudes and altitudes.
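Comparing seasonal cycles from different years is justified when the seasonal amplitude is much larger than the interannual variability. A minimal sketch of that check, assuming monthly means arranged as an array with one row per year; the function name and the synthetic ozone values are hypothetical.

```python
import numpy as np


def seasonal_vs_interannual(monthly):
    """Seasonal cycle, its amplitude, and interannual variability.

    monthly: array of shape (n_years, 12) of monthly means.
    Returns the climatological seasonal cycle, its peak-to-trough amplitude,
    and the standard deviation of the annual means across years.
    """
    m = np.asarray(monthly, dtype=float)
    climatology = m.mean(axis=0)
    amplitude = float(climatology.max() - climatology.min())
    interannual_std = float(m.mean(axis=1).std())
    return climatology, amplitude, interannual_std


# Hypothetical ozone [ppbv]: a 30 ppbv seasonal swing, years offset by -1, 0, +1
months = np.arange(12)
cycle = 40.0 + 15.0 * np.cos(2 * np.pi * (months - 4) / 12)
monthly = np.array([cycle + offset for offset in (-1.0, 0.0, 1.0)])
climatology, amplitude, interannual_std = seasonal_vs_interannual(monthly)
```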
Figure 10.14 Annual cycle of the ozone mixing ratio [ppbv] at three atmospheric levels (750 hPa, 500 hPa, and 250 hPa) averaged over four latitude bands (90–30° S, 30° S–eq, eq–30° N, 30–90° N). Comparison of multi-year climatological ozonesonde measurements (Logan, 1999; Thompson et al., 2003) with three model simulations by the 3-D chemical transport model of Wild (2007). The differences between the BASE and the IIASA cases result from differences in the emissions of ozone precursors; the differences between the IIASA and ACCENT cases reflect differences in meteorology, model resolution, and the lightning source of NOx.
Reproduced with permission from Wild (2007).
Finally, comparison of observed and simulated interannual variability and long-term trends in species concentrations indicates how well the model accounts for climate modes and trends in emissions. As an example, Figure 10.15 compares simulated and observed multi-year records of NO2 columns in Europe and East China, testing the ability of emission inventories used in models to reproduce the trend of NOx emissions in each region.
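Trends can be compared by fitting a straight line to annual means of the observed and simulated quantities. A minimal least-squares sketch; the function names and the NO2 column values below are hypothetical, chosen so that the “observations” decline while the “model” is flat, as would happen if a model used an emission inventory that missed a decline in NOx emissions. Seasonal cycles and interannual noise are ignored here, and a real analysis would also estimate the trend uncertainty.

```python
import numpy as np


def linear_trend(years, annual_means):
    """Least-squares slope of annual means [units per year]."""
    x = np.asarray(years, dtype=float)
    x = x - x[0]  # shift the time origin to the first year for conditioning
    slope, _intercept = np.polyfit(x, np.asarray(annual_means, dtype=float), 1)
    return float(slope)


def percent_trend(years, annual_means):
    """Trend as percent per year of the record mean."""
    return 100.0 * linear_trend(years, annual_means) / float(np.mean(annual_means))


# Hypothetical NO2 columns [molecules cm-2]: observed decline, flat model
years = np.arange(2007, 2013)
obs_no2 = 5.0e15 - 1.0e14 * (years - 2007)
model_no2 = np.full(years.shape, 5.0e15)
obs_trend = linear_trend(years, obs_no2)
model_trend = linear_trend(years, model_no2)
obs_percent = percent_trend(years, obs_no2)
```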
Figure 10.15 Comparison of calculated and observed seasonal evolution of the NO2 column [cm–2] for Europe (a) and East Asia (b). The black line represents retrievals from the GOME-2 and SCIAMACHY instruments. The numerical simulations are provided by two different models (TM5 and MOZART) with no data assimilation (blue and yellow lines) and by the ECMWF weather forecasting system with coupled chemistry and data assimilation (red line). There is good agreement between model results and observations in Europe, but not in East Asia, specifically during wintertime.
Reproduced with permission from Eskes et al. (2015).
10.4.4 Aerosol Metrics
Aerosol concentrations are characterized in observations by a range of metrics including total mass concentrations for different species, total number concentrations, condensation nuclei (CN) and cloud condensation nuclei (CCN) number concentrations, size distributions (sometimes including speciation), hygroscopicity, aerosol optical depth (AOD), and absorbing aerosol optical depth (AAOD). Single-particle measurements provide additional information on particle phase and on the degree of internal mixing of different aerosol species. All of these measurements are relevant for model evaluation and provide different perspectives on aerosol sources and properties.
Many chemical transport models do not resolve aerosol microphysics and simulate only the speciated aerosol mass concentrations, treating individual aerosol species as chemicals in the model equations and ignoring the microphysical terms. These “mass-only” models can be compared directly to measurements of aerosol mass concentrations to evaluate the simulation of aerosol sources, chemistry, and loss by scavenging. They often assume fixed aerosol size distributions and optical properties for the purpose of simulating heterogeneous chemistry, aerosol radiative effects, and scavenging efficiencies. These can be compared to observations as part of the evaluation of model parameters. The simulation of radiative effects can be evaluated with measurements of AOD and AAOD, as illustrated in Figure 10.16.
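In a mass-only model the optical diagnostics follow directly from the simulated column masses once fixed optical properties are assumed. A minimal sketch, assuming per-species mass extinction efficiencies (MEE) and single-scattering albedos (SSA) at a single wavelength and ignoring hygroscopic growth; the function name, species, and numerical values are illustrative rather than taken from any particular model. The absorbing part of each species' optical depth is its extinction optical depth times (1 − SSA).

```python
import numpy as np


def aod_and_aaod(column_mass, mee, ssa):
    """AOD and absorbing AOD for a mass-only model with fixed optics.

    column_mass: species column masses [g m-2]
    mee:         assumed mass extinction efficiencies [m2 g-1]
    ssa:         assumed single-scattering albedos (dimensionless)
    Hygroscopic growth and wavelength dependence are ignored.
    """
    tau_ext = np.asarray(column_mass, dtype=float) * np.asarray(mee, dtype=float)
    tau_abs = tau_ext * (1.0 - np.asarray(ssa, dtype=float))
    return float(tau_ext.sum()), float(tau_abs.sum())


# Illustrative species: sulfate, black carbon, organic aerosol
total_aod, total_aaod = aod_and_aaod(
    column_mass=[0.04, 0.002, 0.05],
    mee=[8.0, 10.0, 6.0],
    ssa=[1.0, 0.2, 0.97],
)
```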
Figure 10.16 Aerosol optical depths over the Southeast USA in August–September 2013. The figure compares a mass-only aerosol simulation with the GEOS-Chem global model (background grid) to observations from the ground-based AERONET network (circles). Observations are highest in the western part of the region, which the model attributes to a dominant biogenic organic aerosol source.
From Kim et al. (2015).
Models including aerosol microphysics predict the number and size distributions of different aerosol species in addition to their mass. They can simulate the degree of mixing between different aerosol species and interactions with clouds. Such models can be evaluated with the full range of aerosol observations listed above to lend insight into particle nucleation, aerosol optical properties, chemical processes, and cloud effects. Figure 10.17 gives an example of model evaluation with observed size distributions.
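A common way to describe aerosol number size distributions, used by modal microphysics schemes and in fitting measurements (sectional schemes instead use discrete size bins), is a sum of lognormal modes, so that model and measured distributions can be compared as dN/dlog10 D on a common diameter grid. A minimal sketch of that representation, with a hypothetical function name and two hypothetical modes (number concentration in cm–3, median diameter in μm, geometric standard deviation); numerical integration over log10 D recovers the total number concentration as a consistency check.

```python
import numpy as np


def lognormal_dndlogd(diam, modes):
    """Number size distribution dN/dlog10(D) for a sum of lognormal modes.

    diam:  particle diameters, same units as the median diameters
    modes: iterable of (N, Dg, sigma_g): number concentration, geometric
           median diameter, and geometric standard deviation of each mode
    """
    logd = np.log10(np.asarray(diam, dtype=float))
    total = np.zeros_like(logd)
    for n, dg, sg in modes:
        log_sg = np.log10(sg)
        total += (n / (np.sqrt(2 * np.pi) * log_sg)
                  * np.exp(-((logd - np.log10(dg)) ** 2) / (2 * log_sg**2)))
    return total


# Hypothetical Aitken and accumulation modes: N [cm-3], Dg [um], sigma_g
modes = [(3000.0, 0.02, 1.6), (1500.0, 0.12, 1.5)]
diam = np.logspace(-3, 1, 2001)  # 1 nm to 10 um
dndlogd = lognormal_dndlogd(diam, modes)

# Trapezoidal integration over log10(D) recovers the total number
logd = np.log10(diam)
total_number = float(np.sum(0.5 * (dndlogd[1:] + dndlogd[:-1]) * np.diff(logd)))
```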
Figure 10.17 Comparison of the GEOS-Chem global model simulation with full aerosol microphysics to aerosol number size distributions measured at sites in Europe. The focus of comparison is to evaluate different model treatments of secondary organic aerosol (SOA). The BASE simulations assume SOA to be mainly biogenic, while the XSOA simulations include an additional anthropogenic source. The SURF simulations assume SOA formation to be kinetically limited by uptake to aerosol surfaces (irreversible uptake), while the MASS simulations assume SOA to be thermodynamically partitioned between the gas and pre-existing aerosol (reversible uptake). Results show that the best simulation is generally achieved for irreversible uptake including additional anthropogenic SOA. That simulation avoids in particular the overestimate in ultrafine aerosol concentrations, as the additional SOA promotes condensational growth of ultrafine aerosol to larger sizes.
Reproduced with permission from D’Andrea et al. (2013).