• Micro-based data sets, which calculate inequality measures directly from household surveys.9 These include: CEPALSTAT, by the UN Economic Commission for Latin America and the Caribbean (ECLAC), which provides income distribution estimates for Latin American countries and is computed by ECLAC based on the micro-data transmitted by statistical offices in the region; the Standard Indicators of the Commitment to Equity (CEQ) Institute, Tulane University; the income distribution estimates underpinning the EUROMOD microsimulation model (developed by the University of Essex); the OECD Income Distribution Database (IDD), which provides indicators and semi-aggregated tables computed by national contact points in member countries based on common definitions and treatments; the micro-data files on the distribution of income and wealth provided by the Luxembourg Income Study (LIS); the Socio-Economic Database for Latin America and the Caribbean (SEDLAC), compiled by the Center of Distributive, Labor and Social Studies at Universidad Nacional de La Plata and the World Bank; and PovcalNet (World Development Indicators, World Bank).10
• Secondary Sources Data Sets, which combine inequality indicators from a variety of other sources, typically from household surveys: these include the All the Ginis (ATG); the GINI Project; and the World Income Inequality Database (WIID, UNU-WIDER) (see Atkinson and Brandolini, 2001).
• Imputation and Statistical Inference-Based Data Sets. This type of data set generates inequality measures through a variety of imputation and statistical inference methods instead of relying directly on household surveys or unit-record data sets. These include the Global Consumption and Income Project (GCIP); the Standardized World Income Inequality Database (SWIID);11 and the University of Texas Income Inequality Project (UTIP).
• Finally, there is the World Inequality Database (WID.world) launched in January 2017, whose precursor was the World Top Incomes Database (WTID) (Alvaredo et al., 2015a). Unlike the other data sets, WID.world mainly uses information from tax returns to estimate the share of income earned by certain groups at the top of the distribution (such as the richest 1% or 0.5% of the population) and grosses up the income totals to match their National Accounts equivalents. WID.world includes series on income inequality for more than 30 countries, spanning most of the 20th and early 21st centuries, with over 40 additional countries now under study. The database was recently extended to study the long-run evolution of top wealth shares (Saez and Zucman, 2016; Alvaredo, Atkinson, and Morelli, 2016; and Garbinti, Goupille-Lebret, and Piketty, 2017). The key feature of WID.world is that it combines fiscal data (tax data, in particular), survey data, and national accounts data in a systematic manner. This characteristic sets it apart from the other data sets that rely on survey data almost exclusively, and from the data sets that rely on imputations or statistical inference. As stated on its website: “The overall long-run objective of WID.world is to be able to produce Distributional National Accounts (DINA), that is, to provide annual estimates of the distribution of income and wealth using concepts of income and wealth that are consistent with the macro-economic national accounts.”12 Chapter 6 of this volume discusses in detail the proposed methodology to accomplish this objective.
The above data sets differ in a number of ways. First, and most obviously, they differ in their geographical coverage, and hence in the quality of the underlying national data feeding them. Second, they differ in the individual welfare metric used: because household surveys in most of the developing world are consumption-based, the existing data sets that are global in reach report consumption inequality for most developing and emerging countries, and income inequality for advanced countries and Latin America. Third, for advanced countries, economic inequality is typically measured based on equivalized income (i.e., the income streams of all household members are pooled, and the total is then attributed to each member after an “adjustment” that reflects differences in needs across households of different size and structure), while in the rest of the world per capita consumption or income is used. Fourth, while in principle the income variable should be disposable income (i.e., income after direct taxes and current transfers), this is often not clear in developing countries’ data, where it is often difficult to establish whether the reported income is net or gross of direct taxes, or pre- or post-transfers. Likewise, while income or consumption should include consumption of goods produced for own consumption and the imputed rent of owner-occupied housing, in practice this is generally not the case and, in some cases, it is hard to tell.
Lastly, the databases differ on whether adjustments (and which ones) are made to the micro-data to correct for under-reporting, to eliminate outliers, or to address missing responses.13 While in most OECD countries such adjustments and data cleaning are performed by the statistical offices themselves before the data are made available to users, this practice is far less common in low- and middle-income countries. As a result, the international data sets with broad geographic coverage often rely on adjustments implemented by the agency responsible for the secondary data, or on data that are not adjusted for item nonresponse. For the data sets that use imputation methods or statistical inference, results are sensitive to the methods used, and full information on the characteristics of the underlying data is often unavailable even when the methods are described with care (which is not always the case).
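To illustrate why the choice between equivalized and per capita measures matters, the following minimal sketch computes a person-level Gini coefficient under both conventions. The household figures are hypothetical, and the square-root equivalence scale is an assumption made for illustration only; the data sets discussed above may use other scales.

```python
from math import sqrt


def gini(values):
    """Gini coefficient of a list of non-negative values (0 = full equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def person_level(households, scale):
    """Attribute adjusted household income to every household member."""
    out = []
    for income, size in households:
        out.extend([income / scale(size)] * size)
    return out


# Hypothetical households: (pooled household income, number of members).
households = [(20000, 1), (36000, 4), (50000, 2), (90000, 5)]

per_capita = person_level(households, lambda n: n)  # divide by household size
equivalized = person_level(households, sqrt)        # assumed square-root scale

print("Gini, per capita: ", round(gini(per_capita), 4))
print("Gini, equivalized:", round(gini(equivalized), 4))
```

The same households yield different Gini coefficients depending on the adjustment, which is one reason why levels reported by different data sets are not directly comparable.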
Given the differences in definitions and methods across data sets, analyses can yield conflicting pictures of economic inequality, both in terms of levels and trends, depending on the data set used (see Bourguignon, 2015b; Ferreira, Lustig, and Teles, 2015; Gasparini and Tornarolli, 2015; Jenkins, 2015; Ravallion, 2015; Smeeding and Latner, 2015; and Wittenberg, 2015). For example, in the case of sub-Saharan Africa (SSA) and its inequality dynamics over the 1990s and 2000s, the IMF Fiscal Monitor (2012, p. 51) suggests that in 11 out of 16 SSA countries inequality had fallen between 1985–95 and 2000–10. However, as shown in Table 3.1, when these figures are compared with the World Bank PovcalNet inequality trends for the same countries, not only the levels but, more importantly, also the direction of change are sensitive to the choice of data set. Matters get even more complicated if we draw on other data sets as well.
Furthermore, important questions such as whether or not economic inequality has converged across countries in the world—the finding that income inequality has fallen in what had been highly unequal countries, and risen in countries that had been more egalitarian (Benabou, 1996; Bleaney and Nishiyama, 2003; and Ravallion, 2003)—are affected by the choice of data set. As shown in Lustig and Teles (2016), different data sets frequently produce different results in terms of inequality convergence, even when the countries, welfare concept, inequality metric, and time period are the same.
Table 3.1. Change in Inequality 1985–95 to 2000–10
Country          IMF Fiscal Monitor   PovcalNet Average
Côte d’Ivoire            5.0                  6.5
Ghana                    2.4                  6.3
Kenya                   -6.2                 -2.1
Madagascar              -1.0                  0.2
Niger                   -6.2                  0.4
Senegal                 -7.8                 -7.6
Tanzania                -3.1                  2.3
Zambia                 -13.5                 -3.5
Note: Change in inequality is measured as the percent change in the Gini coefficient between two points in time.
Source: Author, based on Table 5 in Ferreira, F.H.G., N. Lustig, and D. Teles (2015), “Appraising cross-national income inequality databases: An introduction,” in Ferreira, F.H.G. and N. Lustig, “Appraising cross-national income inequality databases,” special issue, Journal of Economic Inequality, Vol. 13(4), pp. 497–526. StatLink: http://dx.doi.org/10.1787/888933839506.
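The comparison in Table 3.1 can be sketched in a few lines. The figures below are copied from the table; the helper names (`pct_change`, `disagree`) are illustrative and not part of either source.

```python
def pct_change(gini_start, gini_end):
    """Percent change in the Gini coefficient between two points in time."""
    return 100.0 * (gini_end - gini_start) / gini_start


# Percent change in Gini, 1985-95 to 2000-10, from Table 3.1:
# country -> (IMF Fiscal Monitor, PovcalNet average)
changes = {
    "Côte d'Ivoire": (5.0, 6.5),
    "Ghana": (2.4, 6.3),
    "Kenya": (-6.2, -2.1),
    "Madagascar": (-1.0, 0.2),
    "Niger": (-6.2, 0.4),
    "Senegal": (-7.8, -7.6),
    "Tanzania": (-3.1, 2.3),
    "Zambia": (-13.5, -3.5),
}

# Countries where the two sources disagree on the direction of change.
disagree = [c for c, (imf, pov) in changes.items() if (imf > 0) != (pov > 0)]

print(disagree)  # ['Madagascar', 'Niger', 'Tanzania']
```

For three of the eight countries listed, the two sources point in opposite directions, and even where the sign agrees the magnitudes can differ widely (e.g., Zambia).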
Assessments of fiscal redistribution are also sensitive to the choice of data sets. Figure 3.1 shows the difference between the Gini coefficients for disposable (i.e., net) incomes and for market incomes for the same survey and country, as estimated both by CEQ (which calculates them through a detailed fiscal incidence analysis, validated by local experts and through a series of robustness checks) and SWIID (where all data points are estimated through multiple imputation methods using whichever data are available from other sources as the basis for the “rectangularization”). While discrepancies between the two sources are not systematic (i.e., sometimes SWIID’s estimate of redistribution is higher and sometimes lower than CEQ’s), they can be quite large (e.g., Guatemala and Indonesia) or contradictory (e.g., Armenia, where taxes and benefits are unequalizing according to SWIID, meaning that net income inequality is higher than market income inequality, but equalizing according to CEQ).14
Figure 3.1. Fiscal Redistribution: Change in Gini from Two Databases
Note: Difference in Gini points. CEQ’s Disposable Income is equivalent to SWIID’s Net Income, i.e., market income after taxes and government cash transfers for the scenario that considers contributory pensions as government transfers. Based on Younger and Khachatryan (2014) in the case of Armenia; Paz Arauco et al. (2014) for Bolivia; Higgins and Pereira (2014) for Brazil; Sauma and Trejos (2014) for Costa Rica; Beneke, Lustig, and Oliva (2018) for El Salvador; Hill et al. (2017) for Ethiopia; Cabrera, Lustig, and Moran (2015) for Guatemala; Afkar, Jellema, and Wai-Poi (2017) for Indonesia; Scott (2014) for Mexico; Jaramillo (2014) for Peru; Inchauste et al. (2017) for South Africa; and Bucheli et al. (2014) for Uruguay. For both data sources, contributory pensions were classified as a government transfer (CEQ has estimates for pensions as deferred income, i.e., as part of market income, as well). Comparisons for Bolivia, Brazil, Peru, and Uruguay refer to data collected in 2009; for Costa Rica, Guatemala, Mexico, and South Africa to 2010; for Armenia and El Salvador to 2011; for Indonesia to 2012. The comparison for Ethiopia is made with the CEQ estimate for 2011 and the SWIID estimate for 2010.
Source: CEQ Institute Data Center on Fiscal Redistribution (http://commitmentoequity.org/datacenter) and SWIID: V 5.0 database. StatLink: http://dx.doi.org/10.1787/888933839487.
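The quantity plotted in Figure 3.1 is the change in the Gini coefficient when moving from market to disposable income. A minimal sketch of that calculation follows. The incomes, the flat tax rate, and the equal per-person transfer are all hypothetical; this is not the fiscal incidence method used by CEQ, nor the imputation procedure used by SWIID.

```python
def gini(values):
    """Gini coefficient of a list of non-negative values (0 = full equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def fiscal_redistribution(market, disposable):
    """Change in Gini points (0-100 scale): disposable minus market.

    A negative value means taxes and transfers are equalizing.
    """
    return 100.0 * (gini(disposable) - gini(market))


# Hypothetical market incomes for five individuals.
market = [0.0, 5000.0, 12000.0, 30000.0, 80000.0]

# Assumed policy: a 20% flat tax, rebated as an equal transfer to everyone.
TAX_RATE = 0.20
transfer = TAX_RATE * sum(market) / len(market)
disposable = [(1 - TAX_RATE) * y + transfer for y in market]

change = fiscal_redistribution(market, disposable)
print(round(change, 2), "equalizing" if change < 0 else "unequalizing")
```

Under this sign convention, the Armenia case described above corresponds to a positive change in SWIID and a negative change in CEQ for the same survey.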
The above discussion makes clear that many of the limitations of the international databases stem from the limitations of their main input: country-level household surveys. We turn to this issue in the next section.
Household Surveys: Data Challenges
The overwhelming majority of analysis on income, consumption, and wealth inequality over the last four decades has been based (directly or indirectly) on household surveys, the main data source for research on distribution. While data availability, coverage, and quality have improved relative to 2009 when the Stiglitz-Sen-Fitoussi report was released, there are still a number of important issues to be resolved. Furthermore, the problems faced by high-income countries in measuring distribution of economic well-being are orders of magnitude larger in poorer and middle-income countries, where surveys are undertaken infrequently (if at all), generally based on different welfare metrics (either income or consumption), with potentially inadequate and outdated sampling frames, and often with large rates of nonresponse (see, for instance, Ferreira, Lustig, and Teles, 2015).
Most OECD countries undertake regular (annual, sometimes every 2 or 3 years) collections of income distribution data, based on household surveys or registers that started in the 1980s or 1990s. Household budget surveys are undertaken in OECD countries around every 5 years, typically based on diaries that households use to record the value of their consumption expenditures.15 Even in advanced OECD countries, however, there are important challenges in terms of coverage of various income streams (e.g., imputed rents) or asset types (e.g., pension wealth or the stock of consumer durables), of frequency of data collections, and of timeliness of the resulting estimates, which in many countries lag releases of GDP data by years. In these areas, despite the many initiatives that have been taken by statistical offices since 2009, we are still far from the objective of feeding policy discussion with income distribution data that are as timely as conventional measures of quarterly GDP growth.
The picture of data availability is different in the developing countries. The number of low- and middle-income countries with household surveys has increased dramatically since 1990. For instance, the World Bank estimate of extreme poverty in 1990 was based on data for only 22 countries. The data in the World Bank’s PovcalNet presently cover 153 countries, of which 34, as of July 2013, are classified as High Income (Atkinson, 2016).16 However, lack of data is still a problem. In the Middle East and North Africa region, where there are 19 countries, only around half are covered by PovcalNet. Furthermore, according to World Bank (2016), the largest possible set of countries for which at least two comparable data points are available between 2008 and 2013 was 83 countries. This set covered 75% of the world’s population but fewer than half of the world’s countries; population coverage was 94% in the East Asia and Pacific region but only 23% in sub-Saharan Africa.17,18 Even if surveys exist, in many countries governments still restrict access to the micro-data, a factor that limits the ability of independent researchers to carry out an analysis of their own.
A second problem is that, with exceptions, household surveys collect data on either income or consumption, which significantly limits the possibility of undertaking the joint analysis of both variables and rigorous cross-country comparisons. Of the 83 countries included in World Bank (2016), for example, 34 contained consumption data and 49 contained income data. The latter included primarily OECD countries and Latin America. If OECD high-income countries are excluded, of the 1,165 data sets included in the World Bank’s PovcalNet database, 41% (59%) were income- (consumption-) based (Table 3.2). While the distribution of income—if income is properly measured—may closely mirror that of consumption expenditures in countries at low levels of economic development, that assumption becomes less tenable as countries develop and household saving rates increase, casting doubts on the practice of combining measures of income and consumption inequalities as if they were describing the same underlying phenomenon.19
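The point about saving rates can be illustrated with a small numerical sketch. The incomes and saving rates below are hypothetical and chosen only to show the mechanism: when saving rates rise with income, consumption is more equally distributed than income, so the two Gini coefficients measure different things.

```python
def gini(values):
    """Gini coefficient of a list of non-negative values (0 = full equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical incomes and saving rates that rise with income.
income = [1000.0, 2000.0, 4000.0, 8000.0, 16000.0, 32000.0]
saving_rate = [0.00, 0.02, 0.05, 0.10, 0.20, 0.30]

# Consumption is income net of saving.
consumption = [y * (1 - s) for y, s in zip(income, saving_rate)]

# With no saving at all, consumption equals income and the Ginis coincide.
no_saving = [y * (1 - 0.0) for y in income]

print("Gini, income:     ", round(gini(income), 4))
print("Gini, consumption:", round(gini(consumption), 4))
print("Gini, no saving:  ", round(gini(no_saving), 4))
```

In this sketch the consumption Gini is lower than the income Gini, so pooling income-based and consumption-based surveys would understate inequality in the consumption-based countries relative to the income-based ones.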
While there are international conventions and standards for measuring income distribution (first articulated in the 2001 Canberra Group Handbook, codified in the 2003 standards adopted by the International Conference of Labour Statisticians, and brought up to date with the 2011 revision of the Canberra Handbook), important issues—such as the systematic under-reporting of incomes at both extremes—subsist.
Table 3.2. Income and Consumption Distributions in PovcalNet (number of data sets)
Note: This table excludes distributions from high-income countries available in the LIS and/or other databases.
Source: Ferreira, F.H.G. et al. (2016), “A global count of the extreme poor in 2012: Data issues, methodology and initial results,” Journal of Economic Inequality, Vol. 14(2), pp. 141–172. StatLink: http://dx.doi.org/10.1787/888933839525.
Second, while the World Bank’s Living Standards Measurement Surveys (LSMS) use a series of guidelines to measure household consumption, no international conventions or guidelines exist in this field. Frequently, the instruments that are used to collect micro-level data on consumption expenditures (household budget surveys) are conducted with the main goal of deriving average weights for the consumer price index rather than to assess household economic well-being. While deemed easier to implement than income surveys in less developed countries where informality is widespread, the comparability of these estimates is affected by factors such as the length of the reference period considered, and the length of the list of items that households are asked to report (Beegle et al., 2012). These deficiencies led the Commission on Global Poverty to include developing a set of statistical standards for household consumption as one of its key recommendations (Atkinson, 2016).
Third, international guidelines on measuring the distribution of household wealth have yet to go through a similar process of convention-setting by an international body in charge of setting standards globally.20
Finally, even when measures exist on the distribution of household income, consumption, and wealth, very few countries undertake these data collections in ways that would allow the joint distribution of household income, consumption, and wealth to be analyzed in a coherent way, one of the key recommendations of the Stiglitz-Sen-Fitoussi report.21
Even when international standards and guidelines exist, however, countries’ data collections may adhere to them to different degrees, implying that some items are available and included in measured household income and consumption for some countries (e.g., imputed rents, taxes paid, and agricultural goods produced for own consumption) but not for others.22 In the best of cases, the income or consumption concept reported in household surveys corresponds to what the Canberra convention would describe as “disposable income” and “final consumption expenditures,” but not all countries are able to adequately measure these concepts.23 Additionally, there is evidence that the problems related to unit nonresponse, item nonresponse, and measurement errors in household surveys have increased over time (Groves et al., 2009; Meyer, Mok, and Sullivan, 2015).
Although there are countries for which long historical series on the distribution of wealth from a variety of administrative registries exist, survey-based data collections on the distribution of household wealth are much more recent than those for income, and the available data are significantly less comparable across countries than income data, mainly on account of the different capacity of surveys to capture developments at the top end of the distribution. Wealth distribution data are available, with varying degrees of quality, for the United States (based on the Survey of Consumer Finances), the United Kingdom (based on the Wealth and Assets Survey), countries in the euro area (through the Household Finance and Consumption Survey coordinated by the European Central Bank), as well as for Australia, New Zealand, Canada, China, Indonesia, Norway, Korea, Japan, and Chile.24 As discussed in Chapter 4 of this volume, there is also a series of new initiatives to measure the distribution of wealth by gender. In the case of the distribution of wealth, survey estimates are even more likely to be inaccurate, simply because wealth is much more unequally shared than income, so that all the problems associated with estimating the shares of small groups of top wealth holders are exacerbated.