How the Under-Reporting Affects the Income Data from a Household Survey
QUINTANO C;
2004-01-01
Abstract
The purpose of this work is to study the quality of the economic information collected by a household survey, with special reference to the accuracy of income data. The well-known difficulty of collecting good-quality income data may derive, on the one hand, from the survey instruments and, on the other hand, from the respondents. The former may fail to record exhaustively the various items that make up total individual or household income. The latter may be reluctant to provide the requested information or, at least, to provide it accurately. For this reason, observed income levels often underestimate actual income; the under-reporting is especially marked for income from self-employment and from financial assets. In this study we propose a two-stage method for evaluating and correcting the underestimation in income data, drawing on the rich information content of the Survey of Household Income and Wealth conducted by the Bank of Italy in 2000. In the first stage, only information from the sample survey is used to identify the income values that, being too low, are strongly suspected of underestimation: a total household income is considered too low if the corresponding equivalised income (which takes household size and composition into account) does not exceed a threshold defined as a function of the first quartile of the distribution. Low income values are deleted and replaced with newly estimated values from a random regression imputation procedure: the imputed income is the predicted value from a linear model based on a set of explanatory variables correlated with income, including consumption expenditure, the number of earners and the characteristics of the head of household.
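As a rough illustration of the first stage, the sketch below flags low equivalised incomes and replaces them by random regression imputation. It is not the authors' implementation, and several details are assumptions the abstract does not specify: the modified OECD equivalence scale, the threshold form `k * Q1` with a hypothetical multiplier `k`, a single predictor (consumption expenditure) instead of the full set of explanatory variables, and a random term drawn from the empirical residuals of the non-flagged households. The function names and dictionary keys are likewise hypothetical.

```python
import random
import statistics


def equivalised(income, adults, children):
    # ASSUMPTION: modified OECD scale (1.0 first adult, 0.5 per further
    # adult, 0.3 per child); the abstract does not name the scale used.
    return income / (1.0 + 0.5 * (adults - 1) + 0.3 * children)


def flag_low(households, k=1.0):
    # ASSUMPTION: threshold = k * first quartile; the abstract only says
    # the threshold is "a function of the first quartile".
    eq = [equivalised(h["income"], h["adults"], h["children"]) for h in households]
    q1 = statistics.quantiles(eq, n=4)[0]
    return [e <= k * q1 for e in eq]


def fit_ols(x, y):
    # Ordinary least squares with one predictor (x must not be constant).
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def impute_low_incomes(households, k=1.0, seed=0):
    rng = random.Random(seed)
    low = flag_low(households, k)
    # Fit the model on households whose income is not flagged.
    donors = [h for h, is_low in zip(households, low) if not is_low]
    x = [h["consumption"] for h in donors]
    y = [h["income"] for h in donors]
    intercept, slope = fit_ols(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    result = []
    for h, is_low in zip(households, low):
        income = h["income"]
        if is_low:
            # Predicted value plus a randomly drawn empirical residual.
            income = intercept + slope * h["consumption"] + rng.choice(residuals)
        result.append({**h, "income": income, "imputed": is_low})
    return result
```

In practice the fit would be repeated separately within each imputation class, and the model would include the other explanatory variables mentioned in the text.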
The imputation is carried out within imputation classes: to account for heterogeneity in economic status, households are grouped on the basis of their sources of income, the labour market position of their members and the presence of a spouse and children. The second stage consists in comparing the average incomes from the survey after imputation with the average incomes from an external source, the National Accounts estimates. For this purpose we refer to the classification of income by source: wages and salaries, income from self-employment, pensions and other transfers, and property income. To make the two sources comparable, their definitions and concepts must be harmonised; in particular, the net wages and salaries from the survey must be converted into gross values by estimating taxes and welfare contributions. From this comparison a correction coefficient is derived that increases the incomes from the survey and makes them consistent with the aggregate estimates from the National Accounts. The improvement in data quality at the end of the procedure is evaluated through the changes, at each stage, in the main statistics of the income distribution and in the main inequality indicators.
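The second stage can be sketched in the same spirit. The abstract does not give the formula for the correction coefficient, so the example assumes the simplest reading: for each income source, the coefficient is the ratio of the National Accounts average to the survey average, and every survey income from that source is multiplied by it. The function names and the figures in the usage note are purely illustrative, and the net-to-gross conversion of wages is omitted.

```python
from statistics import fmean


def correction_coefficients(survey_by_source, na_means):
    # ASSUMPTION: coefficient = National Accounts mean / survey mean,
    # computed separately for each income source.
    return {
        source: na_means[source] / fmean(values)
        for source, values in survey_by_source.items()
    }


def apply_correction(survey_by_source, coefficients):
    # Scale every survey income by the coefficient of its source.
    return {
        source: [coefficients[source] * value for value in values]
        for source, values in survey_by_source.items()
    }
```

For example, with made-up survey incomes `{"wages": [100, 200, 300], "self_employment": [50, 150]}` and made-up National Accounts means of 220 and 200, the survey means are 200 and 100, so the coefficients are about 1.1 and 2.0. After correction the survey averages match the external averages by construction. The larger coefficient for self-employment in this toy example mirrors the heavier under-reporting the text attributes to that source.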