Imputation strategies and impact indexes to improve the data accuracy from students’ PISA survey
QUINTANO C;
2008-01-01
Abstract
This paper focuses on the implementation and evaluation of imputation procedures applied to the Italian student data from the OECD’s Programme for International Student Assessment (PISA 2003). Analysis of the Italian student data shows widespread item nonresponse. The problem is addressed by developing and implementing five procedures that combine classification algorithms, such as classification and regression trees, with imputation techniques such as iterative and sequential regression and hot deck imputation. The imputation strategies are then evaluated comparatively by comparing the original data with the final data produced by each method. The assessment relies on a set of impact indexes that measure the effect of imputation at two levels: first, the preservation of distributional accuracy (for categorical variables) and of aggregates (for quantitative variables); second, the preservation of bivariate relations before and after imputation. Finally, to summarize the large amount of information produced by the impact indexes and to rank the five imputation approaches, specific measures called “ranking indicators” are developed.
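To make the two ideas in the abstract concrete, here is a minimal sketch of hot deck imputation within imputation classes, followed by one possible index of distributional preservation for a categorical variable. It is an illustration under stated assumptions, not the procedures or indexes of the paper: the function names, the variable names (`school_type`, `books_at_home`), and the toy records are hypothetical. The imputation classes are taken as given, where the paper would derive them from a classification algorithm. The index shown is the total variation distance, chosen here only as a simple example.

```python
import random
from collections import Counter, defaultdict


def hot_deck_impute(records, target, class_key, seed=0):
    """Fill missing `target` values (None) with the value of a randomly
    chosen donor from the same imputation class (same `class_key`).
    Hypothetical helper; classes are assumed to be already defined."""
    rng = random.Random(seed)
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[rec[class_key]].append(rec[target])
    imputed = []
    for rec in records:
        new = dict(rec)  # copy, so the original data stay untouched
        if new[target] is None and donors[new[class_key]]:
            new[target] = rng.choice(donors[new[class_key]])
        imputed.append(new)
    return imputed


def distribution_shift(before, after, target):
    """Total variation distance between the category shares observed
    before imputation and the shares after imputation (0 = unchanged).
    An assumed stand-in for an impact index, not the paper's formula."""
    def shares(recs):
        vals = [r[target] for r in recs if r[target] is not None]
        return {k: c / len(vals) for k, c in Counter(vals).items()}
    p, q = shares(before), shares(after)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


# Toy data (invented for illustration only).
students = [
    {"school_type": "lyceum", "books_at_home": "many"},
    {"school_type": "lyceum", "books_at_home": None},
    {"school_type": "technical", "books_at_home": "few"},
    {"school_type": "technical", "books_at_home": "few"},
    {"school_type": "technical", "books_at_home": None},
]
completed = hot_deck_impute(students, "books_at_home", "school_type")
shift = distribution_shift(students, completed, "books_at_home")
```

Drawing donors only from the same class keeps imputed values consistent with similar respondents. An index close to zero suggests that the imputation left the marginal distribution of the variable largely unchanged.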