Educational Sciences: Theory & Practice

ISSN: 2630-5984

Effect of Differential Item Functioning on Test Equating

Kübra Atalay Kabasakal
Department of Educational Sciences, Faculty of Education, Hacettepe University, Ankara, Turkey
Hülya Kelecioğlu
Department of Educational Sciences, Faculty of Education, Hacettepe University, Ankara, Turkey

Abstract

This study examines the effect of items with differential item functioning (DIF) on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performance of three equating models was investigated under 24 simulation conditions; the manipulated variables were sample size, test length, DIF magnitude, and test type. The MIRMs, in which the DIF effects were added as model parameters, were compared with the Stocking–Lord (SL) method, an IRM-based separate calibration method, and with the concurrent calibration method. The results revealed differences in the performance of the methods across the analyzed conditions. Specifically, the MIRMs were able to identify the DIF items, carry out the equating process, and eliminate the bias caused by DIF within a single analysis. This does not mean, however, that MIRMs are always the best approach: increases in sample size and test length generally improved IRM-based equating, whereas the MIRMs were less affected by these two conditions. Among the IRM-based methods, separate calibration was more affected by the presence of DIF items than concurrent calibration, and this effect was most pronounced when the DIF items appeared in the common (anchor) test and the DIF magnitude was at level C (large DIF).

Keywords
Test equating, Differential item functioning, Equating error, Equating bias, Multilevel item response models, Hierarchical Rasch model.
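
Note: As a minimal sketch of what "adding DIF effects as model parameters" can look like, the formulation below augments a Rasch model with a group-by-item interaction term. This is a generic illustration in the spirit of multilevel (hierarchical) Rasch models, not the authors' exact specification; the symbols \( \gamma_i \) and \( G_p \) are introduced here for illustration only.

\[
\operatorname{logit} \Pr(X_{pi} = 1) \;=\; \theta_p \;-\; b_i \;+\; \gamma_i G_p, \qquad \theta_p \sim \mathcal{N}(0, \sigma^2),
\]

where \( \theta_p \) is the ability of person \( p \), \( b_i \) the difficulty of item \( i \), \( G_p \) a group indicator (0 = reference, 1 = focal), and \( \gamma_i \) the DIF parameter of item \( i \). For DIF-free items \( \gamma_i = 0 \), so estimating the \( \gamma_i \) alongside the item difficulties lets a single calibration flag DIF items and equate the forms at the same time, which is the property the abstract refers to.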