El sesgo de los instrumentos de medición. Tests justos

  1. Gómez Benito, Juana
  2. Hidalgo Montesinos, María Dolores
  3. Guilera Ferré, Georgina
Journal:
Papeles del psicólogo

ISSN: 0214-7823 1886-1415

Year of publication: 2010

Issue Title: Metodología al servicio del psicólogo

Volume: 31

Issue: 1

Pages: 75-84

Type: Article

Abstract

Psychological assessment must ensure the equity and validity of interpretations and of any decisions taken as a result of them. This requires bias-free assessment instruments that are capable of evaluating the personal and social needs of individuals with different characteristics. The study of possible bias in tests, or in some of their items, has been highly relevant in psychometric research for the last 30 years, and it will probably remain an important focus of interest for professionals and researchers involved in psychological and educational testing. The aim of this paper is to provide the applied psychologist with background on the concepts of bias, differential functioning and impact, on procedures for detecting item or test bias, and on the evaluation of its possible causes, with a view to improving the validity of psychological measurement.

Bibliographic References

  • Ackerman, T.A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67-91.
  • Allalouf, A., Hambleton, R. K. y Sireci, S. G. (1999). Identifying the causes of DIF in translated verbal items. Journal of Educational Measurement, 36(3), 185-198.
  • American Psychological Association, American Educational Research Association y National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
  • Bolt, D.M. (2002). A Monte Carlo comparison of parametric and nonparametric polytomous DIF detection methods. Applied Measurement in Education, 15(2), 113-141.
  • Camilli, G. y Shepard, L. A. (1994). Methods for identifying biased test items. Newbury Park, CA: Sage.
  • Dorans, N. J., y Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35-66). Hillsdale, NJ: Lawrence Erlbaum.
  • Ferne, T. y Rupp, A. A. (2007). A synthesis of 15 years of research on DIF in language testing: Methodological advances, challenges, and recommendations. Language Assessment Quarterly, 4, 113-148.
  • Fidalgo, A.M. (1994). MHDIF – A computer-program for detecting uniform and nonuniform differential item functioning with the Mantel-Haenszel procedure. Applied Psychological Measurement, 18(3), 300-300.
  • Fidalgo, A. M. (1996). Funcionamiento diferencial de los ítems. En J. Muñiz (Coord.), Psicometría (pp. 370-455). Madrid: Universitas.
  • French, B.F. y Maller, S.J. (2007). Iterative purification and effect size use with Logistic Regression for Differential Item Functioning Detection. Educational and Psychological Measurement, 67, 373-393.
  • Gelin, M.N. y Zumbo, B.D. (2007). Operating characteristics of the DIF MIMIC approach using Jöreskog’s covariance matrix with ML and WLS estimation for short scales. Journal of Modern Applied Statistical Methods, 6, 573-588.
  • Gierl, M. J., y Khaliq, S. N. (2001). Identifying sources of differential item and bundle functioning on translated achievement tests. Journal of Educational Measurement, 38, 164-187.
  • Gómez-Benito, J., e Hidalgo, M.D. (1997). Evaluación del funcionamiento diferencial en ítems dicotómicos: Una revisión metodológica. Anuario de Psicología, 74(3), 3-32.
  • Gómez-Benito, J. e Hidalgo, M.D. (2007). Comparación de varios índices del tamaño del efecto en regresión logística: Una aplicación en la detección del DIF. Comunicación presentada en el X Congreso de Metodología de las Ciencias Sociales y de la Salud, Barcelona, 6-9 febrero.
  • Gómez-Benito, J., Hidalgo, M. D., Padilla, J. L., y González, A. (2005). Desarrollo informático para la utilización de la regresión logística como técnica de detección del DIF. Demostración informática presentada al IX Congreso de Metodología de las Ciencias Sociales y de la Salud, Granada, España.
  • Gómez-Benito, J., y Navas, M.J. (1996). Detección del funcionamiento diferencial del ítem: Purificación paso a paso de la habilidad. Psicológica, 17, 397-411.
  • González, A., Padilla, J.L., Hidalgo, M.D., Gómez-Benito, J. y Benítez, I. (2009). EASY-DIF: Software for analysing differential item functioning using the Mantel-Haenszel and standardization procedures. Applied Psychological Measurement. (Submitted for publication).
  • Hambleton, R.K. (2006). Good practices for identifying differential item functioning. Medical Care, 44(11 Suppl. 3), S182-S188.
  • Hambleton, R.K., y Rogers, H.J. (1995). Item bias review (EDO-TM-95-9). Washington, DC: Clearinghouse on Assessment and Evaluation.
  • Hessen, D.J. (2003). Differential item functioning: Types of DIF and observed score based detection methods. Dissertation (supervisors: G.J. Mellenbergh & K. Sijtsma). Amsterdam: University of Amsterdam.
  • Hidalgo, M. D., y Gómez-Benito, J. (1999). Técnicas de detección del funcionamiento diferencial en ítems politómicos. Metodología de las Ciencias del Comportamiento, 1(1), 39-60.
  • Hidalgo, M. D., y Gómez-Benito, J. (2003). Test purification and the evaluation of differential item functioning with multinomial logistic regression. European Journal of Psychological Assessment, 19(1), 1-11.
  • Hidalgo, M. D., y Gómez-Benito, J. (2010). Education measurement: Differential item functioning. In P. Peterson, E. Baker, & B. McGaw (Eds.), International Encyclopedia of Education (3rd edition). USA: Elsevier Science & Technology.
  • Hidalgo, M.D., Gómez-Benito, J. y Zumbo, B.D. (2008). Efficacy of R-square and Odds-Ratio effect size using Discriminant Logistic Regression for detecting DIF in polytomous items. Paper presented at the 6th Conference of the International Test Commission, 14-16 July, Liverpool, UK.
  • Hidalgo, M. D., y López-Pina, J. A. (2004). Differential item functioning detection and effect size: A comparison between logistic regression and Mantel-Haenszel procedures. Educational and Psychological Measurement, 64(4), 903-915.
  • Holland, P., y Thayer, D. (1988). Differential item performance and the Mantel-Haenszel procedure. En H. Wainer y H. I. Braun (Eds.), Test Validity (pp. 129-145). Hillsdale, NJ: LEA.
  • Jensen, A.R. (1969). How much can we boost IQ and scholastic achievement? Harvard Educational Review, 39(1), 1-123.
  • Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
  • Jöreskog, K.G., y Sörbom, D. (2006). Lisrel 8 (version 8.8). Chicago, Illinois: Scientific Software International, Inc.
  • Mellenbergh, G. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7, 105-118.
  • Messick, S. (1989). Validity. En R. Linn (Ed.). Educational measurement (3rd edition, pp. 13-104). Washington, DC: American Council on Education.
  • Monahan, P.O., McHorney, C.A., Stump, T.E. y Perkins, A.J. (2007). Odds-ratio, Delta, ETS classification, and standardization measures of DIF magnitude for binary logistic regression. Journal of Educational and Behavioral Statistics, 32(1), 92-109.
  • Muñiz, J. (2010). Las teorías de los tests: Teoría Clásica y Teoría de Respuesta a los Ítems. Papeles del Psicólogo, 31(1), 57-66.
  • Muñiz, J., y Hambleton, R.K. (1996). Directrices para la traducción y adaptación de los tests. Papeles del Psicólogo, 66.
  • Muñiz, J., Hambleton, R. K., y Xing, D. (2001). Small sample studies to detect flaws in item translations. International Journal of Testing, 1(2), 115-135.
  • Muthén, L.K., y Muthén, B.O. (1998, 2007). MPLUS statistical analysis with latent variables. User’s Guide. Los Angeles, CA: Muthén and Muthén.
  • Navas-Ara, M. J. y Gómez-Benito, J. (2002). Effects of ability scale purification on the identification of DIF. European Journal of Psychological Assessment, 18(1), 9-15.
  • Oshima, T. C., Raju, N. S. y Nanda, A. O. (2006). A new method for assessing the statistical significance in the differential functioning of items and tests (DFIT) framework. Journal of Educational Measurement, 43, 1-17.
  • Osterlind, S. J., y Everson, H. T. (2009). Differential item functioning (2nd edition). Thousand Oaks, California: Sage Publications, Inc.
  • Penfield, R. D. (2005). DIFAS: Differential Item Functioning Analysis System. Applied Psychological Measurement, 29(2), 150-151.
  • Penfield, R. D., y Lam, T. C. M. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice, 19, 5-15.
  • Potenza, M., y Dorans, N. (1995). DIF assessment for polytomously scored items: A framework for classification and evaluation. Applied Psychological Measurement, 19, 23-37.
  • Prieto, G. y Delgado, A. (2010). Fiabilidad y validez. Papeles del Psicólogo, 31(1), 67-74.
  • Ramsay, J. O. (2000). TestGraph: A program for the graphical analysis of multiple choice and test questionnaire. Unpublished manual.
  • Roussos, L. y Stout, W. (1996). A multidimensionality-based DIF analysis paradigm. Applied Psychological Measurement, 20, 355-371.
  • SPSS 15.0. (2009). SPSS Inc. 1989-2009.
  • Stout, W. y Roussos, L. (1999). Dimensionality-based DIF/DBF package [Computer Program]. William Stout Institute for Measurement. University of Illinois.
  • Swaminathan, H. y Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370.
  • Thissen, D. (2001). IRTLRDIF v2.0b. Software for the computation of the statistics involved in Item Response Theory Likelihood-Ratio Test for Differential Item Functioning. Available on Dave Thissen’s web page.
  • van de Vijver, F., y Leung, K. (1997). Methods and data analysis for cross-cultural research. London: Sage Publications.
  • Waller, N. G. (1998a). EZDIF: Detection of uniform and nonuniform differential item functioning with the Mantel-Haenszel and Logistic Regression procedures. Applied Psychological Measurement, 22, 391.
  • Waller, N.G. (1998b). LINKDIF: Linking item parameters and calculating IRT measures of Differential Item Functioning of Items and Tests. Applied Psychological Measurement, 22, 392.
  • Wang, W.-C., Shih, C.-L. y Yang, C.-C. (2009). The MIMIC method with scale purification for detecting differential item functioning. Educational and Psychological Measurement, 69, 713-731.
  • Zumbo, B. D. (1999). A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modelling as a Unitary Framework for Binary and Likert-type (Ordinal) Item Scores. Ottawa ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.
  • Zumbo, B. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233.
  • Zumbo, B. D. y Gelin, M. N. (2005). A matter of test bias in educational policy research: Bringing the context into picture by investigating sociological / community moderated (or mediated) test and item bias. Journal of Educational Research and Policy Studies, 5, 1-23.
  • Zumbo, B. D., y Thomas, D. R. (1997). A measure of effect size for a model-based approach for studying DIF. Working paper of the Edgeworth Laboratory for Quantitative Behavioral Science, University of Northern British Columbia: Prince George, B.C.