El sesgo de los instrumentos de medición: Tests justos [Bias in measurement instruments: Fair tests]

Gómez Benito, Juana; Hidalgo Montesinos, María Dolores; Guilera Ferré, Georgina

Journal: Papeles del psicólogo

ISSN: 0214-7823, 1886-1415

Year of publication: 2010

Issue title: Metodología al servicio del psicólogo [Methodology at the service of the psychologist]

Volume: 31

Issue: 1

Pages: 75-84

Type: Article

Abstract

Psychological assessments must guarantee the fairness and validity of the interpretations and decisions based on them. This requires instruments that are free of bias and able to assess the personal and social needs of individuals with different characteristics. The study of possible bias in tests, or in some of their items, has held a prominent place in psychometric research over the past 30 years, and it is likely to remain an important focus of interest for professionals and researchers involved in test-based assessment. This paper addresses that perspective by offering applied psychologists guidelines and background knowledge on the concepts of bias, differential functioning, and impact; on procedures for detecting biased items or tests; and on the evaluation of their possible causes, with the overall aim of improving the validity of psychological measurement.

Data source: Dialnet