Abstracts
Résumé
Certains étudiants peuvent répondre au hasard ou être inattentifs dans une situation de testing. Plusieurs approches ont déjà été développées pour détecter ce type de réponse. Parmi celles-ci, l’utilisation d’indices de détection (person-fit indexes) de patrons de réponses inappropriés est l’approche qui est la plus étudiée et qui semble la plus prometteuse. Dans le cadre de cette étude, nous nous concentrons sur trois indices de détection populaires qui présentent des caractéristiques permettant d’en faciliter l’interprétation : lz, ZU et ZW. Des études antérieures ont montré que ces trois indices sont fortement affectés par le fait que l’habileté d’un étudiant est estimée plutôt que réelle. Snijders (2001) a proposé une version corrigée de l’indice lz (nommée lz*) afin de tenir compte de cette difficulté. Magis, Béland et Raîche (2014) ont déjà corrigé deux autres indices selon l’approche de Snijders : U* et W*. Il reste cependant à analyser plus en détail le comportement des indices corrigés lz*, U* et W* et des indices standardisés lz, ZU et ZW. Pour ce faire, nous effectuons deux études selon différentes valeurs de l’habileté, soit une analyse des erreurs de type I des indices (probabilité de se tromper en identifiant un patron de réponses inapproprié) et une analyse de leur puissance de détection. Ces analyses permettront de démontrer que ce sont généralement les indices corrigés lz* et W* qui sont les plus intéressants à utiliser puisque leurs scores suivent approximativement la loi normale et qu’ils permettent de bien détecter la réponse au hasard et l’inattention.
Mots-clés :
- théorie de la réponse à l’item,
- indice de détection de patrons de réponses inappropriés,
- réponse au hasard,
- inattention
Abstract
Some students may respond at random or be inattentive in a testing situation. Several approaches have been developed to detect these types of behavior. Among them, the use of person-fit indices is the most studied approach and appears to be the most promising. In this study, we focus on three popular indices whose properties make them easy to interpret: lz, ZU, and ZW. Previous studies have shown, however, that these three indices are strongly affected by the fact that a student’s ability is estimated rather than known. Snijders (2001) proposed a corrected version of the lz index (named lz*) to account for this problem. Magis, Béland, and Raîche (2014) applied Snijders’s correction to two other indices, yielding U* and W*. The behavior of the corrected indices lz*, U*, and W* and of the standardized indices lz, ZU, and ZW nevertheless remains to be analyzed in more detail. To do so, we conduct two studies across different values of ability: an analysis of the indices’ type I error rates (the probability of wrongly flagging a response pattern as inappropriate) and an analysis of their detection power. The results show that the corrected indices lz* and W* are generally the most useful, because their scores approximately follow the normal distribution and they adequately detect random responding and inattention.
Keywords:
- item response theory,
- person-fit index for inappropriate response patterns,
- guessing at random,
- inattention
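As an illustration of the kind of index discussed in the abstract, the following is a minimal sketch of the standardized log-likelihood index lz under the Rasch model. It relies on the usual textbook definition (log-likelihood of the response pattern, centered on its expectation and scaled by its standard deviation) and treats the ability `theta` as known, which is precisely the simplification that the corrected index lz* is meant to relax. The function names `rasch_prob` and `lz_index`, the item difficulties, and the response patterns are hypothetical and chosen only for the example; this is not the authors' implementation.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def lz_index(responses, theta, difficulties):
    """Standardized log-likelihood person-fit index (lz).

    Assumes dichotomous responses (0/1) and a known ability `theta`.
    No correction for an estimated ability is applied here.
    """
    log_lik = 0.0   # observed log-likelihood of the pattern
    expected = 0.0  # its expectation under the model
    variance = 0.0  # its variance under the model
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        q = 1.0 - p
        log_lik += x * math.log(p) + (1 - x) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    # The variance is zero only if every item difficulty equals theta.
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical five-item test, items ordered from easiest to hardest.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
consistent = [1, 1, 1, 0, 0]  # easy items right, hard items wrong
aberrant = [0, 0, 1, 1, 1]    # easy items wrong, hard items right

lz_consistent = lz_index(consistent, 0.0, difficulties)
lz_aberrant = lz_index(aberrant, 0.0, difficulties)
```

With these assumed values, the model-consistent pattern yields a positive lz and the reversed pattern a negative one; large negative values are the ones usually read as signs of an inappropriate response pattern.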
Resumo
Alguns alunos podem responder de forma aleatória ou desatenta numa situação de teste. Várias abordagens têm sido utilizadas para detetar este tipo de resposta. A utilização de índices de deteção (person-fit indexes) de padrões de respostas inapropriadas é a abordagem mais estudada e a que parece ser mais promissora. Neste estudo, concentramo-nos em três índices de deteção populares que apresentam características que permitem facilitar a interpretação: lz, ZU e ZW. Estudos anteriores demonstraram que estes três índices são fortemente afetados pelo facto de que a habilidade de um aluno é estimada em vez de real. Snijders (2001) propôs uma versão corrigida do índice lz (denominada lz*) para ter em conta esta dificuldade. Magis, Béland e Raîche (2014) já corrigiram dois outros índices segundo a abordagem de Snijders: U* e W*. Resta, porém, analisar mais em detalhe o comportamento dos índices corrigidos lz*, U* e W* e dos índices padronizados lz, ZU e ZW. Para fazer isso, realizámos dois estudos usando diferentes valores de habilidade, a saber, uma análise dos erros do tipo I dos índices (probabilidade de estar errado na identificação de respostas inapropriadas) e uma análise do seu poder de deteção. Estas análises permitirão demonstrar que são geralmente os índices corrigidos lz* e W* que são os mais interessantes, uma vez que as suas pontuações seguem aproximadamente a lei normal e permitem detetar adequadamente a resposta ao acaso ou desatenta.
Palavras-chave:
- Teoria da resposta ao item,
- índice de deteção de padrões de respostas inadequadas,
- respostas ao acaso,
- desatenção
Appendices
Bibliography
- Adams, R. L., & Wu, M. L. (2007). The mixed-coefficients multinomial logit model: A generalised form of the Rasch model. In M. von Davier & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models (pp. 57-75). New York, NY: Springer.
- Al-Mahrazi, R. (2003). Investigating a new modification of the residual-based person fit index and its relationship with other indices in dichotomous item response theory (Unpublished doctoral dissertation). University of Iowa, Iowa City, IA.
- Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York, NY: Dekker.
- Bertrand, R., & Blais, J.-G. (2004). Modèle de mesure : l’apport de la théorie de la réponse aux items. Sainte-Foy, Québec : Presses de l’Université du Québec.
- De la Torre, J., & Deng, W. (2008). Improving person fit assessment by correcting the ability estimate and its reference distribution. Journal of Educational Measurement, 45, 159-177. doi: 10.1111/j.1745-3984.2008.00058.x
- Drasgow, F., Levine, M. V., & McLaughlin, M. E. (1987). Detecting inappropriate test scores with optimal and practical appropriateness indices. Applied Psychological Measurement, 11, 59-79. doi: 10.1177/014662168701100105
- Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polytomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38, 67-86. doi: 10.1111/j.2044-8317.1985.tb00817.x
- Emons, W. H. M., Glas, C. A. W., Meijer, R. R., & Sijtsma, K. (2003). Person fit in order-restricted latent class models. Applied Psychological Measurement, 27, 459-478. doi: 10.1177/0146621603259270
- Glas, C. A. W., & Meijer, R. R. (2003). A Bayesian approach to person-fit analysis in item response theory models. Applied Psychological Measurement, 27, 217-233. doi: 10.1177/0146621603027003003
- Hambleton, R. K., & Swaminathan, H. (1985). Fundamentals of item response theory. Newbury Park, CA: SAGE Publications.
- Hendrawan, I., Glas, C. A. W., & Meijer, R. R. (2005). The effect of person misfit on classification decisions. Applied Psychological Measurement, 29, 26-44. doi: 10.1177/0146621604270902
- Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics. Applied Measurement in Education, 16, 277-298. doi: 10.1207/S15324818AME1604_2
- Kogut, J. (1987). Detecting aberrant item response patterns in the Rasch model. Research report No. 87-3. Enschede, Netherlands: University of Twente.
- Levine, M. V., & Drasgow, F. (1982). Appropriateness measurement: Review, critique and validating studies. British Journal of Mathematical and Statistical Psychology, 35, 42-56. doi: 10.1111/j.2044-8317.1982.tb00640.x
- Levine, M. V., & Rubin, D. B. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4, 269-290. doi: 10.3102/10769986004004269
- Li, M. F., & Olejnik, S. (1997). The power of Rasch person-fit statistics in detecting unusual response patterns. Applied Psychological Measurement, 21, 215-231. doi: 10.1177/01466216970213002
- Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
- Magis, D., Béland, S., & Raîche, G. (2014). Snijders’s correction of Infit and Outfit indexes with estimated ability level: An analysis with the Rasch model. Journal of Applied Measurement, 15, 82-93.
- Magis, D., Raîche, G., & Béland, S. (2011). A didactic presentation of Snijders’ lz* index of person fit with emphasis on response model selection and ability estimation. Journal of Educational and Behavioral Statistics, 37, 57-81. doi: 10.3102/1076998610396894
- Meijer, R. R., Muijtjens, A. M. M., & van der Vleuten, C. P. M. (1996). Nonparametric person-fit research: Some theoretical issues and an empirical example. Applied Measurement in Education, 9, 77-89.
- Molenaar, I. W., & Hoijtink, H. (1990). The many null distributions of person fit indices. Psychometrika, 55, 75-106.
- Molenaar, I. W., & Hoijtink, H. (1996). Person-fit and the Rasch model, with an application to knowledge of logical quantors. Applied Measurement in Education, 9, 27-45.
- Nering, M. L. (1995). The distribution of person fit using true and estimated person parameters. Applied Psychological Measurement, 19, 121-129. doi: 10.1177/014662169501900201
- Nering, M. L. (1997). The distribution of indexes of person-fit within the computerized adaptive testing environment. Applied Psychological Measurement, 21, 115-127. doi: 10.1177/01466216970212002
- Noonan, B. W., Boss, M. W., & Gessaroli, M. E. (1992). The effect of test length and IRT model on the distribution and stability of three appropriateness indices. Applied Psychological Measurement, 16, 345-352.
- Raîche, G. (2002). Le dépistage du sous-classement aux tests de classement en anglais, langue seconde, au collégial. Gatineau, Québec : Collège de l’Outaouais.
- Raîche, G. (2014). irtProb: Utilities and probability distributions related to multidimensional person item response models. Retrieved from: https://cran.r-project.org/web/packages/irtProb/index.html
- Raîche, G., & Blais, J.-G. (2003). Efficacité du dépistage des étudiantes et des étudiants qui cherchent à obtenir un résultat faible au test de classement en anglais, langue seconde, au collégial. Dans J.-G. Blais et G. Raîche (dir.), Regards sur la modélisation de la mesure en éducation et en sciences sociales (pp. 73-90). Saint-Nicolas, Québec : Presses de l’Université Laval.
- Raîche, G., & Blais, J.-G. (July, 2005). Characterization of the distribution of the lz index of person fit according to the estimated proficiency level. Paper presented at the 70th annual convention of the Psychometric Society, Tilburg, Netherlands.
- Raîche, G., Magis, D., Blais, J.-G., & Brochu, P. (2012). Taking atypical response patterns into account. In M. Simon, K. Ercikan, & M. Rousseau (Eds.), Improving large scale assessment in education: Theory, issues and practice (pp. 238-259). New York, NY: Taylor & Francis.
- Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago, IL: University of Chicago Press.
- Sijtsma, K., & Meijer, R. R. (2001). The person response function as a tool in person-fit research. Psychometrika, 66, 191-208. Retrieved from https://www.psychometricsociety.org/sites/default/files/pdf/tocv66n2.pdf
- Snijders, T. A. B. (2001). Asymptotic null distribution of person fit statistics with estimated person parameter. Psychometrika, 66, 331-342. doi: 10.1007/BF02294437
- Tatsuoka, K. (1996). Use of generalized person-fit indexes, Zetas for statistical pattern classification. Applied Measurement in Education, 9, 65-76. doi: 10.1207/s15324818ame0901_6
- Tatsuoka, K., & Linn, R. L. (1983). Indices for detecting unusual patterns: Links between two general approaches and potential applications. Applied Psychological Measurement, 7, 81-96. doi: 10.1177/014662168300700111
- Van Krimpen-Stoop, E. M. L. A., & Meijer, R. R. (1999). The null distribution of person-fit statistics for conventional and adaptive tests. Applied Psychological Measurement, 23(4), 327-345. doi: 10.1177/01466219922031446
- Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427-450. doi: 10.1007/BF02294627
- Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago, IL: MESA Press.
- Wright, B. D., & Stone, M. H. (1979). Best test design. Chicago, IL: MESA Press.
- Zhang, B., & Walker, C. M. (2008). Impact of missing data on person model fit and person trait estimation. Applied Psychological Measurement, 32, 466-480.
- Zickar, M. J., & Drasgow, F. (1996). Detecting faking on a personality instrument using appropriateness measurement. Applied Psychological Measurement, 20, 71-87. doi: 10.1177/014662169602000107