Abstract
This article explains the methodological principles underlying the design and implementation of international surveys in education. More specifically, it deals with subject-matter achievement tests rather than contextual questionnaires. To this end, the development of the assessment framework, the selection of tasks and items, the field trial, the assessment design, the definition of the target population, and the sampling plan are described. Basic psychometric aspects are also covered: the item response theory models used; parameter estimation methods and features specific to these surveys; and the quality of the resulting measures.
Keywords:
- Assessment framework,
- test development,
- population definition and sampling,
- item response theory,
- matrix sampling,
- estimation methods,
- measurement quality