Abstract
The TEF speaking test is conducted as an interview between a candidate and an interviewer-examiner, but the conduct of the latter may pose a threat to the reliability of the test. It has been shown that, despite the many measures taken to minimize variability in assessment, discrepancies in several respects may persist among examiners. This study investigates whether such discrepancies occur among TEF examiners. Ten participants took part in the research, and the think-aloud technique was used. The results show that discrepancies do exist. Examiners may award the same score for the same performance while interpreting it differently, and vice versa. Some may be positively influenced by their familiarity with a candidate's accent. Others may draw irrelevant inferences to make sense of the difficulties candidates encounter. Finally, the interviewer's attitude during the conversation with the candidate may be perceived differently and negatively affect the score.
Keywords:
- second language tests,
- speaking test,
- examiners,
- discrepancies in assessment