Tests are inhuman – and that is what is so good about them

Posted on 11-10-2015

One of the frequent complaints about tests is that they are a bit dehumanising. Every pupil is herded into an exam hall, there to answer exactly the same questions. The questions they answer are often rather artificial ones, stripped from real-world contexts and on occasions placed in formats, such as multiple choice, that they will be unlikely to encounter outside of an exam hall. If they feel ill, or if they had an argument with their parents the night before, then no special allowances can be made.

In contrast with this, assessment based on teacher judgment seems not just nicer, but much fairer. Exams offer a highly artificial snapshot of a pupil’s grasp of atomised knowledge at just one moment in time. Teachers have knowledge of a pupil that spans months, maybe years, and takes into account the pupil’s performance on a range of different tasks and topics. Teacher assessment also doesn’t necessitate the one-off do-or-die pressure of the exam hall. So it’s clear why many people see teacher assessment as an altogether more sensible arrangement than exams: not just nicer on the pupil, but fairer too.

However, whilst teacher assessment appears on the surface to be much nicer and fairer than exams, the research shows that in some very important ways it is not only less fair, but also produces outcomes that are not that nice either. Specifically, teacher assessment has a huge problem with bias. You can watch this video by Rob Coe which sums up a lot of this research. A screenshot from it is below.

Coe screenshot

Here are some quotations from some key research papers backing up these points.

Both high and medium weight evidence indicated the following: there is bias in teachers’ assessment (TA) relating to student characteristics, including behaviour (for young children), gender and special educational needs; overall academic achievement and verbal ability may influence judgement when assessing specific skills. (Harlen, 2004)

Studies of the National Curriculum Assessment (NCA) for students aged 6 and 7 in England and Wales in the early 1990s, found considerable error and evidence of bias in relation to different groups of students (Shorrocks et al., 1993; Thomas et al., 1998). (Ibid)

It is argued that pupils are subjected to too many written tests, and that some should be replaced by teacher assessments… The results here suggest that might be severely detrimental to the recorded achievements of children from poor families, and for children from some ethnic minorities. (Burgess and Greaves, 2009)

Teachers tended to perceive low-income children as less able than their higher income peers with equivalent scores on cognitive assessments. (Campbell, 2015)

In short, as we can see, teacher assessment is biased against disadvantaged pupils. This bias isn’t conscious; it’s a product of the kinds of flaws in all human judgment which people like Daniel Kahneman have written so much about. Tests, by contrast, avoid a lot of these flaws precisely because of all the dehumanising things about them which I outlined at the start. Every pupil is treated the same: they take the same questions in the same conditions at the same time, and it’s hard or even impossible to get special treatment. The wealthy pupil doesn’t get the chance to have their exam taken for them by their tutor. They don’t get the chance to redraft and redo it several times. Tests are also normally blind marked, and structured, artificial questions like multiple-choice ones are easy to mark reliably and fairly.

As you can see, this research is very well-established, but it doesn’t seem to be very well-known. A couple of years ago, when there were discussions about reducing the amount of teacher assessment in national exams, many people actually asserted that reducing teacher assessment would penalise disadvantaged pupils. Here, for example, is Mary Bousted of the ATL:

“We have serious concerns that the new-style GCSE will not give all children the chance to demonstrate what they have learned and will particularly disadvantage children with difficult home lives.”

And Ian Toone of Voice:

“Three-hour exams test academically able pupils’ ability to recall and present information under test conditions, but for very many young people, including those with special needs, coursework and teacher assessment are a better measure of their knowledge and abilities.”

But as we’ve seen, there is good evidence that teacher assessment does not help SEN pupils and those from low-income backgrounds. So we are in an odd situation, whereby we have solid research evidence that disadvantaged pupils do better on tests than on teacher assessments, but the popular understanding is that exactly the reverse is true. This is yet another example, I would argue, of the serious consequences of the lack of high-quality training in assessment.

So whilst on the surface teacher assessment seems a more human and fairer form of assessment, in practice it is often less fair. And what we also find is that the very things about exams which make them seem so inhuman are also the very things which help guarantee their fairness. If you want fairness, progress, equality and reliability, then human judgment may not be the best method.

