Why is teacher assessment biased?
Posted on 01-11-2015
In my last post, I spoke about how disadvantaged pupils do better on tests than on teacher assessments, and also about how many people assume the opposite is the case. It’s interesting that today, we seem to think that teacher assessment will help the disadvantaged. In the late 19th and early 20th centuries, the meritocratic advantages of tests were better understood than they are today, as were the nepotism and prejudice that often resulted from other types of assessment. In the 19th century, the civil service reformers Stafford Northcote and Charles Trevelyan fought a long battle to make entry to the civil service based on test scores, rather than family connections. In the early 20th century, Labour educationalists such as RH Tawney and Beatrice and Sidney Webb fought for an education system based around exams, because they believed that only exams could ensure that underprivileged children were treated fairly. Since then, we have only gathered more evidence about the equalizing power of exams, but oddly, we seem to have forgotten these insights.
In my last post, I explained why it is that tests are fairer: they treat every pupil the same, and every pupil has to answer the same questions. However, whilst I gave plenty of evidence that teacher assessment was biased, I didn’t fully explain why this bias happens. As a result, quite a few people said that the solution was simply ‘better teacher assessment’, perhaps by introducing more moderation and CPD, as a group of teaching unions recommended doing in 2011. How realistic is that? Are the problems with teacher assessment really so insoluble that we have to resort to tests? And what exactly is the nature of these flaws?
Teacher assessment is biased not because it is carried out by teachers, but because it is carried out by humans. Tammy Campbell, the IoE researcher whose recent research showed bias in teacher assessments of 7-year-olds, is at pains to point this out. She says, ‘I want to stress that this isn’t something unique to teachers. It’s human nature. Humans use stereotypes as a cognitive shortcut and we’re all prone to it.’ A growing body of research reinforces Campbell’s point. We all have difficulties making certain complex judgments and decisions, and we resort to shortcuts when the mental strain becomes too great (see Daniel Kahneman’s work for more on this). Indeed, it is plausible that teacher assessment is biased precisely because it is so burdensome: when we are faced with difficult cognitive challenges we often default to stereotypes.
And teacher assessment really is burdensome. I thought I had it bad as a secondary teacher marking coursework for GCSE pupils, but I recently spoke to a primary colleague who told me about the hours they spend gathering evidence for KS1 assessments and cross-referencing it against the level descriptors – a task which, as I’ve said before, is at the limits of human cognitive capacity. When faced with such a difficult challenge, defaulting to stereotypes is in many ways a sensible attempt by our unconscious minds to reduce our workload. We know that on average pupils on free school meals do not attain as well, we know that the essay we are marking isn’t great, but it isn’t terrible, we know it sort of meets some of the criteria on the mark scheme, we need some more evidence in order to reach a final judgment, we could reread the essay and mark scheme but the mark scheme is hard to interpret…we also know that the pupil who wrote the essay is from the wrong side of the tracks. Done: it’s a below average essay. None of us want to admit that this is how our minds work, and for most of us, our minds don’t consciously work like this, but there is plenty of evidence that this is how our reasoning goes. That’s the nice story: here, Adrian Wooldridge gives the less charitable interpretation of the mental processes of assessors. He says that when you base assessment around ‘Oxford dons who pronounce mystically on whether a candidate possesses “a trace of alpha”’, don’t then be surprised when ‘a large number of those who show up favorably on the alpha detectors turn out to be Old Etonians.’
Is it possible to counter such bias in any way? Is it possible to ‘do better teacher assessment’? After all, whilst we humans are susceptible to bias, we are also self-aware. We know we make these errors, and in most of the fields where we make these errors, we have found ways around them. I once heard the scientific method described as a collection of practices designed to counteract human bias. Are there practices we can introduce to teacher assessment that would function like this, and counteract human bias? Yes, there are. We could anonymize the work that’s being marked. We could standardize tasks, conditions and marking protocols. We could carry out some statistical analysis of the marks different pupils get and that different teachers give. And once we had done all that, we would find that we had eliminated many of the biases associated with teacher assessment, but that we had also pretty much eliminated the teacher assessment and replaced it with a test. The flaws with teacher assessment are inherent in its very nature. Doing teacher assessment better basically means making it more test-like. The whole point of the test is that it, like the practices that characterize the scientific method, is essentially a method for countering human bias. As the Bew report says here, most of the attempts to reduce the bias of teacher assessment have failed, and those that have succeeded do so by making teacher assessment more test-like.
Teacher assessment discriminates against poorer pupils and minorities, and generates significant workload for teachers. Tests are fairer and less burdensome. They deserve a better reputation than they have, and a greater role in our assessment system.
Whilst I’m in favour of tests, and sceptical about the possibility of improving teacher assessment, I still think there are other ways we could improve assessment. In my next few posts I will look at some recent assessment innovations and see if they offer any improvements on the status quo.