How useful are tests?

Posted on 08-12-2013

This is part one of my review of Daniel Koretz’s Measuring Up: What Educational Testing Really Tells Us. Part two, Validity and reliability, is here. Part three, Why teaching to the test is so bad, is here.

I’ve recently finished reading an excellent book about assessment by Daniel Koretz, a professor of education at Harvard University. It’s called Measuring Up: What Educational Testing Really Tells Us. It is very readable and clarifies a number of tricky issues about assessment. It’s yet another book which you would struggle to find on any English teacher training course but which I think provides answers to the questions a lot of teachers ask all the time.

As well as being interesting and useful for teachers, Koretz’s book should also be required reading for policymakers in the UK and US. Koretz is American and the book largely focuses on the American context. But it is startling to see just how similar the misuses of tests have been in both countries, and how similar the important issues are. Startling, and also quite depressing – policymakers and educationalists on both sides of the Atlantic have made almost exactly the same mistakes.

In the next few blog posts, I am going to summarise the main things I learnt from reading this book.

How useful are tests?

It is not difficult to find things that have gone wrong with the way tests are used in this country. Indeed, it might be harder to find things that are right about our assessment system. Teaching to the test, focussing resources on C/D borderline pupils, constant resits, the pressure of league tables, the variation between different exam boards – those are just a few things that are wrong with our assessment system. Given the pernicious effect of many of the above developments, it is not hard to see why many people would want the abolition of league tables at the very least, and why some would call for the abolition of exams entirely. Often, the debate seems to polarise around two extremes: one side sees any criticism of the exam system as an attempt to evade accountability, and the other side sees any defence of exams as being opposed to the true aims of education.

Koretz makes a number of arguments that will be very uncomfortable for policymakers, and tells a few anecdotes about his meetings with education policymakers and the errors and false assumptions many of them make. He argues that tests have been misused, that high-stakes tests have created perverse incentives, and that some tests are inimical to the true aims of education. He denies the possibility of an optimal test, and rejects the idea that one measure can ever tell us all we need to know about education. But he is equally firm that tests can provide us with useful information. He is critical of those who dismiss tests as devices for ‘creating winners and losers’, pointing out that tests merely reveal winners and losers. In a sense, his is a kind of Nixon-to-China position – he is able to criticise the ways assessments have been used in such strong terms because his background in devising assessments means that no-one can doubt his commitment to them as providers of useful information. And indeed, Koretz does make it clear that ‘careful testing can in fact give us tremendously valuable information about student achievement that we would otherwise lack and it does rest on several generations of accumulated scientific research and development.’

Koretz himself uses the Nixon-to-China analogy to describe a fellow assessment expert, E.F. Lindquist. Lindquist devised many of the major American assessments, but also wrote a paper outlining the limitations such assessments had. So what are the strengths and limitations of tests? Tests can never directly measure what we want to measure. The ultimate aim of education is for our pupils to apply their skills in real-life contexts long after their education has finished. But this is exceptionally difficult to measure. We can’t have individuals going around tracking adults to see whether they apply algebra in everyday life or read 19th-century novels before going to sleep at night. Even if you could, you would have great difficulty compiling a scale to measure this kind of performance. And of course if you waited this long to make an assessment then you wouldn’t be able to use the information from it to improve instruction. So an assessment can never be a direct measure of the aims of education. (An article here notes that this is an important difference between test scores and the types of measures often used in evaluating hospitals, such as patient survival rates. ‘Patient survival is not an indicator of the desired end-state, it is the desired end-state for heart surgery.’ By contrast, test scores are only indicators of the desired end-state.) In Koretz’s words:

Test scores usually do not provide a direct and complete measure of educational achievement. Rather they are incomplete measures, proxies for the more comprehensive measures that we would ideally use but that are generally unavailable to us.

For E.F. Lindquist, this led to certain conclusions about the structure of tests: 1) Tests should focus on measuring what pupils have learnt from the curriculum, as this is the clearest proxy for educational achievement. 2) The point of a test is to elicit the behaviour you want to measure. 3) You need to standardise the test so that it is the same for everyone. 4) Tests should isolate specific knowledge and skills, because if they don’t, you won’t know what it is that caused the failure or success on that test. 5) As a result of these limitations, we should not rely solely on test scores for information about a pupil’s achievement, but should draw on other sources as well. Koretz suggests that these other sources might include the kind of information colleges and universities require for admissions – teacher assessments, personal statements, persistence in extra-curricular activities, etc.

Koretz notes that some of these conclusions are controversial. Point 4), that tests should isolate specific knowledge and skills, has certainly come under attack. A lot of modern tests don’t follow this principle, and instead aim to embed the use of knowledge and skills in more authentic, ‘real-world’ contexts. I’ll discuss this further in a later post.