“Best fit” is not the problem

I can remember taking part in marking moderation sessions using the Assessing Pupil Progress grids. We marked using ‘best fit’ judgments. At their worst, such ‘best fit’ judgments were really flawed. A pupil might produce a very inaccurate piece of writing that everyone agreed was a level 2 on Assessment Focus 6 – write with technical accuracy of syntax and punctuation in phrases, clauses and sentences. But then someone would point out how imaginative it was, and say that it deserved a much higher level for Assessment Focus 1 – write imaginative, interesting and thoughtful texts. Using a best fit judgment, therefore, a pupil would end up with the national average level even though they had produced work that had serious technical flaws. Another pupil might get the same level, but produce work that was much more accurate. Given this, it is easy to see why ‘best fit’ judgments have fallen out of favour. On the new primary interim writing frameworks, to get a certain grade you have to be ‘secure’ at all of the statements. So, we’ve moved from a best fit framework to a secure fit one. Another way of saying this is that we have moved from a ‘compensatory’ approach, where weaknesses in one area can be made up with strengths in another, to a ‘mastery’ approach, where pupils have to master everything to succeed.
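To make the distinction concrete, here is a minimal sketch of the two aggregation rules. The criteria and scores are hypothetical, invented purely for illustration:

```python
# Illustrative sketch (hypothetical criteria and scores): how the same
# profile of marks fares under a compensatory ('best fit') rule versus
# a mastery ('secure fit') rule.

def best_fit(scores, pass_mark):
    """Compensatory: strengths offset weaknesses, so average the scores."""
    return sum(scores.values()) / len(scores) >= pass_mark

def secure_fit(scores, pass_mark):
    """Mastery: the pupil must be secure on every single statement."""
    return all(score >= pass_mark for score in scores.values())

# A pupil who writes imaginatively but with weak technical accuracy.
pupil = {"imaginative_texts": 5, "technical_accuracy": 2}

print(best_fit(pupil, pass_mark=3))    # True  - the high mark compensates
print(secure_fit(pupil, pass_mark=3))  # False - one weak area fails the lot
```

Under the compensatory rule this pupil reaches the pass mark on average despite serious technical flaws; under the mastery rule a single weak statement is enough to fail.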

The problem with the ‘secure’ or ‘mastery’ approach to assessment is that when it is combined with open tasks like extended writing, it leads to tick-box approaches to teaching. However good an essay is, if it doesn’t tick a particular box, it can’t get a particular grade. However bad an essay is, if it ticks all the boxes, it gets a top grade. It is much harder than you might think to construct the tick boxes so that good essays are never penalised and bad essays are never rewarded. I’ve written about this problem before, here. This approach penalises ambitious and original writers. For example, if a pupil knows that to achieve a certain grade they cannot misspell a single word, then the tactical thing to do is to use only very basic words. Similarly with sentence structure, punctuation, grammar, etc. Thus, the pupil who writes a simple, boring but accurate story does better than the pupil who writes an interesting, thoughtful and ambitious story with a couple of errors. Teachers realise this is what is happening and adapt their teaching in response, focussing not just on the basics, but also, more damagingly, on actively eliminating anything that is even slightly more ambitious than the basics.

Regular readers might be surprised to hear me say this, since I have always made a point of the importance of accuracy, and of the importance of pupils learning to walk before they can run. I have also been very enthusiastic about mastery approaches to learning. So is this me changing my mind? No. I still think accuracy is extremely important, and that it enables creativity rather than stifling it. I still also think that mastery curriculums are the best type. My issue is not about the curriculum and teaching, but about assessment. Open tasks like extended writing are not the best way to assess accuracy or mastery. This is because, crucially, open tasks do not ask pupils to do the same thing in the same way. They introduce an element of discretion into the task. Pupil 1 might spell a lot of words wrong in her extended task, but that might be because she has attempted to use more difficult words. Pupil 2 might spell hardly any words wrong, but she may have used much easier words. Had you asked Pupil 2 to spell the words that Pupil 1 got wrong, she may not have been able to. So she isn’t a better speller, but she is credited as such. If you insist on marking open tasks in a secure fit way, this becomes a serious problem, as it leads to huge distortions of both assessment and teaching. Essentially, what you are doing is giving the candidate discretion to respond to the task in different ways, but denying the marker similar discretion. Better spellers are marked as though they are weaker spellers, because they have essentially set themselves a higher standard thanks to having a more ambitious vocabulary. Lessons focus on how to avoid difficult words rather than use them, how to avoid the apostrophe rather than use it correctly. If the secure fit approach to marking open tasks really did reward accuracy, I would be in favour of it. But it doesn’t reward accuracy. It rewards gaming. Michael Tidd’s great article in the TES here shows exactly how this process works.
James Bowen also has an excellent article here looking at some of these problems.

It is obviously important that pupils can spell correctly. But open tasks are not the best way of assessing this. The best and fairest way of checking that pupils can spell correctly is to give them a test on just that. If all pupils are asked to spell the same set of words in the same conditions, you can genuinely tell who the better spellers are, and the test will also have a positive impact on teaching and learning as the only way to do well on the test is to learn the spellings.

One final point is to note that the problems I outlined at the start about the flaws with ‘best fit’ judgments actually had less to do with ‘best fit’ and more to do with (drum roll) vague prose descriptors. The fundamental problem is getting any set of descriptors with whatever kind of ‘fit’ to adequately represent quality.

The prose descriptors allowed pupils to be overmarked on particularly vague areas like AF1 – write imaginative texts – when in actual fact they were probably not doing all that well in those areas. I don’t think there are hundreds of thousands of pupils out there who write wonderfully imaginatively but can’t construct a sentence, or vice versa. It’s precisely because accuracy enables creativity that there aren’t millions of pupils out there struggling with the mechanics of writing but producing masterpieces nonetheless. I have made this point again and again with reference to evidence from cognitive psychology, but let me now give you a recent piece of assessment evidence that appears to point the same way. We have held quite a few comparative judgment sessions at Ark primary schools. You can read more about comparative judgment here, but essentially, it relies on best fit comparisons of pupils’ work, rather than ticks against criteria. You rely on groups of teachers making independent, almost instinctive judgments about which is the better piece of work. At the start of one of our comparative judgment sessions, one of the teachers said to me that he didn’t think we would get agreement at the end because we would all have different ideas of what good writing was. For him, good writing was all about creativity, and he was prepared to overlook flaws in technical accuracy in favour of really imaginative and creative writing. OK, I said, for today, I will judge as though the only thing that matters is technical accuracy. I will look solely for that, and disregard all else. At the end of the judging session, we both had a high level of agreement with the rest of the judging group. This is of course just one small data point, but as I say, I think it points to something which has been very well-evidenced in cognitive psychology.
The high level of agreement between all teachers at this comparative judgment session (and on all the others we have run) also shows us that judging writing and even judging creativity are perhaps not as subjective as we might think. It is not the judgments themselves that are subjective, but the prose descriptors we have created to rationalise the judgments.
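For readers curious about the mechanics, here is a minimal sketch of how comparative judgment can turn a pile of pairwise ‘which is better?’ decisions into a ranking, using the Bradley–Terry model fitted by simple iterative scaling. The scripts, judgments and iteration count below are illustrative assumptions, not real session data:

```python
# A minimal Bradley-Terry sketch: pairwise judgments in, ranking out.
from collections import defaultdict

def bradley_terry(judgments, iterations=100):
    """judgments: list of (winner, loser) pairs from independent judges."""
    wins = defaultdict(int)    # times each script was preferred
    pairs = defaultdict(int)   # times each pair of scripts was compared
    items = set()
    for winner, loser in judgments:
        wins[winner] += 1
        pairs[frozenset((winner, loser))] += 1
        items.update((winner, loser))
    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        new = {}
        for i in items:
            denom = sum(
                pairs[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items if j != i
            )
            new[i] = wins[i] / denom if denom else strength[i]
        total = sum(new.values())
        strength = {i: s / total for i, s in new.items()}  # normalise
    return sorted(items, key=strength.get, reverse=True)

# Three scripts, several judges, consistent preferences.
judgments = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("B", "C")]
print(bradley_terry(judgments))  # ['A', 'B', 'C']
```

The point of the model is that no descriptor is consulted at any stage: the ranking emerges purely from many quick holistic comparisons.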

Similarly, if the problem with best fit judgments wasn’t actually the best fit, but the prose descriptors, then keeping the descriptors but moving to secure fit judgments won’t solve the fundamental problem. And again, we have some evidence that this is the case too. Michael Tidd has collected new writing framework results from hundreds of schools nationally. The results are, in Michael’s words, ‘erratic’. They don’t follow any kind of typical or expected pattern, and they don’t even correlate with schools’ previous results.  Whatever the precise reason for this, it is surely some evidence that introducing a secure fit model based on prose descriptors is not going to solve our concerns around the validity and reliability of writing judgments.

In conclusion

  • If you want to assess a specific and precise concept and ensure that pupils have learned it to mastery, test that concept itself in the most specific and precise way possible and mark for mastery – expect pupils to get 90% or 100% of questions correct.
  • If you want to assess performance on more open, real world tasks where pupils have significant discretion in how they respond to the task, you cannot mark it in a ‘secure fit’ or ‘mastery’ way without risking serious distortions of both assessment accuracy and teaching quality. You have to mark it in a ‘best fit’ way. If the pupil has discretion in how they respond, so should the marker.
  • Prose descriptors will be inaccurate and distort teaching whether they are used in a best fit or secure fit way. To avoid these inaccuracies and distortions, use something like comparative judgment which allows for performance on open tasks to be assessed in a best fit way without prose descriptors.

Responses to ““Best fit” is not the problem”

  1. How true. Our present year 6 have gone hyphen-mad, and as for semi-colons; secondary school teachers are going to be grinding their teeth over their over-inclusion for years. As the government wants an accountability measure of some sort for writing, why don’t they use the SPaG test? Then, to ensure we don’t give up on creative writing altogether, run some mammoth comparative judgement exercise whereby schools are required to rank writing and then submit pieces from, say, 3 mid-ranking children. These are then CJ-ed either nationally – or if this is logistically impossible, in clusters (regionally? randomly?). Schools could then receive a mega-ranking based on how their pieces fared compared to others. Or maybe schools wouldn’t know until May which ranked pieces they would be asked for – to prevent concentrating on just those kids. One year it might be children a quarter of the way down the ranking, another year three quarters. Or maybe just use SPaG for accountability but submit writing for a national competition to celebrate great writing, which could then be published? Just musing, but there has to be a better way than the current crazy approach.

    • I think some kind of national distributed comparative judgment process is more than possible. It would keep teacher judgment at the heart of the writing assessment and be much more reliable and efficient. As you say, there has to be a better way!

      • teachwell says:

        The killer here Daisy is how to get children to use the correct spellings in their writing as opposed to just on a test. Because ultimately that is what we want. Thoughts?

        • I think that is ultimately a curriculum question, not an assessment one. The programme I really like for teaching writing is Expressive Writing – I think a similar approach to practising spelling would work, eg over learning and direct instruction.

          • teachwell says:

            Thanks – I think you’re right. I was thinking about it after I left the comment, and thought about how I should have insisted on giving them the same spellings until they did spell them correctly. The culture of spelling tests and their gradual decline means that the practice hasn’t been refined or improved.

  2. fish64 says:

    Spot on – let’s ditch “can do” statements which lead to “adverb soup”!

  3. […] has come about as a result of the use of a secure fit approach to assessment. In her post ‘”Best fit” is not the problem’ Daisy Christodoulou outlines the problems with both best and secure fit assessment. She proposes […]

  4. John Hodgson says:

    A very interesting post. Researchers in the 1960s found that English teachers who marked essays holistically gave more reliable judgments than those who attempted to separate out ‘technical accuracy’ and other aspects. It’s ironic that a return to ‘secure fit’ makes matters worse. As regards prose descriptors, these have never been much more than a post hoc rationalisation of teachers’ professional judgments.

  5. […] That’s as far as I want to go here, but you might want to read this very interesting blog on the matter of assessing writing using a “secure fit” approach of this kind in relation to the […]

  6. debrakidd says:

    I found this fascinating and completely agree with the issues you raise in terms of the problems with trying to map mastery assessment onto open tasks. I’m starting to move in favour of mastery skills being assessed via multiple choice questions and open tasks being completely removed from formal testing, but rather being evidenced in portfolio work which is peer teacher assessed. Time consuming, but probably ultimately more reliable.

  7. […] screams’ etc). As has been pointed out by my colleague in the HfL English team here and by Daisy Christodoulou in her blog, this “must have everything” approach weights the small and the significant in equal […]

  8. […] as head of education research at the Ark chain, clearly and persuasively argues of the limitations that marking according to pre-defined criteria involves (variously described by […]
