Multiple choice questions, part two

In my previous blog post I gave an example of what I thought was an excellent multiple choice question, taken from the British Columbia leaving exam. It’s as follows:

15. How did the Soviet totalitarian system under Stalin differ from that of Hitler and Mussolini?
A. It built up armed forces.
B. It took away human rights.
C. It made trade unions illegal.
D. It abolished private land ownership.

In the comments on that post, Barry Naylor asked exactly why I thought that this question required higher order thinking. Here is why.

Firstly, the question itself is a good one. That would be the case even if it were a short or long open question. It asks pupils to compare and evaluate, and it requires a lot of knowledge about three different historical regimes – in fact, arguably it requires knowledge of even more types of political regime, because in order to properly understand totalitarian regimes you need some idea of what totalitarianism isn’t.

Secondly – and here are the factors unique to the question as a multiple choice question – the distractors are carefully designed so that they home in on two important and frequent misconceptions.

Misconception one is that German, Italian and Russian totalitarianism were all the same. This is a very basic misconception, but one that I think is quite common. Popular culture is saturated with the images of these regimes and dictators, and most pupils easily understand that all three regimes were evil. But I think a great many assume that they were all evil in precisely the same way. For the pupil who holds this misconception, all four statements will seem equally likely. They all seem like bad things, and these dictators all did bad things, so it will be hard for them to see that one bad thing was more specific to one regime.

Misconception two is to do with the nature of communism and Soviet Russia. A pupil with a bit more sophistication will understand that Hitler and Mussolini were fascists, but that Stalin was a communist. They will understand that communism is based on collective ownership, and that communism is generally friendly to trade unionism – unlike fascism. They might therefore assume that, given that the Russian regime under Stalin was communist, the difference between the different regimes was option C.

Then, the correct answer, Option D, requires a relatively sophisticated understanding of a fairly complex idea – that of private property. There’s a famous (and I think probably apocryphal) story about Soviet bureaucrats visiting Britain after the fall of the Berlin Wall and asking their British counterparts how they set prices. The British showed them round Smithfield Market, the stock exchange and a supermarket. Then they ended up back in Whitehall, and the Soviets said ‘OK, that was lovely. Now where’s the office where you set the prices?’

We have the reverse problem. If you have grown up in a capitalist economy, the very concept of private property is, paradoxically, quite hard to appreciate. I’ve had discussions about this point with smart sixth-formers who just don’t get it. ‘What do you mean, you can’t own something? You’ve got to be able to own stuff! You can’t get rid of owning stuff!’

The problem in both cases is ideology – or the fish in water problem, as David Foster Wallace puts it. ‘There are these two young fish swimming along and they happen to meet an older fish swimming the other way, who nods at them and says “Morning, boys. How’s the water?” And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes “What the hell is water?”’

If you grow up in the modern West, private property is the water that surrounds you. By asking about this concept, this question is testing whether you have properly understood one of the alien things about a different society.

So that’s why I think this question promotes higher order thinking. Of course, essays promote higher order thinking too. But essays are much less efficient. You can’t cover so many areas if you only assess using essays. In the time it takes to answer one essay question you could probably answer 30-odd multiple choice questions of this type. Since I wrote the previous post, a colleague has told me that trainee doctors often have to sit multiple choice exams like this – along the lines of: patient 1 has problems x, y and z and interesting features a, b and c. What intervention would you recommend? Such questions are a highly efficient way of testing higher order thinking.

Responses to “Multiple choice questions, part two”

  1. My problem with this question is that it relies on a linguistic “gotcha” to distinguish between C and D. You might completely understand the history, but if you misread the question (by flipping “how did x differ from y” to “how did y differ from x”) then C is the right answer. And you know what…? This is EXACTLY what I did when I first read the question. So I’d have failed this exam. (And never have been able to go on to get my MA in… er… history!)

    • You can always misread a question, no matter what type of question it is! And if you’d misread this question, you wouldn’t have failed the exam, because there are another 22 of these questions, plus a source paper and an essay paper. Take a look at the whole paper. Of course, if you misread an essay question, you basically do fail the exam (or drop several grades).

      • Yes… I was being flippant! (Sorry!) But I think the point still stands – the question is deliberately obfuscated behind a linguistic fog, which requires you to think twice – once about what the question is ACTUALLY asking and once about the answer – and I feel uncomfortable about that.

        The question could equally be phrased as ‘Which of the following statements apply to Soviet totalitarianism and NOT to German and Italian fascism?’ which is much clearer… isn’t it? (Obviously the answers would have to be slightly re-phrased too.)

        I guess it depends whether this is an intelligence test as well as a history test. As an intelligence test, the phrasing is fine – it will catch out some people (like me) “wot is a bit fick, innit?”; and that would be the point. But if it is testing history, then it needs to test history, and the question should be phrased as clearly and directly as humanly possible. No…???

        (Put it this way: does my re-phrasing negate any of your arguments as to why it is a good question?) (Arguments with which I completely agree, by the way.)

  2. Barry Naylor says:

    I disagree that higher order thinking is necessarily indicated. I answered the question with the knowledge that the state took ownership of property in the USSR. I remembered that, I didn’t reason it.

    Your reply above would have made a good essay response and demonstrated higher order thinking.

    Inferring such from an answer of ‘D’ is, I think, lacking.

    MCQs are efficient and easy to mark, hence their use in high-stakes, knowledge-based tests in the US. Shifting to the assessment of higher order thinking with reliability and validity is not so easy, as I think this question illustrates.

    Interesting discussion.

    • I think what you’re missing here, Barry, is that nobody denies that there are other ways to get an MCQ correct than to understand, or that understanding students can sometimes get them wrong. In this respect they are similar to other kinds of questions one might ask.

      A test that places a lot of weight on a single, stand-alone MCQ (or, in my case, True/False question) would be highly suspect. What makes them valuable is the cumulative effect of a large number of such questions. Since these can be answered very briefly, many questions can be asked in the same amount of time one might ask a single essay-style question, and one can cover, with pretty reliable CUMULATIVE effectiveness, students’ understanding over a great many points.

      As with my T/F questions, the teacher reasonably assumes that there is a random element, given which one should expect a positive (latent) score from straight guessing. The power of the Central Limit Theorem is that confidence in the resulting estimate of mastery of the subject increases dramatically with the number of questions asked. The result of a single question provides essentially no confidence. But a well-framed 100-question test would produce quantifiably reliable scores that do not depend on a teacher’s judgement or disposition at the time of marking. (Yes, there are other kinds of deficiencies that may occur in an MC test; let us not discount them!)
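      The claim that reliability grows with the number of questions can be illustrated with a small simulation. This is my own sketch, not part of the comment: it compares the spread of pure-guessing scores on a one-question test with those on a 100-question test of four-choice items. Both means sit near the 25% latent baseline, but the spread on the long test is far smaller, which is what lets a long test separate mastery from guessing.

```python
import random

random.seed(0)

def guess_score(n_questions, n_choices=4):
    """Fraction correct for a student guessing uniformly at random."""
    return sum(random.randrange(n_choices) == 0 for _ in range(n_questions)) / n_questions

def spread(n_questions, trials=10_000):
    """Mean and standard deviation of pure-guessing scores over many simulated students."""
    scores = [guess_score(n_questions) for _ in range(trials)]
    mean = sum(scores) / trials
    sd = (sum((s - mean) ** 2 for s in scores) / trials) ** 0.5
    return mean, sd

mean1, sd1 = spread(1)        # single question: scores are all-or-nothing
mean100, sd100 = spread(100)  # 100 questions: scores cluster tightly near 25%
print(f"1 question:    mean {mean1:.3f}, sd {sd1:.3f}")
print(f"100 questions: mean {mean100:.3f}, sd {sd100:.3f}")
```

      By the Central Limit Theorem the standard deviation of a guessing score shrinks like 1/√n, so the 100-question spread is about a tenth of the single-question spread: quadrupling the number of questions halves the noise from guessing.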

      To expand on the point about nonzero scores due to guessing and the power of averaging over many data points: a certain province here in Canada, with a reputation for being a world leader in maths education, participated in the 2011 Grade 8 TIMSS assessment. Its performance on a certain question raised eyebrows. Students were asked which of four options represented an appropriate procedure for finding the value of 1/3 – 1/4. The options included choices such as (1-1)/(4-3) (incorrect) and (4-3)/(4×3) (correct).
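      For concreteness, the two candidate procedures can be checked with exact rational arithmetic (my own illustration, not part of the original assessment):

```python
from fractions import Fraction

# The value the TIMSS question asks about
target = Fraction(1, 3) - Fraction(1, 4)

# Two of the offered procedures, evaluated exactly
correct = Fraction(4 - 3, 4 * 3)    # (4-3)/(4*3): cross-subtract over the product of denominators
incorrect = Fraction(1 - 1, 4 - 3)  # (1-1)/(4-3): subtracting tops and bottoms separately

print(target, correct == target, incorrect == target)  # 1/12 True False
```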

      World-class systems such as those in Asia scored around 70-85%, meaning that was the proportion of their students who answered correctly, out of the thousands in each system who participated. In this province, the score was 27.8%.

      The world average on this question (an extremely low standard, I might add!) was 37%.

      Given that it was a four-choice question, anyone judging the outcome ought first to understand that the latent score is 25% — that is what a system would receive on that question if all participating students guessed randomly. Again, the response of a surprising number of people in that school system was that it was not particularly alarming, because about 1/4 of the students showed they could do the question, so they must be coming along. Uh … no. That’s the point about latency here: it is the baseline of complete lack of competence. 25% is the “zero mark” of this question, and the system tested only 2.8% above that mark. Considering that the confidence interval is around 3%, this is not statistically distinguishable from system-wide total incompetence on that skill. Further, it is significantly below the world average and the genuine high-performing systems.

      Now some in the school system responded with a shrug, saying “Hey — it’s only one question on a multiple choice test! It’s easy to make a mistake! Further, it’s only ONE question; the province did fine on many other questions.” This is a serious misunderstanding of the meaning of that datum. Random fluctuations in a single student’s performance on such a question (as with student “brain farts” while answering a single MCQ on a longer test) can indeed explain an anomalous data point. However, they do not, and cannot, explain system-wide poor performance on that question. That is a virtual statistical impossibility, thanks to the Central Limit Theorem.
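      The arithmetic behind “not statistically distinguishable” can be sketched in a few lines of Python. The cohort size here is a hypothetical figure of my own (the actual TIMSS sample size is not given above), picked so that the resulting interval matches the roughly 3% quoted:

```python
import math

p_guess = 0.25     # latent baseline for a four-choice question
n_students = 1000  # hypothetical cohort size, chosen to match the ~3% interval quoted above

# Standard error of a system-wide proportion if every student guessed at random
se = math.sqrt(p_guess * (1 - p_guess) / n_students)
half_width = 1.96 * se  # 95% half-width, roughly 2.7 percentage points

print(f"Pure guessing lands between {p_guess - half_width:.1%} and {p_guess + half_width:.1%}")
```

      A single student’s fluke moves the system-wide average by only 1/n_students of a mark, so a whole-cohort score sitting at the edge of the guessing band cannot be explained by individual “brain farts”.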

      From this single question, even taken in isolation from the rest of the assessment, we can be quite certain that this outcome reveals a very real weakness of instruction in a key strand of fractional arithmetic in that province.

      That is the power of many data points to increase the precision of measurements which, individually, might be unreliable. If each one of those data points (again speaking of a single student’s score on an MCQ test with many questions) tests for a point of understanding, the test as a whole can be remarkably effective in testing for precisely that.

      Here is a link to a graphic I made on that question and another (even more telling!) from TIMSS 2011 for an article I wrote a couple of years ago for one of our national papers:

  3. […] my last posts on multiple choice questions (here and here), Kris Boulton and Joe Kirby have pointed me in the direction of Robert Bjork’s work on […]

  4. […] I’ve since written two more posts on this topic, here and […]

  5. In my university math courses I often give a series of 10 True/False questions as a warmup. These are generally excellent tests of understanding. A good true/false question tests commonly misunderstood ideas, and it is a great spot-check (and slap-on-the-hand) for students with small problems in thinking that hold them back. It provides a tiny penalty, and students generally take notice of which ones they get wrong, and why. A teacher who has mastered their subject and taught it enough times should have a large collection of things they know are typically misunderstood. I have two versions of each in my collection — one that is correct, and one that is wrong (and a million variants, so I could ask essentially the same question all day without actually repeating myself, were I so inclined).

    T/F is perhaps the most attenuated of M/C-type questions. It is truly minimalist. Those who think such questions can’t test understanding simply misunderstand them. They say “you could get 50% just by guessing!” Yes, and that is why 50% is the baseline. It is the “zero” against which you judge performance. Against that “zero” students could even perform “negatively”. But is that common? Well … it is, among unprepared students — for they are most likely to have inadvertently picked up the “wrong version” of these common difficulties. But among students with any sort of mastery, it is not. Indeed, a good student should get very few wrong. I limit myself to a small number of such questions largely out of mercy — it is quite brutal on students with conceptual difficulties to make too large a component of an exam True/False; there is little leeway for fudging or gaming for partial marks without really having much mastery. Not that I want to encourage the latter, but one should allow some room for small credit where students know SOMETHING even when they haven’t got all the pieces of the puzzle. T/F alone is a bit too dull a knife to identify this.

    I have practiced this for many years now (going on 20) and have always observed the same strong correlation between performance on the T/F questions and the more comprehensive questions on an exam. They don’t correlate as well with the medium-level skill question. Thus (for a trivial example) a student who incorrectly marks True for “(x+y)^2 = x^2 + y^2” is generally far more likely to solve a related rates problem in calculus than one who marks False; however, there is less statistical distinction between those two students when it comes to asking for unsimplified calculations of derivatives of moderately interesting expressions like x sin (1+x).

    • Revisiting I see I messed up my final illustration in this comment. Here’s what was meant:

      “Thus (for a trivial example) a student who incorrectly marks True for “(x+y)^2 = x^2 + y^2” is generally far [LESS] likely to successfully solve a related rates problem in calculus than one who correctly marks False; however, there is less statistical distinction between those two students when it comes to asking for unsimplified calculations of derivatives of moderately interesting expressions like x sin (1+x).”

  6. Your story about the Soviet bureaucrats in London reminds me of another, probably just as apocryphal, story about the first McDonald’s that opened in Moscow. As related, the prospective cashiers were undergoing training by their new employers, and during the lesson on demeanor during a sale one asked: “Why should we have to always smile? WE’RE the ones with the food!”

  7. I would have to disagree that this question tests higher order thinking. While it certainly may test higher order thinking if the learner takes time to think about it, it could equally just indicate that the learner has rote-learnt a fact, or, worse, that the learner has guessed the correct answer.

    I take on board Robert Craigen’s point, however, that asking a large number of MCQs such as this would provide a higher level of test-score reliability, but even so this cannot tell us anything more than that the learner got the correct answer or they didn’t. An open written question, however, could tell us much more about a learner’s understanding if we ask them to reason about or justify their answer. This, of course, would require a different question to be constructed to prompt a more open response.

    There is also the issue of the time required to analyse each individual learner’s responses to MCQs so as to understand what they have and have not understood, and why. This in itself becomes a very time-consuming exercise and is open to human error.

    I think the problem lies in teachers understanding what they hope to achieve from the assessment in the first place. Do they just want to know if the learner can answer the questions correctly, or do they want to know what the learner understands in order to inform future teaching and learning? If the teacher wants to use the outcomes of the assessment to inform future learning, I believe the time it would take to analyse the data from MCQs would be better spent listening to or reading learners’ responses to more open questions.

    I would like to see teachers being trained in assessment design so that they understand when, how and why to use different forms of assessment, including the pros and cons of the wealth of assessment instruments available.
