Wednesday, July 30, 2014

Writing good multiple choice questions

Lots of us use MCQs in homework, exams, quizzes, etc., and there's a wealth of info on writing good ones.  Beyond the obvious advice like "make all the distractors plausible," here are some tips I've distilled from various sources; the numbered citations refer to the list at the end.

Review of assessment terminology

  • Bloom's taxonomy orders cognition levels from "lower" to "higher": knowledge (recall/memory), comprehension, application, analysis, synthesis, evaluation.  The challenge is to test the higher-level skills using multiple choice questions.

  • Reliability: the extent to which a learner's answer to a question reflects her true knowledge. Guessing (answering correctly without understanding) and slipping (answering incorrectly despite understanding) both undermine it.

  • Discrimination: how well a question separates learners who really understand it from those who don't.

  • Difficulty: the median level of mastery above which students are likely to get the question right.  Difficulty and discrimination are two of the parameters that can be measured using item response theory.

  • Transfer: the extent to which successful performance on an assessment will allow valid generalizations about achievements to be made.
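The difficulty and discrimination parameters mentioned above can be made concrete with the two-parameter logistic (2PL) model from item response theory. Here is a minimal sketch (the function name `p_correct` is my own, not from any of the cited sources): the probability that a learner of ability theta answers an item correctly, given the item's discrimination `a` and difficulty `b`.

```python
import math

def p_correct(theta, a, b):
    """2PL item response model: probability that a learner with
    ability theta answers correctly, for an item with
    discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A discriminating item (a=2) separates learners on either side of
# its difficulty (b=0) much more sharply than a weak item (a=0.5):
for theta in (-1.0, 0.0, 1.0):
    sharp = p_correct(theta, a=2.0, b=0.0)
    flat = p_correct(theta, a=0.5, b=0.0)
    print(f"theta={theta:+.1f}  sharp={sharp:.2f}  flat={flat:.2f}")
```

At theta equal to the item's difficulty, both curves cross 50%; away from it, the high-discrimination item's probabilities diverge much faster, which is exactly what "separates learners who understand from those who don't" means operationally.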

Checklist: the stem (base of question)

  • Write a stem that is specific to the question: this immediately focuses the question on a specific learning outcome.
    BEFORE: Which of the following statements is true?  [various statements about unit tests]
    AFTER: Which characteristic is most commonly observed in unit tests?  [rephrase choices to focus on characteristics of unit tests]

  • Don't put "fill-in blanks" in the stem—it increases student cognitive load without testing their cognition any better.
    BEFORE:  Mocks and ____ allow you to isolate behaviors in unit tests.
    AFTER: Besides mocks, what other mechanism allows you to isolate behaviors in unit tests?

Checklist: answer & distractors

  • Every answer should form a grammatically correct sentence when appended to the stem (pronoun agreement, etc.).

  • Keep items similar in length, complexity, formality, tone, etc., and avoid re-using exact wording from lecture/textbook/notes.  Otherwise students may pick the "most textbook-like" answer, the "most nuanced" answer, the longest answer, etc.

  • Either truly randomize the order of the answers, or use a deterministic rule such as alphabetical order.

  • Avoid questions where students could get the right answer for the wrong reason (even if not guessing):

    • "all of the above" (students who can identify more than one correct answer can choose it even if they don't understand why all the answers are correct)

    • "none of the above" (better, but may still be chosen based on a misconception about why it's true)

    • true/false questions (you won't know whether students understand why the statement is true or false, and they can guess)

    • negative questions (unless the learning outcome specifically requires it, e.g., being able to identify a non-example of something; otherwise students may be able to identify an incorrect answer without knowing the correct answer).

  • Avoid complex combinations of items ("(a) and (b) only", "all of (a), (b), (c)", etc.): a sophisticated test-taker can use partial knowledge to guess the correct answer. (Also, students hate this kind of question since they may get no partial credit for knowing part of the answer.)
    Possible alternative: "Select all that apply" of N choices, and get 1/N of credit for correctly determining whether each choice is checked or not.
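The "select all that apply" scoring rule above is simple enough to sketch. Here is a minimal implementation (the function name `partial_credit` is my own, for illustration): each of the N choices earns 1/N credit if the student's checked/unchecked decision for it matches the key.

```python
def partial_credit(choices, key, response):
    """Score a 'select all that apply' item: each of the N choices
    earns 1/N credit if its checked/unchecked state matches the key.
    key and response are sets of checked choice labels."""
    matches = sum(1 for c in choices if (c in key) == (c in response))
    return matches / len(choices)

# Key is {a, c}; student checked {a, d}: credit for a (correctly
# checked) and b (correctly left unchecked), so 2/4 = 0.5.
print(partial_credit(["a", "b", "c", "d"], {"a", "c"}, {"a", "d"}))
```

Note that a student who checks nothing still earns credit for every correctly-unchecked choice, so you may want a floor or rescaling depending on how generous you want the rule to be.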

How many choices per question?  TL;DR:  Three.

The metric of interest is the total number of choices on the exam: e.g., 30 questions with 4 choices each create a comparable cognitive load to 40 questions with 3 choices each (120 choices either way), so the trade-off is really one of longer tests vs. more choices per question.
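The arithmetic of that trade-off is worth seeing. A minimal sketch (the function name `guessing_stats` is my own): for a fixed total of 120 choices, a pure guesser's expected score is higher with 3 choices per question, but spreading the test over more questions also narrows the spread of guess-only scores.

```python
import math

def guessing_stats(n_questions, n_choices):
    """Expected fraction-of-total score for a pure guesser, and its
    standard deviation, on n_questions with n_choices each
    (binomial model: each guess is right with probability 1/n_choices)."""
    p = 1.0 / n_choices
    sd = math.sqrt(p * (1 - p) / n_questions)
    return p, sd

# Same total of 120 choices, packaged two ways:
for n_q, n_c in ((30, 4), (40, 3)):
    mean, sd = guessing_stats(n_q, n_c)
    print(f"{n_q} questions x {n_c} choices: guess mean={mean:.1%}, sd={sd:.1%}")
```

So fewer choices per question raises the guessing floor, which is part of why the empirical case for 3 choices (below) rests on distractor quality rather than on thwarting guesswork.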

A meta-analysis of 80 years of MCQ research [6] reveals both theoretical and empirical evidence favoring 3 choices per question:

  • Theoretical: Frederic Lord, one of the architects of Item Response Theory, showed statistically that a longer test with fewer choices per question "increases exam efficiency for high-level examinees, and decreases it for low-level examinees".  Tversky later showed that three choices per question maximizes the information obtained per unit of time about students' ability.

  • Empirical: You'd think more distractors would thwart guessing, but on existing standardized and high-quality career tests, only 16% of 4-option items had 4 effective choices (i.e., all plausible enough to be chosen a nontrivial fraction of the time), and only 5% of 5-option items had 5 functional choices.

  • Caveat: the meta-analysis assumes that exactly one choice per question is correct, and that the learner gets a single attempt at each question.

Types of questions that test higher levels of cognition

  • Memory + Application: instead of "Ricardo's Principle of Comparative Advantage states that…" (memory), you can ask "Which of the following is an example of applying Ricardo's Principle of Comparative Advantage?" and give N scenarios, exactly one of which illustrates applying the principle.

  • Premise-Consequence: If X happens, then which of the following will happen?

  • Analogy: X is to Y as W is to which of the following?

  • Case study: a background paragraph serves as the setting for a series of questions that require the student to analyze the scenario from various angles.

  • Incomplete scenario: show a diagram, taxonomy, architecture, etc. similar but not identical to what's been seen in lecture/readings.  Ask students to fill in blanks, or ask questions about what makes it different from the version seen in lecture.

  • Evaluation: present both a question and a proposed answer, e.g., a set of design constraints and a proposed design.  Provide a rubric according to which students must indicate whether the proposed answer is correct, complete, etc.

  • Inference/higher-level reasoning: present a scenario, then ask which of n statements can reasonably be said to follow from the scenario.

Students' rules of thumb for guessing on multiple-choice tests from [] (and ways to thwart them)

  1. Pick longest or most scientific-sounding answer (make all choices comparably long and use comparable prose)

  2. Pick 'b' or 'c' (randomize or use deterministic order)

  3. Avoid choices containing 'always' or 'never' (don't use those words)

  4. If two choices are opposites, one of them is probably the answer (include 2 choices that are opposites and are both distractors)

  5. Pick keywords/phrases that were related to this topic (include keywords/phrases in distractors)

  6. True/False questions are more often true than false, since instructors tend to emphasize true things. (use both forms of a question, or avoid T/F questions)
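Rule 2 above is the easiest to thwart mechanically. Here is a minimal sketch of both orderings mentioned earlier, true randomization and a deterministic alphabetical rule (the function name `order_choices` and the per-question seeding scheme are my own assumptions, not from any particular tool):

```python
import random

def order_choices(choices, question_id=None):
    """Order a question's answer choices. With a question_id, shuffle
    with a per-question seed so the order is random-looking but
    reproducible across regrades; otherwise sort alphabetically."""
    if question_id is not None:
        rng = random.Random(question_id)  # deterministic per question
        shuffled = list(choices)
        rng.shuffle(shuffled)
        return shuffled
    return sorted(choices)  # deterministic rule: alphabetical order

print(order_choices(["never", "always", "sometimes"]))          # alphabetical
print(order_choices(["never", "always", "sometimes"], 17))      # seeded shuffle
```

Seeding by question id rather than calling `random.shuffle` directly means two runs of the exam generator produce the same key, which matters when you grade scanned answer sheets later.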

Useful open-source tools to help prepare & grade multiple-choice exams

  • RuQL is an open-source tool I made that lets you write questions and create tests in a variety of formats (printed, HTML, edX interactive quiz, etc.).  Some command-line skillz required.

  • AutoQCM lets you generate printable answer-bubble sheets that can be scanned on a high-speed scanner and graded using open source software.  RuQL can generate answer sheets and grading keys for AutoQCM.


  1. , Dr. Timothy Bothell, BYU Faculty Center

  2. , U of Oregon Teaching Effectiveness Program

  3. , Cynthia J. Brame, Assistant Director, Vanderbilt University Center for Teaching

  4. Using multiple-choice questions effectively in information technology education, Karyn Woodford and Peter Bancroft, Queensland U. of Tech.

  5. , Ben Clay, Kansas Curriculum Center.

  6. Three Options Are Optimal for Multiple-Choice Items: A Meta-Analysis of 80 Years of Research, Michael C. Rodriguez, University of Minnesota. Educational Measurement: Issues and Practice, Summer 2005 issue.



Comments are disabled because the only commenters are spammers, despite Google's best efforts. But I welcome actual comments: Google my name and you can easily direct an email to me, and I'll publish your comment here.
