eTest Question Analysis
An item analysis is the systematic evaluation of the effectiveness of each item of a test. An item analysis can yield statistics that provide useful information for improving the quality and accuracy of multiple choice and true/false questions. Some of these statistics are the following:
- The difficulty of the item.
- The discriminating power of the item.
- The effectiveness of each alternative.
The difficulty of the item: The percentage of students who correctly answered the item.
- Also referred to as the p-value.
- The range is from 0% to 100%, or more typically written as a proportion of 0.0 to 1.00.
- The higher the value, the easier the item.
- Calculation: Divide the number of students who got an item correct by the total number of students who answered it.
- Ideal value: Slightly higher than midway between chance (1.00 divided by the number of choices) and a perfect score (1.00) for the item. For example, on a four-alternative multiple-choice item, the random guessing level is 1.00/4 = 0.25; therefore, the optimal difficulty level is 0.25 + (1.00 - 0.25)/2 = 0.625. On a true/false question, the guessing level is 1.00/2 = 0.50 and, therefore, the optimal difficulty level is 0.50 + (1.00 - 0.50)/2 = 0.75.
- Items with p-values above 0.90 are very easy and should be carefully reviewed in light of the instructor's purpose. For example, if the instructor is using easy "warm-up" questions or aiming for student mastery, then some items with p-values above 0.90 may be warranted. In contrast, if an instructor is mainly interested in differences among students, these items may not be worth testing.
- Items with p-values below 0.20 are very difficult and should be reviewed for possible confusing language, removed from subsequent exams, and/or identified as areas for re-instruction. If almost all of the students got the item wrong, there is either a problem with the item or the students were not able to learn the concept. However, if an instructor is trying to identify the top percentage of students who learned a certain concept, this highly difficult item may be necessary.
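The difficulty calculations above can be sketched in a few lines of code. This is a minimal illustration of the formulas as described; the response counts are hypothetical.

```python
# Sketch of the item-difficulty (p-value) and optimal-difficulty formulas.
# The counts used in the example are hypothetical.

def item_difficulty(num_correct, num_answered):
    """Proportion of students who answered the item correctly (0.0 to 1.0)."""
    return num_correct / num_answered

def optimal_difficulty(num_choices):
    """Midway between the chance level (1/num_choices) and a perfect score (1.0)."""
    chance = 1.0 / num_choices
    return chance + (1.0 - chance) / 2

# Suppose 24 of 30 students answered a four-alternative item correctly.
p = item_difficulty(24, 30)
print(f"p-value: {p:.2f}")                                    # 0.80
print(f"optimal (4 alternatives): {optimal_difficulty(4)}")   # 0.625
print(f"optimal (true/false): {optimal_difficulty(2)}")       # 0.75
```

Since 0.80 falls between the guessing level (0.25) and 0.90, this hypothetical item would be on the easy side of the optimal range but not flagged for review.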
The discriminating power of the item: The relationship between how well students did on the item and their total exam score.
- Also referred to as the Point-Biserial correlation (PBS).
- The range is from –1.00 to 1.00.
- The higher the value, the more discriminating the item. A highly discriminating item indicates that the students who had high exam scores got the item correct, whereas students who had low exam scores got the item incorrect.
- Items with discrimination values near or less than zero should be removed from the exam. This indicates that students who did poorly overall on the exam did better on that item than students who did well overall. The item may be confusing for your better-scoring students in some way.
- Acceptable range: 0.20 or higher.
- Ideal value: The closer to 1.00 the better.
- Calculation: PBS = ((X̄C - X̄T) / S.D. Total) × √(p / q), where
- X̄C = the mean total score for students who responded correctly to the item
- X̄T = the mean total score for all students
- p = the difficulty value for the item
- q = (1 - p)
- S.D. Total = the standard deviation of total exam scores
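The point-biserial calculation can be sketched as follows. The exam scores and correct/incorrect flags are hypothetical, and the population standard deviation is used for S.D. Total.

```python
import statistics

def point_biserial(scores, correct_flags):
    """Point-biserial correlation between item correctness and total exam score.

    scores: total exam scores, one per student
    correct_flags: True/False per student for this item
    """
    mean_total = statistics.mean(scores)        # X̄T: mean total score, all students
    sd_total = statistics.pstdev(scores)        # S.D. Total (population SD)
    correct_scores = [s for s, c in zip(scores, correct_flags) if c]
    mean_correct = statistics.mean(correct_scores)   # X̄C: mean for correct responders
    p = len(correct_scores) / len(scores)            # difficulty value
    q = 1 - p
    return (mean_correct - mean_total) / sd_total * (p / q) ** 0.5

# Hypothetical data: higher-scoring students tend to get the item right.
scores = [55, 60, 65, 70, 75, 80, 85, 90]
flags = [False, False, False, True, False, True, True, True]
print(f"PBS: {point_biserial(scores, flags):.2f}")
```

For this invented data the PBS comes out well above the 0.20 threshold, consistent with an item that discriminates between high and low scorers.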
The effectiveness of each alternative, or distractor: Compare the number of students scoring in the upper third with the number scoring in the lower third who selected each incorrect alternative. A good distractor attracts more students from the lower group than from the upper group. Thus, for the illustrative item analysis data below, alternatives A and D are functioning effectively; alternative C is poor, since it attracted more students from the upper group; and alternative E is completely ineffective, since it attracted no one. An analysis such as this is useful in evaluating a test item and, when combined with an inspection of the item itself, can provide helpful information for improving the item.
Example question where 30 students responded:
* = correct answer
Example of Question Analysis on a Multiple Choice Question:
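The upper-third versus lower-third comparison can be sketched in code. The response counts below are invented for illustration (the original table is not reproduced here), chosen to match the described outcome: A and D effective, C poor, E ineffective.

```python
# Sketch of the distractor-effectiveness comparison described above.
# The response counts are hypothetical, not the document's actual data.

def distractor_effectiveness(upper_counts, lower_counts, correct):
    """Classify each incorrect alternative by comparing how many upper-third
    vs. lower-third students selected it."""
    report = {}
    for alt in upper_counts:
        if alt == correct:
            continue  # skip the keyed (correct) answer
        u, l = upper_counts[alt], lower_counts[alt]
        if u == 0 and l == 0:
            report[alt] = "ineffective (attracted no one)"
        elif l > u:
            report[alt] = "effective"
        else:
            report[alt] = "poor (attracts upper group)"
    return report

upper = {"A": 1, "B": 7, "C": 2, "D": 0, "E": 0}  # upper-third selections
lower = {"A": 3, "B": 2, "C": 1, "D": 4, "E": 0}  # lower-third selections
print(distractor_effectiveness(upper, lower, correct="B"))
```

With these counts, A and D draw more lower-group students (effective), C draws more upper-group students (poor), and E draws no one (ineffective).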