Two questions were posted recently on the AP Calculus Community bulletin board. One teacher was concerned that his students took two different forms of the Calculus exam and that the means were not the same. He felt that one group had an easier time than the other. The other writer noted that on his (physics) exam three questions were not counted: there appeared to be only 32 questions instead of the 35 he expected.
My answer, which you may be interested in, was:
It is impossible to make two forms of the same test of equal difficulty. I repeat: It is impossible to make two forms of the same test of equal difficulty. (And if the two forms are equal in difficulty, it is due more to dumb luck than good management.)
What the ETS (Educational Testing Service) does to account for this fact is to adjust the cut points for the scores (5-4-3-2-1). A form of the exam that is “easier”, in the sense of having higher overall means, also has higher cut points. Regardless of the difficulty of the form of the exam, the score (5-4-3-2-1) reflects, as nearly as possible, the same amount of knowledge of the subject. Any other scheme would certainly not be fair. So, there is no need to be concerned that someone else had an easier exam than your students. They may well have, but a given score (5-4-3-2-1) reflects the same knowledge on either form. Your students, and those with the easier exam, will get the score they earned.
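To make the idea of form-specific cut points concrete, here is a minimal sketch. The form names, the cut points, and the function `ap_score` are all made up for illustration; they are not actual ETS values or procedures. The only thing taken from the discussion above is that the easier form carries the higher cut points.

```python
from bisect import bisect_right

# Hypothetical cut points: the minimum composite score needed
# for a 2, 3, 4, and 5 on each form. These numbers are invented.
CUTS = {
    "form_A": [30, 45, 60, 75],  # the "easier" form: higher cut points
    "form_B": [27, 41, 55, 69],  # the "harder" form: lower cut points
}

def ap_score(form, composite):
    """Return the 1-5 score for a composite score on the given form."""
    # Count how many cut points the composite meets or exceeds, then add 1.
    return 1 + bisect_right(CUTS[form], composite)

# The same composite can earn different scores on different forms,
# because the same score is meant to reflect the same knowledge.
print(ap_score("form_A", 59))  # 3
print(ap_score("form_B", 59))  # 4
```

In this made-up example a composite of 59 is a 3 on the easier form and a 4 on the harder one, which is the adjustment doing its job.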
Then I suggested that he consider his students one at a time, without regard to the form of the test they took, and check whether each student got the score he expected. Keeping in mind that students often surprise or disappoint us, did the students get the scores he anticipated? If, in general, they did, regardless of the form, then the ETS did its job.
As to the second concern: The ETS looks at the results individually for each and every question on the exam. If everyone scores very low on a particular question, or if some identifiable sub-group (men, women, one or more minorities) has scores that are far out of line with everyone else's, the question is rejected and not scored. The remaining questions are re-weighted accordingly, and the final score (5-4-3-2-1) reflects the same knowledge of the subject. This happens in math and science, but I suspect it happens more often in history, English, and the social sciences.
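The arithmetic of dropping questions can be sketched as follows. I do not know the exact re-weighting the ETS uses; this sketch assumes simple proportional scaling back to the original number of questions, and the function name `reweighted_total` and the sample answers are hypothetical.

```python
def reweighted_total(answers, dropped):
    """Scale a raw score on the counted questions back to the full length.

    answers: list of 1 (right) or 0 (wrong), one entry per question.
    dropped: set of zero-based positions of questions that are not scored.
    Assumes proportional re-weighting, which is an illustration only.
    """
    counted = [a for i, a in enumerate(answers) if i not in dropped]
    raw = sum(counted)
    return raw * len(answers) / len(counted)

# A 35-question section with the last three questions not scored.
# The student has 24 right out of the 32 questions that count.
answers = [1] * 24 + [0] * 8 + [1, 0, 1]
print(reweighted_total(answers, dropped={32, 33, 34}))  # 26.25
```

Under this assumption the student's answers to the dropped questions have no effect, and 24 out of 32 is treated the same as 26.25 out of 35.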
You might also refer to my recent post of May 12, 2014, “Percentages Don’t Make the Grade,” on this topic.
Updated and revised July 12, 2014.