If we are measuring ability to achieve specific CCSS or NGSS benchmarks, why is there a difference in difficulty attached to the items that are given to some students and not others?
The EOC exams are online adaptive exams, which means that students answer different sets of items. The online adaptive EOC Exam System selects each item to align as closely as possible with the student's performance on the items answered up to that point. In general, students who are doing well on the exam will see more difficult items, and students who are struggling will see easier items. Regardless of the difficulty of the items, all students are tested on the breadth of the content for the course, and all students get an opportunity to demonstrate their higher-order thinking skills.
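The general idea can be sketched in a few lines of code. This is an illustrative simplification, not the actual EOC Exam System algorithm; the item pool, the starting estimate, and the fixed step size are all invented for the example.

```python
# Illustrative sketch of adaptive item selection.
# NOT the actual EOC Exam System algorithm; difficulties,
# starting estimate, and step size are assumptions.

def select_item(item_pool, theta):
    """Pick the remaining item whose difficulty is closest to the
    current ability estimate theta."""
    return min(item_pool, key=lambda item: abs(item["difficulty"] - theta))

def run_adaptive_exam(item_pool, answer_fn, n_items=5, step=0.5):
    theta = 0.0                 # start at the middle of the scale
    pool = list(item_pool)      # work on a copy of the pool
    for _ in range(n_items):
        item = select_item(pool, theta)
        pool.remove(item)
        correct = answer_fn(item)
        # A correct answer nudges the estimate up, so harder items
        # follow; an incorrect answer nudges it down, so easier
        # items follow.
        theta += step if correct else -step
    return theta

pool = [{"difficulty": d} for d in (-2.0, -1.0, 0.0, 1.0, 2.0)]
run_adaptive_exam(pool, lambda item: True)   # all correct: estimate climbs
run_adaptive_exam(pool, lambda item: False)  # all incorrect: estimate falls
```

A student who keeps answering correctly is routed toward harder items, and a student who keeps missing is routed toward easier ones, which is exactly the behavior the answer above describes.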
How is a student’s achievement for an EOC exam measured if there is the factor of difference in difficulty of items given to students?
Each item has a measured difficulty based on the field-test results, so the items can be arranged along a scale. Student scores lie along that same scale. Imagine two students, one getting difficult items and the other receiving easier items. Suppose they both answer half of their items correctly. The student with the more difficult items will receive a higher score. This is made possible through a statistical process known as equating, and it is used on virtually all adaptive tests.
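The two-student example can be made concrete with a small sketch. This uses the Rasch model and a simple grid search for the maximum-likelihood ability estimate; the model choice and the item difficulties are assumptions for illustration, not a description of the actual EOC scoring procedure.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model,
    given ability theta and item difficulty b (an assumed model,
    used here only for illustration)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses):
    """Grid-search maximum-likelihood ability estimate.
    responses: list of (item_difficulty, answered_correctly)."""
    best_theta, best_ll = 0.0, float("-inf")
    for i in range(-400, 401):
        theta = i / 100.0
        ll = sum(math.log(rasch_prob(theta, b)) if right
                 else math.log(1.0 - rasch_prob(theta, b))
                 for b, right in responses)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

# Two students each answer half of their items correctly, but one
# saw harder items (difficulties are invented for the example).
easier_items = [(-1.0, True), (-0.5, True), (0.0, False), (0.5, False)]
harder_items = [(0.5, True), (1.0, True), (1.5, False), (2.0, False)]
```

Running `estimate_ability` on both response patterns gives a higher estimate for the student who answered the harder items, even though both students got half of their items correct.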
The standard error of measurement (SEM) also needs to be considered when reviewing a student’s scores for an EOC exam. The observed score on any exam is an estimate of the true score. If a student took a similar exam several times, the resulting scale score would vary across administrations, sometimes being a little higher, sometimes a little lower, and sometimes the same. The SEM represents the precision of the scale score, or the range in which the student would likely score if a similar exam were administered several times. The “+/–” next to the student’s scale score provides information about the certainty, or confidence, of the score’s interpretation. The boundaries of the score band are one standard error of measurement above and below the student’s observed score, representing a range of score values that is likely to contain the true score. For example, 310 ± 10 indicates that if the student were tested again, about two out of three times the student’s true score would likely fall between 300 and 320. Because students are administered different sets of items of varying difficulties for an EOC exam, the SEM can differ for the same scale score, depending on how closely the administered items match the student's ability.
A student’s scale score should be evaluated after the SEM is added to or subtracted from the scale score. This provides a score range that includes the student’s true score with 68 percent certainty (i.e., across repeated administrations, the student’s test score would fall in this range about 68 percent of the time).
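The score band described above is simple arithmetic; a minimal sketch, using the 310 ± 10 example from the text:

```python
def score_band(scale_score, sem):
    """Return the range likely to contain the student's true score
    about 68 percent of the time: one SEM above and below the
    observed scale score."""
    return scale_score - sem, scale_score + sem

# Example from the text: a reported score of 310 with an SEM of 10.
low, high = score_band(310, 10)
# low, high == (300, 320)
```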
A small difference between scale scores (e.g., within one SEM) should not be interpreted as a significant difference; measurement error should be taken into account when comparing scores. For example, students with scores of 301 and 303 are not reliably different because those scores differ by less than one SEM. Note also that because the score band contains a student’s true score with only 68 percent certainty, the true score can lie outside the band.
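The comparison rule can be sketched as a small helper. This is an illustrative convention based on the paragraph above (a difference smaller than one SEM is not treated as meaningful), not part of any official scoring system:

```python
def scores_reliably_differ(score_a, score_b, sem):
    """Treat two scale scores as reliably different only when they
    differ by at least one SEM (illustrative rule of thumb from
    the text, not an official procedure)."""
    return abs(score_a - score_b) >= sem

# Example from the text: 301 vs. 303 with an SEM of 10.
scores_reliably_differ(301, 303, 10)  # False: within one SEM
```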
How does marking an item for review during an exam affect the next item? Does it lower the level and value? Does it remain at the same difficulty level?
Marking an item for review does not affect the selection of subsequent items in any way. It is simply a way for a student to note that he or she wants to revisit the answer to that item. A student’s current response to an item (regardless of whether the item is marked for review) is what is used to update the student’s ability estimate, and that estimate drives the selection of subsequent items. If a student changes the response to an item, marked or unmarked, the ability estimate is updated accordingly, which affects the selection of any additional items.