Interpretation of Test Scoring Statistics – UCLA Center for the Advancement of Teaching

The following are explanations of the statistics most frequently asked about on our reports:

Roster of Results Report

Rank

Rank is a measure of where a score is positioned relative to all other scores in the group, where a rank of “1” indicates the highest score in the group. In cases where there are multiple instances of the same score, the rank is calculated as the average of the range of ranks covering the duplicate scores. For example, here is a partial list of percent scores and their preliminary and adjusted ranks:

Score (%)	Rank	Adj. Rank
95.4	1	1
92	2	2.5
92	3	2.5
89	4	4
88.5	5	5

Because there are two instances of the score of 92%, the ranks for the two are averaged, and a rank of (2 + 3)/2 = 2.5 is assigned to each.
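The tie-averaging rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the code that produces the report; the function name `average_ranks` is made up for this example.

```python
def average_ranks(scores):
    """Rank scores so that 1 is the highest; tied scores share the mean of their ranks."""
    # Positions of the scores, ordered from highest score to lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Find the end of the run of equal scores that starts at pos.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Preliminary ranks for this run are pos+1 .. end+1; every tie gets their average.
        shared = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


print(average_ranks([95.4, 92, 92, 89, 88.5]))  # [1.0, 2.5, 2.5, 4.0, 5.0]
```

The output matches the adjusted ranks in the table: the two scores of 92 each receive 2.5.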

T-Scores

T-scores indicate how many standard deviation units an examinee’s score is above or below the mean. T-Scores always have a mean of 50 and a standard deviation of 10, so any T-Score is directly interpretable. A T-Score of 50 indicates a raw score equal to the mean. A T-Score of 40 indicates a raw score one standard deviation below the mean, while a T-Score of 65 indicates a raw score 1.5 standard deviations above the mean.
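As a rough sketch, the conversion described above amounts to rescaling the distance from the mean so that one standard deviation equals 10 points around a center of 50. The function name `t_score` and the test mean of 70 and standard deviation of 8 are assumptions chosen for illustration.

```python
def t_score(raw, mean, sd):
    """Convert a raw score to a T-Score (mean 50, standard deviation 10)."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (raw - mean) / sd  # distance from the mean in standard deviation units
    return 50 + 10 * z


# Hypothetical test with a mean of 70 and a standard deviation of 8.
print(t_score(70, 70, 8))  # 50.0 (raw score equal to the mean)
print(t_score(62, 70, 8))  # 40.0 (one standard deviation below the mean)
print(t_score(82, 70, 8))  # 65.0 (1.5 standard deviations above the mean)
```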

Both rank and T-Scores describe test performance in terms of the examinee’s relative position in the distribution of test scores. While rank has the advantage of being easier to understand, it has the serious disadvantage that its units are not equal on all parts of the scale.

A rank difference near the middle of the scale represents a much smaller difference in test performance than the same rank difference at the ends. T-Scores, on the other hand, provide equal units that can be treated arithmetically. T-Scores from several tests taken during a semester can thus be summed and averaged.

Z-Scores

Z-Scores are raw scores expressed in standard deviation units, relative to the mean score. Positive Z-Scores indicate a raw score that is above the mean, negative Z-Scores indicate a raw score that is below the mean, and a Z-Score of zero indicates a raw score that is equal to the mean. In a normally distributed set of data, the empirical rule states that about 68% of all scores will fall within ±1 SD of the mean, about 95% within ±2 SD, and about 99.7% within ±3 SD. Z-Scores between -2.00 and +2.00 are therefore considered relatively ordinary, while values below -2.00 or above +2.00 are unusual.
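A minimal sketch of the calculation follows. It assumes the population standard deviation (`statistics.pstdev`); the report may instead use the sample standard deviation, which the text does not specify. The function name `z_scores` and the raw scores are made up for illustration.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Express each raw score in standard deviation units relative to the mean."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # assumption: population SD, not sample SD
    if sd == 0:
        raise ValueError("all scores are identical, so Z-Scores are undefined")
    return [(x - m) / sd for x in raw_scores]


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up raw scores with mean 5 and SD 2
zs = z_scores(scores)
print(zs)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# Flag scores whose Z-Score lies below -2.00 or above +2.00.
unusual = [x for x, z in zip(scores, zs) if abs(z) > 2]
print(unusual)  # []
```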

Test Item Analysis Report

Difficulty Index (DIF Index)

The Difficulty Index (DIF Index) indicates how many in the entire group answered the question correctly, expressed as a percent.

The following formula can be used to calculate it:

c ÷ s = p

Where:

c	The number of students who answered the question correctly
s	The total number of students in the class who answered the question
p	The difficulty level (the DIF Index), usually expressed as a percentage

The answer will equal a value between 0.0 and 1.0, with harder questions resulting in values closer to 0.0 and easier questions resulting in values closer to 1.0.

Example: Of the 20 students who answered question five, only 13 answered correctly.

Therefore: 13 ÷ 20 = 0.65 which also equals 65%
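The same arithmetic, written as a small Python sketch. The function name `difficulty_index` is hypothetical, and the input checks are an addition for safety rather than something the report describes.

```python
def difficulty_index(correct, answered):
    """Return p = c / s, the proportion of respondents who answered correctly."""
    if answered <= 0:
        raise ValueError("at least one student must have answered the question")
    if not 0 <= correct <= answered:
        raise ValueError("correct must be between 0 and answered")
    return correct / answered


p = difficulty_index(13, 20)
print(p)           # 0.65
print(f"{p:.0%}")  # 65%
```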

Discrimination Index (DISC Index)

The formula being used for the Discrimination Index (DISC Index) figure on the Test Item Analysis report is:

DISC = (a – b) / n

Where:

DISC	Discrimination Index
a	Response frequency (number of correct responses) among the upper 27% of the scorers
b	Response frequency (number of correct responses) among the lower 27% of the scorers
n	Number of respondents in the upper 27% of the scorers

The DISC Index, or index of discrimination, measures how well a particular question predicts success on the test overall. It is simply the difference between the percentage of high-achieving students who got an item right and the percentage of low-achieving students who got the item right. The high- and low-achieving students are usually defined (as they are in the Scan ‘n’ Score reports) as the upper and lower 27% of the students, based on the total examination score.
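A minimal sketch of the DISC formula follows, assuming that a and b count correct responses and that the upper and lower groups contain the same number of students, n. The function name and the example counts are hypothetical.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """DISC = (a - b) / n, returned as a proportion between -1.0 and 1.0."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    for count in (upper_correct, lower_correct):
        if not 0 <= count <= group_size:
            raise ValueError("correct counts must be between 0 and group_size")
    return (upper_correct - lower_correct) / group_size


# Hypothetical item: each group has 20 students (assumed equal in size);
# 18 in the upper group and 8 in the lower group answered correctly.
disc = discrimination_index(18, 8, 20)
print(disc)        # 0.5
print(disc * 100)  # 50.0 (the same value on a 0-100 scale)
```

A negative result would mean that more low scorers than high scorers answered the item correctly.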

A useful rule of thumb in interpreting the index of discrimination is to compare it with the maximum possible discrimination for an item. The maximum possible discrimination is a function of item difficulty. When half or fewer of the students in the upper and lower groups combined answered the item correctly, the maximum possible discrimination is the sum of the percentages of the upper and lower groups who answered the item correctly. For example, if 30% of the upper group and 10% of the lower group answered the item correctly, the maximum possible discrimination is 30 plus 10, or 40. This maximum would occur if 40% of the upper group and none of the lower group answered the item correctly.

When more than half of the students in the upper and lower groups combined answered the item correctly, the maximum possible discrimination is 200 minus the sum of the percentages of the upper and lower groups who answered the item correctly. For example, if 96% of the upper group and 84% of the lower group answered the item correctly, the maximum possible discrimination for the item would be 200 minus 180 (96 plus 84), or 20.
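The two cases of this rule of thumb can be sketched as one small function. It assumes the upper and lower groups are the same size, so that "half of the combined groups" corresponds to the two percentages summing to 100; the function name `max_discrimination` is made up for illustration.

```python
def max_discrimination(upper_pct, lower_pct):
    """Maximum possible discrimination, given the percent correct in each group."""
    for pct in (upper_pct, lower_pct):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must be between 0 and 100")
    total = upper_pct + lower_pct
    # With equal-sized groups, half or fewer of the combined students
    # answering correctly means the two percentages sum to 100 or less.
    return total if total <= 100 else 200 - total


print(max_discrimination(30, 10))  # 40
print(max_discrimination(96, 84))  # 20
```

An item's observed discrimination can then be judged against this ceiling rather than against 100.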

Thus, if everyone in the group (high- and low-scorers alike) answers a question correctly, its DISC Index will be zero, meaning it is a poor predictor of overall success. In contrast, if all of the high-scoring group answer the question correctly and all of the low-scoring group get it wrong, the DISC Index will be 100, indicating the question is an excellent predictor of overall success on the test.
