Test Scoring and Analysis
a) the correct response to each item
b) ZZZKEY in the first six columns of the NAME box.
c) 000000001 in the first 9 columns of the IDENTIFICATION NUMBER box.
a) A for all the items associated with the first part (instructor, objective), B for the items associated with the second part (instructor, objective), etc.
b) ZZZPARTS in the first eight columns of the NAME box.
c) 000000002 in the first 9 columns of the IDENTIFICATION NUMBER box.
a) A for all items that you wish to assign a weight of "1" and B for all items that you wish to assign a weight of "2", etc.
b) ZZZWEIGHTS in the first ten columns of the NAME box.
c) 000000003 in the first 9 columns of the IDENTIFICATION NUMBER box.
Always attach a Test Scoring and Analysis Request Form. Forms are available in U170 and may be duplicated.
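The way the key, parts, and weights sheets could combine into a score can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the five-item test, and the answer strings are hypothetical, and the actual scoring software may work differently.

```python
# Hypothetical sketch of weighted, part-wise scoring.
# Assumption: each sheet is read as one letter per item (key, parts, weights).

def letter_to_weight(letter):
    """Map a weights-sheet mark to a number: A -> 1, B -> 2, and so on."""
    return ord(letter.upper()) - ord("A") + 1

def score_sheet(responses, key, parts, weights):
    """Return (total, per_part) weighted scores for one student's answers."""
    total = 0
    per_part = {}
    for answer, correct, part, weight_mark in zip(responses, key, parts, weights):
        # A correct answer earns the item's weight; a wrong answer earns 0.
        points = letter_to_weight(weight_mark) if answer == correct else 0
        total += points
        per_part[part] = per_part.get(part, 0) + points
    return total, per_part

key = "ABCDA"      # ZZZKEY sheet: correct response to each item
parts = "AAABB"    # ZZZPARTS sheet: items 1-3 in part A, items 4-5 in part B
weights = "AABAB"  # ZZZWEIGHTS sheet: A = weight 1, B = weight 2

total, per_part = score_sheet("ABCDB", key, parts, weights)
print(total, per_part)  # 5 {'A': 4, 'B': 1}
```

In this made-up example the student misses only item 5, which carries a weight of 2, so the total is 5 out of a possible 7.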

The Kuder-Richardson Formula 20 (KR-20) is a measure of reliability that indicates inter-item consistency. The value of KR-20 normally ranges between 0 and +1.00. If a test consists only of items that measure the same type of material and require the same kind of thinking from the students, the KR-20 should be close to 1.00.
If a test consists of items that measure different types of material and require different kinds of thinking, the KR-20 will be closer to 0.00.
The KR-20 can be affected by the number of items on the test. Longer tests tend to have KR-20 values closer to 1.00. Occasionally part scores will have higher KR-20 values than the total test because the parts have more homogeneous items than the test considered as a whole. The higher the KR-20 value, the more confident the instructor can be in making judgments about the test results.
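For readers who want to see how the statistic is built, here is a minimal sketch of the standard textbook KR-20 formula, k/(k-1) * (1 - sum(p*q) / variance of total scores). The function name and the sample data are illustrative assumptions, and the scoring software may differ in details such as which variance formula it uses.

```python
def kr20(item_scores):
    """KR-20 for a table of 0/1 item scores (one row per student).

    Assumption: the population variance (divide by N) is used for total scores.
    """
    n_students = len(item_scores)
    n_items = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    if var_total == 0:
        raise ValueError("KR-20 is undefined when every student has the same score")
    # Sum over items of p * q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_scores) / n_students
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)

# Made-up data: every student either answers all four items or none of them.
consistent = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
# Made-up data: items differ in difficulty, so agreement is less than perfect.
graded = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]

print(round(kr20(consistent), 3))  # 1.0
print(round(kr20(graded), 3))      # 0.8
```

The first data set is perfectly consistent across items and reaches the top of the range; the second is still fairly consistent but falls short of it.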
The Standard Error of Measurement is a second measure of reliability. A student's score is made up of a "true" score and a number of chance factors. These chance factors may raise or lower the student's score from his/her true raw score in an unknown fashion.
The instructor can assume with 68 percent confidence (about 2 times out of 3) that the student's true score lies within the achieved score plus or minus one standard error. For example, if the student's score on the score sheet is 82 and the standard error is 4.5, then about 2/3 of the time the instructor can assume that the student's score without any chance factors would be between 77.5 and 86.5.
To be at least 99% certain, the instructor would have to increase the range to three standard errors above and below the achieved score.
A small standard error is preferable to a large standard error.
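The score ranges described above are simple arithmetic, sketched below. The helper names are hypothetical. The first function uses the common textbook relation SEM = SD * sqrt(1 - reliability); whether the scoring software computes its standard error exactly this way is an assumption.

```python
import math

def standard_error_of_measurement(score_sd, reliability):
    """Textbook SEM from the score standard deviation and a reliability such as KR-20."""
    return score_sd * math.sqrt(1.0 - reliability)

def score_band(observed, sem, n_errors=1):
    """Range of observed score plus or minus n_errors standard errors."""
    return observed - n_errors * sem, observed + n_errors * sem

# The example from the text: a score of 82 with a standard error of 4.5.
print(score_band(82, 4.5))     # (77.5, 86.5)  about 68% confidence
print(score_band(82, 4.5, 3))  # (68.5, 95.5)  at least 99% confidence

# Made-up figures: a score SD of 10 and a KR-20 of 0.84 give an SEM of about 4.
print(round(standard_error_of_measurement(10, 0.84), 2))
```

The relation also shows why a higher KR-20 goes with a smaller standard error: as reliability approaches 1.00, the square-root term shrinks toward zero.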
Both the discrimination index and the R Point Biserial are designed to show the instructor whether students who scored well on the test as a whole answered a given item correctly and whether students who scored poorly on the test as a whole answered it incorrectly. The values range from +1.00 to -1.00.
A +1.00 indicates that the students who scored well on the test as a whole correctly answered the item and the students who scored poorly on the test as a whole answered the item incorrectly.
A 0.00 indicates that the students who scored well on the test as a whole responded to the item correctly at the same rate as did students who scored poorly on the test as a whole.
A -1.00 indicates that the students who scored well on the test as a whole failed to answer the item correctly and the students who scored poorly on the test as a whole answered the item correctly.
If the number of students taking the exam and/or the number of items is greater than 50, the R Point Biserial should be used.
If the number of students taking the exam and/or the number of items is less than 50, the discrimination index should be used.
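Both item statistics can be sketched briefly. The code below uses common textbook definitions, which are assumptions here rather than a description of the scoring software: the discrimination index is taken as the proportion correct in the top-scoring group minus the proportion correct in the bottom-scoring group (27% of students each, a conventional choice), and the point biserial is the correlation between the 0/1 item score and the total test score. All names and data are made up.

```python
import math

def discrimination_index(item, totals, fraction=0.27):
    """Proportion correct in the top group minus proportion correct in the bottom group."""
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    n_group = max(1, round(len(totals) * fraction))
    lower, upper = order[:n_group], order[-n_group:]
    p_upper = sum(item[i] for i in upper) / n_group
    p_lower = sum(item[i] for i in lower) / n_group
    return p_upper - p_lower

def point_biserial(item, totals):
    """Correlation between a 0/1 item score and the total test score."""
    n = len(totals)
    mean = sum(totals) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in totals) / n)
    right = [t for x, t in zip(item, totals) if x == 1]
    wrong = [t for x, t in zip(item, totals) if x == 0]
    if not right or not wrong or sd == 0:
        raise ValueError("undefined when all students answer alike or all totals are equal")
    p = len(right) / n
    return (sum(right) / len(right) - sum(wrong) / len(wrong)) / sd * math.sqrt(p * (1 - p))

totals = [95, 90, 85, 80, 60, 55, 50, 45, 40, 35]  # total test scores
good_item = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]         # high scorers answer correctly
bad_item = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]          # low scorers answer correctly

print(discrimination_index(good_item, totals))  # 1.0
print(discrimination_index(bad_item, totals))   # -1.0
print(round(point_biserial(good_item, totals), 2))
print(round(point_biserial(bad_item, totals), 2))
```

The first item separates strong and weak students perfectly and sits at the top of the range for the discrimination index; the second item does the reverse, and both statistics come out negative for it.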


