
Evaluating Inter-rater Reliability of a National Assessment Model for Teacher Performance

Jenna M. Porter, David Jelinek


This study addresses the high-stakes nature of teacher performance assessments and the consequential outcomes of passing versus failing that rest on the judgments of those who score them subjectively. Specifically, this study examines the inter-rater reliability of an emerging national model, the Performance Assessment for California Teachers (PACT). Current reports on the inter-rater reliability of PACT use a percent agreement measure that combines exact agreement and agreement within 1 point, but such measurements are problematic because adjacent scores differing by 1 point can be the difference between passing and failing. Multiple methods were used to examine the inter-rater reliability of PACT using 41 assessments (451 double scores) from an accredited institution in California. This study separated and examined the failing and passing groups, in addition to evaluating inter-rater reliability with the groups combined. Both percent agreement (exact and within 1 point) and Kappa (Cohen, 1960) were computed to report the level of agreement among PACT raters for candidates who failed versus passed the assessment. Results indicate that inter-rater reliability ranged from poor to moderate, depending on whether a candidate passed or failed. A number of recommendations are proposed, including a model for more precise measurement of inter-rater reliability and improvements to training and calibration processes.
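The two agreement statistics named in the abstract can be illustrated with a minimal sketch. The rater scores below are hypothetical, not data from the study; the sketch shows why within-1-point agreement can look perfect even when exact agreement and chance-corrected agreement (Cohen's 1960 Kappa) are much lower:

```python
from collections import Counter

def percent_agreement(r1, r2, tolerance=0):
    """Share of items on which the two raters' scores differ by at most `tolerance`."""
    return sum(1 for a, b in zip(r1, r2) if abs(a - b) <= tolerance) / len(r1)

def cohens_kappa(r1, r2):
    """Cohen's (1960) Kappa: exact agreement corrected for chance agreement."""
    n = len(r1)
    p_o = sum(1 for a, b in zip(r1, r2) if a == b) / n          # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum((c1[k] / n) * (c2[k] / n) for k in set(r1) | set(r2))  # expected by chance
    return (p_o - p_e) / (1 - p_e)

# Hypothetical rubric scores (1-4 scale) from two raters on the same ten tasks
rater_a = [2, 3, 3, 1, 4, 2, 3, 2, 1, 3]
rater_b = [2, 3, 2, 1, 4, 3, 3, 2, 2, 3]
print(percent_agreement(rater_a, rater_b))     # exact agreement: 0.7
print(percent_agreement(rater_a, rater_b, 1))  # exact + within 1 point: 1.0
print(cohens_kappa(rater_a, rater_b))
```

In this invented example the combined exact/within-1 measure reports 100% agreement, yet every disagreement is exactly 1 point, which matters when a 1-point difference straddles the passing cut score.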


This work is licensed under a Creative Commons Attribution 3.0 License.