Item-score reliability: Estimation and evaluation

Eva A.O. Zijlmans*

*Corresponding author for this work

Research output: Thesis (Doctoral Thesis)


In psychology and education, tests are used to measure intelligence and school performance. Test scores are used to make decisions about individuals (for example, who is admitted to a particular school level or hired for a job) and have an impact on people’s lives as well as on schools and organizations. Thus, test scores must be reliable to guarantee that decisions based on them are correct. Reliability is the degree to which retesting a person provides the same result. In practice, retesting the same persons to determine reliability is unrealistic, because memory and other unwanted effects will influence the test result. Estimation of a test score’s reliability is therefore based on the test results of a sample of people who took the test just once. This approach has produced several methods to estimate the reliability of the test score.

Methods for estimating the reliability of a test score all relate to a test consisting of multiple items (problems to be solved, questions to be answered). However, individual items also must have high reliability, and thus it is important to assess the reliability of a single item, that is, the item-score reliability. Until now, items have been assessed using indices that address aspects of item quality other than reliability; methods to assess item-score reliability were hardly available, so they had to be developed and their performance evaluated. That is the topic of this dissertation.

In this dissertation, methods for estimating item-score reliability were developed and the usability of these methods was evaluated. First, reliability methods based on test scores were used as a basis for developing methods for estimating item-score reliability. These methods were evaluated in controlled studies using simulated data. Three promising methods resulted. In a second study, these three item-score reliability methods were used to estimate the item-score reliability in several empirical data sets. The resulting values were compared to values of item indices assessing other aspects of item quality. The relation between the three item-score reliability methods and the other item indices was investigated in a third study using simulated data. In a final study, the usability of item-score reliability for selecting or rejecting items based on their contribution to test-score reliability was investigated.

The studies in this dissertation show that item-score reliability methods provide insight into the quality of an item and help to decide whether an item should be included in the test. Also, the relationship between item-score reliability and other aspects of item quality is investigated. Our methods may contribute to the improvement of psychological and educational tests.
Original language: English
Qualification: Doctor of Philosophy
  • Sijtsma, K., Promotor
  • van der Ark, L.A., Promotor
  • Tijmstra, Jesper, Co-promotor
  • Meijer, R.R., Member PhD commission, External person
  • Veldkamp, Bernard P., Member PhD commission, External person
  • Wicherts, Jelte, Member PhD commission
  • Bouwmeester, Samantha, Member PhD commission, External person
  • Keijsers, L.G.M.T., Member PhD commission
Award date: 15 Feb 2019
Place of Publication: Enschede
Print ISBNs: 978-94-6323-482-5
Publication status: Published - 2019

