Item-score reliability: Estimation and evaluation

Eva A.O. Zijlmans

Research output: Thesis › Doctoral Thesis › Scientific

Abstract

In psychology and education, tests are used to measure intelligence and school performance. Test scores are used to make decisions about individuals (who is admitted to a particular school level or a job?) and have an impact on people’s lives as well as on schools and organizations. Thus, test scores must be reliable to guarantee that decisions based on them are correct. Reliability is the degree to which retesting a person provides the same result. In practice, retesting the same persons to determine reliability is unrealistic, because memory and other unwanted effects influence the test result. Estimation of a test score’s reliability is therefore based on the test results of a sample of people who took the test just once. This approach has produced several methods for estimating the reliability of the test score.

Methods for estimating the reliability of a test score all relate to a test consisting of multiple items (problems to be solved, questions to be answered). However, individual items must also have high reliability, and thus it is important to assess the reliability of a single item, that is, the item-score reliability. Until now, items have been assessed using indices that address aspects of item quality other than reliability. Methods to assess item-score reliability were hardly available and thus had to be developed and their performance evaluated. This was the topic of this dissertation.

In this dissertation, methods for estimating item-score reliability were developed and the usability of these methods was evaluated. First, reliability methods based on test scores were used as a basis for developing methods for estimating item-score reliability. These methods were evaluated in controlled studies using simulated data, which yielded three promising methods. In a second study, these three item-score reliability methods were used to estimate the item-score reliability in several empirical data sets. The resulting values were compared to values of item indices assessing other aspects of item quality. The relation between the three item-score reliability methods and the other item indices was investigated in a third study using simulated data. In a final study, the usability of item-score reliability for selecting or rejecting items based on their contribution to test-score reliability was investigated.

The studies in this dissertation show that item-score reliability methods provide insight into the quality of an item and help to decide whether an item should be included in the test. The dissertation also investigates the relationship between item-score reliability and other aspects of item quality. Our methods may contribute to the improvement of psychological and educational tests.

Original language: English
Qualification: Doctor of Philosophy
Supervisors/Advisors
  • Sijtsma, Klaas, Promotor
  • van der Ark, L.A., Promotor
  • Tijmstra, Jesper, Co-promotor
  • Meijer, R.R., Member PhD commission, External person
  • Veldkamp, Bernard P., Member PhD commission, External person
  • Wicherts, Jelte, Member PhD commission
  • Bouwmeester, Samantha, Member PhD commission, External person
  • Keijsers, Loes, Member PhD commission
Award date: 15 Feb 2019
Place of Publication: Enschede
Publisher: Gildeprint
Print ISBNs: 978-94-6323-482-5
Publication status: Published - 2019

Cite this

Zijlmans, E. A. O. (2019). Item-score reliability: Estimation and evaluation. Enschede: Gildeprint.
Zijlmans, Eva A.O. / Item-score reliability: Estimation and evaluation. Enschede: Gildeprint, 2019. 109 p.
@phdthesis{fe8efb67e4444292b0d59b4a51c53f5f,
title = "Item-score reliability: Estimation and evaluation",
author = "Zijlmans, {Eva A.O.}",
year = "2019",
language = "English",
isbn = "978-94-6323-482-5",
publisher = "Gildeprint",
}

Zijlmans, EAO 2019, 'Item-score reliability: Estimation and evaluation', Doctor of Philosophy, Enschede.

TY - THES
T1 - Item-score reliability
T2 - Estimation and evaluation
AU - Zijlmans, Eva A.O.
PY - 2019
Y1 - 2019
M3 - Doctoral Thesis
SN - 978-94-6323-482-5
PB - Gildeprint
CY - Enschede
ER -

Zijlmans EAO. Item-score reliability: Estimation and evaluation. Enschede: Gildeprint, 2019. 109 p.