Analysis of cross-cultural comparability of PISA 2009 scores

M. Kankaras, G.B.D. Moors

Research output: Contribution to journal › Article › Scientific › peer-review


Abstract

The Program for International Student Assessment (PISA) is a large-scale cross-national study that measures academic competencies of 15-year-old students in mathematics, reading, and science from more than 50 countries/economies around the world. PISA results are usually aggregated and presented in so-called “league tables,” in which countries are compared and ranked on each of the three scales. However, to compare results obtained from different groups/countries, one must first be sure that the tests measure the same competencies in all cultures. In this paper, this is tested by examining the level of measurement equivalence in the 2009 PISA data set using an item response theory (IRT) approach and analyzing differential item functioning (DIF). Measurement inequivalence was found in the form of uniform DIF. Inequivalence occurred in a majority of test questions in all three scales studied and is, on average, of moderate size. It varies considerably both across items and across countries. When this uniform DIF is accounted for in the inequivalent model, the resulting country scores change considerably in the cases of the “Mathematics,” “Science,” and especially the “Reading” scale. These changes tend to occur simultaneously and in the same direction in groups of regional countries. The most affected seem to be Southeast Asian countries/territories, whose scores, although among the highest in the initial, homogeneous model, increase further when accounting for inequivalence in the scales.
Keywords: measurement equivalence, PISA, differential item functioning, cross-cultural research, educational measurement
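The uniform DIF described in the abstract (items being shifted in difficulty by a constant amount for one group, even at equal ability) can be illustrated with a minimal simulation. This is a hypothetical sketch using a two-parameter logistic (2PL) IRT model; the code, parameter values, and group names are illustrative assumptions, not the paper's actual model or data:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_correct(theta, a, b):
    # 2PL item response function: P(correct answer | ability theta),
    # with discrimination a and difficulty b.
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

n, items = 2000, 20
a = np.ones(items)                       # item discriminations
b = rng.normal(0.0, 1.0, items)          # difficulties for the reference group
dif_shift = 0.5                          # uniform DIF: every item is 0.5 logits
b_focal = b + dif_shift                  # harder for the focal group

# Both groups are drawn from the SAME ability distribution.
theta = rng.normal(0.0, 1.0, n)
resp_ref = rng.random((n, items)) < p_correct(theta[:, None], a, b)
resp_focal = rng.random((n, items)) < p_correct(theta[:, None], a, b_focal)

# Ignoring the DIF, raw sum scores make the focal group look less able
# even though the two groups' true abilities are identical.
print(resp_ref.sum(1).mean(), resp_focal.sum(1).mean())
```

Because every item shifts by the same amount, the DIF is "uniform": scoring the focal group against the reference-group item parameters systematically understates its ability, which is the kind of bias the inequivalent model in the paper adjusts for.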
Original language: English
Pages (from-to): 381-399
Journal: Journal of Cross-Cultural Psychology
Volume: 45
Issue number: 3
DOI: 10.1177/0022022113511297
Publication status: Published - 2014

Cite this

@article{7001b8bfed734c1c88cc1e862966ee68,
title = "Analysis of cross-cultural comparability of PISA 2009 scores",
abstract = "The Program for International Student Assessment (PISA) is a large-scale cross-national study that measures academic competencies of 15-year-old students in mathematics, reading, and science from more than 50 countries/economies around the world. PISA results are usually aggregated and presented in so-called “league tables,” in which countries are compared and ranked on each of the three scales. However, to compare results obtained from different groups/countries, one must first be sure that the tests measure the same competencies in all cultures. In this paper, this is tested by examining the level of measurement equivalence in the 2009 PISA data set using an item response theory (IRT) approach and analyzing differential item functioning (DIF). Measurement inequivalence was found in the form of uniform DIF. Inequivalence occurred in a majority of test questions in all three scales studied and is, on average, of moderate size. It varies considerably both across items and across countries. When this uniform DIF is accounted for in the inequivalent model, the resulting country scores change considerably in the cases of the “Mathematics,” “Science,” and especially the “Reading” scale. These changes tend to occur simultaneously and in the same direction in groups of regional countries. The most affected seem to be Southeast Asian countries/territories, whose scores, although among the highest in the initial, homogeneous model, increase further when accounting for inequivalence in the scales.",
keywords = "measurement equivalence, PISA, differential item functioning, cross-cultural research, educational measurement",
author = "M. Kankaras and G.B.D. Moors",
year = "2014",
doi = "10.1177/0022022113511297",
language = "English",
volume = "45",
pages = "381--399",
journal = "Journal of Cross-Cultural Psychology",
issn = "0022-0221",
publisher = "Sage Publications, Inc.",
number = "3",

}

Analysis of cross-cultural comparability of PISA 2009 scores. / Kankaras, M.; Moors, G.B.D.

In: Journal of Cross-Cultural Psychology, Vol. 45, No. 3, 2014, p. 381-399.

TY - JOUR

T1 - Analysis of cross-cultural comparability of PISA 2009 scores

AU - Kankaras, M.

AU - Moors, G.B.D.

PY - 2014

Y1 - 2014

AB - The Program for International Student Assessment (PISA) is a large-scale cross-national study that measures academic competencies of 15-year-old students in mathematics, reading, and science from more than 50 countries/economies around the world. PISA results are usually aggregated and presented in so-called “league tables,” in which countries are compared and ranked on each of the three scales. However, to compare results obtained from different groups/countries, one must first be sure that the tests measure the same competencies in all cultures. In this paper, this is tested by examining the level of measurement equivalence in the 2009 PISA data set using an item response theory (IRT) approach and analyzing differential item functioning (DIF). Measurement inequivalence was found in the form of uniform DIF. Inequivalence occurred in a majority of test questions in all three scales studied and is, on average, of moderate size. It varies considerably both across items and across countries. When this uniform DIF is accounted for in the inequivalent model, the resulting country scores change considerably in the cases of the “Mathematics,” “Science,” and especially the “Reading” scale. These changes tend to occur simultaneously and in the same direction in groups of regional countries. The most affected seem to be Southeast Asian countries/territories, whose scores, although among the highest in the initial, homogeneous model, increase further when accounting for inequivalence in the scales.

KW - measurement equivalence

KW - PISA

KW - differential item functioning

KW - cross-cultural research

KW - educational measurement

DO - 10.1177/0022022113511297

M3 - Article

VL - 45

SP - 381

EP - 399

JO - Journal of Cross-Cultural Psychology

JF - Journal of Cross-Cultural Psychology

SN - 0022-0221

IS - 3

ER -