Keeping Elo alive: Evaluating and improving measurement properties of learning systems based on Elo ratings

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

The Elo rating system, which originates from competitive chess, has been widely utilised in large-scale online educational applications, where it serves for on-the-fly estimation of ability, item calibration, and adaptivity. In this paper, we critically analyse the shortcomings of the Elo rating system in an educational context, shedding light on its measurement properties and on when these may fall short in accurately capturing student abilities and item difficulties. In a simulation study, we examine the asymptotic properties of the Elo rating system. Our results show that Elo ratings are generally not unbiased and that their variances are context-dependent. Furthermore, in scenarios where items are selected adaptively based on the current ratings and item difficulties are updated alongside student abilities, the variance of the ratings across items and students artificially increases over time, and as a result the ratings do not converge. We propose a solution to this problem that entails using two parallel chains of ratings, which removes the dependence of item selection on the current errors in the ratings.
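The kind of update the abstract refers to can be illustrated with a minimal sketch of the standard educational Elo step, in which a student's ability and an item's difficulty are moved in opposite directions after each response. The step size `k` and the logistic (Rasch-type) expectancy below are common conventions, not details taken from this paper.

```python
import math

def elo_update(theta, beta, correct, k=0.4):
    """One Elo update step for a student-item encounter.

    theta   -- current student ability rating
    beta    -- current item difficulty rating
    correct -- observed response, 1 (correct) or 0 (incorrect)
    k       -- illustrative step size (an assumption, not from the paper)
    """
    # Expected probability of a correct response (logistic/Rasch form)
    p = 1.0 / (1.0 + math.exp(-(theta - beta)))
    # Ability and difficulty move by the same amount in opposite directions
    theta_new = theta + k * (correct - p)
    beta_new = beta - k * (correct - p)
    return theta_new, beta_new
```

Because the item difficulty is updated alongside the student ability, errors in one chain of ratings can feed back into the other, which is the mechanism behind the variance inflation the paper analyses.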
Original language: English
Number of pages: 16
Journal: British Journal of Mathematical and Statistical Psychology
DOIs
Publication status: E-pub ahead of print - Jun 2025

Keywords

  • adaptive learning systems
  • Elo rating system
  • measurement

