Local Asymptotic Normality for Multi-Armed Bandits

Research output: Working paper (Scientific)

Abstract

Van den Akker, Werker, and Zhou (2025) showed that the limit experiment, in the sense of Hájek-Le Cam, for (contextual) bandits whose arms' expected payoffs differ by $O(T^{-1/2})$, is Locally Asymptotically Quadratic (LAQ) but highly non-standard, being characterized by a system of coupled stochastic differential equations. The present paper considers the complementary case where the arms' expected payoffs are fixed with a unique optimal (in the sense of highest expected payoff) arm. It is shown that, under sampling schemes satisfying mild regularity conditions (including UCB and Thompson sampling), the model satisfies the standard Locally Asymptotically Normal (LAN) property.
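
For orientation, the textbook form of the LAN property can be written as below. This is a generic sketch, not the paper's own statement: the symbols $\theta_0$, $h$, $\Delta_T$, $J$ and the $T^{-1/2}$ localization rate are assumptions chosen for illustration, and the abstract does not give the normalization or information matrix actually used for the bandit model.

```latex
% Generic LAN expansion (illustrative notation, assumed rather than taken from the paper).
% P_{\theta}^{(T)}: law of the data observed up to horizon T under parameter \theta.
% h: local parameter; \Delta_T: central sequence; J: deterministic information matrix.
\[
  \log \frac{\mathrm{d}P^{(T)}_{\theta_0 + h/\sqrt{T}}}{\mathrm{d}P^{(T)}_{\theta_0}}
  = h^{\top} \Delta_T - \tfrac{1}{2}\, h^{\top} J\, h + o_{P}(1),
  \qquad
  \Delta_T \xrightarrow{d} N(0, J) \quad \text{under } P^{(T)}_{\theta_0}.
\]
```

In an LAQ expansion the deterministic matrix $J$ is replaced by a random sequence $J_T$, which is the sense in which the LAN case is the more standard one.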
Original language: Undefined/Unknown
Publication status: Published - 13 Dec 2025

Keywords

  • math.ST
