TY - UNPB
T1 - Local Asymptotic Normality for Multi-Armed Bandits
AU - van den Akker, Ramon
AU - Werker, Bas J. M.
AU - Zhou, Bo
PY - 2025/12/13
Y1 - 2025/12/13
N2 - Van den Akker, Werker, and Zhou (2025) showed that the limit experiment, in the sense of Hájek-Le Cam, for (contextual) bandits whose arms' expected payoffs differ by $O(T^{-1/2})$, is Locally Asymptotically Quadratic (LAQ) but highly non-standard, being characterized by a system of coupled stochastic differential equations. The present paper considers the complementary case where the arms' expected payoffs are fixed with a unique optimal (in the sense of highest expected payoff) arm. It is shown that, under sampling schemes satisfying mild regularity conditions (including UCB and Thompson sampling), the model satisfies the standard Locally Asymptotically Normal (LAN) property.
AB - Van den Akker, Werker, and Zhou (2025) showed that the limit experiment, in the sense of Hájek-Le Cam, for (contextual) bandits whose arms' expected payoffs differ by $O(T^{-1/2})$, is Locally Asymptotically Quadratic (LAQ) but highly non-standard, being characterized by a system of coupled stochastic differential equations. The present paper considers the complementary case where the arms' expected payoffs are fixed with a unique optimal (in the sense of highest expected payoff) arm. It is shown that, under sampling schemes satisfying mild regularity conditions (including UCB and Thompson sampling), the model satisfies the standard Locally Asymptotically Normal (LAN) property.
KW - math.ST
M3 - Working paper
BT - Local Asymptotic Normality for Multi-Armed Bandits
ER -