$p$-Mean Regret for Stochastic Bandits

Anand Krishna, Philips George John, Vincent Y. F. Tan, Adarsh Barik

June 2024

Abstract

In this work, we extend the concept of the $p$-mean welfare objective from social choice theory to study $p$-mean regret in stochastic multi-armed bandit problems. The $p$-mean regret, defined as the difference between the optimal mean among the arms and the $p$-mean of the expected rewards, offers a flexible framework for evaluating bandit algorithms, enabling algorithm designers to balance fairness and efficiency by adjusting the parameter $p$. Our framework encompasses both average cumulative regret and Nash regret as special cases. We introduce a simple, unified UCB-based algorithm (Explore-Then-UCB) that achieves novel $p$-mean regret bounds. Our algorithm consists of two phases: a carefully calibrated uniform exploration phase to initialize sample means, followed by the UCB1 algorithm of Auer et al. Under mild assumptions, we prove that our algorithm achieves a $p$-mean regret bound of $\tilde{O}\left(\sqrt{\frac{k}{T^{\frac{1}{2|p|}}}}\right)$ for all $p \leq -1$, where $k$ is the number of arms and $T$ is the time horizon. When $-1 < p < 0$, we achieve a regret bound of $\tilde{O}\left(\sqrt{\frac{k^{1.5}}{T^{\frac{1}{2}}}}\right)$. For the range $0 < p \leq 1$, we achieve a $p$-mean regret scaling as $\tilde{O}\left(\sqrt{\frac{k}{T}}\right)$, which matches the previously established lower bound up to logarithmic factors. This result stems from the fact that the $p$-mean regret of any algorithm is at least its average cumulative regret for $p \leq 1$. In the case of Nash regret (the limit as $p$ approaches zero), our unified approach differs from the prior work of Barman et al., which requires a new Nash Confidence Bound algorithm. Notably, we achieve the same regret bound up to constant factors using our more general method.
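To make the two-phase structure and the regret definition concrete, the following is a minimal Python sketch, not the authors' implementation. Several details are assumptions made only for illustration: rewards are Bernoulli, the exploration length `explore_rounds` is a free parameter (the abstract says only that it is "carefully calibrated"), the confidence radius is the standard UCB1 radius $\sqrt{2 \ln t / n_i}$, and the regret is computed from a single simulated run using the true mean of each pulled arm as a stand-in for the expected reward. The function names are hypothetical.

```python
import math
import random


def p_mean(values, p):
    """Generalized (power) mean of positive values; p = 0 is the geometric mean."""
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


def explore_then_ucb(means, horizon, explore_rounds, rng):
    """Uniform round-robin exploration, then UCB1 (illustrative sketch).

    means          -- true Bernoulli means of the arms (assumed, for simulation)
    explore_rounds -- length of the uniform phase (assumed free parameter)
    Returns the true mean of the arm pulled at each step.
    """
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    pulled_means = []
    for t in range(horizon):
        if t < explore_rounds or min(counts) == 0:
            # Phase 1: cycle through the arms to initialize sample means.
            arm = t % k
        else:
            # Phase 2: UCB1 index = sample mean + sqrt(2 ln t / n_i).
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t + 1) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pulled_means.append(means[arm])
    return pulled_means


def p_mean_regret(means, pulled_means, p):
    """Optimal mean minus the p-mean of the (proxy) expected rewards."""
    return max(means) - p_mean(pulled_means, p)


if __name__ == "__main__":
    rng = random.Random(0)
    arm_means = [0.3, 0.5, 0.8]
    pulled = explore_then_ucb(arm_means, horizon=2000, explore_rounds=300, rng=rng)
    for p in (-2, -1, 0, 1):
        print(p, p_mean_regret(arm_means, pulled, p))
```

Because the power mean is non-decreasing in $p$, the regret computed this way is non-increasing in $p$ on any fixed run. At $p = 1$ it is the average cumulative regret and at $p = 0$ it is the Nash regret, so smaller $p$ penalizes rounds with low expected reward more heavily.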

Type

Preprint

Publication

In AAAI 2025

Anand Krishna

Research Fellow
