Linear Bandits on Uniformly Convex Sets.

TITLE: Linear Bandits on Uniformly Convex Sets.
AUTHORS: T. Kerdreux, C. Roux, A. d'Aspremont, S. Pokutta
ABSTRACT: Linear bandit algorithms yield two types of structural assumptions lead to better pseudo-regret bounds. When K is the simplex or an l_p ball with p in ]1,2], there exist bandits algorithms with O(sqrt{nT}) pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond l_p balls that enjoy pseudo-regret bounds of O(sqrt{nT}), which answers an open question from (BCB12, S 5.5). Interestingly, when the action set is uniformly convex but not necessarily strongly convex, we obtain pseudo-regret bounds with a dimension dependency smaller than O(sqrt{n}). However, this comes at the expense of asymptotic rates in T varying between O(sqrt{T}) and O(T).
ArXiv PREPRINT: 2103.05907