Cluster KL-UCB: Optimism for the Best, Pessimism for the Rest
An improvement and extension of the KL-UCB algorithm in a clustered multi armed bandit setting
Abstract
The project presents an allocation strategy for the stochastic multi armed bandit
when considering instances with a clustered structure. Using the architecture
of the KL-UCB policy as a source of inspiration, an algorithm which exploits and
takes advantage from a clustered structure is derived. Firstly, encouraged by previous
work related to the subject, a multi-level structure approach will constitute as an initial
examination. Secondly, the Cluster KL-UCB policy will be derived and evaluated
considering three di erent approaches. It will be shown, both theoretically and empirically,
that adapting to a clustered environment improves the performance compared
to its non cluster-adapting ancestor. Both upper and lower bounds on the regret will
be provided in order to theoretically ensure the performance of the algorithm. Lastly,
a number of empirical experiments will be performed in order to further ensure the
performance and validate the theoretical results.
Degree
Student essay