Risk-Averse Multi-Armed Bandit Problem with Multiple Plays
Abstract
This study constructs an efficient heuristic, referred to as RA, for a risk-averse Markovian multi-armed bandit problem (MAB) with multiple plays. RA incorporates risk-aversion and multiple plays by modifying the Gittins index strategy. Its performance is compared to a risk-neutral version of the Gittins index strategy (RN) on a risk-averse MAB, using Markov Decision Process (MDP) policies as references for optimality. The results show that RA outperforms RN in the majority of the tested cases, demonstrating its effectiveness in a risk-averse setting with multiple plays. Notably, RA achieves up to 120.86% better performance than RN on MAB instances with rewards generated from a normal distribution, and up to 270.55% better performance on MAB instances using exponential distributions.
Furthermore, the runtime of RA grows linearly with problem size, in contrast to the exponential growth observed for the MDP approaches. The study's findings provide strong evidence of the RA heuristic's efficacy in solving risk-averse MAB problems with multiple plays.
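The abstract does not reproduce the heuristic itself; purely as an illustration of the general shape of such a policy, the sketch below implements a simple risk-averse index rule with multiple plays. It uses a mean-variance index (mean reward minus a λ-weighted standard-deviation penalty) as a stand-in for the paper's modified Gittins index; the function names, the λ penalty, and the reward model are all assumptions, not the authors' method.

```python
import random
import statistics

def risk_averse_index(rewards, risk_lambda=1.0):
    """Mean-variance surrogate index: empirical mean minus a risk penalty.
    (Illustrative stand-in for a risk-adjusted Gittins index; risk_lambda
    is an assumed risk-aversion parameter.)"""
    if len(rewards) < 2:
        return float("inf")  # force exploration of under-sampled arms
    return statistics.mean(rewards) - risk_lambda * statistics.stdev(rewards)

def play_round(arms, history, m, risk_lambda=1.0):
    """Multiple plays: select the m arms with the highest risk-averse index."""
    indices = {a: risk_averse_index(history[a], risk_lambda) for a in arms}
    return sorted(arms, key=lambda a: indices[a], reverse=True)[:m]

# Toy run: 4 arms with Gaussian rewards, pulling m = 2 arms per round.
random.seed(0)
means = [0.50, 0.60, 0.55, 0.40]
sigmas = [0.10, 0.50, 0.05, 0.10]
history = {a: [] for a in range(4)}
for _ in range(200):
    chosen = play_round(range(4), history, m=2)
    for a in chosen:
        history[a].append(random.gauss(means[a], sigmas[a]))
```

Under this rule, a high-mean but high-variance arm (arm 1 above) is penalized relative to a slightly lower-mean, low-variance arm (arm 2), which is the qualitative behavior a risk-averse policy should exhibit.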
Degree
Student essay
Date
2023-10-23
Author
Dahlgren, Siri
Marriott, Nicholas
Keywords
MAB
Gittins
Markovian bandit
risk-aversion
policy iteration
multiple plays
Language
eng