Risk-Averse Multi-Armed Bandit Problem with Multiple Plays
Abstract
This study constructs an efficient heuristic, referred to as RA, for a risk-averse Markovian multi-armed bandit problem (MAB) with multiple plays. RA incorporates risk-aversion and multiple plays by modifying the Gittins index strategy. Its performance is compared to a risk-neutral version of the Gittins index strategy (RN) on a risk-averse MAB, using Markov Decision Process (MDP) policies as references for optimality. The results show that RA outperforms RN in the majority of the tested cases, demonstrating its effectiveness in a risk-averse setting with multiple plays. Notably, RA achieves up to 120.86% better performance than RN on MAB instances with rewards generated from a normal distribution, and up to 270.55% better performance on MAB instances using exponential distributions.
Furthermore, the runtime of RA grows linearly with problem size, in contrast to the exponential growth observed for the MDP approaches. The study's findings provide strong evidence of the RA heuristic's efficacy in solving risk-averse MAB problems with multiple plays.
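The abstract does not reproduce the heuristic itself; purely as an illustration of the general shape of such a policy, the sketch below implements a simple risk-averse index rule with multiple plays. It uses a mean-variance index (mean reward minus a λ-weighted standard-deviation penalty) as a stand-in for the paper's modified Gittins index; the function names, the λ penalty, and the reward model are all assumptions, not the authors' method.

```python
import random
import statistics

def risk_averse_index(rewards, risk_lambda=1.0):
    """Mean-variance surrogate index: empirical mean minus a risk penalty.
    (Illustrative stand-in for a risk-adjusted Gittins index; risk_lambda
    is an assumed risk-aversion parameter.)"""
    if len(rewards) < 2:
        return float("inf")  # force exploration of under-sampled arms
    return statistics.mean(rewards) - risk_lambda * statistics.stdev(rewards)

def play_round(arms, history, m, risk_lambda=1.0):
    """Multiple plays: select the m arms with the highest risk-averse index."""
    indices = {a: risk_averse_index(history[a], risk_lambda) for a in arms}
    return sorted(arms, key=lambda a: indices[a], reverse=True)[:m]

# Toy run: 4 arms with Gaussian rewards, pulling m = 2 arms per round.
random.seed(0)
means = [0.50, 0.60, 0.55, 0.40]
sigmas = [0.10, 0.50, 0.05, 0.10]
history = {a: [] for a in range(4)}
for _ in range(200):
    chosen = play_round(range(4), history, m=2)
    for a in chosen:
        history[a].append(random.gauss(means[a], sigmas[a]))
```

Under this rule, a high-mean but high-variance arm (arm 1 above) is penalized relative to a slightly lower-mean, low-variance arm (arm 2), which is the qualitative behavior a risk-averse policy should exhibit.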
Degree
Student essay
Date
2023-10-23
Author
Dahlgren, Siri
Marriott, Nicholas
Keywords
MAB
Gittins
Markovian bandit
risk-aversion
policy iteration
multiple plays
Language
eng