An Empirical Survey of Bandits in an Industrial Recommender System Setting
This thesis investigates the effect of incorporating unstructured data, specifically images in the wild, into contextual multi-armed bandits used in a recommender system for picture-based content suggestion. Image features are extracted by a pre-trained convolutional neural network, and the resulting bandit behavior is studied with and without these features in the context, which normally relies on structured data sources such as metadata. The evaluation is carried out both online, through A/B testing enabled by the industrial partner YouPic AB, and offline, through a simulation pipeline that models the online counterpart. The results are compiled as a survey that covers a selection of contextual bandit algorithms and highlights the differences brought by the unstructured data. The offline results indicate that if the contextual bandit uses a joint or hybrid parameterization of the action-value function, the instances with image vectors can significantly outperform those without them; if a disjoint model is employed instead, no noticeable change is observed. The online results can be interpreted as supporting the inclusion of convolutional features, but because the sample sizes were small and unbalanced, the outcomes are deemed inconclusive. In summary, although there is support for incorporating unstructured data when the action-value function is joint or hybrid, the online experiments gave too little evidence for any trustworthy findings; the question therefore remains partially open.
contextual multi-armed bandits
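
To make the distinction between disjoint and joint parameterizations concrete, the following is a minimal sketch, not the implementation evaluated in the thesis. It assumes a LinUCB-style linear action-value function with an upper-confidence bonus; the names `build_context`, `DisjointLinUCB`, `JointLinUCB`, and the exploration weight `alpha` are hypothetical and chosen only for illustration. A hybrid model, which combines shared and per-arm parameters, is omitted for brevity.

```python
import numpy as np


def build_context(metadata, image_features=None):
    """Concatenate structured metadata with optional CNN image features.

    Both arguments are assumed to be 1-D numeric sequences.
    """
    if image_features is None:
        return np.asarray(metadata, dtype=float)
    return np.concatenate([metadata, image_features]).astype(float)


class DisjointLinUCB:
    """One ridge-regression model per arm; no parameters are shared."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vectors

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Estimated reward plus an exploration bonus.
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


class JointLinUCB:
    """A single model shared by all arms; each arm has its own context vector."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        self.A = np.eye(dim)
        self.b = np.zeros(dim)

    def select(self, arm_contexts):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        scores = [theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
                  for x in arm_contexts]
        return int(np.argmax(scores))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x
```

In this sketch, the disjoint model learns a separate weight vector for every arm, whereas the joint model learns one weight vector over per-arm contexts, so anything learned from one arm's features carries over to arms with similar features.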