A Generic Model of Motivation in Artificial Animals Based on Reinforcement Learning
This thesis is part of a broader research project at Chalmers University of Technology focused on ecosystem simulations using reinforcement learning artificial animals, called animats. The aim of this project is to provide animats with a reward signal that ultimately drives their learning towards adaptation to their environment. We introduce a framework based on basic biological mechanisms of homeostatic regulation, i.e. the regulation of physiological conditions, to reward animats for maintaining their optimal homeostatic state, i.e. for maintaining homeostasis. As such, homeostasis is each animat’s objective. Previous theoretical work adopting homeostatic regulation as a mechanism of reward generation lacks the ability to regulate the importance of needs and the interactions between them and, as our results show, fails in environments where animats eventually die. We extend previous theoretical efforts to model homeostatic regulation by defining the animat’s happiness as a function of its needs through several simple univariate utility functions. Modeling the utility of each need individually allows high flexibility in design and makes interactions between different needs easy to configure. Moreover, in this framework vital needs have priority over non-vital or sensory needs. We show that this framework can be used to elicit six important animat behaviours that emulate real animal behaviours, and in particular that it can recreate typical behaviours observed in free-living planktonic copepods, such as quick escape reactions from fast-approaching predators and diel vertical migration. We compare two models for reward generation that use different happiness functions against previous theoretical work and against a generalization of that work in a diverse array of environments, showing that one model of motivation is superior in all tested environments and allows animats to learn the six objective behaviours.
The models are also compared against a baseline reward that rewards staying alive. We show that the proposed models perform better than the baseline model, indicating that motivational models based on homeostatic regulation are a good choice for reward generation for animats. Finally, we test the models in a more general marine environment, showing that animats using this framework can learn copepod behaviour.