Multiobjective Deep Reinforcement Learning for ATM Cash Planning: Why?
The current framework of reinforcement learning is mainly based on a single objective performance optimization, that is maximizing the expected returns based on scalar rewards that come either from univariate environment response or from a weighted aggregation of a multivariate response. If the problem’s environment is complex, with huge states and actions spaces where the …