Abstract
Reinforcement learning agents often acquire incorrect action-values in some states when the environment exhibits problems such as perceptual aliasing. This is especially serious for reinforcement learning methods that use bootstrapping, because bootstrapping propagates the incorrect action-values to other states. To address this problem, we propose DBLA, in which the agent skips aliased states and performs the backup from the first non-aliased state it reaches. We demonstrate the effectiveness of DBLA on a grid-world maze example. The results show that our method greatly reduces the influence of incorrect action-values.
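As a rough illustration of the backup rule the abstract describes, the following is a minimal sketch in a tabular Q-learning setting: when the successor state is aliased, the agent keeps acting and accumulating discounted reward, and bootstraps only from the first non-aliased state. The environment interface (`env.reset`, `env.step`), the `is_aliased` predicate, and all hyperparameters are hypothetical stand-ins not given in the abstract; the paper's actual DBLA formulation may differ in detail.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = defaultdict(float)  # maps (state, action) -> action-value, default 0.0

def greedy_action(state, actions):
    return max(actions, key=lambda a: Q[(state, a)])

def run_episode(env, actions, is_aliased):
    """One episode with backups taken from the first non-aliased state.

    `env` and `is_aliased` are assumed interfaces: env.reset() -> state,
    env.step(action) -> (next_state, reward, done),
    is_aliased(state) -> bool.
    """
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy behaviour policy.
        action = (random.choice(actions) if random.random() < EPSILON
                  else greedy_action(state, actions))
        next_state, reward, done = env.step(action)

        # Skip aliased states: keep acting, accumulating discounted reward,
        # until a non-aliased state (or episode end) is reached.
        g, discount = reward, GAMMA
        while is_aliased(next_state) and not done:
            a = (random.choice(actions) if random.random() < EPSILON
                 else greedy_action(next_state, actions))
            next_state, reward, done = env.step(a)
            g += discount * reward
            discount *= GAMMA

        # Bootstrap only from the first non-aliased state reached, so no
        # potentially wrong action-value of an aliased state enters the target.
        target = g if done else g + discount * max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = next_state
```

Compared with a one-step backup, this avoids bootstrapping from the (possibly wrong) action-values stored for aliased states, at the cost of a longer, multi-step update interval.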