Proceedings of the 2003 International Conference on Machine Learning and Cybernetics

Abstract

Reinforcement learning agents often acquire incorrect action-values in some states when the environment exhibits problems such as perceptual aliasing. This is especially serious for reinforcement learning methods that use bootstrapping, because bootstrapping propagates the incorrect action-values to other states. To address this problem, we propose DBLA, in which the agent skips aliased states and performs the backup from the first non-aliased state. We demonstrate the effectiveness of DBLA on a grid-world maze example. The results show that this method greatly reduces the influence of the incorrect action-values.
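The core idea, as described in the abstract, can be illustrated with a minimal sketch. This is not the paper's actual algorithm specification; the function name, data layout, and the episodic (trajectory-based) formulation are assumptions made for illustration. The sketch modifies a standard Q-learning backup so that, when the successor state is aliased, the agent accumulates discounted rewards past the aliased states and forms the bootstrap target from the first non-aliased state instead.

```python
def dbla_backup(Q, trajectory, aliased, alpha=0.1, gamma=0.9):
    """Hypothetical sketch of a DBLA-style backup over one episode.

    Q:          dict mapping state -> dict mapping action -> value.
    trajectory: list of (state, action, reward) tuples, in order.
    aliased:    set of states known to suffer perceptual aliasing.
    """
    n = len(trajectory)
    for i in range(n):
        s, a, r = trajectory[i]
        # Accumulate discounted reward while successor states are
        # aliased, so the bootstrap target is taken from the first
        # non-aliased state rather than from an aliased one.
        g, discount, j = r, gamma, i + 1
        while j < n and trajectory[j][0] in aliased:
            g += discount * trajectory[j][2]
            discount *= gamma
            j += 1
        if j < n:
            s_next = trajectory[j][0]
            target = g + discount * max(Q[s_next].values())
        else:
            # Episode ended before reaching a non-aliased state:
            # fall back to the accumulated return with no bootstrap.
            target = g
        Q[s][a] += alpha * (target - Q[s][a])
```

Under this sketch, an aliased state's (possibly wrong) action-values are never used as a bootstrap target, which is the mechanism the abstract credits for limiting the spread of incorrect values.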
