Abstract
In this paper, we introduce Adaptive Forgettable Profit Sharing, a reinforcement learning method that enables agents to adapt quickly to environmental changes. It can learn robust and effective actions in uncertain, non-Markovian environments, in particular partially observable Markov decision processes (POMDPs). Profit Sharing learns a rational policy that is easy to acquire and yields good behavior in POMDPs. However, the policy degrades in large, dynamic environments that change frequently and require many actions to reach the goal. To handle such environments, we implement forgetting, which gives Profit Sharing both adaptability and rationality: the agent forgets past experiences that reduce the rationality of its policy. The usefulness of the proposed algorithm is demonstrated through numerical examples.
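As a rough illustration of the two ingredients the abstract names, the sketch below combines episodic Profit Sharing credit assignment with a simple forgetting step. The class name, the geometric credit function, and the multiplicative forgetting factor are our own illustrative choices, not the paper's exact formulation:

```python
import random
from collections import defaultdict

class ForgettableProfitSharing:
    """Illustrative sketch: Profit Sharing with a forgetting step.
    Parameter names and update rules are assumptions for exposition."""

    def __init__(self, gamma=0.5, forget=0.9):
        self.gamma = gamma            # geometric credit-assignment rate
        self.forget = forget          # per-episode forgetting factor
        self.w = defaultdict(float)   # weight of each (state, action) pair

    def select_action(self, state, actions):
        # Roulette selection proportional to weight (small bias so
        # unseen actions keep a nonzero chance of being tried).
        weights = [self.w[(state, a)] + 1e-3 for a in actions]
        r = random.uniform(0.0, sum(weights))
        acc = 0.0
        for a, wt in zip(actions, weights):
            acc += wt
            if r <= acc:
                return a
        return actions[-1]

    def reinforce(self, episode, reward):
        # Profit Sharing: on reaching the goal, credit every
        # (state, action) pair on the episode, decaying the credit
        # geometrically backward from the goal.
        credit = reward
        for state, action in reversed(episode):
            self.w[(state, action)] += credit
            credit *= self.gamma

    def forget_step(self):
        # Forgetting: shrink all weights so stale experience loses
        # influence after the environment changes.
        for key in self.w:
            self.w[key] *= self.forget
```

In this sketch the forgetting step is the only difference from plain Profit Sharing: calling `forget_step()` periodically lets weights earned under an old environment dynamics fade, so newly reinforced episodes dominate action selection.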