Abstract
In this paper, we investigate a context-aware proactive caching problem in a heterogeneous network consisting of a single macro-cell base station (MBS) with grid power supply and multiple energy-harvesting small-cell base stations (SBSs), aiming to maximize the service ratio at the SBSs by designing an effective context-aware proactive caching policy. We first formulate this problem within a Markov Decision Process (MDP) framework. Then, to address the incomplete statistical knowledge of the system dynamics and the “curse of dimensionality” of the formulated MDP, we propose a Post-Decision State based Approximate Reinforcement Learning (PDS-ARL) algorithm, which learns the optimal proactive caching policy on the fly with high learning efficiency. Simulation results validate the efficacy of the proposed algorithm against baseline schemes in terms of both learning rate and service ratio at the SBSs.
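For readers unfamiliar with the post-decision-state idea, the following is a minimal sketch (not the paper's implementation) of how value learning over post-decision states typically works: the caching action moves the system deterministically to a post-decision state, the exogenous dynamics (content requests, harvested energy) then yield the next state, and the value function is learned over post-decision states so that action selection needs no model of the randomness. The names `pds`, `reward`, and `actions` are hypothetical placeholders.

```python
from collections import defaultdict

gamma, alpha = 0.95, 0.1
V = defaultdict(float)  # value estimates, keyed by post-decision state

def q_value(s, a, reward, pds):
    # Known part of the dynamics: immediate reward plus the learned
    # value of the (deterministic) post-decision state pds(s, a).
    return reward(s, a) + gamma * V[pds(s, a)]

def greedy_action(s, actions, reward, pds):
    # Action selection uses only known quantities and the learned V.
    return max(actions(s), key=lambda a: q_value(s, a, reward, pds))

def pds_update(s_pd, s_next, actions, reward, pds):
    # After observing the exogenous transition s_pd -> s_next,
    # bootstrap V(s_pd) from the best action at the next state.
    target = max(q_value(s_next, a, reward, pds) for a in actions(s_next))
    V[s_pd] += alpha * (target - V[s_pd])
```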