Abstract
Reinforcement learning (RL) is a machine learning (ML) paradigm that has produced systems capable of performing at or above a professional human level. This research explored the ability of RL to train AI agents to achieve the best possible offensive behavior in small tactical engagements, modeled as a simple 1D military simulation. The battlefield environment is a complex domain, which makes planning and building combat simulations challenging; this has motivated ongoing interest in offline learning approaches. Previously, we found offline model-based and model-free methods to be highly effective in the CartPole environment. In this study, we implemented model-based and model-free offline RL incrementally in a 1D, aggregate-level military constructive simulation. We performed extensive experiments across several RL methods to learn a good policy from a previously collected dataset. Our approach yielded consistent improvements in terms of both state-dynamics prediction and eventual reward. Future work will seek to validate RL performance in larger and more complex combat scenarios. We also discuss the potential of offline RL to enable new approaches to complex defense planning.