During the summer I decided to focus on something completely new. Since reinforcement learning is both really interesting and quite difficult, I took it up to challenge myself. I started by going through various state-of-the-art algorithms, which gave me a decent base from which to start working towards a thesis on the subject.
When summer ended and fall came, I realized that going directly through these algorithms was not enough, as I wasn’t ready to modify any of them. Strengthening the basics by reading “Reinforcement Learning: An Introduction” by Sutton and Barto, along with our lab sessions on the topics, helped me get through this phase. Dr. Park’s suggestion to replicate the World Model in PyTorch and later TensorFlow 2 was also really helpful, as it got me comfortable with writing large custom models, debugging them, and experimenting with multiple processors and GPUs to improve performance. Before this, I don’t think I had ever been able to create any sort of working recurrent neural network, let alone one that uses a mixture density network. I also got to tinker with libraries like Horovod and Open MPI, which are extremely impressive as they greatly reduce the complexity of making use of all the available processing power.
My goal for the semester was to replace the last stage of the architecture in the World Model paper. The idea was to remove CMA-ES and use Proximal Policy Optimization (PPO) instead for the Controller model. The reason is that CMA-ES is computationally very inefficient: it requires a great many CPU cycles and burns through many combinations of parameters, since it is essentially a trial-and-error process. PPO is a state-of-the-art model-free algorithm, and with the predicted next state as an additional input it should ideally be able to outperform the simple CMA-ES. The results, however, were not satisfactory: PPO did not outperform CMA-ES, which was quite disappointing.
Although I didn’t get the performance I expected, I already have a couple of ideas about why that might be. All in all, the semester was quite a success. I go into my final semester with a good deal more exploring to do to achieve my goal.