Kimang Khun

Learning Algorithms for Markovian bandits: Is posterior sampling more scalable than optimism?

In this video, I present my work with my supervisors on using Posterior Sampling Reinforcement Learning and Upper Confidence Reinforcement Learning algorithms in the Markovian bandit problem. You can find our paper here: https://openreview.net/pdf?id=Sh3RF9JowK