A Distributional Perspective on Reinforcement Learning
Understanding distributional RL
- distributional RL: instead of treating quantities like the value of a state or state-action pair as single scalar estimates, you treat them as distributions over returns.
- you learn the value distribution and then use that to update your parameters.
- specifically, instead of learning a function that approximates the q-value of a given state-action pair, you learn a distribution over returns for that state-action pair.
- so the q-update is no longer dropping the bootstrapped target value in place of the old estimate; instead you apply the Bellman operator to the target distribution, project the result back onto the fixed support, and minimize the distance (a cross-entropy / KL term, sketched at the end of these notes) between that projection and the currently predicted distribution.
- the value distribution is modelled as a discrete distribution whose support is a fixed set of atoms (51 evenly spaced returns between V_MIN and V_MAX in the paper, hence "C51"). the atoms are the canonical returns of the distribution; a small sketch of the support appears after this list.
- training details are very similar to DQN's, except that instead of outputting a single q-value per action, the network outputs a probability for each atom, per action (a softmax over atoms for every action).
- the q-value is the expectation of the distribution: the probability-weighted sum of the atoms.
- the update is done by
    - applying the Bellman operator to each atom (shift by the reward, shrink by gamma), which generally lands between atoms of the fixed support
    - projecting back onto the support by distributing each shifted atom's probability to its two nearest neighbouring atoms (see the projection sketch after this list)
- in practice, they take the mean of the distribution as the q-value and pick actions greedily with respect to it
- however, this yields improvements even though action selection still uses only the mean, the same quantity DQN acts on; exactly why learning the full distribution helps so much is not entirely clear.
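
A minimal NumPy sketch of the fixed support and of collapsing atom probabilities into q-values for action selection. The 51 atoms and the [-10, 10] range follow the paper's Atari settings, but the function names and array shapes here are my own illustrative assumptions, not the authors' code.

```python
import numpy as np

# C51-style fixed support: 51 atoms evenly spaced on [V_MIN, V_MAX].
N_ATOMS = 51
V_MIN, V_MAX = -10.0, 10.0
atoms = np.linspace(V_MIN, V_MAX, N_ATOMS)           # shape (N_ATOMS,)

def q_values(atom_probs):
    """Collapse per-action atom probabilities into scalar q-values.

    atom_probs: (num_actions, N_ATOMS), each row a distribution over atoms.
    Returns the expected return of each action.
    """
    return (atom_probs * atoms).sum(axis=1)           # shape (num_actions,)

def greedy_action(atom_probs):
    """Act greedily with respect to the mean of each action's distribution."""
    return int(np.argmax(q_values(atom_probs)))
```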
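A sketch of the projection step under the same assumptions: the Bellman operator maps each atom z_j to r + gamma * z_j, which generally falls between atoms of the fixed support, so its probability mass is split between the two nearest atoms in proportion to how close it lands to each. The batching and the helper's signature are illustrative, not the paper's implementation.

```python
import numpy as np

def project_distribution(next_probs, rewards, dones, gamma, atoms, v_min, v_max):
    """Project the Bellman-updated distribution back onto the fixed support.

    next_probs: (batch, N) atom probabilities of the next state's chosen action
    rewards:    (batch,)   immediate rewards
    dones:      (batch,)   1.0 where the episode terminated, else 0.0
    """
    n_atoms = atoms.shape[0]
    delta_z = (v_max - v_min) / (n_atoms - 1)

    # Shifted/shrunk support: T z_j = r + gamma * z_j, clipped to [v_min, v_max].
    tz = rewards[:, None] + gamma * (1.0 - dones[:, None]) * atoms[None, :]
    tz = np.clip(tz, v_min, v_max)                     # (batch, N)

    # Fractional position of each shifted atom on the original support.
    b = (tz - v_min) / delta_z                         # values in [0, n_atoms - 1]
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)

    projected = np.zeros_like(next_probs)
    rows = np.arange(next_probs.shape[0])[:, None]

    # Split each atom's probability between its two neighbouring atoms.
    np.add.at(projected, (rows, lower), next_probs * (upper - b))
    np.add.at(projected, (rows, upper), next_probs * (b - lower))
    # When b lands exactly on an atom (lower == upper), both weights above are
    # zero, so give that atom the full probability mass directly.
    np.add.at(projected, (rows, lower), next_probs * (lower == upper))
    return projected
```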
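The "distance" that gets minimized is then the cross-entropy between the projected target distribution and the network's predicted distribution (equivalently, the KL divergence up to a constant). A one-function sketch, again with illustrative names:

```python
import numpy as np

def distributional_loss(projected_target, predicted_probs, eps=1e-8):
    """Per-sample cross-entropy between the projected Bellman target and the
    predicted atom probabilities; this is the quantity minimized in training."""
    return -(projected_target * np.log(predicted_probs + eps)).sum(axis=1)
```

In practice, implementations typically compute this from the network's logits with a log-softmax for numerical stability rather than taking the log of probabilities directly.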