Slow processes of neurons enable a biologically plausible approximation to policy gradient

Abstract

Recurrent neural networks underlie the astounding information processing capabilities of the brain, and play a key role in many state-of-the-art algorithms in deep reinforcement learning. But it has remained an open question how such networks could learn from rewards in a biologically plausible manner, with synaptic plasticity that is both local and online. We describe such an algorithm that approximates actor-critic policy gradient in recurrent neural networks. Building on e-prop, an approximation of backpropagation through time (BPTT), and using the equivalence between the forward and backward views in reinforcement learning (RL), we formulate a novel learning rule for RL that is both online and local, called reward-based e-prop. This learning rule uses neuroscience-inspired slow processes and top-down signals, while still being rigorously derived as an approximation to actor-critic policy gradient. To empirically evaluate this algorithm, we consider a delayed reaching task, where an arm is controlled using a recurrent network of spiking neurons. In this task, we show that reward-based e-prop performs as well as an agent trained with actor-critic policy gradient using biologically implausible BPTT.
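
To illustrate the two ingredients the abstract names, slow per-synapse processes and a top-down learning signal, here is a minimal sketch (not the authors' code) of a reward-based e-prop style weight update. It assumes a simplified recurrent unit with a surrogate derivative; all names (`e_trace`, `actor_feedback`, `td_error`, etc.) and the specific filtering constants are illustrative assumptions, not the published implementation.

```python
import numpy as np

def reward_based_eprop_update(e_trace, presyn_spikes, pseudo_deriv, td_error,
                              actor_feedback, alpha=0.9, lr=1e-3):
    """One online step of a reward-based e-prop style weight update (sketch).

    e_trace        : running eligibility trace per synapse, shape (post, pre)
    presyn_spikes  : presynaptic activity at this time step, shape (pre,)
    pseudo_deriv   : surrogate derivative of each postsynaptic neuron, shape (post,)
    td_error       : scalar TD error provided by the critic
    actor_feedback : fixed feedback weights that broadcast the learning
                     signal to each neuron, shape (post,)
    """
    # Slow synaptic process: low-pass filter of pre/post coincidence
    # (the eligibility trace), maintained locally at each synapse.
    e_trace = alpha * e_trace + np.outer(pseudo_deriv, presyn_spikes)

    # Top-down learning signal: the critic's TD error routed to each
    # neuron through fixed feedback weights.
    learning_signal = td_error * actor_feedback

    # Local, online weight change: learning signal times eligibility trace.
    delta_w = lr * learning_signal[:, None] * e_trace
    return delta_w, e_trace
```

In this sketch the eligibility trace uses only locally available quantities, and the TD error is the only network-wide signal, which is the sense in which such an update can be both local and online.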

Publication
Slow processes of neurons enable a biologically plausible approximation to policy gradient
Franz Scherr