An Alternative Approach to Anticipative Reinforcement Learning

Alastair Hewitt

Harvard University, Extension School

Abstract

Reinforcement learning avoids the need for a supervisor but still requires a reward system to direct the agent toward a predefined goal. An alternative approach is presented that uses three components: a selection strategy that chooses from a finite set of actions, a model consisting of a set of Bernoulli trials, and a prediction strategy that records cause-and-effect correlations. Using the prediction strategy, the selection strategy searches for the best action based on the past behavior of the system. The reward mechanism is replaced by assigning each state transition a desirability, which represents the frequency with which that transition should occur. The selection strategy then attempts to reach an equilibrium in which the desirability of each relevant transition equals its observed probability. This can be achieved simply by avoiding two scenarios: high probabilities of undesirable transitions and low probabilities of desirable transitions. Although the goal is to reach an equilibrium, the exact location of that equilibrium is not clearly defined. The idea is illustrated with a 2D cellular automaton, and various results are demonstrated, including computational irreducibility.
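The interplay of selection, prediction, and desirability can be illustrated with a minimal Python sketch. It is not the author's implementation, and several details are assumptions made only for illustration. The names `Predictor`, `select_action`, and `run` are hypothetical. Each (action, transition) pair is treated as a Bernoulli trial whose parameter is estimated with Laplace smoothing. The scoring rule is also hypothetical: each action is scored by its predicted probability of causing each transition, weighted by the gap between that transition's desirability and its observed frequency. Finally, the two-action, two-transition environment is a toy. Under these assumptions, the rule favors actions that raise desirable-but-rare transitions and avoids actions that raise undesirable-but-frequent ones, so the observed frequencies drift toward the desirability values.

```python
import random
from collections import defaultdict


class Predictor:
    """Hypothetical prediction strategy: for each action, count how often each
    transition followed it, treating every (action, transition) pair as an
    independent Bernoulli trial."""

    def __init__(self):
        self.trials = defaultdict(int)      # action -> number of times taken
        self.successes = defaultdict(int)   # (action, transition) -> times observed

    def record(self, action, observed):
        self.trials[action] += 1
        for t in observed:
            self.successes[(action, t)] += 1

    def probability(self, action, transition):
        # Laplace-smoothed Bernoulli estimate; 0.5 for an action never tried.
        return (self.successes[(action, transition)] + 1) / (self.trials[action] + 2)


def select_action(actions, predictor, desirability, counts, total):
    """Assumed selection rule: choose the action expected to move observed
    frequencies toward desirability. A positive gap (desirable but rare)
    rewards actions likely to cause that transition; a negative gap
    (undesirable but frequent) penalizes them."""
    def gap(t):
        observed = counts[t] / total if total else 0.0
        return desirability[t] - observed

    def score(a):
        return sum(predictor.probability(a, t) * gap(t) for t in desirability)

    return max(actions, key=score)


def run(steps=5000, seed=0):
    rng = random.Random(seed)
    # Toy environment (assumption): each action usually, but not always,
    # produces one of two mutually exclusive transitions, "a" or "b".
    prob_a = {"left": 0.9, "right": 0.2}        # P(transition "a" | action)
    desirability = {"a": 0.7, "b": 0.3}         # target frequency of each transition
    predictor = Predictor()
    counts = defaultdict(int)
    for step in range(steps):
        # Exactly one transition is observed per step, so `step` is the
        # number of observations recorded so far.
        action = select_action(list(prob_a), predictor, desirability, counts, step)
        observed = "a" if rng.random() < prob_a[action] else "b"
        counts[observed] += 1
        predictor.record(action, [observed])
    return {t: counts[t] / steps for t in desirability}


freqs = run()
print(freqs)  # observed frequencies should lie close to 0.7 and 0.3
```

The target is an equilibrium rather than a maximum, so the agent never settles on a single "best" action here. Once the observed frequency of a transition overshoots its desirability, the sign of its gap flips and the selection switches to another action.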