# Notes

## Section 12: Human Thinking

### Neural network models

The basic rule used in essentially all neural network models is extremely simple. Each neuron is assumed to have a value between -1 and 1 corresponding roughly to a firing rate. Then given a list s[i] of the values of one set of neurons, one finds the values of another set using s[i + 1] = u[w . s[i]], where in early models u = Sign was usually chosen, and now u = Tanh is more common, and w is a rectangular matrix which gives weights (normally assumed to be continuous numbers, often between -1 and +1) for the synaptic connections between the neurons in each set. In the simplest case, studied especially in the context of perceptrons in the 1960s, one has only two sets of neurons: an input layer and an output layer. But with suitable weights one can reproduce many functions. For example, with three inputs and one output, w = {{-1, +1, -1}} yields essentially the rule for the rule 178 elementary cellular automaton. But out of the 2^(2^n) possible Boolean functions of n inputs, only 14 (out of 16) can be obtained for n = 2, 104 (out of 256) for n = 3, 1882 for n = 4, and 94304 for n = 5. (The VC dimension is n + 1 for such systems.) The key idea that became popular in the early 1980s was to consider neural networks with an additional layer of "hidden units". By introducing enough hidden units it is then possible, just as in the formulas discussed on page 616, to reproduce essentially any function. Suitable weights (which are typically far from unique) are in practice usually found by gradient descent methods based either on minimization of deviations from desired outputs given particular inputs (supervised learning) or on maximization of some discrimination or other criterion (unsupervised learning).
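As a rough illustration of the rule s[i + 1] = u[w . s[i]] and of the limited reach of a single layer, the following Python sketch applies one layer update and then counts, by brute force, how many Boolean functions a single Sign unit can produce. The function names (`next_layer`, `sign`, `count_threshold_functions`), the small integer weight range, and the constant threshold (bias) term are all assumptions made for this sketch and are not taken from the text; the bias is included because the counts quoted above appear to require one.

```python
import math
from itertools import product


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def next_layer(w, s, u=math.tanh):
    """Apply s[i+1] = u(w . s[i]); w is a list of rows, s a list of values."""
    return [u(sum(wij * sj for wij, sj in zip(row, s))) for row in w]


def count_threshold_functions(n, max_weight=2):
    """Count distinct functions sign(w . s + b) on n inputs taken from {-1, 1}.

    Weights and the bias b (an assumed extra term) range over the integers
    -max_weight..max_weight; this small range is enough for n = 2 and n = 3.
    """
    inputs = list(product((-1, 1), repeat=n))
    values = range(-max_weight, max_weight + 1)
    tables = set()
    for *w, b in product(values, repeat=n + 1):
        sums = [sum(wi * si for wi, si in zip(w, s)) + b for s in inputs]
        if 0 in sums:
            continue  # skip ties, where the sign would be neither -1 nor 1
        tables.add(tuple(x > 0 for x in sums))
    return len(tables)


if __name__ == "__main__":
    print(next_layer([[-1, 1, -1]], [-1, 1, -1], u=sign))
    print(count_threshold_functions(2), count_threshold_functions(3))
```

The enumeration only samples a small grid of weights, so it is a check on the quoted counts for n = 2 and n = 3 rather than a general method; larger n would need a wider weight range and far more work.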

Particularly in early investigations of neural networks, it was common to consider systems more like very simple cellular automata, in which the s[i] corresponded not to states of successive layers of neurons, but rather to states of the same set of neurons at successive times. For most choices of weights, such a system exhibits typical class 3 behavior and never settles down to give an obvious definite output. But in special circumstances, probably not of great biological relevance, it can yield class 2 behavior. An example studied by John Hopfield in 1981 is a symmetric matrix w with neuron values being updated sequentially in a random order rather than in parallel.
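The settling behavior in the symmetric, sequentially updated case can be sketched in Python as follows. This is a minimal illustration, not a reconstruction of Hopfield's own setup: the function names, the zero diagonal, the tie-breaking rule (a zero input is mapped to +1), and the choice of weights built from a single stored pattern are all assumptions introduced for this example.

```python
import random


def single_pattern_weights(p):
    """Symmetric weights w[i][j] = p[i] * p[j] with a zero diagonal (assumed choice)."""
    n = len(p)
    return [[0 if i == j else p[i] * p[j] for j in range(n)] for i in range(n)]


def settle(w, s, rng, max_sweeps=100):
    """Update neurons one at a time in random order until no value changes.

    Returns the final state and the number of sweeps used, or None for the
    sweep count if the state is still changing after max_sweeps.
    """
    s = list(s)
    n = len(s)
    for sweep in range(1, max_sweeps + 1):
        changed = False
        order = list(range(n))
        rng.shuffle(order)  # sequential updates in a random order
        for i in order:
            h = sum(w[i][j] * s[j] for j in range(n))
            new = 1 if h >= 0 else -1  # assumed tie rule: zero input gives +1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            return s, sweep
    return s, None


if __name__ == "__main__":
    pattern = [1, -1, 1, 1, -1, -1, 1, -1]
    start = list(pattern)
    start[3] = -start[3]  # perturb one neuron
    final, sweeps = settle(single_pattern_weights(pattern), start, random.Random(0))
    print(final == pattern, sweeps)
```

Starting one flip away from the stored pattern, the state returns to that pattern and then stops changing, which is the kind of fixed, definite output associated with class 2 behavior.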

From Stephen Wolfram: A New Kind of Science [citation]