Training and Learning in the Perceptron

There is a further rather dramatic parallel with the brain in the way in
which this functioning is achieved. In a conventional computer, this kind of pattern
recognition could be achieved perhaps by having a very long and complicated set of rules
which are preprogrammed into the machine. The use of these rules together with rigorous
logic would then allow it to make decisions, which might, to use our previous example,
allow it to determine whether or not there was a triangle in the input picture. Remember,
it must be able to recognize the triangle independently of whatever else is on the
screen.
For simple problems, the traditional computer obeying fixed rules can accomplish this
task, but as the complexity of the decision making increases the process becomes too
lengthy.
In contrast, the perceptron network can easily cope with this increased complexity. The
method of ensuring it captures the right images is not controlled by a set of rules but by
a learning process. In many respects this learning process is rather similar to the
way the brain learns to distinguish certain patterns from others.
This learning process proceeds by way of presenting the network with a training set
composed of input patterns together with the required response pattern. By "present" we mean
that a certain pattern is fed into the input layer of the network. The net will then
produce some firing activity on its output layer which can be compared with a `target'
output. By comparing the output of the network with the target output for that pattern we
can measure the error the network is making. This error can then be used to alter the
connection strengths between layers in order that the network's response to the same input
pattern will be better the next time around.
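As a rough illustration of this step, the following Python sketch measures the error for one pattern and then nudges the connection strengths so that the response to the same pattern improves. It is only a sketch under stated assumptions: a network with no hidden layer, sigmoid nodes and a squared-error measure. The names forward, squared_error and update, the learning rate and the starting weights are all hypothetical choices made for the example, not taken from the simulations in these lectures.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, pattern):
    """Firing activity of the output layer: one sigmoid node per row of weights."""
    return [sigmoid(sum(w * x for w, x in zip(row, pattern))) for row in weights]

def squared_error(output, target):
    """How far the output is from the target output (assumed error measure)."""
    return sum((t - o) ** 2 for o, t in zip(output, target))

def update(weights, pattern, target, rate=0.5):
    """One small change of the connection strengths that reduces the error."""
    output = forward(weights, pattern)
    new_weights = []
    for row, o, t in zip(weights, output, target):
        delta = (t - o) * o * (1.0 - o)  # error signal for this output node
        new_weights.append([w + rate * delta * x for w, x in zip(row, pattern)])
    return new_weights

# Hypothetical example: three input nodes, two output nodes.
pattern = [1.0, 0.0, 1.0]
target = [1.0, 0.0]
weights = [[0.2, -0.1, 0.4], [0.3, 0.1, -0.2]]

error_before = squared_error(forward(weights, pattern), target)
weights = update(weights, pattern, target)
error_after = squared_error(forward(weights, pattern), target)
print(error_before, error_after)  # the second number is the smaller one
```

Repeating the update over every pattern in the training set, many times over, is what we mean by training the network.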
We start off with a network which gives a random output for a given input. We then train
the network by presenting it with successive patterns drawn from an example set which is
typical of the problem we want the network to work on. For each of these patterns, we look
at the output pattern the network gives us and compare it with the output we would ideally
like.
For example, suppose we want to train the network to learn about triangles in the
input. We show it a picture which contains a triangle. The output (i.e. the "on"
lights on the second screen) will in general be nothing like a triangle. However, we can
measure how far off the output is from the desired output. This error may be reduced by a
judicious alteration of the connection strengths (hidden in our "black box").
The precise way this is achieved is beyond the scope of these lectures. It is sufficient
to know that a well-defined mathematical procedure can be applied which changes the
connections between layers in such a way that the error decreases step by step. (For those
interested, the technical name for this procedure is back-propagation.) This
process is somewhat similar to the way a memory is recalled in the Hopfield network. We
can imagine a ball rolling on a surface. As the training proceeds the ball moves downhill
until eventually it reaches a well or low point at which it stops. In this case the ball
represents the set of network connections and the height of the surface represents the
network error on an output pattern.
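The picture of the rolling ball can be sketched in a few lines of Python. The sketch assumes a single connection strength w and a made-up error surface with one low point at w = 3; a real network has many connections and a far more complicated surface, which may have several wells. The functions error and slope, the starting point and the step size are hypothetical.

```python
def error(w):
    # Hypothetical error surface: height of the surface at connection strength w.
    return (w - 3.0) ** 2

def slope(w):
    # Steepness of the surface at w (the derivative of error).
    return 2.0 * (w - 3.0)

w = -1.0      # where the ball starts
rate = 0.1    # how far the ball moves on each step
heights = [error(w)]
for _ in range(50):
    w -= rate * slope(w)   # move a little way downhill
    heights.append(error(w))

print(w)  # very close to 3, the low point of this surface
```

Each step moves the ball against the slope, so the height recorded in heights falls on every step until the ball settles near the bottom of the well.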
Thus, to train the network on spotting triangles, we repeatedly show the network a
picture containing a triangle, measure how "wrong" the response of the network
is and then change the connection strengths accordingly. Eventually, we will have made the
network's response to a triangle picture as close as we wish to the "ideal"
response. The perceptron network now knows about triangles! We can then continue this
process adding more types of shapes for the network to recognize and building in whatever
responses we care to.
We give an example of this in another neural network simulation. You can access this by
clicking on Perceptron Pattern Demonstration.
The goal of this simulation is to teach the network to recognize various geometrical
shapes.

Other applications of the Perceptron

The range of tasks that the perceptron can handle is much larger than just decisions
concerning simple shapes and pattern recognition. For example, one could train the network
to form the past tense of English verbs, read English text and handwriting, and a whole
variety of other problems.
Neural networks have also been used to predict financial markets and make
medical diagnoses. All that is required to use networks in this way is a "code"
which allows us to write problems in one field in terms of pattern classification
problems.
For example, NETTalk is a perceptron network which is capable of transforming a
written English text into its individual sound types (phonetic representations) and
then pronouncing it using a voice synthesizer. In essence, this works by associating a
given pattern of node activity (flashing of lights on the input screen of our simple
model) with a given English word, for example "fish". However, on the output
screen a given node activity is now tied to a given type of sound in English. Training the
network means then ensuring that whenever the text "fish" is presented to the
network input (in the form of a pattern of lights), the output will be a pattern of lights
which codes for the sound of "fish" which is finally produced by the voice
synthesizer. So, in broad terms, we are using the network to associate a given word
(written in English text) with a particular sound (English pronunciation of the word).
NETTalk has around 300 neurons (nodes), 80 of them in the hidden layer, and 20,000 individual
connections. The network was trained with isolated words and with continuous text. After
twelve hours of learning, a 95% success rate was found on the learning data. On new
text, a success rate of about 80% was achieved. In each case the errors the network made were
quite close to the correct pronunciation. Indeed, many of the errors made by NETTalk were
strikingly similar to those made by young children.

Towards thinking - networks can generalize

We shall show you two further examples of neural networks which expose some of the
power of these systems. The first illustrates that the presence of a hidden layer is a
crucial feature which allows the network to make generalizations from the training
data. The network can learn about common features in the input patterns such that when new
patterns are presented which possess some of these common features the network can `sense'
these features and produce sensible and useful output. An example will clarify this
somewhat.
Consider a picture of an ellipse. This can be specified as a black and white image on
the input layer by representing black portions of the image by firing nodes and white
space by `off' nodes. However, the ellipse can also be represented in shorthand by
giving the position of its center, its height and width and its angle of tilt. This
compact description of the ellipse may be stored on the hidden layer using far fewer nodes
than is required by the image of the ellipse on the input layer. For example, a particular
hidden layer node being `on' can represent an ellipse with zero angle of tilt, another
being `on' might mean an angle of tilt of ninety degrees, and so on.
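To get a feel for how much shorter the shorthand is, the following Python sketch draws a filled ellipse on a small grid of input nodes from just five numbers. The function ellipse_image and the 16 by 16 grid are hypothetical choices for the illustration; the lectures do not specify the size of the input layer.

```python
import math

def ellipse_image(cx, cy, width, height, tilt_deg, size=16):
    """Return a size x size grid of 0/1 values: 1 where a node fires (black)."""
    t = math.radians(tilt_deg)
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            dx, dy = x - cx, y - cy
            # Rotate the point into the ellipse's own axes.
            u = dx * math.cos(t) + dy * math.sin(t)
            v = -dx * math.sin(t) + dy * math.cos(t)
            inside = (u / (width / 2)) ** 2 + (v / (height / 2)) ** 2 <= 1.0
            row.append(1 if inside else 0)
        image.append(row)
    return image

# Five numbers (centre x, centre y, width, height, tilt) describe the ellipse...
image = ellipse_image(8, 8, 10, 4, 0)
# ...while the picture itself occupies every node of the input layer.
print(len(image) * len(image[0]))  # 256 input nodes
```

The picture needs one input node per grid point, whereas the description needs only the five numbers, which is why a small hidden layer can hold it.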
We can train the network on a host of examples of ellipses and by adjusting the
connections to the output layer ensure that it produces a letter E on the output layer for
all ellipses in the training set. Then if we present it with a new picture of an ellipse,
one it has never encountered before, it should be able to generalize from its
earlier examples and produce the correct `E' response.
So, for input patterns which are "ordered" in some sense, the network can
learn how to recognize this order and act upon it. The ability of the frog's brain to
extract the most important features of what it receives from its visual senses, and to
produce appropriate responses is based on this.

The Perceptron as an encoder network

To illustrate this in a simple example, consider a very simple neural network
consisting of four input and four output nodes with a hidden layer containing just two
nodes. We use four input patterns which are just the four patterns obtained by setting three
of the nodes to be "off" and the other "on". The task of the network is
just to produce the same output as input! We shall call this network an encoder net since
the goal will be to store four input patterns each needing four input nodes with just the
two hidden layer nodes. The four patterns will be `encoded' with just two hidden layer
nodes.
This task is at first sight somewhat tricky. There are four input patterns each
composed of the firing activities of four nodes which must be reproduced on the output
layer. But the activities on the output layer are determined by the activities of the
hidden layer nodes of which there are only two! How can this be done? For a general set
of four patterns it cannot be done, but notice that there is a certain structure to the input
patterns - there is only ever one `firing' node. So to be able to recognize any one of
these input patterns and reproduce it on the output layer, the network must just be able
to distinguish which is the `on' node in the input. There are just four possible nodes to
be `on' so the hidden layer must be able to produce at least four distinct firing
patterns to be able to distinguish between these four nodes. Of course, with two nodes
there are precisely four different patterns of activity: (on,on), (on,off), (off,on) and
(off,off).
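The counting argument can be checked with a short Python sketch. It treats each node as simply on (1) or off (0), which is an idealization, and the particular assignment of hidden patterns to input patterns below is only one possibility; a trained network may settle on a different one. The names inputs, hidden_codes, encode and decode are hypothetical.

```python
from itertools import product

# The four input patterns: exactly one of four nodes is on.
inputs = [tuple(1 if i == j else 0 for i in range(4)) for j in range(4)]

# Every pattern of activity available to two hidden nodes:
# (1, 1), (1, 0), (0, 1), (0, 0)
hidden_codes = list(product((1, 0), repeat=2))

# One possible encoding: give each input pattern its own hidden pattern.
encode = dict(zip(inputs, hidden_codes))
decode = {code: pattern for pattern, code in encode.items()}

for pattern in inputs:
    print(pattern, "->", encode[pattern], "->", decode[encode[pattern]])

# A single hidden node offers only two patterns, too few for four inputs.
print(len(list(product((1, 0), repeat=1))))  # 2
```

Because all four hidden patterns are distinct, each input can be recovered on the output layer from the hidden layer alone.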
These are precisely the patterns of activity that are seen in the simulation as we
cycle through all the input patterns! What we have spent a whole paragraph trying to
explain, the network has discovered for itself without any careful programming! Furthermore,
it has discovered these rather complex relationships by a simple `learning' process in
which it is simply shown the patterns and desired responses a number of times. Between
successive `viewings' it has adjusted its connections to try to improve its responses to
the input patterns - to put them closer to the desired output patterns. It is quite
remarkable that in doing so it has inadvertently done what we might describe as some quite
complicated thinking!
To try out this simulation click here on Perceptron
Encoder Demonstration.
To recap, the network must learn about the special features in the input data (in this
example that there is only ever one `on' node in the input) which allow it to be
represented on a hidden layer which has fewer nodes. Notice that because the network
adjusts its connections to learn about these features of the input data, we can then
expect it to generalize - if we had trained it on only three of the patterns it would have
produced the correct response for the fourth anyhow. Notice also that if we had only
allowed it one hidden node it could never have been able to classify four patterns - the
moral being that as we increase the number of patterns to be classified we must increase
the size of the hidden layer.
For enthusiasts only: actually if we think of the input patterns as representing
numbers we see that the network has learned about their binary representation!

The Perceptron learns to add integers!

As our final example consider training a network to do simple integer arithmetic. In
order that we treat this problem using neural networks, we need to translate it into a
pattern classification problem. In other words, we ascribe some unique pattern of
activities of the input nodes to a given pair of numbers to be added. The patterns are a
"code" for the original numbers. We then require that the network's response be
another pattern which under "decoding" yields the correct sum. In this way, we
can train the network to learn that the sum of, say, 2 plus 3 is 5. In fact, in the
example we give, we train the network on a sample of some sixty-four sums of integers in
the range zero through seven. The network we will use will have 6 input nodes, 4 output
nodes and 15 nodes in the hidden layer.
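One natural choice of "code", sketched below in Python, is to write each integer in binary: three input nodes for each of the two numbers (six in all) and four output nodes for the sum, which is at most 14. This is an assumption made for illustration; the lectures do not say which code the simulation actually uses, and the helper names to_bits, from_bits, encode_pair and encode_sum are hypothetical.

```python
def to_bits(n, width):
    """Binary pattern for n, most significant node first (1 = on, 0 = off)."""
    return [(n >> i) & 1 for i in reversed(range(width))]

def from_bits(bits):
    """Decode a pattern of on/off nodes back into an integer."""
    value = 0
    for b in bits:
        value = value * 2 + b
    return value

def encode_pair(a, b):
    """Input pattern: 3 nodes for a followed by 3 nodes for b."""
    return to_bits(a, 3) + to_bits(b, 3)

def encode_sum(a, b):
    """Target output pattern: 4 nodes holding a + b."""
    return to_bits(a + b, 4)

# All sixty-four sums of integers in the range zero through seven.
training_set = [(encode_pair(a, b), encode_sum(a, b))
                for a in range(8) for b in range(8)]

print(encode_pair(2, 3))            # [0, 1, 0, 0, 1, 1]
print(from_bits(encode_sum(2, 3)))  # 5
```

With this code, teaching the network that 2 plus 3 is 5 means teaching it to answer the input pattern 010011 with the output pattern 0101.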
As input, we present the two integers and the output will correspond to their sum. As
you will see, when we first start to train the network it will not be able to respond
with any number at all when we ask it for any sum in the training set - its output
doesn't "look" like a number at all! However, after a while some of the sums
will start to produce a valid number (which sometimes will be wrong!). After even longer,
the network will begin to respond correctly to essentially all the sums in the training
set. It is also possible to train the network on just a subset of all the possible sums of
integers between zero and seven, say 60 rather than the full 64 sums. Then the network
will never have seen certain sums during its training but the fascinating thing is that it
is still able to produce the correct responses!
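Setting aside a few sums that the network never sees can be sketched as follows in Python. The fixed random seed, the names training_sums and held_out_sums, and the helper fraction_correct are hypothetical; predict stands in for a trained network whose output has already been decoded into an integer.

```python
import random

# All sixty-four pairs of integers between zero and seven.
all_sums = [(a, b) for a in range(8) for b in range(8)]

rng = random.Random(0)  # fixed seed so the split is repeatable
rng.shuffle(all_sums)
training_sums = all_sums[:60]   # shown to the network during training
held_out_sums = all_sums[60:]   # never shown during training

def fraction_correct(predict, pairs):
    """Fraction of pairs for which predict(a, b) gives the right sum."""
    hits = sum(1 for a, b in pairs if predict(a, b) == a + b)
    return hits / len(pairs)

# Stand-in for a trained network: here simply ordinary addition.
print(fraction_correct(lambda a, b: a + b, held_out_sums))  # 1.0
```

Scoring the network only on held_out_sums is what tells us whether it has generalized rather than merely memorized the training set.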
Again the network has been able to generalize (by forming some internal representation
of the data) and supply a correct response to a new question - not one it had ever been
asked during the training process. This decision making ability is built into the network
in a highly non-trivial way and is perhaps illustrative of the way the brain's neural
networks are able to structure themselves to learn by example and to extract
generalizations from experience.
To access the simulation, see the Perceptron
Arithmetic Demonstration.
The perceptron model is impressive in its versatility, power and flexibility. It is
already commercially very successful and this surely will only increase with time as it
finds application in wider and more diverse areas. The method of programming is very
different from conventional computers and much closer to the learning process used by the
human brain. This is encouraging but it must be emphasised that the learning process is
strictly `supervised' - a teacher must teach the network what is important and must
manually change the network connections to achieve this goal. In nature, much of the brain
must learn its own learning procedures - it must be self-organizing.
This is what we shall turn to next.