Re: behavior as mapping
- From: jalegris@xxxxxxxxxxxx
- Date: 1 Jun 2006 17:34:23 -0700
Curt Welch wrote:
"JGCASEY" <jgkjcasey@xxxxxxxxxxxx> wrote:
from thread,
foundations of intelligence
Curt Welch wrote:
What I'm getting at I think can be better understood by
seeing behavior as a mapping from a very high dimension
stimulus space to a lower dimension behavior space. To
understand what I mean by that, let's take a very simple
example. Assume you were trying to build a simple digital
device with 32 binary inputs and 8 binary outputs. Its
function at any time would be specified as a mapping
from the roughly 4 billion (2^32) input states to the 256
possible output states. On each clock cycle, its inputs
must have some value, and it must produce some set of
outputs.
Or, looking at it from another direction, each of the 256
possible output behaviors has a corresponding map of input
values that the behavior is associated with.
When this machine learns something new, it simply means
that the map has been changed. Values which once mapped
to one output behavior, now map to a different output
behavior.
So, if the output map for each behavior was the same size,
that would mean that each possible output behavior would
have 2^24 different input values that would produce that
output.
This would also mean that the machine could not "tell the
difference" between input states that produced the same
output. So if input 0xA72BCF3D produced the same output
behavior as input 0xA72BCF00, this machine could not notice
that the input had made this change. If, on the other hand,
that change in input state produced a change in output
behavior, the machine would "notice" the change.
In other words, a device like this creates its
"understanding" of the state of the input in terms
of the outputs it can produce. And its ability to
discriminate differences is defined by how the lines
are drawn to create the output maps.
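A minimal sketch of the 32-input, 8-output device described above, in Python. The post does not say which inputs go to which output, so the choice here (the output is simply the top byte of the input) is an assumption made only to get equal-sized maps; the names behavior and can_discriminate are hypothetical.

```python
# Hypothetical equal-sized map: the 8-bit output is the top byte of the
# 32-bit input. This particular mapping is an assumption for illustration.
N_INPUT_BITS = 32
N_OUTPUT_BITS = 8


def behavior(x: int) -> int:
    """Map a 32-bit input state to one of 256 output behaviors."""
    if not 0 <= x < 2 ** N_INPUT_BITS:
        raise ValueError("input must fit in 32 bits")
    return x >> (N_INPUT_BITS - N_OUTPUT_BITS)


def can_discriminate(a: int, b: int) -> bool:
    """The machine 'notices' a change only if the output changes."""
    return behavior(a) != behavior(b)


# With equal-sized maps, each output owns 2^32 / 2^8 = 2^24 inputs.
inputs_per_output = 2 ** N_INPUT_BITS // 2 ** N_OUTPUT_BITS

print(inputs_per_output == 2 ** 24)
print(can_discriminate(0xA72BCF3D, 0xA72BCF00))  # differ in low byte only
print(can_discriminate(0xA72BCF3D, 0xA82BCF3D))  # differ in top byte
```

Under this assumed map the two inputs from the text fall in the same output map, so the change between them goes unnoticed, while a change in the top byte is noticed.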
So when I wrote this:
So I suspect it's not as much a process of reducing
the variety, but instead, of just moving the boundaries
on a fixed sized set of varieties.
I'm saying the machine has the ability to produce a
fixed number of different behaviors based on its design,
and that when we learn, we don't change the number of
behaviors we can produce, we only change the map that
determines when we produce the different behaviors.
When we lose the ability to hear the difference between
L and R, it's probably because the maps were changed such
that L and R, which were once in separate behavior maps,
have ended up in the same behavior map. But if they ended
up in the same map, it means some other output behavior
map had to get smaller - which means it became more
discriminating. This is because the size of the input
space and the size of the output space are fixed by
hardware and don't change. Only the mapping changes.
To understand this from the simple digital example I gave
above, let's pretend the mapping is limited to sequential
ranges of binary input values. Meaning the 32 binary inputs
are seen as a 32 bit binary number with values from 0 to
about 4 billion. The values from 0 to 16777215 map to the
output of 0, the values from 16777216 to 33554431 map to the
output of 1, etc.
So for the first few output values, the map looks like this:
input                output
0-16777215           0
16777216-33554431    1
33554432-50331647    2
50331648-67108863    3
With this mapping, the machine behaves the same for all the
inputs from 0 to 16 million. It can't discriminate between
an input of 10 and an input of 20. But through learning, we
can change the mapping to this:
input                output
0-10                 0
11-19                1
20-29                2
30-67108863          3
Now the machine can react differently to an input of 10 and
an input of 20. But to get it to do that, it had to lose its
ability to tell the difference between 16 million and 50
million. This is because 16 million and 50 million both look
like a "3" to this machine now.
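The two range maps can be sketched in Python as sorted boundary lists. This is only an illustration of the first four outputs (inputs 0 to 67108863); make_map and the boundary representation are hypothetical names and choices, not anything from the post.

```python
from bisect import bisect_right


def make_map(boundaries):
    """Return a function mapping an input to the index of its range.

    boundaries[i] is the first input value that belongs to output i + 1.
    """
    def behavior(x):
        return bisect_right(boundaries, x)
    return behavior


STEP = 2 ** 24  # 16777216 inputs per output in the uniform map

before = make_map([STEP, 2 * STEP, 3 * STEP])  # uniform ranges
after = make_map([11, 20, 30])                 # map after "learning"

# Print each input with its output under the old and the new map.
for x in (10, 20, 16_000_000, 50_000_000):
    print(x, before(x), after(x))
```

The number of outputs is the same in both maps; only the boundaries move. The new map separates 10 from 20, and in exchange 16 million and 50 million, which the old map kept apart, now share output 3.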
The brain has a finite and relatively fixed number of neurons,
and each must have some set of environmental conditions that
will make it fire. And by "environment" to the neuron, I'm
talking about everything in the universe outside the single
neuron, so the rest of the brain is the environment to a
single neuron as well as the rest of the body and the rest
of the universe outside the body.
If there are two different states of the environment, that map
to the same behavior of our neurons, then that means we can't
sense that difference. In other words, if a rock rolls down a
hill on Mars, and that change in the environment has no effect
on the behavior (outputs) of our neurons, then we have no
awareness of the event. And likewise, inside the brain, if a
different sound (L vs R) comes into our ear, and generates
different signals in one part of our brain, but the rest of
the brain connecting that to our outputs, lumps the behaviors
into the same output map, it means we can't react differently
to those inputs. It means we can't discriminate the difference
in those inputs.
Because we have a fixed number of neurons that, most likely, are
configured to always do something useful (react to a different
set of conditions than all other neurons), when we learn a new
behavior, all that is happening is that our behavior maps are
being adjusted. So when we learn to recognize something new,
it always comes at the price of losing the ability to recognize
something old. The only way to learn a new behavior is by
forgetting how to do some old behavior at the same time.
We don't tend to think about behavior in these terms, but it
seems to me that there's really no other way it could work.
We think, for example, about learning a new behavior like riding
a bike. And we don't think about it as losing some other
behavior. But in fact, before we learned how to ride a bike,
we had a perfectly good set of "falling off the bike" behaviors
built into us. The act of learning how to ride a bike was just
as much about erasing all our old "falling off a bike" behaviors
as it was about building new "stay on the bike" behaviors.
To learn how to ride a bike, the brain is simply making
adjustments to the behavior maps of some set of our neurons.
It doesn't add new neurons, or remove old ones, in order for
us to develop this new bike riding skill (as far as I'm aware).
So the total number of behaviors we can produce is really
defined by the number of neurons we have (a neuron firing is
the basic behavior all our brain's behavior is built from).
So unless we are adding more neurons as we learn, we are not
adding new behaviors, we are just changing the shape of all
our old behaviors.
So, for the most part, whenever we learn something new, it
comes at the price of forgetting something old. We don't see
it like that because many times, the old behaviors we are
forgetting, were of little or no real value to us, and the
process of learning is always one of replacing an old
behavior of little value, with a new behavior of more value
(value always being defined in terms of the total rewards the
behaviors are predicted to produce).
Let's make it real simple, two binary inputs
and one binary output and we can list all
the possibilities.
AB | 16 possible output combinations
---+--------------------------------
00 | 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
01 | 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
10 | 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
11 | 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
                 *
Now I understand you to be saying this, that
if the mapping is:
AB
(00) -> 0
(01) -> 1
(10) -> 1
(11) -> 0
which is the combination in the 7th column (marked
with a *), then (00) and (11) would be the same and
(01) and (10) would be the same.
Right. That simple machine treats them as being the same.
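As a quick check of the table, the 16 columns can be enumerated in Python. The sketch assumes the columns are ordered as listed, so that the entry for input row r in column k (counting from 0) is bit r of k; the helper name column is hypothetical.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))  # (0,0), (0,1), (1,0), (1,1)


def column(k):
    """Outputs of the k-th function (0-based) for inputs 00, 01, 10, 11."""
    return tuple((k >> row) & 1 for row in range(len(INPUTS)))


columns = [column(k) for k in range(16)]

# The mapping quoted above: 00 -> 0, 01 -> 1, 10 -> 1, 11 -> 0 (XOR).
xor = tuple(a ^ b for a, b in INPUTS)
position = columns.index(xor) + 1  # 1-based, as counted in the table

# Group the inputs that this mapping treats as the same.
classes = {}
for (a, b), out in zip(INPUTS, xor):
    classes.setdefault(out, []).append(f"{a}{b}")

print(position)
print(classes)
```

The four input states collapse into two classes, one per output value, which is the sense in which this machine cannot tell 00 from 11 or 01 from 10.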
You see, I would call this reducing the variety of
the input set from 4 to 2. Recognition is always
a many-to-one reduction, such as all those pixel
values in an image being reduced to a set of
"object" outputs such as cat, dog, Curt.
Exactly. As is all behavior. Recognition is behavior.
If we map the above two binary inputs to another
output we reduce them to two objects instead of
just one.
Thus we can now recognize an S object and a C
object, both of which might exist in the same input,
as in the case below.
ABC S C
000 0 0
001 1 0
010 1 0
011 0 1
100 1 0
101 0 1
110 0 1
111 1 1
Right.
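The S and C columns can be grouped the same way in Python. The post does not say what S and C stand for; the table happens to coincide with the sum and carry bits of a one-bit full adder, and the sketch checks that reading as an assumption (full_adder is a hypothetical helper).

```python
from itertools import product

# The S and C columns exactly as listed in the table above.
TABLE = {
    (0, 0, 0): (0, 0),
    (0, 0, 1): (1, 0),
    (0, 1, 0): (1, 0),
    (0, 1, 1): (0, 1),
    (1, 0, 0): (1, 0),
    (1, 0, 1): (0, 1),
    (1, 1, 0): (0, 1),
    (1, 1, 1): (1, 1),
}


def full_adder(a, b, c):
    """Sum and carry bits of a + b + c (assumed reading of S and C)."""
    total = a + b + c
    return total & 1, total >> 1


matches = all(TABLE[bits] == full_adder(*bits)
              for bits in product((0, 1), repeat=3))

# Group the 8 input states by the (S, C) pair they produce.
classes = {}
for bits, out in TABLE.items():
    classes.setdefault(out, []).append("".join(map(str, bits)))

print(matches)
print(classes)
```

With two output bits the eight input states fall into four classes rather than two, and an input such as 111 produces both S and C at once.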
Such mapping of a whole input to a set of outputs
amounts to crude template matching and is not how
real animals or recognition machines work.
Right.
Real ones use large sets of complex temporal maps chained together in
complex feedback topologies instead of a simple single level binary spatial
map with no feedback paths. The temporal aspect is the most important one
left out of these simple examples. The devices respond to more than their
current inputs - the output maps are a function of past events as well.
But logically, it's the exact same problem - it's just an issue of
adjusting the maps that define the output behavior of each device. So, when
we learn a new behavior, the low level hardware is just changing its
behavior maps. Since our brains don't seem to be adding more neurons to do
more pattern recognition as we learn, any learning has to be defined as a
change of the maps, and not the addition of new behaviors with new maps.
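A toy Python sketch of a map that depends on past events as well as the current input. Everything here is hypothetical (the TemporalMap class, the one-step memory, the table entries); it only illustrates that the same input can produce different outputs depending on history, and that "learning" can be modeled as changing table entries without adding any.

```python
class TemporalMap:
    """Output depends on the previous input as well as the current one."""

    def __init__(self, table, initial_prev=0):
        self.table = dict(table)  # (previous input, current input) -> output
        self.prev = initial_prev

    def step(self, x):
        out = self.table[(self.prev, x)]
        self.prev = x
        return out

    def relearn(self, key, new_output):
        """Learning changes an existing entry; it never adds a key."""
        if key not in self.table:
            raise KeyError(key)
        self.table[key] = new_output


# Output 1 only on a 0 -> 1 transition (a rising edge).
edge = TemporalMap({(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 0})
outputs = [edge.step(x) for x in (0, 1, 1, 0, 1)]
print(outputs)

size_before = len(edge.table)
edge.relearn((1, 0), 1)  # now also respond to a falling edge
print(len(edge.table) == size_before)
```

The input 1 produces output 1 the first time and 0 the second, because the history differs, and relearning leaves the size of the table unchanged.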
I think it is a mistake to say that an organism has just as many
behaviours after it is trained as it did beforehand. If we equate the
number of behaviours with the amount of information represented by them
then we can show this using Shannon's information entropy.
For example, suppose the output behaviour is poorly related to the
probability distribution of the input patterns, as with a random
mapping or a uniformly partitioned mapping like the one you suggested
above, i.e.:
input                output
0-16777215           0
16777216-33554431    1
33554432-50331647    2
50331648-67108863    3
We would expect the map to change during training because the actual
probabilities of the inputs are not uniformly distributed. We would
also expect that, in the long term, the organism should extract as much
information as possible from the training. According to Shannon, the
largest amount of information available at the outputs corresponds to
equiprobable states, giving log2(4) = 2 bits for 4 possible outputs.
So, if each output state has p=0.25 of occurrence after training, then
we can estimate the probability of each output state before training:
input                output   probability (approximately)
0-16777215           0        0.812
16777216-33554431    1        0.062
33554432-50331647    2        0.062
50331648-67108863    3        0.062
The actual amount of information available before training comes out to
roughly just 1 bit, using the formula H = - SUM {p(i) log [base 2]
p(i)}.
The amount of info available in the behaviour has doubled through
training.
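The two entropy figures can be checked with a few lines of Python. The "before" probabilities are the rounded values from the table, so they do not sum exactly to 1; entropy_bits is a hypothetical helper name.

```python
from math import log2


def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), skipping zero terms."""
    return -sum(p * log2(p) for p in probs if p > 0)


before = [0.812, 0.062, 0.062, 0.062]  # approximate values from the table
after = [0.25, 0.25, 0.25, 0.25]       # equiprobable outputs

h_before = entropy_bits(before)
h_after = entropy_bits(after)

print(round(h_before, 2), round(h_after, 2))
```

With these inputs the entropy before training comes out at roughly 0.99 bits and after training at 2 bits, consistent with the "roughly 1 bit" and "doubled" figures above.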
--
Joe Legris