Re: The Brain
- From: curt@xxxxxxxx (Curt Welch)
- Date: 23 Mar 2008 22:28:41 GMT
casey <jgkjcasey@xxxxxxxxxxxx> wrote:
On Mar 17, 11:24 am, c...@xxxxxxxx (Curt Welch) wrote:
selected responses,
If you use a teacher that knows the correct answer -
aka the correct behavior to produce, then the learning
system can't learn anything other than what the teacher
already knows. It's limited in its ability to become
only as smart as the teacher is. It has no creativity.
If you can build a machine that can only become as
smart as the teacher, you have done well!
That's true. If we could make a machine learn to be as smart as a human
teacher, we would have done something no one has done yet. However,
without the ability to solve things on its own without being taught by
others, I think the machine will be seen to be as stupid as a mirror.
Most people are not all that creative, indeed without
the social environment giving us supervisory learning
and our so-called mirror neurons we would most likely
all be morons.
Well, I don't think that's true at all. Though we single out a few
individuals in society for their extra special levels of creativity, I
think it's very wrong to believe most people aren't creative. In fact, I
think we are all highly creative. I think everything we learn to do is
ultimately an act of individual creativity even if we did get hints and
guidance from watching someone else.
When we learn to tie our shoes, no one is showing us how to move our
fingers and hands to get the job done. We figure out all the specific
motions for ourselves. We are only using a very high level mimic behavior
where we attempt to watch what our hands are doing and mimic what we saw
someone else do. In the end, we each tend to end up with our own specific
behaviors for how we get the job done.
When we learn to walk, it's not because someone else taught us (for the most
part). When we learn to ride a bike, it's not because someone else told us
how to move the handlebars and pedals to keep from falling over - it's
because we figured it out by trial and error on our own. It is just
another example of the innate creativity humans have.
If you build a machine that attempts to learn to ride a bike by copying the
behavior it sees someone else do, but has no innate "learn by results"
ability, the odds of it being able to learn to ride is near zero. I doubt
it's even possible to pick up enough information about the subtle timing and
reactions required to ride a bike just by watching someone else do it.
Humans use their innate creativity to learn all these things.
It just so happens that when we create something new which everyone else
has already created before us (like walking), we don't call it "creative".
We tend to use the word creative for the acts of creation which stand out
as being unique in society. But since I believe we actually create all our
behavior from scratch, I believe it should all be seen as an act of
creativity.
What appears to be creative or original behaviors may
be entirely due to the environment. An ant may appear
to make many complex "intelligent" moves to navigate
across the landscape and yet internally be controlled
by a very simple algorithm. The same may be true for
us only our landscape is social.
Well, learning machines produce behaviors which are a result of the
interaction between the environment and the machine. It's nature (the
learning machine) plus nurture (the environment) which is the complete
cause of the resulting behavior. It's never just one or the other.
However, a trivial environment (combined with a trivial critic) will lead
to trivially simple behavior no matter how capable the machine is of
learning complex behavior. If all it has to do, and all it can do to get a
reward, is push a button, then its behavior will be nothing more than a
lifetime of pushing a button over and over again. You can't make a
learning machine learn human adult levels of behavior if you don't put it
into the correct nurturing environment.
Reinforcement learning however has no such limit.
It has the power to be creative. It has the power to
learn behaviors that the judge (aka critic in RL terms)
has no knowledge of. It has the power to create new
knowledge.
You are, I assume, assigning "creativity" to random
moves that turn out to be useful?
Yes, except the moves are never random.
I assume you are thinking in terms of how evolution
created all the different life forms from natural
selection via random changes to dna sequences?
No, I was just thinking of RL machines in general. I do happen to believe
that the process of evolution is yet another example of an RL machine, and
I do believe the creation of complex life forms is yet another example of
the creativity of RL machines, but I was not thinking of that at the time I
wrote the above. All RL machines are creative within their domain of
behavior.
But back to the issue of random moves.
RL machines constantly improve their behavior. They never make random
moves. Their moves are always a best-guess answer to what they should do at
the time. The more experience the machine accumulates, the better the
guesses become. There's nothing random about it.
Each new child born is not a random act. If it were, humans might give
birth to cows, or rocks. That would be a random act. But they don't.
They give birth to a life form that is the system's best guess at what will
be good at surviving based on past experience.
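The idea that an RL machine acts on its best guess rather than on a coin flip can be illustrated with a standard bandit trick, optimistic initial values: every estimate starts higher than any real payoff, the agent always takes the action with the highest current estimate, and disappointment alone drives it to try the others. This is a minimal Python sketch, not the author's system; the payoffs, `initial`, and `step_size` values are made-up assumptions. (Many practical RL agents do add deliberate randomness, such as epsilon-greedy exploration; the sketch only shows that exploration does not require it.)

```python
def greedy_optimistic(payoffs, steps, initial=10.0, step_size=0.5):
    """Deterministic bandit learner that always acts on its current best guess.

    payoffs:   hypothetical fixed reward for each action (unknown to the agent)
    initial:   optimistic starting estimate, higher than any real payoff
    step_size: how far each estimate moves toward the observed reward
    """
    estimates = [initial] * len(payoffs)
    choices = []
    for _ in range(steps):
        # Best guess: highest estimate; max() breaks ties by lowest index.
        action = max(range(len(estimates)), key=lambda a: estimates[a])
        reward = payoffs[action]
        estimates[action] += step_size * (reward - estimates[action])
        choices.append(action)
    return choices, estimates


choices, estimates = greedy_optimistic([1.0, 3.0, 2.0], steps=50)
print(choices[:8])   # early choices sweep through every action
print(choices[-5:])  # later choices settle on the best action, index 1
```

No random number generator is involved: the same inputs always produce the same sequence of choices, yet every action still gets tried before the agent settles.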
Searching for better behavior generation systems which can
be reinforced is the key.
Any behavior *can* be reinforced. Searching for a behavior
generator that can produce behaviors worth reinforcing I
would suggest is the main problem.
Yes. That's the key. A learning system that never produces a behavior of
riding a bike without falling over is never going to learn to ride a bike
no matter how much practice it gets.
The trick is in the system's power to abstract good behaviors from past
experience. It has to combine learned knowledge from multiple past
behaviors in order to produce a constant stream of better behaviors.
The better the system is at abstracting similarities between situations,
and using the learned knowledge from those similar past experiences to
select a "good" behavior for the current situation, the faster the system
will learn.
The whole issue is tied to the idea that when the agent is rewarded for a
behavior, it must learn to apply that reward to other future situations
which are similar. But how does it measure this concept of "similar"? How
should it determine that one set of stimulus signals is similar to another
set of stimulus signals?
If we see a dog, and we learn that a type of behavior in response to the
dog is good, how does the brain know when we see the dog that we should use
the same behavior? It can only do this if it is able to classify the
sensory data as being "similar". But if we are seeing the dog from a
completely different perspective this time, how does the brain know that
the current visual data of the dog is similar to visual data the last time
we saw the dog? Or maybe even a better example, if we see a different dog,
(a brown one instead of a black one), how does the learning brain know that
this is a situation similar to the situation of the black dog?
It can only make that sort of judgment of being similar correctly if the
brain has on its own, without reinforcement training, learned to abstract
out the features of a dog, and notice that many of the features of this
situation are similar to the situations of the last dog, even if the color
of the hair on the dog is different.
Building a system which can correctly abstract out common features in
sensory data like this so that learning can be applied correctly to
"similar" situations is key, and it's a key which I don't think anyone has
solved. And without solving this key part of the problem, the actual act
of reinforcement isn't very interesting because the system won't correctly
apply its learning to other "similar" situations.
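To make the "similar situation" problem concrete, here is a minimal Python sketch that carries a learned value from one situation over to another by similarity. It assumes away exactly the part described above as unsolved: each situation already arrives as a small, hand-picked feature vector (the feature names are hypothetical). Given that representation, the brown dog lands close to the black dog, and the value learned for the black dog transfers to it.

```python
import math


def similarity(a, b):
    """Gaussian similarity: 1.0 for identical feature vectors, falling toward 0."""
    dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-dist_sq)


def predict_value(memory, features):
    """Similarity-weighted average of the values learned in past situations."""
    weights = [similarity(f, features) for f, _ in memory]
    return sum(w * v for w, (_, v) in zip(weights, memory)) / sum(weights)


# Hypothetical hand-picked features: [four_legs, fur, barks, tail, is_black]
BLACK_DOG = [1, 1, 1, 1, 1]
BROWN_DOG = [1, 1, 1, 1, 0]
BLACK_STOVE = [0, 0, 0, 0, 1]

# Past experience: one behavior paid off with the black dog, went badly with the stove.
memory = [(BLACK_DOG, 1.0), (BLACK_STOVE, -1.0)]
print(round(predict_value(memory, BROWN_DOG), 2))  # close to the black dog's value
```

If the vectors were raw color values instead, the black stove could easily come out closer to the black dog than the brown dog does, which is the reason learning on raw sensory data tends to generalize poorly.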
This is the same problem as we have talked about in the past of how the
brain is able to identify the stimulus to apply the reinforcement to. It's
easy for us to see that the rat is learning to respond to a stimulus of a
light flashing, but how is the rat's brain transforming the raw stimulus
data into some internal representation that allows it to "see" the light in
the first place? This is the question that I don't think anyone has
really answered. A strong RL system has to first transform the sensory
data from the environment into "objects" like "flashing light" so that all
learning can be applied to these sorts of environmental features instead of
applying it directly to features of the raw data.
It can be argued that this transformation ability is unique for each type
of data and that evolution has built different modules for transforming
each type of data for us to make it easy to learn from. However, I think
this transformation is far more automatic and generic than that.
Nonetheless, until you get the transformation happening correctly, the
learning has no hope of producing behaviors worth reinforcing. And I am
very sure this is exactly the problem with all current RL systems. They
aren't correctly transforming the raw sensory data and as a result, they
don't tend to produce behaviors worth reinforcing.
If you look at evolution the components of biological systems
were built up in working stages and I would suggest learning
also takes place in *working* stages.
Well, learning is a process of constant improvement. If you have a good
learning system, it's not just adding one new behavior on top of another to
a list of "things learned". Instead, it's transforming its behavior set
from some starting condition, to one which is more optimal for the
environment it's in. This transformation never stops. With each learning
experience, the entire behavior set of the agent is slowly transformed into
one which is a better fit for the environment.
At the same time, as the behavior transforms, the environment transforms
with it. Before the agent learned to ride a bike, it had to walk
everywhere. Its environment was one limited to what it could reach by
walking. But once it learned to ride the bike, its environment changed.
So now it has to re-optimize its behavior set to match the new
environment. Each behavior we learn tends to work like this, at least in a
small way. The learning never stops, the optimizing of the behavior set to
the environment never stops, and each new learning experience tends to be
built on top of the last. You can't learn to ride the bike to the store to
get some food until you first learn to ride a bike.
If you look at the very basic way learning algorithms work, they tend to
accumulate statistical knowledge in parameters like a weight. The current
value of each weight is a result of all the accumulated knowledge from the
past learning events. Each new learning event changes the value of a
weight by a small amount - but this new learning is added on top of the old
learning, it's added to the old value. It's always an act of accumulating
knowledge, of building new on top of old.
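The weight-accumulation idea is the familiar incremental update rule: each learning event nudges the old value a fraction of the way toward the new evidence. A minimal Python sketch, assuming a single scalar weight and a hypothetical stream of observations:

```python
def update(weight, observation, step_size):
    """One learning event: move the old weight a fraction of the way toward
    the new observation. The new knowledge is added on top of the old value."""
    return weight + step_size * (observation - weight)


def running_mean(observations):
    """With step_size = 1/n, the accumulated weight is exactly the sample mean."""
    weight = 0.0
    for n, x in enumerate(observations, start=1):
        weight = update(weight, x, 1.0 / n)
    return weight


print(running_mean([2.0, 4.0, 6.0, 8.0]))  # 5.0
```

A fixed small `step_size` instead of `1/n` weights recent events more heavily than old ones, which suits an environment that keeps changing as the agent learns.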
If the robot manages to get power applied to its charging cord
(because it managed to get the cord plugged into the outlet),
then it's rewarded for the result, not because it moved in the
ways the trainer was trying to get it to move.
But didn't the builder want it to move in a way that resulted
in the cord getting plugged in? That was the feedback signal.
The robot isn't going to get "creative" and plug it in your ear?
The builder created a critic circuit which rewarded it for applying power
to the charger. If the way to do that is to plug it into the outlet, then
the agent will learn to do that. But if you put this same agent in a
different environment, such as one in which the only way to get power is to
plug the cord into an ear, the agent will learn to do that instead. The
builder didn't have to know beforehand whether the power came from the
ear, or the wall outlet. This is the whole point of reinforcement
learning. The builder specifies the result needed (apply power to the
charger), and the agent then searches for the behaviors to achieve this
result. The builder doesn't have to specify the behavior, only the desired
result.
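The charger example can be sketched in a few lines of Python: the critic rewards only the result (power reaching the charger), and the same unchanged agent ends up with different behavior in different environments. The action names, the two environment functions, and the optimistic greedy learner are all hypothetical illustrations, not a description of any real robot.

```python
ACTIONS = ["wall_outlet", "ear", "floor"]  # hypothetical places to push the plug


def critic(power_applied):
    """Builder-specified goal: reward the result, not any particular motion."""
    return 1.0 if power_applied else 0.0


def learn_to_charge(environment, trials=20, step_size=0.5):
    """The same agent for every environment: act on the best current estimate."""
    # Optimistic start (above the maximum reward of 1.0) makes it try everything.
    estimates = {a: 1.5 for a in ACTIONS}
    for _ in range(trials):
        action = max(ACTIONS, key=lambda a: estimates[a])
        reward = critic(environment(action))
        estimates[action] += step_size * (reward - estimates[action])
    return max(ACTIONS, key=lambda a: estimates[a])


def house(action):
    """Environment where the wall outlet is the only power source."""
    return action == "wall_outlet"


def odd_world(action):
    """Environment where, as in the example above, only an ear supplies power."""
    return action == "ear"


print(learn_to_charge(house))      # wall_outlet
print(learn_to_charge(odd_world))  # ear
```

Nothing in `learn_to_charge` or `critic` mentions outlets or ears; the learned behavior comes entirely from the interaction between the fixed agent and whichever environment it is placed in.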
The outcome is rather predictable I would have thought and
builder determined. It is also dependent on the robot being
able to generate behaviors that produce this result. You can't
reinforce what doesn't or can't happen. This is a requirement I
think is overlooked when people say it is "just" reinforcement.
I've never overlooked that fact. As I wrote above, producing useful
behaviors is the entire key to building strong learning systems and it's
virtually the only thing I've worked on for the past 20 years. If you fail
to understand that this is a key part of "just reinforcement", then it's
your lack of understanding of the nature of the RL problem that is limiting
your understanding of what I'm talking about.
All I've worked on in AI for the past 25 some odd years, is searching for
different behavior generation systems in an attempt to find ones which can
produce high quality behaviors so they can be reinforced.
The outcome is not predictable by looking at the agent alone, or by looking
at the environment alone. The outcome however is very predictable for a
given agent in a given environment. There is normally only a small set of
optimal behaviors for a given agent in a given environment. In one
environment the agent learns the skill of plugging the cord into an
outlet, in the other environment, the agent learns the skill of plugging it
into an ear.
Why do you think that human behavior is so predictable? We all learn to
walk, and talk at about the same age. This is because if you put similar
learning agents into similar environments, they do tend to learn all the
same sorts of behaviors in the same amount of time. The more you change
the environment, the more you change their behaviors. But the resulting
behavior is always a mix of the agent, and the environment.
But at the same time, humans latch on to very different behavior sets
because our environment is complex enough to allow very different behaviors
to solve the same problems. We are all just trying to keep ourselves fed,
and safe, and some of us learned to do this by playing golf, others learned
to do it by programming computers, others learned to do it by trading stocks,
others learned to do it by selling drugs.
Learning systems that strengthen connections are not always
reinforcement learning systems. Only if the judge controlling
the learning is triggered by the state of the environment, and
not simply by the behavior produced, can you justify calling
it reinforcement learning.
Well the "judge" in a biological system is not in the environment.
It is if you say it is. It's just a matter of convention. In RL research,
the standard is to consider the critic part of the environment. It's just
an arbitrary drawing of a border line we are talking about here.
We can manipulate the behavior of an animal by making use of
our knowledge of "the judge", the reward system of the animal.
There is no "reward" button. It is the animal that decides if
the current input is a reward.
Or more accurately, it's the critic _hardware_ IN the animal which makes
that decision.
If the animal makes the judgment
that something is not rewarding then it is not rewarding.
This happens at two levels in strong RL systems. The first level is the
critic hardware which is all the parts of the system which determine which
sensory conditions act as punishment events (what will cause us direct
pain).
The second level is that strong learning systems must have the power to
predict future rewards. Any event which causes the prediction of future
rewards to increase will also act as a reward even though the low level
critic hardware did not generate a current reward. It might even have
generated a punishment, but if the net result of the prediction plus the
current punishment is a net reward, the result is rewarded.
This need to predict future rewards is called secondary reinforcement.
This prediction system fits better with your statement of "if the animal
makes the judgment it's not rewarding".
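The second level, where a rise in predicted future reward can outweigh a current punishment, corresponds to what the RL literature calls the temporal-difference (TD) error. A minimal Python sketch with made-up values and an assumed discount factor of 0.9:

```python
def td_error(reward, value_now, value_next, discount=0.9):
    """Temporal-difference error: the immediate critic signal plus the change in
    predicted future reward. Positive means 'better than expected'."""
    return reward + discount * value_next - value_now


# Hypothetical numbers: the critic hardware punishes (-1.0), but the event
# raises the predicted future reward from 0.0 to 5.0.
print(td_error(reward=-1.0, value_now=0.0, value_next=5.0))  # positive net signal

# No punishment at all, but the prediction of future reward collapses.
print(td_error(reward=0.0, value_now=2.0, value_next=0.0))   # negative net signal
```

In the first case the net signal acts as a reward even though the low-level critic generated a punishment, which is the situation described above.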
I doubt your character recognition program did any type of
learning like that.
There are many types of learning the character recognition
program couldn't do including making use of context. But it
was an illustration of learning requiring "reinforcement"
not an example of a general purpose learning machine.
Yes, but a program which uses some form of "reinforcement" is not an
example of RL unless it is RL. And by now, you should be well aware that
when I say something like "just reinforcement", I mean the full complexity
of reinforcement learning complete with a strong behavior generator and
secondary reinforcement and not just any type of reinforcement.
If it has no ability to predict that a complex sequence of
behaviors will increase the odds of a future reward, it
will not be able to learn that complex sequence of behaviors.
But does it really learn that a particular complex sequence
of behaviors will work? Isn't it more a case of learning
what strategies and rules will work? It needs to be able to
generate moves without knowing the actual sequence that will
follow for each game may be completely unique.
Yeah, it's not learning an actual sequence. It's learning the value of each
of the steps in the sequence and when to use them. A given set of learned
steps is likely to produce the same sequence over and over again, but it's
not the sequence which is actually being learned, it's the micro behaviors
which tend to produce the sequence which need to be learned.
However, the system still needs the power to predict the future. It needs
the power to predict that taking this step now will get it closer to a
future reward, or else it will never learn the value of taking the first
step, and the odds of it taking all 10 steps required to get a reward will
remain very low.
If an environment needs the agent to take 10 steps to get a reward, it can
take a long time to luck into getting that first reward. But once it's
gotten the reward, the agent has to be able to use this experience to help
guide its steps in the future. Not only must the last step before the
reward be seen as a good thing, but there must be something happening where
the agent is learning the value of the 9 steps before it as being better
than other options. This is the credit assignment problem. How does the
system know how much of the credit of the reward to assign to the many
steps that happened before the reward? It must do it in some way that
allows the system to produce better predictions in the future about which
steps to take, because if you assign the credit correctly, the level of
credit each behavior is given will be a prediction of the expected future
rewards for using the micro-behavior in the future.
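One standard answer to the credit assignment question is temporal-difference learning, which is not necessarily the mechanism the author has in mind. In this minimal Python sketch (TD(0) on a hypothetical 10-step corridor where only the last step is rewarded, with an assumed step size and discount), credit flows backward one step per episode, and each state's learned value ends up as a prediction of discounted future reward.

```python
def td0_chain(n_steps=10, episodes=200, step_size=0.5, discount=0.9):
    """TD(0) on a corridor of n_steps moves; only the final move is rewarded.

    Returns the learned value of each non-terminal state, i.e. how much
    credit each step along the way ends up receiving.
    """
    values = [0.0] * (n_steps + 1)  # values[n_steps] is the terminal state, kept at 0
    for _ in range(episodes):
        for s in range(n_steps):    # fixed behavior: always step forward
            s_next = s + 1
            reward = 1.0 if s_next == n_steps else 0.0
            error = reward + discount * values[s_next] - values[s]
            values[s] += step_size * error
    return values[:n_steps]


for state, v in enumerate(td0_chain()):
    print(state, round(v, 3))  # values rise toward the rewarded end of the corridor
```

After enough episodes the value of the state `k` moves from the goal settles near `0.9 ** (k - 1)`, so even the first step, nine moves away from any reward, carries a nonzero share of the credit.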
If this is not done correctly, then it will never learn to produce the 10
step sequence correctly. It will get the reward simply out of luck each
time and it might take the agent 1000 steps on average to get the reward
instead of the optimal behavior of 10 steps. An agent which is able to
predict the future correctly by the correct assignment of rewards will be
able to get the reward in fewer and fewer steps each time until it
converges on the optimal behavior of 10 steps.
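The claim that a good learner reaches the reward in fewer steps until it converges on the 10-step optimum can be checked on a toy corridor. This minimal Python sketch uses tabular Q-learning, a swapped-in textbook algorithm rather than anything described in the post. It assumes a cost of -1 per move instead of a single reward at the end (so the all-zero starting values are optimistic and pull the agent forward), a discount and step size of 1, and ties broken toward the wrong direction on purpose so that the first episode wanders.

```python
def corridor_q_learning(length=10, episodes=200, max_steps=10_000):
    """Tabular Q-learning on a corridor from cell 0 to the goal at cell `length`.

    Assumptions for this sketch: every move costs -1 (so the optimal episode is
    exactly `length` moves), the discount is 1, and the step size is 1, which is
    safe here only because the corridor is deterministic.
    """
    moves = (-1, +1)  # left listed first, so ties initially send the agent the wrong way
    q = {(s, m): 0.0 for s in range(length) for m in moves}  # 0 is optimistic
    steps_per_episode = []
    for _ in range(episodes):
        s, steps = 0, 0
        while s != length and steps < max_steps:
            # Best current guess; max() keeps the first move on ties.
            m = max(moves, key=lambda move: q[(s, move)])
            s_next = max(0, s + m)  # a wall stops the agent at the left end
            future = 0.0 if s_next == length else max(q[(s_next, mv)] for mv in moves)
            q[(s, m)] = -1.0 + future  # credit for this move = cost + best predicted future
            s, steps = s_next, steps + 1
        steps_per_episode.append(steps)
    return steps_per_episode


history = corridor_q_learning()
print(history[:5])   # early episodes wander
print(history[-3:])  # converged episodes take the optimal 10 moves
```

In this deterministic world the estimates only ever move down toward the true costs, so once they stop changing the greedy path cannot be longer than 10 moves: the first episode wanders, and every episode after convergence takes exactly 10 steps.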
There are many different RL algorithms that are already well
known, but none of them show much hope in duplicating the
power of the human brain. This however doesn't mean that RL
is the wrong approach, it only shows that none of these RL
algorithms are able to duplicate the power of the brain.
There are many different GOFAI algorithms that are already well
known, but none of them show much hope in duplicating the
power of the human brain. This however doesn't mean that GOFAI
is the wrong approach, it only shows that none of these GOFAI
algorithms are able to duplicate the power of the brain.
:)
Even so, I do favor the bottom up approach to ultimately
understanding the human brain.
:)
--
Curt Welch http://CurtWelch.Com/
curt@xxxxxxxx http://NewsReader.Com/