Re: motor emulation, modularity, feedback, prediction, and dreams



"JGCASEY" <jgkjcasey@xxxxxxxxxxxx> wrote:
> > Curt Welch wrote:
> > ...
>
> > > The main point being, is that the model ends up being
> > > structured in ways that don't look anything like the
> > > world models we like to talk about. And the big reason
> > > for that, is that the best way to model the world, is
> > > to not model the world, but to instead, model how the
> > > agent needs to react to the world. It's harder for
> > > us to think in those terms, but not at all hard, for a
> > > learning system, to be set up to learn everything in
> > > those terms.
>
> feedbackdroids wrote:
>
> > Regards what Curt has said, it basically agrees with
> > the premise of the book, BUT Curt's wording is still
> > on the behaviorist extreme, I think.
> >
> > To give an example, Clark talks about babies learning
> > to reach for objects, and also to walk. Yes, they do
> > learn how to do this, but specifically in the context
> > of learning how to operate an "innate", or preconstructed,
> > system or infrastructure.
>
> As far as I know Curt accepts the existence of innate
> structures etc but believes his system can evolve all
> that so in his system it isn't there to begin with to
> learn how to operate.

Well, I've just simplified the learning problem to what I think is the bare
essence so I can solve that one key problem first. In the end, the more
structure you add to minimize the complexity of the problem to be solved, the
easier it makes it for the learning hardware. But without strong learning
hardware, you don't have AI no matter how much other structure you add.

For example, a lot of robot designs will use servo controllers which allow
the central controller to send a position or motion request, and the servo
controller solves the complex dynamics problems of the motor/arm/power
problem to get the arm to the correct position. The same types of
techniques can be used on my learning network to simplify the complexity of
what it has to learn.

However, with strong learning, you should be able to use it to solve all
the control problems - but it might require separate learning systems
optimized to the needs of each problem, and it might simply add years to the
time it takes the learning systems to master 5 problems instead of just 1.

> All his system is meant to do is produce "behaviors"
> by transforming the current input pulses into an output
> of pulses. When the system produces an i/o desired by
> a critic the reinforcement button is pressed to increase
> the likelihood of that happening again.
>
> In practice the critic will die an old man before much
> happens as I don't see his system ever producing much
> to select from.

I'm confused. Do you not understand that the entire point of RL is that it
doesn't need a human (intelligent) critic? There isn't anyone pushing
buttons. The critic is special (simple) hardware, which generates the
reward signal to define the purpose/goal of the learning problem.

And one of the huge speed-ups in the RL paradigm is the reward prediction
technology, which trains all behavior constantly, even when there are no
signals from the critic. And that's the system I'm struggling with for my
type of pulse sorting network.
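
To make the reward prediction idea concrete, here is a minimal sketch of
textbook tabular TD(0), not the pulse sorting network itself. The chain
world, the function name td0_chain, and all parameter values are
assumptions made up for illustration. The critic only pays out on the last
step, yet every earlier step still gets trained from the prediction of the
step after it.

```python
def td0_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9):
    """Tabular TD(0) on a hypothetical chain: always step right,
    reward 1.0 only on the step that reaches the end."""
    # Index n_states is the terminal state; its value is never updated.
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        state = 0
        while state < n_states:
            next_state = state + 1
            # The critic is silent (reward 0.0) on every step but the last.
            reward = 1.0 if next_state == n_states else 0.0
            # TD error: the prediction for the next state stands in for
            # the rewards that have not arrived yet.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values[:n_states]
```

The learned predictions rise toward the rewarded end of the chain, even
though only the final transition ever produces a reward signal.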

> As regards to what he did say above:
>
> "model how the agent needs to react to the world"
>
> It needs spelling out as to what that really means
> if you are going to build it.

Yeah, well, that's what my network is. It's a spelled-out answer to that
question that seems to be ideal for the RL paradigm. But I've not
experimented with larger networks enough to really understand the full
nature of this technique and its limits.

However, the "react to the world" just means that it must produce some
decision on what to do next, based on every new piece of sensory input data
that shows up. If one pixel in the eye senses that the light is getting
slightly brighter, how should the system as a whole react to that? The point
is, any AI system must be able to react to any and all bits of data flowing
into it. The reaction might be to just ignore the data and pretend it
never happened, but most likely, the reaction will at a minimum be to
remember the data for some period of time, and allow the memory of that data
to affect future reactions.

> To start with what are the "needs" of the agent?
>
> His system has no "needs" only the critic has
> "needs" which is the only place I can see any
> intelligence at work in the whole scheme.

After all I've written about RL you still don't even grasp the basics of
what RL is, do you?

The critic is the definition of the needs. And the critic is not
intelligent. It's dumb. It's just a simple statement of the goal which
the learning machine is trying to reach, implemented as a little black box
whose purpose is to monitor some small part of the state of the universe and
generate reward signals based on the state of the environment.

If you want the goal of the machine to be making money, you can build
critic hardware which generates rewards for every penny that shows up in
the robot's bank account. If you want the goal to be driving a car across
the desert, you can build a critic that generates rewards based on how close
the car is to the finish line. The closer it gets, the faster it generates
rewards.
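
As a rough sketch of how dumb such a critic can be, the two functions below
stand in for the critic hardware. The names penny_critic and
distance_critic, the 1/(1 + distance) shape, and the scale parameter are
all assumptions chosen for illustration; any simple monotonic rule would
make the same point.

```python
def penny_critic(previous_cents, current_cents):
    """One unit of reward for every penny that shows up in the account.
    Losses earn nothing here; that is an arbitrary choice for the sketch."""
    return float(max(0, current_cents - previous_cents))


def distance_critic(position, finish_line, scale=1.0):
    """Reward rate that grows as the car gets closer to the finish line."""
    distance = abs(finish_line - position)
    # Hypothetical shaping: the maximum rate (scale) is paid at the line.
    return scale / (1.0 + distance)
```

Neither function knows anything about banking or driving. Each one just
watches a single number and turns it into a reward signal.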

The primary need of the system is by definition hard-coded to be: "maximise
all future rewards from your critic hardware". All the secondary needs,
however, are learned. This is the reward prediction system that's key to
all RL algorithms. It's called the value function in many RL algorithms
because the value function is a prediction of what future rewards will
result from the current environment. The policy function (using standard RL
terminology) is the algorithm for picking behaviors in order to maximise the
value function. Generally, it's some policy that tends to pick the
behaviors that produce the best values (highest expected future rewards).
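
The split between value function and policy can be sketched with standard
tabular Q-learning and an epsilon-greedy policy. This is a generic textbook
technique used only as an illustration, not my network; the chain world,
the function names, and the parameter values are all hypothetical.

```python
import random


def greedy_action(q_row, rng):
    """Policy core: pick the action with the best value, ties at random."""
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def q_learning_chain(n_states=5, episodes=300, alpha=0.5, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Q-learning on a hypothetical chain; reward 1.0 on reaching the
    right-hand end. Actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # value function q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy policy: mostly follow the values, sometimes explore.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = greedy_action(q[state], rng)
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Value update: predicted future reward for this state/action.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Read the learned behavior for each non-goal state off the values."""
    return [row.index(max(row)) for row in q[:-1]]
```

Calling greedy_policy(q_learning_chain()) gives the action the learned
values favor in each non-goal state.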

> Hook his system up to arms, legs and sensory
> inputs, and feedbacks and it will learn nothing.
> In theory it is supposed to learn when the critic
> notes an "intelligent" i/o (which means the
> "intelligence" already exists in the critic!),
> and a reinforcement button is pressed.
>
> A real learning organism has an inbuilt critic
> that "knows" if its trial actions are achieving
> its goals. Curt's system has no inbuilt goals.

Holy *** Batman. Where have you been all these years? All RL systems
have inbuilt critics which define the goals. That's a primary definition
of what RL is. There is no human critic judging the performance of the
system. It's a very dumb hunk of hardware generating a reward signal.

Here, read these two pages from Sutton's book if you don't understand what
RL is:

1.1 Reinforcement Learning
http://www.cs.ualberta.ca/%7Esutton/book/ebook/node7.html

3.1 The Agent-Environment Interface
http://www.cs.ualberta.ca/%7Esutton/book/ebook/node28.html

> The idea of the behaviorists is that the environment
> teaches everything doesn't make sense to me as the
> environment has no teaching goals. The real environment
> does not "arrange contingencies".

Well, that's just how they like to talk about it.

Look at that second link from above to see how it's framed as an RL
problem.

The agent receives state information and _reward_ information from the
environment. But in the RL problem, say in a robot, the entire robot, and
the critic hardware, is seen as part of the environment to the RL agent.

Humans have critic hardware in them, which is key to defining which parts of
the environment act as a reward stimulus and which parts act as
punishment. Fire is a punishment because we have heat sensors in us wired
to some specialized critic hardware which generates the punishment
signal to the learning agent in our brain. In terms of how we frame the RL
problem, the "agent" is only the learning part of the brain. The heat sensor,
the arms, and the critic hardware that makes us sense a burning hand as
"pain" are all seen as part of the environment external to the learning agent.

> The organism learns about the environment via input,
> actions and feedback. It learns what actions to take
> and how to modulate them with feedback in order to
> get the kinds of inputs it wants.

Yeah, that's basically what RL is all about.

> Sure the trainer of the rat has a goal.

The lab guy pushing the food button or shock button isn't the real critic
here. It's the hardware in the rat that senses food and sends a reward
signal to the learning brain. The lab guy and the food are just more of the
environment.

> Just as
> a programmer has access to a set of programming
> instructions and can shape them into a sequence
> to achieve the programmers goal so to does a rat
> or a pigeon have some inbuilt behaviors that the
> trainer can shape into a sequence of actions via
> a reward system to select a sequence of actions.
> But the "intelligent" behavior of the rat and the
> pigeon is the work of the trainer not the animal
> just as a program is the work of the programmer
> not the program itself.

Well, a good RL system will solve the RL problem it is presented with. If
you motivate an RL robot to not move, by sending it reward signals only
when it's standing still, it will quickly learn to stand still and do
nothing. As "stupid" as that behavior looks to us, it's actually very
intelligent in that it's a perfect solution to the RL problem that robot
was given to solve.
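
A toy version of the standing-still robot, as a sketch only: a two-action
learner with running reward estimates and a hard-wired critic. The action
names, the function names, and the parameter values are invented for the
example. The same learner picks up whichever behavior the critic pays for.

```python
import random


def standing_still_critic(action):
    """Hard-wired critic: reward only when the robot does not move."""
    return 1.0 if action == "stay" else 0.0


def learn_preferred_action(critic, actions=("move", "stay"), steps=200,
                           alpha=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy learner that keeps a reward estimate per action."""
    rng = random.Random(seed)
    estimates = {a: 0.0 for a in actions}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)           # explore
        else:
            best = max(estimates.values())         # exploit, ties at random
            action = rng.choice([a for a in actions if estimates[a] == best])
        reward = critic(action)
        # Move the estimate a small step toward the reward just received.
        estimates[action] += alpha * (reward - estimates[action])
    return max(estimates, key=estimates.get)
```

Handing the same learner a critic that rewards "move" instead flips the
learned behavior, with no change to the learner itself.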

The complexity of the behavior produced by the learning machine is totally a
function of the complexity needed to solve the problem given to it (and of
course also limited by the learning powers of the machine).

One way to say that is that the environment creates the complexity. But
you can't forget that the critic is a highly important part of the
environment, because the critic defines the goal. The environment just
defines how hard it is to reach the goal. If you want to create behavior
more interesting than just "standing still and not moving", then you have to
give the learning system a hard problem to solve. That means giving it a
very complex environment and a problem that ideally has no solution or goal
state, but instead is one of just trying to do the best possible - which is
the default type of problem in RL - maximise total estimated future
rewards.

Humans create such interesting complex behavior because they are given such
a hard problem to solve: "be happy", or "stop as much bad *** from
happening to you as possible and make as much good *** happen as you can".
There's no end to the problem because we are always simply trying to do
better than we did the day before. And because the world (universe) is so
complex, there seems to be an endless number of improvements we can make to
our lives.

> One of the basic things a baby learns is to
> move its arm around until it contacts something
> which triggers a reflex grasping action. This
> enables baby to bring the item toward the mouth.
> Touch, smell, vision are all on the item as it
> is "tested" for edibility or to see if it "just
> feels good to play with" or some other *inbuilt*
> criteria of goodness.

Right, all *inbuilt* criteria for goodness are called the critic when you
frame it as an RL problem. There's no end to how many hard-coded mods you
could make to a truly general purpose RL machine, and there's no doubt
that evolution has done a ton of that in humans. But until we understand
how to build a good general purpose RL learning machine, we don't have the
foundation to start modifying. If you try to build the modifications first,
then you hide the fact that you are missing the key foundation that all AI
needs to be intelligent. You end up with things like a knowledge database
that doesn't know what to do with the knowledge and has no common sense and
doesn't know how to get it on its own. That's because it's the core RL
learning system that sucks the knowledge out of the environment on its own
- but directed at all times by its one and only main abstract goal - get
more rewards.

> We have built in rat and pigeon brains but we
> are more than that. We rely on all those learnt
> reflexes to function but we are more than just
> a set of environmentally triggered reflexes.
> We select and shape our own reflex behaviors.
>
> In general what you say Clark talks about doesn't
> seem to conflict with anything I have read about
> with regards the many systems at work in the
> human brain. We may have models as to what
> we need to react to as part of our basic sensory
> motor systems but how is that extended into the
> internal reasoning process, those internal
> behaviors called "thinking"?

--
Curt Welch http://CurtWelch.Com/
curt@xxxxxxxx http://NewsReader.Com/