Re: Temporal Learning



Traveler <traveler@xxxxxxxxxx> wrote:
> On 03 Oct 2005 02:28:28 GMT, curt@xxxxxxxx (Curt Welch) wrote:
>
> [cut]
>
> >When I say "learn to walk", I mean you have a generic
> >reaction learning algorithm, connected to sensors and effectors that
> >control the legs, where the algorithm has no prior knowledge about the
> >meaning of any of the signals, or about the nature of the bot it is
> >controlling. Yet, on its own, in response to some reward signals
> >generated to give it a purpose,
>
> Yes, but where does this reward signal come from? Is it hardwired? Or
> is it created by the robot on the fly?
>
> > it is able to master the task of walking
> >just because intentional motion of the body is something that leads the
> >bot to more rewards.
>
> IMO, a truly intelligent machine doesn't need any external reward
> signal in order to learn how to walk.

The point is that the bot MUST have some goal in life hard coded into it,
or else it has no reason to walk. Why not just lie on its back and kick
its feet? Why would that be "bad"?

In order to be creative (as opposed to randomly psychotic), the bot must
have a system for telling good from bad hard coded into it.

Feel free to suggest how that should be done, but I can guarantee it ain't
AI without a value system built into it to direct its behavior.
Otherwise, it's got no reason to learn anything.

> It should be able to generate
> one or more random goals (such as moving closer to a target), and then
> use that goal as part of a reinforcement strategy for learning how to
> walk. Newly created goals should be given a positive value, unless
> they are punished by the environment.
>
> Question is, what is a goal? I mean, how are goals wired in the brain?

Hmm. I've just written a few thousand words on this subject in another
thread. :)

I shake my head.

You can't use goals as a value system for general AI.

There's one simple and obvious answer to what the correct (and only) type
of value system for AI is: reinforcement learning. You don't give your
AI goals, you give it a critic which rewards specific results. The AI's goal
is to keep the critic as happy as possible by performing whatever behaviors
work best to cause the critic to reward the bot.

It has nothing to do with "external rewards". It's internal rewards. It's
the critic hardware you build into the AI which defines its purpose in
life. That's how you give it goals.

As with the light-seeking bot I talked about, you could create a critic
hardware function that rewards the robot based on the current light level.
The reinforcement learning bot then develops behaviors in order to maximize
total reward (in this case, the light falling on it). So it learns to
walk so that it can move its body into the light. If you put it in a
complex environment of constantly shifting light patterns, it will develop
increasingly complex sets of behaviors for finding and keeping itself in the
brightest light it can find.
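
To make the light-seeking example concrete, here is a minimal sketch. It is
only an illustration under stated assumptions, not a description of any real
bot: the 1-D world, the LIGHT table, the three actions and the choice of
tabular Q-learning (with its alpha, gamma and epsilon values) are all made up
for the example. The only thing the learner ever sees of the "goal" is the
number returned by critic().

```python
import random

# Hypothetical 1-D world: light level at each position (brightest at index 4).
LIGHT = [0.1, 0.2, 0.4, 0.7, 1.0, 0.6, 0.3]
ACTIONS = (-1, 0, +1)  # step left, stay put, step right


def critic(position):
    """Internal critic: the reward is simply the light level at the bot."""
    return LIGHT[position]


def step(position, action):
    """Move the bot, clamped to the ends of the world."""
    return max(0, min(len(LIGHT) - 1, position + action))


def train(episodes=500, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; the learner knows nothing about light or walls."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in LIGHT]
    for _ in range(episodes):
        pos = rng.randrange(len(LIGHT))  # start each episode somewhere random
        for _ in range(steps):
            if rng.random() < epsilon:   # occasionally try a random action
                a = rng.randrange(len(ACTIONS))
            else:                        # otherwise act on current estimates
                a = max(range(len(ACTIONS)), key=lambda i: q[pos][i])
            nxt = step(pos, ACTIONS[a])
            reward = critic(nxt)         # the only "purpose" the bot is given
            q[pos][a] += alpha * (reward + gamma * max(q[nxt]) - q[pos][a])
            pos = nxt
    return q


def greedy_walk(q, start, steps=10):
    """Follow the learned behavior with no exploration; return final position."""
    pos = start
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[pos][i])
        pos = step(pos, ACTIONS[a])
    return pos
```

Calling train() and then greedy_walk(q, 0) walks the bot from the dark end of
the world to the brightest cell. Nothing in train() mentions light; swap in a
different critic() and the same learner chases a different "goal".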

If it's smart enough, it will learn how to turn the lights on. Or climb up
on a desk, turn the desk lamp on, and lie under it. When the bulb burns
out, it will go get a new one and replace it. When the power fails, it
will call the power company and report a power outage - all in the name of
a little golden light!

It may look like it's getting external rewards from the light, but the
rewards are in fact generated internally by the critic. And the purpose of
the AI is to do whatever it can to keep its internal critic hardware happy.

So you hardwire a "goal" into a bot, by building it a critic which rewards
the results you want the bot to achieve.

Our critic hardware rewards us for getting food in us when we are hungry,
and keeping sharp pointy objects out of our skin, and for reproducing, and
for keeping in the right temperature range. So we spend our life running
around just to keep our internal critic hardware happy. We learn to walk
because it helps us get the things we need to keep the critic hardware
happy (like food).
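
As a rough sketch of what a critic with several built-in drives might look
like in code: everything here is an assumption for illustration, including
the BodyState fields, the comfort range and the 0.1 weight on temperature.
The point is only that several hardwired drives collapse into one reward
number that the learner tries to keep high.

```python
from dataclasses import dataclass


@dataclass
class BodyState:
    hunger: float       # 0.0 = full, 1.0 = starving (hypothetical scale)
    pain: float         # 0.0 = none, 1.0 = severe (hypothetical scale)
    temperature: float  # degrees Celsius


def critic(before, after, comfort=(18.0, 26.0)):
    """Combine several hardwired drives into a single reward number."""
    reward = before.hunger - after.hunger  # reducing hunger is rewarded
    reward -= after.pain                   # pain is punished
    low, high = comfort
    if after.temperature < low:            # too cold
        reward -= 0.1 * (low - after.temperature)
    elif after.temperature > high:         # too hot
        reward -= 0.1 * (after.temperature - high)
    return reward
```

For example, going from BodyState(0.75, 0.0, 22.0) to BodyState(0.25, 0.0,
22.0) (eating while hungry, no pain, comfortable temperature) gives a positive
reward, while picking up pain or drifting out of the comfort range gives a
negative one.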

--
Curt Welch http://CurtWelch.Com/
curt@xxxxxxxx http://NewsReader.Com/