Re: Robotics, AI, and Ethics



On May 14, 11:34 am, c...@xxxxxxxx (Curt Welch) wrote:
But our goal is not to make it easy for the learning
system, but instead, to make a _stronger_ learning
system, because the one you wrote to play TTT isn't
even good enough to play Backgammon, let alone act
like a human.

Well I haven't tried it on Backgammon yet :)

I assume you think the brain is fairly "strong", and yet
people without pain feedback break bones, damage their
bodies and so on. We don't have radiation sensors so we
will happily play with radioactive stuff until it kills
us. I think the "strength" of a learning system is limited
by its feedback system. We can't learn everything because
we don't find everything rewarding. And we can't learn
about things we cannot sense in some way.

Even though the hardware that generates our reward signal
is in us, it's _not_ part of the learning system that
learns how to respond to it.

You can separate out the reward system if you like but
without it there is no learning. And if the only reward
signal is a win in backgammon that is all it will learn
no matter how "strong" it is at the task.

The idea that most feedback is not immediate is wrong.
Games are peculiar in that way, but even then, we play
games because in fact we get rewards while playing the
game, independently of the win reward.

You do not get an education (move) and then give it a
high value when you get a good job (win signal) as in
backgammon. You get a good education because you give
it a good value by observing the association between
people with good jobs and their good education.

If we instead make assumptions about what type of
environment we are dealing with, and only create an
algorithm that can work with that one type of
environment, we will have limited the strength of
our algorithm.

News flash. One environment. 3d world of interacting objects.

There is no single, finite sized algorithm, that can solve
all problems. The fact that we have to work with finite
hardware (finite memory, finite compute power per second)
means we have to make some assumptions about the nature of
the environment we are dealing with. So creating a pure
generic algorithm is impossible. But the goal is to make
it as generic as possible - and as strong as possible,
which means it makes as few assumptions as possible about
the nature of the problem it's trying to solve.

At last, agreement! No pure generic algorithm possible.

The agent in RL does NOT generate the reward signal.

That is right. The agent only determines if it is a reward
signal. The signal is only a reward signal because of how
the agent reacts to it. The environment generates lots of
signals but what is made of them is not determined by the
environment.


In the introductory chapter of Sutton's book, he
explains how to write an RL algorithm to play TTT.

http://www.cs.ualberta.ca/~sutton/book/ebook/node10.html

I have read Sutton’s book and implemented the ttt example.
You must have forgotten the exchanges on this subject?

The algorithm he specifies in that chapter, had you
used it in your learning system, would have solved
the problem of learning where to move, without having
to give it instant rewards - by simply giving it only
rewards for winning.

I have used that algorithm and we had an exchange on
the need for exploratory moves to build up values
for each state before the system can start exploiting
those values.
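
For anyone following along, the kind of value update being argued
about can be sketched in a few lines of Python. This is a toy
illustration, not Sutton's code: the state names, the helpers
td_backup and choose_move, the default value of 0.5 and the step
size are all assumptions made for the example. It shows a win-only
reward leaking back to earlier states, and why some exploratory
moves are needed before the values are worth exploiting.

```python
import random

def td_backup(values, state, next_state, alpha=0.1, default=0.5):
    """Move V(state) a fraction alpha toward V(next_state)."""
    v = values.get(state, default)
    v_next = values.get(next_state, default)
    values[state] = v + alpha * (v_next - v)
    return values[state]

def choose_move(values, candidates, epsilon=0.1, rng=random, default=0.5):
    """Pick the highest-valued successor, or explore with probability epsilon.

    Returns (chosen_state, was_exploratory).
    """
    if rng.random() < epsilon:
        return rng.choice(candidates), True
    best = max(candidates, key=lambda s: values.get(s, default))
    return best, False

# Toy episode (hypothetical states): three positions leading to a win.
# Only the final position carries a reward (value 1.0); no per-move
# reward is ever given.
values = {"win": 1.0}
episode = ["s0", "s1", "s2", "win"]
for _ in range(200):
    for state, next_state in zip(episode, episode[1:]):
        td_backup(values, state, next_state)
```

After repeated play the earlier states inherit value from the win
alone. Until that has happened every unseen state sits at the same
default, so a purely greedy choose_move has nothing to go on, which
is the point about exploration above.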

No, the brain doesn't "look around". It, in a single
sweep, comes to the conclusion that it's time to move
the eyes a little to the right. And then in the next
instant, it comes to the conclusion to move the eyes
a little further to the right. And then in the next
instant, it comes to the conclusion to move the eyes
down, etc. Each decision being made in one large
parallel, "single sweep".

Just as the serial adder decides, in one single parallel
sweep, on the sum and the carry. It then uses that carry
to decide on the next output. But the whole adding
process is serial even if each step is done by parallel
circuitry and the adder requires extra circuitry to
deal with that.
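
The adder analogy can be made concrete with a short Python sketch.
The function names and the least-significant-bit-first list format
are assumptions chosen for the example. Each call to full_adder is
the "single parallel sweep"; the carry variable is the extra held
state that makes the overall process serial.

```python
def full_adder(a, b, carry_in):
    """One combinational step: sum and carry come out together."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out

def serial_add(x_bits, y_bits):
    """Add two equal-length bit lists (least significant bit first)."""
    carry = 0  # held between steps; links one sweep to the next
    out = []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)  # final carry becomes the top bit
    return out

def to_bits(n, width):
    """Integer to a list of bits, least significant bit first."""
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits):
    """List of bits (least significant first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))

result = from_bits(serial_add(to_bits(11, 4), to_bits(6, 4)))  # 17
```

No step can run before the previous one has produced its carry, no
matter how parallel the circuitry inside each step is.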

The brain decides in one parallel sweep some things
about what it is looking at. It holds the results of
that first parallel process to decide what to do next.
For example you see a page of text. That is a parallel
output that may be used to determine the next step in
the serial process such as to direct the eyes to the
start of the text.

This was something demonstrated very well by Brooks'
work with subsumption architecture designs. He showed
how a simple reaction machine, one which had various
hard-coded ways of reacting to the _current_ condition
of the environment, could produce goal seeking behavior,
even though there was no "goal" hard-wired directly
into the hardware.

Depends what you mean by hard coded. The path taken by
a light seeking robot is not hard coded but the reaction
to light is hard coded. In subsumption the hard coded
low level reactions are subsumed by higher level hard
coded reactions.
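
A rough Python sketch of that distinction, as a simplified priority
arbiter rather than Brooks' actual networks of finite-state
machines. The sensor names, the 20 cm threshold and the command
strings are all hypothetical. Every reaction is hard coded, the
higher layer overrides the lower one when it has something to say,
and the path the robot ends up taking appears nowhere in the code.

```python
def seek_light(sensors):
    """Low-level reaction: turn toward the brighter side."""
    if sensors["light_left"] > sensors["light_right"]:
        return "turn_left"
    if sensors["light_right"] > sensors["light_left"]:
        return "turn_right"
    return "forward"

def avoid_obstacle(sensors):
    """Higher-level reaction: only speaks up when something is close."""
    if sensors["range_cm"] < 20:
        return "reverse"
    return None  # no opinion; let the lower layer act

# Highest priority first: a non-None command suppresses the layers below.
LAYERS = [avoid_obstacle, seek_light]

def act(sensors):
    """Return the command of the highest-priority layer that responds."""
    for layer in LAYERS:
        command = layer(sensors)
        if command is not None:
            return command
    return "stop"
```

Calling act with a bright light on the left and nothing nearby gives
"turn_left"; put an obstacle within range and the same light reading
gives "reverse" instead.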

... the hardware that creates these serial processes
(all human behavior) needs to be configured by training.
So how do you suggest we do that?

Or the serial process can be computed. When you read
text or add numbers you follow a fixed procedure which
deals with different types of data, text or numbers.
This -general- procedure is learned. The ability to
drive a car is a procedure for converting the data,
the current visual/tactile input, into actions on the
steering wheel, accelerator and brake. If this procedure
fails to bring about the goal or sub goal state then
the learning process is implemented on that procedure.

I would say most of our actions are learned procedures
which automatically adjust to different situations.
Only when they fail are the higher centres alerted
to a problem. The same applies to good programming
procedures. When an error occurs the program jumps to
an error processing routine.


It is all about feedback. Is a brain a weak system
because of its vast feedback system?


Yes, it is all about feedback. Dan likes to remind us
of that. :) Not sure what that has to do with any of
this however.

It has everything to do with it. What you call a "reward"
is feedback. Unlike a simple RL algorithm the brain has
a vast feedback system.


Brain research is important which is why I don't totally
ignore it. But it's a path I let other people take
because I'm not a neuroscientist and don't expect to
become one in my life.

Sure I understand that you are not, and never will be,
a neuroscientist but you do ignore the latest research
that doesn't require you to be a neuroscientist to
understand. Instead you base all your ideas on Skinner's
rat research.

And it's not going to get off the ground if you keep
working on machines that flap their wings just because
wing flapping is an important part of how evolution
made all birds fly.

And the equivalent to flapping wings IS an important
part of how to build a flying machine. It is how they
implement powered flight and we do the same using an
engine and a rotating wing just as we use engines and
rotating wheels for cars instead of muscles and legs.
However we can build flying machines that flap their
wings (you can buy toys that work that way) and build
walking machines.


The engineering problem of powered flight
was not solved by studying birds, or by studying
the evolution of birds, even though both those
subjects mirror exactly every argument you make
about AI. It was solved by someone looking for,
identifying, and mastering, the PRINCIPLES that
governed the technology.

And how did they know where to look for these principles?
Don't be so sure that they didn't have birds or rising
smoke as the inspiration to build gliding machines and
hot air balloons. You looked to Skinner's rats for your
conditioning principles.


The only way to solve AI, is to work on the learning
agent, not on the environment because we don't get to
pick the environment. The environment is called the
universe,

The "environment" is not really the Universe. The
environment is that part of the Universe that you have
not defined as the agent. In biology the dividing line
isn't that clear for we cannot exist outside a biosphere.

Your claim that we don't get to pick the environment
is also not the case. We change our environment all the
time to suit ourselves. Our environment of oxygen was
created by life. We act on our environment just as much
as it acts on us. We can even build extra feedback in
the environment such as a smoke alarm.

True inventors don't need to copy anything. They invent
because they have an RL based brain that learns by trial
and error, not just by mimicking. That's what gives us
our creativity in the first place. You invent by
exploring new ideas, not by copying old ones.

So you think these "new" ideas just come from trial and
error? When you get a new idea, is it never based on
previous observations or learning? Are no new ideas the
modification of old ideas?

Our true intelligence is not in what we do, as much as
it's in how we change our behavior over time as we adapt
to whatever sort of environment we find ourselves in.

I understand about learning and adaptation. I read about it
in the first book I bought on the subject a long time ago.

You do not help by referring to learning and adaptation
(relearning) as "true intelligence". That is not how the
word "intelligence" is used. In this case you have taken
the category label "intelligence" and applied it only
to one of its members, learning, leaving all the other
innate or already learned behaviors we call "intelligent"
without a label. It is like saying a "true" fruit is an
apple leaving all the other items like oranges and bananas
without a category name.


No amount of hard-coding behaviour into a machine will
explain how it's able to reprogram itself, and that's
the key to AI - learning how to build machines that
can constantly reprogram themselves to adapt to a
changing environment.

Learning and adaptation (relearning) is a desired behavior.
I have not disagreed with that. It is my interest as well.


JC

