Re: Robotics, AI, and Ethics
- From: curt@xxxxxxxx (Curt Welch)
- Date: 14 May 2009 18:34:34 GMT
casey <jgkjcasey@xxxxxxxxxxxx> wrote:
On May 13, 3:38 pm, c...@xxxxxxxx (Curt Welch) wrote:
That simply shows a weakness in that algorithm so
you should look for ways to make it better.
Well I did find a way to make it better by providing
a signal that nothing was changing which the network
would find "boring" and that would be the signal to
change its weights, as it is in us.
You didn't make the learning algorithm better by
giving it negative rewards for illegal moves. You
CHANGED THE ENVIRONMENT to make the problem easier
to solve. Giving your agent something easier to do
is not how you make the learning algorithm stronger.
There is no rule I know of that the feedback cannot be
immediate such as when you place your hand on a sharp
point.
Of course there's no rule against it. And if you want to make it as easy
as possible for the algorithm to learn something, you give it as much
reward information as you can. But our goal is not to make it easy for the
learning system, but instead, to make a _stronger_ learning system, because
the one you wrote to play TTT isn't even good enough to play Backgammon,
let alone act like a human.
I no more changed the environment than nature did
when it gave the brain "pain" feedback. The change is in
the learner (pain) not in the environment (sharp point).
Maybe it's best that you don't waste your time trying to create RL
algorithms. :)
The problem in front of us is to build a strong learning algorithm which is
directed by a reward signal. As we try to create a strong generic
algorithm we take the stance that we know nothing about the nature of the
environment - so as to create an algorithm strong enough to learn how to
manipulate any environment it comes across. We do this because that's what
humans can do. Our brain can learn how to manipulate environments for the
purpose of getting higher rewards which evolution could not possibly
have hard wired into us.
Evolution did not hard wire our ability to use a pencil and fill in circles
on a sheet of paper so that an optical test scanner could read our answers
to a math test so we can get a better grade so that we can get into a
better college so that we can get a better job, so that we can get food to
feed our family so that we don't have to suffer the pain of hunger 30 years
down the road. Humans are masters at learning how to manipulate (and
survive in) highly complex environments - environments that make a
single game of Backgammon look completely trivial - and environments where
most rewards are highly delayed from the actions that helped produce
them.
Even though the hardware that generates our reward signal is in us, it's
_not_ part of the learning system that learns how to respond to it. The
hardware that generates the rewards defines the goal (which for humans is
a large set of different rewards, all selected because they motivate us to
keep our genes alive). The learning system figures out what type of
behavior works best for making sure we have food in 30 years by filling in
little circles on a test today.
When we study, and attempt to design, stronger generic learning algorithms,
we assume we don't know the environment, or what we are trying to learn
about the environment as defined by the reward signal. That's why it's
generic - because we, as intelligent designers, need to be blind to what
problem our algorithm is going to solve. If we are blind to it, and we build
a better algorithm that can solve _any_ problem, then it's got a better chance of
solving problems we have never seen, or thought about. If we instead make
assumptions about what type of environment we are dealing with, and only
create an algorithm that can work with that one type of environment, we
will have limited the strength of our algorithm.
There is no single, finite sized algorithm that can solve all problems.
The fact that we have to work with finite hardware (finite memory, finite
compute power per second) means we have to make some assumptions about the
nature of the environment we are dealing with. So creating a pure generic
algorithm is impossible. But the goal is to make it as generic as possible
- and as strong as possible, which means it makes as few assumptions as
possible about the nature of the problem it's trying to solve.
But in all this, from the perspective of the learning algorithm, the reward
signal is part of the environment, not part of the learning algorithm.
If for some odd reason you think all this is something I'm just pulling out
of my ass and making it up as I go along, try reading this page:
http://www.cs.ualberta.ca/~sutton/book/ebook/node28.html
"The environment also gives rise to rewards"...
or this one:
http://www.cs.ualberta.ca/~sutton/book/ebook/node29.html
"In reinforcement learning, the purpose or goal of the agent is
formalized in terms of a special reward signal passing from the
environment to the agent."
The agent in RL does NOT generate the reward signal. It's got one fixed
goal which is to maximize long term rewards _from_ _the_ _environment_.
When you change the code to generate different rewards, you are changing
the environment in the RL problem, not the agent. Our goal is not to make
our agents work better by giving them a simpler environment to work in, but
by finding new ways to code the agent, so that it can deal effectively with
more complex environments.
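To make the agent/environment split concrete, here is a minimal sketch. It is not code from the Sutton book or from either poster; the corridor task and the names SparseCorridor, ShapedCorridor, ScriptedAgent and run_episode are all made up for illustration. The point it tries to show is that the reward is computed inside the environment's step(), so adding an instant penalty produces a different environment while the agent code stays untouched.

```python
class SparseCorridor:
    """Hypothetical environment: walk from cell 0 to the last cell.
    The only reward is 1.0 on reaching the goal."""

    def __init__(self, n=5):
        self.n = n
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action is -1 (left) or +1 (right)
        self.pos = max(0, min(self.n - 1, self.pos + action))
        done = self.pos == self.n - 1
        reward = 1.0 if done else 0.0  # reward is produced here, by the environment
        return self.pos, reward, done


class ShapedCorridor(SparseCorridor):
    """Same task, but a different environment: it adds an instant
    penalty for walking into the left wall."""

    def step(self, action):
        bumped = self.pos == 0 and action == -1
        state, reward, done = super().step(action)
        return state, reward - (1.0 if bumped else 0.0), done


class ScriptedAgent:
    """Stand-in agent: replays a fixed action list, then keeps stepping right."""

    def __init__(self, script=()):
        self.script = list(script)

    def act(self, state):
        return self.script.pop(0) if self.script else +1

    def learn(self, state, action, reward, next_state):
        pass  # a real learner would update itself from the reward here


def run_episode(env, agent, max_steps=100):
    """Generic interaction loop; returns the total reward collected."""
    state, total = env.reset(), 0.0
    for _ in range(max_steps):
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        agent.learn(state, action, reward, next_state)
        total += reward
        state = next_state
        if done:
            break
    return total
```

Running the identical agent in both environments, for example run_episode(SparseCorridor(), ScriptedAgent([-1])) and run_episode(ShapedCorridor(), ScriptedAgent([-1])), gives different totals, and the only thing that changed was the environment class.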
It's fine to put instant rewards in the environment. But if you can take
the instant rewards out, and your agent can still solve the problem, that
shows you have a better agent.
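As a rough illustration of an agent coping without instant rewards, here is a sketch of plain tabular Q-learning on a made-up six-cell corridor where the only reward arrives at the far end. This is a textbook technique swapped in for brevity, not the TTT program under discussion; N_CELLS, env_step, train and greedy_policy are hypothetical names, and the step size, discount and exploration rate are arbitrary choices.

```python
import random

N_CELLS = 6          # hypothetical corridor: cells 0..5, goal at cell 5
ACTIONS = (-1, +1)   # step left or step right


def env_step(pos, action):
    """Environment: moves the agent and emits the only reward, at the goal."""
    nxt = max(0, min(N_CELLS - 1, pos + action))
    done = nxt == N_CELLS - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_CELLS)]  # q[state][action index]
    for _ in range(episodes):
        pos, done = 0, False
        while not done:
            if rng.random() < epsilon or q[pos][0] == q[pos][1]:
                a = rng.randrange(2)  # explore, or break a tie at random
            else:
                a = 0 if q[pos][0] > q[pos][1] else 1
            nxt, reward, done = env_step(pos, ACTIONS[a])
            # Bootstrap from the next state's best estimate unless the episode ended.
            target = reward if done else reward + gamma * max(q[nxt])
            q[pos][a] += alpha * (target - q[pos][a])
            pos = nxt
    return q


def greedy_policy(q):
    """Preferred action in each non-goal cell (ties resolve to stepping right)."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]
```

The delayed reward is propagated backwards through the value table one cell at a time, so after training the learned values should favor stepping right even in cell 0, which never sees a reward directly.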
Writing an RL-based TTT program that not only needs to learn which move
works best to win the game, but must also learn only to move in empty
squares (without the help of instant rewards), is TRIVIAL. If you can't do
that, then there's no hope of you making any progress in creating RL
algorithms to solve the problems the brain can solve.
If the algorithm you wrote only worked if it was given instant rewards,
then your algorithm is weaker than the example algorithms given in the
Sutton book.
In the introduction chapter of Sutton's book, he explains how to write an
RL algorithm to play TTT.
http://www.cs.ualberta.ca/~sutton/book/ebook/node10.html
In it he asks the students some simple questions to make them think. This
is one of them:
"Exercise 1.3: Greedy Play Suppose the reinforcement learning player
was greedy, that is, it always played the move that brought it to the
position that it rated the best. Would it learn to play better, or
worse, than a nongreedy player? What problems might occur?"
The answer to that exercise question in this introduction chapter of a book
which is only an introduction to the whole field of RL, is the problem your
algorithm ran into, and which you "fixed", not by improving your learning
agent, but by simplifying the environment.
The algorithm he specifies in that chapter, had you used it in your
learning system, would have solved the problem of learning where to move,
without having to give it instant rewards - by simply giving it only
rewards for winning.
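The greedy-play problem from Exercise 1.3 can be shown in a few lines. This sketch is not Sutton's TTT player; to keep it short it uses a two-armed bandit with fixed, made-up payoffs (PAYOFFS), and the function name play is hypothetical. A purely greedy learner locks onto the first move that ever paid anything, while one that occasionally explores finds the better move.

```python
import random

PAYOFFS = (0.2, 1.0)  # hypothetical fixed payoff per arm; arm 1 is the better one


def play(epsilon, steps=1000, seed=0):
    """Sample-average value estimates with epsilon-greedy action selection.
    epsilon=0.0 gives a purely greedy player."""
    rng = random.Random(seed)
    estimates = [0.0, 0.0]
    counts = [0, 0]
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)  # exploratory move
        else:
            arm = 0 if estimates[0] >= estimates[1] else 1  # greedy, ties go to arm 0
        reward = PAYOFFS[arm]
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total += reward
    return total, counts


greedy_total, greedy_counts = play(epsilon=0.0)
explorer_total, explorer_counts = play(epsilon=0.1)
```

With epsilon=0.0 the player never tries arm 1 at all, because arm 0's estimate rises above zero after the first pull and stays the greedy choice forever. That is the same trap a greedy TTT learner falls into, and exploration in the agent, not extra rewards in the environment, is what gets it out.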
Biological brains seem to show that intelligence can
evolve at the top. That the lower level input/output
processors are parallel in nature
? It's parallel at all levels, not just the low level.
Not everything can be done in a single sweep and how
much a brain or a computer can do in a single sweep
is limited by hardware. The human brain does have a
lot of parallel processing hardware but it is not
unlimited that is why for example we have to "look
around" to take in the details of a scene.
No, the brain doesn't "look around". It, in a single sweep, comes to the
conclusion that it's time to move the eyes a little to the right. And then
in the next instant, it comes to the conclusion to move the eyes a little
further to the right. And then in the next instant, it comes to the
conclusion to move the eyes down, etc. Each decision being made in one
large parallel, "single sweep".
Just because you talk about your own behavior as "sweep around the room"
does not in any way mean the brain works that way. That's just how you
like to talk about what you do. And that's the classic error of AI - to
assume that the abstraction you use to describe behavior, is also the best
abstraction for describing what the hardware is doing.
This was something demonstrated very well by Brooks' work with subsumption
architecture designs. He showed how a simple reaction machine, one which
had various hard-coded ways of reacting to the _current_ condition of the
environment, could produce goal seeking behavior, even though there was no
"goal" hard wired directly into the hardware. His hardware was not
structured so as to create some internal "goal" signal to represent what
its "intention" currently was. Instead, it was structured simply as a
list of how to react based only on the condition of the _current_
environment.
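A very rough sketch of that kind of reaction machine follows. It is a simplification, not Brooks' actual architecture (which wires layered behaviors together with suppression and inhibition); here it is reduced to a prioritized rule list, and the sensor names, thresholds and action names are all invented for illustration.

```python
def react(sensors):
    """Pick an action from the current sensor readings only.
    There is no stored goal or plan; higher-priority rules come first."""
    rules = [
        (lambda s: s["bumper"], "back_up"),
        (lambda s: s["obstacle_cm"] < 20, "turn_left"),
        (lambda s: s["light_left"] > s["light_right"], "veer_left"),
        (lambda s: s["light_right"] > s["light_left"], "veer_right"),
    ]
    for condition, action in rules:
        if condition(sensors):
            return action
    return "forward"  # default reaction when nothing else fires


# One made-up reading: nothing in the way, more light on the right.
reading = {"bumper": False, "obstacle_cm": 80, "light_left": 0.2, "light_right": 0.7}
chosen = react(reading)
```

Called over and over on fresh readings, a list like this steers the machine toward the light and around obstacles, which an observer would describe as "seeking the light" even though no such goal appears anywhere in the code.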
All human behavior can be explained like that. The brain did not "decide"
to "scan the room". The brain simply fell into a pattern of reactions that
caused it to scan the room, until something (likely in the environment)
triggered it to start following a different path of reactions.
Some things are serial by nature.
Yeah, like all human behavior.
Typing this text is
a serial process initiated by the cortex and mediated
by the cerebellum under control of feedback from various
sensors in the muscles and skin. On the input side
reading is also a serial process. You cannot take in
this whole post at a single glance. You have to move
your eyes over the words and build up, one step at a
time, what the text is about by holding in memory a
transient trace of each input just as a serial adder
has to do with the carry bit. And like the brain it
needs circuitry to enable this serial process to
proceed in an orderly fashion.
Yes, but the hardware that creates these serial processes (all human
behavior) needs to be configured by training. So how do you suggest we do
that?
Until we understand and master the fundamentals of
how a parallel signal processing network can learn,
Conditioning all the way up right?
right.
One of the reasons the Wright Brothers succeeded, is
because they did take the time to study and master an
understanding of the fundamentals of heavier-than-air
flight by playing with wind tunnels not by wasting
their time playing with birds.
I think it was Sir George Cayley that first understood
the principles of flight?
http://www.flyingmachines.org/cayl.html
I am not sure how much the Wright brothers understood the
principles of lift and drag from their wind tunnel tests or
how much they simply observed some shapes worked better
than others as someone might do playing with different types
of networks.
My understanding is that they learned it from a book (the one you reference
above?). But that the book, which everyone trusted as the authority on the
subject, had some fundamental errors in its formulas and that those errors
were uncovered by the Wright Brothers in their wind tunnel testing, which
greatly helped them in designing their aircraft - and in designing the
props which was something they didn't learn from nature but was a key part
of their solution.
Ah yes, from Wikipedia... (spotted it while I was looking for the date of
the first flight for comments I was making further down)...
http://en.wikipedia.org/wiki/Wright_Brothers
"The poor lift of the gliders led the Wrights to question the accuracy of
Lilienthal's data, as well as the "Smeaton coefficient" of air pressure,
which had been in existence for over 100 years and was part of the
accepted equation for lift."...
They realized the accepted wisdom was wrong, and did experiments to find
out what was right.
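For a feel of how much that coefficient mattered, here is a small worked calculation. The lift equation of the period is usually given as L = k * S * V^2 * C_L (lift in pounds, wing area in square feet, airspeed in miles per hour), with the long-accepted Smeaton value k = 0.005 and the Wrights' own estimate of roughly 0.0033. Those two values are as commonly reported; the wing area, airspeed and lift coefficient below are made-up numbers chosen only to run the arithmetic.

```python
def lift_lb(k, area_sqft, speed_mph, lift_coeff):
    """Lift in pounds from the period lift equation L = k * S * V^2 * C_L."""
    return k * area_sqft * speed_mph ** 2 * lift_coeff


SMEATON_ACCEPTED = 0.005   # value trusted for over a century
SMEATON_WRIGHT = 0.0033    # approximate value the Wrights arrived at

# Illustrative inputs only, not measurements from any real glider.
area, speed, cl = 290.0, 20.0, 0.5

predicted_old = lift_lb(SMEATON_ACCEPTED, area, speed, cl)
predicted_new = lift_lb(SMEATON_WRIGHT, area, speed, cl)
ratio = predicted_new / predicted_old
```

Whatever the other inputs are, the corrected coefficient predicts only about two thirds of the lift the accepted one promised, which is the size of error that would make a glider designed from the book badly underperform.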
In two posts in a row here, you seem to have failed to
understand that by adding additional rewards, you were NOT
improving your learning algorithm, but instead changing
the environment to make it easier to learn so a weak
algorithm could work. To me, this is an example of you
not understanding the fundamentals of the problem we are
facing in AI.
It is all about feedback. Is a brain a weak system because
of its vast feedback system?
Yes, it is all about feedback. Dan likes to remind us of that. :) Not
sure what that has to do with any of this however.
Until we understand how networks can learn, we won't
have a clue what we are looking at in the brain.
I think they both add to our understanding of both
kinds of networks.
Sure, we would have no clue at all what to do if people had not studied the
brain. Brain research is important which is why I don't totally ignore it.
But it's a path I let other people take because I'm not a neuroscientist
and don't expect to become one in my life. I also don't think any of them
are going to solve AI, though their contributions are key. AI, as a theory
of machine learning will be solved first, and then the neuroscience will
uncover how it's implemented by the brain.
That's what I've been working on all these years.
Basic research into how a signal processing network
can learn.
Over time flying machines improved and so too over time
the current batch of networks will improve.
Yes, but we don't have the first one off the ground yet. And it's not
going to get off the ground if you keep working on machines that flap their
wings just because wing flapping is an important part of how evolution made
all birds fly.
Here it is, more than 100 years after the Wrights' first powered flight,
and we still don't use flapping wings in any of our highly evolved airplane
designs. Just because evolution found some solution worked well for it,
doesn't mean that solution will EVER work well for the type of hardware we
have to work with.
The engineering problem of powered flight was not solved by
studying birds, or by studying the evolution of birds, even though both
those subjects mirror exactly every argument you make about AI. It was
solved by someone looking for, identifying, and mastering the PRINCIPLES
that governed the technology.
What evolution did for flight, was to show us it was clearly possible for
objects heavier than air to fly. But that was about the end of it.
Everything else was figured out through thought and experimentation, not by
dissecting birds.
The principles that are important to designing a flying machine must first
be uncovered, and understood - and one of those was the lift equation you
can find in the Wikipedia article. That was combined with the fundamentals of
heat engines and power, so that a light enough power source could be built -
but most of that was understood at that point in time.
For AI, understanding the fundamental principles of RL is the key, because
we already know (and have known for over 50 years) that this is what the
human brain (and every animal brain that has a cortex) is - an RL
machine that finds useful survival behaviors on its own, by trial and
error.
Like in the time of the Wright Brothers, the basic problem domain is
already known. They already had simple gliders that worked, and it was only
a matter of implementation to duplicate the wonders of flight we saw in the
birds.
Today, we already understand the problem domain of reinforcement learning,
and we have simple machines doing very interesting things in that domain.
But we don't have designs that are strong enough yet.
If you understand the RL problem domain, you would understand that creating
hard coded signal pre-processing to look for edges or what not is NOT
working on the agent - it's changing the environment to make it easier for
the agent. But if you also understand how these agents have to work, you
would understand no amount of hard coding the environment, will make the
world simple enough for a weak RL algorithm to look intelligent.
To go back to the airplane parallel. What you are doing is taking a bad
glider, and trying to make it fly further by finding a higher cliff to
launch it off of. You are making it produce better results (fly for longer
before hitting the ground) by trying to change the environment (find a
higher cliff) rather than by trying to improve the airplane. No amount of looking
for higher cliffs is ever going to solve the RL problem of AI, and no
amount of adjusting the environment by giving it "easier rewards", or
"better pre-processed inputs", is going to turn a weak RL algorithm into a
strong one.
The only way to solve AI, is to work on the learning agent, not on the
environment because we don't get to pick the environment. The environment
is called the universe, and until we create a learning agent with as much
power as the human brain when it interacts with the universe, we will not
have gotten "off the ground" on this problem.
If this was already understood, and could be found in
some book on the subject, then we could then turn to
the brain to look for implementation details and we
could understand why the brain was built like it is.
The experiments you believe gave you the answer as to what
was required, "conditioning" came from observing rat brains at
work and now you deny them as a source of further study?
Well, first off, I've never said no one should study animals, or humans, or
the brain. I've simply said that was not going to be the path that will
take us to AI.
Why it is you think the only way to invent something new is by copying
evolution I have no clue. True inventors don't need to copy anything.
They invent because they have an RL based brain that learns by trial and
error, not just by mimicking. That's what gives us our creativity in the
first place. You invent by exploring new ideas, not by copying old ones.
Well, I've not written much code in some time because
I'm stuck at an impasse in trying to find the next
conceptual advancement which will give me something
worth writing and testing.
Breakthroughs can also be made by experimenters without
any need for "conceptual advancement". A lot of discoveries
were in fact serendipitous accidents.
http://en.wikipedia.org/wiki/Serendipity
JC
Yes, but you would also be a fool, to try and solve a problem, by NOT
working on it.
If you simply don't have a clue how to make a better RL algorithm, then by
all means, work on something else related and maybe it will give you the
needed insight. There's nothing wrong with that. I of course don't care
what you choose to spend your time working on.
The nature of all my debates here is to try and get people (you) to
understand what the problem of AI actually is. It's the problem of
building a machine that learns by reinforcement. That's where all our
intelligent behavior comes from. It's not built into us by evolution, it
was created by a reinforcement learning process. Evolution built into us
a brain that can adapt its own design, through a process of reinforcement
learning, to create these complex machines (humans) that produce all this
complex behavior we call intelligence.
Designing these sorts of behaviors into our machines is way beyond our
ability. Only in very limited scopes can we do that - like chess playing,
or simple question answering. The full complexity of what a human does is
based on our ability to learn. Every time I write one of these messages,
my brain gets a little bit re-programmed, so that the next message I write
is a little different. Our true intelligence is not in what we do, as much
as it's in how we change our behavior over time as we adapt to whatever
sort of environment we find ourselves in.
No amount of hard-coding behavior into a machine will explain how it's able
to reprogram itself, and that's the key to AI - learning how to build
machines that can constantly reprogram themselves to adapt to a changing
environment.
If you don't have a clue how to do that, work on something else. But don't
expect me to buy the argument that the "something else" is an important
part of the solution to AI.
--
Curt Welch http://CurtWelch.Com/
curt@xxxxxxxx http://NewsReader.Com/