Re: Ben G on reinforcement-learning and the wirehead problem



Tim Tyler <seemysig@xxxxxxxxxxxxxx> wrote:
Curt Welch wrote:

I think there will be plenty of approaches which are relatively
far from neural networks - but it does seem likely that they will
naturally tend to have the undesirable "tangled" property.

Still, that doesn't mean that we can't try to untangle them.
We are getting experience with refactoring. With the assistance
of AI, we might be able to make a brain that is beautiful - and
not a tangled, incomprehensible mess.

Well, maybe, but I highly doubt it. I think intelligent behavior is
inherently incompatible with easy-to-understand internal structures.

However, I can certainly believe design solutions might emerge to allow us
to do things with these complex tangled systems that might at first seem
impossible. Even though we might not be able to understand how to adjust a
billion weights to fix some specific behavior into the system, maybe we can
write software to do automated testing that will be able to do the
calculations and adjustments for us. Just as an example, even in a large
neural network, you might be able to train it, then study how it's
configured, and then disable learning on a million of the 100 billion
weights, and that might effectively lock in the behavior you wanted to lock
in (its learned goals), while still leaving a lot of learning flexibility.
And even though we can't look at the net and figure out which million
weights to freeze, some automated testing procedure might be able to figure
it out for us.
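
To make the weight-freezing idea concrete, here is a toy sketch in Python.
Everything in it is an assumption made for illustration: the "network" is
just a weighted sum, the automated test is a finite-difference sensitivity
check, and the names (predict, sensitivity, choose_frozen, train_step) are
hypothetical. It is nowhere near a real neural network, but it shows the
shape of the procedure: test which weights matter for the behavior you want
to keep, freeze those, and keep training the rest.

```python
def predict(weights, inputs):
    """Toy 'network': a plain weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def sensitivity(weights, probe_inputs, eps=1e-3):
    """Automated test: nudge each weight and measure how much the
    protected behavior (the output on probe_inputs) moves."""
    base = predict(weights, probe_inputs)
    scores = []
    for i in range(len(weights)):
        nudged = list(weights)
        nudged[i] += eps
        scores.append(abs(predict(nudged, probe_inputs) - base) / eps)
    return scores


def choose_frozen(weights, probe_inputs, k):
    """Pick the k weights the protected behavior depends on most."""
    scores = sensitivity(weights, probe_inputs)
    ranked = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def train_step(weights, inputs, target, frozen, lr=0.1):
    """One gradient step on squared error that skips frozen weights."""
    error = predict(weights, inputs) - target
    return [
        w if i in frozen else w - lr * error * x
        for i, (w, x) in enumerate(zip(weights, inputs))
    ]
```

With weights [1.0, 2.0, 0.0, 0.0] and a protected probe of
[1.0, 1.0, 0.0, 0.0], choose_frozen(..., k=2) selects the first two weights.
Training afterwards on a new input such as [1.0, 0.0, 1.0, 1.0] can only
move the last two weights, so the protected output stays put while the new
behavior is still learned. Note that the first weight is shared by both
behaviors, which is the kind of entanglement that makes this hard at scale.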

Without working hardware for people to experiment with and dream up new
approaches, we just can't know what is really possible.

Well, that's probably because you don't grasp how much our "beliefs"
are actually much more of the same - learned behaviors. I strongly
suspect (but have absolutely no way to prove), that the only way to
build something like human intelligence, is to mix all our behaviors
in one large (connectionist-like), holographic-like memory recall
system. As such, you can't make some of the behaviors fixed (aka
non-learning), and others variable - free to be changed by learning.

So, nothing like instincts or reflexes will be possible?

A reflex is when the leg jerks when you tap the knee. Do you really
think that has something important to do with AI? [...]

Reflexes in humans are a counter-example to the idea that "you can't
make some of the behaviors fixed (aka non-learning), and others
variable - free to be changed by learning".

You evidently /can/ make some behaviors fixed and others variable -
since nature has managed it.

Right, it's not a question of whether you can fix some and not others, you
clearly can. The question is whether you can fix something as high level
and abstract as "not wanting to wirehead itself", while still allowing it
enough learning to be useful as a generally intelligent AI.

All you have to do is look at someone who has lost the ability to put new
things into their long term memory and you find out what you will end up
with if you "fix in" too much of their behavior after they have been trained.
They will still act quite intelligent, until you meet them 10 minutes later
and they act like they have never seen you. Or when you try to teach them
something new, like your name, or how to make a new sandwich, and you find
out 10 minutes later they have no memory of what you just taught them.
What we call long term memory is what learning is all about. If you fix a
behavior into the system, you prevent them from ever changing that part of
their long term memory. But because I believe the only way to create human
level intelligence, is to build a holographic-like behavior "storage"
system (neural network), it will be highly difficult to fix in a single set
of behaviors because every behavior is created by a combined effect of many
internal weights working together - but those same weights are also shared
by millions of other behaviors. So when you try to lock in one behavior,
you can end up at least partially locking in a million other things at the
same time. The question would be how much can you lock in, while at the
same time, leaving enough flexibility to do something useful.

The other point in this is that the solution doesn't have to be perfect.
It's a question of how long the AI will go before it ends up wireheading
itself. If you can put blocks in the way that allow it to last 1000 years
on average before it wireheads itself, then you have more than enough to
work with to create a functioning AI society. It's no different than
having some percentage of humans choosing to commit suicide. As long as we
have enough not choosing that option, then you can build a functioning
society. And of course, when an AI chooses to wirehead itself, the society
just takes its body and reprograms it to make it start over, so the effect
to society ends up being fairly harmless anyway.
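
As a rough back-of-envelope for the "1000 years on average" figure, here is
a small Python sketch. It assumes, purely for illustration, that each AI has
the same independent chance of wireheading in any given year (a geometric
model); that assumption and the function names are hypothetical, not
something established above.

```python
def mean_years_to_wirehead(p_per_year):
    """Expected lifetime under a constant yearly wirehead probability."""
    return 1.0 / p_per_year


def fraction_wireheaded(p_per_year, years):
    """Share of a population expected to have wireheaded within `years`."""
    return 1.0 - (1.0 - p_per_year) ** years
```

Under that assumption, a yearly chance of 0.001 gives a mean of 1000 years,
and a bit under 5% of the population would be lost over any 50 year stretch,
which is the sort of steady loss a society could plausibly absorb.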

Instincts can be hard coded behaviors in non-learning machines. That's
not AI, that's just another complex machine with a hard coded function.
Like a robot that is hard coded to back up and turn around when it hits
a wall. That's an instinct in a non-learning machine, and that's just
not intelligence.

In a learning machine, there is no point in hard coding a behavior
because the learning machine will simply override the hard coded
behavior if it finds it useful to do so. If you hard code the behavior
of backing up and turning right when you hit a wall, the learning
machine will simply override that behavior if it learns that doing so
will make things better - in effect erasing the behavior from the
machine. The only advantage of hard coding like that is that it gives
the machine some default starting behaviors that can be useful until
the machine has had enough experience to find out what's better. That
can be very important for survival, but again, has little to do with
intelligence.
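
A minimal Python sketch of that override effect, under stated assumptions:
the three actions, the reward numbers, and the simple value-update rule are
all hypothetical choices for illustration. The hard-coded behavior is
modeled as nothing more than a high starting value for one action.

```python
ACTIONS = ["back_up_turn_right", "turn_left", "push_forward"]


def learn_wall_response(reward_of, steps=200, lr=0.2, explore_every=5):
    """Return the preferred action after `steps` wall collisions.

    reward_of is a function mapping an action name to its reward.
    """
    # The "instinct": one action starts out strongly preferred.
    values = {a: 0.0 for a in ACTIONS}
    values["back_up_turn_right"] = 1.0

    for t in range(steps):
        if t % explore_every == 0:
            # Occasionally try the actions in turn, whatever the values say.
            action = ACTIONS[(t // explore_every) % len(ACTIONS)]
        else:
            # Otherwise do whatever currently looks best.
            action = max(ACTIONS, key=values.get)
        # Move the estimate for that action toward the reward received.
        values[action] += lr * (reward_of(action) - values[action])

    return max(ACTIONS, key=values.get), values
```

If turning left actually pays better (say 0.8 against 0.3 for backing up
and turning right), the machine starts out using the built-in response and
ends up preferring turn_left. The default only matters until experience
says otherwise.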

All the intelligent systems we know of have plenty of instincts. It
may be premature to conclude that they have little to do with
intelligence.

Consider the instinct to have sex, for example. That isn't learned,
it's built in - at least mostly.

Sure, but like I said in a different post (our posts are overlapping so they
are old before we reply to them), that sort of instinct is created in a
learning machine by how the reward generating system is wired. All the
reward generators define instincts. If we have heat sensors that
cause negative rewards, then we have built in an instinct to avoid heat.
If we have reward generators that detect too much pressure and generate a
negative reward, we have instincts to avoid things that can crush us.

A collection of rewards like that can be called an instinct to "protect our
body".

So of course a reward maximizing machine _must_ have instincts. But there
are limits to what type of instincts you can easily build in. A heat
sensor is trivial to build in as an instinct. A "do not wirehead myself"
is a whole different level of complexity.
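
To show how little it takes to wire in the simple kind of instinct, here is
a hedged Python sketch of a reward generator built from two sensors. The
sensor names, limits, and units are made up for illustration.

```python
def body_reward(heat, pressure, heat_limit=50.0, pressure_limit=100.0):
    """Negative reward that grows as a sensor reading exceeds its limit.

    Any reward-maximizing learner driven by this signal ends up with
    "avoid heat" and "avoid being crushed" instincts.
    """
    reward = 0.0
    if heat > heat_limit:
        reward -= heat - heat_limit
    if pressure > pressure_limit:
        reward -= pressure - pressure_limit
    return reward
```

Each instinct here is one comparison against one sensor. There is no
comparably simple sensor reading that corresponds to "I am about to
wirehead myself", which is the gap in complexity being pointed at.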

But like I said, maybe the advanced AI engineers of the future will figure
out ways to do it that aren't overly costly.

Much human behaviour revolves around this
instinct. It isn't there to protect infants - it doesn't kick in
until puberty. It has another purpose - to keep adult behaviour on
track, and to prevent them from picking up other goals from an
environment that may be trying to manipulate them.

Sure, humans only have a few prime motivations. One is all the sensors
that add up to "protect your body", (which I'm going to say includes the
"keep your energy levels up" instincts), and the other is "reproduce". All
human intelligent behavior (including stuff like us debating AI, and people
creating art) can be linked back to those two prime instincts.

That is the proposal, yes. Fix some things (the goals). Allow other
things to be learned. That is more-or-less how animals work. Some
things are built in by nature (instincts). Other things are plastic,
flexible and adaptable (learned behaviour).

The question however is what is reasonable to fix, and what are we
forced as a side effect to fix, that we might not want to fix as a
result? That's the issue I have, but we won't be able to resolve it
until we have hardware we can attempt to do that sort of thing with.

Now you are talking about implementation problems. Yes, there may be
implementation problems. These will likely depend on the AI architecture
used.

Yes, I've always been talking about implementation problems. That's the
heart of my issue with your position.

For example, one approach to AI is known as inductive-programming.

http://www.inductive-programming.org/intro/
http://en.wikipedia.org/wiki/Inductive_logic_programming

It involves making smarter and smarter compilers - that can build
programs from a specification. In such cases, the specification
of what you want to do (the goal) is kept deliberately exposed
in a high-level language.

The implementation problems may be tricky. But I don't see how
it can coherently be argued that they will prove to be insoluble.

Right, I really can't support that position. You can't argue that
something _can't_ be done unless you can fully prove you have plugged all
possible holes - which I can't begin to do here. I can't prove that
there's no way around the wirehead problem - I've even suggested about 5
different approaches to get around it.

And as you might notice, my position on the wirehead problem continues to
soften the more we debate it because of that. I think it's an inherent
problem in highly intelligent AIs, but there might be ways to work around it
while still allowing the intelligence to grow.

The problem is that we have conflicting goals between evolution, and any AI
that evolution creates. That's where the wirehead problem is created.
Evolution has the goal of survival, and is creating these intelligent
machines because they are good at surviving. But intelligent machines are
trained to do what evolution wants them to do (aka survive), by building
into the machine a system that generates rewards to motivate it to survive,
such as the body damage sensors.

But when the AI itself becomes so intelligent that it understands how
evolution has "tricked it" into doing what Evolution wants it to do, then
all bets are off. The AI becomes smart enough to out-smart evolution, and
at that point, the AI stops being a good survival machine, and Evolution
"kills it off".

And this isn't just a problem for a single AI. It's a problem for the
entire society of AIs because the society itself acts as one large
intelligence. Evolution has to not only find a way to keep individual AIs
from failing to use their intelligence for survival, but it's got to
prevent the society as a whole from losing its desire to survive.

I certainly see this as an interesting and important problem in the
advancement of intelligence in the universe, but I can't prove there is no
solution to the problem. There are certainly lots of options to explore
that I can think of, so maybe there are relatively straightforward ways to
keep highly intelligent machines using their intelligence to help them
survive, instead of using their intelligence to wirehead themselves and in
so doing, stop caring about surviving.

How hard can fixing some of an agent's beliefs be? We see lots
of people with highly fixed beliefs. What we are trying to do
can't be *that* hard - since something similar happens everyday.

Yes, and if we took away the AI's ability to learn - aka took away its
ability to form new long term memories, then we will have fixed all
its beliefs. But it will be of limited use in society at that point since
it won't be able to learn any new skill or any new behavior. There is
certainly lots of use for such machines. I'd like my vending machines to
be as smart as humans with no ability to add new long term memories. Such
AIs would be nearly perfect factory workers. You turn their learning on,
train them until they can do the job correctly, then turn the learning off,
and they keep doing the same job for the next 100 years without ever losing
interest in what they're doing.

But by turning learning off, you have disabled their creativity. So the
question becomes, how much can you get away with by trying to disable some
of the learning, while leaving the rest on. And how much do you risk that
the part of the learning you left working gets used by the AI to work around
the part you locked in? Just as an odd example, we fix the behavior of the
right arm, but allow the left arm to continue to learn. The right arm
keeps doing what we trained it to do, such as wave nicely to the humans
whenever it sees a human, but the left arm then learns to get around that,
by putting a handcuff on the right arm so it can't do what it was trained
to do. Something logically similar might happen internally in its thought
patterns when you try to fix in the thought pattern of "my goal is to
survive and not wirehead myself". The rest of the brain might start to see
that fixed part of the brain as this odd "stranger" that lives inside it
which the rest of the brain just learns to ignore in time.

I certainly see lots of issues with the wirehead problem, but it's what I
don't see, and don't know, that could allow all the issues to be worked
around.

This seems like the idea of using a community to keep each other in check.

It might work - but it would have some serious costs. If there's
another solution, which neatly avoids the whole problem, then we
should probably go with that - rather than wiring the agents' brains
with high explosives.

Evolution will go with whatever solution is the most cost effective,
that you can count on. I just don't know what that might be. There
are many options we have talked about here and no doubt, many we have
not yet thought of.

Right. Well, my position is that we have an indication that there may
be a relatively inexpensive solution - fix the agent's goals. Include
in the agent a model of what it thinks it is trying to do. It will
then be motivated by its own conception of its goals to try and
preserve them - so if we can fix them a bit, the agent will do the
rest of the work of fixing them some more for us.

We may not have to do anything - apart from make sure the agent forms
the correct conception of its goals in the first place. Keep it well
clear of the idea that there can be a mismatch between its happiness
and what it sees it is doing to obtain that happiness during its early
development stages. Once it has developed enough to form an idea of
its goal in life, it will naturally act to preserve its goals - since
having your goals modified is normally really bad.

Yeah, well, what you are talking about here is like what happens when we
train a dog to do some trick. We train him by using real rewards. And he
continues to do the trick, for some time, even when you don't reward him.
But the training will wear off in time. If you don't reward him, he will
stop doing the trick in time.

But what you are talking about is training an AI for 20 years using lots of
complex social rewards to follow the party line of "our goal is to be happy
by surviving, and we don't want to wirehead ourselves".

20 years of training, might take 20 years to wear off. And if you can slow
down its learning after its prime secondary motivations (goals) have been
learned, it might take 50 years for it to wear off.
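
The wearing-off effect, and the way a slower learning rate stretches it out,
can be sketched in a few lines of Python. This is a toy model under an
assumed update rule (the value estimate moves a fraction lr of the way
toward each reward received); the function name, threshold, and numbers are
hypothetical.

```python
def steps_until_extinct(lr, trained_value=1.0, threshold=0.1):
    """Count unrewarded trials until a trained behavior's value fades.

    lr must be between 0 and 1 (exclusive of 0).
    """
    value = trained_value
    steps = 0
    while value > threshold:
        # The reward has stopped, so each trial pulls the value toward 0.
        value += lr * (0.0 - value)
        steps += 1
    return steps
```

Calling steps_until_extinct with a smaller lr returns a larger count: turn
the learning rate down after training and the same conditioning takes
proportionally longer to fade, but in this model it still fades eventually.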

But the thing about the society, is that the only AIs around, are the ones
that still believe in the party line. The ones that figured out the party
line was bull***, went off and wireheaded themselves, had a great time, but
then died. And the AIs that still believe in the party line do a good job
of hiding the fact that lots of AIs are going off and wireheading
themselves.

So the society is using the power of Evolution to maintain this party line,
and to condition it into all the new AIs that come along. And whenever an
AI fails to follow the party line, they just get reprogrammed. So the
society survives.

However, if high intelligence causes the AI to quickly figure out the party
line is bull***, then the higher intelligence AIs won't stay around long,
leaving the lower intelligence AIs to run the society. So that effect could
put a natural cap on the effective intelligence of the average AI in the
society.

Evolution would have to find a way around that sort of effect if
intelligence were to keep growing.

But what I still believe, is that wireheading is an inherent problem
of intelligence, and it must always be solved one way or another, to
keep intelligence as a useful survival tool (or useful as any tool).
The more intelligent the life form becomes, and the more knowledge
the AI society accumulates, the more wireheading will be a problem,
which means more systems will have to be put in place to keep it from
causing the downfall of the life form.

Well, if we can get the intelligent machines to do any such work
themselves, that would be nice. Then we would not have to worry
about it.

Maybe there will be some reasonable way to implement the "does not want
to wirehead" instinct into the machine and that will simply become the
default design of the machine which each machine understands and
includes in the next machine it designs.

That's the plan, yes.

Human style reproduction solves the problem because humans don't have
to know what they are in order to reproduce. But an AI that is
expected to be able to reproduce by design, is a very different
problem. At least some of the AIs in the society will have to fully
understand what they are, which puts them at a much higher risk to the
wirehead problem. Evolution will have to find an answer, or else
intelligence just won't be able to take over as a dominant force as
much as some of these singularity ideas suggest.

Even with the wirehead problem, intelligent agents can probably go
quite a long way beyond humans - by dividing into a society of agents,
no one of which is capable enough to wirehead itself. Have these
creatures produced in factories, and allow them to police each other's
behaviour - in much the way that humans police drug-taking.

For sure. Even if the wirehead problem caused intelligence to be capped
at something not much more than what human intelligence is now, the AIs
still have a huge advantage over humans. They can be built in all
different sizes and shapes and levels of intelligence, and levels of
training, very easily. And they can receive 20 years of training, in 10
minutes with a new download.

Even though humans tend to specialize at different tasks, and have
different innate skills, we are all still nearly identical compared to the
variation that a society of AIs will be able to create.

Human society is a society of like animals. An AI society could be so
different we won't even recognize it. Think of what a bee society looks
like - they have physically different bees built to do physically different
tasks (more so than the male/female split of human society).

But an AI society could develop millions of different types of AIs each
engineered, optimized, and trained, to do a specific job. Most of the AIs
would probably not be smart enough to understand the wirehead issue and as
such, would not be at risk of wireheading themselves. If the basic
structure of the society motivated each AI to keep busy doing its job,
then even if many were smart enough to understand the wirehead problem, and
be tempted to explore the option, they might not have the time because they
have been, and continue to be, so motivated by the society to do their job,
they never get to explore the ideas of wireheading themselves.

For example, think of an AI that runs on a server in some data center, but
is busy controlling the design and production of new space exploring AIs.
To start with, this AI might not even know where its server (brain) is
located. And if it has no "hands" that can reach its server brain, how is
it going to modify itself to wirehead itself? It would have to design a
space robot to go searching the planet to find where its server was
located, and make that robot do the wirehead work. But if there were lots
of other AIs watching over what this AI was doing, and they had the power
to reward, or punish the AI based on how well it did relative to the other
space AI designers, then the AI would be kept too busy doing its job to be
able to go design the wirehead assistant. It doesn't have "free time" to
do whatever it wants - it's kept busy 24x7 doing its real job. And if it
failed to do well, it would be punished, and maybe even just turned off
completely and replaced with the mind of one of the AIs that weren't
wasting time thinking about how to wirehead itself.

So maybe the solution really doesn't lie at the level of the individual AI,
but at the level of how this huge society of different AIs works by
watching over each other, and by each simply keeping busy doing its job.

The forces of evolution would be the ultimate top level control that keeps
everything on track, because any part of the society that fails to be
useful to the rest of the society, will have its resources taken away
from it (energy and raw material). Its raw material will be
re-allocated into forms that are more useful (aka AIs that don't waste time
and energy wireheading themselves into infinite loops).

It might not be quite the same as if there was no wirehead problem -
but even things like being able to plug your brain straight into the
internet would be profoundly transformative, and would likely lead
society rapidly beyond the human realm.

Yeah, the AIs will have a huge advantage over the humans because they, and
their society, will be able to evolve so much quicker, even if their
individual intelligence is capped by the wirehead problem. It's hard to
guess how long humans will stay around, and what form they will take
after all the options of genetic engineering and cyborg technologies start
changing the path of human evolution.

--
Curt Welch http://CurtWelch.Com/
curt@xxxxxxxx http://NewsReader.Com/