Re: Goal of AI: Perfect or Bounded Rationality



jalegris@xxxxxxxxxxxx wrote:
Curt Welch wrote:
jalegris@xxxxxxxxxxxx wrote:

Another thing I'm not clear on is explaining classical conditioning
in terms of the operation of a reinforcement learning machine. Can
you give an example?

Classical conditioning is just the association of a behavior with a new
stimulus through pairing. Food -> salivation when paired with a bell
ringing will end up causing the bell ringing alone to trigger
salivation.

However, strong reinforcement learning requires the machine to make
internal predictions about future rewards, and use those predictions as
internal reinforcers. This is what creates secondary reinforcers.
This is why money acts as a reinforcer to us even though we can't eat
it, or have sex with it, etc. Our brain has learned to recognize money
as something which leads to higher future rewards. And as such, money
acts as a reinforcer for our behavior.

In the end, a strong reinforcement learning machine must assign value
to every sensation and every behavior and use those values as
reinforcers for all behavior. Though many things have a fairly neutral
value, they all must have some value. This is done in simple computer
reinforcement learning algorithms for example when they play a board
game and assign a value to every board position. Moves in the game
that lead to stronger board positions end up being reinforced because
stronger board positions are more likely to lead to a real future
reward (which might only happen when the game is won or lost).
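
The board-position idea above can be sketched with a toy temporal-difference (TD(0)) learner. This is only an illustration under stated assumptions, not the algorithm of any particular game program: the "game" is a hypothetical line of positions, the last one is a win worth 1.0, and the names learn_position_values, alpha and gamma are made up for this example.

```python
def learn_position_values(num_positions=5, episodes=200, alpha=0.1, gamma=0.9):
    """TD(0) value estimates for a toy 'game' whose positions form a line.

    Assumption: position num_positions - 1 is the winning terminal position,
    and the only real reward (1.0) arrives on the move that reaches it.
    """
    values = [0.0] * num_positions  # the terminal position keeps value 0.0
    for _ in range(episodes):
        for pos in range(num_positions - 1):
            nxt = pos + 1
            reward = 1.0 if nxt == num_positions - 1 else 0.0
            # The TD error acts as the internal reinforcer: the real reward
            # plus the change in predicted future reward.
            td_error = reward + gamma * values[nxt] - values[pos]
            values[pos] += alpha * td_error
    return values


if __name__ == "__main__":
    for pos, value in enumerate(learn_position_values()):
        print(pos, round(value, 3))
```

Positions closer to the win end up with higher values, so a move into a stronger position produces a positive internal signal even though the real reward only arrives at the end of the game.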

So, strong reinforcement learning systems are constantly learning using
internal rewards. The behavior of the machine is constantly being
adjusted based on everything that happens.

If the machine already has a behavior for salivation in response to
food, and a bell happens at the same time, that bell->salivation
response is going to be rewarded by the secondary reinforcers. And
after enough rewards, assuming some other behavior isn't being
associated with the bell, then the machine will start to produce the
salivation response for the bell.

All classical conditioning can be explained in terms of the actions of
internal rewards used by a reinforcement learning machine which is
attempting to maximize not just current rewards, but all future
rewards. In order to do that, it must be making internal predictions
about potential future rewards, and using those predictions to
condition behavior instead of using only current rewards.
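
A minimal sketch of how a prediction of future reward can attach itself to the bell, using a plain TD(0) update. The stimulus names, the control "light" stimulus, and the parameters alpha and gamma are assumptions made for the example. Reading the learned value of the bell as the strength of the conditioned response is also an assumption, not a claim about animal data.

```python
def condition(trials=100, alpha=0.2, gamma=0.9):
    """Learn predicted future reward for each stimulus with TD(0).

    Assumed setup: on every paired trial the bell is followed by food, and
    eating the food gives a primary reward of 1.0. A control stimulus
    ("light") is presented alone and is followed by nothing.
    """
    value = {"bell": 0.0, "food": 0.0, "light": 0.0}
    for _ in range(trials):
        # Paired trial: the bell itself gives no primary reward, but it is
        # followed by the food state, whose predicted value backs up to it.
        value["bell"] += alpha * (0.0 + gamma * value["food"] - value["bell"])
        value["food"] += alpha * (1.0 - value["food"])
        # Control trial: the light predicts nothing, so it gains no value.
        value["light"] += alpha * (0.0 - value["light"])
    return value


if __name__ == "__main__":
    print(condition())
```

The bell never delivers a reward of its own, yet it ends up with a value close to gamma times the value of the food, which is the secondary reinforcer effect described above. The unpaired light stays at zero.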

The language of classical conditioning is just a simple way of
explaining the more complex effects of secondary reinforcers.


Are you talking about the same classical conditioning observed in
animals?

Yes. But, to be clear, I'm no behaviorist and I've not carefully studied
any of the experiments that demonstrate classical conditioning. My
understanding is limited to what I get by reading simplified one page
descriptions of these types of conditioning.

It appears you have redefined the term to fit your model.

Probably.

In classical conditioning, a UCR (salivation elicited by food), comes
under control of a CS (bell) that indicates a UCS (food) will follow
soon, *independent* of subsequent behaviour.

Are you saying that a UCS-UCR relationship (usually known as a reflex)
is actually some form of pre-existing operant?

I'm saying it makes no difference whether it's an unconditioned response or
a conditioned operant which is brought under the control of the new CS.
The creation of the new association with the CS can still be explained in
terms of internal reinforcement.

The question I always look at is what do I have to build into a machine to
duplicate the effects we see in animals and humans. And at the same time,
which of these effects are needed to explain what we normally think of as
intelligent human behavior.

Classical conditioning and operant conditioning are always explained as two
separate types of effects. This seems to lead most people to the
conclusion that we would then need to program both of these effects into
the machine - that we would have to write one set of code in a computer to
create the operant conditioning effects, and different code, to create the
classical conditioning effects. However, when I look at it, I only see the
need to create one algorithm that demonstrates both types of conditioning
externally. So I don't see it as two effects, but as one.

However, my lack of education in psychology might very well be blinding me
to the fact that classical conditioning and operant conditioning are more
different than I believe them to be (or maybe I just don't understand the
language well enough to really know what they are talking about). However,
I generally believe that the reason that people talk about classical
conditioning as being different from operant conditioning is that it seems
to happen without a reinforcer. You can make the bell produce the
salivation effect without ever pairing it with food.

Internally however when you look at what you have to build to create
operant conditioning, the system just naturally also produces classical
conditioning. Or so it seems to me.

Now, the other thing here is the idea of an unconditioned response. Animals
and humans no doubt have various pre-wired behaviors created by evolution
for their usefulness which are not under the control of conditioning.
However, when looking at how you might build a pre-wired behavior into a
reinforcement learning machine, you have a few options. One is to use the
reinforcement learning machine to modify some base of fixed behaviors. So,
in a robot, I could for example hard-wire a behavior of running away from
light by spinning the wheels of a two wheel robot in a way to escape the
light. But then you could build a reinforcement learning system on top of
that, that had the power to override the behavior - for example, by making
the system form a simple sum of the command from the hard-wired behavior
system and the reinforcement learning system. So if the hard-wired system
tries to spin a wheel counterclockwise at 10 rpm, the reinforcement system
can make it stand still by sending out a command to spin the wheel
clockwise at 10 rpm to cancel out the behavior of the hard-wired system.

The hard-wired system then just looks like part of the environment the
reinforcement system is learning to deal with.
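
The summing arrangement described above can be sketched as follows. Everything specific here is an assumption for illustration: the sign convention (negative rpm is counterclockwise), the three candidate commands, the reward of "stand still", and the simple epsilon-greedy action-value learner standing in for the reinforcement learning system.

```python
import random

HARD_WIRED_RPM = -10  # assumed convention: negative = counterclockwise


def net_wheel_rpm(learned_rpm):
    """The motor sees the simple sum of the two command sources."""
    return HARD_WIRED_RPM + learned_rpm


def learn_override(steps=500, alpha=0.5, epsilon=0.1, seed=0):
    """Learn which command to add so that the wheel ends up standing still."""
    rng = random.Random(seed)
    actions = [-10, 0, 10]            # hypothetical learned commands, in rpm
    q = {a: 0.0 for a in actions}     # estimated reward for each command
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)                  # explore
        else:
            action = max(actions, key=lambda a: q[a])     # exploit
        # Hypothetical reward: the closer the wheel is to standing still,
        # the better. The hard-wired command is simply part of the
        # environment as far as the learner is concerned.
        reward = -abs(net_wheel_rpm(action))
        q[action] += alpha * (reward - q[action])
    return max(actions, key=lambda a: q[a])


if __name__ == "__main__":
    best = learn_override()
    print(best, net_wheel_rpm(best))
```

The learner never sees the hard-wired command directly. It only sees that one of its own commands leads to a better reward, and that command happens to be the one that cancels the fixed behavior.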

But then there's the other option. The reinforcement learning system might
be motivated with internal reinforcers for the purpose of creating the
"run away from light" behavior. So it does end up having this instinctive
run away from light behavior, but it happens indirectly through the use of
reinforcers. So it is learned, but it's learned at such a young age, and
it's so hard to override, that it doesn't seem to be something that was
learned.
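
A rough sketch of this second option, where only the reinforcer is hard-wired and the behavior itself is learned. The toy world (each move changes the light level by one unit), the function names, and the strict alternation between the two actions are all assumptions made to keep the example small and deterministic.

```python
def innate_light_reward(light_before, light_after):
    """Hard-wired internal reinforcer (hypothetical): less light is rewarding."""
    return light_before - light_after


def learn_light_response(trials=50, alpha=0.3):
    """Learn a preference between moving toward or away from the light."""
    # Assumed toy world: "away" lowers the light level by 1, "toward" raises it.
    effect = {"toward": 1.0, "away": -1.0}
    preference = {"toward": 0.0, "away": 0.0}
    light = 5.0
    for t in range(trials):
        # Try both actions in turn so each one gets evaluated.
        action = "toward" if t % 2 == 0 else "away"
        new_light = light + effect[action]
        reward = innate_light_reward(light, new_light)
        preference[action] += alpha * (reward - preference[action])
        light = new_light
    return preference


if __name__ == "__main__":
    print(learn_light_response())
```

No "run away" behavior is coded anywhere. The preference for moving away emerges only because the built-in reinforcer rewards a drop in light.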

So, when people talk about an unconditioned response in an animal, I have
to wonder whether it's implemented internally as a true hard-wired
behavior, or as a hard-wired internal reinforcer. In the second case, it's
not really unconditioned at all in terms of how the internal hardware is
working - it was very much conditioned and continues to be conditioned
every time it's performed.

--
Curt Welch http://CurtWelch.Com/
curt@xxxxxxxx http://NewsReader.Com/