Re: Cool visual illusion





> ...

> JC:
>
>
>> Why not start with a simple machine with
>> say three binary inputs, heat, touch, food.
>> Thus the environment of this system only
>> has 16 possible states.


Curt Welch wrote:

> 2^3 is 8, not 16.

Just testing you :)


> But more important as I've posted twice now,
> the environment which is important to a machine
> trying to duplicate human behavior is NOT just
> the current input state. It's the total set
> of all past states.
>
>
> So, if these 3 binary input values can change 10
> times a second the state size of its environment
> after running for only one minute is 2^600 which
> is 414951556888099295851240786369116115101244623
> 224243689999565732969065281141290814639970704894
> 710379428819788661130078918239515107541177530788
> 6874834113963687061181803401509523685376 and not 8.

No, I think at any given time the number of possible
input combinations is just eight. What the history of
inputs does is create the internal abstractions (internal
states) that select the output.

Learning is not creating one big long input made
up of a list of past inputs. It is extracting
abstractions from those inputs.

The input state is not a list of past inputs.
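
As a quick check on the arithmetic in this exchange, here is a
minimal Python sketch. The variable names are my own. It assumes
the setup described above: three binary inputs sampled 10 times a
second for one minute. Note that counting every distinct one-minute
history of three bits gives 8^600 = 2^1800; the 2^600 figure quoted
above is what you get with one bit per sample. Either way, the
number of input combinations at any single moment is 8.

```python
# Counting sketch for the toy machine: three binary inputs
# (heat, touch, food), sampled 10 times a second for one minute.
BITS_PER_SAMPLE = 3
SAMPLES_PER_MINUTE = 10 * 60

# Input combinations possible at any one moment.
instant_states = 2 ** BITS_PER_SAMPLE

# The 2^600 figure quoted in the thread.
quoted_figure = 2 ** SAMPLES_PER_MINUTE

# Every distinct one-minute history of three-bit samples.
all_histories = instant_states ** SAMPLES_PER_MINUTE

print(instant_states)
print(len(str(quoted_figure)), "decimal digits in 2^600")
print(all_histories == 2 ** (BITS_PER_SAMPLE * SAMPLES_PER_MINUTE))
```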

> No one has really solved even these "simple"
> problems yet. My network at least addresses
> how to approach the problem of dealing with
> a state space size of 2^600.

I don't think such a state space size is there.
You are trying to solve a problem that doesn't
even exist.

There is no attempt by a learning machine to
remember everything like a film strip. It reduces
that history to a description of parts and how
they interact with each other.

The current behavior is not the result of its
past inputs; it is the result of some abstraction
of those past inputs. If the past inputs were
still there, they could be reproduced from the
changed internal states.

You can have many different histories that
all end up producing the same current internal
states (memory). An example is convergent
evolution.

This behaviorist idea that we are simply the
product of our histories is a crock.
We are just as much a product of our internal
makeup.

....

> There is nothing useful to learn without an
> explicit definition of reward. Without that,
> learning about the characteristics of the input
> data is pointless because infinite knowledge
> about the inputs still tells us nothing about
> what output to produce.

It could be said that learning itself is a reward.
Don't you enjoy learning? The activity of taking
things in may be rewarding and the payoff is later
when those things can be put to good use.


> If you have an explicit definition of reward,
> then you have a goal, and you have a purpose,
> and then you have a lot to learn because you
> are interested in the correlation between the
> inputs and the rewards. The correlation
> between the inputs alone is not important.

But extracting them can be rewarding and in
the future useful.

> Only the correlation (constraints as you like
> to talk about) between the reward, and the
> state of the environment, is important.

??



>> Now we want the system to start chewing
>> in anticipation of food by associating
>> the environment state/s that precedes the
>> environmental state that includes food.
>
>
> Well, if you build that "want" of yours
> into the system, then you have done what
> I said needs to be done, you have given
> it an explicit reward.


No, the "want" was ours, as a requirement for
the system to produce classical conditioning.

I think "want" is a higher level concept that
most likely doesn't exist at the chewing level.

....

> ... if you limit the state size to 8, you
> have not created a toy, you have created a
> blind machine that has no temporal memory and
> that type of machine is less than just "toy"
> intelligence, it's not intelligent at all and
> has little hope of giving anyone insight into
> how you build an intelligent machine (as is
> the case with all back-prop NNs because they
> make the same mistake).


It was a toy system in that the input/output
was toy-sized compared with real animals in
the real environment. An input state size of
8 (three binary inputs) is a toy input, but
hopefully an instructive one.

>> In this case one input means one
>> "stimulus" but in a complex input we have
>> the problem of extracting the "stimuli".
>
>
> Right, and the true "complex input" is a state
> size of 2^600 after running for only one minute,
> or 2^600 for a machine with a long term memory
> truncated at one minute. How do you extract
> stimuli from a set size that large? My type of
> network gives us one answer to that question.


You don't extract stimuli from such a complex
input because it doesn't exist anywhere except
in your imagination. At any given time you only
have one input state. You may also have internal
states that each input state can modify, and
those are the "memory" of all those 2^600
combinations. That memory is not stored as
2^600 combinations but as some selected
abstraction of how they are temporally related.

In practice the details of those inputs cease
to exist and play no part in the current action.
The actual history is unknown and many histories
can produce the same result.

x + y = 362763

What are the values of x or y?



>> It also shows that the "environment" is
>> in fact defined by the input.
>
>> This is so small we can spell out all the
>> possible environmental transitions and
>> record their frequencies.
>
>
>> (heat:touch:food) where 1 means present.
>
>
>>       000  001  010  011  100  101  110  111
>> 000    x    x    x    x    x    x    x    x
>> 001    x    x    x    x    x    x    x    x
>> 010    x    x    x    x    x    x    x    x
>> 011    x    x   60%   x    x    x    x    x
>> 100    x    x    x    x    x    x    x    x
>> 101    x    x    x    x    x    x    x    x
>> 110    x    x    x    x    x    x    x    x
>> 111    x    x    x    x    x    x    x    x
>
>
> And note that your table now has 64 states
> instead of 8, because it includes a memory
> of the last inputs. The human brain however
> has "memory" of inputs going back decades.

Go back as far as you like. The above example
shows that at any time you only have 64 weighted
connections between current input state and
predicted input state. The brain doesn't have a
memory of inputs going back decades; it has some
abstraction extracted from those inputs. The
actual inputs are not remembered; they are
sensibly forgotten. In the case above all that
is remembered is a set of probabilities, and
there are many "histories" that might have
produced that set.
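
To make the "many histories, one set of probabilities" point
concrete, here is a small Python sketch. The function name and both
example histories are made up for illustration; they are not data
from this thread. It reduces a sequence of (heat:touch:food) states
to transition frequencies and shows two different histories that
collapse to the same table, including the 60% touch -> touch+food
entry.

```python
from collections import Counter

def transition_table(history):
    """Map (prev, next) to the fraction of times prev was followed by next."""
    pair_counts = Counter(zip(history, history[1:]))
    prev_counts = Counter(history[:-1])
    return {(p, n): c / prev_counts[p] for (p, n), c in pair_counts.items()}

# Two invented histories over the states 000, 010 (touch) and 011 (touch+food).
history_a = ["000", "010", "011", "000", "010", "011", "000",
             "010", "011", "000", "010", "000", "010", "000"]
history_b = ["000", "010", "000", "010", "011", "000", "010",
             "000", "010", "011", "000", "010", "011", "000"]

table_a = transition_table(history_a)
table_b = transition_table(history_b)

print(history_a == history_b)      # the sequences differ
print(table_a == table_b)          # the extracted tables do not
print(table_a[("010", "011")])     # touch followed by touch+food
```

Once only the table is kept, there is no way to tell which of the
two histories produced it.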




>> Thus the above tells us that touch (010)
>> precedes touch and food (011) 60% of the
>> time. This record would of course be
>> built up with "experience". The question
>> is how would you "wire up" this simple
>> machine so that it could be classically
>> conditioned?
>
>
> Yes, that is the question isn't it? :)


And in this simple case can you give me
an answer? And can that answer be extended
to more complex stimuli and responses?

> And the other side of that same question
> is why would the machine be wired up in
> such a way that it exhibits the external
> behavior we call classical conditioning?
> What's the evolutionary advantage that
> led to animals having classical conditioning?

It enabled the animal to predict the future
and act accordingly to enhance its reproductive
success.

> I believe the answer to that second question
> is that classical conditioning is a natural
> effect you get when you build a temporal
> reinforcement learning machine whose goal is
> to maximize all future rewards.

You could say that any mechanism that behaves
that way is a "natural effect" of the mechanism.

When you say things like "goal is to maximize
all future rewards" I would ask what does that
really mean in terms of machine states? What
is a goal or a reward in terms of machine states?


--
JC
