Re: What did that thread indicate?



Traveler <traveler@xxxxxxxxxx> wrote:
> On 23 Sep 2005 23:52:01 GMT, curt@xxxxxxxx (Curt Welch) wrote:
>
> >Traveler <traveler@xxxxxxxxxx> wrote:
> >> On 22 Sep 2005 16:41:03 -0700, humiguel@xxxxxxx wrote:
> >
> >> >However, for creating behaviour there's not a clear-cut mechanism
> >> >that matches the power of concepts for the input part.
> >>
> >> I think that concept formation and behavior formation are related.
> >
> >I go even further than that.
> >
> >I think they are the exact same thing. That's why my network only has
> >one solution for both problems. Creating the right behavior and
> >creating the right concept is the exact same problem. The internal
> >concepts are just stepping stones to the end behavior.
>
> I, too, came to the same conclusion, but for different reasons, I'm
> sure.

Well, how we talk and think about it might be different, but in the end,
the real reason we see things in common is that we are both looking at
the same human and animal behavior. :)

> >Likewise, pattern recognition is the same problem.
>
> Yes, a pattern is just a concept. However, I disagree with current
> approach to pattern recognition that uses a strictly feed-forward,
> pyramid-type hierarchical network. As I've mentioned in the past, the
> biological and psychological evidence refutes this approach. Concept
> formation is a top-down process, IMO.

It's still very unclear to me how much power my feedforward network has.
It's clear feedback is needed for multiple reasons (Dan loves the idea of
lots and lots of feedback for image recognition), but it's unclear how many
different ways it might be implemented. My training system (because it's a
reinforcement learning system) is a strong feedback system. So even
though the data flows forward, all the training happens in just the
opposite direction (outputs back to inputs) as a feedback loop. Also,
because my network is temporal, it naturally gets, without any feedback
paths, feedback effects that non-temporal nets can only get with the help
of feedback paths.

For example, activity on one path will change the behavior of data flowing
in other paths at a later time in my network. In a spatial, non-temporal
network (like many ANNs), these same types of effects can only happen with
the help of feedback and delay loops (memory). So when other people talk
about how important feedback is, and see my network as only a feedforward
system, I think they might be missing the bigger picture of how temporal
networks are a completely different beast and how they have natural
temporal feedback effects without actually having loops in the data paths.
And all that happens before you start the training system, which is also
temporal feedback by definition, since training effects (reinforcements)
must be fed back in time to reinforce events that happened before the
reinforcer.
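To make the "feedback without loops" point concrete, here is a minimal Python sketch. It assumes a single node shared by two input paths, A and B, which routes a pulse "high" if the previous pulse (from either path) arrived less than `window` time units ago and "low" otherwise. `TimingNode`, `run`, `window`, and the value 5.0 are illustrative choices, not details of the actual network.

```python
class TimingNode:
    """Hypothetical pulse-routing node shared by several input paths."""

    def __init__(self, window):
        self.window = window      # how long a pulse keeps influencing routing
        self.last_pulse = None    # time of the most recent pulse (None = never)

    def route(self, t):
        # Decide immediately, using only how long ago the previous pulse came.
        recent = self.last_pulse is not None and t - self.last_pulse < self.window
        self.last_pulse = t
        return "high" if recent else "low"


def run(pulses, window=5.0):
    """Feed (time, path) pulses into one shared node; return each routing."""
    node = TimingNode(window)
    return [(path, node.route(t)) for t, path in sorted(pulses)]


# A pulse on path B at t=10, alone:
print(run([(10.0, "B")]))                # [('B', 'low')]
# The same B pulse, preceded by a pulse on path A at t=8:
print(run([(8.0, "A"), (10.0, "B")]))    # [('A', 'low'), ('B', 'high')]
```

Nothing flows backward here: the A pulse changes how the later B pulse is routed purely through the node's memory of when it last fired.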

So, whereas other people look at my network and might write it off as
"just another feedforward NN", I think they have overlooked something huge
there that they simply don't even know is going on, because they have never
thought about what happens when you build a feedforward network out of
temporal gates.

So, for pattern recognition, I'm not sure if any other type of feedback is
required. This network which looks like, but does not act like, a strict
feedforward network, might have all the feedback it needs. Or maybe not.
:)

But at the same time, I think other types of actual data feedback paths are
required for pattern generation, and I've not yet experimented with that.
It might turn out that the same type of feedback is needed just to do
pattern recognition correctly.

And the act of "thinking to ourselves" (aka private thoughts) is clearly
some type of large internal feedback system at work.

So even though I strongly believe all this will turn out to be the same
problem in most ways (pattern recognition, concept formation, behavior),
I'm less sure whether my type of network has what it takes to solve the
problem - I only believe it's a big step in the right direction.

> > The only reason we
> >recognize patterns is to create the right end behavior. It's all the
> >same problem which is why I believe the right system will solve all
> >these problems at the same time (and why I believe a system as simple as
> >my learning network has a hope at doing everything).
> >
> >I think the failure to see how all these problems are really the same
> >problem is what keeps people from finding better general solutions to AI
> >- they keep looking for solutions to partial problems instead of
> >realizing they need to find one solution that solves them all.
>
> There is more to behavior than this, IMO. Right now I am wrestling
> with the problem of action timing. IOW, how does a system string
> multiple primitive behaviors (actions) together and correctly
> determine their timing?

Yeah, that's clearly a big question.

It's answered in my network because all the nodes are action timing gates.
That's what they do at their lowest level of function: they perform a
timed action. When my network learns, it is adjusting the timing of the
gates. That's ALL it does. It's a pure temporal action learning network.
It learns timing. That's all it can and does learn. Everything is action
timing to this network.

After a pulse passes through one of my nodes, the node switches its
behavior for a pre-defined amount of time: it routes all pulses out the
high side for a fixed amount of time. When the time is up, its behavior
switches back to routing pulses out the low side. All learning in the
network adjusts the length of this behavior action timing.
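Read literally, that rule makes each node an interval sorter. Here is a minimal Python sketch under that reading, assuming every pulse (including one that arrives while the node is already "high") restarts the fixed-length high period. `PulseNode`, `hold_time`, and `sort_train` are hypothetical names.

```python
class PulseNode:
    """Hypothetical sketch of the node behavior described above."""

    def __init__(self, hold_time):
        self.hold_time = hold_time        # length of the 'high' period (the learned value)
        self.high_until = float("-inf")   # when the current 'high' period ends

    def pass_pulse(self, t):
        # Route out the high side only while a high period is still running.
        side = "high" if t < self.high_until else "low"
        # Every pulse that passes through restarts the fixed-length high period.
        self.high_until = t + self.hold_time
        return side


def sort_train(interval, count=5, hold_time=1.0):
    """Send `count` evenly spaced pulses through one node; return the sides used."""
    node = PulseNode(hold_time)
    return [node.pass_pulse(i * interval) for i in range(count)]


print(sort_train(0.5))  # ['low', 'high', 'high', 'high', 'high']
print(sort_train(2.0))  # ['low', 'low', 'low', 'low', 'low']
```

Pulses arriving closer together than `hold_time` leave on the high side and sparser ones on the low side, so `hold_time` is the single timing value that learning would adjust.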

A network like this with 1000 nodes has 1000 independent action timers to
control behavior.

> For example, a grasping action must be started
> at the right time, otherwise it's ineffective and may even be harmful.
> There are other problems such as determining success and failure, of
> goal formation, prediction, conflict resolution, etc...

Right, all behavior is in fact a temporal problem. WHAT you do is only
half the problem. WHEN you do it is just as important as what you do. You
can't walk without getting the timing right - even on four legs. You sure
as hell can't learn to walk on two legs, throw and catch a ball, or even
carry on a conversation with a human, without getting the timing correct.
Exactly when we start to talk in order to interrupt other speakers in a
group conversation, and when we decide to stop talking because someone
else is trying to talk at the same time, is a hugely complex action timing
issue that most people working on "talking machines" seem to totally
ignore. Human intelligent behavior is a total temporal action timing
problem. (I know you know this, Louis - I'm just repeating it for others
who don't seem to understand this.)

My network learns how to get the timing right for all behavior using
reinforcement learning. Meaning, the network does everything it does
based on timing. It's a temporal stimulus-response network which, from the
second it's first turned on, always has a hard-wired temporal behavior
where it's producing behavior based on the timing of events. It doesn't
know if the timing it is using is correct, but that's what the
reinforcement learning is all about. When it does something that works,
the timing that was used is reinforced (the same behavior, aka the same
timing, is more likely to be repeated in the future). When it does things
that don't work, the timing is punished.
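Because the reward or punishment arrives after the behavior that earned it, the credit has to be pushed back in time to the decisions that caused it. The post doesn't say exactly how the network does this; the sketch below swaps in a standard technique, an exponentially decaying eligibility trace, purely as an illustration. The decision log format, `credit_past_decisions`, and the time constant `tau` are all assumptions.

```python
import math


def credit_past_decisions(decisions, reward, reward_time, tau=2.0):
    """Spread one reward (positive) or punishment (negative) back over earlier
    routing decisions, weighting each by exp(-age / tau).

    decisions: list of (time, node_id, side) tuples logged as pulses pass.
    Returns a dict mapping (node_id, side) to its share of the reinforcement.
    """
    credit = {}
    for t, node_id, side in decisions:
        age = reward_time - t
        if age < 0:
            continue  # a decision made after the reinforcer earns no credit
        key = (node_id, side)
        credit[key] = credit.get(key, 0.0) + reward * math.exp(-age / tau)
    return credit


log = [(0.0, "n1", "high"), (3.0, "n2", "low"), (4.0, "n1", "low")]
for key, value in credit_past_decisions(log, reward=1.0, reward_time=4.0).items():
    print(key, round(value, 3))
```

The most recent decision gets the largest share; each share would then drive that node's own timing adjustment.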

Now, this issue of what to actually reinforce in the machine had me going
in circles for about 20 years. For example, if you produce an output value
of 5 at time T, and it gets punished, it's easy to see that you need to
change the behavior. But what do you actually change? Do you make action
X happen sooner, or later? How do you know which way to change the timing
when all you have to work with is a generic punishment signal and not
something telling you what the right answer is? Or do you change the
action from a value of 5 to a value of 4? Or to a value of 6? How do you
know which way to adjust the action? I spent literally 20 years exploring
every answer to these questions I could think of.

For example, if you look at adjusting the timing of a behavior when you are
punished, one answer that seems reasonable is to extend the timing when you
are punished, because that will decrease the rate of repeating that
behavior (decrease the likelihood of it being repeated in the future). And
conversely, that means you reduce the timing (make it happen sooner next
time) when you are rewarded.

But what happens if this system is being punished because the behavior
showed up late? How does such a system learn that it must do the behavior
sooner in order to be rewarded? A system which always tends to delay the
behavior when it is punished is going to have a very hard time learning it
needs to do just the opposite. So the "delay when punished" option has
serious problems.
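A small simulation shows the trap. This is a hypothetical Python sketch, assuming a behavior time `t` that is rewarded only when it lands within `tolerance` of a `target` the trainer never sees, and a trainer that follows the "delay when punished, hurry when rewarded" rule with a fixed `step` (all names and values are illustrative):

```python
def delay_when_punished(start, target, tolerance=0.5, step=0.25, trials=20):
    """Train a single behavior time using only reward/punish feedback.

    Rule under test: punished -> act later (t += step);
                     rewarded -> act sooner (t -= step).
    """
    t = start
    for _ in range(trials):
        rewarded = abs(t - target) <= tolerance
        t = t - step if rewarded else t + step
    return t


print(delay_when_punished(start=3.0, target=5.0))   # 4.5  (started early)
print(delay_when_punished(start=10.0, target=5.0))  # 15.0 (started late)
```

Starting early, the rule creeps up to the edge of the rewarded band and hovers there; starting late, every punishment pushes the behavior later still, so it never finds the target.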

So the question is, how do you "learn" the proper setting of a continuous
value, such as the timing of an event, with reinforcement learning?

> The final
> solution will be simple and easy to implement, I'm sure, but finding
> it is like searching for the proverbial needle in the haystack.

Yeah, it is, for sure.

And in my case, the focus has always been on how to build a reinforcement
learning machine, because I've always seen that as key to what we needed
to build to create general machine intelligence. And when you turned me on
to the idea of focusing on temporal nature (which I had been ignoring and
putting off as a problem to be solved at the higher levels of the system),
it got me into the problem of how you do reinforcement training of the
timing of events - which led to this hard question of how you adjust
timing when all you know is that what you did was wrong, but you have no
hint as to what is right.

The solution was found with the design of my current network. It passes
pulses with no delay. So the lowest level behavior of the network has no
delay. It's a very simple stimulus-response problem where the network is
given a stimulus (an input pulse) and must immediately make a decision
about how to respond (where to send the pulse). But it makes the decision
based on temporal data - it makes the decision based on how much time has
passed since the last pulse was here.

So, each node in the network makes a very simple binary decision which
turns out to be very easy to reinforce and punish. If it sorts a pulse out
one way, and this behavior is punished, then the system must make a change
which will reduce the probability of the next pulse going the same way. If
it is rewarded, it must increase the probability of the next pulse going
the same way. And it can do this very easily just by slowly adjusting its
internal measure of pulse timing. And unlike the simple "when" question,
there is no question about which way to adjust the timing to get the
correct effect here.
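A minimal Python sketch of that rule, assuming the node's "internal measure of pulse timing" is a single window length and that each reward or punishment nudges it by a fixed step (`LearningNode`, `window`, `rate`, and `train` are hypothetical names, not the actual implementation). Widening the window sends more future pulses out the high side and narrowing it sends more out the low side, so the correct direction always follows from which side was just used and the sign of the reinforcement:

```python
class LearningNode:
    """Hypothetical pulse-routing node whose only learnable value is `window`."""

    def __init__(self, window, rate=0.25):
        self.window = window      # how far back the node looks for the previous pulse
        self.rate = rate          # size of each learning step (assumed fixed)
        self.last_pulse = None    # time of the previous pulse, None if none yet
        self.last_side = None     # side used for the most recent decision

    def route(self, t):
        # Decide immediately: 'high' if the previous pulse is within the window.
        elapsed = float("inf") if self.last_pulse is None else t - self.last_pulse
        self.last_side = "high" if elapsed < self.window else "low"
        self.last_pulse = t
        return self.last_side

    def reinforce(self, reward):
        """reward > 0 rewards the last decision, reward < 0 punishes it."""
        if self.last_side is None:
            return  # nothing has been decided yet
        # Widening favors 'high', narrowing favors 'low'; a reward pushes toward
        # repeating the last side, a punishment toward the other side.
        direction = 1.0 if self.last_side == "high" else -1.0
        self.window = max(0.0, self.window + self.rate * direction * reward)


def train(steps=10, interval=1.0):
    node = LearningNode(window=0.5)
    sides = []
    for i in range(steps):
        side = node.route(i * interval)
        sides.append(side)
        # Assumed environment for the demo: it rewards 'high' and punishes 'low'.
        node.reinforce(1.0 if side == "high" else -1.0)
    return sides, node.window


sides, window = train()
print(sides)   # ['low', 'low', 'low', 'high', 'high', 'high', 'high', 'high', 'high', 'high']
print(window)  # 3.0
```

With pulses arriving once per time unit, three punished "low" decisions widen the window past the 1.0 interval, and from then on each reward widens it further.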

So, this solution gave me a temporal action network which creates all its
behavior based on timing, yet doesn't have the complex training problem of
knowing which direction to adjust the timing of the behavior. Instead of
trying to adjust when to produce the current behavior (which requires some
type of behavior delay), it's instead adjusting how far back in time the
node looks for past events to control the current decision. So there's
never any doubt in this network about when the decision has to be made
(the decision is the behavior - it must be done "now"); instead, it's
adjusting what past information is being used to make the current
decision.

This network has all the properties I've been looking for in my needle in
a haystack search, which I was never able to find with any other design.
It's a network that does everything based on timing - so hard temporal
problems like learning to walk no longer seem so hard. Hard temporal
pattern matching problems, like reacting to the speed of change of the
sensory data (how fast a ball is coming at us), are no longer seen as
hard. This network's core decision system is based only on that type of
information.

Learning to create the correct sequence is natural for this type of
network, because creating a sequence is no more complex than making sure
the current action is always correct based on the timing of recent past
events and actions. And that is exactly what every node in this network is
doing - producing the correct current behavior based on what happened in
the recent past. But the definition of how far back in the recent past to
go is different for every node in the network. Depending on network
topology and sensory data rates, the system will naturally have some nodes
making very short-term decisions, and others making much longer-term
decisions. So every behavior produced by the network is based on a complex
temporal footprint of recent past events which can extend minutes into the
past if need be (limited only by network size). This allows extremely
large sets of complex temporal sequences to be learned by this type of
network.
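To illustrate the "temporal footprint" idea, here is a hypothetical three-node tree in Python: a root with a short 0.5 window feeding two children with 5.0 windows (topology, names, and values invented for the example). The route taken by a pulse at t=10 depends on pulses from well before the root's own window, because the child it reaches remembers longer:

```python
class Node:
    """Hypothetical routing node: 'high' child if the last pulse is recent."""

    def __init__(self, window, high=None, low=None, name=""):
        self.window, self.high, self.low, self.name = window, high, low, name
        self.last_pulse = None

    def route(self, t):
        recent = self.last_pulse is not None and t - self.last_pulse < self.window
        self.last_pulse = t
        nxt = self.high if recent else self.low
        if nxt is not None:
            return nxt.route(t)  # pass the pulse on, with no delay
        # A leaf reports which of its output lines the pulse left on.
        return f"{self.name}-{'high' if recent else 'low'}"


def output_at_10(history):
    """Replay earlier pulse times (all < 10), then report where t=10 ends up."""
    root = Node(0.5, high=Node(5.0, name="A"), low=Node(5.0, name="B"))
    out = None
    for t in sorted(history + [10.0]):
        out = root.route(t)
    return out


print(output_at_10([]))      # B-low
print(output_at_10([6.0]))   # B-high
print(output_at_10([9.8]))   # A-low
```

A pulse 4 time units earlier, far outside the root's 0.5 window, still changes the output at t=10 through child B's longer memory; a pulse just 0.2 earlier sends it down a different branch entirely.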

Learning to produce the right temporal sequence of behaviors at the right
time is all I believe there is to intelligent behavior. All our complex
longer-term (higher level) behaviors, which we describe with names like
goal seeking or reasoning or planning or problem solving, I believe all
start here at the lowest level, as a simple temporal sequence
reinforcement learning problem. So I believe this is the problem we have
to solve at the lowest level. This is the machine we have to build to
create AI. So that's why I'm playing with networks that solve this most
basic of problems. All the high level problems are, in the end, nothing
more than the machine producing the correct sequence of behaviors at the
correct time. If the machine can't learn to produce the right sequence on
its own, it's not intelligent. If I have to build "reasoning" sequencing
into the machine (as a computer programmer), then I'm acting as an
intelligent sequence creation machine instead of building a machine that
can generate its own intelligent sequences. In the end, no matter how many
"sequences" we hard-code into the machine, it still must learn to string
them together on its own, in new and novel ways, that allow it to produce
"better" results. Until you add sequence improvement through learning,
it's not intelligent. And if you add that, then why waste any time trying
to hard-code simple sequences when you have a machine that has the power
to find, and improve, its own sequences of behavior for everything?

Now, Louis, I don't know if any of this will be of any use to you in
solving your design problems, but it's food for thought.

--
Curt Welch http://CurtWelch.Com/
curt@xxxxxxxx http://NewsReader.Com/