Re: Invariant Recognition, Grandmother Cell, and Memory Hierarchy



Michael Olea <oleaj@xxxxxxxxxxxxx> wrote:
> Curt Welch wrote:
>
> > Michael Olea <oleaj@xxxxxxxxxxxxx> wrote:
> >> Curt Welch wrote:
> >>
>
> >
> > There's a lot of work to be done and I tend to enjoy BSing here more
> > than doing the real work lately.
>
> I know the feeling. For me it goes like this: when I think I'm on to
> something or it feels like results are not far off I write a lot of code;
> when I get stuck, or not stuck but just feeling like there is an enormous
> amount of work to do before some payoff, I'm more inclined to chat than
> code. (Yeah, there is a behaviorist interpretation for that.)

Yeah, it's been odd for me these days. For years, I would spend a lot of
time thinking until I came up with a new approach. I would get real excited
about the new approach being "the answer" and I would have lots of energy
to work on coding and testing the idea. But after doing that, I always
found the new idea wasn't doing what I thought it would. And no amount of
adjusting the code made it do the things it needed to do. So I would just
lose all that excitement I had and put AI on the back burner again.

Then some months or years later, I'd pick it up and start thinking about it
again, and soon, I'd have a new idea that would get me excited again and
the cycle would repeat.

But, with this last net, the coding and testing never failed to do what it
needed to do. The net works and does exactly what I wanted it to do. So
to me, it feels like I finally found the approach that has been eluding me
for 30 years.

A good bit of why I always got excited when I thought I had a good idea, is
because I wanted the glory of finding an answer that no one else had found.
That's just the type of personality I have. I was doing it for the
recognition. But now I've found the answer, showed it to people, and not a
single person has seen any value in it. I've basically got the answer to
AI, and there's no recognition to be had. So I'm left with nothing to get
excited about. No one is paying me to do this work, and no one is giving
me any glory, so the work all of a sudden just doesn't have the value it
once had for me.

Logically, I can tell myself if I do the work, it will produce
demonstrations that people will understand. Then I'll get the recognition.
But somehow, it just doesn't feel that way because I can't get anyone
excited about what I've done so far. So it's left me with some real
motivation problems to push forward...

> [...]
>
> >
> >> There is some theory - the classic example is that a Perceptron cannot
> >> solve certain problems (e.g. the exclusive or problem). You can
> >> reinforce a Perceptron all you like - choose any schedule of
> >> reinforcement you care to: no perceptron will ever learn certain
> >> tasks.
> >
> > You do know that idea was shown bogus 30 years ago don't you?
>
> This is incorrect.
>
> > A single perceptron can't do xor but 2 of them can.
>
> This is correct, but could use a little elaboration.
>
> > ...Minsky just didn't see a way
> > to train a multilayer perceptron network at the time he was doing that
> > work...
>
> This is misleading - Minsky and Papert were not, as far as I know,
> looking for a way to train multilayer perceptrons; they set out to
> establish mathematical theorems describing what a perceptron, a specific
> computational device popular at the time, could and could not do. They
> succeeded. I put "Perceptrons" on the "must-read" list, even today. It is
> not just a "negative result"; it is a lucid, systematic investigation of
> computational limits.

Yeah, I've not read it. I ordered a copy from Amazon about 6 months ago and
they keep delaying the shipment until they finally said they couldn't get
it. I saw a copy at a local book store a while back but didn't buy it
since I had the copy on order. I need to go back there and see if they
still have it.

My comment about Minsky was just me repeating things I've read about the
history of the book and the effect it had on people. I've probably got
some important details confused.

I believe the problem was not with Minsky's or Papert's work as much as the
conclusions it led the bulk of AI researchers to.

> >...so a lot of people thought that finding was far more significant than
> > it really was.
>
> Debatable. I would say that many of the lessons of "Perceptrons" apply
> with equal force to the feedforward, typically back-prop-trained,
> multi-layer perceptrons that became popular later. Of course Minsky has
> pointed that out himself here on CAP from time to time. So I won't
> elaborate, especially since my "point is valid".

Yeah, I assume the work is still of significant value in understanding the
power and limits of feed forward nets and the general approach of creating
a formal understanding of the power of any network is clearly a wise
approach. What I had read however is that the work is credited with
turning a lot of people off to the whole connectionist approach to AI for
decades because it simply left too many people with the impression that the
connectionist approach couldn't work.

> > However, your point is valid. Learning what a given technology just
> > can't do is of course very important to understanding what we are
> > dealing with.
> >
> > My network however has some substantial differences from perceptrons
> > even though it's a connectionist approach. That's because it's a
> > temporal network.
> >
> > Trying to determine the limits of what my type of net can do is a good
> > project that needs to be done. However, it's hard to approach with
> > typical math tools because of the temporal dynamics of the net. Other
> > than analyzing general statistical behavior of the network, I don't yet
> > know how to approach a better understanding of the limits of this type
> > of network. Maybe you do?
>
> Maybe. I don't remember the details of your nets well enough to do it,
> and even if I did I would have to have a good reason to invest that kind
> of effort. As I recall, the basic operation of a node in your nets is to
> compare a scalar input to a scalar threshold and send an output left or
> right. It makes a binary decision, so it extracts at most one bit of
> information from the input. It does not matter if you call that scalar
> "time", "weight", "mass", or "libido" - the basic result is binary: ABOVE
> or BELOW.

Yes, that's accurate. Each decision extracts at MOST one bit of
information from the signal.

> The reason I say it extracts at MOST one bit of information, is
> that the output from such a node, which in the abstract could without
> loss of generality be called '0' or '1' is a series of binary decisions.
> Such a series encodes 1-bit of information ONLY if on average '0' and
> '1' are equally likely. Suppose it always output a '1' (input always
> above threshold) - then, since the result never varies, it would contain
> no information. Suppose it outputs a '1' with probability 'p' (and so a
> '0' with probability 1-p). Then the output would, on average, contain:
>
> -[p*log2(p) + (1-p)*log2(1-p)]
>
> bits of information per decision. That function has a maximum when p =
> 1/2, and the result is, at the maximum, 1-bit of info per decision. That
> is the most one of your nodes has to say about its world. That in itself
> does not pose any limit on what your nets can recognize - given enough
> nodes asking enough of the right questions you can get answers: Mineral?
> no. Vegetable? no. Bigger than a breadbox? yes. Faster than a speeding
> bullet? yes. Able to leap tall buildings in a single bound? yes...
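The quantity described above is the binary entropy of the node's output, which is conventionally written with a leading minus sign so that it comes out non-negative. A minimal Python sketch of the calculation follows; the function name binary_entropy is an illustrative choice, not something from either poster's code.

```python
import math

def binary_entropy(p: float) -> float:
    """Average information, in bits, of a decision that outputs '1' with probability p."""
    if p <= 0.0 or p >= 1.0:
        # An output that never varies carries no information.
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

# Compare a balanced node, a lopsided node, and a node that always says '1'.
for p in (0.5, 0.9, 1.0):
    print(f"p={p}: {binary_entropy(p):.3f} bits per decision")
```

The balanced case is the only one that reaches a full bit; any lopsided split yields less, and a constant output yields none.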

> Now, so far this is strictly a feedforward scheme - it has no relevant
> "temporal dynamics" to speak of (and it is only my best recollection of
> what you do, which might be way wrong) - for interesting dynamics to
> emerge you would need feedback. You do have, as I recall, feedback, but
> strictly limited to training signals adjusting the thresholds of the
> nodes.

There are two distinct adjustments made to the thresholds. The first could
be called a non-supervised adjustment. The threshold is changed a small
amount every time it is used for the purpose of adjusting the output ratios
you talked about above. The default behavior of the node is to equalize the
probability of the two outputs - in order to maximize the information
extracted from the signal according to your formula above.

The purpose of this first adjustment is to assign as much "meaning" to the
two output signals it creates as possible by asking the most optimal
question (as you talk about above). So if the node was sorting objects by
physical size (to go along with your breadbox question), this adjustment
would cause the internal threshold to seek the median size of the objects
so that an equal number would be sorted (classified) into the two output
sets - which maximizes the "size" information extracted from the signal by
this single binary classification.

> That does not add any real temporal dynamics to the nets.

Well, my nodes don't sort by physical size as in the example I just gave.
They sort by the temporal spacing of the last two pulses. So it's
extracting the maximum amount of temporal information it can from the
signal with each action. It adjusts the internal temporal threshold
according to the temporal dynamics of the signal. It's most definitely
doing temporal pattern matching as it sorts. It's sorting on temporal
patterns.

> It just
> means that over time some node can learn that it is more efficient to ask
> "faster than a quick pedestrian" rather than "faster than a speeding
> bullet".

Right.

The other adjustment that happens to the node is done by the reinforcement
training. This is the adjustment that changes the behavior of the node
based on the training signal. It has the same effect as you talk about
above - it simply makes it change the question it is asking (the value of
the threshold). However, it's isolated from the first info-max adjustment
by having the node maintain a sorting ratio it's attempting to produce.
The default ratio is 50/50 to allow the node to extract as much information
as possible. But the reinforcement training will adjust the target sorting
ratio the node is trying to produce. If the ratio becomes all one-sided,
it could go to 0/100, and this would make the node sort all the pulses out
one side and none out the other. The node would in this case, stop acting
as an information extractor, and turn into a wire.
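Here is a rough Python sketch of a threshold that seeks a target sorting ratio, to make the two adjustments concrete. It is not the actual node code: the class name SortingNode, the parameters target_right and step, and the fixed-step update rule are all assumptions made for illustration, and it sorts plain scalar values rather than pulse intervals.

```python
import random

class SortingNode:
    """Hypothetical node: sends a value right if it exceeds the threshold, else left."""

    def __init__(self, threshold=0.0, target_right=0.5, step=0.001):
        self.threshold = threshold
        self.target_right = target_right  # fraction of inputs the node tries to send right
        self.step = step                  # size of each small threshold adjustment

    def sort(self, value):
        go_right = value > self.threshold
        if go_right:
            # Each right output nudges the threshold up, weighted by (1 - target).
            self.threshold += self.step * (1.0 - self.target_right)
        else:
            # Each left output nudges the threshold down, weighted by the target.
            self.threshold -= self.step * self.target_right
        return go_right

def fraction_right(target_right, n=20000, seed=0):
    """Feed n uniform random values to a fresh node; report the share sent right in the second half."""
    rng = random.Random(seed)
    node = SortingNode(target_right=target_right)
    sent_right = 0
    for i in range(n):
        went_right = node.sort(rng.random())
        if i >= n // 2 and went_right:
            sent_right += 1
    return sent_right / (n - n // 2)

for target in (0.5, 0.9, 1.0):
    print(f"target {target}: observed {fraction_right(target):.3f}")
```

The nudges balance only when the fraction sent right equals the target, so a 0.5 target makes the threshold settle near the median of the inputs. With a target of 1.0 the upward nudge is zero, so the threshold drops below every input and the node passes everything out one side, which is the "wire" case.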

So, through goal based training, the nodes adjust the size of the
classification sets they are creating and can change completely from a
classifying node, into a simple wire based on the statistical correlation
between behavior and reward. This allows the network to not only adjust
its behavior to pick out the information that is important to it, but at
the same time, it's basically re-wiring the net to create any type of
circuit needed to get the right information, to the right place - i.e., to
produce the right behavior, in response to the correct stimulus.

> To get at the sorts of temporal dynamics Dan is talking about feedback
> would have to be not just training, but part of the recognition process
> itself.

Yes, but there's more than one way to do that. I did it inside the node
instead of outside the node.

> There are at least two kinds of feedback commonly thought to
> operate in neocortex. I'll just mention one type - delays. Delays allow
> the formation of autoassociative networks, which can learn sequences.

If you have non temporal nodes, then you can create temporal nodes by
adding external feedback with delay like this. I've put the "delay" effect
inside the node (a real temporal node), so you don't have to do it externally
to make up for a lack of power of your basic node.

So, for example, if you are working with a digital AND gate (a non temporal
device), and you want it to recognize a temporal pattern, you have to delay
at least one of the inputs. If, for example, you have a binary input signal
which is a stream of 1s and 0s, and you want to match on the temporal
pattern of two 1s separated by 2 unknown digits, you could do that by
feeding the stream to one input of the AND gate, and delaying the signal by
3 digits to the other input. The output of the AND gate would be high
whenever it "saw" the 1xx1 pattern in the input stream. This is temporal
pattern recognition using non temporal gates and external delay.
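The AND-gate-plus-delay arrangement can be sketched in a few lines of Python. The function name detect_1xx1 and the sample stream are made up for illustration, and the delay line is assumed to start out holding 0s.

```python
def detect_1xx1(bits, delay=3):
    """AND each bit with the bit that arrived `delay` steps earlier."""
    out = []
    for t, bit in enumerate(bits):
        # The delayed input reads 0 until the delay line has filled.
        delayed = bits[t - delay] if t >= delay else 0
        out.append(bit & delayed)
    return out

stream = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
print(detect_1xx1(stream))  # [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
```

The output goes high at positions 3, 6 and 9, each of which is a 1 arriving exactly three steps after another 1. No feedback is involved, only the delay.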

You can use feedback in here as well, but you don't need feedback to do
temporal pattern matching. You only need delays.

That is the complex view of how to do temporal work when you work with a
non-temporal signal format (bits) and non temporal gates.

However, if you switch to a temporal signal format (pulses with no fixed
temporal alignment to some master clock), and you switch to temporal gates
(like mine), then the network does temporal pattern recognition by default
- you don't have to add delays or feedback to make it work. It just works.

Everything my net "recognizes" is 1xxx1 patterns where the number of x's
between the 1's are the spacing of the pulses - the amount of time between
the last two pulses. And because the pulses are not synchronized - it's a
pure async net - the amount of space between the pulses is a high precision
value - not a small integer count of the number of clock cycles between
two bits.

The delay that happens in my net comes from each of my nodes having a built
in timer (effectively) which is able to measure the delay between the last
pulse, and the current pulse. The behavior of the node is then based on
the value of that measured delay. Its internal threshold is a measure of
the median delay of all past input pulses. The value of the threshold in
effect contains a little bit of information from _every_ (to the resolution
limit of the stored value) past input - so the node in effect is carrying
forward in time a little bit of information from every past input pulse.
After the first 1000 past inputs, it's as if the true "function" of the
node is based on all 1000 past inputs, each with a small associated
weight, and each delayed in time to allow it to show up now.

Or, if you like thinking in feedback terms - it's as if the adjusted
threshold value output from the last operation is fed back as input with a
delay, for use in the next operation. So it's feedback with delay
logically, but implemented in a way that doesn't look like feedback with a
delay.

> Pyramidal cells of layer 5 in neocortex send signals to the thalamus,
> which sends them back to layer 1 of the same cortical area whence they
> came (there are other cortico-thalamic circuits - I'm just talking about
> this one). It is, so the theory goes, a delay loop. Now it is possible
> for cortical "nodes" (not neurons, but groups of cortical columns) to
> memorize sequences of inputs. Now some interesting temporal dynamics are
> possible. But I digress. We were talking about analyzing the capabilities
> of your nets...

Right. My net is able to match on and respond to sequences of inputs.
It's a temporal pattern matching network at its core. It does it with no
external delays or feedback. It does it with a simple feedforward design
using temporal pattern matching nodes. What is hard to understand, and
build, with non-temporal gates is automatic and natural if you use temporal
nodes instead. Each node is already a temporal pattern matching node to
start with, so you don't have to add external delays and feedback to make
it work.

The other thing which you didn't get to, which makes this net so hard to
analyze or understand with many traditional math tools is that it's not a
value calculating network. It's acting as a switch which selects which
node will be used next. It's working like an IF statement in a program.
So you can talk about the output being a binary 1 or 0 for the type of
discussion you had above, but that 1 or 0 value is then being used to
select which node will get to make the next decision. It's in effect
delegating responsibility to one of two down-stream nodes to "do the work".

So, the behavior of the network is purely conditional and discontinuous,
which makes it hard to apply many traditional math tools to it. At best,
you can talk about Bayesian-like probabilities of data flow - but that
misses most of what really ends up happening in the network.
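To show the switching behavior, here is a small Python sketch of pulses being routed through a tree of interval-measuring nodes. It is a simplification under stated assumptions, not the real network: the names TemporalNode and build_demo_net are hypothetical, thresholds are held fixed (no learning), time is in arbitrary units, and the first pulse a node sees is treated as following an infinitely long gap.

```python
class TemporalNode:
    """Hypothetical node: routes each pulse by the time since the previous pulse it received."""

    def __init__(self, threshold, left, right):
        self.threshold = threshold  # interval threshold, arbitrary time units
        self.left = left            # node or output label used for short intervals
        self.right = right          # node or output label used for long intervals
        self.last_time = None       # arrival time of the previous pulse, if any

    def pulse(self, now):
        """Accept a pulse at time `now`; return the label of the output it exits on."""
        # Assumption: a node's first pulse counts as an infinitely long interval.
        interval = float("inf") if self.last_time is None else now - self.last_time
        self.last_time = now
        branch = self.right if interval > self.threshold else self.left
        if isinstance(branch, TemporalNode):
            # Delegate: only the selected downstream node does any work.
            return branch.pulse(now)
        return branch

def build_demo_net():
    """Three nodes feeding four labeled outputs."""
    return TemporalNode(1.0,
                        left=TemporalNode(2.0, "A", "B"),
                        right=TemporalNode(5.0, "C", "D"))

net = build_demo_net()
times = [0.0, 0.5, 0.9, 3.0, 3.2, 9.0]
print([net.pulse(t) for t in times])  # ['D', 'B', 'A', 'C', 'B', 'D']
```

Each downstream node times only the pulses that were routed to it, so the interval it measures depends on every earlier switching decision upstream. That is one way to see why the behavior is conditional and discontinuous rather than a computed value.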

--
Curt Welch http://CurtWelch.Com/
curt@xxxxxxxx http://NewsReader.Com/