Re: Fundamental Limits to Curt's Nets



Michael Olea <oleaj@xxxxxxxxxxxxx> wrote:
> Hey, Curt. I figure, since I was such a jerk the other night, I owe you
> some "quality time". For now that means pointing out some fundamental
> limits with your nets - problems they are inherently not able to solve.
> The root cause of these limits has nothing to do with RL, but with the
> architecture of the nets - they are limitations of all strictly
> feedforward nets.

Oh, I don't think you owe me, but I always like talking about my nets. :)

> You wrote:
>
> > However, the "react to the world" just means that it must produce some
> > decision on what to do next, based on every new piece of sensory input
> > data that shows up. If one pixel in the eye senses that the light is
> > getting slightly brighter, how should the system as a whole react to that?
> > The point is, any AI system must be able to react to any and all bits of
> > data flowing into it. The reaction might be to just ignore the data and
> > pretend it never happened, but most likely, the reaction will at a
> > minimum be to remember the data for some period of time, and allow the
> > memory of that data to affect future reactions.
>
> This is maybe as good a place as any to start. Your nets cannot remember
> the data. They can be influenced by it, alter some parameters, but they
> cannot "remember the data" indefinitely. No feedforward net can: it takes
> a loop.

There are a few things you don't know about my intended full design.

But first a few comments about loops. A feed-forward net with learning
does include an effective loop because the learning works in a feed-back
direction. It's not the active loop you speak of below, but it's a loop
that allows the net to "remember" indefinitely, which is actually very
important because it is the basis of long term memory. And it's important
to understand that the secondary reinforcement effects of learning are
constantly happening based on the data flowing through the network. So you
don't need the critic sending rewards to the net for it to have long term
memory effects.

And second, the system is always expected to have a primary feedback loop
through the environment because this net is expected to be running as a
real-time reactive agent. In all cases where the environment can store the
state, the feedforward net will be busy creating complex loop effects. This
is the classic "export the cognition to the environment" issue where the
environment actually becomes a critical part of the state of the system.
It's all the interesting stuff Rodney Brooks has shown you can do with
reactive systems.

> Imagine an xor circuit with the output fed back into one of the inputs
> (so there is only one free input to the circuit). Such a circuit will
> "remember" an input pulse on its free line indefinitely, by continuing to
> generate output pulses until a new pulse arrives on the input line. It
> acts as a counter mod 2. Such circuits can be strung together to count
> any power of 2. Let's try a little ascii art (I hope you have a monospaced
> font).

Yeah, I do. What's Usenet without ascii art!

> +1 +1
> ---I_1(t)---+------>(1)-------->(1)-----+--O(t+2)-->
> | ^ ^ |
> | -1 | -1 | |
> | | | |
> +-->-----------------+ | V
> | | | +1 | |
> ^ +-------(1)----->----+ |
> | |
> +--------------<------------------------+
>
> I_1(t) is the free input, which at any time t is in state pulse/no pulse
> O(t+2) is the output. Just to keep things simple time here is discrete.
> The output is again pulse/no pulse
> (1) denotes a node with a fixed firing threshold (of weighted sum of
> inputs) +/-1 denote input weights.
>
> So this circuit (unless I messed up) does what I said:
>
> ----------------------------
> t | 1 2 3 4 5 6 7 8...
> -------+--------------------
> I_1(t) | 0 0 0 1 0 0 0 0 ...
> -------+--------------------
> O(t) | 0 0 0 0 0 1 1 1 ...
> ----------------------------
>
> Your nets can't do that. Is this important? I think it means: 1) they
> can't count (except up to an arbitrary fixed value that depends on the
> size of the nets)* 2) they cannot do arithmetic (except within arbitrary
> fixed bounds that depend on the size of the nets). It is not just that
> your nets cannot learn to do these things - they cannot do them period.

This is what you don't know about the design.

The feedforward part of the net is just the learning part. What it can't
do is exactly what you said above. It can't produce output patterns on
its own. It's basically only the reactive part of the net that learns how
to react to sensory inputs.

To do AI in general the system needs an output pattern generator so it can
learn to produce fixed complex output patterns - such as the complex timing
required for walking, or talking, or writing. Now, even the feedforward
net alone might make many of these complex tricks work on its own, by
using state, and the feedback loop through the environment, but
nonetheless, you still need an internal pattern generator.

That's done this way:

  ----> | feedforward learning net |->[og]-+---->
    +-> |                          |       |
    |                                      |
    +--------------------------------------+

You take the feedforward net, and make its output control the [og] which
is a set of output generators, or pulse generators. I've played with a few
different designs for this over the years and don't know the best way to do
the pulse generators, but here for example are a few options. The basic
idea is that unlike the rest of the nodes in my feedforward net, the pulse
generators are free running pulse generators. They don't react to data,
they generate it. So, one option for how they would work is that they run
non-stop generating pulses at a constant rate. They have two inputs (which
are pulse signals) that act like the accelerator and the brake on a car.
Pulses received on the "speed up" input cause the internal rate of the
generator to increase, and pulses received on the "slow down" input cause
it to slow down. It might be configured to drift back to some middle
frequency if no control pulses are received, so to make it run fast, it
would have to receive a constant stream of "go faster" pulses and to make
it run slow, it would have to receive a constant stream of "go slower"
pulses.
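
A minimal sketch of that first option, assuming discrete time steps and a
rate measured in pulses per tick. The class name, the parameter names, and
all the numeric defaults are hypothetical, picked only to make the
accelerator/brake/drift behavior concrete:

```python
class PulseGenerator:
    """Free-running pulse source steered by 'speed up' / 'slow down' pulses.

    All parameter names and default values here are hypothetical.
    """

    def __init__(self, rest_rate=0.5, step=0.1, drift=0.05):
        self.rest_rate = rest_rate  # pulses per tick when left alone
        self.rate = rest_rate       # current pulses per tick, kept in [0, 1]
        self.step = step            # rate change per control pulse
        self.drift = drift          # pull back toward rest_rate per idle tick
        self.phase = 0.0            # accumulates until a pulse is due

    def tick(self, speed_up=False, slow_down=False):
        """Advance one time step; return 1 if a pulse is emitted, else 0."""
        if speed_up:
            self.rate += self.step
        if slow_down:
            self.rate -= self.step
        if not (speed_up or slow_down):
            # No control pulses: drift back toward the middle frequency.
            if self.rate > self.rest_rate:
                self.rate = max(self.rest_rate, self.rate - self.drift)
            else:
                self.rate = min(self.rest_rate, self.rate + self.drift)
        self.rate = min(1.0, max(0.0, self.rate))
        self.phase += self.rate
        if self.phase >= 1.0:
            self.phase -= 1.0
            return 1
        return 0


def count_pulses(gen, ticks, **controls):
    """Run the generator for `ticks` steps with constant control inputs."""
    return sum(gen.tick(**controls) for _ in range(ticks))
```

Holding `speed_up=True` for a stretch of ticks should yield more pulses
than leaving the generator alone, and holding `slow_down=True` should yield
fewer. Once the control pulses stop, the rate drifts back to `rest_rate`.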

Another option is to make the pulse generator act more like a noise source
instead of a variable frequency pulse generator. This might be advantageous
for
training. But otherwise, the average frequency of the output could be
controlled like above.

Another option which has potential is that the pulse generators run by
default at their fastest rates, and the input only acts as the brake
control above to inhibit the production of pulses. It's an inhibit input
which slows down the pulse generator. This creates a negative feedback
effect through the feedforward net which naturally creates a stable system.

So, the output of the pulse generators is then the motor outputs going to
control whatever the network controls in the environment. But, for every
pulse generated, it's not only sent out to the environment, but it's
duplicated, and sent back to another input in my feedforward learning
network.

So, if you want 10 sensory inputs, and 10 motor outputs total, the
feedforward net needs to have a total of 20 inputs. 10 of which are the
sensory inputs, and 10 of which are the outputs from the pulse generators.
In effect, this allows the network to "sense" its own behavior. So it's
sensing not only the sensory inputs, but it's also sensing what the pulse
generators are doing.

The outputs of the feedforward net are then used to control the pulse
generators. And the feedforward net is a learning network that learns to
produce any feed-forward function to control the behavior of the pulse
generators as some combination of reactions to both the current sensory
information and recent pulse generator activity.

Now, if you look at this, you will see that if the learning network builds
a function to control the pulse generator which is based only on the
feedback paths, you get exactly what your circuit above was - an internal
pattern generator which can be programmed to create any number of complex
fixed patterns which will loop forever and not be a function of the sensory
inputs.
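
As a minimal sketch of that point: the function below is purely
feed-forward and holds no state, but once its output is copied back to one
of its inputs it behaves like the quoted XOR circuit. The function names
are hypothetical, and the loop delay is simplified to a single step, which
differs from the two-step delay in the quoted diagram:

```python
def feedforward(sensory, feedback):
    """Stateless feed-forward function: XOR of free input and fed-back output."""
    return sensory ^ feedback


def run_loop(inputs):
    """Drive the function with an input stream, copying each output back in."""
    feedback = 0
    outputs = []
    for pulse in inputs:
        out = feedforward(pulse, feedback)
        outputs.append(out)
        feedback = out  # the generated pulse is duplicated back to an input
    return outputs


def pattern_step(feedback):
    """A function of the feedback alone: rotates a 3-pulse pattern."""
    a, b, c = feedback
    return (c, a, b)


def run_pattern(start, steps):
    """Free-running loop with no sensory input at all."""
    state = start
    history = [state]
    for _ in range(steps):
        state = pattern_step(state)
        history.append(state)
    return history
```

`run_loop([0, 0, 0, 1, 0, 0, 0, 0])` holds its output high after the single
input pulse, and a second pulse toggles it back, which is the mod 2
counter. `run_pattern((1, 0, 0), 6)` cycles through the same three states
indefinitely without reading any sensory input.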

But the learning network of course can also build functions that
shape/trigger the natural pattern generator effects. So, to learn a
complex ballistic movement like reaching for an object, the learning system
uses the motor feedback loops to generate the complex control sequences
needed to operate all the different motors/muscles in parallel. But that
complex pattern is tied into sensory conditions that allow it to shape and
bend and override the default motor behavior as needed.

So, this configuration simplifies the learning problem to only having to
solve, and operate, in a feed-forward environment. But, with the motor
output feedback, as well as the huge amounts of feedback through the
environment, the system has a huge number of feedback paths that it's
learning to use to its advantage.

BTW, I think this configuration is actually how the brain works as well.
If you look at my diagram, the feedforward network can be thought of as two
smaller networks glued together, one receiving the sensory inputs, and one
receiving the motor outputs. The half receiving the sensory inputs works
just like sensory cortex in the brain, and the half receiving the motor
feedback inputs, works just like the motor cortex. In other words, I don't
believe the human motor cortex is generating motor outputs at all (which
seems to be how people like to describe it). It's just more sensory
technology which is sensing the outputs of the brain's pulse generators,
which happen to be located not in the motor cortex, but instead, in the
basal ganglia. And the basal ganglia pulse generators are not controlled by
just the motor cortex, but by outputs from both the motor and sensory
cortex.

And, just to make this configuration interesting, there's no need to
connect every pulse generator to an output device. Some can simply be left
as internal pattern generators. So if you need 10 outputs, you might
actually build a network with 20 pulse generators, and only use 10 for the
real outputs, and that gives you 10 free generators which can be used to
generate internal behavior patterns which don't directly affect the
outputs. This might allow the system to learn to create even more complex
behavior. Or it might just confuse things. :) That's stuff I would need to
experiment with down the road.

So, the important point of this design is that I can use one type of
learning system to solve both sensory and motor learning problems. Being
able to use a structure like this is a breakthrough I had about 5 years ago
or so. I realized that the sensory side of the problem was one of creating
a hierarchy of abstractions to decode a sensory signal, and the motor
generation problem was just the inverse - you needed to build a hierarchy
of abstractions to create complex fixed patterns of behavior. But instead
of trying to build two learning systems, one to learn to decode a hierarchy
of complexity on the sensory side, and a second to create a hierarchy of
complexity on the motor side, you could solve both problems at once, if you
used feedback like above. And that's what I've done. My feedforward
learning net only needs to learn the correct mapping of associations of
signals as they flow through the network, and with the motor feedback
signals, it can solve both the sensory and motor learning problems at the
same time. And as I said, this also explains why the sensory cortex and
the motor cortex are the exact same technology in the brain (both are all
the same neocortex structure). It's because it's not two different
technologies in the brain to solve the two problems. It's one technology
used to solve both. Every description of the sensory and motor cortex I've
seen has the motor cortex wired "backwards" in terms of what they think
it's doing.

> If you think they can, then prove me wrong.

The simple feedforward part of my full design cannot, you are right. But
it's not my full design. You just hadn't seen my posts where I talked
about the above information. I've posted it 3 or 4 times here in c.a.p.
now over the years.

Because my learning network is a temporal learning system, it's all about
timing. Because my nodes actually react to the pulse spacing, they are
acting logically as if they were differentiators in a control feedback
loop. They respond in effect not just to the input value, but to the rate
of change of the input value. So as the learning system "wires up" some
network configuration in the feed-forward part of the network, it's
actually building a large discrete differential equation for controlling a
motor feedback loop whose order is only limited by the depth of the
network. This means that the order of the control problem it's trying to
solve (the effective number of integrators in the environmental feedback
path it's trying to control) is limited only by the depth of the
feedforward network.
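
A rough numeric analogy for the "order limited by depth" claim, not the
pulse-spacing mechanism itself. It assumes each layer can be idealized as
a first-difference stage, so stacking `depth` of them gives a difference of
that order. The function names are hypothetical:

```python
def first_difference(signal):
    """One idealized differentiator layer: change between successive samples."""
    return [b - a for a, b in zip(signal, signal[1:])]


def nth_difference(signal, depth):
    """Stack `depth` differentiator layers on top of each other."""
    for _ in range(depth):
        signal = first_difference(signal)
    return signal


# Position under constant acceleration, sampled at t = 0..5.
position = [t * t for t in range(6)]
speed = nth_difference(position, 1)         # changes every step
acceleration = nth_difference(position, 2)  # constant
jerk = nth_difference(position, 3)          # zero
```

Under this idealization, two stacked layers are enough to expose the
constant acceleration in `position`, and a third shows that there is no
jerk. A shallower stack would not see those higher-order terms.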

This also means the natural behavior of this type of system will be to
produce smooth speed/acceleration/jerk/... based control actions, which I
think is key in what makes humans and animals move how they move.

And, now that I write all the above again, it's reminding me of what Dan
has just been posting about the motor emulator idea. I could see someone
looking at this same type of design and thinking about it as a "motor
emulator". Maybe there are some common thoughts here.

> Don't train them to do
> arithmetic - just show a configuration, however achieved, that can add a
> set of arbitrarily many summands. It's ok if your net uses scratch paper
> to do it. After all, people cannot do arbitrary sums in their heads, but
> they can do them with pencil and paper.

The above, in a large enough configuration, should be able to do exactly
what humans do with a pencil and paper as far as I can tell. (But no John,
I will never try to hand-code that solution ;)).

> *Caveat: your nets can, of course, count pulses by simply forwarding
> them. What they cannot do (assuming they had learned to read) is execute
> a command like "count from one to one thousand two hundred fifty four".

Right, the full net with both the sensory and motor halves has the structure
required to do that. It's only a question of can I produce a learning
algorithm with enough power. But it looks to me like I've got all the
pieces in place to do that.

> The real limits run deeper. It's, again, not just your nets, but all
> feedforward nets. They cannot do recursion. Again, it takes a loop. This
> means these nets cannot achieve one of the fundamental hallmarks of human
> intelligence: the ability to compose a "complex" solution out of simple
> building blocks.
>
> This is a serious, and I would claim fatal, flaw.

Yeah, I saw that 20 years ago. I've been looking for a simple solution to
how to create both complex sensory learning and complex motor learning for
20 years. There's the answer for you above. You do it with one solution
using feedback instead of having to build two solutions.

> Your nets (and all
> feedforward nets) are denied the resources of discrete combinatorial
> generating systems. DNA (in its milieu of promoter and inhibitor
> polymerases) is a discrete combinatorial generating system. Language is a
> discrete combinatorial generating system. So is the immune system. Such
> systems achieve arbitrary complexity (sophistication) via nesting of
> primitives into a combinatorial explosion of hierarchies. Nets with loops
> can do that. Your nets cannot.

Yeah, they do it blindfolded. :) You just didn't understand the full
design.

> Again, none of this bears at all on RL as a learning scheme - it is about
> limits inherent in the architecture, quite apart from any learning
> scheme.

right.

> Time for some more mundane observations. Your nets are regular,
> loop-free, and have a tiny fan-out (factor of 2). Biological nets are
> relatively irregular, have loops over a wide range of lengths (cycles
> from 2 to many thousands), and have fan-outs typically in the range of
> 1,000 to 10,000. The biological architecture is quite probably no accident.
>
> One consequence of the regular 2-fold fan-out of your nets is that widely
> separated inputs can have no influence on each other at all until
> correspondingly deep layers in the net are reached (it takes that many
> layers for pulses to scoot over till they converge on a common child
> node). This is another severe limit on the "logical depth", the ability to
> form non-trivial hierarchies, of your nets. Simple correlations of widely
> spaced inputs cannot be detected till late in the game. This is not only
> a damper on nested hierarchies, it also adversely affects reaction times.

This is the whole network topology issue, which I've gotten into in talks
with John but probably not with you. My flat network, which looks like
chicken wire, just won't cut it. It's got many problems. The first is the
linear fan-out you talk about. To fan out 10 nodes wide, the net has to be
10 nodes deep. To fan out 10,000, it has to be 10,000 deep. This won't
work. The second is that signals can't cross without merging first. That
just won't cut it.

But the solution is trivial. You use a better network topology.
Understanding how to build that is a bit complex and I've not figured out
all the details. More work is needed. But, I've got various options to
play with already. But don't ask me to try and draw them with ascii art.
The answer however, which I just posted in a different post, is that you
need to wire the net, so that after one layer, a single input is split, and
sent to two nodes. After another layer, instead of splitting and ending up
at three nodes, it needs to go to 4 nodes. And after that 8, then 16. So,
to get to 10,000 nodes, you only need log2(10,000) layers of network
instead of 10,000 layers of network.
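
The depth arithmetic can be checked with a short sketch. It assumes the
simple model stated above: in the flat chicken-wire layout a signal's reach
grows by one node per layer, while in the doubling layout it doubles each
layer. The function names are hypothetical:

```python
def layers_linear(width):
    """Flat chicken-wire layout: fanning out `width` nodes takes `width` layers."""
    return width


def layers_doubling(width):
    """Doubling layout: smallest depth d with 2**d >= width (ceil of log2)."""
    return (width - 1).bit_length()


for width in (10, 10_000):
    print(width, layers_linear(width), layers_doubling(width))
```

For a fan-out of 10,000 the doubling layout needs 14 layers, since
2^13 = 8,192 falls short and 2^14 = 16,384 is enough.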

> Biological neural nets do not have that problem. They all (all the ones
> that have been checked, from C. Elegans to Homo Sapiens) have an
> organization Watts and Strogatz have dubbed "small worlds nets". In brief
> what that means is that while most neural connections are relatively
> short range, linking close neighbors, some are medium range and some are
> long range, linking regions far and wide in a single hop. It's the "six
> degrees of separation" idea - the idea that you can pick at random any
> two people on Earth and on average they are connected by a chain of
> mutual acquaintances of at most 5 hops. That's why they call it a "small
> world" net. Most of your acquaintances are local - neighbors. But maybe
> you know one or two out of towners. Maybe someone you know has a friend
> in Milano. And someone she knows has a friend in Beijing... Small worlds.
> So the quantitative study of that sort of thing looks at probability
> distributions over a number of quantities - average node degree
> (fan-out), clustering (fraction of links to close neighbors), dispersion
> (probability of links as a function of distance), and path-lengths (number
> of hops in the shortest path between nodes chosen at random). It just
> takes a (relatively) few mid and long-range links to sharply reduce
> average path lengths. Small worlds. All biological neural nets do that.
> Yours don't. In fact, even the metabolic, regulatory, and
> signal-transduction nets in single cells do that. Yours don't.

Right. With the right topology, mine do. It's just a matter of researching
different ways to wire the nodes together to see how different topologies
affect learning. In the end, I think there are two possible answers.
Either there is one very good general purpose topology that can be used to
solve all problems with only a few major parameters like the depth and
width of the net to play with, or there's going to be a lot of different
topology options, which means each network used will have to have its
topology optimized to the type of problem it's expected to solve. So one
used for visual signal processing may end up with one type of topology and
one used for serial audio data might work best with a different topology.
It's not that the wrong topology couldn't solve the same problems, it's that
the wrong topology will take more nodes, and take longer to learn, than one
optimized to the class of problem.

And, BTW, learning in the network is totally independent from the way you
choose to wire the nodes together. So with the same learning and operation
code, all you have to do is wire up a network using a different schematic
and let it start learning. But, because of the conservation of a pulse
rule, you can't put loops inside the net, or pulses could get stuck
forever. So the only limit is that you have to use the pulse generators
with a feedback signal to create a loop.

And, my initial plan is for one large motor loop as I talked about above.
However, there's nothing stopping us from exploring multiple smaller
nets with pulse generators and smaller internal loops. Once again, it's
all just more topology experiments to see if you can find different network
configurations that are better optimized to solving different classes of
problems.

Exploring the learning function, even for networks with a known bad
topology, is top on my list right now. Second, or almost even for top, is
exploring more interesting topologies in the feed-forward part of the net.
It just gets very hard however to understand what the network is doing when
you have 100s of nodes in a complex non-flat topology (it's a bitch to
display and watch it in action). That's why I have tended to stay with the
bad but easier to understand chicken-wire topology even though I know it's
too limited.

> So, as for that one brightening pixel in the eye, pretty much the whole
> cortical world has felt the impact in 4 or 5 post retinal clicks.

Right, my network with a base fan out of 2 will take more hops to reach the
same effect. But the number of hops, when the topology is correct, is only
log2(n) and not n. And when using computer hardware that can process data
something like 1,000 times faster than neurons, adding a few extra hops
seems not to be a problem to me given the simplicity it creates at the
node level. It also means I will need more total nodes. It will take a
large tree of nodes in my network to equal the same effect that a single
neuron in the brain gets with a fan-out of 10,000. But the trees are
shared. So a single layer of 10,000 neurons in the brain, all cross
connected, takes a network of my nodes 10,000 wide and log2 of 10,000
(about 13.3, so 14) deep to get all the signals interconnected. So I need
14 times more nodes than the brain would need neurons for that size net.
But not 10,000 times more.

It's the same issue as building computers which use base 2 instead of base
10 hardware. They seem at first to be a waste because they need so many
more digits in their numbers, so you need 32 digit registers instead of 10
digit registers to hold the same magnitude numbers. But the simplicity and
speed they bring to the logic hardware more than makes up for the added
size. I think these simple binary sorting nodes are likely to be shown to
have the same advantage.

Nature was stuck working with very slow neurons. To keep the total network
response time down, it had to use very wide fan-out systems instead of using
a deeper network and simpler and faster neurons. It just didn't have the
option to use nodes that could switch at nanosecond speeds. We do.

It's just going to require more experimentation to see where it leads. In
the end, I see the binary node as a perfect match to the reinforcement
learning problem, which is why I'm working with them. When you punish one
behavior, there's only one other behavior to replace it with (switch the
pulse out the other direction). Using a node that's correctly optimized
for the reinforcement learning problem is the key to making this work.
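
To make the "only one other behavior" point concrete, here is a minimal
sketch of a binary node under punishment. The actual learning rule is not
described in this post, so the probability-based update, the class name,
and the parameters below are all assumptions made only for illustration:

```python
import random


class BinaryNode:
    """Routes each pulse left or right; punishment shifts it the other way.

    The update rule is a hypothetical stand-in, not the real learning rule.
    """

    def __init__(self, p_left=0.5, learning_rate=0.1, seed=0):
        self.p_left = p_left              # probability of routing a pulse left
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)    # seeded so runs are repeatable
        self.last = None                  # direction of the most recent pulse

    def route(self):
        """Send one pulse out; return the direction chosen."""
        self.last = "left" if self.rng.random() < self.p_left else "right"
        return self.last

    def punish(self):
        """Make the direction just taken less likely; the other one gains."""
        if self.last == "left":
            self.p_left -= self.learning_rate * self.p_left
        elif self.last == "right":
            self.p_left += self.learning_rate * (1.0 - self.p_left)


node = BinaryNode()
for _ in range(200):
    if node.route() == "left":
        node.punish()  # the critic dislikes "left" in this toy setup
```

Because there are only two exits, punishing "left" is the same thing as
favoring "right", so no search over alternative behaviors is needed. In
this toy run `node.p_left` ends up well below its starting value of 0.5.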

> Just some stuff to think about...

Yes, it is...

Thanks for giving me the excuse to think and write.

> -- Michael

--
Curt Welch http://CurtWelch.Com/
curt@xxxxxxxx http://NewsReader.Com/