Fundamental Limits to Curt's Nets
- From: Michael Olea <oleaj@xxxxxxxxxxxxx>
- Date: Sat, 07 Jan 2006 07:00:03 GMT
Hey, Curt. I figure, since I was such a jerk the other night, I owe you some
"quality time". For now that means pointing out some fundamental limits
with your nets - problems they are inherently not able to solve. The root
cause of these limits has nothing to do with RL, but with the architecture
of the nets - they are limitations of all strictly feedforward nets.
You wrote:
> However, the "react to the world" just means that it must produce some
> decision on what to do next, based on every new piece of sensory input
> data that shows up. If one pixel in the eye senses that the light is
> getting slightly brighter, how should the system as a whole react to that?
> The point is, any AI system must be able to react to any and all bits of
> data flowing into it. The reaction might be to just ignore the data and
> pretend it never happened, but most likely, the reaction will at a minimum
> be to remember the data for some period of time, and allow the memory of
> that data to affect future reactions.
This is maybe as good a place as any to start. Your nets cannot remember the
data. They can be influenced by it, alter some parameters, but they cannot
"remember the data" indefinitely. No feedforward net can: it takes a loop.
Imagine an xor circuit with the output fed back into one of the inputs (so
there is only one free input to the circuit). Such a circuit will
"remember" an input pulse on its free line indefinitely, by continuing to
generate output pulses until a new pulse arrives on the input line. It acts
as a counter mod 2. Such circuits can be strung together to count any power
of 2. Let's try a little ASCII art (I hope you have a monospaced font).
+1 +1
---I_1(t)---+------>(1)-------->(1)-----+--O(t+2)-->
| ^ ^ |
| -1 | -1 | |
| | | |
+-->-----------------+ | V
| | | +1 | |
^ +-------(1)----->----+ |
| |
+--------------<------------------------+
I_1(t) is the free input, which at any time t is in state pulse/no pulse.
O(t+2) is the output. Just to keep things simple, time here is discrete. The
output is again pulse/no pulse.
(1) denotes a node with a fixed firing threshold (of weighted sum of inputs).
+/-1 denote input weights.
So this circuit (unless I messed up) does what I said:
-------------------------------------
t      | 1  2  3  4  5  6  7  8  ...
-------+-----------------------------
I_1(t) | 0  0  0  1  0  0  0  0  ...
-------+-----------------------------
O(t)   | 0  0  0  0  0  1  1  1  ...
-------------------------------------
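For anyone who would rather run it than trace it by hand, here is a minimal Python sketch of the same idea. It is not the exact wiring drawn above: it builds an xor from three threshold units (threshold 1, weights +1/-1), feeds the output back as one input, and collapses the two-tick delay into one update per time step, so the output shows up in the same step as the pulse rather than at t+2. The names threshold_unit, xor_step and run_latch are made up for illustration.

```python
def threshold_unit(weighted_inputs, threshold=1):
    """Fire (1) if the weighted sum of inputs reaches the threshold, else 0."""
    return 1 if sum(w * x for w, x in weighted_inputs) >= threshold else 0


def xor_step(pulse, fed_back):
    """One xor built from three threshold units with +1/-1 weights."""
    only_input = threshold_unit([(+1, pulse), (-1, fed_back)])
    only_feedback = threshold_unit([(+1, fed_back), (-1, pulse)])
    return threshold_unit([(+1, only_input), (+1, only_feedback)])


def run_latch(pulses):
    """Feed the xor output back into itself; return the output at each step."""
    state = 0  # the value travelling around the loop
    outputs = []
    for pulse in pulses:
        state = xor_step(pulse, state)
        outputs.append(state)
    return outputs
```

Under these assumptions run_latch([0, 0, 0, 1, 0, 0, 0, 0]) gives [0, 0, 0, 1, 1, 1, 1, 1]: the single pulse is held by the loop, and a second pulse would flip the output back to 0.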
Your nets can't do that. Is this important? I think it means: 1) they can't
count (except up to an arbitrary fixed value that depends on the size of the
nets)* 2) they cannot do arithmetic (except within arbitrary fixed bounds
that depend on the size of the nets). It is not just that your nets cannot
learn to do these things - they cannot do them period. If you think they
can, then prove me wrong. Don't train them to do arithmetic - just show a
configuration, however achieved, that can add a set of arbitrarily many
summands. It's ok if your net uses scratch paper to do it. After all,
people cannot do arbitrary sums in their heads, but they can do them with
pencil and paper.
*Caveat: your nets can, of course, count pulses by simply forwarding them.
What they cannot do (assuming they had learned to read) is execute a
command like "count from one to one thousand two hundred fifty four".
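The remark that mod-2 counters can be strung together to count any power of 2 can be sketched in a few lines of Python. This is only an illustration under my own assumptions: each stage is treated as one bit of looped state that toggles when a carry reaches it, and count_pulses is a hypothetical name. The point it makes is the one above: the count wraps at 2**stages, a bound fixed by the size of the circuit.

```python
def count_pulses(pulses, stages):
    """Count input pulses modulo 2**stages with a chain of toggle stages."""
    bits = [0] * stages  # bits[0] is the least significant stage
    for pulse in pulses:
        carry = pulse
        for i in range(stages):
            old = bits[i]
            bits[i] = old ^ carry  # a stage toggles when a carry arrives
            carry = old & carry    # and passes a carry on if it was already set
    return sum(bit << i for i, bit in enumerate(bits))
```

With three stages, five pulses read back as 5, but nine pulses read back as 1, since the chain can only count mod 8.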
The real limits run deeper. It's, again, not just your nets, but all
feedforward nets. They cannot do recursion. Again, it takes a loop. This
means these nets cannot achieve one of the fundamental hallmarks of human
intelligence: the ability to compose a "complex" solution out of simple
building blocks.
This is a serious, and I would claim fatal, flaw. Your nets (and all
feedforward nets) are denied the resources of discrete combinatorial
generating systems. DNA (in its milieu of promoter and inhibitor
polymerases) is a discrete combinatorial generating system. Language is a
discrete combinatorial generating system. So is the immune system. Such
systems achieve arbitrary complexity (sophistication) via nesting of
primitives into a combinatorial explosion of hierarchies. Nets with loops
can do that. Your nets cannot.
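A rough way to see the difference in code, offered as an analogy rather than a model of anyone's nets: summing a nested list. The recursive version handles whatever nesting it is given (up to the practical limits of the machine); the fixed-depth version stands in for a loop-free pipeline, where each "layer" can peel off one level of nesting and the number of layers is fixed in advance. Both function names are hypothetical.

```python
def evaluate(expr):
    """Recursively sum a nested list of ints such as [1, [2, [3, 4]]]."""
    if isinstance(expr, int):
        return expr
    return sum(evaluate(part) for part in expr)


def evaluate_fixed_depth(expr, layers):
    """Loop-free stand-in: each layer flattens one nesting level, no more."""
    items = [expr]
    for _ in range(layers):  # the unrolled, fixed stack of layers
        items = [part
                 for item in items
                 for part in (item if isinstance(item, list) else [item])]
    if any(isinstance(item, list) for item in items):
        raise ValueError("nesting deeper than the fixed number of layers")
    return sum(items)
```

For [1, [2, [3, 4]]] the recursive version returns 10; the fixed-depth version also returns 10 with three layers, but raises with two.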
Again, none of this bears at all on RL as a learning scheme - it is about
limits inherent in the architecture, quite apart from any learning scheme.
Time for some more mundane observations. Your nets are regular, loop-free,
and have a tiny fan-out (factor of 2). Biological nets are relatively
irregular, have loops over a wide range of lengths (cycles from 2 to many
thousands), and have fan-outs typically in the range of 1,000 to 10,000. The
biological architecture is quite probably no accident.
One consequence of the regular 2-fold fan-out of your nets is that widely
separated inputs can have no influence on each other at all until
correspondingly deep layers in the net are reached (it takes that many
layers for pulses to scoot over till they converge on a common child
node). This is another severe limit on the "logical depth", the ability to
form non-trivial hierarchies, of your nets. Simple correlations of widely
spaced inputs cannot be detected till late in the game. This is not only a
damper on nested hierarchies, it also adversely affects reaction times.
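To put a number on "correspondingly deep", here is a small Python sketch. It assumes a particular regular layout that I am making up for illustration (node n in one layer feeds nodes n, n+1, ..., n+fan_out-1 in the next), which may not match the real wiring; first_common_layer is a hypothetical name. It returns how many layers down you must go before two inputs first share a descendant.

```python
def first_common_layer(i, j, fan_out=2):
    """Layers needed before inputs i and j first reach a common node.

    Assumed layout: node n in one layer feeds nodes n .. n+fan_out-1
    in the next layer.
    """
    reach_i, reach_j = {i}, {j}
    depth = 0
    while not (reach_i & reach_j):
        reach_i = {n + k for n in reach_i for k in range(fan_out)}
        reach_j = {n + k for n in reach_j for k in range(fan_out)}
        depth += 1
    return depth
```

Under that assumption, inputs 5 positions apart need 5 layers to meet with a fan-out of 2, but only 1 layer with a fan-out of 1,000.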
Biological neural nets do not have that problem. They all (all the ones that
have been checked, from C. elegans to Homo sapiens) have an organization
Watts and Strogatz have dubbed "small-world networks". In brief what that
means is that while most neural connections are relatively short range,
linking close neighbors, some are medium range and some are long range,
linking regions far and wide in a single hop. It's the "six degrees of
separation" idea - the idea that you can pick at random any two people on
Earth and on average they are connected by a chain of mutual
acquaintances of at most 5 hops. That's why they call it a "small world"
net. Most of your acquaintances are local - neighbors. But maybe you know
one or two out of towners. Maybe someone you know has a friend in Milano.
And someone she knows has a friend in Beijing... Small worlds. So the
quantitative study of that sort of thing looks at probability distributions
over a number of quantities - average node degree (fan-out), clustering
(fraction of links to close neighbors), dispersion (probability of links as
a function of distance), and path-lengths (number of hops in the shortest
path between nodes chosen at random). It just takes a (relatively) few mid
and long-range links to sharply reduce average path lengths. Small worlds.
All biological neural nets do that. Yours don't. In fact, even the
metabolic, regulatory, and signal-transduction nets in single cells do
that. Yours don't.
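The claim that a few long-range links sharply cut average path length is easy to check on a toy graph. The Python sketch below is my own illustration, not the Watts and Strogatz procedure (they rewire links at random): it builds a ring where each node knows only its k nearest neighbors on each side, then adds a handful of fixed links straight across the ring and compares average shortest-path lengths by breadth-first search. All names and the parameter values are assumptions.

```python
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors on each side."""
    return {v: {(v + d) % n for d in range(-k, k + 1) if d != 0}
            for v in range(n)}


def add_link(graph, a, b):
    """Add an undirected link between nodes a and b."""
    graph[a].add(b)
    graph[b].add(a)


def average_path_length(graph):
    """Mean number of hops on the shortest path, over all reachable pairs."""
    total = pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # plain breadth-first search from source
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def compare(n=200, k=2, step=20):
    """Average path length without, and with, a few cross-ring links."""
    local_only = ring_lattice(n, k)
    with_shortcuts = ring_lattice(n, k)
    for a in range(0, n // 2, step):
        add_link(with_shortcuts, a, a + n // 2)  # one long-range hop
    return average_path_length(local_only), average_path_length(with_shortcuts)
```

Calling compare() returns the two averages for a 200-node ring; the second, with only five long-range links added to 400 local ones, comes out smaller than the first.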
So, as for that one brightening pixel in the eye, pretty much the whole
cortical world has felt the impact in 4 or 5 post-retinal clicks. Just some
stuff to think about...
-- Michael