Re: Mechanical Dualism versus Naturalized Epistemology



Michael Olea <oleaj@xxxxxxxxxxxxx> wrote:
> Curt Welch wrote:

> > Yeah, there's my ideas that stop at the hardware I now have working,
> > then there's that great divide between what I have working, and full
> > human intelligent behavior that's just my best guess.
>
> Forget, for the moment, the gap between what you have working and "full
> human intelligent behavior" and, just for purposes of illustration, focus
> on the gap between what you have working and a machine that learns to
> play a decent game of tic-tac-toe. My understanding of what you have
> working now is the ability to route a pulse from input line i to output
> line j. Is there more to it than that?

Well, yes and no. The only thing I've really tried to train my current
network to do is the simple routing problem. But what I really have
is the structure of a network that addresses problems I couldn't find
answers to in any other structure. So I'm mostly excited about my
perceived potential in this structure because of what it does compared to
all the past machine structures I've played with.

Finding the right way to approach a problem is often the hardest part of
finding the solution. To me, this seems to be the answer to how the
problem of AI can be approached and solved.

> We all "just don't understand the path" you see, so why not illuminate
> that path by describing how it could lead from an i-to-j pulse router to
> a tic-tac-toe learner? It is, of course, dead simple to write a program
> that plays perfect tic-tac-toe (a small LUT will do), but that is not the
> task here - the goal is a machine, built out of your stuff, that *learns*
> to play tic-tac-toe. I'm not asking you to build it, just to describe how
> to get from what you have to that simple goal by following that shining
> path the rest of us don't see.

Yeah, well, that's a good question. But not one that's easy to answer.
Tic tac toe is not a game that's easy for a human to learn. How many
_YEARS_ of learning does it take before you can train a human to play tic
tac toe? 5? And we have a network with 100 billion nodes. Monkeys are
smart, but can you teach one to play tic tac toe? (I don't know the answer
to that).

In other words, my network is at such a low level, but is attempting to do
what I think the brain is doing, that even a game as simple as TTT is not
what this network is optimized to do.

However, I can still basically talk to the question.

When humans play a game like ttt, we follow a reasoning process of
picking a move and then looking at the possible counter moves and maybe
the counter moves to the counter moves etc. Now, all that tree search is
trivial to hard-code into a computer, but getting it to learn to play ttt
like a human requires that it first learn to reason in general. And that's
part of why it takes years before a human can learn to play a game like
ttt. And that's what I would hope this type of network could learn to do.
But understanding the complexity that might be connected with getting it to
learn to reason in general, is way beyond the level of problems I've been
looking at.

The approach that I'm trying to implement with this network is a general
purpose learning machine that learns context sensitive behaviors. That is,
all behaviors are learned as a simple answer to what the machine needs to
do in the current context. So to learn to walk to the kitchen, or play
TTT, it needs a large set of learned reactions to the current environment,
which means not just the current sensory inputs, but the recent past inputs
as well (stretching back to what we consider the limits of our short term
memory).

Now, for just reacting to the state of the external environment, you just
need to react to the sensory inputs. But, to generate a fixed behavior,
like reaching out the arm, which is independent of the environment, the
system needs to sense its outputs (as I discussed in the last long reply
to you).

The next step beyond that however is the internal "thinking" we can do.
That is, the thoughts we have that are not associated with our normal
sensory inputs, or our muscle outputs. To create a learning machine that
can do that, the machine must have outputs which are not used to control
external devices, but which do include the feedback loop. At least that's
the best answer I currently have to the correct way to build that power
into the network. But I've got real concerns about whether it's really
going to be that simple because if it's not connected to a real output, how
will it learn to correctly use these outputs? I've got a few thoughts on
that issue but the point is, this is 3 steps beyond what I think needs to
be done first. First is to build the simple sensory reaction machine which
learns through reinforcement. My current network is the technology I'm doing
that with. Once I better understand the power of that and get the best
algorithms I can for learning with the simple feed-forward problem, I'll
start to explore more with the feedback network where it can learn fixed
behaviors. That alone I believe will create behavior in robots that look
very much alive - cat/dog like for example. But, to get full human power
of reasoning, we have to add the internal feedback loops that allow it to
explore behaviors/memories without having to act on the ideas. This allows
it to explore possible actions before taking the action - to reason.

In order to play TTT like humans play TTT (which would be the goal of this
research), I've got to get the feed-forward network learning better, get
the action generation feedback learning, and get reasoning working. And if
it's as powerful as a human, I'll have to teach it for 5 years before I'll
be able to teach it how to play TTT.

So, even though TTT alone is simple, doing it like we do, which is what I'm
attempting to build here, is not simple.

But, there should be shortcuts that would allow a much smaller network to
at least learn to play TTT if it doesn't have to first learn to control a
robot body for example and learn to draw an X and an O.

So the question becomes, could you get only the feed-forward version of my
network to learn to play TTT? Maybe it could.

But just to be clear, my network is an attempt to build a real time
reaction machine, and games like TTT are not real time games. So a lot of
the complexity and power of the real time learning and reaction system are
just wasted (and even get in the way) when it tries to learn a simple
non-real time game. That is, my network has features that allow it to learn
when to make moves, and not just which moves to make. And it's going to be
exploring that space as it tries to learn to play TTT. It might for example
think it moved too quickly, and next time, try moving a little slower. It
has to learn to ignore the timing to play a game like TTT which makes it
harder to learn than some other problems. Also, it's built to react to
past behavior. And TTT is not a game where the past moves are important.
Only the current board state is important. So that's more stuff that it's
going to have to learn to ignore as it learns to play the game.

But, it should be able to learn it if the network has the type of power I
want it to have.

I could for example give the simple feed-forward network multiple inputs to
constantly feed the board state to the network (maybe two per square where
one input will be active if there's an X in that square, and another input
will be active if there's an O in the square, and neither will be active if
it's empty). Then it will have multiple outputs which control its choice
of which square to play next. This network creates "fuzzy" behavior. So
it's not expected to route every pulse to the exact correct place. It's
expected to learn to route more pulses to the correct place, at the correct
time, in order to produce the correct behavior. So, the output system
needs to adjust for that. You can't expect the network to send one pulse
to the correct output to show where it wants to move. You have to do
something like send 1000 pulses to the network, and then see which output
had the largest number of pulses, and consider that output the network's
choice for the next move.
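
A minimal sketch of that input/output arrangement, in Python. The two
inputs per square and the count-the-pulses readout come from the
description above; the names encode_board, choose_move and noisy_router are
made up for illustration, and noisy_router is only a stand-in for a trained
network, not the network itself.

```python
import random
from collections import Counter

def encode_board(board):
    """Map a board string ('X', 'O' or ' ' per square) to on/off input lines.

    Input 2*i is active for an X in square i, input 2*i + 1 for an O,
    and neither is active for an empty square.
    """
    inputs = []
    for square in board:
        inputs.append(1 if square == "X" else 0)
        inputs.append(1 if square == "O" else 0)
    return inputs

def choose_move(route_pulse, board, n_pulses=1000):
    """Send n_pulses through the router and return the most-hit output."""
    inputs = encode_board(board)
    hits = Counter(route_pulse(inputs) for _ in range(n_pulses))
    return hits.most_common(1)[0][0]

# Hypothetical stand-in for a trained network: a fuzzy router that sends
# about 60% of pulses to square 4 and scatters the rest at random.
def noisy_router(inputs, rng=random.Random(0)):
    return 4 if rng.random() < 0.6 else rng.randrange(9)

move = choose_move(noisy_router, "X O      ")
print("chosen square:", move)
```

No single pulse is trusted here. Only the tally over many pulses is read as
the move, which is what lets a fuzzy router still give a definite answer.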

Now, the hardest way to teach it to play TTT in this configuration would be
to reward it only when it won a game. And I'd like to think the network
would have the power to learn it that way. But if you want it to learn
that a tie is better than a loss, you would need to reward a tie as well
and/or punish a loss. And to help it learn not to make invalid moves
faster, you should punish invalid moves as well.
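
As a rough sketch, that reward schedule could be written down like this.
Only the ordering (win better than tie, tie better than loss, invalid moves
punished) comes from the text above; the numeric values and the function
names are assumptions picked for illustration.

```python
# Hypothetical reward values; only their ordering comes from the text.
REWARDS = {"win": 1.0, "tie": 0.5, "loss": -1.0, "invalid": -1.0}

def is_valid_move(board, square):
    """A move is valid if it names an empty square on the 9-square board."""
    return 0 <= square < 9 and board[square] == " "

def reward_for(board, square, outcome=None):
    """Return the reinforcement signal for one move.

    outcome is None while the game is still in progress, otherwise
    one of 'win', 'tie' or 'loss' from the learner's point of view.
    """
    if not is_valid_move(board, square):
        return REWARDS["invalid"]
    if outcome is None:
        return 0.0
    return REWARDS[outcome]
```

Setting the tie and invalid-move entries to zero gives the harder
win-only version of the training problem.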

The end result, if this worked, would be a feed-forward network which
simply told you the best next move for any board position by sending more
pulses to the output which indicated the best move. You would feed the
network maybe 100 or 1000 pulses on each input according to the current
board position, and the network would learn to route pulses to the output
which was the best answer for that combination of inputs. You would have
to play the games in the correct sequence so it could correctly pass reward
prediction backwards in time to previous moves and board positions.

> > But for all the ideas I have that fill that great gap, I have lots of
> > supporting evidence that the ideas are not just some fantasy. But like
> > most people talking about their ideas, the evidence is mostly just
> > speculation.
>
> Speculation and evidence are different things. But even speculation, if
> it is worthy of attention, has to be based on some rationale. If there is
> a rationale supporting your speculations I missed it.

:)

> >> Your nodes may be simple, although not as simple
> >> as a logic gate, but they don't actually do anything
> >> much apparently because you haven't discovered
> >> the "right" topology yet.
>
> > That's just not true....
>
> I think "they don't actually do anything" here means something like: no
> network of your nodes has ever solved any problem in AI, any problem not
> easily solved by standard, deterministic algorithms, no problem currently
> "solved" by heuristics alone, or not solved at all. You've routed a pulse
> at input i to output j - so what?

Yeah, exactly. It does nothing that anyone but me seems to see as useful.

> >...The nodes as they are, do everything I've been
> > looking for for something like 20 years and could never figure out how
> > to do.
>
> And what is it that is significant about this "everything"?

It's a real time network in that all its decisions are time based. This
allows it to create timed reactions. Where this is natural and inherent with
this type of network, my old spatial value-calculating networks couldn't do
it at all and I was always planning on adding extra systems (memory with
feedback) to deal with the time problem. This network solves it by doing
all work in the time domain alone - so one solution solves both the spatial
and time based dimensions of the problem. It's because each node has a
little built in memory (the memory of when the last pulse went through).

The simple pulse routing paradigm solves the activity regulation problem I
had to build into my other networks with complex training systems that
controlled some form of mutual inhibition system to keep signals from
either dying out or flooding the network.

The binary switching problem created by the pulse routing paradigm creates
a fundamental behavior that's a perfect fit to the reinforcement learning
system. It's simple and obvious what needs to be done in order to reward
or punish a routing decision. And at the same time, the time based
information the decision is based on is easily adjusted in slow increments
to slowly change the behavior of the network. It's just a natural fit to
the problem. If you try to adjust standard neural networks with weighted
inputs, it's not at all obvious how the weights should be adjusted when
there are multiple weights per node.

Standard multiple input neural networks are computationally intensive. If
one node produces an output, you have to re-evaluate the effect it has on
every downstream node it fans-out to. And if any of those change their
output as a result, you have to re-evaluate each of their downstream nodes
etc. With the pulse routing paradigm, the pulse only goes to one
downstream node at each level so there's no exponentially exploding effect
for each node. If you double the number of nodes, the computation
only increases by the log2 of the nodes added (depending on the exact
topology and how they are added). At worst, you double the computation
when you double the network size.
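
To put rough numbers on that, here is a small sketch comparing the work
done per pulse with the work done per forward pass in a fully connected
net. The balanced binary tree is an assumed topology chosen to make the
log2 case concrete, and the dense cost is the usual width*width*depth count
of weight evaluations; neither is a measurement of any real network.

```python
import math

def pulse_routing_cost(n_outputs):
    """Nodes visited by one pulse in a balanced binary routing tree."""
    return math.ceil(math.log2(n_outputs))

def dense_layer_cost(width, depth):
    """Weight evaluations for one pass through fully connected layers."""
    return width * width * depth

for n in (8, 16, 32, 64):
    print(n, pulse_routing_cost(n), dense_layer_cost(n, 3))
```

Doubling the tree adds one node visit per pulse, while doubling the width
of the dense layers multiplies the per-pass work by four.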

The design I was playing with before this last change had two networks -
the input processing side which had the purpose of decoding and/or feature
extraction (an unsupervised learning technique) and the output side which
was trained to take all the normalized features and generate output
functions based on reinforcement learning. That design had a few issues.
One was that the feature extraction system was not trained by reinforcement
learning, so it was limited to working with the features the system found
by default and couldn't adjust the features based on need. Second, it
created a computationally intense learning problem trying to associate all the
features with all the possible outputs - and it had exponential growth
issues when you tried to increase the size of the network. This current
design merged the two processes - feature extraction and behavior training
- into a single process that happens at the node level for every node.
This not only solved the exponential training problem, but it also allowed
the feature extraction to be trained based on usefulness.

These are the type of things I've been trying to figure out how to implement
for years which all came together with one simple solution when I found out
what happens when you switch to this temporal pulse sorting paradigm.

> > The problem is that no one besides me, sees any real value in the
> > features I was looking for.
>
> That is not a problem at all, if you really have something - it is an
> opportunity.

Yes, it's an opportunity. But it's a problem for me because a big reason I
play with this AI puzzle is because I want to gloat when I find an answer.
So now I feel like I've made the largest breakthrough in all my work and I
can't find a single person to impress with the work. :) It's just a big
let down which causes me to whine a lot. :)

> > My excitement over what I have is based on
> > what
> > I actually have working already, which you have seen with your own eyes.
>
> You can't describe it?

I've described it probably 50 times in every way I can think to describe it
in posts here in c.a.p. and in various personal emails to friends and other
people that are willing to listen. Mostly, I think people don't grasp what
I see because they haven't spent the time I've spent trying to solve this
version of the reinforcement learning problem. So before they will believe
the design is useful, they need to see it do something that isn't already
done in other software.

Where I really expect this type network to wake people up is when it gets
put into a robot and the thing creates these fluid purposeful dynamic
motions and learns to do things like walk on its own. It's a real time
learning network designed to solve problems like making a robot walk. It's
not optimized for solving problems like playing TTT.

> > It just doesn't cause you any excitement because you don't understand
> > the path I see ahead of us.
>
> You can't illuminate it?

I keep trying. :) But in the end, I've got to get it doing something
people haven't seen other systems do before anyone is going to believe
there is anything special here. And of course, I need to see that as well
to know if what I think this will do is actually what it will do.

> > I have no proof that my excitement or my vision for the correct path
> > (ie. reinforcement learning), is the correct path. However, no one
> > else has suggested a different path which answers the same questions.
> > mainly, how do you build creativity into a machine? No one has ever
> > suggested a different solution, where what I have is a suggested and
> > logical solution, and a working prototype.
>
> There is a ton of published work on reinforcement learning. How does your
> vision extend in any meaningful way what anybody can look up in Sutton
> and Barto's elementary introduction?

Everything in Sutton and Barto assumes Markov state signals. This is a
common assumption in current reinforcement learning algorithms. This is
where the state signal (sensory inputs) tells the learning algorithm the
true and complete state of the environment. They also assume the state of
the environment is small enough to allow the system to store statistical
information about every possible state. The current position of a
tic-tac-toe board is a Markov state signal because it accurately describes
everything needed to make the next decision. The past history of moves is
not relevant. And to use traditional reinforcement learning techniques on
the game of TTT, you also have to store information about every possible
board position. Which with today's computers, is possible for TTT but not
possible for games with more complexity.
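
To make the "store information about every possible board position" point
concrete, here is a sketch that enumerates every TTT position reachable in
legal play (X moves first, play stops at a win or a full board) and builds
the kind of value table a tabular method would need. The function names
and the 0.5 starting value are assumptions for illustration.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def reachable_states():
    """Collect every board reachable from the empty board in legal play."""
    seen = set()
    stack = [(" " * 9, "X")]
    while stack:
        board, player = stack.pop()
        if board in seen:
            continue
        seen.add(board)
        if winner(board) or " " not in board:
            continue  # game over, no further moves from here
        nxt = "O" if player == "X" else "X"
        for i, sq in enumerate(board):
            if sq == " ":
                stack.append((board[:i] + player + board[i + 1:], nxt))
    return seen

states = reachable_states()
value_table = {s: 0.5 for s in states}   # one stored entry per position
print(len(value_table), "positions in the table")
```

The table comes out at a few thousand entries, which is trivial to store.
The same enumeration is hopeless for a game like backgammon.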

One example talked about in the book is the TD-Gammon backgammon program.
It also works with Markov state signals (aka the current board position
which completely and accurately describes the current state of the
environment), but the state space is too large to store information about
every possible board position. They solved the second problem by using a
neural network in place of the value function which is normally just a
large array in standard reinforcement learning algorithms.

Nothing in Sutton and Barto talks about how you would solve the more
realistic non-Markov case. My network is a solution to the non-Markov case
and when the state space of the environment is far too large to store. The
only interesting AI problems are non-Markov because the sensory inputs never
tell you the full state of the environment. The portion you can sense is
an insignificant fraction of the state of the full environment. With
non-Markov state inputs, you can never assume that the same inputs mean
you should make the same behavior decision. For example, a beastie in a
maze that can sense walls in 4 directions, but which cannot sense its
location in the maze, or "see" the rest of the maze, is a non-Markov
problem. Just because there are walls to the left and right doesn't mean
you should create the same behavior in this state as the last time you saw
walls to the left and right.
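
The maze case can be shown in a few lines. The layout below is made up for
illustration; the only thing taken from the text is that the beastie
senses walls in 4 directions and nothing else. The sketch groups open
cells by what is sensed there and reports the groups with more than one
cell, which are exactly the places the sensors cannot tell apart.

```python
from collections import defaultdict

# Hypothetical maze: '#' is a wall, ' ' is an open cell.
MAZE = [
    "#######",
    "#     #",
    "# ### #",
    "#     #",
    "#######",
]

def observe(maze, row, col):
    """Return wall flags (up, down, left, right) for one open cell."""
    return tuple(
        maze[row + dr][col + dc] == "#"
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
    )

def aliased_cells(maze):
    """Group open cells by sensor reading; keep readings seen in 2+ cells."""
    groups = defaultdict(list)
    for r, line in enumerate(maze):
        for c, ch in enumerate(line):
            if ch == " ":
                groups[observe(maze, r, c)].append((r, c))
    return {obs: cells for obs, cells in groups.items() if len(cells) > 1}

aliased = aliased_cells(MAZE)
for obs, cells in aliased.items():
    print(obs, cells)
```

In this layout the two corridor cells with walls to the left and right sit
on opposite sides of the maze, so the best move at one can be the worst
move at the other even though the sensor reading is identical.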

And, with a high dimension input (32 bits or more instead of 4), you can't
even assume that you will ever see the same sensory input state again and
you can't assume you will get to experience it enough times to learn what to
do in the state.

Solving high dimension non-Markov reinforcement learning problems is way out
of the domain of what Sutton and Barto address in their intro book. But
it's exactly the problem my type of network was designed to solve.

> As to the assertion "no one has ever suggested a different solution" -
> well, Casey gave me some insight into why you say stuff like this - you
> are just thinking out loud. OK.

Yeah, I do that.

> The assertion is of course utter
> nonsense, but I can imagine thinking it and typing it and posting it as
> if it were a fact. It is not. A moment's reflection should suffice.

Tell me what someone has suggested for creating creativity in a machine
which is not reinforcement learning if you think my comment was utter
nonsense. I honestly do not know of any way to do it.

> Reinforcement learning is a standard technique in the AI bag of tricks.
> It is, however, not *fundamental* in the following sense: Bialek, Tishby,
> and Nemenman assert (e.g. in the paper "Predictability, Complexity, and
> Learning") that most learning can be reduced to unsupervised learning of
> probability distributions. It was a passing comment - I wish they had
> fleshed it out. But it is not at all hard to see that RL is in fact just
> a special case of unsupervised learning of probability distributions.

The point of RL is to stress the issue that all learning machines must
have a purpose. One way or another, they all have a choice as to what to
learn. All unsupervised learning systems in fact have a purpose hard-coded
into them, and if you separate the purpose from the rest of the algorithm,
it becomes a reinforcement learning algorithm. Reinforcement learning is
just a formal way of separating the purpose from the learning problem, by
creating one general purpose abstract purpose of maximizing all future
rewards.

Most (maybe all?) learning is a subset of the more general reinforcement
learning framework.

And the general reinforcement learning framework becomes a simple
unsupervised learning problem as long as you understand that the only
probability distribution it is interested in is the probability of the
system receiving rewards.

> This is exactly where your vision - as articulated - breaks down: you
> have no evidence or speculation relating classes of probability
> distributions to effective algorithms. Ask yourself - what does it take
> to make RL work? One word: prediction. Whatever net, whatever scheme you
> choose, has to build a
> map of pairs (stimulus, action) -> expected-reward.

Yes, exactly. That's the RL problem right there. This is what is already
working in my network. The nodes have an output ratio value which is the
way it stores the expected-reward value for the stimulus/action pair. It
does it in relative terms instead of absolute terms because the stimulus
which is being studied is a pulse showing up at the input of the node, and
the action is a binary choice - to send it out the high output or the low
output. So each node has two stimulus/action pairs which are inverse
behaviors of each other. There's no need to predict the absolute
expected-reward because the system doesn't need to know it. It only needs
to know the relative value of expected return produced by the two behavior
pairs. When one action pair is rewarded, the ratio is adjusted to increase
that behavior in the future. And by design of the network, this forces the
other action pair to be punished (reduced in the future). The value of the
ratio at any point in time records the relative value of the two action
pairs. If the ratio stays around 50/50, then both behaviors have equal
value. The more the ratio becomes skewed to one side, the more value the
one behavior has over the other because it's been rewarded more times than
the other. So the value of the ratio is the measure of the value of the
two inverse behaviors.
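
A toy version of such a node, to show the ratio acting as a relative value.
This is a sketch under assumptions, not the actual node code: the class
name RoutingNode, the step parameter and the exact update rule are
invented for illustration, and the timing information the real decision is
based on is left out entirely. What it keeps is the single ratio shared by
two inverse behaviors, so rewarding one necessarily punishes the other.

```python
import random

class RoutingNode:
    """Two-way pulse router whose output ratio doubles as a relative value."""

    def __init__(self, step=0.05, rng=None):
        self.p_high = 0.5          # 50/50 ratio: both behaviors valued equally
        self.step = step           # hypothetical learning-rate parameter
        self.rng = rng or random.Random()
        self.last_choice = None

    def route(self):
        """Send one pulse out the high or low side according to the ratio."""
        self.last_choice = "high" if self.rng.random() < self.p_high else "low"
        return self.last_choice

    def reinforce(self, reward):
        """Shift the ratio toward the last choice if rewarded, away if punished."""
        target = 1.0 if self.last_choice == "high" else 0.0
        if reward < 0:
            target = 1.0 - target
        self.p_high += self.step * abs(reward) * (target - self.p_high)

# Reward pulses sent out the high side, punish pulses sent out the low side.
node = RoutingNode(rng=random.Random(1))
for _ in range(200):
    choice = node.route()
    node.reinforce(1.0 if choice == "high" else -1.0)
print("ratio toward high side:", node.p_high)
```

Because there is only one number per node, there is never a question of
which of several weights to adjust.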

> This in fact means
> learning joint probability distributions. The nature of these
> distributions imposes fundamental limits on the efficacy of *any*
> learning scheme. Rates of convergence. The relation between degrees of
> freedom, number of trials, and (expected) generalization error. All this
> stuff is elaborated in the elegant work of Vapnik and Chervonenkis, for
> example. It is further elaborated in the seminal work of Bialek,
> Nemenman, Tishby, and colleagues. The key phrase is "predictive
> information". Google it. And maybe give that "no one else has yada yada"
> talk a rest till you know what you are talking about.

Well, I admit I am not well read on the work at all. I do this mostly for
the fun of discovery. Studying the work of others is seldom much fun. And
when I say "no one else" it always means "no one else I know of" which
should be obvious because I don't know what everyone else on the planet is
doing or has done. But my comment above was about not knowing of any other
solution to the creativity problem. Do you know of one which is actually
not just a reinforcement learning machine with a different name?

> > Yet, even though I've solved an
> > implementation problem for reinforcement learning which I've never seen
> > any one else describe, no one else besides me seems to see any value in
> > this progress. Maybe because I'm still the only one in the world that
> > believes a reinforcement learning algorithm is going to be the solution
> > to AI?
>
> There is a ton of published work on RL. What have you solved?

And most of it I've never even seen let alone read. But what I've "solved"
is a network design that looks like a very good approach to solving
non-Markov high dimension real time problem spaces. Do you know of
published solutions to these types of reinforcement learning problems? You
do seem to be far more in touch with the community than I am.

> >> Until you can ground your ideas in actual hardware
> >
> > My ideas are grounded in actual hardware. I sent you the code. You
> > just don't see the value in these ideas or my implementation.
>
> Can you articulate wherein that value lies? Not just assert that it has
> value, but demonstrate it?

Read everything I replied to John and what I'm writing here. Maybe that
gives you a bit of insight into what I think I see in this network.

> > My idea that all intelligent human behavior can be explained in terms
> > of reinforcement learning is not some wild idea that only I have. It's
> > the basis of much of what Skinner spent his life writing about. It's
> > nothing new, even if it's not popular.
>
> Failing an actual explanation this is just an idle assertion. Skinner
> gave physiology its due. What do you think are the important differences
> between people, pigeons, and prokaryotes - supposing they are all RL
> machines?

The lower animals have less behavior under the control of RL and more of it
hard coded (instincts). Using strong general learning has the disadvantage
that you are dumb when born and prone to die before learning to survive.
Humans have evolved to the point of having a strong family/society to
care for the young while they learn.

The interesting question is what is so different about us from the next
lower animals that allows us to have these strong language and planning
skills. I think the answer is that we simply have parts of our RL brain
hardware configured in a way that it ends up being available for our
language skills instead of being dominated by other sensory/motor
functions. Basically I suspect we will find that it's mostly just a
topology problem that allows a larger portion of the general RL hardware
(neocortex I assume) to be allocated to the decoding, and production, of
language behaviors.

It's well known that if for example you are born with a bad eye, the portion
of the brain which is normally allocated to processing data from that eye
is quickly taken over for processing data from the good eye. And if the
problem with the bad eye is not fixed early in life, the brain will be
allocated to other functions and will never be available for use for the
bad eye.

I suspect the entire neocortex works this way and the way it's allocated to
different functions is heavily influenced by the data flow through it and
by its physical layout (topology). To get enough brain reserved for
language processing I suspect it's just a matter of something surprisingly
simple like how the brain folds so that the language section in humans ends
up being isolated from other sensory/motor signals and available for
training for the detection and production of complex sound patterns.

> > One big reason it's not more popular, is because no one has figured out
> > how
> > to implement strong reinforcement learning. One step on that path is
> > understanding how to train a multilayer network with reinforcement
> > learning. I sent you working code to show you how my type of node
> > solves that
> > problem. It's a huge step forward and one I've been searching for
> > since the mid 80's.
>
> Lots of people have found ways to train multi-layer nets via RL.

Cool. I've not seen it. Can you point to some or suggest how I would find
it?

> There is
> a ton of published work on RL. What is new and different and noteworthy
> in your work that is not already published?

The little I've seen is like Sutton's work. It's limited to dealing with
problems which have Markov state signals - or ones where the state signal
is very close to being Markov. I know there is work on non-Markov RL
problems, but I just don't know what it is.

> > To not buy into the belief that reinforcement learning is key to AI, is
> > typical. Not many do.
>
> There is a ton of published work on RL.
>
> > To not understand how hard it is to make a
> > multilevel network train by reinforcement alone is just a problem with a
> > lack of experience trying to solve this problem.
>
> There is a ton of published work on RL.
>
> > However, to say my belief is not grounded in actual hardware is just a
> > failure to understand my ideas, not a failure on my part to ground the
> > beliefs in working hardware.
>
> Because you trained a net to map a pulse on input line i to output line j?
> Therefore "human level intelligent behavior" is just a walk down some
> yellow brick road? Get real, Curt.

Cool isn't it. :)

> >> and demonstrate it works I can only assume your
> >> ideas are just "gods" and have no more reality than
> >> the idea that a thermostat has beliefs such as
> >> "It is too hot", "It is too cold", "It is just fine"
> >> unless you explain exactly what that means, that is
> >> how it actually works, in actual hardware.
> >
> > yes, the belief that a network trained by reinforcement learning will
> > lead to full human intelligent behavior is very much just a matter of
> > faith - guided by a large collection of clues.
>
> Name two.

Humans are born dumb and learn all the interesting stuff we think of as
"intelligence". It's not hard wired in them. But they also don't have to
learn by example. We find new things on our own. We are creative in that
we find new behaviors on our own. The only type of learning that explains
creativity is RL. If you know of another, I will most likely be able to
explain to you why it's just RL under a different name.

Behaviorists have been carefully studying how behavior changes in animals
and humans and they figured out 50 years ago how behavior is changed by a
general process of reinforcement.

All the work in reinforcement learning algorithms in computer science, even
when it's limited to the simple case of small, finite Markov state signals,
is consistent with what we understand about reinforcement in animals and
humans. Behaviors slowly change over time as a result of reinforcement.
Events which reward the behavior cause the behavior to be preferred over
other options in the future, and events which punish cause the behavior to
be less likely to be used in the future.

The recursive reward prediction algorithms used in RL explain the secondary
reinforcement effects seen in animals. These effects exist in RL
algorithms because the program is designed not to maximise current rewards,
but instead, to maximise all future rewards. Once you build the algorithm
to maximize all future rewards, you get the effect of secondary
reinforcement seen in animals and humans.
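
The effect shows up in even the smallest example. Below is a sketch of
standard tabular TD(0) value learning on a chain of states where only the
last step is ever rewarded. The chain, the function name td0_chain and the
alpha/gamma settings are assumptions for illustration; the update rule is
the textbook one.

```python
def td0_chain(n_states=5, episodes=200, alpha=0.1, gamma=0.9):
    """Learn state values on a left-to-right chain with reward only at the end."""
    values = [0.0] * (n_states + 1)      # last entry is the terminal state
    for _ in range(episodes):
        for s in range(n_states):
            reward = 1.0 if s == n_states - 1 else 0.0
            # TD(0): move V(s) toward reward plus discounted value of next state
            values[s] += alpha * (reward + gamma * values[s + 1] - values[s])
    return values[:n_states]

values = td0_chain()
for state, value in enumerate(values):
    print(state, round(value, 3))
```

The early states end up with positive value even though nothing was ever
rewarded there directly. They became valuable only because they predict
the later reward, which is the secondary reinforcement effect.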

> >> Formal logic and fuzzy logic have been grounded
> >> in hardware and been shown to work. You must do the
> >> same to gain any credibility.
> >
> > Oh, hell, my networks already work.
>
> What does "work" mean? They route a pulse from input line i to output
> line j?

Yeah, they can be trained by RL to solve some simple RL problems. The
point I was trying to stress to John is that he talks as if the net is
doing nothing when in fact it's doing a lot. He just doesn't see the
things that they are doing as being useful or interesting. I do.

The problem as has been pointed out several times is that the end result of
what I can train it to do is totally uninteresting (routing a pulse from A
to B). But that's not what excites me. What excites me is that it seems
to have all the correct attributes to solve these non-Markov high
dimension real time RL problems. If anyone else has a system which even
attempts to solve this class of problems, I've not yet heard of it. I
would love to know if someone does, which is half of why I've been talking
about my network design here for 2 years. Of all the writing I've done
about it, no one has yet said, "so and so did that 10 years ago, here's the
paper...".

> > They just work in ways which you are
> > unable to see value in. I've tried in countless messages to explain the
> > value, but it doesn't really sink in very far.
>
> What I have seen is countless messages in which you *assert* they have
> value, none in which you *demonstrate* any value.

Yeah, that's because the value I see is its potential. I can't
demonstrate potential to people that don't even understand the class of
problem I'm trying to solve. I have to demonstrate it learning something
which hasn't already been shown to exist in other solutions. But part of
that problem is that I'm working on the class of problems that is so hard
that most people working with RL systems have never even tried to solve it.

For example, there was the recent post here about reinforcement learning for
the robot. As far as I can tell, they reduced the problem to moving around
an 11x11 grid. So the entire state of the world was reduced to the robot's
location in the 11x11 grid. This trivial world only has 121 possible
states, and the robot only has 4 behaviors to pick from in each state, for
a total of 484 state/action pairs in this environment.

With a high-dimension non-Markov problem, the number of state/action pairs
is effectively infinite. You can't use any of the algorithms he used on that
toy problem, which only included 484 state/action pairs, so he was able to
calculate and evolve a complete value function for the entire state/action
space by multiple visits to every state.
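
To put rough numbers on the contrast above: the grid figures below come
straight from the text, while the second case (10 binary sensor lines
whose last 100 time steps all matter) is a purely hypothetical stand-in
for a non-Markov problem, chosen only to show how quickly a table becomes
impossible to fill in. The function names are made up for this sketch.

```python
def tabular_pairs(n_states, n_actions):
    """Number of entries a full state/action value table would need."""
    return n_states * n_actions


def binary_history_states(n_signals, history_len):
    """Distinct histories of n_signals binary inputs over history_len steps.

    Hypothetical stand-in for a non-Markov problem: if the right action
    depends on the whole recent history, each history is its own "state".
    """
    return 2 ** (n_signals * history_len)


grid_states = 11 * 11                       # robot location is the whole state
grid_pairs = tabular_pairs(grid_states, 4)  # 4 moves per location

hist_states = binary_history_states(10, 100)
hist_pairs = tabular_pairs(hist_states, 4)

print("grid world:", grid_states, "states,", grid_pairs, "state/action pairs")
print("history world: about 10^%d state/action pairs" % (len(str(hist_pairs)) - 1))
```

Visiting every entry several times is trivial for the first table and out
of the question for the second.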

> >> Your ideas are only "dirt simple" in the sense that
> >> saying the world is made of "nothing but" atoms is
> >> dirt simple. Doesn't explain much at all. To say
> >> that intelligence is "nothing but" a general purpose
> >> reinforcement learning machine is simple to say but
> >> clearly, as you have found out, not really that simple
> >> when it comes to elaborating in hardware (grounding)
> >> what that might actually mean.
> >
> > I've been saying that for over 20 years. 20 years ago, I had many
> > different single layer networks working just fine. But without a
> > system that had the power to train hidden middle layers, the networks
> > had no chance of building the type of complex abstractions required to
> > explain the powers of recognition and behavior humans have. Just finding
> > that one next step has been huge for me. My excitement is based on
> > the fact that I've found, and I have working hardware, that shows how
> > to train middle layers with reinforcement learning.
>
> Back-prop also trains middle layer neurons. Back-prop can be driven by RL
> - does that mean back-prop is "the key"?

Back-prop requires a trainer which knows the correct answer. There is no
creativity in a learning system that uses back-prop (alone) because it can
never find answers which were not shown to it by some external intelligent
agent. All it can do is make good guesses at answers to problems for which
it has not yet been shown an answer.

> > You talk about the changes in topology we have discussed. You talk as
> > if the net doesn't work without it. But that's just not true. It
> > works fine in the version I sent you.
>
> By "working fine" do you mean anything more than it has learned to route
> a pulse on input line i to output line j?

That's what I mean.

> > Its powers are limited, but it still trains
> > multilayer networks just fine.
>
> By "just fine" do you mean anything more than it has learned to route a
> pulse on input line i to output line j?

Well, it also can learn to route any middle-layer signal it created to the
output as well. So yeah, it's doing more than just the routing.

Another important part of what the net is doing is feature extraction, which
happens before it even starts doing the RL. That as well is working, and
it's not something I've talked much about because no one asks. It's
the answer to how you deal with an effectively infinite state space about
which your sensory signals tell you almost nothing. The answer is that you
automatically divide it up into as many equal-probability features as your
network size will allow. Every internal signal in my network is a unique
temporal feature which has been extracted from the sensory signals. In
order to maximize the information represented in the default extracted
feature set, the network balances the probability of nodes being sorted
left or right (each node defaults to a sorting ratio of 50/50).
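
One way a single node could hold itself at a 50/50 sorting ratio is
sketched below. This is an assumption made for illustration, not the
update rule actually used in the network described here: the class name
BalancedSortNode, the threshold-nudging rule, the step size and the
exponentially distributed pulse spacing are all made up. The node sorts
each pulse by the spacing since the previous one and nudges its threshold
against whichever side it just chose, so the threshold drifts toward the
median spacing.

```python
import random


class BalancedSortNode:
    """Hypothetical two-way pulse sorter that drifts toward a 50/50 split."""

    def __init__(self, threshold=1.0, step=0.01):
        self.threshold = threshold  # spacing (seconds) separating left/right
        self.step = step
        self.left = 0
        self.right = 0

    def sort(self, interval):
        # Short spacings go left, long spacings go right. Every decision
        # nudges the threshold against the side just chosen, so it settles
        # near the median spacing, where both outputs are equally likely.
        if interval < self.threshold:
            self.left += 1
            self.threshold -= self.step
            return "left"
        self.right += 1
        self.threshold += self.step
        return "right"


rng = random.Random(0)  # fixed seed so the run is repeatable
node = BalancedSortNode()
for _ in range(20000):
    node.sort(rng.expovariate(1.0))  # assumed pulse spacing, mean 1 second

left_fraction = node.left / (node.left + node.right)
print(f"fraction sorted left: {left_fraction:.3f}")
```

A binary output that fires each way half the time carries the most
information one two-way decision can carry, which is the reason given
above for balancing the nodes.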

The features span back in time under the control of the network topology.
Each node is classifying features based only on the spacing of the last
pulse. So the short term memory of each node is limited by how long it's
been since a pulse was last sorted. But if you create a network topology
which fans out (say 10 input signals fan out to a 1000-node-wide network),
then the pulse frequency per node drops by the same factor (100). So if the
average pulse frequency is 10 per second as it enters the net, then the
initial nodes have an average short term memory of only 1/10 of a second.
But as the network fans out, and the average frequency per node drops by a
factor of 100, the short term memory increases by a factor of 100, and you
get an effective average short term memory of 10 seconds for each of those
features extracted from the sensory stream. I.e., it "remembers" that it
saw some feature in the sensory stream 10 seconds ago. And it uses that
"memory" to make node routing decisions (i.e., to select behaviors). So
you control the short term memory profile of this type of net simply by
controlling its topology, and controlling the density of the encoding of
the signals sent to the network.
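
The arithmetic above, worked through as a small sketch. It assumes that
pulses are neither created nor destroyed inside the net and that they
spread evenly across a layer, which is what "drops by the same factor"
implies; the function name and its parameters are made up for this
example.

```python
def short_term_memory_seconds(input_rate_hz, n_inputs, layer_width):
    """Average time between pulses at one node of a layer.

    Assumes pulses are conserved and spread evenly, so the per-node pulse
    rate falls by the fan-out factor (layer_width / n_inputs). The average
    gap between pulses is treated as the node's short term memory.
    """
    fan_out = layer_width / n_inputs
    per_node_rate_hz = input_rate_hz / fan_out
    return 1.0 / per_node_rate_hz


entry_memory = short_term_memory_seconds(10.0, n_inputs=10, layer_width=10)
wide_memory = short_term_memory_seconds(10.0, n_inputs=10, layer_width=1000)

print(f"entry layer: {entry_memory:.1f} s")   # 10 pulses/s per node
print(f"1000-node layer: {wide_memory:.1f} s")  # 0.1 pulses/s per node
```

Changing layer_width (topology) or input_rate_hz (encoding density) is
all it takes to move the memory span, which matches the two controls named
in the paragraph above.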

All that is "working" as well and has been for years.

> > The improved topology is only to continue
> > to improve its powers. And I've sent you a design for improved
> > topology which I'm sure gets over the simple problem of not allowing
> > signal paths to cross. So even the known issues with the chicken-wire
> > topology have already been solved and I've given you the hardware design.
> >
> > The progress here is not in question. My ideas are grounded in working
> > hardware. What's in question, is 1) is this a useful path to follow to
> > reach full human intelligent behavior, and 2) how far will the path
> > lead.
> >
> > I've got lots of supporting evidence that reinforcement learning is not
> > only a useful path, but the only path that has any hope.
>
> Speculation and raw assertions are not evidence. Do you have any actual
> evidence?

What type of learning produces creativity other than RL?

> RL is just a special case of unsupervised learning of probability
> distributions.

All unsupervised learning is just a special case of RL with a hard coded
purpose. If you think not, pick one and I'll show you how to relabel the
algorithm to show you how it's an RL system in sheep's clothing.

> The possibilities are not a matter of "hope". There are
> strong theorems relating the complexity of distributions to the
> convergence of learning algorithms.

I've seen lots of strong theorems for Markov problems. I've seen none for
non-Markov problems.

> > But how far the
> > path of reinforcement learning will take us will be unknown until it
> > takes us there.
>
> You might be interested in "Universal Artificial Intelligence", which
> proves some results about RL:

Yeah, that looks like it has potential. I can't tell from the abstract if
it addresses how one goes about solving non-Markov problems but it sounds
like it might apply. The problem with most of these attempts at formal
approaches is that they can't even touch the non-Markov problem so they
don't even try. They start off by defining "any environment" to be "a
Markov environment".

To solve the non-Markov problem, you have to find a system to map it into
easy-to-solve Markov problems. And that's what my network is doing. It's
using a society of simple 2-behavior nodes to map an infinite state space
into the state space defined by the network (2 states per node times the
number of nodes in the network). I don't yet know how well this mapping
really works in terms of solving and converging on interesting problems
from this hard problem set. That's the further research that needs to be
done. But just the fact that it does create what looks like a workable
mapping is part of my excitement.

I'll have to try and check out that book...

> http://www.idsia.ch/~marcus/ai/uaibook.htm
>
> http://www.reviews.com/review/review_review.cfm?review_id=131175
>
> "This strictly mathematical and information-theoretic book seeks to solve
> the quest for the optimal and universal algorithm of intelligent behavior
> and, to the extent that I have been able to verify and check this bold
> attempt, succeeds.
>
> The book basically derives this result by cleverly unifying two
> well-known but different realms, sequential decision theory and
> Solomonoff's theory of universal induction, to arrive at a parameter free
> model of a certain universal agent that, as is argued, behaves optimally
> in any environment."
>
> > And how far my current network design can take us, and how many
> > more changes to the design will be needed until we reach the end of the
> > path, is unknown. But both the definition of the path, and all my
> > ideas, are backed by a simple and working grounding in hardware.
>
> But, as far as anyone can tell, "your hardware design
> doesn't actually do anything worth talking about".

Yeah, so far, I've found no one that understands what I'm doing and why
it's interesting to me. All the stuff I explained above is stuff I've
explained in the past here and got no response or discussion about it. And
for the most part, no feedback that left me believing anyone understood the
significance of it.

--
Curt Welch http://CurtWelch.Com/
curt@xxxxxxxx http://NewsReader.Com/