Re: Gradual Learning, not Reinforcement Learning
- From: "feedbackdroid" <feedbackdroid@xxxxxxxxx>
- Date: 17 Jul 2006 08:48:47 -0700
Curt Welch wrote:
"feedbackdroid" <feedbackdroid@xxxxxxxxx> wrote:
Can something be
reinforced that you've never seen or done before?
Of course it can. A good reinforcement learning machine is shaping classes
of behaviors as it learns. It's not learning specific reactions. It does
it by shaping the operation of a classifier as it learns. All possible
stimulus inputs then are guaranteed to fall into some class so that the
system will always have an "answer" as to how to respond. The answer will
be based on the reinforcement learning system's evaluation of what class the
current situation falls into, and on the system's past experience with other
events that might have been different, but yet fall into the same
classifications.
I think you're mixing things here. If your trained machine receives an
input it has "never" seen before, and which is adequately far removed
from the centroid of your training set, it will produce "some" response,
but it's unlikely to produce the "correct" response. OTOH, if the "novel"
input is adequately close to one of your training set prototypes, then it
really isn't novel.
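To make the distinction above concrete, here is a minimal sketch of a nearest-prototype lookup that also reports whether the input lies farther than some radius from every prototype. The prototype vectors, the labels, and the `novelty_radius` threshold are all made up for illustration; nothing here comes from a particular system discussed in this thread.

```python
import math

def nearest_prototype(x, prototypes, novelty_radius):
    """Return (label, distance, is_novel) for input x.

    prototypes: dict mapping label -> prototype vector (hypothetical data).
    novelty_radius: assumed threshold beyond which x is treated as novel.
    """
    best_label, best_dist = None, math.inf
    for label, p in prototypes.items():
        d = math.dist(x, p)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    # The machine always yields "some" label; the flag says how far to trust it.
    return best_label, best_dist, best_dist > novelty_radius

prototypes = {"advance": (1.0, 0.0), "retreat": (-1.0, 0.0)}
near = nearest_prototype((0.9, 0.1), prototypes, novelty_radius=0.5)
far = nearest_prototype((0.2, 8.0), prototypes, novelty_radius=0.5)
```

Both calls return a label, but only `far` is flagged as novel: it still gets "some" response, with no particular reason to expect that response to be correct.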
You can see one implementation of this in action in TD-Gammon. Each move
which gets reinforced shapes the weights of the neural network which causes
many other similar moves to be reinforced at the same time. It doesn't
have to see every move, to be able to make a good "guess" at how to respond
to a move. It has a good (for Backgammon) system for classifying moves
into response classes so that it can successfully merge its learning from
other moves, to make a good guess at how to play a position it has never
seen before.
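The generalization effect described above can be sketched with a much simpler stand-in than TD-Gammon itself. The code below assumes a linear value function and a single TD(0) update rather than the neural network used in TD-Gammon, and the feature vectors are hypothetical. It only shows that reinforcing one position changes the estimate for a different position that shares features with it.

```python
def value(w, phi):
    """Linear value estimate: dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, phi))

def td0_update(w, phi_s, reward, phi_next, alpha=0.1, gamma=1.0):
    """One TD(0) step for a linear value function; returns new weights."""
    delta = reward + gamma * value(w, phi_next) - value(w, phi_s)
    return [wi + alpha * delta * fi for wi, fi in zip(w, phi_s)]

# Hypothetical features: the unseen position shares two of its three
# features with the position that actually gets reinforced.
phi_seen = [1.0, 1.0, 0.0]
phi_unseen = [1.0, 1.0, 1.0]
phi_terminal = [0.0, 0.0, 0.0]

w = [0.0, 0.0, 0.0]
before = value(w, phi_unseen)
w = td0_update(w, phi_seen, reward=1.0, phi_next=phi_terminal)
after = value(w, phi_unseen)
```

After the single update, `after` is larger than `before` even though `phi_unseen` was never visited. How useful that transfer is depends entirely on how the features were chosen.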
This makes some sense, but playing Backgammon, or learning the rules for
other games, is not general intelligence.
This power to correctly make a "good" guess for situations never seen is the
one key missing piece in general reinforcement learning systems. How it
does it is easy to understand in theory - it simply needs a system that
automatically creates a closeness function and produces an answer which is
some type of merging and selecting, from the situations it has seen.
Merging and selecting. Yes. This is one of the "other mechanisms", besides
just the basic learning device, that I alluded to last time. Additional
structure, of yet-unknown variety.
But how to do this with a generic system of measuring "closeness" (one not
hand tuned to the application like it was in TD-Gammon) that does a good job
is the hard question that has not been well answered.
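As a rough illustration of "a closeness function plus merging", the sketch below blends stored responses, weighting each by a Gaussian kernel on Euclidean distance. The kernel, the `bandwidth` value, and the stored experiences are all assumptions picked by hand, which is exactly the application-specific tuning the paragraph above says a general system would have to avoid.

```python
import math

def merged_response(x, experiences, bandwidth=1.0):
    """Blend stored responses, weighting each by its closeness to x.

    experiences: list of (situation_vector, response_value) pairs.
    The Gaussian kernel and bandwidth are hand-picked assumptions.
    """
    weights = [math.exp(-math.dist(x, s) ** 2 / (2 * bandwidth ** 2))
               for s, _ in experiences]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("input is too far from every stored situation")
    return sum(w * r for w, (_, r) in zip(weights, experiences)) / total

# Two hypothetical past situations with scalar responses 0.0 and 1.0.
experiences = [((0.0, 0.0), 0.0), ((2.0, 0.0), 1.0)]
midpoint = merged_response((1.0, 0.0), experiences)
near_second = merged_response((1.9, 0.0), experiences)
```

An input halfway between the two stored situations gets an even blend, while one close to the second situation gets a response dominated by it. Changing the distance measure or the bandwidth changes every answer, so the open question of where a good "closeness" comes from is untouched by this sketch.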
Also, there are some neural nets that do "1-pass" learning. Is this
reinforcement?
Also, Edelman would probably say something like behaviors are
selected for via internal mechanisms. IOW, any given stimulus might
elicit any number of potential behavioral responses, but only one of these
ends up being selected for execution. Certainly this happens when
you search for the proper word to stick into a sentence. Internally,
many words are filtered past before one is finally spoken. And then
of course you have the option to stop saying the word even while it's
being spoken, if it's not the right selection. Plus, there are multiple
options for how the word is spoken, emphasis, inflection, etc.
Reinforcement learning is only part of the system.
And where is your evidence to show that all those "options" are not
selected for by the same low level reinforcement learning system?
Basically, in the observation that 50 years of creating naive learning
devices hasn't solved the problem. Many people, such as Grossberg,
have realized this and have tried to put various forms of additional
structure into their systems, but so far a good general solution hasn't
been found.