Re: Gradual Learning, not Reinforcement Learning




Curt Welch wrote:
"feedbackdroid" <feedbackdroid@xxxxxxxxx> wrote:
Can something be reinforced that you've never seen or done before?

Of course it can. A good reinforcement learning machine is shaping classes
of behaviors as it learns. It's not learning specific reactions. It does
it by shaping the operation of a classifier as it learns. All possible
stimulus inputs then are guaranteed to fall into some class so that the
system will always have an "answer" as to how to respond. The answer will
be based on the reinforcement learning system's evaluation of what class the
current situation falls into, and on the system's past experience with other
events that might have been different, but yet fall into the same
classifications.



I think you're mixing things here. If your trained machine receives an
input it has "never" seen before, and which is adequately far removed
from the centroid of your training set, it will produce "some" response,
but it's unlikely to produce the "correct" response. OTOH, if the "novel"
input is adequately close to one of your training set prototypes, then it
really isn't novel.
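
To make the distinction above concrete, here is a minimal sketch of a nearest-prototype classifier that always returns "some" response but also reports whether the input lies far from every prototype. The prototype names, coordinates, and the `novelty_radius` threshold are hypothetical values chosen for illustration; nothing in the thread specifies them.

```python
import math

def nearest_prototype(x, prototypes):
    """Return (label, distance) of the prototype closest to x."""
    best_label, best_dist = None, math.inf
    for label, p in prototypes.items():
        d = math.dist(x, p)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist

def respond(x, prototypes, novelty_radius):
    """Always pick a response class, and flag inputs far from all prototypes."""
    label, dist = nearest_prototype(x, prototypes)
    is_novel = dist > novelty_radius
    return label, is_novel

# Hypothetical prototypes, standing in for what a training set produced.
prototypes = {"advance": (0.0, 0.0), "retreat": (10.0, 10.0)}

near = respond((0.5, -0.5), prototypes, novelty_radius=2.0)
far = respond((100.0, -40.0), prototypes, novelty_radius=2.0)
```

The far input still gets a label, which is the "some" response; whether that label is the "correct" one is exactly what the distance check cannot tell you.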



You can see one implementation of this in action in TD-Gammon. Each move
which gets reinforced shapes the weights of the neural network which causes
many other similar moves to be reinforced at the same time. It doesn't
have to see every move, to be able to make a good "guess" at how to respond
to a move. It has a good (for Backgammon) system for classifying moves
into response classes so that it can successfully merge its learning from
other moves, to make a good guess at how to play a position it has never
seen before.
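
The effect described above, where reinforcing one move also shifts the evaluation of similar moves, can be sketched with a much simpler stand-in. TD-Gammon used a neural network; the example below assumes a linear value function with a single TD(0) update, and the three-element feature vectors are hypothetical. It only illustrates how shared weights spread one update to positions that were never played.

```python
def value(weights, features):
    """Linear value estimate: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))

def td0_update(weights, features, reward, next_value, alpha=0.1, gamma=1.0):
    """One TD(0) step for a linear value function: w += alpha * delta * features."""
    delta = reward + gamma * next_value - value(weights, features)
    return [w + alpha * delta * f for w, f in zip(weights, features)]

weights = [0.0, 0.0, 0.0]
seen = [1.0, 1.0, 0.0]       # position that was actually played and rewarded
similar = [1.0, 0.0, 0.0]    # never played, shares one feature with `seen`
unrelated = [0.0, 0.0, 1.0]  # never played, shares no features

weights = td0_update(weights, seen, reward=1.0, next_value=0.0)
```

After the single update, `value(weights, similar)` has risen even though that position was never visited, while `value(weights, unrelated)` is unchanged. How well this works depends entirely on how positions are encoded as features.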



This makes some sense, but playing Backgammon, or learning the rules for
other games, is not general intelligence.



This power to correctly make a "good" guess for situations never seen is the
one key missing piece in general reinforcement learning systems. How it
does it is easy to understand in theory - it simply needs a system that
automatically creates a closeness function and produces an answer which is
some type of merging and selecting, from the situations it has seen.
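
One simple form of "a closeness function plus merging" is a closeness-weighted average over remembered situations. The sketch below assumes a Gaussian closeness function with a hand-picked `width`, and a small hypothetical `memory` of (situation, answer) pairs. Note that the closeness function here is fixed by hand rather than created automatically, so it illustrates the merging step only.

```python
import math

def closeness(a, b, width=1.0):
    """Gaussian similarity: 1.0 for identical points, falling toward 0 with distance."""
    return math.exp(-math.dist(a, b) ** 2 / (2 * width ** 2))

def merged_answer(query, memory, width=1.0):
    """Closeness-weighted average of the answers stored for seen situations.

    Assumes the query is close enough to at least one stored situation
    that the weights do not all underflow to zero.
    """
    weights = [closeness(query, situation, width) for situation, _ in memory]
    total = sum(weights)
    return sum(w * answer for w, (_, answer) in zip(weights, memory)) / total

# Hypothetical memory: two seen situations with their learned answers.
memory = [((0.0,), 0.0), ((2.0,), 10.0)]

midpoint = merged_answer((1.0,), memory)    # equally close to both situations
near_first = merged_answer((0.1,), memory)  # dominated by the first situation
```

A query halfway between the two stored situations gets an answer halfway between their answers, and a query near the first situation is pulled toward the first answer.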



Merging and selecting. Yes. This is one of the "other mechanisms", besides
just the basic learning device, that I alluded to last time. Additional
structure, of yet-unknown variety.



But how you get a generic system of measuring "closeness" (one not hand
tuned to the application like it was in TD-Gammon) to do a good job is the
hard question that has not been well answered.

Also, there are some neural nets that do "1-pass" learning. Is this
reinforcement?

Also, Edelman would probably say something like: behaviors are
selected for via internal mechanisms. IOW, any given stimulus might
elicit any number of potential behavioral responses, but only one of these
ends up being selected for execution. Certainly this happens when
you search for the proper word to stick into a sentence. Internally,
many words are filtered past before one is finally spoken. And then
of course you have the option to stop saying the word even while it's
being spoken, if it's not the right selection. Plus, there are multiple
options for how the word is spoken, emphasis, inflection, etc.

Reinforcement learning is only part of the system.

And where is your evidence to show that all those "options" are not
selected for by the same low level reinforcement learning system?



Basically, in the observation that 50 years of creation of naive learning
devices hasn't solved the problem. Many people, such as Grossberg,
have realized this and have tried to put various forms of additional
structure into their systems, but so far a good general solution hasn't
been found.



