Re: Is the Curt net a kind of decision tree?
- From: Michael Olea <oleaj@xxxxxxxxxxxxx>
- Date: Thu, 13 Jul 2006 04:01:10 GMT
Curt Welch wrote:
Michael Olea <oleaj@xxxxxxxxxxxxx> wrote:
Curt Welch wrote:
But, and this is a big but, maximizing info throughput is not really the
issue. Isolating predictive info is the thing that maximizes survival.
Well, this is why I put reinforcement learning at the top of the list.
However, you seem to imply that isolating the predictive knowledge in the
signals is important by itself.
I don't know where you get that from. I was joking earlier about a skit, I
think it was on SNL, about a hard of hearing citizen (Gilda Radner?) who
does public opinion pieces on the news (I think it's terrible busting
school children...).
Anyway, I have made repeated statements to the contrary.
The ONLY predictive knowledge it needs, is the prediction of future
rewards. This is the brain's way for assigning value to knowledge and the
only prediction it needs to make.
Stop telling brains what to do. :-p
And the point of my questions and statements about the value of producing
decorrelated sensory data is for the express purpose of producing better
predictions about future rewards. So of course I feel the entire purpose
of these systems is about extracting predictive knowledge from the data -
but only in the context of being able to predict which reactions (aka
behaviors performed in a given context) produce the most future rewards.
And from this perspective, there's only one prediction the system ever
needs to make - and that prediction is what behavior will produce the most
total rewards in a given context. It's the only prediction the system
ever has to "understand" about the sensory data.
I believe that if you correctly focus on this problem, all the types of
predictive powers we hold (knowing that if we drive too fast we are likely
to be stopped by a cop and given a ticket - and the billion or so other
things we can predict), will be found to exist in the machine.
It only needs to be able to predict what is the best thing to do next.
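A minimal sketch of that single prediction, "which behavior pays best in this context", assuming a tabular learner and a made-up reward rule. The names `learn_action_values`, `toy_reward` and `best_action` are hypothetical, and the sketch only estimates immediate reward per (context, action) pair rather than total discounted future reward, so it is a simplified stand-in and not the system described in this thread.

```python
import random

def learn_action_values(reward_fn, contexts, actions,
                        steps=20000, alpha=0.05, epsilon=0.1, seed=0):
    """Estimate the expected reward of each action in each context."""
    rng = random.Random(seed)
    q = {(c, a): 0.0 for c in contexts for a in actions}
    for _ in range(steps):
        c = rng.choice(contexts)                 # context supplied by the world
        if rng.random() < epsilon:               # occasionally explore
            a = rng.choice(actions)
        else:                                    # otherwise act on current estimates
            a = max(actions, key=lambda act: q[(c, act)])
        r = reward_fn(c, a, rng)
        q[(c, a)] += alpha * (r - q[(c, a)])     # move estimate toward observed reward
    return q

def toy_reward(context, action, rng):
    # Hypothetical environment: matching the context pays off 80% of the
    # time, anything else pays off 20% of the time.
    p = 0.8 if action == context else 0.2
    return 1.0 if rng.random() < p else 0.0

def best_action(q, context, actions):
    """The one prediction the learner has to get right."""
    return max(actions, key=lambda act: q[(context, act)])

if __name__ == "__main__":
    q = learn_action_values(toy_reward, [0, 1], [0, 1])
    for c in (0, 1):
        print("context", c, "-> best action", best_action(q, c, [0, 1]))
```

The table is indexed by context, which is where the quality of the sensory encoding enters: a poorer context representation would lump together situations that call for different actions.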
In the retina the two are roughly the same - there is a huge fan-in from
photoreceptors to retinal ganglion cells, so there is a big incentive to
maximize throughput
Seems to me you are putting the cart before the horse there. Where is
your evidence to support the idea that the information maximizing that
happens is a result of the need to deal with the fan-in and not the other
way around? Maybe the neural system naturally performs an information
maximizing function on its own, and as a result, there was no need for
more pathways to the brain?
I am not making an assumption one way or another. I am noting an empirical
fact - huge fan-in - and drawing conclusions from that.
My point is that you seem to be implying that evolution created a special
compression system simply to solve the need of putting more information
through a limited number of neural fibers running to the brain. There's not
enough known about the system to say that as far as I know.
Certainly the full workings of human retinas, or the retinas of other
mammals, are far from known. However, models based on maximizing "channel
capacity" are a pretty good match. And, for natural scenes, the responses of
individual retinal ganglion cells are close to being independent
(decorrelated). Without speculating on the reasons for that near
independence, it has the consequence of maximizing channel capacity.
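To illustrate why near independence goes with high capacity, here is a small sketch under a strong simplifying assumption that is not from the retinal models themselves: two output channels that are jointly Gaussian with equal, fixed variance and correlation `rho`. The function names are hypothetical. With the per-channel variance held fixed, the joint (differential) entropy is largest at `rho = 0`, and any correlation shows up as redundant bits shared between the channels.

```python
import math

def marginal_entropy_bits(variance=1.0):
    """Differential entropy of one Gaussian channel, in bits."""
    return 0.5 * math.log2(2 * math.pi * math.e * variance)

def joint_entropy_bits(rho, variance=1.0):
    """Differential entropy of two equal-variance Gaussian channels
    with correlation coefficient rho (|rho| < 1), in bits."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    det = variance ** 2 * (1.0 - rho ** 2)       # determinant of the 2x2 covariance
    return 0.5 * math.log2((2 * math.pi * math.e) ** 2 * det)

def redundancy_bits(rho):
    """Mutual information between the two channels: bits sent twice."""
    return -0.5 * math.log2(1.0 - rho ** 2)

if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(rho, round(joint_entropy_bits(rho), 3), round(redundancy_bits(rho), 3))
```

The sum of the two marginal entropies minus the joint entropy equals the redundancy, so decorrelating the outputs recovers exactly the bits that correlated channels would spend repeating each other.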
of all image information, however germane. But by the
time those 2 million or so pulse trains hit V1 the situation has changed.
Now there is a big fan-out. Maximizing total info is no longer the main
issue.
And again, you are making huge assumptions here that I just don't buy.
I'm not making an assumption, but noting a fact - fan-out - and drawing from
that a conclusion. Another fact is that models based on maximizing
information throughput fail badly here.
I believe it's likely that the fan out is needed in the brain in order to
improve the quality of the reinforcement learning - to allow the system to
create a higher (finer) resolution context by which to drive the behavior
system. It's not needed from the eye, to the brain, because the
reinforcement learning doesn't happen there.
Engineering Aspects of Enzymatic Signal Transduction: Photoreceptors in the
Retina
Peter B. Detwiler, Sharad Ramanathan, Anirvan Sengupta, and Boris I.
Shraiman
http://www.biophysj.org/cgi/content/full/79/6/2801
Abstract:
"Identifying the basic module of enzymatic amplification as an irreversible
cycle of messenger activation/deactivation by a "push-pull" pair of
opposing enzymes, we analyze it in terms of gain, bandwidth, noise, and
power consumption. The enzymatic signal transduction cascade is viewed as
an information channel, the design of which is governed by the statistical
properties of the input and the noise and dynamic range constraints of the
output. With the example of vertebrate phototransduction cascade we
demonstrate that all of the relevant engineering parameters are controlled
by enzyme concentrations and, from functional considerations, derive bounds
on the required protein numbers. Conversely, the ability of enzymatic
networks to change their response characteristics by varying only the
abundance of different enzymes illustrates how functional diversity may be
built from nearly conserved molecular components."
Nemenman comments:
"This is definitely a paper worth reading. It provides a comprehensive
analysis of the amplification cascade and proves a bunch of inequalities
that put bounds on different performance aspects of the system. The results
include: relation between the time scale of the response and gains (high
gain means slow response); noise and energy dissipation (lower noise means
more energy); cascade of amplifiers (amplifiers should be about the same to
minimize noise and maximize speed of response). The paper also analyzes
amplifiers with feedback and puts bound on the minimal required gain and
the minimal messenger concentration so that the effects of 1/sqrt(N) noise
are smaller than the distinguishable changes in the output. However, I
slightly disagree with the style of the discussion here, as well as in the
adaptation section. In my view, one should not count distinguishable states
in the signal to get the mutual information between the input and the
output of the amplifier. Instead, one should calculate the mutual
information between the output of the amplifier and the outside world, and
then the conditional distribution of the input given the outside world will
involve the sqrt(N) effects without specifically imposing them. The
adaptation section (where they claim imperfect adaptation) has two serious
flaws. First, imperfectness comes when the speed of the amplifier is kept
constant, which is an unreasonable assumption, since at lower light
intensity it may be reasonable for the amplifier to perform slower. The
second problem: why should we optimize the information flow through the
amplifier, rather than predictive information flow? [Bialek et al., 2001b].
Optimizing the latter one may create an optimization problem for the best
time scale for the amplifier and explain the imperfectness of adaptation!"
He must have been inspired:
I Nemenman. Predictive filtering in the phototransduction cascade, In
preparation.
Abstract: "Animals gather sensory information to guide their actions. But
acting takes time, and sense data are useful only to the extent that they
carry predictive information, that is, information about the state of the
world at the time of the actions. We suggest that efficient maximization,
extraction, and transmission of such predictive information, rather than
maximization of the overall channel capacity, may be the correct
optimization principle responsible for designs of some sensory systems. We
support these arguments by analyzing information transmission in the
enzymatic amplifier in the phototransduction cascade, where maximization of
predictive information seems to explain various experimentally observed
properties, such as time scale and gain adaptation. Further, we emphasize
that some standard filters used in signal processing can be viewed as
(implicitly) maximizing predictive information as well."
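As a toy illustration of "information about the state of the world at the time of the actions", the sketch below uses an assumed stand-in for the world that is not taken from either paper: a two-state symmetric Markov chain that flips state with probability `flip_prob` at each step. The function names are hypothetical. It computes how many bits a present observation carries about the state `delay` steps later, which is the predictive part of that one bit of sensory data.

```python
import math

def binary_entropy(p):
    """Entropy of a coin with bias p, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def predictive_information_bits(flip_prob, delay):
    """Mutual information between the state now and the state `delay`
    steps later, for a symmetric two-state Markov chain."""
    # Probability that the state differs after `delay` steps.
    q = 0.5 * (1.0 - (1.0 - 2.0 * flip_prob) ** delay)
    # The stationary distribution is uniform, so the future state has
    # 1 bit of entropy; what remains uncertain given the present is H(q).
    return 1.0 - binary_entropy(q)

if __name__ == "__main__":
    for delay in (0, 1, 2, 5, 10):
        print(delay, round(predictive_information_bits(0.1, delay), 4))
```

In this toy world the observation always carries one bit about the present, but the part that is still useful once the action finally happens shrinks as the reaction delay grows, and it is zero for any delay when the world is memoryless (`flip_prob = 0.5`).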
But, what operation is happening in the brain in the fan-out? I suspect
the operation I'm talking about happens both in the brain, and in the eye.
It's a process of decorrelation that happens in the mapping of one set of
input signals, to a different set of output signals. If it's used in a
fan-in situation, you simply end up with a data compression effect. If
it's used in a fan-out situation, it produces a data decompression effect.
I believe it's quite possible for one basic transformation system to work
in both situations and solve both problems at the same time. And the
question of how that is implemented is the question of my post.
Isolating predictive info assumes prominence. Prediction is, of
course, estimating correlations. So the issue is no longer one of
decorrelating stimuli, but one of making correlations as transparent as
can be. So maybe Skinner gets the last laugh.
Which is, again, why I always describe the problem as reinforcement
learning - which is all about predicting which behaviors produce the most
future rewards. But how do you make a system like that work well? You
first have to use the sensory data, to define the context which each
possible behavior will be correlated against. To set that stage, I see
decorrelation (as well as information maximizing per output signal) as the
name of the game.
My point was that not all information is predictive, and not all predictive
information is predictive of consequences relevant to a given organism. So,
for example, the correlations in air pressure that signal with high
probability the high frequency call of an echo-locating bat are more
germane to a moth than would be decorrelating redundancy in sound waves
generated by Howard Cosell interviewing Ali. So, the game is not about
maximizing information per se, but detecting maximally informative stimulus
dimensions of ecological relevance - not signal compression, but signal
enhancement, call it "denoising" if you like. Howard, to the moth, is just
some non Gaussian noise to be filtered. I tend to agree with the moth, on
that one.
-- Michael