Re: Bayesian Inference Engine



JGCASEY wrote:

Michael Olea wrote:

A final example, a famous problem, and in this case a
discrete model space, is the "Monty Hall game". You are
a contestant on a game show. There are 3 doors, and behind
2 of the doors there is a goat, while behind one door
there is a Cadillac. Assume that the probability of the
Cadillac being behind any of the doors is 1/3. You pick
a door. Now, just to up the suspense, Monty opens one of
the doors you did not pick, and reveals a goat. Now he
asks you: do you want to stick with your pick or switch?
What should you do? More precisely, what is the probability
of getting the Cadillac if you stick with your original
pick, and what is the probability if you switch? Everybody
I have asked gets the wrong answer, and has a hard time
accepting the correct answer even after it is explained.
Reportedly, that is the general trend. Anyway, I'll leave
it to you to calculate the answer (you could look it up,
but what fun would that be?).

JGCASEY wrote:

Without thinking about it I would say it doesn't matter
which of the two remaining doors you choose, your chance
is now 0.5 of getting it right?

Michael Olea wrote:

Just about everyone, it seems, draws that conclusion.
It's incorrect. For a hint, ask yourself what is the
probability you picked the Cadillac in the first place;
then ask yourself if being shown a goat alters that
probability. If you picked the Cadillac, could he show
you a goat? If you did not pick the Cadillac, could he
show you a goat? What information, if any, does being
shown a goat give you about whether or not you picked
the Cadillac? What are the odds you picked the Cadillac
before he showed you a goat? What are the odds you
picked the Cadillac after he showed you the goat?

Probably too many hints.

Never too many hints for me :)

Not just for you. When the solution to the puzzle - not just hints, but a
worked-out solution - appeared in Parade magazine, it generated a flood of
mail, some of it from mathematicians, arguing that the author was "one
sandwich short of a picnic", etc., some even decrying the sorry state of
mathematical education in the US. Amusing stuff.

When I first heard the puzzle I had been studying Bayesian inference (but
not for long, maybe 2 months?) so I was able to solve the problem. But my
first guess was, like everyone else's, that the odds were 1/2 whether you
switch or not. In fact, when I worked out the odds and got a different
answer (the correct one) I thought I had done something wrong in setting up
the problem. It took me a while to realize that the unexpected solution was
correct, and to develop some intuition as to why. Now I have a lot more
experience solving those sorts of problems, and the answer is glaringly
obvious - "how can you not see it?" But I didn't see it at first either,
even with some training.

Wasn't I half right in that there was no improvement
in your chances of winning a Cadillac by changing the
choice of door?

Nope.

Suppose it were true that switching not only did not improve your chances,
but did not change them (the other possibility being that switching lowers
your chances). If that were the case - switching makes no difference - then
it would have to be true that whether you switch or not your probability of
winning a Cadillac is 1/2. But it is not true - your chances of getting the
Cadillac are not the same if you switch as they are if you don't switch.

Yesterday morning I thought up a new explanation aimed at "pumping
intuition" about the problem. It is not an explanation I have seen anywhere
else. Bear in mind that many correct explanations have been constructed,
none of which has been successful at getting, say, an entire class of
students in an introductory probability course to "see it".

Suppose, just for the sake of argument, that in fact it makes no difference
whether or not you switch - either way your probability of getting the
Cadillac is 1/2. In that case the policy of never switching is optimal (as
is the policy of always switching, or switching at random). So suppose all
contestants follow that policy - never switch. And since it makes no
difference whether you switch or not, they would get the Cadillac about 1/2
the time. That means the original pick was correct about 1/2 the time. But
how is that possible, since the original pick had 1 out of 3 chances of
being right? It is not possible. The premise was faulty.
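The never-switch argument can also be checked empirically. Below is a minimal Monte Carlo sketch in Python, assuming the standard rules: Monty knows where the Cadillac is and always opens a goat door you did not pick, choosing at random when he has two to choose from. The function names are made up for illustration. Contestants who never switch win about 1/3 of the time, not 1/2.

```python
import random


def play_round(switch: bool, rng: random.Random) -> bool:
    """Play one 3-door round; return True if the contestant gets the Cadillac.

    Assumption: Monty knows where the Cadillac is and always opens an
    unpicked door with a goat behind it, choosing at random when he can.
    """
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    goat_doors = [d for d in doors if d != pick and d != car]
    opened = rng.choice(goat_doors)
    if switch:
        # Move to the only door that is neither our pick nor Monty's.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch: bool, trials: int = 100_000, seed: int = 1) -> float:
    """Fraction of `trials` rounds won under a fixed stick/switch policy."""
    rng = random.Random(seed)
    return sum(play_round(switch, rng) for _ in range(trials)) / trials


print(f"never switch:  {win_rate(False):.3f}")  # close to 1/3
print(f"always switch: {win_rate(True):.3f}")   # close to 2/3
```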

You could use the same argument for the 8-door case below. For switching or
not switching to make no difference in that case, your original pick would
have to have a probability of 1/2 of being correct, rather than 1/8.

Possibilities:

GGC
GCG
CGG

Right. So the a priori probability that you picked C is 1/3.

Now if I know the goat is revealed behind one of the
doors, we get:

G?? -> GGC or GCG
??G -> GCG or CGG
?G? -> CGG or GGC

Right idea - tabulate the possibilities - but it does not go far enough.
There are 18 cases to consider, but there is a much more efficient approach
than tabulating the possibilities - Bayesian inference.
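For comparison, the brute-force tabulation can be done exactly. This small Python sketch uses exact fractions and assumes the same rules as the Bayesian derivation: Monty never opens your door or the Cadillac's door, and picks at random when he has a choice. The helper name p_monty_opens is hypothetical. It enumerates the 18 (Cadillac, pick, opened door) cases, weights each by its probability, and sums the winning cases for sticking and for switching.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)
THIRD = Fraction(1, 3)


def p_monty_opens(opened: int, pick: int, car: int) -> Fraction:
    """P(Monty opens `opened` | we picked `pick`, Cadillac behind `car`).

    Assumption: Monty never opens our door or the Cadillac's door, and
    chooses uniformly among the doors he is allowed to open.
    """
    allowed = [d for d in DOORS if d != pick and d != car]
    return Fraction(1, len(allowed)) if opened in allowed else Fraction(0)


# The 18 cases: 3 Cadillac positions x 3 picks x 2 doors Monty could open.
cases = [
    (car, pick, opened, THIRD * THIRD * p_monty_opens(opened, pick, car))
    for car, pick in product(DOORS, DOORS)
    for opened in DOORS
    if opened != pick
]

p_stick = sum(w for car, pick, opened, w in cases if car == pick)
p_switch = sum(w for car, pick, opened, w in cases if car not in (pick, opened))
print(len(cases), p_stick, p_switch)  # 18 1/3 2/3
```

Several of the 18 cases carry zero weight (Monty cannot open the Cadillac's door), which is why hand tabulation is easy to get wrong.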

So suppose, without loss of generality (as mathematicians say), that you
picked door 'A'. This could be any one of the 3 doors, it does not matter -
whatever door you picked we will call it door 'A'. Now Monty opens one of
the other two doors, revealing a goat - let's call that door 'B'. There is
one other door, call it 'C'. What we want to know is P(A|B), the
probability that the Cadillac is behind door A (the door we chose), given
that Monty revealed a goat behind door B. Alternatively, we want to know
P(C|B), the probability that the Cadillac is behind door C, the door we did
not choose, given that Monty revealed a goat behind door B. Now, the
probabilities P(A|B) and P(C|B) must sum to one since it is certain that
the Cadillac is behind either door A or door C. So if you know either one
of P(A|B) or P(C|B) you know the other.

Using Bayes' rule, we know that:

P(A|B) = P(B|A) * P(A)/ P(B)

P(A) - the prior probability that the Cadillac is behind the door we picked.

This is 1/3, the probability that we picked the Cadillac.

P(B|A) - the probability that Monty picks door B if the Cadillac is behind
door A (the one we picked).

This is 1/2. If the Cadillac is behind the door we picked, Monty can pick
either of the two remaining doors.

P(B) - the "marginal" (i.e. "total") probability that Monty picks door B.

This is a little trickier:

P(B) = P(A)*P(B|A) + P(car behind B)*P(B|car behind B) + P(C)*P(B|C)

Here "car behind B" is spelled out to keep it apart from P(B) itself, which
means the probability that Monty picks door B. P(A) and P(B|A) we already
dealt with. P(car behind B) and P(C), the prior probabilities that the
Cadillac is behind door B and that it is behind door C, respectively, are
each 1/3. P(B|car behind B), the probability that Monty picks door B given
that the Cadillac is behind door B, is 0. P(B|C), the probability that Monty
picks door B when the Cadillac is behind door C (one of the two we did not
pick), is 1. In short, if we picked the Cadillac Monty can pick either of
two doors; otherwise there is one door he must pick. So:

P(B) = (1/3)*(1/2) + (1/3)*0 + (1/3)*1 = 1/2

Now plugging in the numbers:

P(A|B) = ((1/3)*(1/2))/(1/2) = 1/3.

So if we stick with our original choice our probability of getting the
Cadillac is 1/3. This should not be surprising (but it seems it is) since
that choice had a 1/3 probability of being correct. We have received no new
information about whether that choice was correct or not. If we picked the
Cadillac, Monty will show us a goat behind some other door. If we did not
pick the Cadillac, Monty will show us a goat behind some other door. Showing
us a goat behind some other door gives us no information at all as to
whether we picked the Cadillac or not. So that probability remains 1/3. And
therefore the probability of getting the Cadillac if you switch is 2/3:

P(C|B) = P(C)*P(B|C)/P(B) = (1/3)*1/(1/2) = 2/3
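The same calculation can be written out mechanically. This Python sketch just encodes the prior and the likelihoods from the derivation (door 'A' ours, 'B' opened by Monty, 'C' the other), computes the marginal P(B), and applies Bayes' rule to each door. As in the derivation, it assumes Monty chooses at random between two goat doors when he has the choice.

```python
from fractions import Fraction

# prior[d]: probability the Cadillac is behind door d before Monty acts.
prior = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

# likelihood[d]: probability Monty opens door B given the Cadillac is behind d
# (assumes a fair coin flip between two goat doors when we picked the car).
likelihood = {"A": Fraction(1, 2), "B": Fraction(0), "C": Fraction(1)}

# Marginal ("total") probability that Monty opens door B.
evidence = sum(prior[d] * likelihood[d] for d in prior)

# Bayes' rule: posterior[d] = likelihood[d] * prior[d] / evidence.
posterior = {d: likelihood[d] * prior[d] / evidence for d in prior}

print(evidence)        # 1/2
print(posterior["A"])  # 1/3  (stick)
print(posterior["C"])  # 2/3  (switch)
```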

So you are twice as likely to get the Cadillac if you switch as if you
don't. One way to see this is that there is a 1/3 probability you picked the
car. Therefore there is a 2/3 probability that you did *not* pick the car.
In other words there is a 2/3 probability it is *not* behind door A, a 2/3
probability it *is* behind either door B or door C. Some new information
comes in - it is not behind door B. Therefore there is a 2/3 probability it
is behind door C. JPl used this approach, and you did not get it (no knock
on you, most people don't).

In effect Monty is offering you a 2 for 1 deal, letting you cash in door A
for both doors B and C, it's just that door B has been "bundled" with door
C, which now "carries the weight of both doors".

Showing you a goat behind some door you did not pick gives you no
information about whether or not you picked the car, but it does give you
information - about where the car is. How much information? Just the
entropy of the distribution prior to the message "goat behind door B" minus
the entropy of the distribution after that message:

Before:
-P(A)*log2(P(A)) -P(B)*log2(P(B)) -P(C)*log2(P(C)) =
-(1/3)*log2(1/3) -(1/3)*log2(1/3) -(1/3)*log2(1/3) = -log2(1/3) = log2(3) =
1.58496250072115618146 bits.

After:
-P(A|B)*log2(P(A|B)) -P(C|B)*log2(P(C|B)) =
-(1/3)*log2(1/3) -(2/3)*log2(2/3) = 0.91829583405448951474 bits

So we gained 0.66666666666666666672 of a bit of information - exactly 2/3
of a bit, in fact, up to rounding in the last digits, since the "after"
entropy works out to log2(3) - 2/3.
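The entropy arithmetic is easy to check. A short Python sketch (the function name entropy_bits is just for illustration); zero-probability outcomes are skipped, since p*log2(p) goes to 0 as p goes to 0:

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


before = entropy_bits([1/3, 1/3, 1/3])  # doors A, B, C before the goat
after = entropy_bits([1/3, 0.0, 2/3])   # door B now has probability 0
gain = before - after                   # 2/3 of a bit, up to float rounding
print(f"{before:.6f} {after:.6f} {gain:.6f}")  # 1.584963 0.918296 0.666667
```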

So knowing a goat is behind a particular door reduces
the number of possible combinations left to two?

Right - the Cadillac is either behind the door you picked, or behind the
other closed door.

Which is where the notion that the remaining chances
are 0.5 comes from.

There are two possibilities - the faulty conclusion is that they are equally
likely.

You might increase the number of doors with goats.

That is one of the typical ways of trying to help people get it - extreme
cases, 100 doors, 1000 doors, a million doors. JPl used this approach. And
you still did not get it (again, no knock on you, most people still don't).

GGGCGGGG: the odds on a Cadillac become 1 in 8 regardless
I suppose of revealing a goat behind one of the
remaining doors after you make a choice.

Again, aren't I right to say that there is no advantage
in changing your choice? Even if in the 8 door example
you are shown 6 doors with goats behind them?

Nope. Now the differences in the chances of getting the Cadillac if you
switch or don't switch are even greater than in the original problem. If
you switch you are 7 times more likely to get the Cadillac than if you
don't switch.
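The 8-door numbers follow from the same Bayes' rule calculation. This is a sketch under assumed rules: you pick door 0; Monty, who knows where the Cadillac is, opens n - 2 goat doors and leaves one other door, call it k, closed, choosing k at random if you happened to pick the Cadillac. The function name n_door_posteriors is hypothetical. By symmetry only the two closed doors matter.

```python
from fractions import Fraction


def n_door_posteriors(n: int):
    """Return (P(win | stick), P(win | switch)) for an n-door game.

    Assumed rules: we pick door 0; Monty opens n - 2 goat doors, leaving
    door 0 and one other door k closed. If the Cadillac is behind door 0,
    Monty chooses k uniformly among the other n - 1 doors.
    """
    prior = Fraction(1, n)            # each door equally likely to hide the car
    like_picked = Fraction(1, n - 1)  # P(k left closed | car behind door 0)
    like_other = Fraction(1)          # P(k left closed | car behind door k)
    # The opened doors showed goats, so they contribute nothing to the evidence.
    evidence = prior * like_picked + prior * like_other
    stick = prior * like_picked / evidence
    switch = prior * like_other / evidence
    return stick, switch


stick, switch = n_door_posteriors(8)
print(stick, switch, switch / stick)  # 1/8 7/8 7
```

With n = 3 the same function gives 1/3 and 2/3, so the 3-door game is just the smallest case.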

Understanding the mechanisms behind human intelligence
requires, I think, thought about why we make errors,
why we aren't perfect at statistical analysis, when you
might think that would have survival value?

Steven Pinker does cover this question in "How the
Mind Works" ("Good Ideas", "Ecological Intelligence").

I was going to expand on this theme, for I suspect the
mechanization of "thought" has not been the mechanization of
human thought, but rather the mechanization of procedures
that human thought has invented. I think statistical
inference engines would fall into this category along
with all the other AI programs written so far to grind
out high-level results based on mathematical, logical,
or even heuristic rules.

There are several issues here. And there has been much speculation as to why
people are bad at this problem. Certainly some of that speculation runs
along "evolutionary psychology" lines of thought. Note that the response to
the question "does it make a difference if you switch" is not random. The
response has a high probability of being incorrect - at least that is the
anecdotal evidence; there have also been some systematic studies of this
sort of thing.

You could design an operant conditioning experiment that mimics the problem.
The "matching law" would predict that after the response rate stabilized
switching would occur about 2/3 of the time (and so not switching would
occur about 1/3 of the time). This is suboptimal. The optimal strategy is
to switch every time. But switching every time is optimal only if you make
the assumption that the probabilities of reinforcement when you switch and
when you do not switch are not changing. If they can change, it makes sense
to continue to monitor the value of alternative responses, though at a rate
roughly in proportion to their current estimated expected return.
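The cost of matching is easy to quantify. A minimal sketch, taking the 2/3 switching rate from the matching-law prediction at face value and treating it as simple probability matching (an assumption for illustration, not a model of actual operant data): if you switch with probability q, your expected win rate is q*(2/3) + (1 - q)*(1/3).

```python
from fractions import Fraction

P_WIN_SWITCH = Fraction(2, 3)
P_WIN_STICK = Fraction(1, 3)


def expected_win_rate(q: Fraction) -> Fraction:
    """Expected win rate if the contestant switches with probability q."""
    return q * P_WIN_SWITCH + (1 - q) * P_WIN_STICK


# Probability matching (assumed): switch as often as switching's share of payoff.
matching_q = P_WIN_SWITCH / (P_WIN_SWITCH + P_WIN_STICK)  # 2/3
print(expected_win_rate(matching_q))   # 5/9
print(expected_win_rate(Fraction(1)))  # 2/3 (always switch)
```

Matching yields 5/9, roughly 0.56, against 2/3 for always switching: the price of continuing to sample the worse response.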

Note that this experiment tests choice behavior, not "what people say about
optimal strategy" behavior. I remember, vaguely, an experiment that ran
something like this: people (probably students) entered a room with a
lever, a slot that delivered nickels, and maybe a couple of buttons they
could push, and a couple of lights that flashed from time to time. The
delivery of nickels was, I believe, on a variable ratio schedule of eye
blinks. On average a nickel was delivered for every 10, say, eye blinks.
Sometimes it would be 5 eye blinks, sometimes 15, but on average 10 (yes, I
am making these numbers up, what I do remember is that reinforcers were
delivered based on eye blinks). All subjects increased their rate of
blinking. They also did many other things - pulling the lever once with the
left hand and then twice with the right (I am making that up too, I just
remember that there was a wide variety of that sort of thing they did).
After the experiment they were interviewed - they were asked what they
thought got them a nickel. They all had theories, but none of them, still
batting their eyes like a starlet, said "it was eye blinks".

Bayesian inference is a learned behavior. Finding an analytical solution to
a problem is a learned behavior. It can pay off, but the payoff is deferred,
and the cost in learning labor is high. Future rewards are discounted
compared to current rewards - the farther in the future the payoff the more
it is discounted. Pinker argues that this has an evolutionary basis. But if
a problem is complex, or unfamiliar, then guessing rather than thinking is
the road more traveled by.

There have been many studies, most of the ones I know about coming from
psychophysics, but others too, including even the itinerary of fixations
and saccades, in which the actions of a Bayesian agent are, in these cases,
a good model of behavior, including human behavior.

And there have been many studies showing that the responses of particular
individual neurons are well modeled by a Bayesian agent. For example, the
H1 neuron of the blowfly, a wide-field horizontal motion-sensitive neuron,
is an optimal estimator of the horizontal component of the angular velocity
of the wide field. How so? You can record the spike train coming from H1,
and from that you can reconstruct the angular velocity. This reconstruction
is provably optimal, given the resolution limits and sources of noise of
the compound eye. You cannot build a device with the same limits that
provides, by any algorithm, a better estimate of the angular velocity
signal. By the way, you can also predict from the spike train the
horizontal torque signal (a function of time) generated by the fly's wings,
though this will vary, depending, for example, on whether or not the fly
has eaten recently. The mapping from angular velocity to torque is not
fixed, but it is predictable.

As multi-channel recording becomes more practical these sorts of studies can
be done on populations of neurons, not just individual neurons.

Hawkins and George go so far as to speculate that the entire neocortex acts
as a giant Bayesian inference engine:

"Belief Propagation and Wiring Length Optimization as Organizing Principles
for Cortical Microcircuits". George and Hawkins

Abstract:

/*
In this paper we explore how functional and anatomical constraints and
resource optimization could be combined to obtain a canonical cortical
microcircuit and an explanation for its laminar organization. We start with
the assumption that cortical regions are involved in Bayesian Belief
Propagation. This imposes a set of constraints on the type of neurons and
the connection patterns between neurons in that region. In addition there
are anatomical constraints that a region has to adhere to. There are
several different configurations of neurons consistent with these
constraints. Among all such configurations, it is reasonable to expect that
Nature has chosen the configuration with the minimum wiring length. We cast
the problem of finding the optimum configuration as a combinatorial
optimization problem. A near optimal solution to this problem matched
anatomical and physiological data. As the result of this investigation, we
propose a canonical cortical microcircuit that will support Bayesian Belief
Propagation computation and whose laminar organization is near optimal in
its wiring length. We describe how details of this circuit match many of
the anatomical and physiological findings and discuss the implications of
these results to experimenters and theorists.
*/

From the intro:

/*
Perceptual systems have to deal with uncertain information in the world.
Thus Bayesian techniques have come to be widely viewed as learning and
inference mechanisms employed by the cortex. Bayesian Belief Propagation
(BBP) introduced by Pearl [6] is among the most successful inference
algorithms in computer vision and machine learning. In [5] Lee and Mumford
suggest that cortical regions could actually be doing BBP computations,
without giving details of the required mechanisms. Recent work by Rao [7]
and Deneve [2] shows that Bayesian Belief Propagation can be implemented in
spiking neurons. They did not investigate an anatomical connection and
treated single neurons as the BBP computation engine there by (sic)
restricting them to encode binary states. What are the neural and
anatomical substrates of the Bayesian computations employed by the
neocortex?
*/

http://www.stanford.edu/~dil/invariance/

This does not mean that people will act like a Bayesian agent when playing
"Let's Make a Deal", unless they play it over and over. And even then what
they *say* about what they are doing will not be what a statistician would
say, unless they have experience doing that. For one thing these Bayesian
nets are dynamic - the probabilities they implicitly encode are changing
with experience, as are "categories" (random variables) and the links
between them. However, the form of these implicit models within models is
heavily biased in favor of hierarchies - an implicit prior over model space
reflecting the structure of "correlations in the environment". In other
words, this is not a general purpose inference engine, or a general purpose
learning system, or a general purpose AI - it is a human one, or a
mammalian one, or...

So, from this point of view, it is those things we do with little or no
"thought", because we have done them over and over, that most resemble the
acts of a Bayesian agent. From Hawkins' "On Intelligence":

/*
When you come home each day, you usually take a few seconds to go through
your front door, or whichever door you use. You reach out, turn the knob,
walk in, and shut it behind you. It's a firmly established habit, something
you do all the time and pay little attention to. Suppose while you are out,
I sneak over to your home and change something about your door. It could be
almost anything. I could move the knob over an inch, change a round knob
into a thumb latch, or turn it from brass to chrome. ... When you come home
that day and attempt to open the door, you will quickly detect that
something is wrong.
*/

Now, how does this computational view square with a "neurobiological systems
as dynamical systems" point of view? There is no conflict. Dynamical
systems can transduce, transform, transmit, and "store" information -
implicitly. They can "compute" - implicitly. Dynamical systems within
dynamical systems that can move easily between chaotic and regular regimes,
where "categories" are basins of attraction, changing shapes, bifurcating,
and coalescing, are one way to "mediate" Bayesian inference. But that is a
topic for another time.

-- Michael

