[OT] Random events, uninformative priors, Bertrand's paradox, and Laplace's rule of succession.



Well, maybe not completely OT, since a lot of these ideas seem
to come up repeatedly in this group.

I'm reading E. T. Jaynes's book "Probability Theory: The Logic of
Science". Jaynes is a Bayesian and is scathing in his denunciation
of classical statistics. In particular, he objects to the practice
of trying to analyze a particular body of statistical data as if
it were the only evidence available bearing on the question. He
claims, and provides many examples, that we always have available
some prior information or expectation, and that it is unprofessional
and even criminal to fail to take that into account. Yet the
usual practice among statisticians is to "let the data speak for themselves."

Nevertheless, Jaynes does recognize that adopting a philosopher's
stance of total ignorance prior to seeing the data may be useful as
a thought experiment, if nothing else. But it is a thought experiment
that is very difficult to actually pull off. A Bayesian has no
problem expressing what he knows as a prior, but it can be surprisingly
difficult for him to express what he doesn't know. Therefore, a good
part of the book deals with the technical issue of "uninformative
priors".

Some of the subtleties can be seen by examining Bertrand's paradox
http://en.wikipedia.org/wiki/Bertrand's_paradox_(probability)
which Jaynes analyzes in depth. At issue is that old favorite
question here of "What does it mean to say that something is random?"
Jaynes's answer makes use of the theory of transformation groups
and something that the mathematicians call 'Haar measures'.
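
To make the paradox concrete, here is a minimal simulation sketch. It is
not taken from Jaynes's book; it just uses the three textbook ways of
picking a "random" chord of a unit circle and estimates, for each one,
the chance that the chord is longer than the side of the inscribed
equilateral triangle. The function name and the default trial count are
made up for illustration.

```python
import math
import random


def bertrand_estimates(trials=200_000, seed=0):
    """Estimate P(chord > sqrt(3)) on a unit circle for three chord-picking rules."""
    rng = random.Random(seed)
    side = math.sqrt(3.0)  # side of the inscribed equilateral triangle
    hits = {"endpoints": 0, "radius": 0, "midpoint": 0}
    for _ in range(trials):
        # Rule 1: two endpoints chosen uniformly on the circumference.
        a = rng.uniform(0.0, 2.0 * math.pi)
        b = rng.uniform(0.0, 2.0 * math.pi)
        if 2.0 * abs(math.sin((a - b) / 2.0)) > side:
            hits["endpoints"] += 1
        # Rule 2: distance of the chord from the center is uniform on [0, 1].
        d = rng.random()
        if 2.0 * math.sqrt(1.0 - d * d) > side:
            hits["radius"] += 1
        # Rule 3: chord midpoint is uniform over the area of the disk,
        # so its distance from the center is sqrt(u) with u uniform.
        d = math.sqrt(rng.random())
        if 2.0 * math.sqrt(1.0 - d * d) > side:
            hits["midpoint"] += 1
    return {rule: count / trials for rule, count in hits.items()}


if __name__ == "__main__":
    for rule, p in bertrand_estimates().items():
        print(f"{rule:10s} {p:.3f}")
```

The three estimates should come out near 1/3, 1/2 and 1/4 respectively,
which is the whole paradox: "pick a chord at random" is not a well
defined instruction until the sampling rule is specified.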

Even more intriguing is the discussion of Laplace's rule of succession.
http://en.wikipedia.org/wiki/Laplace_rule_of_succession
The problem can be formulated as a ball and urn problem. We have an
urn containing some unknown number N of balls, of which an unknown
number R are red. We draw a small sample of n balls from the urn and
find that r of them are red. What is the probability that the next
ball drawn will be red?

It is a well defined problem for a Bayesian, if he can just somehow
express his lack of information about the unknown values of N and
R as some kind of prior. Laplace solved it by postulating that the
ratio R/N is uniformly distributed between 0 and 1. And, using this
prior, the answer is that the next ball drawn has an (r+1)/(n+2)
chance of being red.
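
The (r+1)/(n+2) result can be checked by brute force. The sketch below
assumes a finite urn of known size N, draws made without replacement,
and a prior in which every value of R from 0 to N is equally likely
(the discrete counterpart of a uniform R/N). Those modelling choices and
the function name are assumptions made for the illustration, not
something spelled out above.

```python
from fractions import Fraction
from math import comb


def prob_next_red(N, n, r):
    """Exact P(next ball is red) after seeing r red in n draws.

    Assumes sampling without replacement from an urn of N balls and a
    uniform prior over R = 0, 1, ..., N (illustrative assumptions).
    Requires n < N so that at least one ball is left to draw.
    """
    numerator = Fraction(0)
    evidence = Fraction(0)
    for R in range(N + 1):
        # Hypergeometric likelihood of the sample, up to a factor that is
        # the same for every R and so cancels in the ratio.
        likelihood = comb(R, r) * comb(N - R, n - r)
        evidence += likelihood
        # Given R, there are R - r red balls among the N - n remaining.
        numerator += likelihood * Fraction(R - r, N - n)
    return numerator / evidence


if __name__ == "__main__":
    for N in (20, 50, 1000):
        print(N, prob_next_red(N, 10, 3))
```

With n=10 and r=3 each line prints 1/3, that is (3+1)/(10+2), whatever
the value of N. Under this prior the answer does not depend on the size
of the urn at all.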

Laplace's result is a bit surprising, since one might expect that the
probability should be r/n. It is almost as if Laplace's choice of
'uninformative prior' was such that it assumed that one red ball and
one non-red ball had already been drawn before the actual random sampling
began. Our prior carries a weight of two units of real data. But
we wanted our prior to be weightless!

A little math by Jaynes proves that if Laplace had chosen a different
prior - a concave prior which makes it more likely that R/N is close
to 0 or to 1 - then his prior would indeed have been 'uninformative'
and 'weightless', and the post-data probability would have been
r/n rather than (r+1)/(n+2) whenever 0<r<n. But it is far from clear (to
me, anyways) why that particular concave prior should have been chosen.
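
One way to see the "weight" of a prior is the standard Beta-binomial
calculation. The sketch below assumes the large-urn limit, where the
red fraction p = R/N is treated as continuous with a Beta(a, b) prior.
That is an assumption made for illustration and may not match the exact
form of the prior Jaynes uses. In this model the predictive probability
is (r+a)/(n+a+b), so a and b act like red and non-red balls seen in
advance. Laplace's uniform prior is a=b=1, and letting a and b shrink
toward 0 (a prior proportional to 1/(p(1-p)), piled up near 0 and 1)
removes the weight.

```python
from fractions import Fraction


def predictive_red(r, n, a, b):
    """P(next ball is red) under a Beta(a, b) prior on the red fraction.

    Assumes binomial sampling (large-urn limit); a and b behave like
    pseudo-counts of red and non-red balls seen before the experiment.
    """
    return (Fraction(r) + a) / (Fraction(n) + a + b)


if __name__ == "__main__":
    r, n = 3, 10
    # a = b = 1 is Laplace's uniform prior; smaller values weigh less.
    for pseudo in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
        p = predictive_red(r, n, pseudo, pseudo)
        print(f"a=b={pseudo}: {p} = {float(p):.5f}")
```

With a=b=1 this reproduces (r+1)/(n+2), and as a and b go to 0 the
answer approaches r/n. The limiting prior itself cannot be normalized,
which is part of what makes "weightless" priors delicate.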

But even more interesting is what happens if r=0 or r=n - that is, if
every ball we draw is the same color. Now, the post-data probability
that the next ball drawn will be the same color as all the others rises
rapidly (exponentially) with n. So the probability that the sun will
rise tomorrow is almost a certainty, even if you don't know any physics
and have to rely on pure Bayesian analysis of the raw experimental
data.

So, why was it "right" to choose a prior which gives higher weight to
the extreme values of R/N rather than Laplace's choice of a uniform
distribution? I'm not sure, but I suspect it may have something to
do with Occam's razor. We should prefer to believe that the universe
is as simple and predictable as possible. The urn is filled with
a collection of balls that are (almost) all red or (almost) all non-red.
That minimizes the expected entropy generated by the random experiment.
A good hypothesis is one that leaves the least to chance.

Which is a surprising way to look at it, since Jaynes says that we should
choose priors based on a principle of *maximum* entropy. So, I am
very confused. Entropic, you might say.
