Jonathan King wrote:
> On 6/5/06, Stephen Montgomery-Smith <EMAIL:PROTECTED> wrote:
>>
>> My sense is that statistics as a field doesn't have the strong
>> foundations that, say, math or physics has. I think this field needs a
>> "Newton" to come along and sort it all out. But I also think that the
>> time is ripe for this to happen, just like Einstein's theories were ripe
>> for their time.
>
> I'm not sure the analogy is exact. In one sense, statistics did
> already have a Newton: Bayes Theorem is about as amazing and basic a
> thing as you're ever likely to get in almost any field. But what I
> think the Reverend Bayes couldn't do (and arguably shouldn't have
> done) is tell us what our priors should be. There is a lot of work
> being done on establishing UNinformative priors that everybody could
> agree on, but which allow statistical inferences to be made with some
> reasonable amount of efficiency. I know why this is being done, but I
> don't see it as the big problem.

I don't see Bayes Theorem as the answer. I have tried some of these
uninformative priors in my species counting problem, and they fail
dramatically (they always predict that the number of species you haven't
seen is zero).

Wow; that does seem to be a problem. Are you insisting that your
priors be proper or something? You seriously might consider talking
to Jeff Rouder about this; he and his statistics pals ran up against
some viciously nasty weirdness trying to get some of his stuff to
work.

I am allowing non-proper distributions, and I think that this is
precisely the problem.

So consider this problem. You have a machine that delivers heads or
tails. (I won't say "coin" because that gives you too much info.) The
uninformed prior distribution for the parameter p, the probability that
the machine will give you a head, is 1/(p(1-p)), which is definitely not
proper.

So do the following experiment - do precisely one trial. (I.e. toss the
coin one time.)

If you get a head, and you normalize the resulting distribution for p,
you conclude that p=1 with certainty. Similarly if you get a tail, you
conclude that p=0 with certainty. Clearly this is not the right answer.
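
One way to see this limit numerically is to replace the improper prior 1/(p(1-p)) with the proper prior Beta(eps, eps) and let eps shrink toward zero. The sketch below is only an illustration: the Beta(eps, eps) approximation and the function names are assumptions made for the example, not something taken from the discussion above. After one head the posterior is Beta(1+eps, eps), and its mean tends to 1 while its variance tends to 0.

```python
def posterior_after_one_head(eps):
    """Beta(a, b) posterior parameters after observing a single head,
    starting from a Beta(eps, eps) prior (a proper stand-in for 1/(p(1-p)))."""
    return 1.0 + eps, eps


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var


if __name__ == "__main__":
    # eps = 1.0 is the uniform prior; smaller eps approaches the improper prior.
    for eps in (1.0, 0.1, 0.01, 0.001):
        a, b = posterior_after_one_head(eps)
        mean, var = beta_mean_var(a, b)
        print(f"eps={eps:<6} posterior mean={mean:.6f} variance={var:.6f}")
```

With the uniform prior (eps = 1) the posterior mean after one head is 2/3, which is a far more sensible answer than certainty.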

Now with my species count problem, if you suppose that the number of
species is n, then the species make up the population in proportions
p1,p2,...,pn where pk>=0 and sum pk = 1. What prior distribution should
you put on these p's? The uninformed distribution is 1/(p1p2...pn),
which, like the previous example, means that any species you haven't seen
will, after normalization, with certainty not exist.
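
The same effect can be sketched for the species problem by approximating 1/(p1p2...pn) with a symmetric Dirichlet(eps, ..., eps) prior, which is an assumption made purely for illustration. With observed counts c_k totalling N, the posterior is Dirichlet(c_k + eps), so the expected total proportion belonging to the unseen species is (n - s)*eps / (N + n*eps), where s is the number of species seen. The function name and the example counts below are hypothetical.

```python
def unseen_mass(counts, n_species, eps):
    """Posterior expected total proportion of species with zero observations,
    under a symmetric Dirichlet(eps, ..., eps) prior on n_species proportions.

    counts: positive observation counts, one entry per species actually seen.
    """
    seen = len(counts)
    total = sum(counts)
    if n_species < seen:
        raise ValueError("n_species cannot be smaller than the number seen")
    return (n_species - seen) * eps / (total + n_species * eps)


if __name__ == "__main__":
    counts = [5, 3, 1, 1]  # hypothetical sample: 4 species seen, 10 individuals
    for eps in (1.0, 0.1, 0.01, 0.001):
        print(f"eps={eps:<6} unseen mass={unseen_mass(counts, 10, eps):.6f}")
```

As eps goes to zero the unseen mass goes to zero for any fixed n, which matches the failure described above.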

So you are looking for potential prior distributions on this (n-1)
dimensional "tetrahedron" (mathematicians call it a simplex). If you use
the obvious uniform distribution, and do the calculations (which are not
so easy by the way) you get an answer that doesn't depend on how many of
each species you observed, only on how many species you observed.
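
A rough check of that claim, under assumptions that are mine rather than stated above: a uniform (Dirichlet(1, ..., 1)) prior on the simplex, a uniform prior on n truncated at a hypothetical n_max, and n!/(n-s)! ways of assigning the s observed species to n labels. The marginal likelihood of the data given n is then proportional to [n!/(n-s)!] * (n-1)!/(n+N-1)! times a product of c_k! that does not involve n, so the posterior over n depends on the counts only through s and the sample size N.

```python
from math import exp, lgamma


def log_marginal(counts, n):
    """Log marginal likelihood of the observed counts given n species,
    with a uniform prior on the (n-1)-simplex."""
    s = len(counts)
    total = sum(counts)
    if n < s:
        return float("-inf")
    # Ways to assign the s observed species to n labels: n! / (n-s)!
    log_labelings = lgamma(n + 1) - lgamma(n - s + 1)
    # Dirichlet-multinomial term with all prior parameters equal to 1.
    log_dm = lgamma(n) - lgamma(n + total) + sum(lgamma(c + 1) for c in counts)
    return log_labelings + log_dm


def posterior_over_n(counts, n_max):
    """Normalized posterior over n = 1..n_max with a uniform prior on n."""
    logs = [log_marginal(counts, n) for n in range(1, n_max + 1)]
    top = max(logs)
    weights = [exp(value - top) for value in logs]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    # Two hypothetical samples with the same s = 4 and N = 10.
    a = posterior_over_n([7, 1, 1, 1], 50)
    b = posterior_over_n([3, 3, 2, 2], 50)
    print(max(abs(x - y) for x, y in zip(a, b)))
```

The two samples give the same posterior over n up to rounding error, even though one is dominated by a single species and the other is nearly even.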

I did try other distributions on this (n-1)-simplex, but none of them
really worked well, although they all did better than the uniform
distribution.

(I assumed a uniform distribution on the prior of n itself, but my sense
is that the prior on n will not play such a big role. It is the "curse
of high dimensionality" that really shows up the inadequate nature of
uninformed prior distributions.)

Quite likely I am going to put a lot of thought into this problem next
year (now I have other projects), but one possibility I am considering
is that the Kolmogorov laws of probability don't always apply.

You frighten me. :-)

Well, I overstate it a bit. An underlying assumption is that there is a
numerical value of "believability" that you can place on any real life
event. But the correct uninformed prior distribution is 0/0, or NaN
(not a number), that is, undefined. That is, you cannot place a
numerical value on it. (Indeed the very fact that the uninformed priors
are not proper should immediately tell you that something is amiss.)

But then after doing some experiments, maybe you still cannot place a
precise numerical value on the distribution, but it is somehow midway
between NaN and a genuine numerical distribution (e.g. "probably bigger
than 5 but definitely less than 6" does not tell you what the
distribution is, yet is not a totally uninformative statement).

I have absolutely no idea how to make any of this work. But I think it is
worth thinking about.

> So how simple a problem do you think we should go to? I think a
> (slightly) simpler problem is to use catch/re-catch probabilities to
> estimate population sizes of a given species in a closed system like
> fish in a lake.

My guess is that this problem's complexity is about the same.

You would think so, but isn't the catch-recatch method to estimate
population size a solved problem?
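
For reference, the simplest textbook version of catch/re-catch is the Lincoln-Petersen estimate: mark M animals, release them, later catch C animals and count the R marked ones among them, then estimate the population as M*C/R. The sketch below also includes Chapman's variant, which stays finite when R is 0. Whether this is the method meant above is an assumption, and the function and parameter names are made up for the example.

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Lincoln-Petersen population estimate M * C / R."""
    if recaptured <= 0:
        raise ValueError("need at least one recaptured animal")
    return marked_first * caught_second / recaptured


def chapman(marked_first, caught_second, recaptured):
    """Chapman's variant: (M + 1)(C + 1) / (R + 1) - 1."""
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1


if __name__ == "__main__":
    # Hypothetical lake: 100 fish tagged, 80 caught later, 20 of them tagged.
    print(lincoln_petersen(100, 80, 20))
    print(chapman(100, 80, 20))
```

Both estimates assume a closed population in which tagged and untagged animals are equally likely to be caught.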

Real life statisticians might be intimidated by people who question
their assumptions. But similarly I am intimidated by statisticians who
have real world experience, and who know the literature, as well as the
proper lingo.

So I thought I vaguely knew what the capture-recapture method is (don't
you tag the birds you catch and see how long it is until you catch them
again?). But I didn't know any more than that (for instance, that it was
a solved problem).

Stephen
_______________________________________________
discussion mailing list
EMAIL:PROTECTED
http://mlug.missouri.edu/mailman/listinfo/discussion