On Mon, 5 Jun 2006, Stephen Montgomery-Smith wrote:
Sort of. It really depends on the situation. I'm not always sure what
drives arguments about "Bayesian" and "frequentist" perspectives on
inference, but I think a lot of it is due to the fact that it is a
difficult topic and a statistician can get by professionally without
ever really coming to grips with the core philosophical problems.
My sense is that statistics as a field doesn't have the strong
foundations that, say, math or physics has. I think this field needs a
"Newton" to come along and sort it all out. But I also think that the
time is ripe for this to happen, just like Einstein's theories were ripe
for their time.
That might be possible, but I don't think probability/statistics can move
forward in the way that physics did with Newton. We've had about 240
years since Bayes (1763). I read his paper very closely (in 1990) and I
think we haven't made a lot of progress in our philosophy of statistics
since his time. (I'm not defining "a lot" -- but I will say that I think
you'll be surprised by how sophisticated Bayes was for his time.) It's an
interesting paper, by the way. Contrary to what some have claimed, Bayes
does use a subjectivist view of probability.
So, I just can't imagine what could be done to revolutionize statistics.
High-powered computing is changing the way things are done.
Sort of, but not quite. What Edwards points out is that the likelihood
contains all the information about the parameters that can be found in
the data. In a Bayesian analysis, one uses the likelihood along with a
"prior" which is a sort of weighting scheme based on, well, based on
whatever the hell you want it to be based on -- and that's the problem
with Bayesian analysis, but that doesn't mean it isn't a good thing.
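To make "likelihood plus prior" concrete, here is a minimal Python sketch for a success probability p restricted to two candidate values, p = 1/2 and p = 1/4. The data (3 successes in 10 trials) and both sets of prior weights are hypothetical, chosen only to show that the likelihood stays fixed while the posterior moves with whatever prior you pick.

```python
from math import comb


def likelihood(p: float, k: int, n: int) -> float:
    """Binomial likelihood of k successes in n trials at success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior(priors: dict[float, float], k: int, n: int) -> dict[float, float]:
    """Bayes' rule over a finite set of p values: prior times likelihood, renormalized."""
    unnormalized = {p: w * likelihood(p, k, n) for p, w in priors.items()}
    total = sum(unnormalized.values())
    return {p: u / total for p, u in unnormalized.items()}


# Hypothetical data: 3 successes in 10 trials.
k, n = 3, 10

# Hypothetical priors: equal weights versus weights that strongly favor p = 1/2.
even = posterior({0.5: 0.5, 0.25: 0.5}, k, n)
skewed = posterior({0.5: 0.9, 0.25: 0.1}, k, n)

print(even)    # the data alone favor p = 1/4
print(skewed)  # a lopsided prior flips the conclusion back to p = 1/2
```

With equal prior weights the posterior puts roughly 0.68 on p = 1/4; with the 0.9/0.1 prior it favors p = 1/2 instead, even though the data are identical.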
I'll have to look at it. But a problem with his scenario is that he is
deciding between two possibilities, p = 1/2 and p = 1/4. But really you
would be deciding between all possible p between 0 and 1. Then the
implicit assumption that all the priors are the same plays a much bigger
role than you might think.
I don't understand the problem. Edwards advocates for maximum likelihood
estimation, which is a way to make a choice of a single value from a range
of possibilities. I know what you mean about the implicit assumption, but
I'm not sure why "what I might think" would be off track.
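Maximum likelihood in this sense can be sketched as a search over candidate values of p: evaluate the log-likelihood and keep the maximizer. The Python below assumes binomial data (a hypothetical 3 successes in 10 trials) and a grid over (0, 1); it contrasts the full-range answer, which lands on k/n, with a choice restricted to just p = 1/2 and p = 1/4.

```python
from math import log


def log_likelihood(p: float, k: int, n: int) -> float:
    """Binomial log-likelihood, dropping the constant log C(n, k)."""
    return k * log(p) + (n - k) * log(1 - p)


def grid_mle(k: int, n: int, steps: int = 999) -> float:
    """Evaluate p = 1/(steps+1), ..., steps/(steps+1) and return the maximizer."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    return max(grid, key=lambda p: log_likelihood(p, k, n))


k, n = 3, 10  # hypothetical data

full_range = grid_mle(k, n)  # lands on k / n = 0.3
two_point = max((0.5, 0.25), key=lambda p: log_likelihood(p, k, n))  # 0.25

print(full_range, two_point)
```

The grid is only for illustration; for the binomial the maximizer k/n is available in closed form.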
The problem that has driven my thinking is this one. Suppose you go out
and capture 1000 birds. You observe 16 different species, but 5 of them
you only observe once. Try to give a lower bound on how many species
you didn't observe. Note that 0 is quite unlikely, because 5 of the
species are quite likely very sparse, and if there were only 5 of these
sparse species, quite likely you wouldn't have observed them all.
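The "you wouldn't have observed them all" argument can be checked with a back-of-the-envelope calculation. This Python sketch assumes, hypothetically, that each rare species has a per-capture probability of 1/1000 (so it is expected about once in 1000 captures), and it approximates the detections of different species as independent.

```python
def prob_seen(q: float, n: int) -> float:
    """Probability that a species with per-capture probability q appears at least once in n captures."""
    return 1 - (1 - q) ** n


def prob_all_seen(m: int, q: float, n: int) -> float:
    """Approximate probability that all m rare species are observed (detections treated as independent)."""
    return prob_seen(q, n) ** m


n = 1000
q = 1 / n  # hypothetical: each rare species shows up about once per 1000 captures

for m in (5, 10):
    print(m, prob_all_seen(m, q, n))
```

Under these assumptions a single rare species is caught with probability about 0.63, so the chance of having caught all five is only about 10%, which supports the point that "0 unseen species" is unlikely.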
This problem is obviously important to ecology and is well studied:
http://viceroy.eeb.uconn.edu/EstimateS. Chao's work on this is quite
brilliant, but is essentially ad hoc. I tried a Bayesian approach, and
its dependence on the priors is tremendous, and in any case it always seems
to estimate too high.
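For reference, here is a minimal Python sketch of Chao's lower-bound estimator (Chao1) in its classic form, S_obs + f1^2 / (2 f2), and its bias-corrected form, S_obs + ((n - 1)/n) f1 (f1 - 1) / (2 (f2 + 1)), where f1 and f2 count species seen exactly once and exactly twice. The bird example gives S_obs = 16, f1 = 5 and n = 1000 but not f2, so f2 = 2 below is purely hypothetical. The last line adds the Good-Turing estimate f1 / n of the chance that the next bird belongs to an unseen species.

```python
def chao1_classic(s_obs: int, f1: int, f2: int) -> float:
    """Classic Chao1: S_obs + f1^2 / (2 f2); undefined when there are no doubletons."""
    if f2 == 0:
        raise ValueError("classic Chao1 needs at least one doubleton (f2 > 0)")
    return s_obs + f1 * f1 / (2 * f2)


def chao1_bias_corrected(s_obs: int, f1: int, f2: int, n: int) -> float:
    """Bias-corrected Chao1: S_obs + ((n - 1) / n) * f1 (f1 - 1) / (2 (f2 + 1))."""
    return s_obs + ((n - 1) / n) * f1 * (f1 - 1) / (2 * (f2 + 1))


s_obs, f1, n = 16, 5, 1000  # from the bird example
f2 = 2                      # hypothetical: number of species seen exactly twice

print(chao1_classic(s_obs, f1, f2))            # 22.25
print(chao1_bias_corrected(s_obs, f1, f2, n))  # about 19.33
print(f1 / n)  # Good-Turing: estimated chance the next bird is an unseen species
```

With f2 = 2 these put the number of unseen species at roughly 3 to 6; because the estimate depends directly on f2, a different doubleton count would move it noticeably.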
I think a Bayesian approach is very reasonable for a problem like this
one. If it gives answers that appear incorrect, change the prior. If you
can't make a Bayesian analysis work for you with this problem, I don't
think any analysis will solve this problem for you.
My thinking is that if you really study this model problem, then you
have some hope of getting closer to what the foundations of statistics
really should be. It is more difficult than the simple model problems,
but much easier than most real-life problems (e.g., microarrays).
In what sense is the problem of counting unobserved species easier than
analyzing microarray data?
I have a sense that this person was in denial about the problems I
brought up. He told me emphatically that there was no good reason
to suppose that the changes in DNA were related to each other - but then
in another email told me that changes in DNA are found NOT to be a
Poisson process because the variance is about 2 times too big. These
two statements contradict each other.
Right. A higher variance can certainly be due to positive correlation.
It certainly implies some dependency.
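A quick way to see this: for Poisson counts the variance equals the mean, so the index of dispersion (variance / mean) is about 1, and a variance "about 2 times too big" means an index near 2. The Python sketch below uses a hypothetical toy dependence model in which changes always occur in pairs; the paired counts have the same mean as the independent ones but twice the variance. It assumes NumPy is available and is not meant as a model of actual DNA changes.

```python
import numpy as np


def dispersion_index(counts: np.ndarray) -> float:
    """Sample variance divided by sample mean; about 1 for Poisson counts."""
    return counts.var(ddof=1) / counts.mean()


rng = np.random.default_rng(seed=0)
lam, size = 5.0, 100_000

# Independent changes: Poisson counts with mean lam.
independent = rng.poisson(lam, size)

# Hypothetical positive dependence: changes arrive in pairs.
# 2 * Poisson(lam / 2) has mean lam but variance 2 * lam.
paired = 2 * rng.poisson(lam / 2, size)

print(dispersion_index(independent))  # close to 1
print(dispersion_index(paired))       # close to 2
```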
Current models of statistics seem unable to deal with events with "large
tails in their probability distribution functions" - that is, an event
that is unlikely, but when it does happen, it makes a huge difference.
So, for example, with a genuine normal distribution, the chance of
being off by 10 standard deviations is so incredibly small that for all
intents and purposes it just isn't going to happen. But it takes only a
few of these large-tailed events skewing your data for the chance of
being off by 10 standard deviations to be about the same as being off by 2
or 3 standard deviations.
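The contrast can be computed directly. This Python sketch compares P(|X| > k) for a standard normal, a hypothetical "contaminated" normal in which 1% of observations come from a normal ten times wider, and a standard Cauchy distribution as an extreme heavy-tailed case. Here k is measured in units of the uncontaminated standard deviation (1); the 1% and tenfold figures are illustrative assumptions, not estimates from any data.

```python
from math import atan, erfc, pi, sqrt


def normal_tail(k: float, sigma: float = 1.0) -> float:
    """P(|X| > k) for X ~ Normal(0, sigma^2)."""
    return erfc(k / (sigma * sqrt(2)))


def contaminated_tail(k: float, eps: float = 0.01, wide_sigma: float = 10.0) -> float:
    """P(|X| > k) when a fraction eps of observations come from Normal(0, wide_sigma^2)."""
    return (1 - eps) * normal_tail(k) + eps * normal_tail(k, wide_sigma)


def cauchy_tail(k: float) -> float:
    """P(|X| > k) for a standard Cauchy variable."""
    return 1 - 2 * atan(k) / pi


for k in (2, 3, 10):
    print(k, normal_tail(k), contaminated_tail(k), cauchy_tail(k))
```

For the pure normal the 10-sigma tail is about 1.5e-23; with 1% contamination it rises to about 0.003, roughly a third of the 3-sigma tail probability of about 0.01.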
I guess you are talking about outliers. This is one reason for the
development of "robust" methods in statistics -- methods that are not too
strongly affected by outliers. For example, we often focus on medians
instead of means when tail behavior is problematic.
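A small illustration of why the median is preferred when tails misbehave: a single gross error drags the mean far away while the median barely moves. The readings in this Python sketch are hypothetical.

```python
import statistics

readings = [9.8, 10.1, 10.0, 9.9, 10.2]  # hypothetical well-behaved measurements
with_outlier = readings + [1000.0]       # one gross error, e.g. a bad instrument reading

for data in (readings, with_outlier):
    # The mean jumps from about 10 to about 175; the median only moves to 10.05.
    print(statistics.mean(data), statistics.median(data))
```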
And this is what I am thinking happened with the exit poll data.
Why would you think that? What would cause it to happen?
(Incidentally, I think that large-tailed events are one of the problems
with microarray analysis - an example of a large tailed event is that
one of the microarray chips got a scratch. And from my brief reading of
the microarray literature, this is a real consideration.)
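One common robust way to flag a single bad chip is the modified z-score of Iglewicz and Hoaglin, 0.6745 (x - median) / MAD, where MAD is the median absolute deviation, with |z| > 3.5 as the usual cutoff. The Python sketch below applies it to hypothetical per-chip summary intensities; it is a generic outlier screen, not a method taken from the microarray literature mentioned above.

```python
import statistics


def modified_z_scores(values: list[float]) -> list[float]:
    """Robust z-scores: 0.6745 * (x - median) / MAD (Iglewicz-Hoaglin)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values: list[float], threshold: float = 3.5) -> list[int]:
    """Indices whose absolute modified z-score exceeds the threshold."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]


# Hypothetical per-chip summary intensities; chip 4 mimics a scratched array.
chips = [7.9, 8.1, 8.0, 8.2, 3.1, 7.8, 8.0]
print(flag_outliers(chips))  # [4]
```

Because the median and MAD are themselves barely affected by the bad chip, the scratched array stands out clearly instead of inflating the spread it is judged against.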
I believe it. In most research areas there are numerous ways to get bad
data. For example, in genetic analysis we always have genotyping errors
and this is a constant annoyance.
One last thought -- one of the guys who is writing a book on statistical
analysis of the exit poll data - Steven Freeman - is not a particularly
good statistician. I know this because I corresponded with him about a
problem in his analysis. He didn't know the literature and he didn't
understand my very clear and detailed explanations. He proceeded to
publish his incorrect answers and he is continuing to promote them today.
Believe it or not, his mistakes make his results look much stronger than
they really are! ;-) Yes, people find it harder to understand why their
wonderful results are wrong than to understand why their unappealing
results can be made better.
So, I don't know how big of a deal to make of Freeman's error. His
conclusion is still correct, it seems, but I find it hard to trust other
things that he says. If Freeman's data was the major or only thing
driving suspicion about the vote in Ohio, I would stop worrying about it
today, but there is a lot more to it.
Mike
_______________________________________________
discussion mailing list
EMAIL:PROTECTED
http://mlug.missouri.edu/mailman/listinfo/discussion