MLUG: Re: [MLUG - DISCUSSION] statistical inference

On Mon, 5 Jun 2006, Stephen Montgomery-Smith wrote:

Sort of. It really depends on the situation. I'm not always sure what drives arguments about "Bayesian" and "frequentist" perspectives on inference, but I think a lot of it is due to the fact that it is a difficult topic and a statistician can get by professionally without ever really coming to grips with the core philosophical problems.

My sense is that statistics as a field doesn't have the strong foundations that, say, math or physics has. I think this field needs a "Newton" to come along and sort it all out. But I also think that the time is ripe for this to happen, just like Einstein's theories were ripe for their time.

That might be possible, but I don't think probability/statistics can move forward in the way that physics did with Newton. We've had about 240 years since Bayes (1763). I read his paper very closely (in 1990) and I think we haven't made a lot of progress in our philosophy of statistics since his time. (I'm not defining "a lot" -- but I will say that I think you'll be surprised by how sophisticated Bayes was for his time.) It's an interesting paper, by the way. Contrary to what some have claimed, Bayes does use a subjectivist view of probability.


So, I just can't imagine what could be done to revolutionize statistics. High-powered computing is changing the way things are done.


Sort of, but not quite. What Edwards points out is that the likelihood contains all the information about the parameters that can be found in the data. In a Bayesian analysis, one uses the likelihood along with a "prior" which is a sort of weighting scheme based on, well, based on whatever the hell you want it to be based on -- and that's the problem with Bayesian analysis, but that doesn't mean it isn't a good thing.

I'll have to look at it. But a problem with his scenario is that he is deciding between two possibilities, p(1/2) and p(1/4). But really you would be deciding between all possible p between 0 and 1. Then the implicit assumption that all the priors are the same plays a much bigger role than you might think.

I don't understand the problem. Edwards advocates for maximum likelihood estimation, which is a way to make a choice of a single value from a range of possibilities. I know what you mean about the implicit assumption, but I'm not sure why "what I might think" would be off track.
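To make the two positions concrete, here is a minimal sketch in Python. The data (3 successes in 10 trials) and the two Beta priors are hypothetical, chosen only for illustration; nothing here is taken from Edwards. It compares the two-point choice between p = 1/4 and p = 1/2, the maximum likelihood estimate over all p between 0 and 1, and the posterior mean under two different priors.

```python
from math import comb

def binomial_likelihood(p, k, n):
    """Probability of k successes in n trials when the success probability is p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical data, not from the thread: 3 successes in 10 trials.
k, n = 3, 10

# Two-point comparison: likelihood of p = 1/4 relative to p = 1/2.
ratio = binomial_likelihood(0.25, k, n) / binomial_likelihood(0.5, k, n)

# Maximum likelihood over the whole interval [0, 1], approximated on a grid.
grid = [i / 1000 for i in range(1001)]
p_mle = max(grid, key=lambda p: binomial_likelihood(p, k, n))

def posterior_mean(a, b, k, n):
    """Posterior mean of p under a Beta(a, b) prior (conjugate update)."""
    return (a + k) / (a + b + n)

flat = posterior_mean(1, 1, k, n)          # uniform prior on [0, 1]
skeptical = posterior_mean(10, 10, k, n)   # assumed prior concentrated near 1/2

print(f"likelihood ratio (1/4 vs 1/2): {ratio:.3f}")
print(f"grid MLE: {p_mle}")
print(f"posterior mean, flat prior: {flat:.3f}")
print(f"posterior mean, Beta(10, 10) prior: {skeptical:.3f}")
```

With this little data the two priors give visibly different posterior means, which is the sensitivity being argued about; the MLE uses no prior at all.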



The problem that has driven my thinking is this one. Suppose you go out and capture 1000 birds. You observe 16 different species, but 5 of them you only observe once. Try to give a lower estimate on how many species you didn't observe. Note that 0 is quite unlikely, because 5 of the species are quite likely very sparse, and if there were only 5 of these sparse species, quite likely you wouldn't have observed them all.

This problem is obviously important to ecology and is well studied: http://viceroy.eeb.uconn.edu/EstimateS. Chao's work on this is quite brilliant, but is essentially ad-hoc. I tried a Bayesian approach, and its dependence upon priors is tremendous, and in any case it always seems to estimate too high.

I think a Bayesian approach is very reasonable for a problem like this one. If it gives answers that appear incorrect, change the prior. If you can't make a Bayesian analysis work for you with this problem, I don't think any analysis will solve this problem for you.
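For reference, a rough sketch of the classic Chao1 lower-bound estimator applied to the bird example. Chao1 needs the number of species seen exactly twice (doubletons), which the example does not give, so the doubleton counts below are hypothetical. This is the simple textbook form; some published versions add a small-sample factor of (n - 1)/n, which is omitted here.

```python
def chao1(s_obs, f1, f2):
    """Chao1 lower-bound estimate of total species richness.

    s_obs: number of species observed
    f1: species seen exactly once (singletons)
    f2: species seen exactly twice (doubletons)
    """
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2

s_obs, f1 = 16, 5          # from the bird example: 16 species, 5 seen once
for f2 in (1, 2, 3):       # hypothetical doubleton counts (not given in the example)
    estimate = chao1(s_obs, f1, f2)
    print(f"f2={f2}: estimated total {estimate:.2f}, unseen {estimate - s_obs:.2f}")
```

The estimate moves a lot with the assumed doubleton count, so the answer for the bird example depends on information that was not stated.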



My thinking is that if you really study this model problem, then you have some hope of getting closer to what the foundations of statistics really should be. It is more difficult than the simple, model problems, but much easier than most real life problems (e.g. microarrays).

In what sense is the problem of counting unobserved species easier than analyzing microarray data?



I have a sense that this person was in denial about the problems I brought up. He told me emphatically that there was no reasonable reason to suppose that the changes in DNA were related to each other - but then in another email told me that changes in DNA are found NOT to be a Poisson process because the variance is about 2 times too big. These two statements contradict each other.

Right. A higher variance can certainly be due to positive correlation. It certainly implies some dependency.
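A small simulation sketch of that point, under an assumed toy model that has nothing to do with real DNA data: if every triggering event brings a second, perfectly correlated change with it, the counts keep the same mean but the variance doubles. The rate, seed, and number of trials are arbitrary choices.

```python
import math
import random
from statistics import mean, variance

def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(0)        # fixed seed so the run is repeatable
lam, trials = 5.0, 20000      # arbitrary illustrative values

# Independent changes: a plain Poisson count with mean lam.
independent = [poisson(lam, rng) for _ in range(trials)]

# Dependent changes: half as many triggering events, each producing two changes.
paired = [2 * poisson(lam / 2, rng) for _ in range(trials)]

ratio_independent = variance(independent) / mean(independent)
ratio_paired = variance(paired) / mean(paired)

print(f"variance/mean, independent: {ratio_independent:.2f}")  # close to 1
print(f"variance/mean, paired:      {ratio_paired:.2f}")       # close to 2
```

A variance-to-mean ratio near 2 is what this kind of pairwise dependence produces, though other mechanisms (for example a rate that varies between sites) can inflate the variance as well.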



Current models of statistics seem unable to deal with events with "large tails in their probability distribution functions" - that is, an event that is unlikely, but when it does happen, it makes a huge difference. So, for example, with a genuine normal distribution, the chance of being off by 10 standard deviations is so incredibly small that for all intents and purposes it just isn't going to happen. But you need only a few of these large-tailed events skewing your data for the chance of being off by 10 standard deviations to be about the same as the chance of being off by 2 or 3 standard deviations.

I guess you are talking about outliers. This is one reason for the development of "robust" methods in statistics -- methods that are not too strongly affected by outliers. For example, we often focus on medians instead of means when tail behavior is problematic.
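Here is a rough numerical sketch of both halves of this exchange. The contamination model (1% of observations drawn from a normal with 20 times the nominal standard deviation) and the single gross error of 1000 are assumptions made up for illustration. "10 standard deviations" is measured on the nominal, uncontaminated scale.

```python
import math
import random
from statistics import mean, median

def two_sided_normal_tail(z):
    """P(|Z| > z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))

def contaminated_tail(z, eps, scale):
    """P(|X| > z) when X is N(0, 1) with probability 1 - eps
    and N(0, scale**2) with probability eps."""
    return ((1 - eps) * two_sided_normal_tail(z)
            + eps * two_sided_normal_tail(z / scale))

pure_10 = two_sided_normal_tail(10)
mixed_10 = contaminated_tail(10, eps=0.01, scale=20)   # assumed contamination
print(f"P(|X| > 10), pure normal:    {pure_10:.2e}")
print(f"P(|X| > 10), 1% contaminated: {mixed_10:.2e}")
print(f"P(|Z| > 3), pure normal:     {two_sided_normal_tail(3):.2e}")

# Effect of one gross error on the mean and on the median.
rng = random.Random(1)
clean = [rng.gauss(0, 1) for _ in range(99)]
sample = clean + [1000.0]   # one hypothetical bad reading

shift_in_mean = mean(sample) - mean(clean)
shift_in_median = median(sample) - median(clean)
print(f"shift in mean:   {shift_in_mean:.3f}")
print(f"shift in median: {shift_in_median:.3f}")
```

Under these assumptions a 10-sigma deviation goes from essentially impossible to roughly as likely as a 3-sigma deviation in clean data, and a single bad value moves the mean by about 10 while barely moving the median.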



And this is what I am thinking happened with the exit poll data.

Why would you think that? What would cause it to happen?


(Incidentally, I think that large-tailed events are one of the problems with microarray analysis - an example of a large-tailed event is that one of the microarray chips got a scratch. And from my brief reading of the microarray literature, this is a real consideration.)

I believe it. In most research areas there are numerous ways to get bad data. For example, in genetic analysis we always have genotyping errors and this is a constant annoyance.



One last thought -- one of the guys who is writing a book on statistical analysis of the exit poll data - Steven Freeman - is not a particularly good statistician. I know this because I corresponded with him about a problem in his analysis. He didn't know the literature and he didn't understand my very clear and detailed explanations. He proceeded to publish his incorrect answers and he is continuing to promote them today. Believe it or not, his mistakes make his results look much stronger than they really are! ;-) Yes, people find it harder to understand why their wonderful results are wrong than to understand why their unappealing results can be made better.


So, I don't know how big of a deal to make of Freeman's error. His conclusion is still correct, it seems, but I find it hard to trust other things that he says. If Freeman's data was the major or only thing driving suspicion about the vote in Ohio, I would stop worrying about it today, but there is a lot more to it.

Mike

_______________________________________________
discussion mailing list
http://mlug.missouri.edu/mailman/listinfo/discussion