POV-Ray: Newsgroups: povray.off-topic: It's all statistics: It's all statistics

POV-Ray : Newsgroups : povray.off-topic : It's all statistics : It's all statistics		Server Time 13 Jul 2025 07:35:40 EDT (-0400)
From: Invisible
Date: 29 Jun 2011 07:05:21
Message: <4e0b06f1$1@news.povray.org>
Suppose you flip a coin 1000 times. What is the probability of getting 
402 heads?

The answer is of course that the head count follows a binomial 
distribution. That means that you need to compute the probability mass 
function. Unfortunately, in this instance the PMF in question is

   Pr(402 heads) = (1000 choose 402) / 2^1000

Now this is a problem. 2^1000 is roughly the square of the number of 
atoms in the observable universe, while 1000 choose 402 is defined as

   1000 choose 402 = 1000! / (402! * 598!)

In principle it looks plausible that you could compute the factorial of 
a big number. It might just be a bit slow, eh? Well... no. It turns out 
that 1000! is over 3 million bits in length, for a start.

Even if you could compute 1000 choose 402, it's again almost as huge as 
2^1000. And taking the quotient of two vast numbers is unlikely to be 
very numerically stable.



Well, perhaps we should instead look at the cumulative density function? 
Unfortunately, the CDF for a binomial distribution is even worse. It's 
defined in terms of the regularised incomplete beta function, which a 
quotient of the beta function (which involves more factorials - or 
worse, the gamma function) and the incomplete beta function (which is 
defined as a complicated integral having no closed-form solutions). 
Without an advanced degree in special functions, it's not even possible 
to derive an *algorithm* for the CDF, never mind one that's numerically 
stable.



Alternatively, we can construct a histogram, with

   Heads Tails
    402   598

and compare this with the "expected outcomes",

   Heads Tails
    500   500

Applying Pearson's chi-squared test, we have to compute

   X^2 = (402-500)^2 / 500 + (598-500)^2 / 500

This is easily determined to be 38.416. But what's the *probability* of 
that happening? The answer is that X^2 follows a chi-squared 
distribution, and consulting the PDF, we discover... that it requires 
the gamma function at non-integer arguments. (Well, half-integer anyway. 
There *is* a closed-form solution for that... but it requires a double 
factorial. Again.)

OK, so let's try the CDF... Yeah, no-go. That requires the gamma 
function *and* the lower incomplete gamma function too.



Ah, but wait. Many, many probability distributions approximate the 
so-called "normal distribution" for sufficiently large values of one 
parameter or another. And indeed, we find that the binomial distribution 
for a very large number of trails (such as our 1000) is approximately a 
normal distribution with a mean of 1000/2 and a standard deviation of 
Sqrt(1000)/2.

The PDF for a normal distribution isn't too bad. It's basically 
exp(-x^2) with a few twiddle-factors. Using this we should be able to 
compute a probability. (Need a continuity correction though. So *that* 
should be easy to get right...)

The CDF is like your worst nightmare. It involves the "error function", 
erf(). (Or, equivalently, erfc().) This, it seems, can be defined in 
terms of... *sigh* the gamma function again.

In short, if you wanted a p-value for our 402 heads, good luck with 
that. You can compute a Z-score easily enough, but you won't get a p-value.

It's kind of frustrating that the "normal" distribution, the one most 
commonly encountered, is also the hardest one to find p-values for.
Post a reply to this message