POV-Ray: Newsgroups: povray.off-topic: Noise-function statistics?: Re: Noise-function statistics?

From: Kevin Wampler
Date: 28 Apr 2011 20:01:25
Message: <4db9ffd5$1@news.povray.org>
On 4/28/2011 3:33 PM, gregjohn wrote:
> Here's an accurate, if morbid, example.
>
> A) Take a large auditorium filled with people.  Have each individual roll a die
> or use some other random number picker. If they are over the threshold, you give
> them an injection of cold virus. That's a random "point defects".
>
> B) Then take another auditorium. Put a handful of really sick people up in the
> rafters and have them sneeze on the crowd below. Some groups below will be under
> a sneeze cloud, others won't. The distribution of sick people will now look
> something like povray's noise3d function.  If you're sick, it's very likely the
> person next to you is sick, and well for well.
>
> Now you're a statistician who wants to describe auditorium A, then I think it's
> pretty straightforward.  Your sampling plan can be pretty simple. I might even
> say if you KNOW the population were to have completely random defects, then you
> can be lazy in how exhaustively random you sample.  But if you've chosen a lazy
> sampling plan for A), say just the first two rows, and you end up with
> auditorium B), you're making wrong predictions.
>
> So that's the pitfall.  Are there any benefits when you have B)? Are there ways
> to test between the "noise3d" function and true random points?
>

Ok, I think I see what you're talking about now.  This isn't an area I 
actually know anything about, but for what it's worth here's what I can 
think of off the top of my head.

The issue here is that your distribution of sick people in case B has a 
non-trivial covariance.  More explicitly, consider the probability that 
each person will be sick.  You can represent these probabilities in a 
vector, with one element in the vector for each person giving the 
probability with which that person will be sick.  Since there's 
randomness in who will get sick, this is a random vector, and what you 
care about is how these random vectors are distributed.

There are a few very basic statistical measures which you can use to study 
the distribution of your vector.  The mean gives a vector telling you 
the expected odds with which each person will get sick, and the 
covariance gives you a matrix telling you, for each pair of people, how 
correlated their odds of getting sick are.  In your case A the 
covariance matrix is diagonal -- that is, knowing one person is sick 
doesn't tell you anything about the odds of other people being sick.  In 
your case B the covariance matrix is not diagonal, since if you know one 
person is sick you'd expect nearby people to have a higher chance of 
being sick.
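
To make the diagonal vs. non-diagonal distinction concrete, here's a rough sketch using NumPy.  Everything in it is an assumption made up for illustration: a single row of 30 seats, a 30% infection rate for case A, and a "sneeze cloud" in case B that infects everyone within 3 seats of a randomly placed sneezer.  It just estimates the mean and covariance from many repeated experiments and compares two neighbouring seats.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_trials = 30, 20000   # one row of seats, many repeated experiments

# Case A: every person is infected independently with probability 0.3.
sick_a = (rng.random((n_trials, n_people)) < 0.3).astype(float)

# Case B: one sneezer per trial lands on a random seat and infects everyone
# within 3 seats of that position (a crude stand-in for a "sneeze cloud").
centers = rng.integers(0, n_people, size=n_trials)
seats = np.arange(n_people)
sick_b = (np.abs(seats[None, :] - centers[:, None]) <= 3).astype(float)

def mean_and_cov(samples):
    """Sample mean vector and covariance matrix; each row is one trial."""
    return samples.mean(axis=0), np.cov(samples, rowvar=False)

mean_a, cov_a = mean_and_cov(sick_a)
mean_b, cov_b = mean_and_cov(sick_b)

# Covariance between two adjacent people in the middle of the row.
neighbor_cov_a = cov_a[14, 15]
neighbor_cov_b = cov_b[14, 15]
print("case A neighbour covariance:", neighbor_cov_a)
print("case B neighbour covariance:", neighbor_cov_b)
```

In case A the off-diagonal entry should come out close to zero (up to sampling noise), while in case B it should be clearly positive for people sitting near each other.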

Now, depending on your case you may be able to say some very specific 
things about the expected distribution of your random vector, but a 
standard choice which happens to be computationally tractable is to 
assume that your distribution is a (modified) multivariate Gaussian 
(with a possibly unknown mean and covariance).  I say modified here 
because it's only a true Gaussian if the values in the random vector are 
unbounded real numbers, but in your case they're probabilities and thus must 
be between 0 and 1.  A standard solution is to model the distribution as 
a composition of a Gaussian and a sigmoid.
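
As a minimal sketch of what "a Gaussian composed with a sigmoid" could look like in NumPy: draw an unbounded latent value per person from a multivariate Gaussian, then squash it through a sigmoid to get a probability.  The squared-exponential falloff, the `width` of 4 seats, and the latent mean of -1 are all assumptions I picked for the example, not anything specific to your problem.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people = 50
seats = np.arange(n_people)

# Assumed covariance: squared-exponential falloff with distance between seats.
width = 4.0
dist = seats[:, None] - seats[None, :]
cov = np.exp(-0.5 * (dist / width) ** 2) + 1e-6 * np.eye(n_people)  # small jitter for stability
mean = np.full(n_people, -1.0)  # negative latent mean, so most people start out healthy

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

latent = rng.multivariate_normal(mean, cov)   # unbounded real values, correlated between neighbours
prob_sick = sigmoid(latent)                   # valid probabilities
is_sick = rng.random(n_people) < prob_sick    # one realisation of who actually gets sick
```

The Gaussian part carries all of the correlation structure, and the sigmoid only exists to keep the outputs between 0 and 1.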

So, now that we have a model for the distribution of sickness in the 
room, on to your questions.

* Firstly, there are indeed advantages to case B.  Knowing that there is 
a non-trivial covariance lets you predict who will be sick with better 
accuracy using fewer samples.  For instance in the case of "perfect" 
covariance where either everyone is sick or nobody is, then you can tell 
who is sick with just a single sample.  For imperfect covariance you'll 
of course need more than one sample, but if you sample well you'll always 
be better off than in case A.  This of course assumes that you have some 
reasonable idea what the covariance is beforehand, otherwise you 
wouldn't know whether it was case A or case B in the first place.

* Secondly, you can solve for the covariance.  If you have lots of data 
(in your example this would be lots of times running the experiment) you 
can solve for it directly, at least in the case where you've assumed a 
Gaussian distribution.  The computation looks a little bit like solving 
for the mean, but gives you a matrix instead of a vector.  If you don't 
have that much data, you could express the covariance matrix in terms of 
a small number of parameters (I think these would be called 
hyperparameters) and then solve for the correct values of these 
hyperparameters in a maximum likelihood sense.  A sensible 
hyperparameter in your example would be the "width" of a falloff 
function saying how much the sickness of nearby people should covary.
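
Here's a rough sketch of both ways of solving for the covariance, again in NumPy.  To keep it short it assumes the data really are zero-mean Gaussian values (so it skips the sigmoid step), and it assumes a squared-exponential falloff with a small independent noise term.  The `kernel` helper, the grid of candidate widths, and the sample sizes are all hypothetical choices for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n_people = 20
seats = np.arange(n_people)

def kernel(width):
    """Assumed covariance: squared-exponential falloff plus independent noise."""
    d = seats[:, None] - seats[None, :]
    return np.exp(-0.5 * (d / width) ** 2) + 0.1 * np.eye(n_people)

true_width = 3.0
true_cov = kernel(true_width)

# Lots of data: estimate the full matrix directly (mean assumed known to be zero).
many = rng.multivariate_normal(np.zeros(n_people), true_cov, size=20000)
cov_direct = many.T @ many / len(many)

# Little data: fewer trials than people, so fit only the falloff width
# by maximising the Gaussian log-likelihood over a grid of candidates.
few = many[:15]

def log_likelihood(width, samples):
    cov = kernel(width)
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum("ij,ij->", samples @ np.linalg.inv(cov), samples)
    n, d = samples.shape
    return -0.5 * (n * logdet + quad + n * d * np.log(2 * np.pi))

candidates = np.linspace(0.5, 8.0, 76)
best_width = candidates[np.argmax([log_likelihood(w, few) for w in candidates])]
print("estimated width:", best_width)
```

The direct estimate needs far more trials than people to be any good, whereas the one-parameter fit can get by on a handful of trials because it only has a single number to pin down.  A grid search is just the simplest thing that works here; a proper optimiser would do the same job.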

Hopefully this was helpful.  It's hard to know if it's the sort of thing 
you're looking for or not without a better idea what you want to solve.