I have a situation (at work, if it matters) where I am seeing a population of
points or defects which are not spread out as random pixelated points but rather
as a cloud or noise3d-like function. I am wondering if we could learn more about
the cause or improve our sampling plan by thinking about it with the right
statistical ideal.
Can anyone point me to the proper terminology for this, or primers on ways to
statistically model this, or even what one can say about the physical mechanisms
when it happens this way?
I'm not sure if I'll be of any help or not, but I'd think that some more
details would be useful. In particular do you have an image of this?
Also, what's the application? How is the image captured? What are the
images of?
On 4/28/2011 7:33 AM, gregjohn wrote:
> I have a situation (at work, if it matters) where I am seeing a population of
> points or defects which are not spread out as random pixelated points but rather
> as a cloud or noise3d-like function. I am wondering if we could learn more about
> the cause or improve our sampling plan by thinking about it with the right
> statistical ideal.
>
> Can anyone point me to the proper terminology for this, or primers on ways to
> statistically model this, or even what one can say about the physical mechanisms
> when it happens this way?
On 28/04/2011 16:33, gregjohn wrote:
> I have a situation (at work, if it matters) where I am seeing a population of
> points or defects which are not spread out as random pixelated points but rather
> as a cloud or noise3d-like function. I am wondering if we could learn more about
> the cause or improve our sampling plan by thinking about it with the right
> statistical ideal.
>
> Can anyone point me to the proper terminology for this, or primers on ways to
> statistically model this, or even what one can say about the physical mechanisms
> when it happens this way?
chi-square tests.
http://en.wikipedia.org/wiki/Chi-square_distribution
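A common chi-square test for this exact question is the index-of-dispersion test on quadrat counts: split the area into equal cells, count the defects in each cell, and compare the variance of the counts to their mean. Under complete spatial randomness (a Poisson process) the two should be about equal; clumping inflates the variance. Below is a minimal Python sketch, assuming you already have per-cell counts with a nonzero mean. The function name `dispersion_test` is just for illustration, and the p-value uses a rough normal approximation to the chi-square tail (for small numbers of cells an exact tail such as `scipy.stats.chi2.sf(d, df)` would be better).

```python
import statistics
from statistics import NormalDist


def dispersion_test(counts):
    """Chi-square index-of-dispersion test on quadrat counts (hypothetical helper).

    Under complete spatial randomness the counts in equal-sized cells have
    variance roughly equal to their mean, so D = (n - 1) * s^2 / mean is
    approximately chi-square with n - 1 degrees of freedom. D much larger
    than n - 1 suggests clustering; much smaller suggests a regular spacing.
    """
    n = len(counts)
    mean = statistics.fmean(counts)
    var = statistics.variance(counts)  # sample variance, n - 1 denominator
    d = (n - 1) * var / mean
    df = n - 1
    # Rough one-sided p-value for "more clumped than random", using the
    # normal approximation chi2(df) ~ N(df, 2 * df).
    p_clustered = 1.0 - NormalDist(df, (2 * df) ** 0.5).cdf(d)
    return d, df, p_clustered


even = [5, 4, 6, 5, 5, 4, 6, 5, 5, 5]         # defects per cell, evenly spread
clumped = [0, 0, 12, 0, 1, 15, 0, 0, 11, 11]  # same total, bunched into a few cells
print(dispersion_test(even))     # D well below df = 9
print(dispersion_test(clumped))  # D far above df = 9, tiny p-value
```

The cell size matters: cells much smaller or much larger than the typical "cloud" will hide the clumping, so it is worth trying a few sizes.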
Thanks, Le Forgeron, I'll read up on Chi.
Kevin Wampler <wam### [at] uwashingtonedu> wrote:
> I'm not sure if I'll be of any help or not, but I'd think that some more
> details would be useful. In particular do you have an image of this?
> Also, what's the application? How is the image captured? What are the
> images of?
>
Here's an accurate, if morbid, example.
A) Take a large auditorium filled with people. Have each individual roll a die
or use some other random number picker. If they are over the threshold, you give
them an injection of cold virus. That gives you random "point defects".
B) Then take another auditorium. Put a handful of really sick people up in the
rafters and have them sneeze on the crowd below. Some groups below will be under
a sneeze cloud, others won't. The distribution of sick people will now look
something like POV-Ray's noise3d function. If you're sick, it's very likely the
person next to you is sick, and likewise if you're well, your neighbor is probably well.
Now, if you're a statistician who wants to describe auditorium A, I think it's
pretty straightforward. Your sampling plan can be pretty simple. I might even
say that if you KNOW the population has completely random defects, then you
can be lazy about how exhaustively and randomly you sample. But if you've chosen a lazy
sampling plan for A), say just the first two rows, and you end up with
auditorium B), you'll make wrong predictions.
So that's the pitfall. Are there any benefits when you have B)? Are there ways
to distinguish the "noise3d" function from truly random points?
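A quick way to see the difference between the two auditoriums is to simulate them. The Python sketch below is a toy model under made-up assumptions: a 100 x 100 seating grid, a 10% infection rate for case A, and for case B a few circular "sneeze clouds" with a high infection rate inside and a low one outside. All function names and parameter values are hypothetical. For each auditorium it prints the true sick rate, the estimate from the first two rows, and the chance that the person to your right is sick given that you are.

```python
import random


def auditorium_a(rows, cols, p, rng):
    """Case A: every seat is infected independently with probability p."""
    return [[rng.random() < p for _ in range(cols)] for _ in range(rows)]


def auditorium_b(rows, cols, n_clouds, radius, p_in, p_out, rng):
    """Case B: a few circular "sneeze clouds" land on the crowd.

    Seats within `radius` of a cloud centre get sick with probability p_in,
    everyone else with the much smaller p_out.
    """
    centres = [(rng.uniform(0, rows), rng.uniform(0, cols)) for _ in range(n_clouds)]
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            under = any((r - cr) ** 2 + (c - cc) ** 2 <= radius ** 2 for cr, cc in centres)
            row.append(rng.random() < (p_in if under else p_out))
        grid.append(row)
    return grid


def sick_rate(grid):
    """Fraction of all seats in `grid` that are sick."""
    cells = [s for row in grid for s in row]
    return sum(cells) / len(cells)


def neighbour_rate(grid):
    """Estimate P(right-hand neighbour is sick | this person is sick)."""
    sick_left = both = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            if left:
                sick_left += 1
                both += right
    return both / sick_left if sick_left else 0.0


rng = random.Random(2011)
a = auditorium_a(100, 100, 0.1, rng)
b = auditorium_b(100, 100, n_clouds=8, radius=10, p_in=0.8, p_out=0.02, rng=rng)
for name, g in (("A", a), ("B", b)):
    print(name,
          "true rate %.3f" % sick_rate(g),
          "first two rows %.3f" % sick_rate(g[:2]),
          "neighbour of a sick person %.3f" % neighbour_rate(g))
```

In case A the neighbour figure should sit close to the overall rate; in case B it comes out far higher, which is the "if you're sick, your neighbor probably is too" effect. The first-two-rows estimate for B can land well above or below the true rate depending on where the clouds happened to fall.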
On 4/28/2011 3:33 PM, gregjohn wrote:
> Here's an accurate, if morbid, example.
>
> A) Take a large auditorium filled with people. Have each individual roll a die
> or use some other random number picker. If they are over the threshold, you give
> them an injection of cold virus. That's a random "point defects".
>
> B) Then take another auditorium. Put a handful of really sick people up in the
> rafters and have them sneeze on the crowd below. Some groups below will be under
> a sneeze cloud, others won't. The distribution of sick people will now look
> something like povray's noise3d function. If you're sick, it's very likely the
> person next to you is sick, and well for well.
>
> Now you're a statistician who wants to describe auditorium A, then I think it's
> pretty straightforward. Your sampling plan can be pretty simple. I might even
> say if you KNOW the population were to have completely random defects, then you
> can be lazy in how exhaustively random you sample. But if you've chosen a lazy
> sampling plan for A), say just the first two rows, and you end up with
> auditorium B), you're making wrong predictions.
>
> So that's the pitfall. Are there any benefits when you have B)? Are there ways
> to test between the "noise3d" function and true random points?
>
Ok, I think I see what you're talking about now. This isn't an area I
actually know anything about, but for what it's worth here's what I can
think of off the top of my head.
The issue here is that your distribution of sick people in case B has a
non-trivial covariance. More explicitly, consider the probability that
each person will be sick. You can represent these probabilities in a
vector, with one element in the vector for each person giving the
probability with which that person will be sick. Since there's
randomness in who will get sick, this is a random vector, and what you
care about is how these random vectors are distributed.
There are a few very basic statistical measures which you can use to study
the distribution of your vector. The mean gives a vector telling you
the expected odds with which each person will get sick, and the
covariance gives you a matrix telling you how strongly the odds of
each pair of people getting sick are correlated. In your case A the
covariance matrix is diagonal -- that is knowing one person is sick
doesn't tell you anything about the odds of other people being sick. In
your case B the covariance matrix is not diagonal, since if you know one
person is sick you'd expect nearby people to have a higher chance of
being sick.
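To make the diagonal vs. non-diagonal distinction concrete, here is a Python sketch that repeats a toy experiment many times on a single row of 10 seats and computes the sample mean vector and covariance matrix. For simplicity it uses the observed 0/1 sick indicators rather than the underlying probabilities. The setup is an assumption for illustration: in case A each seat is sick independently with probability 0.3; in case B one "cloud" lands on a random seat and covers it and its immediate neighbors, with sickness probability 0.9 under the cloud and 0.05 elsewhere. The helper names are hypothetical.

```python
import random


def run_a(seats, p, rng):
    """Case A: each seat is sick independently with probability p."""
    return [int(rng.random() < p) for _ in range(seats)]


def run_b(seats, half_width, p_in, p_out, rng):
    """Case B: one cloud lands on a random seat and covers +/- half_width seats."""
    centre = rng.randrange(seats)
    return [int(rng.random() < (p_in if abs(i - centre) <= half_width else p_out))
            for i in range(seats)]


def sample_mean_and_cov(runs):
    """Sample mean vector and (n - 1)-normalised covariance matrix.

    runs[k][i] is 1 if seat i was sick in repetition k, else 0.
    """
    n, m = len(runs), len(runs[0])
    mean = [sum(r[i] for r in runs) / n for i in range(m)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in runs) / (n - 1)
            for j in range(m)]
           for i in range(m)]
    return mean, cov


rng = random.Random(0)
mean_a, cov_a = sample_mean_and_cov([run_a(10, 0.3, rng) for _ in range(5000)])
mean_b, cov_b = sample_mean_and_cov([run_b(10, 1, 0.9, 0.05, rng) for _ in range(5000)])
print("A: var(seat 4) %.3f, cov(seat 4, seat 5) %.3f" % (cov_a[4][4], cov_a[4][5]))
print("B: var(seat 4) %.3f, cov(seat 4, seat 5) %.3f" % (cov_b[4][4], cov_b[4][5]))
```

The off-diagonal entry for A should come out near zero, while for B it is clearly positive (its theoretical value under these made-up parameters is about 0.08).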
Now, depending on your case you may be able to say some very specific
things about the expected distribution of your random vector, but a
standard choice which happens to be computationally tractable is to
assume that your distribution is a (modified) multivariate Gaussian
(with a possibly unknown mean and covariance). I say modified here
because it's only a true Gaussian if the values in the random vector are
unconstrained real numbers, but in your case they're probabilities and thus must
be between 0 and 1. A standard solution is to model the distribution as
a composition of a Gaussian and a sigmoid.
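A minimal sketch of that "Gaussian composed with a sigmoid" model in Python, for a single row of seats: draw a latent Gaussian vector whose covariance falls off with distance between seats, then squash each entry through the logistic sigmoid to get per-person sickness probabilities. The squared-exponential falloff, the `width`/`scale` parameters, and the function names are assumptions chosen for illustration; the Cholesky factorization is written out by hand only to keep the example dependency-free.

```python
import math
import random


def se_kernel(n, width, scale=1.0, jitter=1e-6):
    """Assumed squared-exponential covariance between seats 0..n-1 in one row.

    The small jitter on the diagonal keeps the matrix numerically positive definite.
    """
    return [[scale ** 2 * math.exp(-((i - j) ** 2) / (2 * width ** 2))
             + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular L with L @ L^T == a, for symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sample_sick_probabilities(n, mean, width, scale, rng):
    """Latent Gaussian field pushed through a logistic sigmoid -> values in (0, 1)."""
    L = cholesky(se_kernel(n, width, scale))
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    f = [mean + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    return [1.0 / (1.0 + math.exp(-x)) for x in f]


rng = random.Random(1)
smooth = sample_sick_probabilities(30, mean=-1.0, width=5.0, scale=2.0, rng=rng)
rough = sample_sick_probabilities(30, mean=-1.0, width=0.01, scale=2.0, rng=rng)
sick = [rng.random() < p for p in smooth]  # one realisation of who is actually sick
print(" ".join("%.2f" % p for p in smooth))
print("".join("X" if s else "." for s in sick))
```

A large `width` gives the smooth, clumpy case B; a `width` much smaller than one seat makes the off-diagonal covariances vanish and recovers independent seats, as in case A.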
So, now that we have a model for the distribution of sickness in the
room, on to your questions.
* Firstly, there are indeed advantages to case B. Knowing that there is
a non-trivial covariance lets you predict who will be sick with better
accuracy using fewer samples. For instance in the case of "perfect"
covariance where either everyone is sick or nobody is, then you can tell
who is sick with just a single sample. For imperfect covariance you'll
of course need more than one sample, but if you sample well you'll always
be better off than in case A. This of course assumes that you have some
reasonable idea what the covariance is beforehand; otherwise you
wouldn't know whether it was case A or case B in the first place.
* Secondly, you can solve for the covariance. If you have lots of data
(in your example this would be lots of times running the experiment) you
can solve for it directly, at least in the case where you've assumed a
Gaussian distribution. The computation looks a little bit like solving
for the mean, but gives you a matrix instead of a vector. If you don't
have that much data, you could express the covariance matrix in terms of
a small number of parameters (I think these would be called
hyperparameters) and then solve for the correct values of these
hyperparameters in a maximum likelihood sense. A sensible
hyperparameter in your example would be the "width" of a falloff
function saying how much the sickness of nearby people should covary.
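The hyperparameter route can be sketched as a grid search over the falloff width, scoring each candidate by the Gaussian log-likelihood of repeated runs. (With lots of data, the direct route is simply the sample covariance matrix.) To keep it short, this Python sketch assumes you observe real-valued measurements drawn from a zero-mean Gaussian with a squared-exponential covariance; real sick/well data would need the sigmoid link and a non-Gaussian likelihood on top. The kernel, the candidate widths, the simulated data, and all function names are illustrative assumptions.

```python
import math
import random


def se_kernel(n, width, scale=1.0, jitter=1e-6):
    """Assumed squared-exponential falloff between seats i and j in one row."""
    return [[scale ** 2 * math.exp(-((i - j) ** 2) / (2 * width ** 2))
             + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular L with L @ L^T == a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_likelihood(runs, width, scale=1.0):
    """Zero-mean multivariate Gaussian log-likelihood of all runs for one width."""
    n = len(runs[0])
    L = cholesky(se_kernel(n, width, scale))
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log|K|
    total = 0.0
    for y in runs:
        # Forward substitution solves L a = y, so y^T K^-1 y = |a|^2.
        a = []
        for i in range(n):
            a.append((y[i] - sum(L[i][k] * a[k] for k in range(i))) / L[i][i])
        total += -0.5 * sum(v * v for v in a) - 0.5 * log_det - 0.5 * n * math.log(2 * math.pi)
    return total


def simulate(n, width, n_runs, rng):
    """Hypothetical data: n_runs draws of y = L z, so cov(y) = K(width)."""
    L = cholesky(se_kernel(n, width))
    runs = []
    for _ in range(n_runs):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        runs.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
    return runs


rng = random.Random(42)
runs = simulate(20, width=4.0, n_runs=50, rng=rng)
candidates = [0.5, 1.0, 2.0, 4.0, 8.0]
best = max(candidates, key=lambda w: log_likelihood(runs, w))
print("maximum-likelihood width:", best)
```

With this much simulated data the search should recover the width used to generate it; with real data you would replace `simulate` with your own measurements and could refine the grid or hand the log-likelihood to a proper optimizer.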
Hopefully this was helpful. It's hard to know if it's the sort of thing
you're looking for or not without a better idea what you want to solve.