POV-Ray: Newsgroups: povray.off-topic: Estimation

From: Invisible
Subject: Estimation
Date: 15 Nov 2010 11:34:13
Message: <4ce16105@news.povray.org>

It's an old problem. You can't measure something, so you try to estimate 
it. But how do you figure out /how accurate/ your estimate is?

Computer graphics is full of situations where you want to estimate the 
integral of something. The way you usually do this is to sample it at 
lots of points and then take the weighted sum. The more points you 
sample, the better the estimate. But usually each sample costs computer 
power, so you don't want to take millions of samples except when it's 
really necessary. But how do you know if it's "really necessary"?
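
As a rough illustration of the sampling approach described above, here is a minimal Monte Carlo sketch. It assumes uniform random sample positions and an equally weighted sum; the function name mc_integrate and the test integrand (sin on [0, pi]) are made up for the example. Alongside the estimate it returns the standard error, which is one common way of putting a number on "how accurate".

```python
import math
import random

def mc_integrate(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] from n uniform random samples.

    Returns (estimate, standard_error). Requires n >= 2.
    """
    values = [f(rng.uniform(a, b)) for _ in range(n)]
    mean = sum(values) / n
    # Sample variance with Bessel's correction (divide by n - 1).
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    width = b - a
    return width * mean, width * math.sqrt(var / n)

rng = random.Random(1)  # fixed seed so the run is repeatable
for n in (100, 10_000):
    est, err = mc_integrate(math.sin, 0.0, math.pi, n, rng)
    print(f"n={n:6d}  estimate={est:.4f}  +/- {err:.4f}  (exact value: 2)")
```

The standard error shrinks roughly like 1/sqrt(n), so 100 times more samples buys about one extra decimal digit.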

It's a similar situation with benchmarking. You can run a benchmark and 
time it. But what if Windows Update happened to run in the background 
just at that moment? Or one of your cores overheated and changed clock 
frequency? Hmm, better run the benchmark 3 times and take the average. 
Three flukes in a row are far less likely than one (if the runs are 
independent, the probabilities multiply), but that is still hardly 
what you'd call "impossible". People play the lottery with worse odds 
than that!

So maybe you run the benchmark 100 times. Now if all 100 results are 
almost identical, you can be pretty sure your result is very, very 
accurate. And if all 100 results are all over the place, you should 
probably do a bazillion more runs and plot a histogram. Still, how do 
you put a number on "how accurate" your results are?
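
One conventional way to put a number on it is a confidence interval for the mean. The sketch below is only that, a sketch: the function name summarize and the timings are invented, and it uses a normal quantile, which is reasonable for something like 100 runs but too optimistic for a handful (a Student-t quantile would give a wider interval).

```python
import math
import statistics

def summarize(times, confidence=0.95):
    """Return (mean, half_width) of an approximate confidence interval."""
    n = len(times)
    mean = statistics.fmean(times)
    sd = statistics.stdev(times)   # sample SD (n - 1 in the denominator)
    sem = sd / math.sqrt(n)        # standard error of the mean
    # Normal quantile; assumes n is large enough for this approximation.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * sem

# Hypothetical benchmark timings in seconds.
times = [10.0, 10.2, 9.8, 10.1, 9.9]
mean, half_width = summarize(times)
print(f"{mean:.3f} s +/- {half_width:.3f} s")
```

Note that this only describes random scatter. A systematic disturbance that hits every run (say, a background task running the whole time) will not show up in the interval.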

Does anybody here know enough about statistics to come up with answers?

From: Le Forgeron
Subject: Re: Estimation
Date: 15 Nov 2010 12:08:50
Message: <4ce16922$1@news.povray.org>

On 15/11/2010 17:34, Invisible wrote:

> Does anybody here know enough about statistics to come up with answers?

That's what standard deviation (σ) on my old casio FX-180P is made for.
In statistics mode.

You should do some research on "standard deviation", it might enlighten you.

From: scott
Subject: Re: Estimation
Date: 15 Nov 2010 12:18:45
Message: <4ce16b75@news.povray.org>

> So maybe you run the benchmark 100 times. Now if all 100 results are almost 
> identical, you can be pretty sure your result is very, very accurate. And 
> if all 100 results are all over the place, you should probably do a 
> bazillion more runs and plot a histogram.

I suspect the accuracy of your estimates (of mean and standard deviation) 
doesn't depend on the actual values of the mean and standard deviation.  I 
could be wrong, however.

From: Invisible
Subject: Re: Estimation
Date: 16 Nov 2010 09:48:12
Message: <4ce299ac$1@news.povray.org>

>> Does anybody here know enough about statistics to come up with answers?
>
> That's what standard deviation (σ) on my old casio FX-180P is made for.

The standard deviation tells you how variable something is. However:

1. This, by itself, does not tell you how many measurements you need to 
take to achieve a given level of accuracy.

2. If you compute the SD from the data you gathered, then the SD itself 
may be inaccurate.

From: andrel
Subject: Re: Estimation
Date: 16 Nov 2010 10:07:39
Message: <4CE29E3C.8050704@gmail.com>

On 16-11-2010 15:48, Invisible wrote:
>>> Does anybody here know enough about statistics to come up with answers?
>>
>> That's what standard deviation (σ) on my old casio FX-180P is made for.
>
> The standard deviation tells you how variable something is. However:
>
> 1. This, by itself, does not tell you how many measurements you need to
> take to achieve a given level of accuracy.
>
> 2. If you compute the SD from the data you gathered, then the SD itself
> may be inaccurate.

IANAS, but the SD of the mean is expected to go down with the square root 
of the number of measurements *if* the data is from a normal distribution. 
So, if that is the case (or can be assumed to be the case), you do a 
limited set of measurements and divide the resulting SD by the required SD; 
the square of that ratio should give an estimate of how much longer you 
have to go.

If the data is from a different distribution, you have to know that 
before you can compute anything.
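
A minimal sketch of that recipe, under the stated assumption that the measurements are independent and roughly normal. The function name runs_needed and the pilot data are invented for the example. It takes the sample SD of a short pilot run and the standard error you would like the mean to have, and returns the total number of measurements that would be needed.

```python
import math
import statistics

def runs_needed(pilot, target_sem):
    """Estimate the total number of measurements needed for the standard
    error of the mean to drop to target_sem, based on a pilot sample."""
    sd = statistics.stdev(pilot)
    # SEM = sd / sqrt(n)  =>  n = (sd / SEM)^2
    return math.ceil((sd / target_sem) ** 2)

# Hypothetical pilot measurements.
pilot = [10.0, 10.2, 9.8, 10.1, 9.9]
print(runs_needed(pilot, target_sem=0.02))
```

Because the SD from a small pilot run is itself uncertain, the result is best treated as a ballpark figure and re-checked as more data comes in.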

From: Le Forgeron
Subject: Re: Estimation
Date: 16 Nov 2010 10:18:09
Message: <4ce2a0b1$1@news.povray.org>

On 16/11/2010 15:48, Invisible wrote:
>>> Does anybody here know enough about statistics to come up with answers?
>>
>> That's what standard deviation (σ) on my old casio FX-180P is made for.
> 
> The standard deviation tells you how variable something is. However:
> 
> 1. This, by itself, does not tell you how many measurements you need to
> take to achieve a given level of accuracy.

Does your accuracy goal even exist?
How do you define accuracy?
If you want standard deviation/mean < some value X, at least you have a
criterion for when to stop (but you should require a minimum number of
samples).
What if your measurements will forever be split between value A and value
B? Would you still insist on getting a single value C with a standard
deviation of about 0.00001% of C?
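
A stopping rule of that kind might look like the sketch below. Two assumptions are mine, not the poster's: the quantity tested is the standard error of the mean divided by the mean (the plain SD does not shrink as samples accumulate, so it cannot serve as a stopping test on its own), and the names sample_until, rel_tol, min_n and max_n are invented. The running mean and variance use Welford's online update.

```python
import math
import random

def sample_until(draw, rel_tol, min_n=30, max_n=1_000_000):
    """Draw samples until SEM / |mean| < rel_tol; return (mean, sem, n).

    min_n (at least 2) guards against stopping on a lucky early streak.
    """
    n, mean, m2 = 0, 0.0, 0.0
    sem = math.inf
    while n < max_n:
        x = draw()
        n += 1
        # Welford's online update of the mean and sum of squared deviations.
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_n:
            sem = math.sqrt(m2 / (n - 1) / n)
            if mean != 0 and sem / abs(mean) < rel_tol:
                break
    return mean, sem, n

rng = random.Random(42)  # fixed seed so the run is repeatable
mean, sem, n = sample_until(lambda: rng.gauss(10.0, 1.0), rel_tol=0.01)
print(n, mean, sem)
```

For data split between two values A and B the loop still terminates, but the mean it reports sits somewhere between A and B and describes neither, which is the poster's point.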

You cannot know the number of samples you need before sampling: you have
to take at least enough to get an idea of the distribution.

You should not apply the confidence table for the normal distribution if
your samples do not at least roughly show the pattern of a normal distribution!

Have a look at Chebyshev's inequality in the link at the end; it is
more general... but you take on less risk if the data follows a
normal distribution (can you prove that without a few samples first?)
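
To make the difference concrete, the sketch below compares the two. Chebyshev's inequality says that for any distribution with finite variance, the probability of landing k or more standard deviations from the mean is at most 1/k^2; for a normal distribution the exact figure is much smaller. The function names are invented for the example.

```python
import statistics

def chebyshev_tail(k):
    """Upper bound on P(|X - mean| >= k*sd), valid for any distribution
    with finite variance."""
    return min(1.0, 1.0 / k ** 2)

def normal_tail(k):
    """Exact P(|X - mean| >= k*sd) for a normal distribution."""
    return 2.0 * (1.0 - statistics.NormalDist().cdf(k))

for k in (1, 2, 3):
    print(f"k={k}: Chebyshev <= {chebyshev_tail(k):.4f}, "
          f"normal = {normal_tail(k):.4f}")
```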

> 
> 2. If you compute the SD from the data you gathered, then the SD itself
> may be inaccurate.

That's why there is Bessel's correction for some estimators.
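
For reference, Bessel's correction means dividing by n - 1 instead of n when estimating the variance from a sample. Python's statistics module offers both variants; the data below is made up for the example.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Divides by n: the SD of exactly these eight numbers.
population_sd = statistics.pstdev(data)
# Divides by n - 1 (Bessel's correction): an estimate for the wider
# population the eight numbers were sampled from.
sample_sd = statistics.stdev(data)

print(population_sd, sample_sd)
```

The correction removes the bias of the variance estimate. It does not make the estimate exact, and the gap between the two variants shrinks as n grows.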


http://en.wikipedia.org/wiki/Standard_deviation

-- 
A good Manager will take you
through the forest, no matter what.
A Leader will take time to climb on a
Tree and say 'This is the wrong forest'.

From: Invisible
Subject: Re: Estimation
Date: 16 Nov 2010 10:47:44
Message: <4ce2a7a0$1@news.povray.org>

On 16/11/2010 03:07 PM, andrel wrote:

> If the data is from a different distribution, you have to know that
> before you can compute anything.

I guess there really are two cases to consider here.

When you want to, say, anti-alias an algorithmic image by super-sampling 
it, what you are effectively trying to do is compute the integral of a 
discontinuous function. Usually this function can in principle contain 
arbitrarily high frequencies. (That's what "discontinuous" is, loosely.)

But if the results of the function are bounded, I guess you should still 
be able to compute the minimum and maximum possible values the integral 
could have, given the samples you've collected so far. So I guess you 
just keep going until this range gets suitably narrow.

OTOH, any real interval contains an (uncountably) infinite number of 
points, so unless you sample an infinite number of points, the minimum 
and maximum integral values don't actually change. So then I guess you 
need to add some kind of probability estimate for "how evil" the 
function you're trying to integrate might perhaps be...
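
One standard tool that fits this description is Hoeffding's inequality. It needs no knowledge of how "evil" the function is, only that its values are bounded and that the sample positions are chosen independently at random. The sketch below assumes exactly that; the function names and the example of pixel values in [0, 1] are mine.

```python
import math

def hoeffding_samples(lo, hi, tol, delta):
    """Samples needed so that P(|sample mean - true mean| >= tol) <= delta,
    for independent samples each bounded in [lo, hi]."""
    return math.ceil((hi - lo) ** 2 * math.log(2.0 / delta) / (2.0 * tol ** 2))

def hoeffding_half_width(lo, hi, n, delta):
    """Half-width t such that the true mean lies within +/- t of the sample
    mean with probability at least 1 - delta."""
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

# Pixel values in [0, 1], error at most 0.01 with 95% probability.
print(hoeffding_samples(0.0, 1.0, tol=0.01, delta=0.05))
```

The bound is a worst case over all bounded functions, so it tends to ask for far more samples than a well-behaved image region actually needs.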

The other case is when you're trying to measure something. The thing you 
want to measure should theoretically have a single, fixed, value, but 
each time you measure it you get a certain amount of interference. How 
many times do you have to measure it? Can you assume that all 
interference, from any source, is normally distributed? Hmm, tricky.

Browsing Wikipedia indicates that both the mean and SD are easily biased 
by a single distant outlier, and that more sophisticated methods are 
preferable.
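
A small illustration of that sensitivity, with made-up numbers: one wild value drags the mean and SD a long way, while the median and the median absolute deviation (MAD) barely move. The median and MAD are just two of the simpler robust alternatives, and the helper name median_and_mad is invented.

```python
import statistics

def median_and_mad(values):
    """Return the median and the median absolute deviation from it."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, mad

clean = [10.0, 10.1, 9.9, 10.2, 9.8]
dirty = clean + [60.0]  # one distant outlier

for name, data in (("clean", clean), ("dirty", dirty)):
    med, mad = median_and_mad(data)
    print(f"{name}: mean={statistics.fmean(data):.2f} "
          f"sd={statistics.stdev(data):.2f} median={med:.2f} mad={mad:.2f}")
```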

Then again, perhaps if you're trying to measure something, what you 
actually want is the /histogram/ rather than "the value"...

From: Jim Henderson
Subject: Re: Estimation
Date: 16 Nov 2010 19:00:51
Message: <4ce31b33@news.povray.org>

On Tue, 16 Nov 2010 14:48:12 +0000, Invisible wrote:

> 2. If you compute the SD from the data you gathered, then the SD itself
> may be inaccurate.

Of course, because the standard deviation is the standard deviation in the 
data set.  You can't do a standard deviation for data that isn't sampled.

Jim
