  Re: The nebulous question of probability  
From: Orchid XP v7
Date: 20 Nov 2007 13:34:54
Message: <474328ce$1@news.povray.org>
scott wrote:
>>> However, one might also conjecture that the probability of any two 
>>> arbitrary messages having the same [MD5] hash code would be 2^(-128).
>>
>> Random fact:
>>
>> 2^(-128) ~= 2.93 * 10^(-39)
>>
>> Winning the National Lottery (assuming you only bought 1 ticket) has a 
>> probability of roughly 7.15 * 10^(-8).
>>
>> If my arithmetic is right, that means that winning the National 
>> Lottery jackpot 5 times back-to-back is still more than 637 times 
>> *more* likely than something with a probability of 2^(-128) happening.
>>
>> Assuming the probability of a random file mutation leaving the MD5 
>> hash unchanged really *is* 2^(-128), it's a pretty unlikely event. You 
>> should probably worry more about the building being burned to the 
>> ground...
> 
> If you are only worried about one file per week and you don't backup off 
> site, then sure...

1. We back up off-site, but, absurdly enough when you think about it,
we don't *archive* anything off-site. When a project is finished, it
gets burned onto a pair of CDs, deleted from the servers, and stored in
our own building. So our "current" projects are actually better
protected than the finished ones... Hmm, we should probably fix that! ._.

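2. Let's look at the rules of probability:

Pr(A and B) = Pr(A) * Pr(B)    (when A and B are independent)
Pr(A or  B) = Pr(A) + Pr(B)    (when A and B are mutually exclusive)

If A and B can both happen, Pr(A) + Pr(B) is only an upper bound on
Pr(A or B), which is the safe direction for a worst-case estimate.
Under those assumptions, and provided Pr(A) > 0 and 0 < Pr(B) < 1, we
have

   Pr(A and B) < Pr(A)
   Pr(A or  B) > Pr(A)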
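The product rule for independent events and the sum rule for mutually exclusive events can be checked exactly by brute-force enumeration. This is a small illustrative Python sketch: the two-dice sample space and the helper names (`OUTCOMES`, `pr`, the event predicates) are hypothetical examples chosen for this check, not anything from the original discussion.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: the 36 equally likely rolls of two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))


def pr(event):
    """Exact probability of an event, given as a predicate on (die1, die2)."""
    hits = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(hits, len(OUTCOMES))


# Independent events: each one looks at a different die.
first_is_six = lambda o: o[0] == 6
second_is_six = lambda o: o[1] == 6

# Mutually exclusive events: the total cannot be both 2 and 12.
sum_is_2 = lambda o: o[0] + o[1] == 2
sum_is_12 = lambda o: o[0] + o[1] == 12

p_and = pr(lambda o: first_is_six(o) and second_is_six(o))
p_or = pr(lambda o: sum_is_2(o) or sum_is_12(o))
print(p_and, pr(first_is_six) * pr(second_is_six))  # 1/36 1/36
print(p_or, pr(sum_is_2) + pr(sum_is_12))           # 1/18 1/18

# Overlapping events: the plain sum double-counts (6, 6), so it only
# gives an upper bound on Pr(A or B).
p_either_six = pr(lambda o: first_is_six(o) or second_is_six(o))
print(p_either_six, pr(first_is_six) + pr(second_is_six))  # 11/36 1/3
```

The last line shows why the conditions on the rules matter: "first die is a six" and "second die is a six" can happen together, so adding their probabilities overshoots the true value.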

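So ANDs decrease the probability, and ORs increase it. Thus, the
probability of winning the Lottery today AND next week AND the week
after AND... becomes quite small. However, the probability of file A
being corrupt OR file B being corrupt OR... increases as we add more
files. BUT NOT AS FAST! Each extra Lottery draw multiplies the
probability by about 7.15 * 10^(-8), while each extra file merely adds
another 2.93 * 10^(-39), so n files are at most n times as risky as one.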

If the probability of a single file being corrupted but not detected by
MD5 is 2.93 * 10^(-39), then the probability of at least one out of
1,000 files being so corrupted is at most 1,000 times that, i.e.
2.93 * 10^(-36). Which is still pretty tiny. (It's roughly 1.6 times as
likely as winning the Lottery 5 times in a row, which is astronomically
unlikely in the first place.)
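The 1,000-file figure can be double-checked numerically. This Python sketch assumes, as the post does, a per-file probability of 2^(-128) and independent files; it also assumes 6-from-49 jackpot odds of 1 in 13,983,816 (about 7.15 * 10^(-8)), which is an illustrative choice matching the post's Lottery figure. The helper `any_of_n` is hypothetical. It computes the exact "at least one" probability, 1 - (1 - p)^n, in a numerically stable way, because the naive formula collapses to zero in double precision.

```python
import math

# Assumptions: the post's 2^-128 per-file figure, and 6-from-49 jackpot
# odds of 1 in 13,983,816 (about 7.15e-8) for a single ticket.
P_UNDETECTED = 2.0 ** -128
P_LOTTERY = 1 / 13_983_816


def any_of_n(p: float, n: int) -> float:
    """Pr(at least one of n independent events, each with probability p).

    Evaluates 1 - (1 - p)**n as -expm1(n * log1p(-p)); the naive formula
    returns 0.0 here because 1 - 2**-128 rounds to exactly 1.0 in a double.
    """
    return -math.expm1(n * math.log1p(-p))


one_file = P_UNDETECTED
thousand_files = any_of_n(P_UNDETECTED, 1_000)
five_wins = P_LOTTERY ** 5

print(f"one file:       {one_file:.3g}")          # 2.94e-39
print(f"1,000 files:    {thousand_files:.3g}")    # 2.94e-36
print(f"OR-rule bound:  {1_000 * one_file:.3g}")  # 2.94e-36
print(f"5 wins:         {five_wins:.3g}")         # 1.87e-36
print(f"5 wins / 1 file:      {five_wins / one_file:.0f}")        # 636
print(f"1,000 files / 5 wins: {thousand_files / five_wins:.2f}")  # 1.57
```

For probabilities this small, the exact value and the simple "multiply by n" bound agree to many significant digits, which is why the back-of-the-envelope arithmetic holds up.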

With 1 million files per month, we get 2.93 * 10^(-33) as the
probability of one of them getting corrupted without us noticing. That
lands between 4 and 5 consecutive Lottery wins (about 1,600 times more
likely than 5 wins, but still nearly 9,000 times less likely than 4)...
I don't think I'll worry too much. ;-)
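To see where the monthly figure sits relative to Lottery streaks, it helps to compare on a logarithmic scale. This Python sketch reuses the same assumptions: a per-file probability of 2^(-128), independent files, and illustrative 6-from-49 jackpot odds of 1 in 13,983,816. The one-million-files-per-month volume is taken from the post; the constant names are hypothetical.

```python
import math

# Assumptions: per-file undetected-corruption chance of 2^-128, 6-from-49
# jackpot odds for one ticket, and the post's one million files a month.
P_UNDETECTED = 2.0 ** -128
P_LOTTERY = 1 / 13_983_816
FILES_PER_MONTH = 1_000_000

# Exact Pr(at least one undetected corruption in a month), computed stably;
# for p this small it is indistinguishable from FILES_PER_MONTH * p.
monthly = -math.expm1(FILES_PER_MONTH * math.log1p(-P_UNDETECTED))

for wins in (3, 4, 5, 6):
    streak = P_LOTTERY ** wins
    # Positive: the monthly corruption is the likelier event;
    # negative: this many Lottery wins in a row is the likelier event.
    orders = math.log10(monthly / streak)
    print(f"{wins} wins in a row: {streak:.3g}  (log10 ratio {orders:+.2f})")
```

Under these assumptions the sign of the ratio flips between the 4-win and 5-win rows: the monthly figure is a little over three orders of magnitude above 5 straight wins and just under four orders below 4.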



Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.