scott wrote:
>>> However, one might also conjecture that the probability of any two
>>> arbitrary messages having the same [MD5] hash code would be 2^(-128).
>>
>> Random fact:
>>
>> 2^(-128) ~= 2.93 * 10^(-39)
>>
>> Winning the National Lottery (assuming you only bought 1 ticket) has a
>> probability of roughly 7.15 * 10^(-8).
>>
>> If my arithmetic is right, that means that winning the National
>> Lottery jackpot 5 times back-to-back is still more than 637 times
>> *more* likely than something with a probability of 2^(-128) happening.
>>
>> Assuming the probability of a random file mutation leaving the MD5
>> hash unchanged really *is* 2^(-128), it's a pretty unlikely event. You
>> should probably worry more about the building being burned to the
>> ground...
>
> If you are only worried about one file per week and you don't backup off
> site, then sure...
1. We back up off-site, but - absurdly enough, when you think about it -
we don't *archive* anything off-site. So when a project is finished, it
gets put onto a pair of CDs, deleted from the servers, and stored in our
own building. So actually, our "current" projects are protected better
than the finished ones... Hmm, actually, we should probably fix that! ._.
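2. Let's look at the rules of probability:
Pr(A and B) = Pr(A) * Pr(B)
Pr(A or B) = Pr(A) + Pr(B)
[Subject to various assumptions: the first needs A and B to be
independent, the second needs them to be mutually exclusive (otherwise
the sum is only an upper bound).] In particular, we have
Pr(A and B) < Pr(A)
Pr(A or B) > Pr(A)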
So ANDs decrease the probability, and ORs increase it. Thus, the
probability of winning the Lottery today AND next week AND the week
after AND... becomes quite small. However, the probability of file A
being corrupt OR file B being corrupt OR... increases as we add more
files. BUT NOT AS FAST!
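
To make the AND/OR point concrete, here's a rough sketch in Python. The function names are mine, and it assumes the events are independent and all have the same probability, which is a simplification. It compares "all n happen" with "at least one of n happens", and shows that the plain sum n * p is an upper bound on the latter.

```python
import math

def p_all(p_single, n):
    """Pr(all n independent events happen): the AND case."""
    return p_single ** n

def p_any(p_single, n):
    """Pr(at least one of n independent events happens): the OR case.

    This is 1 - (1 - p)^n, computed with log1p/expm1 so that a very
    small p does not simply round away to zero.
    """
    return -math.expm1(n * math.log1p(-p_single))

p = 0.01  # illustrative value only, not a figure from this thread
n = 3
print("AND:", p_all(p, n))           # shrinks fast as n grows
print("OR: ", p_any(p, n))           # grows, roughly linearly in n
print("sum:", n * p)                 # simple upper bound on the OR case
```

For tiny probabilities the OR result and the plain sum are practically identical, which is why just multiplying by the number of files is good enough here.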
If the probability of a single file being corrupted but not detected by
MD5 is 2.93 * 10^(-39) then the probability of at least one file out of
1,000 files being so-corrupted is... 2.93 * 10^(-36). Which is still
pretty tiny. (It's only about 1.6x more likely than 5 wins on the Lottery -
which is astronomically unlikely in the first place.)
With 1 million files per month, we get 2.93 * 10^(-33) as the
probability of one of them getting corrupted but us not noticing. That's
somewhere between 4 and 5 consecutive Lottery wins... I don't think I'll
worry too much. ;-)
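
If anyone wants to check the arithmetic, here's a quick sketch. It assumes the 2^(-128) figure and the 7.15 * 10^(-8) jackpot odds quoted above, and treats every file as independent. The "equivalent wins" helper is just my own way of putting the numbers on the same scale; it isn't a standard measure.

```python
import math

P_MD5 = 2.0 ** -128   # assumed chance that a corrupted file keeps its MD5
P_LOTTERY = 7.15e-8   # single-ticket jackpot odds quoted in this thread

def p_any_undetected(n_files, p=P_MD5):
    """Pr(at least one of n_files is corrupted without MD5 noticing)."""
    # 1 - (1 - p)^n, written so the tiny p survives floating point
    return -math.expm1(n_files * math.log1p(-p))

def equivalent_lottery_wins(prob):
    """How many back-to-back jackpots would be equally unlikely."""
    return math.log(prob) / math.log(P_LOTTERY)

for n in (1, 1_000, 1_000_000):
    pr = p_any_undetected(n)
    wins = equivalent_lottery_wins(pr)
    print(f"{n:>9} files: {pr:.3g}  (~{wins:.2f} consecutive wins)")
```

The "consecutive wins" column doesn't have to be a whole number; it just shows where each probability sits between, say, 4 and 5 jackpots in a row.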