POV-Ray: Newsgroups: povray.off-topic: The nebulous question of probability

POV-Ray : Newsgroups : povray.off-topic : The nebulous question of probability		Server Time 12 Jul 2025 22:53:08 EDT (-0400)

<<< Previous 10 Messages

Goto Initial 10 Messages

From: Warp
Subject: Re: The nebulous question of probability
Date: 17 Nov 2007 07:28:59
Message: <473ede8a@news.povray.org>

Orchid XP v7 <voi### [at] devnull> wrote:
> Well, I now have the full algorithm working. :-P

> Or at least, it seems to work for messages shorter than 52 bytes anyway. 

  Then it's not a full working algorithm, is it?-)

  Testing that the algorithm works for all possible inputs would be a rather
laborious job. Just testing that it can produce any MD5 value would be a
rather impossible job, taking into account how many there are.

  I wonder how they make sure that an implementation is correct...

-- 
                                                          - Warp

Post a reply to this message

From: Orchid XP v7
Subject: Re: The nebulous question of probability
Date: 17 Nov 2007 09:12:02
Message: <473ef6b2$1@news.povray.org>

Warp wrote:
> Orchid XP v7 <voi### [at] devnull> wrote:
>> Well, I now have the full algorithm working. :-P
> 
>> Or at least, it seems to work for messages shorter than 52 bytes anyway. 
> 
>   Then it's not a full working algorithm, is it?-)

Well, it's damn near. :-P

>   Testing that the algorithm works for all possible inputs would be a rather
> laborious job. Just testing that it can produce any MD5 value would be a
> rather impossible job, taking into account how many there are.
> 
>   I wonder how they make sure that an implementation is correct...

Test all code paths, I assume.

For example, I am pretty damn confident that the compression function in 
my implementation is correct. I've tested it with dozens of different 
input blocks, and it's always produced the right output. There's no 
particular reason why it shouldn't for other input blocks. It processes 
them all the same way...

Now, the padding algorithm, *that* changes for every message size. For 
some sizes it seems to work, but not for all of them. I need to test it 
with a large array of sizes to check it's working. (Or rather, to figure 
out why currently it isn't.)

Post a reply to this message

From: Orchid XP v7
Subject: Re: The nebulous question of probability
Date: 17 Nov 2007 12:43:32
Message: <473f2844$1@news.povray.org>

Warp wrote:
> Orchid XP v7 <voi### [at] devnull> wrote:
>> Well, I now have the full algorithm working. :-P
> 
>> Or at least, it seems to work for messages shorter than 52 bytes anyway. 
> 
>   Then it's not a full working algorithm, is it?-)

Ah-HAH! One more bug, annihilated. >:-)

In the midst of a complex padding algorithm, I forgot to add on the 
running count of the number of bytes processed. The "size" figure 
appended onto the end of the bitstream was always the size of the last 
block, not the whole message. (IOW, message size mod 512.) Oops!

The program now appears to give correct results for every case I've 
tried. (I.e., not very many. However, I threw a 2 MB file at it, and it 
gave me the right answer. This indicates that it's not *completely* 
broken...)

And yes, it's just a tad slower than C. ( = exaggeration.)

Post a reply to this message

From: Darren New
Subject: Re: The nebulous question of probability
Date: 17 Nov 2007 18:47:33
Message: <473f7d95$1@news.povray.org>

Warp wrote:
>   I wonder how they make sure that an implementation is correct...

Generally, code coverage and intermediate results, along with a set of 
test vectors designed to cover all paths through and often showing the 
intermediate results.

It's somewhat easier when your specification is "here's the C code to do 
it" rather than an ambiguous description of what you're supposed to do. 
Especially since there isn't any "real world" right or wrong answer, 
other than "match what the C code outputs."

Plus, of course, it's designed to go wildly arwy with the slightest 
peturbation of the process, so even tiny bugs tend to produce completely 
different results. Which is its own debugging problem.

-- 
   Darren New / San Diego, CA, USA (PST)
     Remember the good old days, when we
     used to complain about cryptography
     being export-restricted?

Post a reply to this message

From: Orchid XP v7
Subject: Re: The nebulous question of probability
Date: 18 Nov 2007 04:38:56
Message: <47400830@news.povray.org>

Darren New wrote:

> Plus, of course, it's designed to go wildly arwy with the slightest 
> peturbation of the process, so even tiny bugs tend to produce completely 
> different results. Which is its own debugging problem.

I got 2 bits wrong in one of the magic numbers. Totally messed up the 
output... until I found and fixed the bug.

Post a reply to this message

From: Invisible
Subject: Re: The nebulous question of probability
Date: 20 Nov 2007 06:44:24
Message: <4742c898$1@news.povray.org>

Invisible wrote:

> However, one might also conjecture that the probability of any two 
> arbitrary messages having the same [MD5] hash code would be 2^(-128).

Random fact:

2^(-128) ~= 2.93 * 10^(-39)

Winning the National Lottery (assuming you only bought 1 ticket) has a 
probability of roughly 7.15 * 10^(-8).

If my arithmetic is right, that means that winning the National Lottery 
jackpot 5 times back-to-back is still more than 637 times *more* likely 
than something with a probability of 2^(-128) happening.

Assuming the probability of a random file mutation leaving the MD5 hash 
unchanged really *is* 2^(-128), it's a pretty unlikely event. You should 
probably worry more about the building being burned to the ground...

Post a reply to this message

From: scott
Subject: Re: The nebulous question of probability
Date: 20 Nov 2007 07:07:47
Message: <4742ce13$1@news.povray.org>

>> However, one might also conjecture that the probability of any two 
>> arbitrary messages having the same [MD5] hash code would be 2^(-128).
>
> Random fact:
>
> 2^(-128) ~= 2.93 * 10^(-39)
>
> Winning the National Lottery (assuming you only bought 1 ticket) has a 
> probability of roughly 7.15 * 10^(-8).
>
> If my arithmetic is right, that means that winning the National Lottery 
> jackpot 5 times back-to-back is still more than 637 times *more* likely 
> than something with a probability of 2^(-128) happening.
>
> Assuming the probability of a random file mutation leaving the MD5 hash 
> unchanged really *is* 2^(-128), it's a pretty unlikely event. You should 
> probably worry more about the building being burned to the ground...

If you are only worried about one file per week and you don't backup off 
site, then sure...

Post a reply to this message

From: Orchid XP v7
Subject: Re: The nebulous question of probability
Date: 20 Nov 2007 13:34:54
Message: <474328ce$1@news.povray.org>

scott wrote:
>>> However, one might also conjecture that the probability of any two 
>>> arbitrary messages having the same [MD5] hash code would be 2^(-128).
>>
>> Random fact:
>>
>> 2^(-128) ~= 2.93 * 10^(-39)
>>
>> Winning the National Lottery (assuming you only bought 1 ticket) has a 
>> probability of roughly 7.15 * 10^(-8).
>>
>> If my arithmetic is right, that means that winning the National 
>> Lottery jackpot 5 times back-to-back is still more than 637 times 
>> *more* likely than something with a probability of 2^(-128) happening.
>>
>> Assuming the probability of a random file mutation leaving the MD5 
>> hash unchanged really *is* 2^(-128), it's a pretty unlikely event. You 
>> should probably worry more about the building being burned to the 
>> ground...
> 
> If you are only worried about one file per week and you don't backup off 
> site, then sure...

1. We backup off-site, but - absurdly enough, when you think about it - 
we don't *archive* anything off-site. So when a project is finished, it 
gets put into a pair of CDs, deleted from the servers, and stored in our 
own building. So actually, our "current" projects are protected better 
than the finished ones... Hmm, actually, we should probably fix that! ._.

2. Let's look at the rules of probability:

Pr(A and B) = Pr(A) * Pr(A)
Pr(A or  B) = Pr(A) + Pr(B)

[Subject to various assumptions.] In particular, we have

   Pr(A and B) < Pr(A)
   Pr(A or  B) > Pr(A)

So ANDs decrease the probability, and ORs increase it. Thus, the 
probability of winning the Lottery today AND next week AND the week 
after AND... becomes quite small. However, the probability of file A 
being corrupt OR file B being corrupt OR... increases as we add more 
files. BUT NOT AS FAST!

If the probability of a single file being corrupted but not detected by 
MD5 is 2.93 * 10^(-39) then the probability of any single file out of 
1,000 files being so-corrupted is... 2.93 * 10^(-36). Which is still 
pretty tiny. (It's about 2x more likely than 5 wins on the Lottery - 
which is astronomically unlikely in the first place.)

With 1 million files per month, we get 2.96 * 10^(-33) as the 
probability of one of them getting corrupted but us not noticing. That's 
nearer to 4 consecutive Lottery wins... I don't think I'll worry too 
much. ;-)

Post a reply to this message

From: Joel Yliluoma
Subject: Re: The nebulous question of probability
Date: 22 Nov 2007 06:09:19
Message: <slrnfkaoqv.88l.bisqwit@bisqwit.iki.fi>

On 16 Nov 2007 17:08:14 -0500, Warp wrote:
> Nicolas Alvarez <nic### [at] gmailisthebestcom> wrote:
>>         error("Not implemented yet");
>
>   error("I'll implement this when I have the time.")

error("This feature is only available in the registered version.");

-- 
Joel Yliluoma - http://iki.fi/bisqwit/

Post a reply to this message

From: Invisible
Subject: Re: The nebulous question of probability
Date: 22 Nov 2007 06:51:07
Message: <47456d2b@news.povray.org>

Joel Yliluoma wrote:

>>>         error("Not implemented yet");

>>   error("I'll implement this when I have the time.")

> error("This feature is only available in the registered version.");

Joel wins.

Post a reply to this message

<<< Previous 10 Messages

Goto Initial 10 Messages