  benchmarks o'interest (Message 1 to 10 of 11)  
From: green
Subject: benchmarks o'interest
Date: 4 Aug 2014 22:35:00
Message: <web.53e0427e96f3e75fb6fe36780@news.povray.org>
something i found today
http://www.pugetsystems.com/blog/2014/07/14/POV-ray-on-Quad-Xeon-and-Opteron-579/
-or- http://tinyurl.com/mkc3bkz



From: jhu
Subject: Re: benchmarks o'interest
Date: 5 Aug 2014 21:45:01
Message: <web.53e187e61b6ac1fed19b0ec40@news.povray.org>
"green" <rov### [at] gmailcom> wrote:
> something i found today
> http://www.pugetsystems.com/blog/2014/07/14/POV-ray-on-Quad-Xeon-and-Opteron-579/
> -or- http://tinyurl.com/mkc3bkz

That is interesting. Of course, the Linux version is faster because it's a
custom compile. What I'm surprised about is that hyperthreading kills
performance on Windows for some reason. Can anyone explain why?



From: clipka
Subject: Re: benchmarks o'interest
Date: 6 Aug 2014 00:56:32
Message: <53e1b580$1@news.povray.org>
On 06.08.2014 03:41, jhu wrote:
> "green" <rov### [at] gmailcom> wrote:
>> something i found today
>> http://www.pugetsystems.com/blog/2014/07/14/POV-ray-on-Quad-Xeon-and-Opteron-579/
>> -or- http://tinyurl.com/mkc3bkz
>
> That is interesting. Of course, the Linux version is faster because it's a
> custom compile. What I'm surprised about is that hyperthreading kills
> performance on Windows for some reason. Can anyone explain why?

As far as I understand it doesn't really kill performance, it just 
doesn't add much.



From: jhu
Subject: Re: benchmarks o'interest
Date: 6 Aug 2014 02:15:00
Message: <web.53e1c6ce1b6ac1fed19b0ec40@news.povray.org>
clipka <ano### [at] anonymousorg> wrote:
> On 06.08.2014 03:41, jhu wrote:
> > "green" <rov### [at] gmailcom> wrote:
> >> something i found today
> >> http://www.pugetsystems.com/blog/2014/07/14/POV-ray-on-Quad-Xeon-and-Opteron-579/
> >> -or- http://tinyurl.com/mkc3bkz
> >
> > That is interesting. Of course, the Linux version is faster because it's a
> > custom compile. What I'm surprised about is that hyperthreading kills
> > performance on Windows for some reason. Can anyone explain why?
>
> As far as I understand it doesn't really kill performance, it just
> doesn't add much.

Look at the bottom. It definitely kills performance on Windows.



From: Le Forgeron
Subject: Re: benchmarks o'interest
Date: 6 Aug 2014 02:51:24
Message: <53e1d06c$1@news.povray.org>
On 06/08/2014 08:10, jhu wrote:
> clipka <ano### [at] anonymousorg> wrote:
>> On 06.08.2014 03:41, jhu wrote:
>>> "green" <rov### [at] gmailcom> wrote:
>>>> something i found today
>>>> http://www.pugetsystems.com/blog/2014/07/14/POV-ray-on-Quad-Xeon-and-Opteron-579/
>>>> -or- http://tinyurl.com/mkc3bkz
>>>
>>> That is interesting. Of course, the Linux version is faster because it's a
>>> custom compile. What I'm surprised about is that hyperthreading kills
>>> performance on Windows for some reason. Can anyone explain why?
>>
>> As far as I understand it doesn't really kill performance, it just
>> doesn't add much.
> 
> Look at the bottom. It definitely kills performance on Windows.
> 
> 
The Windows scheduler is poor at telling HT cores from true cores.

The Linux scheduler is "smarter": when you have 40 true cores and 40
HT cores and have to run 30 threads, it will choose a distinct true+HT
core pair for each thread.

I guess that, if we number HT cores as odd and true cores as even, the
Windows scheduler, given 30 threads over 40+40 cores, would use the
0-29 range, actually using 15 true cores and their 15 associated HT
cores, thus leaving 25 true cores idle (and their peer HT cores too).
With a bit of help, other processes in the system might shift or expand
the range of used cores. Yet, with HT active, the distribution wastes
resources and creates a bottleneck.
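
To make the guess above concrete, here is a minimal sketch in Python. The
even/odd numbering and both placement policies are assumptions taken from
the paragraph above; this is a toy model, not a description of how either
scheduler really behaves.

```python
# Toy model of the guess above. ASSUMPTION: logical CPU 2k is "true" core k
# and logical CPU 2k+1 is its HT sibling. Real schedulers are far more
# dynamic than this static placement.
PHYSICAL_CORES = 40
THREADS = 30


def physical_core(logical_cpu: int) -> int:
    """Map a logical CPU number to the physical core it belongs to."""
    return logical_cpu // 2


def cores_used(logical_cpus) -> int:
    """Count the distinct physical cores touched by a placement."""
    return len({physical_core(c) for c in logical_cpus})


# Hypothetical "naive" policy: fill logical CPUs 0..29 in order.
naive = range(THREADS)

# Hypothetical "spread" policy: one thread per physical core (0, 2, ..., 58).
spread = range(0, 2 * THREADS, 2)

print("naive placement uses", cores_used(naive), "physical cores")
print("spread placement uses", cores_used(spread), "physical cores")
print("idle cores under naive placement:", PHYSICAL_CORES - cores_used(naive))
```

Under these assumptions the naive placement keeps only 15 physical cores
busy and leaves 25 idle, while the spread placement keeps 30 busy.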

Now the blogging guy is just lucky to have such beasts (yep, he has 2!)

-- 
Just because nobody complains does not mean all parachutes are perfect.



From: clipka
Subject: Re: benchmarks o'interest
Date: 6 Aug 2014 04:45:20
Message: <53e1eb20$1@news.povray.org>
On 06.08.2014 03:41, jhu wrote:
> "green" <rov### [at] gmailcom> wrote:
>> something i found today
>> http://www.pugetsystems.com/blog/2014/07/14/POV-ray-on-Quad-Xeon-and-Opteron-579/
>> -or- http://tinyurl.com/mkc3bkz
>
> That is interesting. Of course, the Linux version is faster because it's a
> custom compile. What I'm surprised about is that hyperthreading kills
> performance on Windows for some reason. Can anyone explain why?

Must be a high-core-count thing; on my 4-core i7, running Windows 7, I 
see this for a random scene:

HT ON:
----------------------------------------------------------------------------
Peak memory used:          64352256 bytes

Render Time:
   Photon Time:      No photons
   Radiosity Time:   0 hours  0 minutes 33 seconds (33.337 seconds)
               using 8 thread(s) with 256.666 CPU-seconds total
   Trace Time:       0 hours  0 minutes 34 seconds (34.647 seconds)
               using 8 thread(s) with 274.590 CPU-seconds total
UberPOV finished
-
CPU time used: kernel 0.86 seconds, user 532.95 seconds, total 533.80 seconds.
Elapsed time 69.51 seconds, CPU vs elapsed time ratio 7.68.
Render averaged 8286.22 PPS (1079.05 PPS CPU time) over 576000 pixels.
----------------------------------------------------------------------------

HT OFF:
----------------------------------------------------------------------------
Peak memory used:          54779904 bytes

Render Time:
   Photon Time:      No photons
   Radiosity Time:   0 hours  0 minutes 44 seconds (44.133 seconds)
               using 4 thread(s) with 173.050 CPU-seconds total
   Trace Time:       0 hours  0 minutes 45 seconds (45.100 seconds)
               using 4 thread(s) with 177.746 CPU-seconds total
UberPOV finished
-
CPU time used: kernel 0.72 seconds, user 351.95 seconds, total 352.67 seconds.
Elapsed time 90.32 seconds, CPU vs elapsed time ratio 3.90.
Render averaged 6377.04 PPS (1633.25 PPS CPU time) over 576000 pixels.
----------------------------------------------------------------------------

As you can see there's enough gain from HT to make it worthwhile.
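
The size of the gain can be worked out from the two logs above. The
following Python sketch only reuses the elapsed and CPU times printed
there; the variable names are made up for illustration.

```python
# Figures copied from the two UberPOV logs above (4-core i7, Windows 7).
elapsed_ht_on = 69.51    # seconds, 8 threads
elapsed_ht_off = 90.32   # seconds, 4 threads
cpu_ht_on = 533.80       # total CPU seconds, HT on
cpu_ht_off = 352.67      # total CPU seconds, HT off

# Wall-clock speedup from enabling HT.
speedup = elapsed_ht_off / elapsed_ht_on

# How much more CPU time the 8 logical CPUs burned for the same image.
cpu_ratio = cpu_ht_on / cpu_ht_off

print(f"wall-clock speedup with HT: {speedup:.2f}x")
print(f"CPU time ratio (HT on / HT off): {cpu_ratio:.2f}x")
```

So on this machine HT finishes the render roughly 1.3 times faster in
wall-clock terms, while consuming roughly 1.5 times the CPU-seconds:
each logical CPU is slower, but there are twice as many of them.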



From: Le Forgeron
Subject: Re: benchmarks o'interest
Date: 6 Aug 2014 05:20:14
Message: <53e1f34e$1@news.povray.org>
On 06/08/2014 10:45, clipka wrote:
> Must be a high-core-count thing; on my 4-core i7, running Windows 7, I
> see this for a random scene:

From my point of view, it's a "use less than 50% of resources" thing.
The graph goes up to 40 threads, but the guy has 80 possible cores (40
true (4 x 10), x 2 by HT).

The curves past 40 threads are only extensions of the points below 40.
There is not even a data point at 50, 60 or 70 when the Xeons are used.
All 40+ points are with the Opterons (and there is no HT in Opteron...
it's different, Intel HT and AMD modules are not the same).

The Opteron core count is 48: 4 CPUs, each with 2 dies of 3 Piledriver
modules, each module providing 2 execution threads (cores), which makes
24 modules in total.
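
The core counts quoted above are simple products. A small Python sketch
of the arithmetic, using only the figures given in this post (the
variable names are illustrative):

```python
# Quad Xeon box: 4 sockets x 10 cores, doubled by Hyper-Threading.
xeon_sockets = 4
xeon_cores_per_socket = 10
xeon_physical = xeon_sockets * xeon_cores_per_socket
xeon_logical = xeon_physical * 2

# Quad Opteron box: 4 CPUs x 2 dies x 3 Piledriver modules x 2 cores.
opteron_cpus = 4
dies_per_cpu = 2
modules_per_die = 3
cores_per_module = 2
opteron_modules = opteron_cpus * dies_per_cpu * modules_per_die
opteron_cores = opteron_modules * cores_per_module

print("Xeon:", xeon_physical, "true cores,", xeon_logical, "with HT")
print("Opteron:", opteron_modules, "modules,", opteron_cores, "cores")
```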


-- 
Just because nobody complains does not mean all parachutes are perfect.



From: jhu
Subject: Re: benchmarks o'interest
Date: 6 Aug 2014 06:00:02
Message: <web.53e1fb9e1b6ac1fed19b0ec40@news.povray.org>
Le_Forgeron <lef### [at] freefr> wrote:
> On 06/08/2014 10:45, clipka wrote:
> > Must be a high-core-count thing; on my 4-core i7, running Windows 7, I
> > see this for a random scene:
>
> From my point of view, it's a "use less than 50% of resources" thing.
> The graph goes up to 40 threads, but the guy has 80 possible cores ( 40
> true (4 x 10), x 2 by HT).
>
> The curves past 40 threads are only extension of points below 40. there
> is not even a data point at 50, 60 or 70, when Xeon are used. All 40+
> points are with Opteron (and there is no HT in Opteron... it's
> different, Intel HT and AMD module are not the same)
>
> Opteron cores' count is 48 (4 x cpu, 2 dies of 3 piledriver module each,
> each module providing 2 execution thread (core), 24 modules.
>
>
> --
> Just because nobody complains does not mean all parachutes are perfect.

Look at the last box at the bottom. He has results for 40, 60, and 80 threads.
With HT on, the 40, 60, and 80 thread results are all about 20 seconds slower
than the non-HT results on Windows.

I suspect it's his version of Windows. Not all Windows Server 2008 R2 versions
support 4 physical CPUs. But he doesn't specify which one he has.



From: clipka
Subject: Re: benchmarks o'interest
Date: 6 Aug 2014 06:03:39
Message: <53e1fd7b$1@news.povray.org>
On 06.08.2014 11:20, Le_Forgeron wrote:
> On 06/08/2014 10:45, clipka wrote:
>> Must be a high-core-count thing; on my 4-core i7, running Windows 7, I
>> see this for a random scene:
>
>  From my point of view, it's a "use less than 50% of resources" thing.
> The graph goes up to 40 threads, but the guy has 80 possible cores ( 40
> true (4 x 10), x 2 by HT).
>
> The curves past 40 threads are only extension of points below 40. there
> is not even a data point at 50, 60 or 70, when Xeon are used. All 40+
> points are with Opteron (and there is no HT in Opteron... it's
> different, Intel HT and AMD module are not the same)
>
> Opteron cores' count is 48 (4 x cpu, 2 dies of 3 piledriver module each,
> each module providing 2 execution thread (core), 24 modules.

jhu correctly points to the table at the bottom:

---------------------------------------------------------------------
Selected data points showing effect of Hyper-Threading

Threads | Linux HT off | Linux HT on | Windows HT off | Windows HT on
--------|--------------|-------------|----------------|--------------
  40    | 40 sec       | 40 sec      | 53 sec         | 79 sec
  60    | 39 sec       | 35 sec      | 50 sec         | 78 sec
  80    | 39 sec       | 32 sec      | 50 sec         | 78 sec
---------------------------------------------------------------------

If it was a matter of 40 threads being poorly distributed among physical 
and virtual cores, you'd expect /some/ improvement when going to 80 
threads, but apparently there is /no/ noteworthy difference whatsoever.
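
The differences in that table can be tabulated with a few lines of
Python. This is only arithmetic on the numbers quoted above from the
blog post; nothing here was measured independently.

```python
# Render times in seconds, copied from the table above.
# Tuple order: (Linux HT off, Linux HT on, Windows HT off, Windows HT on)
times = {
    40: (40, 40, 53, 79),
    60: (39, 35, 50, 78),
    80: (39, 32, 50, 78),
}

# Extra seconds Windows needs with HT on compared with HT off.
windows_ht_penalty = {n: t[3] - t[2] for n, t in times.items()}

# Seconds Linux saves with HT on compared with HT off.
linux_ht_gain = {n: t[0] - t[1] for n, t in times.items()}

# Change in the Windows HT-on time when going from 40 to 80 threads.
windows_40_to_80 = times[40][3] - times[80][3]

for n in sorted(times):
    print(f"{n} threads: Windows HT penalty {windows_ht_penalty[n]} s, "
          f"Linux HT gain {linux_ht_gain[n]} s")
print(f"Windows HT on, 40 -> 80 threads: {windows_40_to_80} s faster")
```

The Windows penalty stays in the 26 to 28 second range at every thread
count, and doubling the threads from 40 to 80 with HT on recovers only
one second, which is the point made above.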



From: jhu
Subject: Re: benchmarks o'interest
Date: 6 Aug 2014 13:50:01
Message: <web.53e26a5f1b6ac1fed19b0ec40@news.povray.org>
clipka <ano### [at] anonymousorg> wrote:
> On 06.08.2014 11:20, Le_Forgeron wrote:
> > On 06/08/2014 10:45, clipka wrote:
> >> Must be a high-core-count thing; on my 4-core i7, running Windows 7, I
> >> see this for a random scene:
> >
> >  From my point of view, it's a "use less than 50% of resources" thing.
> > The graph goes up to 40 threads, but the guy has 80 possible cores ( 40
> > true (4 x 10), x 2 by HT).
> >
> > The curves past 40 threads are only extension of points below 40. there
> > is not even a data point at 50, 60 or 70, when Xeon are used. All 40+
> > points are with Opteron (and there is no HT in Opteron... it's
> > different, Intel HT and AMD module are not the same)
> >
> > Opteron cores' count is 48 (4 x cpu, 2 dies of 3 piledriver module each,
> > each module providing 2 execution thread (core), 24 modules.
>
> jhu correctly points to the table at the bottom:
>
> ---------------------------------------------------------------------
> Selected data points showing effect of Hyper-Threading
>
> Threads | Linux HT off | Linux HT on | Windows HT off | Windows HT on
> --------|--------------|-------------|----------------|--------------
>   40    | 40 sec       | 40 sec      | 53 sec         | 79 sec
>   60    | 39 sec       | 35 sec      | 50 sec         | 78 sec
>   80    | 39 sec       | 32 sec      | 50 sec         | 78 sec
> ---------------------------------------------------------------------
>
> If it was a matter of 40 threads being poorly distributed among physical
> and virtual cores, you'd expect /some/ improvement when going to 80
> threads, but apparently there is /no/ noteworthy difference whatsoever.

Is the official Windows binary compiled with Microsoft's compiler or Intel's
compiler?




Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.