Damn! Isn't it exciting to see this much talk about actual povray code and
improvement rather than just read Orchid's blog posts all day? No offense,
Andrew! :D
Warp <war### [at] tagpovrayorg> wrote:
> Since POV-Ray performs a lot of matrix multiplications (as well as
> vector x matrix multiplications) it could theoretically benefit from SSE
> optimizations. Of course it's quite difficult to say in (portable) C++
> "calculate this matrix multiplication in the most optimal way using SSE".
>
> OTOH, I wonder how much that would really speed it up, because AFAIK
> POV-Ray spends most of its time calculating ray-boundingbox and ray-surface
> intersections rather than multiplying vectors and matrices.
On my AMD64 Linux machine, POV-Ray 3.6 runs the benchmark in ~1770 seconds,
being a generic i686 binary. MegaPOV 1.2.1 AMD64 binary does the same stunt in
~1423 seconds. I expect this to be mainly due to SSE2.
No special SSE2 "optimization hints" have been coded into MegaPOV. It's just
plain POV 3.6 code with some functionality added. And compiled with different
options. (Both were compiled using g++)
"nemesis" <nam### [at] gmailcom> wrote:
> Damn! Isn't it exciting to see this much talk about actual povray code and
> improvement rather than just read Orchid's blog posts all day? No offense,
> Andrew! :D
Who is Orchid? Have I missed something...?
clipka <nomail@nomail> wrote:
> On my AMD64 Linux machine, POV-Ray 3.6 runs the benchmark in ~1770 seconds,
> being a generic i686 binary. MegaPOV 1.2.1 AMD64 binary does the same stunt in
> ~1423 seconds. I expect this to be mainly due to SSE2.
> No special SSE2 "optimization hints" have been coded into MegaPOV. It's just
> plain POV 3.6 code with some functionality added. And compiled with different
> options. (Both were compiled using g++)
That isn't very telling if they were compiled with different options.
They should be compiled with all the same options, except for SSE2
optimizations in order for the measurement to be reliable.
--
- Warp
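
A minimal sketch of how one might check which floating-point path a given build actually uses before comparing timings like the ones above. It is not POV-Ray or MegaPOV code; the function names `fp_backend` and `fp_eval_method` are made up for illustration. It relies on the compiler's predefined macros (`__SSE2_MATH__` on GCC/Clang, `_M_X64` and `_M_IX86_FP` on MSVC) and on the standard `FLT_EVAL_METHOD` macro from `<cfloat>`.

```cpp
#include <cfloat>
#include <string>

// Hypothetical helper: names the scalar floating-point path this
// translation unit was compiled for, based on predefined compiler macros.
std::string fp_backend()
{
#if defined(__SSE2_MATH__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "sse2";   // scalar double math goes through SSE2 registers
#elif defined(__i386__) || defined(_M_IX86)
    return "x87";    // 32-bit x86 build without SSE2 scalar math
#else
    return "other";  // not an x86 target, or an unrecognized compiler
#endif
}

// Hypothetical helper: exposes FLT_EVAL_METHOD at run time.
// 0 means expressions are evaluated in their own type's precision,
// 2 means they are evaluated in long double (typical for x87 code),
// -1 means the method is indeterminable.
int fp_eval_method()
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;
#endif
}
```

With g++, building the same source once with `-m32 -mfpmath=387` and once with `-m32 -msse2 -mfpmath=sse`, and keeping every other option identical, would be one way to isolate the effect of SSE2 as suggested above; printing `fp_backend()` from each binary confirms the options took effect.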
clipka <nomail@nomail> wrote:
> Who is Orchid? Have I missed something...?
Not much, really.
(Sorry Andrew. I *had* to. ;) )
--
- Warp
"clipka" <nomail@nomail> wrote:
> "nemesis" <nam### [at] gmailcom> wrote:
> > Damn! Isn't it exciting to see this much talk about actual povray code and
> > improvement rather than just read Orchid's blog posts all day? No offense,
> > Andrew! :D
>
> Who is Orchid? Have I missed something...?
Heh, our resident mascot. He seems to only roam around in off-topic and
bork.bork.bork... I suggest you keep on with your fine work and don't go there
yet, otherwise you might find yourself posting more than coding... ;)
Warp wrote:
> Thorsten Froehlich <tho### [at] trfde> wrote:
>> You are mistaken. All modern x86 compilers (gcc,icc,vc) can use it instead
>> of the x87 FPU. I think since versions 3.2, 8, and 7.1 respectively.
>
> "Can use" doesn't really tell how effectively they can use it.
This is not about semantics. I suggest you check for yourself; Intel has plenty
of information available.
Thorsten
From: Thorsten Froehlich
Subject: Re: Radiosity Status: Giving Up...
Date: 31 Dec 2008 12:15:44
Message: <495ba8c0@news.povray.org>
Warp wrote:
> If you are only going to use SSE as a direct substitute for the FPU,
> I assume that would be possible, but you probably won't get any significant
> speed benefit (perhaps even the contrary). In order to truly get benefit
> from SSE, you need to vectorize the calculations so that you can calculate
> many things in parallel. This is extremely hard, if not impossible for a
> compiler to do with random C/C++ code.
Warp, could you stop theorizing and actually *use* the information already
out there, supplied by Intel and plenty of other sources? SSE != SIMD
Thorsten
From: Thorsten Froehlich
Subject: Re: Radiosity Status: Giving Up...
Date: 31 Dec 2008 12:17:02
Message: <495ba90e@news.povray.org>
clipka wrote:
> It doesn't seem to be very difficult with software like POV-ray: The Intel C++
> compiler keeps spitting out lots of "code was VECTORIZED" at me every time I
> compile the POV source code. Which is the compiler's way to say that it
> inserted an SSE2 instruction.
Please check the current Intel information on what SSE2 is actually
supposed to be used for. I am not talking about vectorized code or
auto-vectorization.
Thorsten
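
The distinction argued over in this exchange, SSE2 as a scalar stand-in for the x87 FPU versus SSE2 as packed (vectorized) arithmetic, can be sketched with compiler intrinsics. This is an illustration only, not POV-Ray code; `add_scalar` and `add_packed` are hypothetical names. It assumes an x86 target with SSE2 enabled and falls back to plain C++ elsewhere.

```cpp
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics

// Scalar use of SSE2: one double per instruction (addsd).
// This is the kind of code a compiler emits in place of an x87 add.
double add_scalar(double a, double b)
{
    __m128d va = _mm_set_sd(a);   // a in the low lane, high lane zeroed
    __m128d vb = _mm_set_sd(b);
    return _mm_cvtsd_f64(_mm_add_sd(va, vb));
}

// Packed use of SSE2: two doubles per instruction (addpd).
// This is what "vectorized" refers to.
void add_packed(const double* a, const double* b, double* out)
{
    __m128d va = _mm_loadu_pd(a); // unaligned load of a[0], a[1]
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));
}
#else
// Fallback for targets without SSE2: same results, no intrinsics.
double add_scalar(double a, double b) { return a + b; }

void add_packed(const double* a, const double* b, double* out)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
}
#endif
```

Both functions use the same register file; only the second does more than one operation per instruction. A compiler can generate the scalar form for ordinary floating-point code without any vectorization analysis.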
clipka wrote:
> Thorsten Froehlich <tho### [at] trfde> wrote:
>> clipka wrote:
>>> Don't expect all these to be "naive hardware implementation" in the same sense
>>> as, say, an integer addition, shift, bit-wise AND/OR/XOR or whatever.
>> Exactly that is why you ought to be looking at the SSE2/3 floating-point
>> registers and associated hardware support. The x87 FPU is only there for
>> legacy support and rather inefficient.
>
> Hah! Say that again...
>
> SSE3, SSSE3 to SSE4 is rather primitive compared to what the x87 FPU can do -
> except when it comes to bulk add, subtract, multiply or divide. Which is what
> they're designed for: Vectors and matrices. That's why they're called Streaming
> SIMD (= Single Instruction Multiple Data) Extensions.
>
> Search for trigonometric or logarithmic functions - you'll not find any in the
> SSE2 or SSE3 sections. You'll probably find that these still rely on good old
> x87 FPU instructions.
You are looking at the wrong manual. This manual does not tell you how to do
something but what is available. I admit the Intel documentation is not
clear, but x87 usage is deprecated. This is documented for x86-64 mode in OS
vendor information because the x86-64 ABIs even use the SSE registers for
argument passing (no more x87 FPU stack or memory mapped argument passing).
But googling for that information is difficult. One of the top-most useful
links I found was
<http://msdn.microsoft.com/en-us/library/bb147385.aspx#ID0EBEAA> - you will
have to look up the remaining information yourself. I guess AMD might have
more info, as they came up with x86-64.....
Thorsten