POV-Ray: Newsgroups: povray.unofficial.patches: [announce] JITC: Really fast POVRay FPU: Re: [announce] JITC: Really fast POVRay FPU

POV-Ray : Newsgroups : povray.unofficial.patches : [announce] JITC: Really fast POVRay FPU : Re: [announce] JITC: Really fast POVRay FPU		Server Time 18 Jul 2025 18:53:41 EDT (-0400)

From: Wolfgang Wieser
Date: 4 Aug 2004 14:27:53
Message: <41112aa8@news.povray.org>

Christoph Hormann wrote:
> I think the work probably would have been better invested in
> implementing an internal JIT compiler using the existing hooks as
> Thorsten explained.  This would work on all x86 systems (and for Mac an
> implementation already exists).
> 
Well, I had a look at the PPC JIT Compiler in the Mac version. If I 
understand it correctly, then it is actually assembling the binary code for 
the PPC in memory and then executing it by jumping into that created 
code. 

After thinking about it, I found 2 reasons which can keep me from 
doing something like that for i386 architecture: 

(1) When I saw the POV VM instruction set it immediately reminded me 
  somehow on the PPC instruction set or a similar RISC instruction set 
  with a number of general-purpose registers etc. (I do not know the PPC 
  instructions in detail but this opinion was based on the feeling I had 
  of it from reading the computer magazines and from my experiences with 
  other RISCs.) 
  So, compiling this code into PPC code turns out to be pretty
  straight-forward. In contrast, the i387 does not seem to have  
  these general purpose registers but instead it uses a register stack with 
  IMO 8 registers and there is a top-of-stack pointer and so on. 
  Furthermore, I am by far no expert in i386/7 assembly and I do not want 
  to hack tons of error-prone code to perform correct translation of 
  POV-VM-ASM into i387-ASM. [r0 should be top of stack...]

(2) GCC does a decent work in optimizing. The POV VM compiler produces 
  assembly which IMO has plenty of (seemingly?) pointless register moves. 
  (Don't understand me wrong, Thorsten: Good register allocation is a really 
  tough job.) Take for example this part of the paramflower on my homepage. 
------------------------
        r0 = sqrt(r0);
        r5 = r5 * r0;
        r0 = r5;
        r5 = r6;  <-- completely useless
        r5 = r0;  <-- useless as well
        r0 = r2;
        r0 = sqrt(r0);
        r5 = r5 + r0;
        r6 = r5;
        r0 = POVFPU_Consts[k];
        r5 = r0;  <-- (skip)
        r7 = r5;  <-- why not r7=r0
        r0 = r2;  <-- (skip)
        r5 = r0;  <-- why not r5=r2
        r0 = r5;  <-- hmm?!
        r5 = r5 * r0;
        r5 = r5 * r0;
        r0 = r5;
        r5 = r7;
------------------------
  Compiling this assembly directly into i387 code would probably not give 
  as good runtime performance as asking GCC would. 
  And I do not want to implement an optimizer especially since there is 
  really little chance to get better than GCC if we're only 1 or 2 people...

Maybe it would be easier to translate the POV-VM-ASM into SSE2 instructions. 
But 2 reasons suggest against that: 
(1) Not yet widely available. 
(2) My box only has SSE1 and hence I could not test it. 

Wolfgang

Post a reply to this message