POV-Ray: Newsgroups: povray.advanced-users: function optimization question

POV-Ray : Newsgroups : povray.advanced-users : function optimization question		Server Time 13 Jul 2025 01:36:57 EDT (-0400)

<<< Previous 10 Messages

Goto Latest 10 Messages

Next 5 Messages >>>

From: Thorsten Froehlich
Subject: Re: function optimization question
Date: 31 Mar 2002 10:45:45
Message: <3ca72f29@news.povray.org>

In article <3ca6f75f$1@news.povray.org> , "Jan Walzer" <jan### [at] lzernet> wrote:

> No ... this is not what I meant ... I mean, that the prog, can modify itself
> in memory, to speedup execution ...
> ... or to make unreadable code ...

Which would make it close to impossible to just-in-time compile anything.

Anyway, here is a list from the source code.  It may not be fully up-to-date
and probably won't we very readable in a newsreader, but that is what text
editiors and printers are for :-)

    Thorsten

- - - - - - - - -

# Rs = R0 - R7, one of eight floating-point registers
# Rd = R0 - R7, one of eight floating-point registers
# k = constant, can be DBL or int depending on function
# SP = data stack point
# PSP = program stack pointer
# CC = condition code register
# PC = program counter
# global = global variable data space
# local = local variable data space
# const = const floating-point value data space
# eq = equal
# ne = not equal
# lt = lower
# le = lower or equal
# gt = greater
# ge = greater or equal


R-Type Instructions (9 * 64 = 576)

Opcode|Source| Dest | k-data |Instruction   | Operation
4 bits|3 bits|3 bits|16 bits |              |
------+------+------+--------+--------------+-----------------------------
  00  |  Rs  |  Rd  |  0000  | add   Rs, Rd | Rd + Rs -> Rd
  01  |  Rs  |  Rd  |  0000  | sub   Rs, Rd | Rd - Rs -> Rd
  02  |  Rs  |  Rd  |  0000  | mul   Rs, Rd | Rd * Rs -> Rd
  03  |  Rs  |  Rd  |  0000  | div   Rs, Rd | Rd / Rs -> Rd
  04  |  Rs  |  Rd  |  0000  | mod   Rs, Rd | Rd % Rs -> Rd
  05  |  Rs  |  Rd  |  0000  | move  Rs, Rd | Rs -> Rd
  06  |  Rs  |  Rd  |  0000  | cmp   Rs, Rd | Rd - Rs -> CC
  07  |  Rs  |  Rd  |  0000  | neg   Rs, Rd | -Rs -> Rd
  08  |  Rs  |  Rd  |  0000  | abs   Rs, Rd | |Rs| -> Rd


I-Type Instructions (7 * 8 = 56)

Opcode|Ext.Op| Dest | k-data |Instruction   | Operation
4 bits|3 bits|3 bits|16 bits |              |
------+------+------+--------+--------------+-----------------------------
  09  |  00  |  Rd  |const(k)|addi  c, Rd   | Rd + const(k) -> Rd
  09  |  01  |  Rd  |const(k)|subi  c, Rd   | Rd - const(k) -> Rd
  09  |  02  |  Rd  |const(k)|muli  c, Rd   | Rd * const(k) -> Rd
  09  |  03  |  Rd  |const(k)|divi  c, Rd   | Rd / const(k) -> Rd
  09  |  04  |  Rd  |const(k)|modi  c, Rd   | Rd % const(k) -> Rd
  09  |  05  |  Rd  |const(k)|loadi c, Rd   | const(k) -> Rd
  09  |  06  |  Rd  |const(k)|cmpi  c, Rd   | Rd - const(k) -> CC


JS-Type Instructions (6 * 8 = 48)

Opcode|Ext.Op| Dest | k-data |Instruction   | Operation
4 bits|3 bits|3 bits|16 bits |              |
------+------+------+--------+--------------+-----------------------------
  10  |  00  |  Rd  |  0000  |seq   Rd      | (CC = eq) -> Rd
  10  |  01  |  Rd  |  0000  |sne   Rd      | (CC = ne) -> Rd
  10  |  02  |  Rd  |  0000  |slt   Rd      | (CC = lt) -> Rd
  10  |  03  |  Rd  |  0000  |sle   Rd      | (CC = le) -> Rd
  10  |  04  |  Rd  |  0000  |sgt   Rd      | (CC = gt) -> Rd
  10  |  05  |  Rd  |  0000  |sge   Rd      | (CC = ge) -> Rd
  10  |  06  |  Rd  |  0000  |teq   Rd      | (Rd == 0) -> Rd
  10  |  07  |  Rd  |  0000  |tne   Rd      | (Rd != 0) -> Rd


ML-Type Instructions (2 * 8 = 16)

Opcode|Ext.Op| Dest | k-data |Instruction   | Operation
4 bits|3 bits|3 bits|16 bits |              |
------+------+------+--------+--------------+-----------------------------
  11  |  00  |  Rd  |  0000  |load 0(k), Rd | global(k) -> Rd
  11  |  01  |  Rd  |  0000  |load SP(k), Rd| local(k) -> Rd


MS-Type Instructions (2 * 8 = 16)

Opcode|Ext.Op|Source| k-data |Instruction   | Operation
4 bits|3 bits|3 bits|16 bits |              |
------+------+------+--------+--------------+-----------------------------
  12  |  00  |  Rs  |  0000  |store Rs, 0(k)| Rs -> global(k)
  12  |  01  |  Rs  |  0000  |store Rs, SP(k)| Rs -> local(k)


JB-Type Instructions (6)

Opcode|Ext.Op|Unused| k-data |Instruction   | Operation
4 bits|3 bits|3 bits|16 bits |              |
------+------+------+--------+--------------+-----------------------------
  13  |  00  |  00  | offset |beq   k       | if (CC = eq) then k -> PC
  13  |  01  |  00  | offset |bne   k       | if (CC = ne) then k -> PC
  13  |  02  |  00  | offset |blt   k       | if (CC = lt) then k -> PC
  13  |  03  |  00  | offset |ble   k       | if (CC = le) then k -> PC
  13  |  04  |  00  | offset |bgt   k       | if (CC = gt) then k -> PC
  13  |  05  |  00  | offset |bge   k       | if (CC = ge) then k -> PC


XS-Type Instructions (6 * 8 = 48)

Opcode|Ext.Op|Source| k-data |Instruction   | Operation
4 bits|3 bits|3 bits|16 bits |              |
------+------+------+--------+--------------+-----------------------------
  14  |  00  |  Rs  |  0000  |xeq   Rs      | if (Rs == 0) then exception
  14  |  01  |  Rs  |  0000  |xne   Rs      | if (Rs != 0) then exception
  14  |  02  |  Rs  |  0000  |xlt   Rs      | if (Rs < 0) then exception
  14  |  03  |  Rs  |  0000  |xle   Rs      | if (Rs <= 0) then exception
  14  |  04  |  Rs  |  0000  |xgt   Rs      | if (Rs > 0) then exception
  14  |  05  |  Rs  |  0000  |xge   Rs      | if (Rs >= 0) then exception
  14  |  06  |  Rs  |  0000  |xdz   R0, Rs  | if (R0 == 0) && (Rs == 0) then
exception


X-Type Instructions (11)

Opcode|Ext.Op| k-data |Instruction   | Operation
4 bits|6 bits|16 bits |              |
------+------+--------+--------------+--------------------------------
  15  |  00  |  0000  |jsr   k       | PSP + 1 -> PSP, PC -> (PSP), k -> PC
  15  |  01  |  0000  |jmp   k       | k -> PC
  15  |  02  |  0000  |rts           | (PSP) -> PC, PSP - 1 -> PSP
  15  |  03  |  0000  |call  k       | calls user-defined function k, result
will be in R0
  15  |  04  |  0000  |sys1  k       | calls internal function k with one
argument, result will be in R0
  15  |  05  |  0000  |sys2  k       | calls internal function k with two
arguments, result will be in R0
  15  |  06  |  0000  |trap  k       | calls internal function k with
arguments on stack, result will be in R0
  15  |  07  |  0000  |traps k       | calls internal function k with
arguments on stack, result will be on the stack
  15  |  08  |  0000  |grow  k       | grow the stack to hold k elements
(floating-point numbers)
  15  |  09  |  0000  |push  k       | move the stack pointer k elements
forward
  15  |  10  |  0000  |pop   k       | move the stack pointer k elements
backward
  15  |  63  |  0000  |nop           | no operation

Total number of defined instructions: 780
Density of instruction set: 780 / 1024 = 0.7617


OF COURSE THIS IS ALL SUBJECT TO COMPLETE CHANGE AND REDESIGN IN CURRENT AND
FUTURE VERSIONS OF POV-RAY!

____________________________________________________
Thorsten Froehlich, Duisburg, Germany
e-mail: tho### [at] trfde

Visit POV-Ray on the web: http://mac.povray.org

Post a reply to this message

From: Warp
Subject: Re: function optimization question
Date: 31 Mar 2002 14:07:09
Message: <3ca75e5d@news.povray.org>

Ahh... This list awakens warm memories from the time I coded in asm for
Spectrum and later for DOS.
  It really is starting to look like something I would like to code in
the future.
  Making a pattern with asm-code. Cool.

-- 
#macro N(D)#if(D>99)cylinder{M()#local D=div(D,104);M().5,2pigment{rgb M()}}
N(D)#end#end#macro M()<mod(D,13)-6mod(div(D,13)8)-3,10>#end blob{
N(11117333955)N(4254934330)N(3900569407)N(7382340)N(3358)N(970)}//  - Warp -

Post a reply to this message

From: Christopher James Huff
Subject: Re: function optimization question
Date: 31 Mar 2002 15:21:37
Message: <chrishuff-5C9513.15223731032002@netplex.aussie.org>

In article <3ca6f75f$1@news.povray.org>, "Jan Walzer" <jan### [at] lzernet> 
wrote:

> No ... this is not what I meant ... I mean, that the prog, can modify itself
> in memory, to speedup execution ...

Well, what's stopping you? You would have to interface through the 
ordinary POV-Script and write it to the disk in order to load it again, 
but the result would be the same: dynamically generated code being 
loaded into memory and executed. Just a bit roundabout, and a large 
chunk of the required code would have to be written in POV-Script 
instead of being done by the function engine. It would also be very 
slow...
Hmm, maybe write a VM in POV-Script. ;-)

> ... or to make unreadable code ...

It would certainly do this well.

-- 
Christopher James Huff <chr### [at] maccom>
POV-Ray TAG e-mail: chr### [at] tagpovrayorg
TAG web site: http://tag.povray.org/

Post a reply to this message

From: Jan Walzer
Subject: Re: function optimization question
Date: 31 Mar 2002 16:33:12
Message: <3ca78098@news.povray.org>

"Christopher James Huff" <chr### [at] maccom> wrote:
> > No ... this is not what I meant ... I mean, that the prog, can modify itself
> > in memory, to speedup execution ...
>
> Well, what's stopping you? You would have to interface through the
> ordinary POV-Script and write it to the disk in order to load it again,
> but the result would be the same: dynamically generated code being
> loaded into memory and executed. Just a bit roundabout, and a large
> chunk of the required code would have to be written in POV-Script
> instead of being done by the function engine. It would also be very
> slow...

nope... that's not what I meant...
Consider this pseudo-code

-----------------------------------
A=SomeExpression(Input)

repeat

DoSomething()

if (A=SomeValue)
    FarCall SubRoutine1
  else
    NearCall SubRoutine2
endif

DoSomethingElse()

until (End)
------------------------------------

as you can see: the if-instruction is executed very often...
In Self-modifying-ASM-code one could write:

------------------------------------
A=SomeExpression(Input)

if (A=SomeValue)
    [ChangeHere]=OpCodeOf(FarCall SubRoutine1)
  else
    [ChangeHere]=OpCodeOf(NearCall SubRoutine2)
endif

repeat

DoSomething()

:ChangeHere
noop
noop
noop
noop

DoSomethingElse()

until (End)
------------------------------------

I once had a chance, to speed up the Bresenham by a factor of 3 by a similar method.
The thing is that you can decide at runtime, which opcodes get executed, depending
on some runtime values
As there are(were) different opcodes used for a near-jump (target-address, beein in
the same codesegment) and a far-jump (jump to anywhere in memory) there was no easy
chance, to work only with pointers, and only change these...

Of course, my last lines applied to my active assembler time, when  I did some
x86-Realmode ASM ....

I hope you got the idea ...

Post a reply to this message

From: Jan Walzer
Subject: Re: function optimization question
Date: 31 Mar 2002 16:45:50
Message: <3ca7838e@news.povray.org>

Ahhh ... nice ...

Thats what I wanted to read ...

Seems very complete and useful ...

Will this get included into the official doc ? ...
I would much apreciate this ...

Post a reply to this message

From: Jan Walzer
Subject: Re: function optimization question
Date: 31 Mar 2002 16:57:02
Message: <3ca7862e@news.povray.org>

>   Ahh... This list awakens warm memories from the time I coded in asm for
>   Spectrum and later for DOS.

hehe ... yeah, me too ...

>   It really is starting to look like something I would like to code in
>   the future.
>   Making a pattern with asm-code. Cool.

... just realized ...

this would also mean: creating objects with asm-code, as we can
easily turn these patterns into isos ... *g

... I'm waiting for the first signatures in ASM-style ... ;)

Post a reply to this message

From: Rune
Subject: Re: function optimization question
Date: 31 Mar 2002 17:21:36
Message: <3ca78bf0@news.povray.org>


> I wonder if this is optimized if functions evaluator:
>
> select(
>   x*Ey+y*Ex-R1*Ex,
>   select(
>     z+3*x.
>     x*Ey+y*Ex-R1*Ex,
>     123
>   ),
>  17
> )
>
> Note expression "x*Ey+y*Ex-R1*Ex" appear twice.
> How many times is it calculated ?

Thorsten says twice. However, there is a way to avoid that.

See the (untested) code below.

#declare Func1 =
function(x,y,z,Ex,Ey,Value){
  select(
    Value,
    select( z+3*x, Value, 123 ),
    17
  )
}

#declare Func2 =
function(x,y,z,Ex,Ey){Func1(x,y,z,Ex,Ey,x*Ey+y*Ex-R1*Ex)}

I don't know in this case if the additional function call slows down more
than the calculation of the expression did, but it's worth trying out. In
cases where the same expression appears many times it can give a big
speed-up anyway.

Maybe you already knew this, but then, maybe you didn't, so...

Rune
--
3D images and anims, include files, tutorials and more:
Rune's World:  http://rsj.mobilixnet.dk (updated Feb 16)
POV-Ray Users: http://rsj.mobilixnet.dk/povrayusers/
POV-Ray Ring:  http://webring.povray.co.uk

Post a reply to this message

From: Thorsten Froehlich
Subject: Re: function optimization question
Date: 31 Mar 2002 17:35:40
Message: <3ca78f3c@news.povray.org>

In article <3ca78098@news.povray.org> , "Jan Walzer" <jan### [at] lzernet> wrote:

> I once had a chance, to speed up the Bresenham by a factor of 3 by a similar
> method. The thing is that you can decide at runtime, which opcodes get
> executed, depending on some runtime values As there are(were) different
> opcodes used for a near-jump (target-address, beein in the same codesegment)
> and a far-jump (jump to anywhere in memory) there was no easy chance, to work
> only with pointers, and only change these...

Not only is this a terrible idea to code anything, it also kills all modern
processor performance.  Well, all but x86 processors of course for which
designers actually have to interlock (or use common) the data and instruction
caches to support such nonsense.  Not that it won't cripple performance for
them as well these days, but on a good old Pentium it will still be fast...

On any reasonable architecture not supporting 25 years of eight bit legacy
backward compatibility and other junk self-modifying code will fortunately
break! :-)

    Thorsten

____________________________________________________
Thorsten Froehlich, Duisburg, Germany
e-mail: tho### [at] trfde

Visit POV-Ray on the web: http://mac.povray.org

Post a reply to this message

From: Jan Walzer
Subject: Re: function optimization question
Date: 31 Mar 2002 17:50:57
Message: <3ca792d1@news.povray.org>

"Thorsten Froehlich" <tho### [at] trfde> wrote:
> > I once had a chance, to speed up the Bresenham by a factor of 3 by a similar
> > method. The thing is that you can decide at runtime, which opcodes get
> > executed, depending on some runtime values As there are(were) different
> > opcodes used for a near-jump (target-address, beein in the same codesegment)
> > and a far-jump (jump to anywhere in memory) there was no easy chance, to work
> > only with pointers, and only change these...
>
> Not only is this a terrible idea to code anything, it also kills all modern
[...]

> On any reasonable architecture not supporting 25 years of eight bit legacy
> backward compatibility and other junk self-modifying code will fortunately
> break! :-)

Hey ... for my studies, I had the chance to "port" this x86-code to
MIPS-ASM ... It was perfectly possible, meaning, it worked...
... I couldn't messure any speedup, as it run on a software-MIPS-simulater,
called "SPIM" ... if it was useful, is another chapter, but at least it was
fun (for me, at least) ...

... but hey: We all know, you prefer Mac ;) ...

Post a reply to this message

From: Warp
Subject: Re: function optimization question
Date: 31 Mar 2002 21:52:55
Message: <3ca7cb87@news.povray.org>

I think that the problem with self-modifying code nowadays is that, if
supported by the processor (not all processors support that; in them the
code to be executed is read-only), it effectively kills most optimizations
performed by the processor.
  If the code being executed is modified, it invalidates the pipelines and
perhaps even the code in the cache, plus all the branch prediction stuff
and so on. This causes the same effect as loading the code from RAM for
the first time. If this is done in each loop, it's extremely slow.
  If a very tight loop modifies itself, it could result in a code which
is several tens of times slower than an equivalent code with no
self-modification.

  Of course self-modification is ok if you first modify the code to be
executed, and then this code is executed for a long period of time.
  In fact, the OS does exactly this when it loads an exe file to memory:
It modifies its long jump addresses to match the place in memory where it
loaded it (with the help of the relocation table which can be found in
the header of the exe file).

-- 
#macro M(A,N,D,L)plane{-z,-9pigment{mandel L*9translate N color_map{[0rgb x]
[1rgb 9]}scale<D,D*3D>*1e3}rotate y*A*8}#end M(-3<1.206434.28623>70,7)M(
-1<.7438.1795>1,20)M(1<.77595.13699>30,20)M(3<.75923.07145>80,99)// - Warp -

Post a reply to this message

<<< Previous 10 Messages

Goto Latest 10 Messages

Next 5 Messages >>>