![](/i/fill.gif) |
![](/i/fill.gif) |
|
![](/i/fill.gif) |
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
From: Thorsten Froehlich
Subject: Re: function optimization question
Date: 31 Mar 2002 10:45:45
Message: <3ca72f29@news.povray.org>
|
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
In article <3ca6f75f$1@news.povray.org> , "Jan Walzer" <jan### [at] lzer net> wrote:
> No ... this is not what I meant ... I mean, that the prog, can modify itself
> in memory, to speedup execution ...
> ... or to make unreadable code ...
Which would make it close to impossible to just-in-time compile anything.
Anyway, here is a list from the source code. It may not be fully up-to-date
and probably won't we very readable in a newsreader, but that is what text
editiors and printers are for :-)
Thorsten
- - - - - - - - -
# Rs = R0 - R7, one of eight floating-point registers
# Rd = R0 - R7, one of eight floating-point registers
# k = constant, can be DBL or int depending on function
# SP = data stack point
# PSP = program stack pointer
# CC = condition code register
# PC = program counter
# global = global variable data space
# local = local variable data space
# const = const floating-point value data space
# eq = equal
# ne = not equal
# lt = lower
# le = lower or equal
# gt = greater
# ge = greater or equal
R-Type Instructions (9 * 64 = 576)
Opcode|Source| Dest | k-data |Instruction | Operation
4 bits|3 bits|3 bits|16 bits | |
------+------+------+--------+--------------+-----------------------------
00 | Rs | Rd | 0000 | add Rs, Rd | Rd + Rs -> Rd
01 | Rs | Rd | 0000 | sub Rs, Rd | Rd - Rs -> Rd
02 | Rs | Rd | 0000 | mul Rs, Rd | Rd * Rs -> Rd
03 | Rs | Rd | 0000 | div Rs, Rd | Rd / Rs -> Rd
04 | Rs | Rd | 0000 | mod Rs, Rd | Rd % Rs -> Rd
05 | Rs | Rd | 0000 | move Rs, Rd | Rs -> Rd
06 | Rs | Rd | 0000 | cmp Rs, Rd | Rd - Rs -> CC
07 | Rs | Rd | 0000 | neg Rs, Rd | -Rs -> Rd
08 | Rs | Rd | 0000 | abs Rs, Rd | |Rs| -> Rd
I-Type Instructions (7 * 8 = 56)
Opcode|Ext.Op| Dest | k-data |Instruction | Operation
4 bits|3 bits|3 bits|16 bits | |
------+------+------+--------+--------------+-----------------------------
09 | 00 | Rd |const(k)|addi c, Rd | Rd + const(k) -> Rd
09 | 01 | Rd |const(k)|subi c, Rd | Rd - const(k) -> Rd
09 | 02 | Rd |const(k)|muli c, Rd | Rd * const(k) -> Rd
09 | 03 | Rd |const(k)|divi c, Rd | Rd / const(k) -> Rd
09 | 04 | Rd |const(k)|modi c, Rd | Rd % const(k) -> Rd
09 | 05 | Rd |const(k)|loadi c, Rd | const(k) -> Rd
09 | 06 | Rd |const(k)|cmpi c, Rd | Rd - const(k) -> CC
JS-Type Instructions (6 * 8 = 48)
Opcode|Ext.Op| Dest | k-data |Instruction | Operation
4 bits|3 bits|3 bits|16 bits | |
------+------+------+--------+--------------+-----------------------------
10 | 00 | Rd | 0000 |seq Rd | (CC = eq) -> Rd
10 | 01 | Rd | 0000 |sne Rd | (CC = ne) -> Rd
10 | 02 | Rd | 0000 |slt Rd | (CC = lt) -> Rd
10 | 03 | Rd | 0000 |sle Rd | (CC = le) -> Rd
10 | 04 | Rd | 0000 |sgt Rd | (CC = gt) -> Rd
10 | 05 | Rd | 0000 |sge Rd | (CC = ge) -> Rd
10 | 06 | Rd | 0000 |teq Rd | (Rd == 0) -> Rd
10 | 07 | Rd | 0000 |tne Rd | (Rd != 0) -> Rd
ML-Type Instructions (2 * 8 = 16)
Opcode|Ext.Op| Dest | k-data |Instruction | Operation
4 bits|3 bits|3 bits|16 bits | |
------+------+------+--------+--------------+-----------------------------
11 | 00 | Rd | 0000 |load 0(k), Rd | global(k) -> Rd
11 | 01 | Rd | 0000 |load SP(k), Rd| local(k) -> Rd
MS-Type Instructions (2 * 8 = 16)
Opcode|Ext.Op|Source| k-data |Instruction | Operation
4 bits|3 bits|3 bits|16 bits | |
------+------+------+--------+--------------+-----------------------------
12 | 00 | Rs | 0000 |store Rs, 0(k)| Rs -> global(k)
12 | 01 | Rs | 0000 |store Rs, SP(k)| Rs -> local(k)
JB-Type Instructions (6)
Opcode|Ext.Op|Unused| k-data |Instruction | Operation
4 bits|3 bits|3 bits|16 bits | |
------+------+------+--------+--------------+-----------------------------
13 | 00 | 00 | offset |beq k | if (CC = eq) then k -> PC
13 | 01 | 00 | offset |bne k | if (CC = ne) then k -> PC
13 | 02 | 00 | offset |blt k | if (CC = lt) then k -> PC
13 | 03 | 00 | offset |ble k | if (CC = le) then k -> PC
13 | 04 | 00 | offset |bgt k | if (CC = gt) then k -> PC
13 | 05 | 00 | offset |bge k | if (CC = ge) then k -> PC
XS-Type Instructions (6 * 8 = 48)
Opcode|Ext.Op|Source| k-data |Instruction | Operation
4 bits|3 bits|3 bits|16 bits | |
------+------+------+--------+--------------+-----------------------------
14 | 00 | Rs | 0000 |xeq Rs | if (Rs == 0) then exception
14 | 01 | Rs | 0000 |xne Rs | if (Rs != 0) then exception
14 | 02 | Rs | 0000 |xlt Rs | if (Rs < 0) then exception
14 | 03 | Rs | 0000 |xle Rs | if (Rs <= 0) then exception
14 | 04 | Rs | 0000 |xgt Rs | if (Rs > 0) then exception
14 | 05 | Rs | 0000 |xge Rs | if (Rs >= 0) then exception
14 | 06 | Rs | 0000 |xdz R0, Rs | if (R0 == 0) && (Rs == 0) then
exception
X-Type Instructions (11)
Opcode|Ext.Op| k-data |Instruction | Operation
4 bits|6 bits|16 bits | |
------+------+--------+--------------+--------------------------------
15 | 00 | 0000 |jsr k | PSP + 1 -> PSP, PC -> (PSP), k -> PC
15 | 01 | 0000 |jmp k | k -> PC
15 | 02 | 0000 |rts | (PSP) -> PC, PSP - 1 -> PSP
15 | 03 | 0000 |call k | calls user-defined function k, result
will be in R0
15 | 04 | 0000 |sys1 k | calls internal function k with one
argument, result will be in R0
15 | 05 | 0000 |sys2 k | calls internal function k with two
arguments, result will be in R0
15 | 06 | 0000 |trap k | calls internal function k with
arguments on stack, result will be in R0
15 | 07 | 0000 |traps k | calls internal function k with
arguments on stack, result will be on the stack
15 | 08 | 0000 |grow k | grow the stack to hold k elements
(floating-point numbers)
15 | 09 | 0000 |push k | move the stack pointer k elements
forward
15 | 10 | 0000 |pop k | move the stack pointer k elements
backward
15 | 63 | 0000 |nop | no operation
Total number of defined instructions: 780
Density of instruction set: 780 / 1024 = 0.7617
OF COURSE THIS IS ALL SUBJECT TO COMPLETE CHANGE AND REDESIGN IN CURRENT AND
FUTURE VERSIONS OF POV-RAY!
____________________________________________________
Thorsten Froehlich, Duisburg, Germany
e-mail: tho### [at] trf de
Visit POV-Ray on the web: http://mac.povray.org
Post a reply to this message
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
Ahh... This list awakens warm memories from the time I coded in asm for
Spectrum and later for DOS.
It really is starting to look like something I would like to code in
the future.
Making a pattern with asm-code. Cool.
--
#macro N(D)#if(D>99)cylinder{M()#local D=div(D,104);M().5,2pigment{rgb M()}}
N(D)#end#end#macro M()<mod(D,13)-6mod(div(D,13)8)-3,10>#end blob{
N(11117333955)N(4254934330)N(3900569407)N(7382340)N(3358)N(970)}// - Warp -
Post a reply to this message
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
In article <3ca6f75f$1@news.povray.org>, "Jan Walzer" <jan### [at] lzer net>
wrote:
> No ... this is not what I meant ... I mean, that the prog, can modify itself
> in memory, to speedup execution ...
Well, what's stopping you? You would have to interface through the
ordinary POV-Script and write it to the disk in order to load it again,
but the result would be the same: dynamically generated code being
loaded into memory and executed. Just a bit roundabout, and a large
chunk of the required code would have to be written in POV-Script
instead of being done by the function engine. It would also be very
slow...
Hmm, maybe write a VM in POV-Script. ;-)
> ... or to make unreadable code ...
It would certainly do this well.
--
Christopher James Huff <chr### [at] mac com>
POV-Ray TAG e-mail: chr### [at] tag povray org
TAG web site: http://tag.povray.org/
Post a reply to this message
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
"Christopher James Huff" <chr### [at] mac com> wrote:
> > No ... this is not what I meant ... I mean, that the prog, can modify itself
> > in memory, to speedup execution ...
>
> Well, what's stopping you? You would have to interface through the
> ordinary POV-Script and write it to the disk in order to load it again,
> but the result would be the same: dynamically generated code being
> loaded into memory and executed. Just a bit roundabout, and a large
> chunk of the required code would have to be written in POV-Script
> instead of being done by the function engine. It would also be very
> slow...
nope... that's not what I meant...
Consider this pseudo-code
-----------------------------------
A=SomeExpression(Input)
repeat
DoSomething()
if (A=SomeValue)
FarCall SubRoutine1
else
NearCall SubRoutine2
endif
DoSomethingElse()
until (End)
------------------------------------
as you can see: the if-instruction is executed very often...
In Self-modifying-ASM-code one could write:
------------------------------------
A=SomeExpression(Input)
if (A=SomeValue)
[ChangeHere]=OpCodeOf(FarCall SubRoutine1)
else
[ChangeHere]=OpCodeOf(NearCall SubRoutine2)
endif
repeat
DoSomething()
:ChangeHere
noop
noop
noop
noop
DoSomethingElse()
until (End)
------------------------------------
I once had a chance, to speed up the Bresenham by a factor of 3 by a similar method.
The thing is that you can decide at runtime, which opcodes get executed, depending
on some runtime values
As there are(were) different opcodes used for a near-jump (target-address, beein in
the same codesegment) and a far-jump (jump to anywhere in memory) there was no easy
chance, to work only with pointers, and only change these...
Of course, my last lines applied to my active assembler time, when I did some
x86-Realmode ASM ....
I hope you got the idea ...
Post a reply to this message
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
Ahhh ... nice ...
Thats what I wanted to read ...
Seems very complete and useful ...
Will this get included into the official doc ? ...
I would much apreciate this ...
Post a reply to this message
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
> Ahh... This list awakens warm memories from the time I coded in asm for
> Spectrum and later for DOS.
hehe ... yeah, me too ...
> It really is starting to look like something I would like to code in
> the future.
> Making a pattern with asm-code. Cool.
... just realized ...
this would also mean: creating objects with asm-code, as we can
easily turn these patterns into isos ... *g
... I'm waiting for the first signatures in ASM-style ... ;)
Post a reply to this message
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
> I wonder if this is optimized if functions evaluator:
>
> select(
> x*Ey+y*Ex-R1*Ex,
> select(
> z+3*x.
> x*Ey+y*Ex-R1*Ex,
> 123
> ),
> 17
> )
>
> Note expression "x*Ey+y*Ex-R1*Ex" appear twice.
> How many times is it calculated ?
Thorsten says twice. However, there is a way to avoid that.
See the (untested) code below.
#declare Func1 =
function(x,y,z,Ex,Ey,Value){
select(
Value,
select( z+3*x, Value, 123 ),
17
)
}
#declare Func2 =
function(x,y,z,Ex,Ey){Func1(x,y,z,Ex,Ey,x*Ey+y*Ex-R1*Ex)}
I don't know in this case if the additional function call slows down more
than the calculation of the expression did, but it's worth trying out. In
cases where the same expression appears many times it can give a big
speed-up anyway.
Maybe you already knew this, but then, maybe you didn't, so...
Rune
--
3D images and anims, include files, tutorials and more:
Rune's World: http://rsj.mobilixnet.dk (updated Feb 16)
POV-Ray Users: http://rsj.mobilixnet.dk/povrayusers/
POV-Ray Ring: http://webring.povray.co.uk
Post a reply to this message
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
From: Thorsten Froehlich
Subject: Re: function optimization question
Date: 31 Mar 2002 17:35:40
Message: <3ca78f3c@news.povray.org>
|
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
In article <3ca78098@news.povray.org> , "Jan Walzer" <jan### [at] lzer net> wrote:
> I once had a chance, to speed up the Bresenham by a factor of 3 by a similar
> method. The thing is that you can decide at runtime, which opcodes get
> executed, depending on some runtime values As there are(were) different
> opcodes used for a near-jump (target-address, beein in the same codesegment)
> and a far-jump (jump to anywhere in memory) there was no easy chance, to work
> only with pointers, and only change these...
Not only is this a terrible idea to code anything, it also kills all modern
processor performance. Well, all but x86 processors of course for which
designers actually have to interlock (or use common) the data and instruction
caches to support such nonsense. Not that it won't cripple performance for
them as well these days, but on a good old Pentium it will still be fast...
On any reasonable architecture not supporting 25 years of eight bit legacy
backward compatibility and other junk self-modifying code will fortunately
break! :-)
Thorsten
____________________________________________________
Thorsten Froehlich, Duisburg, Germany
e-mail: tho### [at] trf de
Visit POV-Ray on the web: http://mac.povray.org
Post a reply to this message
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
"Thorsten Froehlich" <tho### [at] trf de> wrote:
> > I once had a chance, to speed up the Bresenham by a factor of 3 by a similar
> > method. The thing is that you can decide at runtime, which opcodes get
> > executed, depending on some runtime values As there are(were) different
> > opcodes used for a near-jump (target-address, beein in the same codesegment)
> > and a far-jump (jump to anywhere in memory) there was no easy chance, to work
> > only with pointers, and only change these...
>
> Not only is this a terrible idea to code anything, it also kills all modern
[...]
> On any reasonable architecture not supporting 25 years of eight bit legacy
> backward compatibility and other junk self-modifying code will fortunately
> break! :-)
Hey ... for my studies, I had the chance to "port" this x86-code to
MIPS-ASM ... It was perfectly possible, meaning, it worked...
... I couldn't messure any speedup, as it run on a software-MIPS-simulater,
called "SPIM" ... if it was useful, is another chapter, but at least it was
fun (for me, at least) ...
... but hey: We all know, you prefer Mac ;) ...
Post a reply to this message
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
I think that the problem with self-modifying code nowadays is that, if
supported by the processor (not all processors support that; in them the
code to be executed is read-only), it effectively kills most optimizations
performed by the processor.
If the code being executed is modified, it invalidates the pipelines and
perhaps even the code in the cache, plus all the branch prediction stuff
and so on. This causes the same effect as loading the code from RAM for
the first time. If this is done in each loop, it's extremely slow.
If a very tight loop modifies itself, it could result in a code which
is several tens of times slower than an equivalent code with no
self-modification.
Of course self-modification is ok if you first modify the code to be
executed, and then this code is executed for a long period of time.
In fact, the OS does exactly this when it loads an exe file to memory:
It modifies its long jump addresses to match the place in memory where it
loaded it (with the help of the relocation table which can be found in
the header of the exe file).
--
#macro M(A,N,D,L)plane{-z,-9pigment{mandel L*9translate N color_map{[0rgb x]
[1rgb 9]}scale<D,D*3D>*1e3}rotate y*A*8}#end M(-3<1.206434.28623>70,7)M(
-1<.7438.1795>1,20)M(1<.77595.13699>30,20)M(3<.75923.07145>80,99)// - Warp -
Post a reply to this message
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |
|
![](/i/fill.gif) |
| ![](/i/fill.gif) |
|
![](/i/fill.gif) |