POV-Ray: Newsgroups: povray.general: Making faster #switch statements

POV-Ray : Newsgroups : povray.general : Making faster #switch statements		Server Time 23 Jul 2026 09:10:51 EDT (-0400)

<<< Previous 10 Messages

Goto Initial 10 Messages

From: Thorsten Froehlich
Subject: Off Topic: Re: Making faster #switch statements
Date: 5 Mar 1999 12:54:11
Message: <36e01a43.0@news.povray.org>

In article <36e00044.0@news.povray.org> , par### [at] my-dejanewscom (Ron 
Parker) wrote:

>>That depends a lot on the density of the case conditions you have. Assume
>>you write a switch statement for the tokenizer in POV-Ray: You have most of
>>the ASCII characters and a high density of the numbers you compare. So the
>>compiler generates a jump table. It will most likely be somewhere around 128
>>entries. You need two comparisons plus branches (range checking) and one
>>jump.
>
> This depends on your compiler.  I've seen modern compilers do one of two
> things when given a switch statement like

I assumed that this point would be obvious because it is a well known (I
thought) fact that different compilers generate different code. :-)
But maybe I should still have added it...

>Both look no less efficient than what you'd expect from an if-else
>chain.  The jump table might take up space, but it's pretty quick
>to execute.  Note the fancy footwork in the first case: it does the
>range checking for you in only three instructions, and only one
>subtraction.  You can always do this when the table is a power-of-two
>size.  If it makes sense to use a jump table in the first place, you
>can always pad it to the next power of two.

With CodeWarrior 4.1 I get these results (it does not offer assembler
output, so it is a bit harder to read the disassembled code - all compiled
on a Mac, of course :-)

The code on a 68K:

00000000: 4E56 0000          link      a6,#0
00000004: 202E 0008          move.l    8(a6),d0
00000008: 5380               subq.l    #1,d0
0000000A: 670A               beq.s     *+12           ; 0x00000016
0000000C: 5380               subq.l    #1,d0
0000000E: 670E               beq.s     *+16           ; 0x0000001e
00000010: 5580               subq.l    #2,d0
00000012: 6710               beq.s     *+18           ; 0x00000024
00000014: 6016               bra.s     *+24           ; 0x0000002c
00000016: 4EB9 0000 0000     jsr       foo
0000001C: 6014               bra.s     *+22           ; 0x00000032
0000001E: 4EB9 0000 0000     jsr       bar
00000024: 4EB9 0000 0000     jsr       baz
0000002A: 6006               bra.s     *+8            ; 0x00000032
0000002C: 4EB9 0000 0000     jsr       quux
00000032: 4E5E               unlk      a6
00000034: 4E75               rts
00000036: 8474 6573 7400     dc.b      0x84,'test',0x00
0000003C: 0000

And in PowerPC code:

00000000: 7C0802A6  mflr     r0
00000004: 90010008  stw      r0,8(SP)
00000008: 9421FFC0  stwu     SP,-64(SP)
0000000C: 2C030003  cmpwi    r3,3
00000010: 41820044  beq      *+68                    ; $00000054
00000014: 40800014  bge      *+20                    ; $00000028
00000018: 2C030001  cmpwi    r3,1
0000001C: 41820018  beq      *+24                    ; $00000034
00000020: 40800020  bge      *+32                    ; $00000040
00000024: 48000030  b        *+48                    ; $00000054
00000028: 2C030005  cmpwi    r3,5
0000002C: 40800028  bge      *+40                    ; $00000054
00000030: 48000018  b        *+24                    ; $00000048
00000034: 48000001  bl       .foo
00000038: 60000000  nop
0000003C: 48000020  b        *+32                    ; $0000005C
00000040: 48000001  bl       .bar
00000044: 60000000  nop
00000048: 48000001  bl       .baz
0000004C: 60000000  nop
00000050: 4800000C  b        *+12                    ; $0000005C
00000054: 48000001  bl       .quux
00000058: 60000000  nop
0000005C: 80010048  lwz      r0,72(SP)
00000060: 38210040  addi     SP,SP,64
00000064: 7C0803A6  mtlr     r0
00000068: 4E800020  blr

And in x86 code:

00000000: 8B 4C 24 04         mov       ecx,dword ptr [esp+4]
00000004: 89 C8               mov       eax,ecx
00000006: 2D 01 00 00 00      sub       eax,1
0000000B: 3D 03 00 00 00      cmp       eax,3
00000010: 77 2E               ja        $+48 ; --> 0x0040
00000012: FF 24 85 1C 00 00   jmp       dword ptr [eax*4+.text++28] near
00000018: 00
00000019: 8D 40 00            lea       eax,dword ptr [eax+0]
0000001C: 2C 00               sub       al,0
0000001E: 00 00               add       byte ptr [eax],al
00000020: 33 00               xor       eax,dword ptr [eax]
00000022: 00 00               add       byte ptr [eax],al
00000024: 40                  inc       eax
00000025: 00 00               add       byte ptr [eax],al
00000027: 00 38               add       byte ptr [eax],bh
00000029: 00 00               add       byte ptr [eax],al
0000002B: 00 E8               add       al,ch
0000002D: 00 00               add       byte ptr [eax],al
0000002F: 00 00               add       byte ptr [eax],al
00000031: EB 12               jmp       $+20 ; --> 0x0045
00000033: E8 00 00 00 00      call      _bar
00000038: E8 00 00 00 00      call      _baz
0000003D: EB 06               jmp       $+8 ; --> 0x0045
0000003F: 90                  nop
00000040: E8 00 00 00 00      call      _quux
00000045: C3                  ret       near


As you can see, it does not optimize this case, too, but a larger one and
then the optimizer starts doing its work:

 switch(i)
 {
  case 1: foo(); break;
  case 2: bar(); /* fall through */
  case 4: baz(); break;
  case 5: foo1(); break;
  case 6: bar1(); break;
  case 7: baz1(); break;
  case 8: quux1(); break;
  default: quux(); break;
 }


In 68K code:

00000000: 4E56 0000          link      a6,#0
00000004: 202E 0008          move.l    8(a6),d0
00000008: 0C80 0000 0008     cmpi.l    #8,d0
0000000E: 6252               bhi.s     *+84           ; 0x00000062
00000010: D040               add.w     d0,d0
00000012: 303B 0006          move.w    (6,pc,d0.w),d0
00000016: 4EFB 0002          jmp       (2,pc,d0.w)
0000001A: 0048 0012          ori.w     #0x12,a0
0000001E: 001A 0048          ori.b     #0x48,(a2)+
00000022: 0020 0028          ori.b     #0x28,-(a0)
00000026: 0030 0038 0040     ori.b     #0x38,(64,a0,d0.w)
0000002C: 4EB9 0000 0000     jsr       foo
00000032: 6034               bra.s     *+54           ; 0x00000068
00000034: 4EB9 0000 0000     jsr       bar
0000003A: 4EB9 0000 0000     jsr       baz
00000040: 6026               bra.s     *+40           ; 0x00000068
00000042: 4EB9 0000 0000     jsr       foo1
00000048: 601E               bra.s     *+32           ; 0x00000068
0000004A: 4EB9 0000 0000     jsr       bar1
00000050: 6016               bra.s     *+24           ; 0x00000068
00000052: 4EB9 0000 0000     jsr       baz1
00000058: 600E               bra.s     *+16           ; 0x00000068
0000005A: 4EB9 0000 0000     jsr       quux1
00000060: 6006               bra.s     *+8            ; 0x00000068
00000062: 4EB9 0000 0000     jsr       quux
00000068: 4E5E               unlk      a6
0000006A: 4E75               rts
0000006C: 8474 6573 7400     dc.b      0x84,'test',0x00
00000072: 0000

In PowerPC code the optimizer takes over and you get:

00000000: 7C0802A6  mflr     r0
00000004: 90010008  stw      r0,8(SP)
00000008: 9421FFC0  stwu     SP,-64(SP)
0000000C: 28030008  cmplwi   r3,$0008
00000010: 41810068  bgt      *+104                   ; $00000078
00000014: 80820000  lwz      r4,@12(RTOC)
00000018: 5460103A  rlwinm   r0,r3,2,0,29
0000001C: 7C84002E  lwzx     r4,r4,r0
00000020: 7C8903A6  mtctr    r4
00000024: 4E800420  bctr
00000028: 48000001  bl       .foo
0000002C: 60000000  nop
00000030: 48000050  b        *+80                    ; $00000080
00000034: 48000001  bl       .bar
00000038: 60000000  nop
0000003C: 48000001  bl       .baz
00000040: 60000000  nop
00000044: 4800003C  b        *+60                    ; $00000080
00000048: 48000001  bl       .foo1
0000004C: 60000000  nop
00000050: 48000030  b        *+48                    ; $00000080
00000054: 48000001  bl       .bar1
00000058: 60000000  nop
0000005C: 48000024  b        *+36                    ; $00000080
00000060: 48000001  bl       .baz1
00000064: 60000000  nop
00000068: 48000018  b        *+24                    ; $00000080
0000006C: 48000001  bl       .quux1
00000070: 60000000  nop
00000074: 4800000C  b        *+12                    ; $00000080
00000078: 48000001  bl       .quux
0000007C: 60000000  nop
00000080: 80010048  lwz      r0,72(SP)
00000084: 38210040  addi     SP,SP,64
00000088: 7C0803A6  mtlr     r0
0000008C: 4E800020  blr

And in x86 code:

00000000: 8B 4C 24 04         mov       ecx,dword ptr [esp+4]
00000004: 89 C8               mov       eax,ecx
00000006: 2D 01 00 00 00      sub       eax,1
0000000B: 3D 07 00 00 00      cmp       eax,7
00000010: 77 5E               ja        $+96 ; --> 0x0070
00000012: FF 24 85 1C 00 00   jmp       dword ptr [eax*4+.text++28] near
00000018: 00
00000019: 8D 40 00            lea       eax,dword ptr [eax+0]
0000001C: 3C 00               cmp       al,0
0000001E: 00 00               add       byte ptr [eax],al
00000020: 43                  inc       ebx
00000021: 00 00               add       byte ptr [eax],al
00000023: 00 70 00            add       byte ptr [eax+0],dh
00000026: 00 00               add       byte ptr [eax],al
00000028: 48                  dec       eax
00000029: 00 00               add       byte ptr [eax],al
0000002B: 00 50 00            add       byte ptr [eax+0],dl
0000002E: 00 00               add       byte ptr [eax],al
00000030: 57                  push      edi
00000031: 00 00               add       byte ptr [eax],al
00000033: 00 60 00            add       byte ptr [eax+0],ah
00000036: 00 00               add       byte ptr [eax],al
00000038: 67 00 00            add       byte ptr [bx][si],al
0000003B: 00 E8               add       al,ch
0000003D: 00 00               add       byte ptr [eax],al
0000003F: 00 00               add       byte ptr [eax],al
00000041: EB 32               jmp       $+52 ; --> 0x0075
00000043: E8 00 00 00 00      call      _bar
00000048: E8 00 00 00 00      call      _baz
0000004D: EB 26               jmp       $+40 ; --> 0x0075
0000004F: 90                  nop
00000050: E8 00 00 00 00      call      _foo1
00000055: EB 1E               jmp       $+32 ; --> 0x0075
00000057: E8 00 00 00 00      call      _bar1
0000005C: EB 17               jmp       $+25 ; --> 0x0075
0000005E: 89 C0               mov       eax,eax
00000060: E8 00 00 00 00      call      _baz1
00000065: EB 0E               jmp       $+16 ; --> 0x0075
00000067: E8 00 00 00 00      call      _quux1
0000006C: EB 07               jmp       $+9 ; --> 0x0075
0000006E: 89 C0               mov       eax,eax
00000070: E8 00 00 00 00      call      _quux
00000075: C3                  ret       near


I have to admit that I don't understand x86 assembly language very well. So
I am still wondering what all these add operations do. Any ideas?

>All of this is, of course, way off topic. :)

Well, might be true...


    Thorsten

Post a reply to this message

From: Rudy Velthuis
Subject: Re: Making faster #switch statements
Date: 5 Mar 1999 13:37:53
Message: <36e02481.0@news.povray.org>

Thorsten Froehlich schrieb in Nachricht <36dff57f.0@news.povray.org>...
>In article <MPG.1149eb79f78933399896a1@news.povray.org> ,
eag### [at] telekabelnl
>(Phoenix) wrote:
>
>> In general, the time it takes to read in a jump table is far greater than
>> just using the jump system that gets generated by an if-clause.
>
>That depends a lot on the density of the case conditions you have.

Exactly. IMHO most compilers will do a faster switch() than numerous
if-cascades. See also Ron Parker's output of two compilers. They're both
faster than a simple cascade of if's would be. Now this could be different
for the interpreted POV-script (I still didn't have a look at the code yet)
and interpreted BASIC (I think VB and other "compiled" Basics can handle
switch statements just as well as other languages).

--
Rudy Velthuis

Post a reply to this message

From: Rudy Velthuis
Subject: Re: Off Topic: Re: Making faster #switch statements
Date: 5 Mar 1999 13:37:54
Message: <36e02482.0@news.povray.org>

Thorsten Froehlich schrieb in Nachricht <36e01a43.0@news.povray.org>...

>And in x86 code:
>
>00000000: 8B 4C 24 04         mov       ecx,dword ptr [esp+4]
>00000004: 89 C8               mov       eax,ecx
>00000006: 2D 01 00 00 00      sub       eax,1
>0000000B: 3D 07 00 00 00      cmp       eax,7
>00000010: 77 5E               ja        $+96 ; --> 0x0070
>00000012: FF 24 85 1C 00 00   jmp       dword ptr [eax*4+.text++28] near
>00000018: 00
>00000019: 8D 40 00            lea       eax,dword ptr [eax+0]
>0000001C: 3C 00               cmp       al,0
>0000001E: 00 00               add       byte ptr [eax],al
>00000020: 43                  inc       ebx
>00000021: 00 00               add       byte ptr [eax],al
>00000023: 00 70 00            add       byte ptr [eax+0],dh
>00000026: 00 00               add       byte ptr [eax],al
>00000028: 48                  dec       eax
>00000029: 00 00               add       byte ptr [eax],al
>0000002B: 00 50 00            add       byte ptr [eax+0],dl
>0000002E: 00 00               add       byte ptr [eax],al
>00000030: 57                  push      edi
>00000031: 00 00               add       byte ptr [eax],al
>00000033: 00 60 00            add       byte ptr [eax+0],ah
>00000036: 00 00               add       byte ptr [eax],al
>00000038: 67 00 00            add       byte ptr [bx][si],al
>0000003B: 00 E8               add       al,ch
>0000003D: 00 00               add       byte ptr [eax],al
>0000003F: 00 00               add       byte ptr [eax],al
>00000041: EB 32               jmp       $+52 ; --> 0x0075
>00000043: E8 00 00 00 00      call      _bar
>00000048: E8 00 00 00 00      call      _baz
>0000004D: EB 26               jmp       $+40 ; --> 0x0075
>0000004F: 90                  nop
>00000050: E8 00 00 00 00      call      _foo1
>00000055: EB 1E               jmp       $+32 ; --> 0x0075
>00000057: E8 00 00 00 00      call      _bar1
>0000005C: EB 17               jmp       $+25 ; --> 0x0075
>0000005E: 89 C0               mov       eax,eax
>00000060: E8 00 00 00 00      call      _baz1
>00000065: EB 0E               jmp       $+16 ; --> 0x0075
>00000067: E8 00 00 00 00      call      _quux1
>0000006C: EB 07               jmp       $+9 ; --> 0x0075
>0000006E: 89 C0               mov       eax,eax
>00000070: E8 00 00 00 00      call      _quux
>00000075: C3                  ret       near
>
>
>I have to admit that I don't understand x86 assembly language very well. So
>I am still wondering what all these add operations do. Any ideas?


I was doing some arithmetic first, because I didn't get it either, but it's
quite simple. From 0000018 on, you don't see code, you see a kind of jump
table (dword entries). Of course the disassembler doesn't know this, so it
is shown as code.

--
Rudy Velthuis

Post a reply to this message

From: Ron Parker
Subject: Re: Making faster #switch statements
Date: 5 Mar 1999 14:42:11
Message: <36e03393.0@news.povray.org>

On Fri, 5 Mar 1999 19:28:28 +0100, Rudy Velthuis <rve### [at] gmxnet> wrote:
>
>Thorsten Froehlich schrieb in Nachricht <36dff57f.0@news.povray.org>...
>>In article <MPG.1149eb79f78933399896a1@news.povray.org> ,
>eag### [at] telekabelnl
>>(Phoenix) wrote:
>>
>>> In general, the time it takes to read in a jump table is far greater than
>>> just using the jump system that gets generated by an if-clause.
>>
>>That depends a lot on the density of the case conditions you have.
>
>Exactly. IMHO most compilers will do a faster switch() than numerous
>if-cascades. See also Ron Parker's output of two compilers. They're both
>faster than a simple cascade of if's would be.

Don't take my stuff as gospel, though.  I reverse-engineer stuff for a 
living, so what you see is from memory, probably not 100% accurate, 
could have even been hand-coded by some masochist, and is almost 
certainly filtered through my "understand-o-meter"

I don't know what about switch in POV would make it slow, other than the 
requirement that it parse even the parts it doesn't use, to find the parts 
it needs.  In this case, an array would be the best bet.  In other cases,
it's mostly empirical.  The idea should be to minimize the amount of
text in each case, but that won't always be the best solution either.

Post a reply to this message

From: Rudy Velthuis
Subject: Re: Making faster #switch statements
Date: 5 Mar 1999 17:43:15
Message: <36e05e03.0@news.povray.org>

Ron Parker schrieb in Nachricht <36e03393.0@news.povray.org>...
>>Exactly. IMHO most compilers will do a faster switch() than numerous
>>if-cascades. See also Ron Parker's output of two compilers. They're both
>>faster than a simple cascade of if's would be.
>
>Don't take my stuff as gospel, though.  I reverse-engineer stuff for a
>living, so what you see is from memory, probably not 100% accurate,
>could have even been hand-coded by some masochist, and is almost
>certainly filtered through my "understand-o-meter"


I don't do it for a living, but I've seen enough code (Pascal and C
disassembly) like the samples you showed to believe this is the way most
switch statements (should) work. Even the example Thorsten showed seems to
do this (although it was hard to recognise at a first glance).

>I don't know what about switch in POV would make it slow, other than the
>requirement that it parse even the parts it doesn't use, to find the parts
>it needs.  In this case, an array would be the best bet.  In other cases,
>it's mostly empirical.  The idea should be to minimize the amount of
>text in each case, but that won't always be the best solution either.

I don't know how the parser handles a #switch, but I'd expect it to look for
the end of the #switch block when it encounters #break and to look for the
next #case or #range, when a #case/#range check fails. Don't know how this
could be improved easily.

--
Rudy Velthuis

Post a reply to this message

<<< Previous 10 Messages

Goto Initial 10 Messages