POV-Ray: Newsgroups: povray.general: Making faster #switch statements: Off Topic: Re: Making faster #switch statements

POV-Ray : Newsgroups : povray.general : Making faster #switch statements : Off Topic: Re: Making faster #switch statements		Server Time 12 Aug 2024 17:15:56 EDT (-0400)
From: Thorsten Froehlich
Date: 5 Mar 1999 12:54:11
Message: <36e01a43.0@news.povray.org>
In article <36e00044.0@news.povray.org> , par### [at] my-dejanewscom (Ron 
Parker) wrote:

>>That depends a lot on the density of the case conditions you have. Assume
>>you write a switch statement for the tokenizer in POV-Ray: You have most of
>>the ASCII characters and a high density of the numbers you compare. So the
>>compiler generates a jump table. It will most likely be somewhere around 128
>>entries. You need two comparisons plus branches (range checking) and one
>>jump.
>
> This depends on your compiler.  I've seen modern compilers do one of two
> things when given a switch statement like

I assumed that this point would be obvious because it is a well known (I
thought) fact that different compilers generate different code. :-)
But maybe I should still have added it...

>Both look no less efficient than what you'd expect from an if-else
>chain.  The jump table might take up space, but it's pretty quick
>to execute.  Note the fancy footwork in the first case: it does the
>range checking for you in only three instructions, and only one
>subtraction.  You can always do this when the table is a power-of-two
>size.  If it makes sense to use a jump table in the first place, you
>can always pad it to the next power of two.

With CodeWarrior 4.1 I get these results (it does not offer assembler
output, so it is a bit harder to read the disassembled code - all compiled
on a Mac, of course :-)

The code on a 68K:

00000000: 4E56 0000          link      a6,#0
00000004: 202E 0008          move.l    8(a6),d0
00000008: 5380               subq.l    #1,d0
0000000A: 670A               beq.s     *+12           ; 0x00000016
0000000C: 5380               subq.l    #1,d0
0000000E: 670E               beq.s     *+16           ; 0x0000001e
00000010: 5580               subq.l    #2,d0
00000012: 6710               beq.s     *+18           ; 0x00000024
00000014: 6016               bra.s     *+24           ; 0x0000002c
00000016: 4EB9 0000 0000     jsr       foo
0000001C: 6014               bra.s     *+22           ; 0x00000032
0000001E: 4EB9 0000 0000     jsr       bar
00000024: 4EB9 0000 0000     jsr       baz
0000002A: 6006               bra.s     *+8            ; 0x00000032
0000002C: 4EB9 0000 0000     jsr       quux
00000032: 4E5E               unlk      a6
00000034: 4E75               rts
00000036: 8474 6573 7400     dc.b      0x84,'test',0x00
0000003C: 0000

And in PowerPC code:

00000000: 7C0802A6  mflr     r0
00000004: 90010008  stw      r0,8(SP)
00000008: 9421FFC0  stwu     SP,-64(SP)
0000000C: 2C030003  cmpwi    r3,3
00000010: 41820044  beq      *+68                    ; $00000054
00000014: 40800014  bge      *+20                    ; $00000028
00000018: 2C030001  cmpwi    r3,1
0000001C: 41820018  beq      *+24                    ; $00000034
00000020: 40800020  bge      *+32                    ; $00000040
00000024: 48000030  b        *+48                    ; $00000054
00000028: 2C030005  cmpwi    r3,5
0000002C: 40800028  bge      *+40                    ; $00000054
00000030: 48000018  b        *+24                    ; $00000048
00000034: 48000001  bl       .foo
00000038: 60000000  nop
0000003C: 48000020  b        *+32                    ; $0000005C
00000040: 48000001  bl       .bar
00000044: 60000000  nop
00000048: 48000001  bl       .baz
0000004C: 60000000  nop
00000050: 4800000C  b        *+12                    ; $0000005C
00000054: 48000001  bl       .quux
00000058: 60000000  nop
0000005C: 80010048  lwz      r0,72(SP)
00000060: 38210040  addi     SP,SP,64
00000064: 7C0803A6  mtlr     r0
00000068: 4E800020  blr

And in x86 code:

00000000: 8B 4C 24 04         mov       ecx,dword ptr [esp+4]
00000004: 89 C8               mov       eax,ecx
00000006: 2D 01 00 00 00      sub       eax,1
0000000B: 3D 03 00 00 00      cmp       eax,3
00000010: 77 2E               ja        $+48 ; --> 0x0040
00000012: FF 24 85 1C 00 00   jmp       dword ptr [eax*4+.text++28] near
00000018: 00
00000019: 8D 40 00            lea       eax,dword ptr [eax+0]
0000001C: 2C 00               sub       al,0
0000001E: 00 00               add       byte ptr [eax],al
00000020: 33 00               xor       eax,dword ptr [eax]
00000022: 00 00               add       byte ptr [eax],al
00000024: 40                  inc       eax
00000025: 00 00               add       byte ptr [eax],al
00000027: 00 38               add       byte ptr [eax],bh
00000029: 00 00               add       byte ptr [eax],al
0000002B: 00 E8               add       al,ch
0000002D: 00 00               add       byte ptr [eax],al
0000002F: 00 00               add       byte ptr [eax],al
00000031: EB 12               jmp       $+20 ; --> 0x0045
00000033: E8 00 00 00 00      call      _bar
00000038: E8 00 00 00 00      call      _baz
0000003D: EB 06               jmp       $+8 ; --> 0x0045
0000003F: 90                  nop
00000040: E8 00 00 00 00      call      _quux
00000045: C3                  ret       near


As you can see, it does not optimize this case, too, but a larger one and
then the optimizer starts doing its work:

 switch(i)
 {
  case 1: foo(); break;
  case 2: bar(); /* fall through */
  case 4: baz(); break;
  case 5: foo1(); break;
  case 6: bar1(); break;
  case 7: baz1(); break;
  case 8: quux1(); break;
  default: quux(); break;
 }


In 68K code:

00000000: 4E56 0000          link      a6,#0
00000004: 202E 0008          move.l    8(a6),d0
00000008: 0C80 0000 0008     cmpi.l    #8,d0
0000000E: 6252               bhi.s     *+84           ; 0x00000062
00000010: D040               add.w     d0,d0
00000012: 303B 0006          move.w    (6,pc,d0.w),d0
00000016: 4EFB 0002          jmp       (2,pc,d0.w)
0000001A: 0048 0012          ori.w     #0x12,a0
0000001E: 001A 0048          ori.b     #0x48,(a2)+
00000022: 0020 0028          ori.b     #0x28,-(a0)
00000026: 0030 0038 0040     ori.b     #0x38,(64,a0,d0.w)
0000002C: 4EB9 0000 0000     jsr       foo
00000032: 6034               bra.s     *+54           ; 0x00000068
00000034: 4EB9 0000 0000     jsr       bar
0000003A: 4EB9 0000 0000     jsr       baz
00000040: 6026               bra.s     *+40           ; 0x00000068
00000042: 4EB9 0000 0000     jsr       foo1
00000048: 601E               bra.s     *+32           ; 0x00000068
0000004A: 4EB9 0000 0000     jsr       bar1
00000050: 6016               bra.s     *+24           ; 0x00000068
00000052: 4EB9 0000 0000     jsr       baz1
00000058: 600E               bra.s     *+16           ; 0x00000068
0000005A: 4EB9 0000 0000     jsr       quux1
00000060: 6006               bra.s     *+8            ; 0x00000068
00000062: 4EB9 0000 0000     jsr       quux
00000068: 4E5E               unlk      a6
0000006A: 4E75               rts
0000006C: 8474 6573 7400     dc.b      0x84,'test',0x00
00000072: 0000

In PowerPC code the optimizer takes over and you get:

00000000: 7C0802A6  mflr     r0
00000004: 90010008  stw      r0,8(SP)
00000008: 9421FFC0  stwu     SP,-64(SP)
0000000C: 28030008  cmplwi   r3,$0008
00000010: 41810068  bgt      *+104                   ; $00000078
00000014: 80820000  lwz      r4,@12(RTOC)
00000018: 5460103A  rlwinm   r0,r3,2,0,29
0000001C: 7C84002E  lwzx     r4,r4,r0
00000020: 7C8903A6  mtctr    r4
00000024: 4E800420  bctr
00000028: 48000001  bl       .foo
0000002C: 60000000  nop
00000030: 48000050  b        *+80                    ; $00000080
00000034: 48000001  bl       .bar
00000038: 60000000  nop
0000003C: 48000001  bl       .baz
00000040: 60000000  nop
00000044: 4800003C  b        *+60                    ; $00000080
00000048: 48000001  bl       .foo1
0000004C: 60000000  nop
00000050: 48000030  b        *+48                    ; $00000080
00000054: 48000001  bl       .bar1
00000058: 60000000  nop
0000005C: 48000024  b        *+36                    ; $00000080
00000060: 48000001  bl       .baz1
00000064: 60000000  nop
00000068: 48000018  b        *+24                    ; $00000080
0000006C: 48000001  bl       .quux1
00000070: 60000000  nop
00000074: 4800000C  b        *+12                    ; $00000080
00000078: 48000001  bl       .quux
0000007C: 60000000  nop
00000080: 80010048  lwz      r0,72(SP)
00000084: 38210040  addi     SP,SP,64
00000088: 7C0803A6  mtlr     r0
0000008C: 4E800020  blr

And in x86 code:

00000000: 8B 4C 24 04         mov       ecx,dword ptr [esp+4]
00000004: 89 C8               mov       eax,ecx
00000006: 2D 01 00 00 00      sub       eax,1
0000000B: 3D 07 00 00 00      cmp       eax,7
00000010: 77 5E               ja        $+96 ; --> 0x0070
00000012: FF 24 85 1C 00 00   jmp       dword ptr [eax*4+.text++28] near
00000018: 00
00000019: 8D 40 00            lea       eax,dword ptr [eax+0]
0000001C: 3C 00               cmp       al,0
0000001E: 00 00               add       byte ptr [eax],al
00000020: 43                  inc       ebx
00000021: 00 00               add       byte ptr [eax],al
00000023: 00 70 00            add       byte ptr [eax+0],dh
00000026: 00 00               add       byte ptr [eax],al
00000028: 48                  dec       eax
00000029: 00 00               add       byte ptr [eax],al
0000002B: 00 50 00            add       byte ptr [eax+0],dl
0000002E: 00 00               add       byte ptr [eax],al
00000030: 57                  push      edi
00000031: 00 00               add       byte ptr [eax],al
00000033: 00 60 00            add       byte ptr [eax+0],ah
00000036: 00 00               add       byte ptr [eax],al
00000038: 67 00 00            add       byte ptr [bx][si],al
0000003B: 00 E8               add       al,ch
0000003D: 00 00               add       byte ptr [eax],al
0000003F: 00 00               add       byte ptr [eax],al
00000041: EB 32               jmp       $+52 ; --> 0x0075
00000043: E8 00 00 00 00      call      _bar
00000048: E8 00 00 00 00      call      _baz
0000004D: EB 26               jmp       $+40 ; --> 0x0075
0000004F: 90                  nop
00000050: E8 00 00 00 00      call      _foo1
00000055: EB 1E               jmp       $+32 ; --> 0x0075
00000057: E8 00 00 00 00      call      _bar1
0000005C: EB 17               jmp       $+25 ; --> 0x0075
0000005E: 89 C0               mov       eax,eax
00000060: E8 00 00 00 00      call      _baz1
00000065: EB 0E               jmp       $+16 ; --> 0x0075
00000067: E8 00 00 00 00      call      _quux1
0000006C: EB 07               jmp       $+9 ; --> 0x0075
0000006E: 89 C0               mov       eax,eax
00000070: E8 00 00 00 00      call      _quux
00000075: C3                  ret       near


I have to admit that I don't understand x86 assembly language very well. So
I am still wondering what all these add operations do. Any ideas?

>All of this is, of course, way off topic. :)

Well, might be true...


    Thorsten
Post a reply to this message