|
![](/i/fill.gif) |
In article <36e00044.0@news.povray.org> , par### [at] my-dejanews com (Ron
Parker) wrote:
>>That depends a lot on the density of the case conditions you have. Assume
>>you write a switch statement for the tokenizer in POV-Ray: You have most of
>>the ASCII characters and a high density of the numbers you compare. So the
>>compiler generates a jump table. It will most likely be somewhere around 128
>>entries. You need two comparisons plus branches (range checking) and one
>>jump.
>
> This depends on your compiler. I've seen modern compilers do one of two
> things when given a switch statement like
I assumed that this point would be obvious because it is a well known (I
thought) fact that different compilers generate different code. :-)
But maybe I should still have added it...
>Both look no less efficient than what you'd expect from an if-else
>chain. The jump table might take up space, but it's pretty quick
>to execute. Note the fancy footwork in the first case: it does the
>range checking for you in only three instructions, and only one
>subtraction. You can always do this when the table is a power-of-two
>size. If it makes sense to use a jump table in the first place, you
>can always pad it to the next power of two.
With CodeWarrior 4.1 I get these results (it does not offer assembler
output, so it is a bit harder to read the disassembled code - all compiled
on a Mac, of course :-)
The code on a 68K:
00000000: 4E56 0000 link a6,#0
00000004: 202E 0008 move.l 8(a6),d0
00000008: 5380 subq.l #1,d0
0000000A: 670A beq.s *+12 ; 0x00000016
0000000C: 5380 subq.l #1,d0
0000000E: 670E beq.s *+16 ; 0x0000001e
00000010: 5580 subq.l #2,d0
00000012: 6710 beq.s *+18 ; 0x00000024
00000014: 6016 bra.s *+24 ; 0x0000002c
00000016: 4EB9 0000 0000 jsr foo
0000001C: 6014 bra.s *+22 ; 0x00000032
0000001E: 4EB9 0000 0000 jsr bar
00000024: 4EB9 0000 0000 jsr baz
0000002A: 6006 bra.s *+8 ; 0x00000032
0000002C: 4EB9 0000 0000 jsr quux
00000032: 4E5E unlk a6
00000034: 4E75 rts
00000036: 8474 6573 7400 dc.b 0x84,'test',0x00
0000003C: 0000
And in PowerPC code:
00000000: 7C0802A6 mflr r0
00000004: 90010008 stw r0,8(SP)
00000008: 9421FFC0 stwu SP,-64(SP)
0000000C: 2C030003 cmpwi r3,3
00000010: 41820044 beq *+68 ; $00000054
00000014: 40800014 bge *+20 ; $00000028
00000018: 2C030001 cmpwi r3,1
0000001C: 41820018 beq *+24 ; $00000034
00000020: 40800020 bge *+32 ; $00000040
00000024: 48000030 b *+48 ; $00000054
00000028: 2C030005 cmpwi r3,5
0000002C: 40800028 bge *+40 ; $00000054
00000030: 48000018 b *+24 ; $00000048
00000034: 48000001 bl .foo
00000038: 60000000 nop
0000003C: 48000020 b *+32 ; $0000005C
00000040: 48000001 bl .bar
00000044: 60000000 nop
00000048: 48000001 bl .baz
0000004C: 60000000 nop
00000050: 4800000C b *+12 ; $0000005C
00000054: 48000001 bl .quux
00000058: 60000000 nop
0000005C: 80010048 lwz r0,72(SP)
00000060: 38210040 addi SP,SP,64
00000064: 7C0803A6 mtlr r0
00000068: 4E800020 blr
And in x86 code:
00000000: 8B 4C 24 04 mov ecx,dword ptr [esp+4]
00000004: 89 C8 mov eax,ecx
00000006: 2D 01 00 00 00 sub eax,1
0000000B: 3D 03 00 00 00 cmp eax,3
00000010: 77 2E ja $+48 ; --> 0x0040
00000012: FF 24 85 1C 00 00 jmp dword ptr [eax*4+.text++28] near
00000018: 00
00000019: 8D 40 00 lea eax,dword ptr [eax+0]
0000001C: 2C 00 sub al,0
0000001E: 00 00 add byte ptr [eax],al
00000020: 33 00 xor eax,dword ptr [eax]
00000022: 00 00 add byte ptr [eax],al
00000024: 40 inc eax
00000025: 00 00 add byte ptr [eax],al
00000027: 00 38 add byte ptr [eax],bh
00000029: 00 00 add byte ptr [eax],al
0000002B: 00 E8 add al,ch
0000002D: 00 00 add byte ptr [eax],al
0000002F: 00 00 add byte ptr [eax],al
00000031: EB 12 jmp $+20 ; --> 0x0045
00000033: E8 00 00 00 00 call _bar
00000038: E8 00 00 00 00 call _baz
0000003D: EB 06 jmp $+8 ; --> 0x0045
0000003F: 90 nop
00000040: E8 00 00 00 00 call _quux
00000045: C3 ret near
As you can see, it does not optimize this case, too, but a larger one and
then the optimizer starts doing its work:
switch(i)
{
case 1: foo(); break;
case 2: bar(); /* fall through */
case 4: baz(); break;
case 5: foo1(); break;
case 6: bar1(); break;
case 7: baz1(); break;
case 8: quux1(); break;
default: quux(); break;
}
In 68K code:
00000000: 4E56 0000 link a6,#0
00000004: 202E 0008 move.l 8(a6),d0
00000008: 0C80 0000 0008 cmpi.l #8,d0
0000000E: 6252 bhi.s *+84 ; 0x00000062
00000010: D040 add.w d0,d0
00000012: 303B 0006 move.w (6,pc,d0.w),d0
00000016: 4EFB 0002 jmp (2,pc,d0.w)
0000001A: 0048 0012 ori.w #0x12,a0
0000001E: 001A 0048 ori.b #0x48,(a2)+
00000022: 0020 0028 ori.b #0x28,-(a0)
00000026: 0030 0038 0040 ori.b #0x38,(64,a0,d0.w)
0000002C: 4EB9 0000 0000 jsr foo
00000032: 6034 bra.s *+54 ; 0x00000068
00000034: 4EB9 0000 0000 jsr bar
0000003A: 4EB9 0000 0000 jsr baz
00000040: 6026 bra.s *+40 ; 0x00000068
00000042: 4EB9 0000 0000 jsr foo1
00000048: 601E bra.s *+32 ; 0x00000068
0000004A: 4EB9 0000 0000 jsr bar1
00000050: 6016 bra.s *+24 ; 0x00000068
00000052: 4EB9 0000 0000 jsr baz1
00000058: 600E bra.s *+16 ; 0x00000068
0000005A: 4EB9 0000 0000 jsr quux1
00000060: 6006 bra.s *+8 ; 0x00000068
00000062: 4EB9 0000 0000 jsr quux
00000068: 4E5E unlk a6
0000006A: 4E75 rts
0000006C: 8474 6573 7400 dc.b 0x84,'test',0x00
00000072: 0000
In PowerPC code the optimizer takes over and you get:
00000000: 7C0802A6 mflr r0
00000004: 90010008 stw r0,8(SP)
00000008: 9421FFC0 stwu SP,-64(SP)
0000000C: 28030008 cmplwi r3,$0008
00000010: 41810068 bgt *+104 ; $00000078
00000014: 80820000 lwz r4,@12(RTOC)
00000018: 5460103A rlwinm r0,r3,2,0,29
0000001C: 7C84002E lwzx r4,r4,r0
00000020: 7C8903A6 mtctr r4
00000024: 4E800420 bctr
00000028: 48000001 bl .foo
0000002C: 60000000 nop
00000030: 48000050 b *+80 ; $00000080
00000034: 48000001 bl .bar
00000038: 60000000 nop
0000003C: 48000001 bl .baz
00000040: 60000000 nop
00000044: 4800003C b *+60 ; $00000080
00000048: 48000001 bl .foo1
0000004C: 60000000 nop
00000050: 48000030 b *+48 ; $00000080
00000054: 48000001 bl .bar1
00000058: 60000000 nop
0000005C: 48000024 b *+36 ; $00000080
00000060: 48000001 bl .baz1
00000064: 60000000 nop
00000068: 48000018 b *+24 ; $00000080
0000006C: 48000001 bl .quux1
00000070: 60000000 nop
00000074: 4800000C b *+12 ; $00000080
00000078: 48000001 bl .quux
0000007C: 60000000 nop
00000080: 80010048 lwz r0,72(SP)
00000084: 38210040 addi SP,SP,64
00000088: 7C0803A6 mtlr r0
0000008C: 4E800020 blr
And in x86 code:
00000000: 8B 4C 24 04 mov ecx,dword ptr [esp+4]
00000004: 89 C8 mov eax,ecx
00000006: 2D 01 00 00 00 sub eax,1
0000000B: 3D 07 00 00 00 cmp eax,7
00000010: 77 5E ja $+96 ; --> 0x0070
00000012: FF 24 85 1C 00 00 jmp dword ptr [eax*4+.text++28] near
00000018: 00
00000019: 8D 40 00 lea eax,dword ptr [eax+0]
0000001C: 3C 00 cmp al,0
0000001E: 00 00 add byte ptr [eax],al
00000020: 43 inc ebx
00000021: 00 00 add byte ptr [eax],al
00000023: 00 70 00 add byte ptr [eax+0],dh
00000026: 00 00 add byte ptr [eax],al
00000028: 48 dec eax
00000029: 00 00 add byte ptr [eax],al
0000002B: 00 50 00 add byte ptr [eax+0],dl
0000002E: 00 00 add byte ptr [eax],al
00000030: 57 push edi
00000031: 00 00 add byte ptr [eax],al
00000033: 00 60 00 add byte ptr [eax+0],ah
00000036: 00 00 add byte ptr [eax],al
00000038: 67 00 00 add byte ptr [bx][si],al
0000003B: 00 E8 add al,ch
0000003D: 00 00 add byte ptr [eax],al
0000003F: 00 00 add byte ptr [eax],al
00000041: EB 32 jmp $+52 ; --> 0x0075
00000043: E8 00 00 00 00 call _bar
00000048: E8 00 00 00 00 call _baz
0000004D: EB 26 jmp $+40 ; --> 0x0075
0000004F: 90 nop
00000050: E8 00 00 00 00 call _foo1
00000055: EB 1E jmp $+32 ; --> 0x0075
00000057: E8 00 00 00 00 call _bar1
0000005C: EB 17 jmp $+25 ; --> 0x0075
0000005E: 89 C0 mov eax,eax
00000060: E8 00 00 00 00 call _baz1
00000065: EB 0E jmp $+16 ; --> 0x0075
00000067: E8 00 00 00 00 call _quux1
0000006C: EB 07 jmp $+9 ; --> 0x0075
0000006E: 89 C0 mov eax,eax
00000070: E8 00 00 00 00 call _quux
00000075: C3 ret near
I have to admit that I don't understand x86 assembly language very well. So
I am still wondering what all these add operations do. Any ideas?
>All of this is, of course, way off topic. :)
Well, might be true...
Thorsten
Post a reply to this message
|
![](/i/fill.gif) |