|
|
|
|
|
|
| |
| |
|
|
From: Thorsten Froehlich
Subject: Off Topic: Re: Making faster #switch statements
Date: 5 Mar 1999 12:54:11
Message: <36e01a43.0@news.povray.org>
|
|
|
| |
| |
|
|
In article <36e00044.0@news.povray.org> , par### [at] my-dejanewscom (Ron
Parker) wrote:
>>That depends a lot on the density of the case conditions you have. Assume
>>you write a switch statement for the tokenizer in POV-Ray: You have most of
>>the ASCII characters and a high density of the numbers you compare. So the
>>compiler generates a jump table. It will most likely be somewhere around 128
>>entries. You need two comparisons plus branches (range checking) and one
>>jump.
>
> This depends on your compiler. I've seen modern compilers do one of two
> things when given a switch statement like
I assumed that this point would be obvious because it is a well known (I
thought) fact that different compilers generate different code. :-)
But maybe I should still have added it...
>Both look no less efficient than what you'd expect from an if-else
>chain. The jump table might take up space, but it's pretty quick
>to execute. Note the fancy footwork in the first case: it does the
>range checking for you in only three instructions, and only one
>subtraction. You can always do this when the table is a power-of-two
>size. If it makes sense to use a jump table in the first place, you
>can always pad it to the next power of two.
With CodeWarrior 4.1 I get these results (it does not offer assembler
output, so it is a bit harder to read the disassembled code - all compiled
on a Mac, of course :-)
The code on a 68K:
00000000: 4E56 0000 link a6,#0
00000004: 202E 0008 move.l 8(a6),d0
00000008: 5380 subq.l #1,d0
0000000A: 670A beq.s *+12 ; 0x00000016
0000000C: 5380 subq.l #1,d0
0000000E: 670E beq.s *+16 ; 0x0000001e
00000010: 5580 subq.l #2,d0
00000012: 6710 beq.s *+18 ; 0x00000024
00000014: 6016 bra.s *+24 ; 0x0000002c
00000016: 4EB9 0000 0000 jsr foo
0000001C: 6014 bra.s *+22 ; 0x00000032
0000001E: 4EB9 0000 0000 jsr bar
00000024: 4EB9 0000 0000 jsr baz
0000002A: 6006 bra.s *+8 ; 0x00000032
0000002C: 4EB9 0000 0000 jsr quux
00000032: 4E5E unlk a6
00000034: 4E75 rts
00000036: 8474 6573 7400 dc.b 0x84,'test',0x00
0000003C: 0000
And in PowerPC code:
00000000: 7C0802A6 mflr r0
00000004: 90010008 stw r0,8(SP)
00000008: 9421FFC0 stwu SP,-64(SP)
0000000C: 2C030003 cmpwi r3,3
00000010: 41820044 beq *+68 ; $00000054
00000014: 40800014 bge *+20 ; $00000028
00000018: 2C030001 cmpwi r3,1
0000001C: 41820018 beq *+24 ; $00000034
00000020: 40800020 bge *+32 ; $00000040
00000024: 48000030 b *+48 ; $00000054
00000028: 2C030005 cmpwi r3,5
0000002C: 40800028 bge *+40 ; $00000054
00000030: 48000018 b *+24 ; $00000048
00000034: 48000001 bl .foo
00000038: 60000000 nop
0000003C: 48000020 b *+32 ; $0000005C
00000040: 48000001 bl .bar
00000044: 60000000 nop
00000048: 48000001 bl .baz
0000004C: 60000000 nop
00000050: 4800000C b *+12 ; $0000005C
00000054: 48000001 bl .quux
00000058: 60000000 nop
0000005C: 80010048 lwz r0,72(SP)
00000060: 38210040 addi SP,SP,64
00000064: 7C0803A6 mtlr r0
00000068: 4E800020 blr
And in x86 code:
00000000: 8B 4C 24 04 mov ecx,dword ptr [esp+4]
00000004: 89 C8 mov eax,ecx
00000006: 2D 01 00 00 00 sub eax,1
0000000B: 3D 03 00 00 00 cmp eax,3
00000010: 77 2E ja $+48 ; --> 0x0040
00000012: FF 24 85 1C 00 00 jmp dword ptr [eax*4+.text++28] near
00000018: 00
00000019: 8D 40 00 lea eax,dword ptr [eax+0]
0000001C: 2C 00 sub al,0
0000001E: 00 00 add byte ptr [eax],al
00000020: 33 00 xor eax,dword ptr [eax]
00000022: 00 00 add byte ptr [eax],al
00000024: 40 inc eax
00000025: 00 00 add byte ptr [eax],al
00000027: 00 38 add byte ptr [eax],bh
00000029: 00 00 add byte ptr [eax],al
0000002B: 00 E8 add al,ch
0000002D: 00 00 add byte ptr [eax],al
0000002F: 00 00 add byte ptr [eax],al
00000031: EB 12 jmp $+20 ; --> 0x0045
00000033: E8 00 00 00 00 call _bar
00000038: E8 00 00 00 00 call _baz
0000003D: EB 06 jmp $+8 ; --> 0x0045
0000003F: 90 nop
00000040: E8 00 00 00 00 call _quux
00000045: C3 ret near
As you can see, it does not optimize this case, too, but a larger one and
then the optimizer starts doing its work:
switch(i)
{
case 1: foo(); break;
case 2: bar(); /* fall through */
case 4: baz(); break;
case 5: foo1(); break;
case 6: bar1(); break;
case 7: baz1(); break;
case 8: quux1(); break;
default: quux(); break;
}
In 68K code:
00000000: 4E56 0000 link a6,#0
00000004: 202E 0008 move.l 8(a6),d0
00000008: 0C80 0000 0008 cmpi.l #8,d0
0000000E: 6252 bhi.s *+84 ; 0x00000062
00000010: D040 add.w d0,d0
00000012: 303B 0006 move.w (6,pc,d0.w),d0
00000016: 4EFB 0002 jmp (2,pc,d0.w)
0000001A: 0048 0012 ori.w #0x12,a0
0000001E: 001A 0048 ori.b #0x48,(a2)+
00000022: 0020 0028 ori.b #0x28,-(a0)
00000026: 0030 0038 0040 ori.b #0x38,(64,a0,d0.w)
0000002C: 4EB9 0000 0000 jsr foo
00000032: 6034 bra.s *+54 ; 0x00000068
00000034: 4EB9 0000 0000 jsr bar
0000003A: 4EB9 0000 0000 jsr baz
00000040: 6026 bra.s *+40 ; 0x00000068
00000042: 4EB9 0000 0000 jsr foo1
00000048: 601E bra.s *+32 ; 0x00000068
0000004A: 4EB9 0000 0000 jsr bar1
00000050: 6016 bra.s *+24 ; 0x00000068
00000052: 4EB9 0000 0000 jsr baz1
00000058: 600E bra.s *+16 ; 0x00000068
0000005A: 4EB9 0000 0000 jsr quux1
00000060: 6006 bra.s *+8 ; 0x00000068
00000062: 4EB9 0000 0000 jsr quux
00000068: 4E5E unlk a6
0000006A: 4E75 rts
0000006C: 8474 6573 7400 dc.b 0x84,'test',0x00
00000072: 0000
In PowerPC code the optimizer takes over and you get:
00000000: 7C0802A6 mflr r0
00000004: 90010008 stw r0,8(SP)
00000008: 9421FFC0 stwu SP,-64(SP)
0000000C: 28030008 cmplwi r3,$0008
00000010: 41810068 bgt *+104 ; $00000078
00000014: 80820000 lwz r4,@12(RTOC)
00000018: 5460103A rlwinm r0,r3,2,0,29
0000001C: 7C84002E lwzx r4,r4,r0
00000020: 7C8903A6 mtctr r4
00000024: 4E800420 bctr
00000028: 48000001 bl .foo
0000002C: 60000000 nop
00000030: 48000050 b *+80 ; $00000080
00000034: 48000001 bl .bar
00000038: 60000000 nop
0000003C: 48000001 bl .baz
00000040: 60000000 nop
00000044: 4800003C b *+60 ; $00000080
00000048: 48000001 bl .foo1
0000004C: 60000000 nop
00000050: 48000030 b *+48 ; $00000080
00000054: 48000001 bl .bar1
00000058: 60000000 nop
0000005C: 48000024 b *+36 ; $00000080
00000060: 48000001 bl .baz1
00000064: 60000000 nop
00000068: 48000018 b *+24 ; $00000080
0000006C: 48000001 bl .quux1
00000070: 60000000 nop
00000074: 4800000C b *+12 ; $00000080
00000078: 48000001 bl .quux
0000007C: 60000000 nop
00000080: 80010048 lwz r0,72(SP)
00000084: 38210040 addi SP,SP,64
00000088: 7C0803A6 mtlr r0
0000008C: 4E800020 blr
And in x86 code:
00000000: 8B 4C 24 04 mov ecx,dword ptr [esp+4]
00000004: 89 C8 mov eax,ecx
00000006: 2D 01 00 00 00 sub eax,1
0000000B: 3D 07 00 00 00 cmp eax,7
00000010: 77 5E ja $+96 ; --> 0x0070
00000012: FF 24 85 1C 00 00 jmp dword ptr [eax*4+.text++28] near
00000018: 00
00000019: 8D 40 00 lea eax,dword ptr [eax+0]
0000001C: 3C 00 cmp al,0
0000001E: 00 00 add byte ptr [eax],al
00000020: 43 inc ebx
00000021: 00 00 add byte ptr [eax],al
00000023: 00 70 00 add byte ptr [eax+0],dh
00000026: 00 00 add byte ptr [eax],al
00000028: 48 dec eax
00000029: 00 00 add byte ptr [eax],al
0000002B: 00 50 00 add byte ptr [eax+0],dl
0000002E: 00 00 add byte ptr [eax],al
00000030: 57 push edi
00000031: 00 00 add byte ptr [eax],al
00000033: 00 60 00 add byte ptr [eax+0],ah
00000036: 00 00 add byte ptr [eax],al
00000038: 67 00 00 add byte ptr [bx][si],al
0000003B: 00 E8 add al,ch
0000003D: 00 00 add byte ptr [eax],al
0000003F: 00 00 add byte ptr [eax],al
00000041: EB 32 jmp $+52 ; --> 0x0075
00000043: E8 00 00 00 00 call _bar
00000048: E8 00 00 00 00 call _baz
0000004D: EB 26 jmp $+40 ; --> 0x0075
0000004F: 90 nop
00000050: E8 00 00 00 00 call _foo1
00000055: EB 1E jmp $+32 ; --> 0x0075
00000057: E8 00 00 00 00 call _bar1
0000005C: EB 17 jmp $+25 ; --> 0x0075
0000005E: 89 C0 mov eax,eax
00000060: E8 00 00 00 00 call _baz1
00000065: EB 0E jmp $+16 ; --> 0x0075
00000067: E8 00 00 00 00 call _quux1
0000006C: EB 07 jmp $+9 ; --> 0x0075
0000006E: 89 C0 mov eax,eax
00000070: E8 00 00 00 00 call _quux
00000075: C3 ret near
I have to admit that I don't understand x86 assembly language very well. So
I am still wondering what all these add operations do. Any ideas?
>All of this is, of course, way off topic. :)
Well, might be true...
Thorsten
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
Thorsten Froehlich schrieb in Nachricht <36dff57f.0@news.povray.org>...
>In article <MPG.1149eb79f78933399896a1@news.povray.org> ,
eag### [at] telekabelnl
>(Phoenix) wrote:
>
>> In general, the time it takes to read in a jump table is far greater than
>> just using the jump system that gets generated by an if-clause.
>
>That depends a lot on the density of the case conditions you have.
Exactly. IMHO most compilers will do a faster switch() than numerous
if-cascades. See also Ron Parker's output of two compilers. They're both
faster than a simple cascade of if's would be. Now this could be different
for the interpreted POV-script (I still didn't have a look at the code yet)
and interpreted BASIC (I think VB and other "compiled" Basics can handle
switch statements just as well as other languages).
--
Rudy Velthuis
Post a reply to this message
|
|
| |
| |
|
|
From: Rudy Velthuis
Subject: Re: Off Topic: Re: Making faster #switch statements
Date: 5 Mar 1999 13:37:54
Message: <36e02482.0@news.povray.org>
|
|
|
| |
| |
|
|
Thorsten Froehlich schrieb in Nachricht <36e01a43.0@news.povray.org>...
>And in x86 code:
>
>00000000: 8B 4C 24 04 mov ecx,dword ptr [esp+4]
>00000004: 89 C8 mov eax,ecx
>00000006: 2D 01 00 00 00 sub eax,1
>0000000B: 3D 07 00 00 00 cmp eax,7
>00000010: 77 5E ja $+96 ; --> 0x0070
>00000012: FF 24 85 1C 00 00 jmp dword ptr [eax*4+.text++28] near
>00000018: 00
>00000019: 8D 40 00 lea eax,dword ptr [eax+0]
>0000001C: 3C 00 cmp al,0
>0000001E: 00 00 add byte ptr [eax],al
>00000020: 43 inc ebx
>00000021: 00 00 add byte ptr [eax],al
>00000023: 00 70 00 add byte ptr [eax+0],dh
>00000026: 00 00 add byte ptr [eax],al
>00000028: 48 dec eax
>00000029: 00 00 add byte ptr [eax],al
>0000002B: 00 50 00 add byte ptr [eax+0],dl
>0000002E: 00 00 add byte ptr [eax],al
>00000030: 57 push edi
>00000031: 00 00 add byte ptr [eax],al
>00000033: 00 60 00 add byte ptr [eax+0],ah
>00000036: 00 00 add byte ptr [eax],al
>00000038: 67 00 00 add byte ptr [bx][si],al
>0000003B: 00 E8 add al,ch
>0000003D: 00 00 add byte ptr [eax],al
>0000003F: 00 00 add byte ptr [eax],al
>00000041: EB 32 jmp $+52 ; --> 0x0075
>00000043: E8 00 00 00 00 call _bar
>00000048: E8 00 00 00 00 call _baz
>0000004D: EB 26 jmp $+40 ; --> 0x0075
>0000004F: 90 nop
>00000050: E8 00 00 00 00 call _foo1
>00000055: EB 1E jmp $+32 ; --> 0x0075
>00000057: E8 00 00 00 00 call _bar1
>0000005C: EB 17 jmp $+25 ; --> 0x0075
>0000005E: 89 C0 mov eax,eax
>00000060: E8 00 00 00 00 call _baz1
>00000065: EB 0E jmp $+16 ; --> 0x0075
>00000067: E8 00 00 00 00 call _quux1
>0000006C: EB 07 jmp $+9 ; --> 0x0075
>0000006E: 89 C0 mov eax,eax
>00000070: E8 00 00 00 00 call _quux
>00000075: C3 ret near
>
>
>I have to admit that I don't understand x86 assembly language very well. So
>I am still wondering what all these add operations do. Any ideas?
I was doing some arithmetic first, because I didn't get it either, but it's
quite simple. From 0000018 on, you don't see code, you see a kind of jump
table (dword entries). Of course the disassembler doesn't know this, so it
is shown as code.
--
Rudy Velthuis
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
On Fri, 5 Mar 1999 19:28:28 +0100, Rudy Velthuis <rve### [at] gmxnet> wrote:
>
>Thorsten Froehlich schrieb in Nachricht <36dff57f.0@news.povray.org>...
>>In article <MPG.1149eb79f78933399896a1@news.povray.org> ,
>eag### [at] telekabelnl
>>(Phoenix) wrote:
>>
>>> In general, the time it takes to read in a jump table is far greater than
>>> just using the jump system that gets generated by an if-clause.
>>
>>That depends a lot on the density of the case conditions you have.
>
>Exactly. IMHO most compilers will do a faster switch() than numerous
>if-cascades. See also Ron Parker's output of two compilers. They're both
>faster than a simple cascade of if's would be.
Don't take my stuff as gospel, though. I reverse-engineer stuff for a
living, so what you see is from memory, probably not 100% accurate,
could have even been hand-coded by some masochist, and is almost
certainly filtered through my "understand-o-meter"
I don't know what about switch in POV would make it slow, other than the
requirement that it parse even the parts it doesn't use, to find the parts
it needs. In this case, an array would be the best bet. In other cases,
it's mostly empirical. The idea should be to minimize the amount of
text in each case, but that won't always be the best solution either.
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
Ron Parker schrieb in Nachricht <36e03393.0@news.povray.org>...
>>Exactly. IMHO most compilers will do a faster switch() than numerous
>>if-cascades. See also Ron Parker's output of two compilers. They're both
>>faster than a simple cascade of if's would be.
>
>Don't take my stuff as gospel, though. I reverse-engineer stuff for a
>living, so what you see is from memory, probably not 100% accurate,
>could have even been hand-coded by some masochist, and is almost
>certainly filtered through my "understand-o-meter"
I don't do it for a living, but I've seen enough code (Pascal and C
disassembly) like the samples you showed to believe this is the way most
switch statements (should) work. Even the example Thorsten showed seems to
do this (although it was hard to recognise at a first glance).
>I don't know what about switch in POV would make it slow, other than the
>requirement that it parse even the parts it doesn't use, to find the parts
>it needs. In this case, an array would be the best bet. In other cases,
>it's mostly empirical. The idea should be to minimize the amount of
>text in each case, but that won't always be the best solution either.
I don't know how the parser handles a #switch, but I'd expect it to look for
the end of the #switch block when it encounters #break and to look for the
next #case or #range, when a #case/#range check fails. Don't know how this
could be improved easily.
--
Rudy Velthuis
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
|
|