Warp wrote:
> For example accessing the nth element of an array can usually be done
> with a simple CPU opcode. However, if the system restricts this because
> it cannot prove what that n might contain, it means that the compiler
> cannot generate the single opcode for accessing that array, but must
> perform something much more complicated to keep the system happy.
Here's an example of where the 4% comes from. The compiler might 
generate a single opcode, and that single opcode might take hundreds 
of cycles to run, because it hits a page whose virtual address mapping 
isn't cached (a TLB miss).  Or, even worse, it hits a page that isn't in memory at 
all. But sure, I suppose if your program consists primarily of random 
access to an array of stuff that you could do in one cycle, and your 
cache locality sucks, you might take a slight extra hit for bounds 
checking. I guess things like photoshop plug-ins for distorting an image 
might take something of a hit. Something like a SQL server would 
probably run faster than it does under hardware-protected processes. The 
4% figure was from their own compiler/verifier/code generator, IIRC.
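
To make the bounds-checking cost concrete, here's a rough sketch in C. 
It is not what their code generator actually emits; the function names 
are made up for illustration. The point is only that when the loop 
condition already proves the index is in range, the check can be 
dropped, and when the index comes from data, the check costs one 
compare and a branch that is almost never taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Unchecked load: what can be emitted once i < len has been proven. */
static inline int get_unchecked(const int *a, size_t i)
{
    return a[i];
}

/* Checked load: one compare plus a rarely-taken branch before the load. */
static inline int get_checked(const int *a, size_t len, size_t i)
{
    if (i >= len)
        abort();            /* stand-in for whatever the runtime does on failure */
    return a[i];
}

/* The loop condition proves i < len, so no per-element check is needed. */
long sum_all(const int *a, size_t len)
{
    assert(a != NULL || len == 0);
    long total = 0;
    for (size_t i = 0; i < len; i++)
        total += get_unchecked(a, i);
    return total;
}

/* The indices come from data, so nothing is provable and the check stays. */
long sum_indexed(const int *a, size_t len, const size_t *idx, size_t n)
{
    long total = 0;
    for (size_t k = 0; k < n; k++)
        total += get_checked(a, len, idx[k]);
    return total;
}
```

In the second loop the compare is cheap next to a load that misses the 
cache, which is the sense in which the overhead stays small.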
There's another cool thing they do. Each thread starts with only a 4K 
stack (i.e., one page). The installer (which compiles from MSIL to native 
code, and is called "Bartok" for some reason) will build a call map, figure out 
which function calls *might* pass a page boundary, and insert in-line 
code to allocate another page of memory. Then it copies the appropriate 
number of arguments to the new stack frame, after including a return 
address which will deallocate that new page of memory. So instead of 
allocating a meg of memory for stack space for each thread, or 
trapping out when you run off the end and trying to rearrange things, 
you have a bunch of randomly-allocated pages holding your stack, 
linked together with compiler-generated code to allocate and deallocate 
pages as needed. The compiler also makes sure there's enough space at 
the top of any given page to hold the stack of any interrupt routine 
that might run, so you don't even have to deal with switching pages 
around for that.  And when the code *does* call into the kernel, it just 
allocates a new stack page for that and makes the call, and marks that 
stack page as belonging to the kernel, so the GC doesn't start reaping 
things it shouldn't and so the process can get cleaned up if it exits 
during a call-back from the kernel. But if the compiler can look at the 
call graph and figure out that either you *won't* overflow the stack 
page, or you *will* overflow the stack page, there's no need to even 
put in the check - you can just put in the code (or not) to do the right 
thing.
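
The linked-page stack described above can be simulated roughly like 
this in C. This is only a sketch under my own assumptions, not their 
generated code: the struct and function names are hypothetical, the 
256-byte interrupt reserve is a made-up number, and pages come from 
malloc instead of a page allocator. frame_enter plays the part of the 
in-line check at a call site, and frame_leave plays the part of the 
return address that gives the page back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE   4096u  /* one stack page, as in the post */
#define IRQ_RESERVE 256u   /* hypothetical headroom left for interrupt routines */

/* One stack page; each page links back to the page of its caller. */
struct stack_page {
    struct stack_page *prev;
    size_t used;           /* bytes of frames currently on this page */
};

struct thread_stack {
    struct stack_page *top;
    size_t pages;          /* number of pages currently linked */
};

static struct stack_page *page_new(struct stack_page *prev)
{
    struct stack_page *p = malloc(sizeof *p);
    if (p == NULL)
        abort();
    p->prev = prev;
    p->used = 0;
    return p;
}

/* A thread starts with exactly one page. */
void stack_init(struct thread_stack *s)
{
    s->top = page_new(NULL);
    s->pages = 1;
}

/* Call-site check: returns 1 if a new page had to be linked in. */
int frame_enter(struct thread_stack *s, size_t frame_size)
{
    assert(frame_size <= PAGE_SIZE - IRQ_RESERVE);
    if (s->top->used + frame_size + IRQ_RESERVE <= PAGE_SIZE) {
        s->top->used += frame_size;   /* fits on the current page */
        return 0;
    }
    s->top = page_new(s->top);        /* crosses the boundary: link a page */
    s->top->used = frame_size;
    s->pages++;
    return 1;
}

/* Return path: if this call linked a page, unlink and free it. */
void frame_leave(struct thread_stack *s, size_t frame_size, int grew)
{
    if (grew) {
        struct stack_page *dead = s->top;
        s->top = dead->prev;
        free(dead);
        s->pages--;
    } else {
        s->top->used -= frame_size;
    }
}

/* Release every page still linked, e.g. when the thread exits. */
void stack_destroy(struct thread_stack *s)
{
    while (s->top != NULL) {
        struct stack_page *dead = s->top;
        s->top = dead->prev;
        free(dead);
    }
    s->pages = 0;
}
```

When the call graph shows that a call can never cross the boundary, 
the compiler would emit only the "fits" branch with no test at all, 
and when it always crosses, only the "link a page" branch.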
And a lot of this gets inlined in the code, because they know what 
kernel you're "linked" against, and they know you can't execute 
arbitrary code, so you're often not even "trapping" into the kernel to 
allocate memory or send messages between processes or schedule threads 
or whatever.
-- 
Darren New / San Diego, CA, USA (PST)
  Ever notice how people in a zombie movie never already know how to
  kill zombies? Ask 100 random people in America how to kill someone
  who has reanimated from the dead in a secret viral weapons lab,
  and how many do you think already know you need a head-shot?