Warp wrote:
> For example accessing the nth element of an array can usually be done
> with a simple CPU opcode. However, if the system restricts this because
> it cannot prove what that n might contain, it means that the compiler
> cannot generate the single opcode for accessing that array, but must
> perform something much more complicated to keep the system happy.
Here's an example of where the 4% comes from. The compiler might
generate a single op-code, and that single op-code might take hundreds
of cycles to run, because it hits a page whose virtual address map isn't
in the cache. Or, even worse, it hits a page that isn't in memory at
all. But sure, I suppose if your program consists primarily of random
access to an array of stuff that you could do in one cycle, and your
cache coherency sucks, you might take a slight extra hit for bounds
checking. I guess things like photoshop plug-ins for distorting an image
might take something of a hit. Something like an SQL server would
probably run faster than it would with hardware-protected processes. The
4% figure was from their compiler/verifier/code generator, IIRC.
There's another cool thing they do. Each thread starts with only a 4K
stack (i.e., one page). The installer (that compiles from MSIL to native
code, called "bartok" for some reason) will build a call map, figure out
which function calls *might* pass a page boundary, and insert in-line
code to allocate another page of memory. Then it copies the appropriate
number of arguments to the new stack frame, after including a return
address which will deallocate that new page of memory. So instead of
allocating a meg of memory for stack space for each thread, or trapping
out when you run off the end and trying to rearrange things, you have a
bunch of randomly-allocated pages holding your stack, linked together by
compiler-generated code that allocates and deallocates pages as needed.
The compiler also makes sure there's enough space at
the top of any given page to hold the stack of any interrupt routine
that might run, so you don't even have to deal with switching pages
around for that. And when the code *does* call into the kernel, it just
allocates a new stack page for that and makes the call, and marks that
stack page as belonging to the kernel, so the GC doesn't start reaping
things it shouldn't and so the process can get cleaned up if it exits
during a call-back from the kernel. But if the compiler can look at the
call graph and figure out that either you *won't* overflow the stack
frame, or you *will* overflow the stack frame, there's no need to even
put in the check - you can just put in the code (or not) to do the right
thing.
And a lot of this gets inlined in the code, because they know what
kernel you're "linked" against, and they know you can't execute the
arbitrary code, so you're often not even "trapping" into the kernel to
allocate memory or send messages between processes or schedule threads
or whatever.
--
Darren New / San Diego, CA, USA (PST)
Ever notice how people in a zombie movie never already know how to
kill zombies? Ask 100 random people in America how to kill someone
who has reanimated from the dead in a secret viral weapons lab,
and how many do you think already know you need a head-shot?