On 03/06/2011 02:48 PM, Invisible wrote:
> Woooo boy, that's one big can of complexity, right there! ;-)
I've said it before, and I'll say it again: The IA32 platform is one
huge stack of backwards compatibility kludges.
The story begins (arguably) with the Intel 8008, released in 1972 or so.
(!!) It consisted of 3,500 transistors, and was manufactured on a 10 μm
PMOS process. It ran at 500 kHz. (That's 0.5 MHz or 0.0005 GHz.) It had
a grand total of 18 pins (despite the 14-bit address space). It featured
seven main registers (A, B, C, D, E, H and L), all 8 bits wide.
Then came the 8080 (around 1974), with a 2 MHz maximum clock speed. This
had 7 registers, named A, B, C, D, E, H, and L, all 8 bits wide. Certain
pairs (BC, DE, HL) could be used together for certain instructions to
perform 16-bit operations.
(The world-famous Z80 processor is an enhanced version of the 8080.)
Now, finally, in 1978 (!) we arrive at the 8086. The A, B, C and D
registers were renamed AL, BL, CL and DL, and new registers called AH,
BH, CH and DH were added. These are all 8 bits wide, and each pair can
be combined to form the 16-bit AX, BX, CX and DX registers.
Also new was the infamous memory segmentation model. Under this bizarre
scheme, four "segment" registers select which 64 KB "segment" of memory
you access. The physical address is formed by shifting the 16-bit
segment value left by 4 bits and adding a 16-bit offset, for 20 bits in
total. Because segments start every 16 bytes but each spans 64 KB, they
overlap massively, so there are multiple ways to refer to the same
physical address.
Basically, this is a huge kludge. Rather than implementing real 32-bit
addressing, they kludged in 20-bit addressing. While not /completely/
without merit (e.g., this whole segment malarkey makes relocatable code
quite a bit easier), it's really a bad solution.
Not content with that, Intel developed the 8087, the FPU to go with the
8086 CPU. Unlike any sane design, this FPU has 8 registers, but you
cannot access them directly. Instead, they function as a "stack". Math
operations "pop" their operands from the top and "push" the result back
on. If you want to access something lower down, you have to use FXCH
instructions to swap the top register's contents with one of the
registers lower down.
In later generations of the chip, the stack positions are just renamed
pointers to physical registers, and the parallel instruction pipelines
allow FXCH to be optimised down to (effectively) zero clock cycles. But
still, WTF?
In 1982 (the first year in the list so far when I was actually
*alive*!) the 80286 (or "286") appeared. This was the first x86 CPU
with memory protection.
In 1985, the 80386 ("386") came along. This was the first 32-bit x86
processor. (Which is why IA32 is sometimes referred to as "i386", and
why Linux generally refuses to work with anything older.) This was the
first processor in the family where the relationship between segment
numbers and physical memory addresses is programmable rather than
hard-wired. In other words, this is where memory paging arrived on x86.
The 386 inherits all of the registers from the 286 (i.e., AL, AH, AX,
BL, BH, BX, etc.) But AX is a 16-bit register. So the 386 adds a new
register, EAX, which is 32 bits. AX is the bottom 16 bits of EAX.
Similarly for B, C and D.
(By contrast, a *real* 32-bit chip like the Motorola 68000 has registers
A0 through A7 and D0 through D7, and when you do an operation, you
specify how many bits to use, e.g., move.l d3,d7. None of this
stupidity with multiple names for the same register.)
When AMD64 eventually came along, these became 64-bit registers RAX,
RBX, etc., of which EAX, EBX, etc. are the lower 32-bits. (AMD64 also
adds completely new 64-bit registers R8 through R15, just for good
measure. You would expect this to result in an utterly massive speed
increase, but apparently people have measured it at less than 2%.)
If I say 80486, you probably think of some ancient old thing. But it was
the first chip in the family to include an L1 cache, and the first with
a tightly pipelined design approaching one instruction per clock cycle.
(True superscalar execution, i.e., *more* than one instruction per
clock, arrived with the Pentium.) In the form of the 486DX, it was also
the first one with an on-chip FPU.
In 1996, the Pentium MMX arrived. MMX stands for "multimedia
extensions". (Remember, in the 1990s, "multimedia" was the wave of the
future that was going to take over the world...) What this actually
*does* is it adds SIMD (single-instruction, multiple-data) instructions.
Basically, there are 8 new registers, MM0 through MM7, each 64 bits
wide. Using the new MMX instructions, you can basically treat a given
MMX register as an array of values (e.g., 4 items of 16 bits each) and
do element-wise operations over them.
Nothing wrong with that.
Oh yeah, and the MMX registers are the FPU registers.
KLUDGE! >_<
Yes, rather than add 8 *new* registers, MMX just adds new names for the
existing FPU registers. (But now you have proper random access to them.)
The reason for this is simple: it means that the OS doesn't have to
support MMX for context switches to work properly. (I.e., a context
switch under an OS that doesn't know about MMX won't clobber the MMX
registers, because the MMX registers *are* the FPU registers, which any
FPU-aware OS will already be preserving.)
This horrifying kludge still haunts us to this day. Yes, it meant that
developers could start using MMX without waiting for Microsoft to
release an updated version of Windows. On the other hand, it means you
can't use MMX (which is integer-only) and normal FPU operations at the
same time, because one will clobber the other. FAIL!
In 1998, AMD released the 3DNow! technology that almost nobody now
remembers. This basically adds SIMD floating-point operations, using
the same "MMX registers that are really the FPU registers" kludge, for
the same reason.
Apparently 3DNow! was never that popular, and is being phased out now.
Instead, Intel came up with SSE, which adds new registers named XMM.
(Get it?) Yes, that's right, *finally* they actually added new registers
rather than kludging old ones. These new XMM registers are 128 bits
wide, and can be operated on as 4 x single-precision floats.
Then SSE2 came along, and added versions of all the MMX instructions
that work on the XMM registers instead. So now you never need to use the
old MMX instructions ever again! And the XMM registers can now be
treated not just as 4 x single-precision floats, but also as (say) 2 x
double-precision floats, 8 x 16-bit integers, and so on.
Of course, since the OS has to know to include the new XMM registers in
context switches, SSE is disabled by default. The OS has to explicitly
enable it before it will work. A bit like the way the processor starts
up in "real mode" (i.e., 8086 emulation mode), and the OS has to
manually switch it into "protected mode" (i.e., the normal operating
mode that all modern software actually freaking uses) during the boot
sequence.
Then we have AMD64, which runs in 32-bit mode by default until the OS
switches it to 64-bit mode. (The PC I am using right now is *still*
running in 32-bit mode, despite possessing a 64-bit processor.) In
64-bit mode, an extra 8 XMM registers appear (XMM8 to XMM15).
Did you follow all that?
Don't even get me started on all the different memory paging and
segmentation schemes...
In short, they kept kludging more and more stuff in. Having a
stack-based FPU register file is a stupid, stupid idea. But now all our
software depends on this arrangement, so we're stuck with it forever.
Aliasing the MMX registers to the FPU registers was stupid, but
fortunately we don't have to live with that one. Memory segmentation was
stupid, but now we're basically stuck with it. The list goes on...