POV-Ray : Newsgroups : povray.off-topic : Re: Complicated
  Re: Complicated  
From: Invisible
Date: 3 Jun 2011 10:45:54
Message: <4de8f3a2$1@news.povray.org>
On 03/06/2011 02:48 PM, Invisible wrote:

> Woooo boy, that's one big can of complexity, right there! ;-)

I've said it before, and I'll say it again: The IA32 platform is one 
huge stack of backwards compatibility kludges.

The story begins (arguably) with the Intel 8008, released in 1972 or so. 
(!!) It consisted of 3,500 transistors, and was manufactured on a 10 μm 
PMOS process. It ran at 500 kHz. (That's 0.5 MHz or 0.0005 GHz.) It had 
a grand total of 18 pins (despite the 14-bit address space — the bus was 
multiplexed). It featured seven main registers — A, B, C, D, E, H, and 
L — all 8 bits wide.

Then came the 8080 (around 1974), with a 2 MHz maximum clock speed. This 
had 7 registers, named A, B, C, D, E, H, and L, all 8 bits wide. Certain 
pairs (BC, DE, HL) could be used together by certain instructions to 
perform 16-bit operations.

(The world-famous Z80 processor is an enhanced version of the 8080.)

Now, finally, in 1978 (!) we arrive at the 8086. The A, B, C and D 
registers were renamed AL, BL, CL and DL, and new registers called AH, 
BH, CH and DH were added. These are all 8 bits wide, and each low/high 
pair combines into a 16-bit register: AX, BX, CX and DX.

Also new was the infamous memory segmentation model. Under this bizarre 
scheme, there are four segment registers (CS, DS, SS, ES) which select 
which "segment" you access data from. A physical address is computed as 
segment × 16 + offset. But because the offsets are 16-bit (a 64 KB 
range) while segment bases are only 16 bytes apart, the segments 
actually overlap, so there are multiple ways to refer to the same 
physical address.

Basically, this is a huge kludge. Rather than implementing real 32-bit 
addressing, they kludged in 20-bit addressing. While not /completely/ 
without merit (e.g., this whole segment malarkey makes relocatable code 
quite a bit easier), it's really a bad solution.

Not content with that, Intel developed the 8087, the FPU to go with the 
8086 CPU. Unlike any sane design, this FPU has 8 registers, but you 
cannot access them directly. Instead, they function as a "stack". Math 
operations "pop" their operands from the top and "push" the result back 
on. If you want to access something lower down, you have to use FXCH 
instructions to swap the top register's contents with one of the 
registers lower down.

In later generations of chip, the stack top is just a hardware pointer 
into the register file, and register renaming plus parallel instruction 
pipelines let FXCH cost (effectively) zero clock cycles. But still, WTF?

In 1982 (the first year in the list so far when I was actually 
*alive*!) the 80286 (or "286") appeared. This was the first x86 CPU 
with memory protection.

In 1985, the 80386 ("386") came along. This was the first 32-bit x86 
processor. (Which is why IA32 is sometimes referred to as "i386", and 
why Linux generally refuses to work with anything older.) This was the 
first processor in the family where the relationship between segment 
numbers and physical memory addresses is programmable rather than 
hard-wired. In other words, this is where paging arrived on x86.

The 386 inherits all of the registers from the 286 (i.e., AL, AH, AX, 
BL, BH, BX, etc.) But AX is a 16-bit register. So the 386 adds a new 
register, EAX, which is 32 bits. AX is the bottom 16 bits of EAX. 
Similarly for B, C and D.

(By contrast, a *real* 32-bit chip like the Motorola 68000 has address 
registers A0 through A7 and data registers D0 through D7, and when you 
do an operation, you specify how many bits to use, e.g., move.l d3,d7. 
None of this stupidity with multiple names for the same register.)

When AMD64 eventually came along, these became 64-bit registers RAX, 
RBX, etc., of which EAX, EBX, etc. are the lower 32 bits. (AMD64 also 
adds completely new 64-bit registers R8 through R15, just for good 
measure. You would expect the extra registers to produce an utterly 
massive speed increase, but apparently people have measured the gain at 
less than 2%.)

If I say 80486, you probably think of some ancient relic. But it was 
the first chip in the family to include an L1 cache, and the first one 
with a fully pipelined integer unit, approaching one instruction per 
clock cycle. (Actual superscalar execution — more than one instruction 
per clock — had to wait for the Pentium.) In the form of the 486DX, it 
was also the first one with an on-chip FPU.

In 1997, the Pentium MMX arrived. MMX (unofficially) stands for 
"multimedia extensions". (Remember, in the 1990s, "multimedia" was the 
wave of the future that was going to take over the world...) What this 
actually *does* is add SIMD (single-instruction, multiple-data) 
instructions. Basically, there are 8 new registers, MM0 through MM7, 
each 64 bits wide. Using the new MMX instructions, you can basically 
treat a given MMX register as an array of values (e.g., 4 items of 16 
bits each) and do element-wise operations over them.

Nothing wrong with that.

Oh yeah, and the MMX registers are the FPU registers.

KLUDGE! >_<

Yes, rather than add 8 *new* registers, MMX just adds new names for the 
existing FPU registers. (But now you have proper random access to them.) 
The reason for this is simple: it means that the OS doesn't have to 
support MMX for context switches to work properly. (I.e., a context 
switch under an OS that doesn't know about MMX won't clobber the MMX 
registers, because the MMX registers *are* the FPU registers, which any 
FPU-aware OS will already be preserving.)

This horrifying kludge still haunts us to this day. Yes, it meant that 
developers could start using MMX without waiting for Microsoft to 
release an updated version of Windows. On the other hand, it means you 
can't use MMX (which is integer-only) and normal FPU operations at the 
same time, because one will clobber the other. FAIL!

In 1998, AMD released the 3DNow! technology that almost nobody now 
remembers. This basically adds floating-point SIMD operations, using 
the same "MMX registers that are really the FPU registers" kludge, for 
the same reason.

Apparently 3DNow! was never that popular, and is being phased out now. 
Instead, Intel came up with SSE, which adds new registers named XMM. 
(Get it?) Yes, that's right, *finally* they actually added new registers 
rather than kludging old ones. These new XMM registers are 128 bits 
wide, and can be operated on as 4 x single-precision floats.

Then SSE2 came along, and added versions of all the MMX instructions 
that work on the XMM registers instead. So now you never need to use the 
old MMX instructions ever again! And now the XMM registers can be 
treated not just as 4 x single-precision floats, but also as (say) 2 x 
double-precision floats, 8 x 16-bit integers, and so on.

Of course, since the OS has to know to include the new XMM registers in 
context switches, SSE is disabled by default. The OS has to explicitly 
enable it before it will work. A bit like the way the processor starts 
up in "real mode" (i.e., 8086 emulation mode), and the OS has to 
manually switch it into "protected mode" (i.e., the normal operating 
mode that all modern software actually freaking uses) during the boot 
sequence.

Then we have AMD64, which runs in 32-bit mode by default until the OS 
switches it to 64-bit mode. (The PC I am using right now is *still* 
running in 32-bit mode, despite possessing a 64-bit processor.) In that 
mode, an extra 8 XMM registers appear (XMM8 to XMM15).

Did you follow all that?

Don't even get me started on all the different memory paging and 
segmentation schemes...

In short, they kept kludging more and more stuff in. Having a 
stack-based FPU register file is a stupid, stupid idea. But now all our 
software depends on this arrangement, so we're stuck with it forever. 
Aliasing the MMX registers to the FPU registers was stupid, but 
fortunately we don't have to live with that one. Memory segmentation was 
stupid, but now we're basically stuck with it. The list goes on...


