POV-Ray : Newsgroups : povray.off-topic : Interesting performance paper
(Message 10 to 19 of 39)
From: Darren New
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 11:58:31
Message: <4c1651a7$1@news.povray.org>
clipka wrote:
> (2) Capacity of memory basically has nothing to do with access speed. 

Of course it does. You have speed-of-light delay and address line cascading 
to decode. The slowness isn't due primarily to accessing the memory cells. 
It's due to the cascade of gates you have to go through to decode the address 
to figure out which cell. That's why video memory can be so much faster than 
CPU memory - there's an access pattern (the video beam scan) that can be 
used to speed up address decoding.

-- 
Darren New, San Diego CA, USA (PST)
    Eiffel - The language that lets you specify exactly
    that the code does what you think it does, even if
    it doesn't do what you wanted.



From: Darren New
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 11:59:18
Message: <4c1651d6@news.povray.org>
Invisible wrote:
> contains a couple of CPU cores plus (say) 2GB of RAM. 

What makes you think nobody has done this?

-- 
Darren New, San Diego CA, USA (PST)
    Eiffel - The language that lets you specify exactly
    that the code does what you think it does, even if
    it doesn't do what you wanted.



From: Darren New
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 12:27:57
Message: <4c16588d$1@news.povray.org>
Invisible wrote:
> More like in the early days, if it didn't fit in RAM, you just couldn't 
> do it at all. 

Not even close to true. People took it into account. You don't think 
WordStar could handle a document >64K?  Indeed, the whole point of an 
external sort was to handle what didn't fit in RAM.

Nowadays, people throw processing at it and let the OS handle paging.

And of course, you still get people dealing with the hierarchy explicitly, 
like Google, which even deals with the hierarchy of "on a disk on the same 
machine, on a disk on a different machine in the same rack, on a disk on a 
machine in a different rack in the same building, on a disk on a machine in 
a different building in the same state, etc."

-- 
Darren New, San Diego CA, USA (PST)
    Eiffel - The language that lets you specify exactly
    that the code does what you think it does, even if
    it doesn't do what you wanted.



From: clipka
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 12:37:08
Message: <4c165ab4$1@news.povray.org>
Am 14.06.2010 17:12, schrieb Invisible:
> clipka wrote:
>
>> (1) Home computers were toys.
>
> I saw at least one employee using a company-purchased C64 to run
> accounting software to do actual, productive work.
>
> They might be considered toys *now*...
>
>> (2) Capacity of memory basically has nothing to do with access speed.
>
> Really? So having more bits to decode requiring more address decode
> logic doesn't affect speed in any way?

I don't think so. Essentially, all you need to decode an address is a 
huge AND gate with some inputs inverted (and no, multi-input gates are 
/not/ normally designed as a tree of 2-input gates - they're simply an 
extension of the 2-input gate design, adding just a few more inputs and 
transistors). Of course you need more of those gates, and therefore 
stronger drivers for the address lines to deal with the parasitic 
capacitance of the gates' inputs.
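As a purely software illustration of that scheme - one wide AND gate per memory row, with inputs inverted wherever the row's address bit is 0 - a toy one-hot decoder might look like this (a model, not real hardware):

```python
def decode(address, n_bits):
    """One-hot address decoder: each of the 2**n_bits rows has a single
    wide AND gate (some inputs inverted) that fires for exactly one
    address, driving exactly one row-select line high."""
    select = []
    for row in range(2 ** n_bits):
        # This row's AND gate fires iff every address bit matches its pattern.
        select.append(all((address >> b) & 1 == (row >> b) & 1
                          for b in range(n_bits)))
    return select
```

For any address, exactly one entry of the returned list is True - the defining property of a row decoder.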

> And neither does a larger die
> area necessitating longer traces?

That, yes. But that's negligible compared with the distance between CPU 
and memory slot.

AFAIK, the most time-consuming thing about DRAM is the time it takes to 
charge, discharge or read out the capacitors comprising the memory cells.

> I gather that part of the problem is that having traces on the
> motherboard operating into the GHz frequency range isn't very feasible.
> (For reasons such as trace length and electronic interference.) But that
> would still mean that theoretically you could make a single chip that
> contains a couple of CPU cores plus (say) 2GB of RAM. The fact that
> nobody has ever done this indicates that it isn't possible for whatever
> reason.

I guess it's impractical, because...

(1) RAM requirements are pretty different, so you'd need a lot more CPU 
models (which would skyrocket the development, maintenance and 
production set-up costs per CPU) - and you couldn't just buy a few extra GB.

(2) You wouldn't win much speed, because DRAM technology is inherently 
slow, and SRAM technology (as used in caches) takes up a prohibitive 
amount of die area.

(3) I might be wrong, but I guess producing DRAM in a 40nm process 
would be inefficient: you want certain minimum dimensions for the 
capacitors and insulation anyway, in order to reduce self-discharge 
effects.

(4) Integrating 2GB into the CPU would require a lot of additional die 
space even with DRAM, increasing the per-die production cost - not only 
because you get fewer CPUs out of a wafer, but also because increasing 
the die area gives you an increased per-die probability of a defect, 
reducing production yield.



From: Darren New
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 12:58:50
Message: <4c165fca$1@news.povray.org>
clipka wrote:
> I don't think so. Essentially, all you need to decode an address is a 
> huge AND gate with some inputs inverted 

In theory true. In practice, there's a certain amount of fan-in you can 
handle in any one gate.  The limit is about 8 inputs, afaik. My knowledge on 
this particular bit is probably several fab generations out of date.

-- 
Darren New, San Diego CA, USA (PST)
    Eiffel - The language that lets you specify exactly
    that the code does what you think it does, even if
    it doesn't do what you wanted.



From: clipka
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 13:38:51
Message: <4c16692b$1@news.povray.org>
Am 14.06.2010 18:58, schrieb Darren New:
> clipka wrote:
>> I don't think so. Essentially, all you need to decode an address is a
>> huge AND gate with some inputs inverted
>
> In theory true. In practice, there's a certain amount of fan-in you can
> handle in any one gate. The limit is about 8 inputs, afaik. My knowledge
> on this particular bit is probably several fab generations out of date.

Even then, you'd have some log8(N) dependency of gate levels needed, 
where N isn't even the amount of memory, but the number of address lines.

1 level:   8 address lines - up to 256 words (*)
2 levels: 64 address lines - up to 16 Exawords

(* where the word size solely depends on your data bus width)

If main memory sizes keep increasing at the current rate, that should 
give us another 20 to 30 years until a computer for private or office 
use will need 3 levels of address-decoding gates in its RAM modules.
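That back-of-the-envelope math writes down directly (assuming the roughly 8-input fan-in limit Darren mentions):

```python
import math

def decoder_levels(address_lines, fan_in=8):
    """Minimum depth of an address-decode gate tree when each gate
    accepts at most `fan_in` address inputs: ceil(log_fan_in(lines))."""
    return max(1, math.ceil(math.log(address_lines, fan_in)))
```

With `fan_in=8`: 8 address lines (256 words) need 1 level, 64 address lines (16 exawords) need 2, and only beyond 64 lines does a third level become necessary.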



From: clipka
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 13:50:39
Message: <4c166bef$1@news.povray.org>
Am 14.06.2010 17:58, schrieb Darren New:
> clipka wrote:
>> (2) Capacity of memory basically has nothing to do with access speed.
>
> Of course it does. You have speed-of-light delay and address line
> cascading to decode. The slowness isn't due primarily to accessing the
> memory cells. It's due to the cascade of gates you have to go thru to
> decode the address to figure out which cell. That's why video memory can
> be so much faster than CPU memory - there's an access pattern (the video
> beam scan) that can be used to speed up address decoding.

If gates are so slow, how come SRAM is so much faster? After all, even 
the memory cells themselves consist of gates there.

I guess the advantage of the access pattern in video memory is a 
different one: It allows easy /prediction/ of the next address to fetch, 
so readout of DRAM cells can be initiated well before the data is 
actually needed, making it available just in time - farewell, latency.
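A toy latency model of that idea - the cycle counts are invented for illustration, but it shows why a predictable scan amortizes the row latency while random access pays it on every read:

```python
def total_cycles(n_reads, row_latency, predictable):
    """Toy model of DRAM access cost. With a predictable pattern the
    controller overlaps the next row activation with the current
    transfer, so the row latency is effectively paid only once;
    random access pays it on every single read."""
    transfer = 1  # cycles to move one word once the row is open
    if predictable:
        return row_latency + n_reads * transfer
    return n_reads * (row_latency + transfer)
```

At 1000 reads with a 10-cycle row latency, the predictable scan costs 1010 cycles versus 11000 for random access - an order of magnitude, from prediction alone.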



From: clipka
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 14:20:37
Message: <4c1672f5$1@news.povray.org>
Am 14.06.2010 17:45, schrieb Invisible:

>> Sure, but just as an example, what's the fastest way to sort a 10 KB
>> array when you have 16 KB of RAM? The mathematically efficient way
>> won't necessarily be the quickest if you need to read/write to
>> tape/disc rather than just using RAM.
>
> Then you use a specialised external-sort. More to the point, you *know*
> that you're going to need an external-sort. Your program doesn't just
> magically run 500x slower than you'd expected.

Which is just the point to learn from this: You should (ideally) know 
that the very same is true for today's computers, so that you know where 
such "magical" slowdowns come from, and what can be done about them.

> More like in the early days, if it didn't fit in RAM, you just couldn't
> do it at all.

Nonsense. Datacenters have been using slow storage, such as magnetic 
tapes, for ages. "If it all fits in RAM, then congratulations!" would 
have been closer to the mark back then, I guess.

> Today, if it doesn't fit in RAM, the OS makes the computer
> pretend it has more RAM than it does (with the *minor issue* of a vast
> slowdown and halving the life of your hard drive).

I'd rather put it this way: Today, if it doesn't fit in RAM, the OS 
helps you "juggle" the memory chunks to work on. (And if it doesn't 
fit in L1 or L2 cache, the CPU provides essentially the same service.) 
Which makes life much easier in a world where you don't know how much 
RAM you'll have when programming the software. (Back then, you /did/ 
know the specs of the one machine you wrote your programs for.) Plus, it 
allows much easier and faster (and therefore more economical) design of 
software where runtime performance doesn't matter that much.

(As for hard drive life, I guess that depends on your hard drive.)



From: Darren New
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 14:26:08
Message: <4c167440$1@news.povray.org>
clipka wrote:
> Even then, you'd have some log8(N) dependency of gate levels needed, 
> where N isn't even the amount of memory, but the number of address lines.

I'm not really following...

> 1 level:   8 address lines - up to 256 words (*)
> 2 levels: 64 address lines - up to 16 Exawords

Except ... you need that fan-in for each word you address. Now you have a 
problem of fan-out, as well, if I'm visualizing what you're saying correctly.

In theory, any logic function (including an entire CPU) can be built from 
two levels of gates. In practice, the fan-in and fan-out kills you, because 
you can't have 256 fan-outs going into each of 256 additional gates.

-- 
Darren New, San Diego CA, USA (PST)
    Eiffel - The language that lets you specify exactly
    that the code does what you think it does, even if
    it doesn't do what you wanted.



From: Darren New
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 14:27:25
Message: <4c16748d$1@news.povray.org>
clipka wrote:
> If gates are so slow, how come SRAM is so much faster? After all, even 
> the memory cells themselves consist of gates there.

Well, DRAM adds its own slowness, yes. But even with SRAM, having 16 gig of 
SRAM is going to be slower than having 1 Meg of SRAM.

-- 
Darren New, San Diego CA, USA (PST)
    Eiffel - The language that lets you specify exactly
    that the code does what you think it does, even if
    it doesn't do what you wanted.




Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.