Warp wrote:
> Fast enough, in most cases. I think memory and I/O speed will be the
> bottleneck at some point, after which the extra cores will be worth
> about as much as a paperweight.
>
> (It's surprising how much effect memory bus speed has, e.g. on video
> capturing.)
And this is the other Fun Thing. Given enough CPU cores, you will
eventually reach a point where the memory subsystem can't actually keep
up. The result is that the more cores you have, the more time they can
waste sitting idle.
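To put rough, illustrative numbers on it: four cores each demanding,
say, 8 bytes per cycle at 3 GHz works out to 96 GB/s, while a
dual-channel DDR2-800 bus tops out around 12.8 GB/s. The cores can ask
for nearly an order of magnitude more data than the memory can deliver.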
(As somebody once said, "a supercomputer is a device for turning
compute-bound problems into I/O-bound problems". It seems apt here.)
Long ago, when RAM was actually faster than the CPU, having several
CPUs seemed like a good idea. These days the CPU goes way, way faster
than RAM anyway. Having "more CPU" just makes the problem worse...
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Orchid XP v8 <voi### [at] dev null> wrote:
> And this is the other Fun Thing. Given enough CPU cores, you will
> eventually reach a point where the memory subsystem can't actually keep
> up. The result is that the more cores you have, the more time they can
> waste sitting idle.
In theory you could still get an advantage if each core has its own
L1 cache and they perform heavy calculations on bunches of data which
fit those caches (and likewise the routine itself must obviously also
fit in the L1 cache).
In other words, if the algorithm can be constructed to be of the type
"crunch 4 kB of data for several seconds, write the results to RAM and
read new 4 kB of data, repeat", then additional cores will give an
advantage.
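A minimal sketch of that pattern in C (the block size, pass count and
thread count here are illustrative, not tuned for any particular chip):

    /* Each worker grabs one 4 kB block at a time, crunches it
       repeatedly (so after the first pass it's working entirely out
       of its own L1 cache), and only then lets the results drift
       back to RAM. Compile with -pthread (C11 for stdatomic). */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stddef.h>

    #define BLOCK   4096        /* bytes per chunk: sized to fit in L1 */
    #define NBLOCKS 1024
    #define NCORES  4

    static unsigned char data[NBLOCKS][BLOCK];
    static atomic_size_t next_block;

    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            size_t b = atomic_fetch_add(&next_block, 1);
            if (b >= NBLOCKS)
                break;
            for (int pass = 0; pass < 10000; pass++)  /* "several seconds" */
                for (size_t i = 0; i < BLOCK; i++)
                    data[b][i] = (unsigned char)(data[b][i] * 31u + 7u);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NCORES];
        for (int i = 0; i < NCORES; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < NCORES; i++)
            pthread_join(t[i], NULL);
        return 0;
    }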
Of course many algorithms deal with a lot more data at a time than will
nicely fit in L1 cache, especially if the cache is shared among the cores,
so efficiency problems will start happening.
(Ironically, with shared L1 cache systems you might end up in situations
where a single-threaded version of the algorithm actually runs faster than
a multithreaded version, or where the multithreaded one doesn't run any
faster than the single-threaded one.)
--
- Warp
Warp wrote:
> The main reason is that the GPU sets the limit, not the CPU. If the CPU
> gets fast enough, it will just sit idle while the GPU renders a frame.
> Adding more cores is not going to help that.
That's only part of the story.
The other part is "AI", physics simulation and stuff like that. If
you've maxed out the GPU and have no other way to add visual incentives
to buy, you can start focusing on making the gameplay more complex.
If the gaming industry can't find any other way to keep additional CPU
cores busy, they'll go on smartening the AIs, adding more physics
effects, getting rid of level transitions, and what have you.
And ultimately they might even start inventing innovative game
concepts ;-)
> The main reason is that the GPU sets the limit, not the CPU. If the CPU
> gets fast enough, it will just sit idle while the GPU renders a frame.
> Adding more cores is not going to help that.
More realistic physics and AI!
>> But, for reasons unknown, desktop motherboards never support multiple
>> CPUs...
>
> I dunno. I go down to Fry's Electronics and they have a bunch. They're
> all hundreds of dollars more than the single-CPU boards...
Yeah, I imagine it's pretty expensive to wire up several hundred extra
tracks on the board. SLI boards are all way more expensive too. But at
least shops *sell* those...
Darren New wrote:
> My boss bought a quad-core Mac with SSDs. A few weeks later I asked how
> it worked out. He said "I never wait for anything." :-)
And that's the difference. With Windoze, just closing the CD drive is
enough to lock the entire Explorer shell for ten minutes while it
attempts to determine whether there's a disk in there. (Um,
multitasking? Anyone?)
> With several SATA drives, it's nice to be able to copy at 80MBps between
> two different pairs of drives at once. Way nicer than lame-ass IDE.
My PC at home is all SATA too. It still takes forever for TF2 to start
up. :-P
> And my net here is nicely peppy. At 12Mbps, I'm almost always maxing out
> someone else's connection before my own. I can suck down an entire CD
> worth of data in about 15 minutes, faster than driving into work to pick
> it up.
Where in the name of God can you get 12 Mbit/sec? I thought 8 was the
maximum that ADSL supports...
>> And this is the other Fun Thing. Given enough CPU cores, you will
>> eventually reach a point where the memory subsystem can't actually keep
>> up. The result is that the more cores you have, the more time they can
>> waste sitting idle.
>
> In theory you could still get an advantage if each core has its own
> L1 cache and they perform heavy calculations on bunches of data which
> fit those caches (and likewise the routine itself must obviously also
> fit in the L1 cache).
>
> In other words, if the algorithm can be constructed to be of the type
> "crunch 4 kB of data for several seconds, write the results to RAM and
> read new 4 kB of data, repeat", then additional cores will give an
> advantage.
>
> Of course many algorithms deal with a lot more data at a time than will
> nicely fit in L1 cache, especially if the cache is shared among the cores,
> so efficiency problems will start happening.
Typically the L1 cache is per-core, and the L2 cache is shared. But the
point still stands: if two cores try to write to the same region of
memory, they tend to constantly trip over each other with cache
coherency issues. Oh, and if your algorithm needs random access to a
large block of RAM, forget it.
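To make that failure mode concrete, here's an illustrative C sketch of
false sharing (the layouts and iteration count are made up; time each
pair to see the effect):

    /* Two threads bump adjacent counters. If the counters share one
       64-byte cache line, every write invalidates the other core's
       copy and the line ping-pongs between cores; padding them onto
       separate lines removes the contention. Compile with -pthread. */
    #include <pthread.h>
    #include <stdint.h>

    #define ITERS 100000000L

    static struct { volatile uint64_t a, b; } same_line;
    static struct { volatile uint64_t a; char pad[56];
                    volatile uint64_t b; } separate_lines;

    static void *bump(void *counter)
    {
        volatile uint64_t *c = counter;
        for (long i = 0; i < ITERS; i++)
            (*c)++;
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        /* Contended pair: a and b almost certainly share a line. */
        pthread_create(&t1, NULL, bump, (void *)&same_line.a);
        pthread_create(&t2, NULL, bump, (void *)&same_line.b);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        /* Padded pair: same work, typically several times faster. */
        pthread_create(&t1, NULL, bump, (void *)&separate_lines.a);
        pthread_create(&t2, NULL, bump, (void *)&separate_lines.b);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }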
> (Ironically, with shared L1 cache systems you might end up in situations
> where a single-threaded version of the algorithm actually runs faster than
> a multithreaded version, or where the multithreaded one doesn't run any
> faster than the single-threaded one.)
This is quite common. I've been reading documentation about Haskell's GC
engine. They found that when running in parallel, it's sometimes faster
to turn *off* load-balancing, because that way each GC thread processes
data which already happens to be in its own cache. If you migrate the
work to another core, the data has to be pumped out of one cache, into
memory, and into the other cache before it can be processed.
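The same effect shows up in any parallel setup. A generic sketch of the
"keep the work where its data is warm" idea (this has nothing to do
with GHC's actual implementation; pthread_setaffinity_np is a Linux
extension, and the sizes are made up):

    /* Each worker is pinned to one core and only ever touches its own
       slice of the data, so the slice stays in that core's cache
       instead of migrating through RAM to another core. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    #define NWORKERS 4
    #define SLICE    (1 << 15)   /* 32 K doubles = 256 kB per worker */

    static double heap_part[NWORKERS][SLICE];

    static void *scan_own_slice(void *arg)
    {
        long id = (long)arg;
        double sum = 0.0;
        /* Repeated passes over the same slice: cheap once this core
           has pulled it into its own cache. */
        for (int pass = 0; pass < 1000; pass++)
            for (long i = 0; i < SLICE; i++)
                sum += heap_part[id][i];
        heap_part[id][0] = sum;  /* keep the result so the loop isn't elided */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NWORKERS];
        for (long i = 0; i < NWORKERS; i++) {
            pthread_create(&t[i], NULL, scan_own_slice, (void *)i);
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET((int)i, &set);   /* pin worker i to core i; pinning
                                        just after creation is fine for
                                        a sketch */
            pthread_setaffinity_np(t[i], sizeof(set), &set);
        }
        for (int i = 0; i < NWORKERS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }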
If CPUs didn't need caches in the first place (i.e., RAM was faster than
the CPU) then this would be a total non-issue. But here in the real
world, it's sometimes faster to leave cores idling rather than risk
upsetting the Almighty Cache. How sad...
> If CPUs didn't need caches in the first place (i.e., RAM was faster than
> the CPU) then this would be a total non-issue. But here in the real world,
> it's sometimes faster to leave cores idling rather than risk upsetting the
> Almighty Cache. How sad...
Yes, if we upset the Almighty Cache, *shock*, we might drop back to the
performance levels of the fastest RAM available. The cache is there to
*speed up* stuff. I have no idea why you'd want a machine with a CPU
running at the same speed as the fastest RAM available; you'd then get
the same level of performance as if you upset the cache continuously!
scott wrote:
>> If CPUs didn't need caches in the first place (i.e., RAM was faster
>> than the CPU) then this would be a total non-issue. But here in the
>> real world, it's sometimes faster to leave cores idling rather than
>> risk upsetting the Almighty Cache. How sad...
>
> Yes, if we upset the Almight Cache, *shock* we might drop back to the
> performance levels of the fastest RAM available.
I meant that it's sad that we don't have RAM that can perform as fast as
the CPU itself.
[Or rather... I guess we do, since that must be what they make the L1
cache out of. But the L1 cache is tiny, so...]
> I meant that it's sad that we don't have RAM that can perform as fast as
> the CPU itself.
>
> [Or rather... I guess we do, since that must be what they make the L1
> cache out of. But the L1 cache is tiny, so...]
And, most importantly, it is very close to the CPU.
It's a cost/benefit thing: for $X, how do you make the fastest computer?
The answer is to have a big slab of slow RAM and progressively smaller
bits of faster RAM. Trying to do it any other way will not make the
fastest machine for a given amount of money.
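For a sense of scale, ballpark access latencies (they vary a lot from
chip to chip, but the shape is always the same):

    L1 cache:  ~1 ns  (a few cycles)
    L2 cache:  ~3-5 ns
    main RAM:  ~60-100 ns

Each level trades speed for size, and the fast stuff is expensive per
byte, so the pyramid really is the cost-optimal answer.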