On 18-8-2009 15:57, Invisible wrote:
>>> Indeed. And on the GPU, you can say "for all these thirty pixels
>>> you're processing, multiply each one by the corresponding texture
>>> pixel". For example. One instruction, executed on 30 different pairs
>>> of data values. SIMD.
>>
>> If you've ever done array processing with e.g. MatLab, you will be
>> familiar with how to restructure algorithms to work in this sort of
>> environment.
>
> Indeed, this is part of what I hated about Matlab; if it isn't an array,
> you can't do anything with it. (That and the absurd syntax...)
Sometimes I wish you would stop flaunting your ignorance; this might be
one of those times.
>> Things like doing "OUTPUT = A * B + (1-A) * C" is a single instruction
>> that can operate on every value of the array, but essentially lets you
>> choose output B or C based on the value of A. This is often very
>> useful and fast for converting typical one-value-at-a-time algorithms.
>
> Me being me, I would have expected a conditional statement to be faster
> than a redundant computation.
>
> I guess back in the days before FPUs, when the RAM was faster than the
> CPU, that might even have been true. But today it seems it doesn't
> matter how inefficient an algorithm is, just so long as it has good
> cache behaviour and doesn't stall the pipeline. *sigh*
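The masked-blend trick quoted above can be sketched like this (my own illustration in Python/NumPy rather than MatLab, assuming A is a 0/1 mask):

```python
import numpy as np

# Branchless select, as in the quoted OUTPUT = A * B + (1 - A) * C.
# Where the mask A is 1 you get B; where it is 0 you get C.
a = np.array([1, 0, 1, 0])
b = np.array([10, 20, 30, 40])
c = np.array([-1, -2, -3, -4])

out = a * b + (1 - a) * c  # one vectorized expression, no per-element branch
print(out.tolist())        # [10, -2, 30, -4]
```

The whole array goes through a handful of vectorized operations instead of an interpreted if/else per element, which is exactly the restructuring a GPU (or a blitter) rewards.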
Actually, it is a different way of thinking. Remember Battlechess? The PC
did not have a language or paradigm that would make something like that
seem feasible. The Amiga did, with its combined CPU-and-blitter
architecture. Having seen the Amiga example, it is easy to see how you
can simulate that in software. So Battlechess could not have been
developed on the PC first: not because it was technically impossible, but
because it would have been too slow if you had designed it using the
conventional, un-parallelized paradigms.
I have probably mentioned it here before, but I once wrote an Ising
model simulation on an Amiga, 256 by 256 (IIRC), at 7 frames per second
(in 1988, give or take a year), using almost exclusively the blitter. For
that I had to implement my own bitwise addition and subtraction: easy,
just a number of AND and XOR blit operations (if you don't know how to do
it, I know just the course for you ;) ). I also had to give each spin a
chance of flipping, a fraction that depends on the temperature. I even
solved that with almost only the blitter. (Left as an exercise to the
reader/mascot.) I think it was several orders of magnitude faster than
what I could have accomplished the traditional way, with loops, IFs and
floating-point comparisons to random numbers.
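A bit-sliced adder of the kind described above can be sketched in Python/NumPy (my own illustration, not the original blitter code): each bitplane is a boolean array, and one AND/XOR pass per plane performs the ripple carry, just as a blit operation would over a whole plane at once.

```python
import numpy as np

def to_planes(arr, nbits):
    """Split an integer array into boolean bitplanes, LSB first."""
    return [np.array((arr >> k) & 1, dtype=bool) for k in range(nbits)]

def from_planes(planes):
    """Reassemble bitplanes into an integer array."""
    return sum(p.astype(int) << k for k, p in enumerate(planes))

def planes_add(a_planes, b_planes):
    """Ripple-carry add of two bit-sliced numbers using only AND/XOR,
    the kind of boolean operation a blitter pass applies to a plane."""
    carry = np.zeros_like(a_planes[0])
    out = []
    for a, b in zip(a_planes, b_planes):
        out.append(a ^ b ^ carry)            # sum bit of the full adder
        carry = (a & b) ^ (carry & (a ^ b))  # the two terms never overlap,
                                             # so XOR doubles as OR here
    return out                               # final carry dropped (wraps mod 2**nbits)

a = np.array([5, 2, 9])
b = np.array([3, 6, 4])
result = from_planes(planes_add(to_planes(a, 4), to_planes(b, 4)))
print(result.tolist())  # [8, 8, 13]
```

The per-plane cost is a few whole-array boolean operations, independent of how many lattice sites there are, which is where the speedup over per-site loops comes from.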
>> Reminds me of a built-in MatLab function to convert an RGB image to
>> HSV. The function was actually looping through every pixel and calling
>> the convert function (which had several if's in it). I rewrote the
>> conversion function to work on whole arrays at a time and it was
>> orders of magnitude faster. That's what you need to do for GPU
>> programming too.
>
> Whenever you have a system like MatLab or SQL which is inherently
> designed to do parallel processing, letting the intensively-tuned
> parallel engine do its stuff rather than explicitly looping yourself is
> always, always going to be faster. ;-)
If only because the interpreter is rather slow, given the complexity of
the language.
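A toy comparison of the two styles (in Python/NumPy rather than MatLab, but the same effect): every trip through an interpreted loop pays per-element overhead, while the vectorized call hands the whole array to compiled code once.

```python
import numpy as np

x = np.arange(100_000, dtype=np.float64)

def sum_of_squares_loop(x):
    """Explicit per-element loop: each iteration runs in the interpreter."""
    total = 0.0
    for v in x:
        total += v * v
    return total

# Single call into compiled code; typically orders of magnitude faster.
sum_of_squares_vec = float(np.dot(x, x))

assert np.isclose(sum_of_squares_loop(x), sum_of_squares_vec)
```

Both compute the same thing; only the dispatch overhead differs, and that overhead dominates for interpreted languages.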