>> Indeed, this is my plan. I'm just trying to get the naive list-based
>> version to work. (Currently it gives the wrong answers, which
>> obviously isn't a lot of use...)
>
> I could have written a working C version by now :-D
I have a working Haskell version by now. (Apparently I mixed up the rows
and columns, hence the bogus answers.) It's really not hard. I've
actually spent more time building a test harness than coding the matrix
multiplication so far.
In case you care, the code is:
import Data.List (transpose)

type Matrix x = [[x]]

-- Each result entry is the dot product of a row of m1 with a column of m2.
multiply :: Num x => Matrix x -> Matrix x -> Matrix x
multiply m1 m2 = [ [ sum $ zipWith (*) row col | col <- transpose m2 ]
                 | row <- m1 ]
That's the whole program. (Minus the part where it creates some
matrices, multiplies them, and prints out the result.) Complex, eh?
Now, in terms of speed... Well, it takes about 62 seconds to multiply
two 512x512 matrices. But hey, this is only the simplest, dumbest
version I could come up with. Let's see what happens (if anything) when
I start applying a few optimisations...
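(One obvious candidate, sketched below but not measured here: compute the
transpose of m2 once per multiplication rather than once per row. multiply'
is just an illustrative name, reusing the Matrix type and import above; it
may or may not matter once GHC optimises, but it makes the sharing explicit.)
-- Hoist the transpose out of the row loop so every row shares it.
multiply' :: Num x => Matrix x -> Matrix x -> Matrix x
multiply' m1 m2 = [ [ sum $ zipWith (*) row col | col <- cols ] | row <- m1 ]
  where
    cols = transpose m2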
> Also I wonder how his results compare to something like MatLab that
> supposedly has very fast matrix handling functions built in...
Well, matrix multiplication is a built-in MatLab primitive. I would
imagine it's comparable to BLAS (or possibly parallel BLAS)...
Invisible wrote:
> It seems we have reached an age where the complexity class of an
> algorithm is almost irrelevant, and the only important property is
> the specific low-level details of exactly how the algorithm has been
> implemented on a specific hardware platform.
>
> This is the antithesis of high-level programming.
Actually the complexity class of the algorithm remains exactly as
relevant, because it just says how the computation time asymptotically
evolves as the data gets bigger. The optimizations described here don't
change that, I believe; just the multiplicative constant in front, which
in practice is of course important as well ;-)
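(For a sense of scale, purely as an illustration: however the loops are
ordered or the data is blocked for cache, an n-by-n multiply still performs
the same Theta(n^3) multiply-adds; opCount is just an illustrative name.)
-- The cache-friendly versions do exactly the same amount of arithmetic;
-- they just spend far less time per operation stalled on memory.
opCount :: Integer
opCount = 512 ^ 3   -- 134,217,728 multiply-adds for a 512x512 multiply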
--
Vincent
On 07/23/09 04:19, Invisible wrote:
> Just for giggles, I could implement this in Haskell and see how many
> million times slower it is. But then, since it isn't running on the same
> hardware, it would be kind of a moot comparison.
Well, they _do_ provide the source...
--
No, Taco Bell is NOT the Mexican Phone Company!
On 07/23/09 05:42, Invisible wrote:
> This is the antithesis of high-level programming.
As is all numerical computation.
--
No, Taco Bell is NOT the Mexican Phone Company!
Vincent Le Chevalier wrote:
> Actually the complexity class of the algorithm remains exactly as
> relevant because it just says how the computation time asymptotically
> evolves as the data gets bigger. The optimization described here do not
> change that I believe. Just the multiplicative constant in front, which
> in practice is of course important as well ;-)
I think that BLAS probably uses a slightly more asymptotically efficient
multiplication algorithm for matrices this size, but it's interesting
that at best it only accounts for a small fraction of the speedup.
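(To put a rough number on that: assuming the asymptotically better algorithm
in question is Strassen's, with exponent logBase 2 7, about 2.807, the
best-case win at this size is only a few times; strassenFactor is just an
illustrative name.)
-- Best-case asymptotic advantage over the naive O(n^3) algorithm at n = 512.
strassenFactor :: Double
strassenFactor = 512 ** 3 / 512 ** logBase 2 7   -- roughly 3.3x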
Invisible wrote:
> Oh, sure, I understand *why* this happens. But it is very, very
> disturbing to realise that just doing the same instructions in a
> different order, or storing the same data in a different order, can have
> such a vast effect on performance. It seems we have reached an age where
> the complexity class of an algorithm is almost irrelevant, and the only
> important property is the specific low-level details of exactly how the
> algorithm has been implemented on a specific hardware platform.
>
> This is the antithesis of high-level programming.
>
There was an example of this I saw recently with multi-core programming.
If you modified the shared memory "directly" from multiple threads in the
program, then all the other cores had to wait for the "current" core to
finish handling the data and unlock the memory before they could operate.
The result was that every single core spent its time wobbling between
waiting and active, and your speed would drop *under* that of a single
core working on the same problem.
The fix was simple: make your function take the value (not a pointer),
handle it internally as a temporary variable, then return it, so the
calling function only sets the shared value at the end. The memory stays
free while the function is executing, and only "locks" during the brief
moment the final value is set. Result - the expected increase in speed as
you add cores.
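(Not the example Patrick describes, but a rough Haskell analogue of the same
pattern, with made-up names workShared / workLocal: one version takes the
shared lock on every step, the other does all its work on a private value and
touches shared state exactly once. Build with -threaded and run with +RTS -N4
to compare.)
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM)

-- Contended version: every single step takes the lock on the shared value.
workShared :: MVar Int -> Int -> IO ()
workShared acc n = forM_ [1 .. n] $ \i -> modifyMVar_ acc (pure . (+ i))

-- Local version: do the work privately, touch the shared state only once.
workLocal :: MVar Int -> Int -> IO ()
workLocal acc n = modifyMVar_ acc (pure . (+ sum [1 .. n]))

main :: IO ()
main = do
  acc  <- newMVar 0
  done <- replicateM 4 newEmptyMVar
  -- Swap in workShared here to see the contended behaviour instead.
  forM_ done $ \d -> forkIO (workLocal acc 1000000 >> putMVar d ())
  mapM_ takeMVar done
  print =<< readMVar acc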
Yeah, once they got to multi-core, the idea that "the language doesn't
need to know anything about the hardware" was lost, to some extent.
But then you also get stupid things like GCC optimizing out NULL
tests, which actually caused two Linux kernel versions to suffer a
potential, Windows-style, "root-level access no matter what account you
are on" problem. The compiler needs to be smart enough to "know" you
are compiling for multiple cores, and warn you about such things (or avoid
optimizing in a way that causes them). But high-level languages were
all developed "before" this was an issue, so it's not surprising it's
shown up now.
--
void main () {
if version = "Vista" {
call slow_by_half();
call DRM_everything();
}
call functional_code();
}
else
call crash_windows();
}
Get 3D Models, 3D Content, and 3D Software at DAZ3D!
http://www.daz3d.com/index.php?refid=16130551
Patrick Elliott wrote:
> The compiler needs to be smart enough to "know" you
> are compiling for multi-cores, and warn you about such things (or avoid
> optimizing in a way that causes them).
More precisely, the compiler needs to "know" what your processes are,
instead of leaving it as an opaque library outside the semantics of the
language completely. See the ongoing thread elsewhere. ;-)
--
Darren New, San Diego CA, USA (PST)
"We'd like you to back-port all the changes in 2.0
back to version 1.0."
"We've done that already. We call it 2.0."
Invisible wrote:
> Sure. Because working out what order an arbitrary program wants to
> access its data in is not, in any way, equivalent to the halting
> problem. No sir!
Remember, the Halting Problem is unsolvable in the general case, but may
be solvable in specific cases.
It's the same thing with compiled code. While the compiler can't
guarantee that *every* code file will be compiled to its optimal machine
code, it can guarantee that certain *sets* of code files will be. The job
of the compiler developer is to identify these potential sets and teach
the compiler to recognize them.
--
Chambers
>> The compiler needs to be smart enough to "know" you are compiling for
>> multi-cores, and warn you about such things (or avoid optimizing in a
>> way that causes them).
>
> More precisely, the compiler needs to "know" what your processes are,
> instead of leaving it as an opaque library outside the semantics of the
> language completely. See the ongoing thread elsewhere. ;-)
And I would suggest that genuinely "high-level" languages (i.e.,
languages where you don't manually frob low-level machine resources) have
a much greater chance of adapting to multicore programming...
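(As a rough sketch of what that can look like, here is the list-based
multiply from earlier in the thread parallelised with the parallel package's
strategies; multiplyPar is just my name for it. Nothing in the code mentions
cores, threads or locks: compile with -threaded, run with +RTS -N, and the
runtime does the scheduling.)
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parMap, rdeepseq)
import Data.List (transpose)

type Matrix x = [[x]]

-- Each result row becomes a spark that the GHC runtime can evaluate on
-- whichever core is free.
multiplyPar :: (NFData x, Num x) => Matrix x -> Matrix x -> Matrix x
multiplyPar m1 m2 =
    parMap rdeepseq (\row -> [ sum $ zipWith (*) row col | col <- cols ]) m1
  where
    cols = transpose m2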