So I got my simulation in C# to run 1000 particles at a decent speed
(~20 fps), and I thought this would be a good time to try comparing it
to C++.
As much as possible, I ported the data structures and algorithm straight
across. And the result... surprised me.
The C++ release build, targeting SSE/SSE2, could only run ~500
particles before choking.
Thinking this might be an effect of the frequent allocation /
deallocation of new arrays, I switched the C++ version to preallocate
the arrays.
Nada. No difference at all in speed.
Now, I'm thinking I'm seriously doing something wrong here with the C++
version, but I can't figure it out. And without a profiler, I can't
spot bottlenecks in my executable.
Anyway, other than that, it runs fine. If anyone wants to take a look
at it, they can at http://www.pacificwebguy.com/downloads/physics.zip
(~450k, C++ source included).
--
...Ben Chambers
www.pacificwebguy.com
> Now, I'm thinking I'm seriously doing something wrong here with the C++
> version, but I can't figure it out. And without a profiler, I can't spot
> bottlenecks in my executable.
Looks nice, but I'd say it definitely looks GPU bound for some strange
reason rather than CPU bound on my PC:
Particles   FPS   CPU %
0           60    00
200         60    00
300         60    01
320         30    05
Drawing 320 particles should NOT run at only 30 fps! See attached little
demo* I made, it draws 30000 cubes in the view, and on my machine runs at
around 100fps (you can move around with the arrow keys).
What I would add to your program to debug further:
a) Add an option to disable rendering each frame (i.e. just do the physics)
b) Add an option to disable the physics each frame (i.e. just draw the
spheres)
c) Add an option to disable the console output (it might be affecting frame
rate)
d) Add an option to disable the vsync that is limiting the frame time to
multiples of 1/60th second (your code might be blocking while waiting for
vsync?). Anyway, it's useful for seeing how fast your code really is.
* BTW you might need d3dx9_35.dll if you don't have exactly the right DirectX
version, I haven't included it because it's quite big (3.5 MB), but you can
download it from here:
http://www.dll-files.com/dllindex/dll-files.shtml?d3dx9_35
Officially MS want you to release the DirectX installer with your DirectX
application, but when your program is only 42KB it seems a bit pointless to
include an installer and then the DirectX installer as well.
Attachments:
Download 'boxes.zip' (43 KB)
scott wrote:
> Looks nice, but I'd say it definitely looks GPU bound for some strange
> reason rather than CPU bound on my PC:
>
> Particles   FPS   CPU %
> 0           60    00
> 200         60    00
> 300         60    01
> 320         30    05
>
> Drawing 320 particles should NOT run at only 30 fps! See attached
> little demo* I made, it draws 30000 cubes in the view, and on my machine
> runs at around 100fps (you can move around with the arrow keys).
PCI bandwidth limitation?
[Hmm, even that shouldn't be a problem with just 320 particles.]
> What I would add to your program to debug further:
>
> a) Add an option to disable rendering each frame (ie just do the physics)
> b) Add an option to disable the physics each frame (ie just draw the
> spheres)
> c) Add an option to disable the console output (it might be affecting
> frame rate)
> d) Add an option to disable the vsync that is limiting the frame time to
> multiples of 1/60th second (your code might be blocking while waiting
> for vsync?) and anyway it's useful for seeing how fast your code really is
Sounds sensible to me...
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
>> Drawing 320 particles should NOT run at only 30 fps! See attached little
>> demo* I made, it draws 30000 cubes in the view, and on my machine runs at
>> around 100fps (you can move around with the arrow keys).
>
> PCI bandwidth limitation?
I forgot to mention, my 30k cubes have their positions updated from the CPU
each frame...
Chambers <ben### [at] pacificwebguycom> wrote:
> Thinking this might be an effect of the frequent allocation /
> deallocation of new arrays, I switched the C++ version to preallocate
> the arrays.
Which version did your zip file contain?
It's certainly not a good idea to create, resize and destroy 6 vectors
at each iteration.
--
- Warp
scott wrote:
>> Now, I'm thinking I'm seriously doing something wrong here with the
>> C++ version, but I can't figure it out. And without a profiler, I
>> can't spot bottlenecks in my executable.
>
> Looks nice, but I'd say it definitely looks GPU bound for some strange
> reason rather than CPU bound on my PC:
>
> Particles   FPS   CPU %
> 0           60    00
> 200         60    00
> 300         60    01
> 320         30    05
That's REALLY strange, on my PC the CPU % goes up to 50 (ie, one entire
core)!
> What I would add to your program to debug further:
>
> a) Add an option to disable rendering each frame (ie just do the physics)
Huh, I even tried that on my machine, and it didn't change anything, so
I didn't think it was the graphics... but, I'll try using OpenGL
directly, rather than the ClanLib sprite API.
--
...Ben Chambers
www.pacificwebguy.com
>> Particles   FPS   CPU %
>> 0           60    00
>> 200         60    00
>> 300         60    01
>> 320         30    05
>
> That's REALLY strange, on my PC the CPU % goes up to 50 (ie, one entire
> core)!
But what is the CPU % when your frame rate first drops to 30 from 60? If
it's under 50% then that means the program is GPU bound. It will of course
depend on the relative performance of the CPU and GPU in your machine. It
might be that I have a lame GPU compared to my CPU, so you may never see it
like this on your machine.
Mine reaches 50% soon afterwards anyway:
Particles   FPS   CPU %
380         30    25
480         10    40
515         3     48
520         1     50
But then using your algorithm of collision detection I would expect the code
to become CPU bound around this point anyway.
>> What I would add to your program to debug further:
>>
>> a) Add an option to disable rendering each frame (ie just do the physics)
>
> Huh, I even tried that on my machine, and it didn't change anything, so I
> didn't think it was the graphics... but, I'll try using OpenGL directly,
> rather than the ClanLib sprite API.
I think the problem is that initially the program is becoming GPU bound
(which it totally shouldn't for such a simple display), but then because the
graphics are O(n) and your collision algorithm is O(n^2), at some point the
program becomes CPU bound.
Improving your collision detection algorithm would help a lot, but you
would still need to do something about the graphics. Anyway, it's best to
work on one at a time, and disable the other while you do, so you can see
the true performance of each part.
Warp wrote:
> It's certainly not a good idea to create, resize and destroy 6 vectors
> at each iteration.
And in a well-tuned GC system, that's less of a problem, see? :-)
That's part of the reason these languages *do* allocate everything on
the heap - it's not as much of a problem. Honestly, I'm not 100% sure
why, but when folks actually measure, it often works out that way.
C# has value types too. I wonder how much you'd speed it up by not
heap-allocating vectors to start with in C#.
Maybe C# over-allocates the vectors, so the resize is cheap, and the
C++ vectors have to actually get moved when resized?
Just random thoughts on the topic.
--
Darren New / San Diego, CA, USA (PST)
"That's pretty. Where's that?"
"It's the Age of Channelwood."
"We should go there on vacation some time."
From: Nicolas Alvarez
Subject: Re: Speed comparison between C# and C++
Date: 17 Apr 2008 12:50:26
Message: <48077fd2@news.povray.org>
> Warp wrote:
>> It's certainly not a good idea to create, resize and destroy 6 vectors
>> at each iteration.
>
> And in a well-tuned GC system, that's less of a problem, see? :-)
Somebody uses a C++ feature inefficiently, and that proves GC is better??
> That's part of the reason these languages *do* allocate everything on
> the heap - it's not as much of a problem. Honestly, I'm not 100% sure
> why, but when folks actually measure, it often works out that way.
A C++ vector is not kept on the stack. Well, the vector object itself is
(bookkeeping info), but the data inside is heap-allocated by the vector
AFAIK.
Nicolas Alvarez wrote:
>> Warp wrote:
>>> It's certainly not a good idea to create, resize and destroy 6 vectors
>>> at each iteration.
>>
>> And in a well-tuned GC system, that's less of a problem, see? :-)
>
> Somebody uses a C++ feature inefficiently, and that proves GC is better??
You're taking a hopefully-educational comment and turning it into a C++
bash yourself there. "Allocating everything on the heap is often more
efficient in a well-tuned GC than allocating everything on the heap when
you're doing manual memory management" is not a bash on manual memory
management. *I* didn't even *mention* C++ in that sentence.
>> That's part of the reason these languages *do* allocate everything on
>> the heap - it's not as much of a problem. Honestly, I'm not 100% sure
>> why, but when folks actually measure, it often works out that way.
>
> A C++ vector is not kept in the stack. Well, the vector itself is
> (bookkeeping info) but the data inside is heap-allocated by the vector
> AFAIK.
I wonder if maybe it's a locality-of-reference thing then. :-)
--
Darren New / San Diego, CA, USA (PST)
"That's pretty. Where's that?"
"It's the Age of Channelwood."
"We should go there on vacation some time."