Scourge of War Boards • First (truly shocking!) optimization benchmarks

First (truly shocking!) optimization benchmarks

Posted: Tue Jun 09, 2015 6:25 pm
by Guest
I've just played the tutorial.
The camera movement showed some serious lag, though it wasn't too annoying.
That scenario only had a few units on the ground, though.
I enjoyed it overall.

From the previous days' talks on this forum, I thought this game was CPU-bound (AI, LOS, animation updates, etc.) rather than GPU-bound.
So profiling it with MS DirectX PIX sounded like a formality. I did it anyway...
As a benchmark I picked "WL01 - The Emperor's Plan", looking at the left of the scene from the French starting position.

These were by far the most shocking results I have ever come across with such a tool:

- 5000 DPUPs (DrawPrimitiveUP) per frame, meaning one of DX9's most deprecated draw calls ("data specified by a user memory pointer") per sprite instance (plus all the terrain stuff)!
This can be very easily converted to 1 x DIP (DrawIndexedPrimitive) per sprite type (not instance!!!) with hardware instancing (SetStreamSourceFreq).
So from 5000 DPUPs to about 50 DIPs, I suppose?
- 5500 SetRenderStates per frame, same as above (one per sprite instance, plus terrain).
This can be very easily converted to 5 x SetRenderState for the whole sprite renderer.
5000 SetRenderState vs 5 SetRenderState?
- 5000 SetVertexShader + 5000 SetPixelShader per frame, again the same problem.
This can be very easily converted to 1 x SetVertexShader + 1 x SetPixelShader for the whole sprite renderer.
5000+5000 vs 1+1?
- 22000 SetTexture calls, which probably come from each sprite instance needing more than one texture switch (4 on average?) because of composition (trousers, horses, hats, etc.).
Some terrain stuff is also included in this figure.
Still, incredible. You should batch the draw calls that share a material.
20000 to 250 SetTextures?
- 6600 x SetVertexShaderConstant per frame;
- 6600 x SetPixelShaderConstant per frame;

- Total DX9 API calls are steadily > 200,000 per frame.
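
To make the batching idea in the list above concrete, here is a minimal Python sketch. It is not Direct3D code and not how the game's renderer works; it only counts the calls a per-instance renderer would issue against one that sets shared state once and issues one instanced draw per sprite type. `SpriteInstance`, `sprite_type`, and both counting functions are hypothetical names, and the 50 sprite types and 4 textures per sprite are simply the rough guesses from the post.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SpriteInstance:
    sprite_type: str   # hypothetical key: same mesh, shaders, and textures
    x: float
    y: float


def naive_call_count(instances, textures_per_sprite=4):
    """One draw plus a full state setup for every single instance."""
    calls = defaultdict(int)
    for _ in instances:
        calls["SetRenderState"] += 1
        calls["SetVertexShader"] += 1
        calls["SetPixelShader"] += 1
        calls["SetTexture"] += textures_per_sprite
        calls["Draw"] += 1
    return dict(calls)


def batched_call_count(instances, textures_per_sprite=4, shared_render_states=5):
    """Shared state set once, then one instanced draw per sprite type."""
    batches = defaultdict(list)
    for inst in instances:
        # Per-instance data that would go into the instance vertex stream.
        batches[inst.sprite_type].append((inst.x, inst.y))

    calls = {
        "SetRenderState": shared_render_states,
        "SetVertexShader": 1,
        "SetPixelShader": 1,
        "SetTexture": textures_per_sprite * len(batches),
        "Draw": len(batches),
    }
    return calls, dict(batches)


# 5000 instances spread over 50 sprite types (assumed figures).
instances = [SpriteInstance(f"type_{i % 50}", float(i), 0.0) for i in range(5000)]
naive = naive_call_count(instances)
batched, batches = batched_call_count(instances)
print(naive)
print(batched)
```

Under these assumptions the draw count drops from 5000 to 50 and the texture switches from 20000 to 200. The real numbers depend on how many distinct sprite types and materials are actually on screen.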

The DirectX 9 APIs are quite inefficient on their own; abusing them this way kills performance to the nth power.
This is inescapable, so please fix it as soon as possible.
It could really make all the difference in the world for a limited effort.

Back to fun now.

Re: First (truly shocking!) optimization benchmarks

Posted: Tue Jun 09, 2015 6:30 pm
by con20or
Moving this to the technical section.

Re: First (truly shocking!) optimization benchmarks

Posted: Tue Jun 09, 2015 8:30 pm
by norb
It would be awesome if we had a larger coding staff. I know, our engine is older and does not take advantage of many upgrades today. It's one of those areas that we can't do as well as we would want if we had more resources. I would love the time to upgrade the entire graphics engine... wishing.

Re: First (truly shocking!) optimization benchmarks

Posted: Tue Jun 09, 2015 10:18 pm
by Gunfreak
Are these things that can be fixed by you with mods, or by the devs now that you have told them? Or is this innate in the engine?

Re: First (truly shocking!) optimization benchmarks

Posted: Wed Jun 10, 2015 12:12 am
by Guest
From a rather quick look at the DLL, I don't think the PowerRender engine is that bad after all, Norb. I might be wrong, however.
What we'd really need, at the very least, are some structures on top of its resources that aim for optimal rendering.
To name a few: a display list (back-to-front sorting for entities with transparency) with parallel submission, minimization of render and shader state changes, stacks of render states and draw calls (DIP instancing is essential for your sprites), cbuffer emulation to update shader variables, and so on.
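
Two of the structures named above, the sorted display list and the minimization of state changes, can be sketched in a few lines of Python. This is only an illustration under assumptions, not PowerRender or Ogre code: `DrawItem`, `StateCache`, `build_display_list`, and the material names are all hypothetical, and a real renderer would forward the surviving state changes to the device instead of counting them.

```python
from dataclasses import dataclass


@dataclass
class DrawItem:
    material: str        # hypothetical id bundling shaders, textures, render states
    depth: float         # distance from the camera
    transparent: bool = False


def build_display_list(items):
    """Opaque items grouped by material; transparent ones last, far to near."""
    opaque = sorted((i for i in items if not i.transparent),
                    key=lambda i: i.material)
    blended = sorted((i for i in items if i.transparent),
                     key=lambda i: -i.depth)
    return opaque + blended


class StateCache:
    """Forwards a state change only when the value actually differs."""

    def __init__(self):
        self.current = {}
        self.forwarded = 0   # changes that would reach the device
        self.skipped = 0     # redundant changes filtered out

    def set(self, name, value):
        if self.current.get(name) == value:
            self.skipped += 1
            return
        self.current[name] = value
        self.forwarded += 1


def submit(display_list, cache):
    for item in display_list:
        cache.set("material", item.material)
        cache.set("alpha_blend", item.transparent)
        # The draw call for the item would be issued here.


items = [
    DrawItem("infantry", 10.0),
    DrawItem("cavalry", 12.0),
    DrawItem("infantry", 30.0),
    DrawItem("smoke", 5.0, transparent=True),
    DrawItem("cavalry", 8.0),
    DrawItem("smoke", 40.0, transparent=True),
]

unsorted_cache, sorted_cache = StateCache(), StateCache()
submit(items, unsorted_cache)
display_list = build_display_list(items)
submit(display_list, sorted_cache)
print(unsorted_cache.forwarded, sorted_cache.forwarded)
```

The cache alone only removes back-to-back repeats; sorting is what places items with the same material next to each other so that the cache has something to filter.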
In the recent past I coded these kinds of utilities for my own (so far unfinished) projects on top of good old Ogre 1.9 (DX9 render system only), with success and remarkable benefits.
I'd even offer my humble services, although my CV is fairly ridiculous. But let's talk about this tomorrow after some sleep. 3.25 hours of bike racing at 31.5 km/h average speed may have had some impact on my clarity of mind. :)
Ciao.

Re: First (truly shocking!) optimization benchmarks

Posted: Wed Jun 10, 2015 1:27 am
by Jim
Norb is 100% of the coding staff for the main game as well as for the graphics engine. Mitra does the DLL AI coding, which is a major job in itself. Everyone else multitasks in other areas, as shown in the game credits. All of this is done (mostly) on a nights-and-weekends schedule. So when we say we have limited resources, we are not just blowing (black powder) smoke.

-Jim

Re: First (truly shocking!) optimization benchmarks

Posted: Wed Jun 10, 2015 12:07 pm
by Guest
Nobody is going to bring that into question, Jim. Never. :)

Nevertheless, taking my statistics as evidence (and I was certainly conservative, probably leaving out thousands more DX9 API calls made per sprite instance), it should be clear enough that a very limited, careful investment of the available resources/earnings would be well worth the benefits.

Can you imagine how many new features you could add to the game in the coming years, once those calls are reduced by the orders of magnitude I pointed out above, without spending most of your time on anxious measurements and trade-offs (or even hitting a wall)?

Just take a look at the final appendix here and multiply accordingly.

Moreover, with higher frame rates you could enable vsync (some users are already asking for it), which would result in far fewer customer complaints about burned GPUs.

If Norb wants, he knows how to contact me. ;)

Nicolò

Re: First (truly shocking!) optimization benchmarks

Posted: Wed Jun 10, 2015 12:55 pm
by Guest
A few more stats on the same scene to complete the picture:

- 6600 x SetVertexShaderConstant per frame;
- 6600 x SetPixelShaderConstant per frame;

- Total DX9 API calls are steadily > 200,000 per frame.

It's running at 10 FPS on my 4-year-old PC. But this is a subjective reference...
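
For a sense of scale, the two figures above can be combined with simple arithmetic. The Python sketch below uses only the reported numbers (at least 200,000 calls per frame, 10 FPS on that one machine). The per-call figure is an upper bound on the average that assumes, unrealistically, that the whole frame is spent inside API calls, so it is not a measurement.

```python
calls_per_frame = 200_000   # lower bound reported for the captured scene
frames_per_second = 10      # frame rate reported on the poster's PC

calls_per_second = calls_per_frame * frames_per_second
frame_time_ms = 1000 / frames_per_second
# Upper bound on the average cost per call, if API calls ate the whole frame.
max_avg_cost_us = frame_time_ms * 1000 / calls_per_frame

print(calls_per_second, frame_time_ms, max_avg_cost_us)
```

That is about 2 million API calls per second, with at most half a microsecond available per call on average.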