Rumors?

SquireNed
Posts: 403
Joined: 21 Dec 2013, 22:18

Re: Rumors?

Post by SquireNed »

Given that CPU usage figures are being reported, I'm assuming we're doing software rendering?
swick
Posts: 96
Joined: 06 Aug 2011, 13:00

Re: Rumors?

Post by swick »

SquireNed wrote:Given that CPU usage figures are being reported, I'm assuming we're doing software rendering?
Very probably not. The test executes the same code on the GPU in both cases, the difference lies in how the CPU tells the GPU what to render.
raevol wrote:So, it's great that OSG is getting 3x the fps of Ogre here, but I am curious as to why it's only able to render at 24fps when it's just rendering one seemingly simple model? Shouldn't we be able to get hundreds of fps out of our renderer for such a simple scene?

EDIT: I'm assuming I'm just missing part of what went into the test- was this only using software rendering? Or on an intel card? Or some other restriction?
This might explain it ;)
scrawl wrote: Rendering 100 copies of this tree, which has 52 batches (so over 5k batches total).
I'm sure that there still is room for improvement, though.
Chris
Posts: 1626
Joined: 04 Sep 2011, 08:33

Re: Rumors?

Post by Chris »

The alpha blending is likely knocking it down too, since that enforces strict draw ordering (whereas opaque/alpha tested models could be improved with instancing and lazy draw ordering) and increases overdraw.
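To illustrate the ordering constraint, here is a minimal sketch in Python (not engine code; the `Draw` record and both helper functions are hypothetical names made up for this example). Opaque or alpha tested draws can be sorted by material so state changes are minimised, while alpha blended draws have to go back to front regardless of material:

```python
from dataclasses import dataclass


@dataclass
class Draw:
    name: str
    material: str
    depth: float    # distance from the camera
    blended: bool   # True for alpha blended, False for opaque/alpha tested


def order_draws(draws):
    """Opaque draws first, grouped by material; blended draws last, back to front."""
    opaque = [d for d in draws if not d.blended]
    blended = [d for d in draws if d.blended]
    # Opaque geometry relies on the depth buffer, so it can be grouped freely.
    opaque.sort(key=lambda d: (d.material, d.depth))
    # Blended geometry must be drawn farthest first to composite correctly.
    blended.sort(key=lambda d: -d.depth)
    return opaque + blended


def count_state_changes(draws):
    """Count how often the material changes along the submission order."""
    changes = 0
    last = None
    for d in draws:
        if d.material != last:
            changes += 1
            last = d.material
    return changes
```

With interleaved materials in the blended list, the back to front sort can force a material switch on almost every draw, which is the cost being described here.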
kojack
Posts: 14
Joined: 17 Feb 2015, 23:15

Re: Rumors?

Post by kojack »

scrawl wrote:As expected, the Ogre version was constantly hitting 100% CPU on a single core. The OSG version was using ~22% total CPU distributed evenly across all 4 cores.

Ogre, StaticGeometry: 8 FPS
Ogre, Entities: 5 FPS
OSG: 24 FPS
I just did a very rough test.
The niftools were annoying (such as giving me 8 sets of corrupt uv coordinates per vertex) and 52 batches for a single tree with only 2 textures is just insane (WTF Morrowind?), so I used a 1566 tri tree mesh (2 sub entities) from blendswap.
40000 trees in the scene (so 80000 batches):
osg: 20fps with around 16% cpu usage, one core entirely maxed out (I'm on hex core with ht)
ogre 2.1: 37fps with 20% cpu usage evenly spread over all cores.

I really don't know osg (only a few hours' use, most of which went into trying to stop it from rendering to both of my monitors at the same time and in the wrong order (left half on right monitor etc). Why is it using monitor index order to choose viewport arrangement instead of the actual display offset Windows provides?) so I'm probably doing something less than optimal. Then again I only have a few minutes' experience with Ogre 2.1's new way of doing things. :)

(Note: I'm not trying to change anybody's mind, there's a lot of issues involved in the decision to change engines that have been considered. I'm just curious)
Ace (SWE)
Posts: 887
Joined: 15 Aug 2011, 14:56

Re: Rumors?

Post by Ace (SWE) »

kojack wrote:...so I used a 1566 tri tree mesh (2 sub entities) from blendswap.
40000 trees in the scene (so 80000 batches):
osg: 20fps with around 16% cpu usage, one core entirely maxed out (I'm on hex core with ht)
ogre 2.1: 37fps with 20% cpu usage evenly spread over all cores.
...
(Note: I'm not trying to change anybody's mind, there's a lot of issues involved in the decision to change engines that have been considered. I'm just curious)
Indeed there's a lot of issues involved, like the fact that Ogre 2.1 has certain requirements for its materials that would impact the modability of OpenMW.

I wonder if there'd be a possibility to run an optimizer pass over the models though, since I too consider 52 batches to be insane for the tree, and if we could fix that during load or run-time then it would most likely lead to substantial FPS gains.
scrawl
Posts: 2152
Joined: 18 Feb 2012, 11:51

Re: Rumors?

Post by scrawl »

52 batches for a single tree with only 2 textures is just insane (WTF Morrowind?)
That's because the leaves are alpha blended and must be depth sorted among each other (and each leaf is also divided into 4 TriShapes for more accurate sorting). Of course, you can ask why exactly they are alpha blended instead of alpha tested. That, I do not know. Still, I do know that batches are the bottleneck in MW; an overview of Balmora typically uses 1000+ batches. That's why I picked a mesh with lots of batches for testing.
I wonder if there'd be a possibility to run an optimizer pass over the models though, since I too consider 52 batches to be insane for the tree, and if we could fix that during load or run-time then it would most likely lead to substantial FPS gains.
Yes there is, and we're actually doing that already with the current Ogre backend: we have an override file that sets some meshes to alpha testing instead of alpha blending, so they can be added to a StaticGeometry object, which then merges the batches. But with alpha testing it does not look faithful to MW, so I'm getting rid of that override list for the port. Another reason we used the override list was Ogre's transparency sorting not being accurate enough, because it sorts using the scene node position instead of the bounding boxes.
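A rough sketch of why the override list helps with batch counts, in Python rather than engine code. The function name and the tuple layout are hypothetical, and the merging rule is a simplification (one batch per texture for everything that is not alpha blended), not how StaticGeometry is actually implemented. The split of the tree into 4 bark and 48 leaf sub meshes is also an assumption for illustration; only the totals of 52 batches and 2 textures come from the discussion above.

```python
def count_batches(submeshes, overrides=frozenset()):
    """Estimate batch count after merging.

    submeshes: iterable of (mesh_name, texture, blended) tuples.
    overrides: mesh names forced from alpha blending to alpha testing.
    """
    merged_textures = set()
    separate = 0
    for mesh, texture, blended in submeshes:
        if blended and mesh not in overrides:
            # Blended sub meshes stay separate so they can be depth sorted.
            separate += 1
        else:
            # Everything else sharing a texture collapses into one batch.
            merged_textures.add(texture)
    return separate + len(merged_textures)


# Hypothetical tree: 4 opaque bark pieces plus 48 blended leaf pieces.
tree = [("tree", "bark", False)] * 4 + [("tree", "leaf", True)] * 48
scene = tree * 100  # 100 copies, 5200 sub meshes before any merging

print(count_batches(scene))                      # blended leaves kept separate
print(count_batches(scene, overrides={"tree"}))  # leaves treated as alpha tested
```

Under these assumptions the override collapses the scene to one batch per texture, at the price of the alpha tested look mentioned above.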
osg: 20fps with around 16% cpu usage, one core entirely maxed out (I'm on hex core with ht)
I'm assuming the osg::Viewer wasn't set up to run multithreaded?
when it's just rendering one seemingly simple model? Shouldn't we be able to get hundreds of fps out of our renderer for such a simple scene?
- As mentioned it is rendering 100 copies of the same model, not one model.
- The model is not simple, it's glaringly inefficient in its number of batches.
- The test had the tree close to the camera, so rendering it 100 times produces some overdraw, similar to doing full-screen postprocessing passes. When I zoomed out the FPS increased.
- This was on a 2011 Intel graphics system, for the record.
Chris
Posts: 1626
Joined: 04 Sep 2011, 08:33

Re: Rumors?

Post by Chris »

scrawl wrote:But with alpha testing it does not look faithful to MW, so I'm getting rid of that override list for the port. Another reason we used the override list was Ogre's transparency sorting not being accurate enough, because it sorts using the scene node position instead of the bounding boxes.
Another reason was for shadows, since you can't properly render alpha blended textures to the shadow map.
osg: 20fps with around 16% cpu usage, one core entirely maxed out (I'm on hex core with ht)
I'm assuming the osg::Viewer wasn't set up to run multithreaded?
Looking at OSG's code, it actually seems a single viewer runs GL in a single thread at a given time. At least, there's no GraphicsWindow/GraphicsContext methods I can see to create extra GL contexts separate from the window.
scrawl
Posts: 2152
Joined: 18 Feb 2012, 11:51

Re: Rumors?

Post by scrawl »

osg: 20fps with around 16% cpu usage, one core entirely maxed out (I'm on hex core with ht)
I'm assuming the osg::Viewer wasn't set up to run multithreaded?
Looking at OSG's code, it actually seems a single viewer runs GL in a single thread at a given time. At least, there's no GraphicsWindow/GraphicsContext methods I can see to create extra GL contexts separate from the window.
I meant that the culling / update traversal was likely running single threaded. Having GL rendering from multiple threads does not make much sense, since the driver calls are asynchronous anyway.
kojack
Posts: 14
Joined: 17 Feb 2015, 23:15

Re: Rumors?

Post by kojack »

scrawl wrote:
52 batches for a single tree with only 2 textures is just insane (WTF Morrowind?)
That's because the leaves are alpha blended and must be depth sorted among each other (and each leaf is also divided into 4 TriShapes for more accurate sorting). Of course, you can ask why exactly they are alpha blended instead of alpha tested. That, I do not know. Still, I do know that batches are the bottleneck in MW; an overview of Balmora typically uses 1000+ batches. That's why I picked a mesh with lots of batches for testing.
Good point.
scrawl wrote: Another reason we used the override list was ogre's transparency sorting not being accurate enough, because it sorts using the scene node position instead of the bounding boxes.
That's the default, but if you enable Extreme Points on the mesh then Ogre sorts individual sub meshes by distance instead of using the parent entity's position.
(Call generateExtremes() on the mesh at runtime or use an exporter that saves Extreme Points in the mesh file)



It would be interesting to try something like optimising trees by sorting triangle render order (in the index buffer) by distance from the centre. Kind of like what AMD Tootle does for reducing overdraw (but that's the opposite direction, overdraw is fixed by outer triangles rendering first, transparency by inner triangles first).
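As a quick sketch of that idea (untested against real meshes, and the function name and arguments are made up for illustration), the index buffer could be reordered by each triangle's centroid distance from the mesh centre. Python is used here only to keep it short:

```python
def sort_triangles(vertices, indices, centre=(0.0, 0.0, 0.0), inner_first=True):
    """Reorder a flat triangle index list by centroid distance from `centre`.

    vertices: list of (x, y, z) tuples.
    indices: flat list of vertex indices, three per triangle.
    inner_first=True suits transparency (inner triangles drawn first);
    inner_first=False suits overdraw reduction (outer triangles drawn first).
    """
    triangles = [indices[i:i + 3] for i in range(0, len(indices), 3)]

    def squared_distance(tri):
        # Centroid of the triangle, then squared distance to the centre.
        cx = sum(vertices[i][0] for i in tri) / 3.0
        cy = sum(vertices[i][1] for i in tri) / 3.0
        cz = sum(vertices[i][2] for i in tri) / 3.0
        return ((cx - centre[0]) ** 2
                + (cy - centre[1]) ** 2
                + (cz - centre[2]) ** 2)

    triangles.sort(key=squared_distance, reverse=not inner_first)
    return [i for tri in triangles for i in tri]
```

This is a view independent approximation, so it would only be roughly right from any given camera angle, and it assumes the whole tree is drawn as a single batch.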
scrawl
Posts: 2152
Joined: 18 Feb 2012, 11:51

Re: Rumors?

Post by scrawl »

Skinning in osg works:
Image
That's the default, but if you enable Extreme Points on the mesh then Ogre sorts individual sub meshes by distance instead of using the parent entity's position.
(Call generateExtremes() on the mesh at runtime or use an exporter that saves Extreme Points in the mesh file)
Learned something new, thanks.