Rumors?
Re: Rumors?
Given CPU being given, I'm assuming we're doing software rendering?
Re: Rumors?
SquireNed wrote: Given CPU being given, I'm assuming we're doing software rendering?
Very probably not. The test executes the same code on the GPU in both cases; the difference lies in how the CPU tells the GPU what to render.
raevol wrote: So, it's great that OSG is getting 3x the fps of Ogre here, but I am curious as to why it's only able to render at 24fps when it's just rendering one seemingly simple model? Shouldn't we be able to get hundreds of fps out of our renderer for such a simple scene? EDIT: I'm assuming I'm just missing part of what went into the test. Was this only using software rendering? Or on an Intel card? Or some other restriction?
This might explain it:
scrawl wrote: Rendering 100 copies of this tree, which has 52 batches (so over 5k batches total).
I'm sure that there still is room for improvement, though.
Re: Rumors?
The alpha blending is likely knocking it down too, since that enforces strict draw ordering (whereas opaque/alpha tested models could be improved with instancing and lazy draw ordering) and increases overdraw.
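To make the draw ordering point concrete, here is a minimal sketch in Python (not code from either engine). The `DrawItem` fields and the `build_draw_order` name are made up for illustration. It assumes opaque and alpha tested items can be grouped by material because the depth buffer resolves visibility, while alpha blended items have to be submitted back to front no matter which material they use.

```python
from dataclasses import dataclass


@dataclass
class DrawItem:
    name: str
    material: str
    distance: float  # distance from the camera (hypothetical units)
    blended: bool    # True for alpha blended, False for opaque / alpha tested


def build_draw_order(items):
    """Return the items in an order that renders correctly."""
    opaque = [i for i in items if not i.blended]
    blended = [i for i in items if i.blended]
    # Opaque and alpha tested items rely on the depth buffer, so they can be
    # grouped by material to reduce state changes; within a material they go
    # front to back to limit overdraw.
    opaque.sort(key=lambda i: (i.material, i.distance))
    # Alpha blended items must go back to front, whatever their material,
    # so materials can end up interleaved and batches cannot be merged.
    blended.sort(key=lambda i: -i.distance)
    return opaque + blended
```

With two leaf items and a glass item between them in depth, the blended part of the order comes out as leaf, glass, leaf, so the material has to be switched twice. That is the kind of cost strict ordering brings.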
Re: Rumors?
scrawl wrote: As expected, the Ogre version was constantly hitting 100% CPU on a single core. The OSG version was using ~22% total CPU distributed evenly across all 4 cores. Ogre, StaticGeometry: 8 FPS. Ogre, Entities: 5 FPS. OSG: 24 FPS.
I just did a very rough test.
The niftools were annoying (such as giving me 8 sets of corrupt uv coordinates per vertex) and 52 batches for a single tree with only 2 textures is just insane (WTF Morrowind?), so I used a 1566 tri tree mesh (2 sub entities) from blendswap.
40000 trees in the scene (so 80000 batches):
osg: 20fps with around 16% cpu usage, one core entirely maxed out (I'm on hex core with ht)
ogre 2.1: 37fps with 20% cpu usage evenly spread over all cores.
I really don't know osg (only a few hours' use, most of it spent trying to stop it from rendering to both of my monitors at the same time and in the wrong order: left half on right monitor, etc. Why is it using monitor index order to choose viewport arrangement instead of the actual display offset Windows provides?), so I'm probably doing something less than optimal. Then again, I only have a few minutes' experience with Ogre 2.1's new way of doing things.
(Note: I'm not trying to change anybody's mind, there's a lot of issues involved in the decision to change engines that have been considered. I'm just curious)
Re: Rumors?
kojack wrote: ...so I used a 1566 tri tree mesh (2 sub entities) from blendswap. 40000 trees in the scene (so 80000 batches): osg: 20fps with around 16% cpu usage, one core entirely maxed out (I'm on hex core with ht); ogre 2.1: 37fps with 20% cpu usage evenly spread over all cores. ... (Note: I'm not trying to change anybody's mind, there's a lot of issues involved in the decision to change engines that have been considered. I'm just curious)
Indeed there's a lot of issues involved, like the fact that Ogre 2.1 has certain requirements for its materials that would impact the modability of OpenMW.
I wonder if there'd be a possibility to run an optimizer pass over the models though, since I too consider 52 batches to be insane for the tree, and if we could fix that during load or run-time then it would most likely lead to substantial FPS gains.
Re: Rumors?
Quote: 52 batches for a single tree with only 2 textures is just insane (WTF Morrowind?)
That's because the leaves are alpha blended and must be depth sorted among each other (and each leaf is also divided in 4 TriShapes for more accurate sorting). Of course, you can ask why exactly they are alpha blended instead of alpha tested. That, I do not know. Still, I do know that batches are the bottleneck in MW; an overview of Balmora typically uses 1000+ batches. That's why I picked a mesh with lots of batches for testing.
Quote: I wonder if there'd be a possibility to run an optimizer pass over the models though, since I too consider 52 batches to be insane for the tree, and if we could fix that during load or run-time then it would most likely lead to substantial FPS gains.
Yes there is, and we're actually doing that already with the current Ogre backend: we have an override file that sets some meshes to alpha testing instead of alpha blending, so they can be added to a StaticGeometry object, which then merges the batches. But with alpha testing it does not look faithful to MW, so I'm getting rid of that override list for the port. Another reason we used the override list was Ogre's transparency sorting not being accurate enough, because it sorts using the scene node position instead of the bounding boxes.
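For anyone wondering what merging the batches buys, here is a rough Python sketch of the idea. It is not Ogre's StaticGeometry implementation. The data layout (each instance is a list of `(material, triangles)` pairs) and both function names are hypothetical, and a real merge would also bake each instance's transform into the vertices. It only applies to geometry that does not need per-object depth sorting, which is why the override list switched meshes to alpha testing first.

```python
from collections import defaultdict


def count_batches(instances):
    """Unmerged: one draw call per sub-mesh of every instance."""
    return sum(len(submeshes) for submeshes in instances)


def merge_by_material(instances):
    """Concatenate the triangles of all sub-meshes that share a material.

    Returns a dict mapping material name to one combined triangle list,
    so each material ends up as a single batch.
    """
    merged = defaultdict(list)
    for submeshes in instances:
        for material, triangles in submeshes:
            merged[material].extend(triangles)
    return dict(merged)


# 100 copies of a tree with 52 sub-meshes and 2 textures, as in the test
# described in this thread. The 4 / 48 split between bark and leaves is a
# made-up placeholder.
triangle = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
tree = [("bark", [triangle])] * 4 + [("leaf", [triangle])] * 48
scene = [tree] * 100
unmerged_batches = count_batches(scene)          # 5200
merged_batches = len(merge_by_material(scene))   # 2
```

The triangle count is unchanged by the merge; only the number of draw calls drops.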
Quote: osg: 20fps with around 16% cpu usage, one core entirely maxed out (I'm on hex core with ht)
I'm assuming the osg::Viewer wasn't set up to run multithreaded?
Quote: when it's just rendering one seemingly simple model? Shouldn't we be able to get hundreds of fps out of our renderer for such a simple scene?
- As mentioned, it is rendering 100 copies of the same model, not one model.
- The model is not simple, it's glaringly inefficient in its number of batches.
- The test had the tree close to the camera, so rendering it 100 times produces some overdraw, similar to doing full-screen postprocessing passes. When I zoomed out the FPS increased.
- This was on a 2011 Intel graphics system, for the record.
Re: Rumors?
scrawl wrote: But with alpha testing it does not look faithful to MW, so I'm getting rid of that override list for the port. Another reason we used the override list was ogre's transparency sorting not being accurate enough, because it sorts using the scene node position instead of the bounding boxes.
Another reason was for shadows, since you can't properly render alpha blended textures to the shadow map.
Quote: osg: 20fps with around 16% cpu usage, one core entirely maxed out (I'm on hex core with ht)
Quote: I'm assuming the osg::Viewer wasn't set up to run multithreaded?
Looking at OSG's code, it actually seems a single viewer runs GL in a single thread at a given time. At least, there are no GraphicsWindow/GraphicsContext methods I can see to create extra GL contexts separate from the window.
Re: Rumors?
Quote: osg: 20fps with around 16% cpu usage, one core entirely maxed out (I'm on hex core with ht)
Quote: I'm assuming the osg::Viewer wasn't set up to run multithreaded?
Quote: Looking at OSG's code, it actually seems a single viewer runs GL in a single thread at a given time. At least, there are no GraphicsWindow/GraphicsContext methods I can see to create extra GL contexts separate from the window.
I meant that the culling / update traversal was likely running single threaded. Having GL rendering from multiple threads does not make much sense, since the driver calls are asynchronous anyway.
Re: Rumors?
Quote: 52 batches for a single tree with only 2 textures is just insane (WTF Morrowind?)
scrawl wrote: That's because the leaves are alpha blended and must be depth sorted among each other (and each leaf is also divided in 4 TriShapes for more accurate sorting). Of course, you can ask why exactly they are alpha blended instead of alpha tested. That, I do not know. Still, I do know that batches are the bottleneck in MW; an overview of Balmora typically uses 1000+ batches. That's why I picked a mesh with lots of batches for testing.
Good point.
scrawl wrote: Another reason we used the override list was ogre's transparency sorting not being accurate enough, because it sorts using the scene node position instead of the bounding boxes.
That's the default, but if you enable Extreme Points on the mesh then Ogre sorts individual sub meshes by distance instead of using the parent entity's position.
(Call generateExtremes() on the mesh at runtime or use an exporter that saves Extreme Points in the mesh file)
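A toy illustration of why that matters, in Python rather than Ogre code. The dictionary layout and function names are hypothetical, and it assumes the depth of a sub mesh is taken from its closest extreme point; Ogre's actual computation may differ. When every sub mesh of an entity reports the parent's position, they all tie and the sort cannot separate them.

```python
import math


def depth_by_origin(camera, item):
    """Depth taken from the parent entity's position."""
    return math.dist(camera, item["origin"])


def depth_by_extremes(camera, item):
    """Depth taken from the closest extreme point (an assumption)."""
    return min(math.dist(camera, p) for p in item["extremes"])


def back_to_front(camera, items, depth):
    """Sort for alpha blending: farthest first. Ties keep input order."""
    return sorted(items, key=lambda it: depth(camera, it), reverse=True)


camera = (0.0, 0.0, 0.0)
# Two leaf sub meshes of one tree; both share the parent's origin.
leaves = [
    {"name": "leaf_front", "origin": (0, 0, 10),
     "extremes": [(0, 0, 8), (1, 0, 9)]},
    {"name": "leaf_back", "origin": (0, 0, 10),
     "extremes": [(0, 0, 11), (0, 1, 12)]},
]
by_origin = [l["name"] for l in back_to_front(camera, leaves, depth_by_origin)]
by_extremes = [l["name"] for l in back_to_front(camera, leaves, depth_by_extremes)]
```

Here `by_origin` keeps the input order, so the front leaf is drawn first and the back leaf blends over it, while `by_extremes` puts `leaf_back` first.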
It would be interesting to try something like optimising trees by sorting triangle render order (in the index buffer) by distance from the centre. Kind of like what AMD Tootle does for reducing overdraw (but that's the opposite direction, overdraw is fixed by outer triangles rendering first, transparency by inner triangles first).
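Something like the following is what I have in mind, as a rough Python sketch rather than anything tested in an engine. The function name is made up, the index buffer is assumed to be a flat list with 3 indices per triangle, and the "centre" is assumed to be the mean of all vertices (a bounding box centre would work too). It is a static, view independent approximation, so it won't be exact from every camera angle.

```python
def sort_triangles_by_radius(vertices, indices, inner_first=True):
    """Reorder an index buffer by triangle centroid distance from the centre.

    vertices: list of (x, y, z) tuples.
    indices:  flat list, 3 vertex indices per triangle.
    inner_first=True  draws inner triangles first (for transparency);
    inner_first=False draws outer triangles first (for reducing overdraw).
    """
    n = len(vertices)
    # Assumption: the mesh centre is the mean of all vertex positions.
    centre = [sum(v[k] for v in vertices) / n for k in range(3)]

    def radius_sq(tri):
        centroid = [sum(vertices[i][k] for i in tri) / 3.0 for k in range(3)]
        return sum((c - m) ** 2 for c, m in zip(centroid, centre))

    # Split the flat index list into triangles, sort, and flatten again.
    triangles = [tuple(indices[i:i + 3]) for i in range(0, len(indices), 3)]
    triangles.sort(key=radius_sq, reverse=not inner_first)
    return [i for tri in triangles for i in tri]
```

Only the index buffer changes; the vertex data stays where it is, so this could be run once at load time.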
Re: Rumors?
Skinning in osg works:
Quote: That's the default, but if you enable Extreme Points on the mesh then Ogre sorts individual sub meshes by distance instead of using the parent entity's position. (Call generateExtremes() on the mesh at runtime or use an exporter that saves Extreme Points in the mesh file)
Learned something new, thanks.