What is the status on performance?

Chris
Posts: 1625
Joined: 04 Sep 2011, 08:33

Re: What is the status on performance?

Post by Chris »

AnyOldName3 wrote: 28 Nov 2018, 22:38 Regarding occlusion planes, it looks like OSG doesn't support them at all via occlusion queries with osg::OcclusionQueryNode (as those just check against the previous frame once every few frames, not against an existing depth buffer at cull time).
You probably wouldn't want to use occlusion queries to determine whether a node should be rendered anyway. It's an obvious chicken-and-egg problem: to determine if something can be seen to render, you need to render it to tell if it can be seen. You also don't want to block and wait on an occlusion query to complete, as that will kill performance (the entire purpose of those queries is that they happen asynchronously on the hardware; anything you use those queries for needs to be able to take a few frames before utilizing the results). The only thing that can help is to reduce the effectiveness of the occlusion plane, e.g. start rendering some object when it's near the edge of the plane, essentially reducing the size of the plane... but even that's not fool-proof for fast-moving objects.

Occlusion planes, done "properly", would function on the CPU side, similarly to frustum culling. The test should be quick and approximate, so it won't do perfect culling against each individual trimesh (especially if the occlusion plane is relatively small in screen space). Occlusion culling relies on big continuous visual obstructions that are easy to test against and that take out large portions of the node tree. With a world like Morrowind, I don't see much in the way of easy, effective testing, especially as you move around cities. Sure, there may be particular spots where you can see a benefit from occlusion planes, but players are more likely to be wandering around all over, where the occlusion tests become a lot more difficult to do effectively. And if it takes more time to figure out what can be culled than to just draw everything, there's no point in culling.
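
To make the "similar to frustum culling" idea concrete, here is a minimal sketch of one way a CPU-side occlusion plane test could look. It is not OSG or OpenMW code: Vec3, Plane, buildOccluderVolume and sphereFullyOccluded are all hypothetical names, and it assumes the occluder is a convex planar quad and that objects are approximated by bounding spheres. The eye and the quad define a volume of "hidden" space, and a sphere is culled only if it lies entirely inside that volume.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// All names below are hypothetical, for illustration only.
struct Vec3 { double x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// A point p is on the "inside" of the plane when dot(n, p) + d >= 0.
struct Plane { Vec3 n; double d; };

static Plane makePlane(Vec3 normal, Vec3 pointOnPlane)
{
    const double len = std::sqrt(dot(normal, normal));
    assert(len > 0.0 && "degenerate occluder edge or eye on the occluder");
    const Vec3 n{normal.x / len, normal.y / len, normal.z / len};
    return {n, -dot(n, pointOnPlane)};
}

static double signedDistance(const Plane& p, Vec3 pt) { return dot(p.n, pt) + p.d; }
static Plane flipped(const Plane& p) { return {{-p.n.x, -p.n.y, -p.n.z}, -p.d}; }

// Builds the volume hidden behind a convex planar quad as seen from 'eye':
// one plane for the quad itself plus one side plane per edge through the eye.
std::vector<Plane> buildOccluderVolume(Vec3 eye, const std::array<Vec3, 4>& quad)
{
    const Vec3 centre{(quad[0].x + quad[1].x + quad[2].x + quad[3].x) / 4.0,
                      (quad[0].y + quad[1].y + quad[2].y + quad[3].y) / 4.0,
                      (quad[0].z + quad[1].z + quad[2].z + quad[3].z) / 4.0};
    std::vector<Plane> planes;

    // The quad's own plane, oriented so the eye is on the outside.
    Plane front = makePlane(cross(sub(quad[1], quad[0]), sub(quad[2], quad[0])), quad[0]);
    if (signedDistance(front, eye) > 0.0)
        front = flipped(front);
    planes.push_back(front);

    // Side planes, oriented so the quad's centre is on the inside.
    for (std::size_t i = 0; i < 4; ++i)
    {
        Plane side = makePlane(cross(sub(quad[i], eye), sub(quad[(i + 1) % 4], eye)), eye);
        if (signedDistance(side, centre) < 0.0)
            side = flipped(side);
        planes.push_back(side);
    }
    return planes;
}

// Conservative test: true only if the whole sphere is inside every plane.
bool sphereFullyOccluded(const std::vector<Plane>& planes, Vec3 centre, double radius)
{
    assert(radius >= 0.0);
    for (const Plane& p : planes)
        if (signedDistance(p, centre) < radius)
            return false;
    return true;
}
```

The cost is five plane tests per occluder per node, which is cheap for one big wall but adds up quickly with many small occluders, which is the trade-off described above.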
JDGBOLT
Posts: 21
Joined: 05 Apr 2018, 19:52

Re: What is the status on performance?

Post by JDGBOLT »

I recall someone mentioning https://github.com/gigc/Janua at some point, for generating and using occlusion maps. When I looked into it, it appeared to create a voxel map of the geometry when building the occlusion map. With Recast we already do something similar, which had me wondering whether that could be useful here. Just something I noticed; no idea whether it's feasible.
AnyOldName3
Posts: 2668
Joined: 26 Nov 2015, 03:25

Re: What is the status on performance?

Post by AnyOldName3 »

It's not a chicken-and-egg problem - you do occlusion queries by rendering a bounding box of a big chunk of the scene, not all the individual objects as full meshes. UE4 does all its occlusion culling with occlusion queries exactly as I described and none with the CPU. As long as there's something else that you can do the cull traversal of in the meantime, the latency of the queries isn't a huge issue, and even if you can't, drawing an AABB and querying it takes way less time than drawing the tens or hundreds of objects contained within the AABB, especially if some of them need skinning, too.
Chris
Posts: 1625
Joined: 04 Sep 2011, 08:33

Re: What is the status on performance?

Post by Chris »

AnyOldName3 wrote: 29 Nov 2018, 14:57 It's not a chicken-and-egg problem - you do occlusion queries by rendering a bounding box of a big chunk of the scene, not all the individual objects as full meshes.
Whether you draw the bounding box or the mesh, you still need to draw to tell if you should draw. It may be cheaper, but the draws are still executed asynchronously from the issuing of the draw commands. And as you've seen, occlusion queries can take a frame or two to complete, so there's always the possibility of pop-in if you split up those two tasks.
AnyOldName3 wrote: As long as there's something else that you can do the cull traversal of in the meantime
It's not figuring out what to cull that's the issue. The issue is realizing what not to cull, given a moving scene. If you're relying on delayed occlusion queries, there's no way to guarantee what the query says was occluded is still occluded. You need some other more up-to-date test to override the delayed query data, defeating the purpose of the delayed query.

I suppose one possibility would be to use the delayed query results, and for everything it says is occluded, retest those AABBs on the CPU prior to drawing. That might cut down on the amount of work while ensuring visible things are drawn, though I'd question if it's a performance win, especially given an open world where your view is purposely unobstructed (is the added work for doing the occlusion tests going to be less than the amount of work saved from skipping what's ultimately occluded?).
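
As a rough sketch of that idea (not something OSG provides; DelayedQuery, Decision and decide are hypothetical names, and the maxAgeFrames cutoff is an extra assumption of mine), the per-chunk decision could look like this. The CPU retest is passed in as a callback so it only runs for chunks the delayed query reported as occluded:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical bookkeeping for one chunk's most recent GPU occlusion query.
struct DelayedQuery
{
    bool hasResult = false;        // has any query result arrived yet?
    bool occluded = false;         // what that (possibly stale) result said
    std::uint64_t issuedFrame = 0; // frame on which the query was issued
};

enum class Decision { Draw, Skip };

// Decide whether to draw a chunk this frame.
// cpuStillOccluded is the cheap, up-to-date CPU retest; it is only invoked
// when the delayed GPU result claims the chunk is hidden.
Decision decide(const DelayedQuery& q, std::uint64_t currentFrame,
                std::uint64_t maxAgeFrames,
                const std::function<bool()>& cpuStillOccluded)
{
    assert(currentFrame >= q.issuedFrame);

    if (!q.hasResult)
        return Decision::Draw; // nothing known yet: draw to be safe
    if (!q.occluded)
        return Decision::Draw; // GPU saw it: no retest needed
    if (currentFrame - q.issuedFrame > maxAgeFrames)
        return Decision::Draw; // result too old to trust (assumed cutoff)

    // The GPU said "hidden" a few frames ago; confirm with current data.
    return cpuStillOccluded() ? Decision::Skip : Decision::Draw;
}
```

Every path that is in doubt falls back to drawing, so the worst case is wasted work rather than pop-in. Whether the retests cost less than the draws they save is exactly the open question.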
AnyOldName3 wrote: drawing an AABB and querying it takes way less time than drawing the tens or hundreds of objects contained within the AABB, especially if some of them need skinning, too.
Since it needs to draw those tens or hundreds of objects as part of the visual scene, the AABB draws are still going to see delays. The command queue and hardware still have to process what they already have, and will get to the AABB draws in their own time.

Doing occlusion culling wholly on the GPU would rely on tighter integration with the GPU pipeline, where node traversal can be done on the GPU and it can serialize the occlusion queries and issuance of draw commands. We don't have that with OpenGL 2 or 3 since we don't have indirect draws or anything, we'd instead rely on synchronizing data between the CPU and GPU which is very costly. If we were using OpenGL 4 or Vulkan (and OSG was capable of it), it might be possible to do the occlusion queries and set up the draws on the GPU, but without it there's unavoidable latency with the query.
AnyOldName3
Posts: 2668
Joined: 26 Nov 2015, 03:25

Re: What is the status on performance?

Post by AnyOldName3 »

Upon further research, I've realised that (depending on stuff like drivers and how many FBOs are in use) there are probably far fewer hard syncs in our rendering system than I thought. I was under the impression that, for example, you couldn't swap buffers without triggering a sync, and that you couldn't issue a drawing command that used an FBO as an input texture without stalling on everything being rendered to that FBO. That means doing same-frame occlusion queries can still introduce massive stalls even when perfectly organised (although, given the number of things OpenMW needs to do each frame, it's unlikely that there'd be more than a frame's worth of delay, so it may or may not work out faster). UE4 manages same-frame occlusion queries, but doesn't run well on low-end hardware.

That makes doing basically the same thing in software a more viable alternative. As CPUs have had vector operations for years now, this can apparently still be way faster than culling and drawing redundant geometry, especially if you have a few big queries. Unity apparently does this, so it's got proof-of-utility.
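
For illustration only, here is a much-simplified scalar sketch of what "the same thing in software" could mean: a tiny CPU depth buffer that occluders are written into and that bounding boxes are tested against. DepthBuffer and ScreenRect are hypothetical names, and the sketch assumes occluders and test boxes have already been projected to screen-space rectangles with a single conservative depth each. A real implementation would rasterise triangles and use the vector instructions mentioned above; this is not how Unity or any particular engine does it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical pixel rectangle: x0/y0 inclusive, x1/y1 exclusive.
// Assumed convention: occluder rects are rounded inwards and test rects are
// rounded outwards by the caller, so the result stays conservative.
struct ScreenRect { int x0, y0, x1, y1; };

// Hypothetical low-resolution CPU depth buffer (larger depth = farther away).
class DepthBuffer
{
public:
    DepthBuffer(int width, int height)
        : mWidth(width), mHeight(height)
    {
        assert(width > 0 && height > 0);
        mDepth.assign(static_cast<std::size_t>(width) * static_cast<std::size_t>(height),
                      std::numeric_limits<float>::infinity());
    }

    // Record an occluder. Using its farthest depth means it never hides
    // more than it really does.
    void addOccluder(ScreenRect r, float farthestDepth)
    {
        clamp(r);
        for (int y = r.y0; y < r.y1; ++y)
            for (int x = r.x0; x < r.x1; ++x)
                at(x, y) = std::min(at(x, y), farthestDepth);
    }

    // True only if every on-screen pixel of the rect already holds an
    // occluder nearer than the object's nearest depth.
    bool isOccluded(ScreenRect r, float nearestDepth) const
    {
        clamp(r);
        if (r.x0 >= r.x1 || r.y0 >= r.y1)
            return false; // empty or off-screen: leave it to frustum culling
        for (int y = r.y0; y < r.y1; ++y)
            for (int x = r.x0; x < r.x1; ++x)
                if (nearestDepth <= at(x, y))
                    return false; // an uncovered or nearer pixel: may be visible
        return true;
    }

private:
    void clamp(ScreenRect& r) const
    {
        r.x0 = std::max(r.x0, 0);
        r.y0 = std::max(r.y0, 0);
        r.x1 = std::min(r.x1, mWidth);
        r.y1 = std::min(r.y1, mHeight);
    }
    std::size_t index(int x, int y) const
    {
        return static_cast<std::size_t>(y) * static_cast<std::size_t>(mWidth)
             + static_cast<std::size_t>(x);
    }
    float& at(int x, int y) { return mDepth[index(x, y)]; }
    float at(int x, int y) const { return mDepth[index(x, y)]; }

    int mWidth;
    int mHeight;
    std::vector<float> mDepth;
};
```

Because everything stays on the CPU, the answer is available in the same frame with no GPU round trip, which is the appeal. The price is paid in cull-traversal time, and the inner loops are where the vector operations would pay off.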
psi29a
Posts: 5356
Joined: 29 Sep 2011, 10:13
Location: Belgium
Gitlab profile: https://gitlab.com/psi29a/

Re: What is the status on performance?

Post by psi29a »

How useful would it be to throw this onto a CPU thread whose sole responsibility is this kind of thing? Or would the overhead (delays/stalls in the rendering thread) nullify any gains? (Or do I have this right?)
Chris
Posts: 1625
Joined: 04 Sep 2011, 08:33

Re: What is the status on performance?

Post by Chris »

psi29a wrote: 29 Nov 2018, 21:54 How useful would it be to throw this onto a CPU thread whose sole responsibility is this kind of thing? Or would the overhead (delays/stalls in the rendering thread) nullify any gains? (Or do I have this right?)
OSG does its culling prior to drawing on its own render thread. There's not much sense putting the occlusion culling on another separate thread since drawing depends on the culling results.