CPU and Single Core Implications

CMAugust
Posts: 110
Joined: 10 Jan 2016, 00:13

Re: CPU and Single Core Implications

Post by CMAugust » 27 Dec 2017, 12:42

Decided to test in a different region. This time around, it's a small win for the performance patch. I picked the large tree near the center of the image for the scene graph. Maybe this will help diagnose the Bitter Coast example.

With patch (scene graph)
Without patch (scene graph)

scrawl
Posts: 2152
Joined: 18 Feb 2012, 11:51

Re: CPU and Single Core Implications

Post by scrawl » 27 Dec 2017, 22:23

The scene graph looks fine. I'm not going to take too close a look though, because the version of OSG used to generate the file has that bug that stops it from indenting lines, which makes it a real pain to read. For another metric, one could look at the number of drawables in the last F3 panel.

Are you using FPS limiting on the optimized screenshot? It looks like there is some idle time in between draw phases. The real speedup, going by ms, looks to be 21% on the draw thread and 34% on the GPU thread. That's not bad, but I guess I expected more :( Well, the tree is still kinda inefficient, having 5 different textures.

Regarding the claim on the mod's page that FPS in Solstheim doubles: that may well be the case in the original engine, but there shouldn't be a difference in OpenMW, because we already optimize Solstheim trees, last I checked (they have, for some odd reason, both alpha testing and blending enabled).

CMAugust
Posts: 110
Joined: 10 Jan 2016, 00:13

Re: CPU and Single Core Implications

Post by CMAugust » 28 Dec 2017, 00:26

scrawl wrote:
27 Dec 2017, 22:23
The scene graph looks fine. I'm not going to take too close a look though, because the version of OSG used to generate the file has that bug that stops it from indenting lines, which makes it a real pain to read.
Can a fix be submitted and merged upstream if it makes the graphs easier to review?
For another metric one could look at the number of drawables in the last F3 panel.
Done.

Ascadian Isles (with patch / without patch)
Bitter Coast (with patch / without patch)

(edit: had the region names switched by mistake)
Are you using FPS limiting on the optimized screenshot? It looks like there is some idle time in between draw phases.
Sorry, you're right, I had the framerate limiter set to 255. I thought it wouldn't affect the results since it didn't come near that in my tests. Is that not the case?
The real speedup, going by ms, looks to be 21% on the draw thread and 34% on the GPU thread. That's not bad, but I guess I expected more :( Well, the tree is still kinda inefficient, having 5 different textures.
Yes - certainly for Morrowind's trees. But even a more modern tree such as Skyrim's ubiquitous treepineforest01.nif still makes use of 4 textures (branch, bark and their _n variants). Is there any more feedback I can give to the mod author that may produce a more optimal tree?
Regarding the claim on the mod's page that FPS in Solstheim doubles: that may well be the case in the original engine, but there shouldn't be a difference in OpenMW, because we already optimize Solstheim trees, last I checked (they have, for some odd reason, both alpha testing and blending enabled).
Seems you're right. Made a save from on high to compare them, with a scene graph of "flora_tree_bm_snow_02".

Solstheim with patch (scene graph)
Solstheim without patch (scene graph)
Last edited by CMAugust on 28 Dec 2017, 03:19, edited 1 time in total.

scrawl
Posts: 2152
Joined: 18 Feb 2012, 11:51

Re: CPU and Single Core Implications

Post by scrawl » 28 Dec 2017, 01:40

Can a fix be submitted and merged upstream if it makes the graphs easier to review?
It's already fixed, just not in all versions. If someone wants to pick that into our OSG fork (is it not fixed there?), please do.
Bitter Coast (with patch / without patch)
Ascadian Isles (with patch / without patch)
Now, that scene gets a boost that's more like it:
Draw 6 vs 12.1
GPU 4.14 vs 9.94
So more than twice as fast?
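To make the "more than twice as fast" estimate concrete, here is a minimal sketch of the arithmetic. It assumes the figures quoted above are per-frame times in milliseconds, with the lower number belonging to the patched run; the function name is made up for illustration.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the 'after' timing is than 'before'."""
    return before_ms / after_ms


# Figures quoted in the post: unpatched vs. patched (assumed to be ms per frame).
draw_speedup = speedup(12.1, 6.0)
gpu_speedup = speedup(9.94, 4.14)

print(f"Draw: {draw_speedup:.2f}x faster")
print(f"GPU:  {gpu_speedup:.2f}x faster")
```

On those numbers the draw thread comes out at roughly 2.0x and the GPU at roughly 2.4x.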
But even a more modern tree such as Skyrim's ubiquitous treepineforest01.nif still makes use of 4 textures (branch, bark and their _n variants).
Ok, but you can't compare just the number of textures:

Code:

bind bark.dds
bind bark_n.dds
draw mesh
bind branch.dds
bind branch_n.dds
draw mesh
vs.

Code:

bind 1.dds
draw mesh
bind 2.dds
draw mesh
bind 3.dds
draw mesh
bind 4.dds
draw mesh
Guess which one is slower.
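As a rough illustration of the point, the two command lists above can be tallied like this. It is only a counting sketch (the helper name is hypothetical and it says nothing about the real cost of each call), but it shows that both trees bind four textures while the second one issues twice as many draws.

```python
def count_commands(commands):
    """Count texture binds and draw calls in a list of pseudo-commands."""
    binds = sum(1 for c in commands if c.startswith("bind "))
    draws = sum(1 for c in commands if c.startswith("draw "))
    return binds, draws


# Two materials, each with a diffuse texture and a normal map.
two_materials = [
    "bind bark.dds", "bind bark_n.dds", "draw mesh",
    "bind branch.dds", "bind branch_n.dds", "draw mesh",
]

# Four materials, each with a single texture.
four_materials = [
    "bind 1.dds", "draw mesh",
    "bind 2.dds", "draw mesh",
    "bind 3.dds", "draw mesh",
    "bind 4.dds", "draw mesh",
]

print(count_commands(two_materials))   # (binds, draws) for the first list
print(count_commands(four_materials))  # (binds, draws) for the second list
```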

But I think the mod author did as well as they could without redoing the whole model. Good job !
Sorry, you're right, I had the framerate limiter set to 255. I thought it wouldn't affect the results since it didn't come near that in my tests. Is that not the case?
Frame limiting can do weird things because it puts the CPU to sleep, and CPUs don't like to wake up at a precise time; they like to switch to another thread and get back when they're finished. In my experience this makes the actual FPS you get a little bit lower than the limit you set.

Anyway, if your limit was 255 and the fps was 129, the framelimiter could hardly be active. But still, there seems to be some strange idle time in between the draw phases on that one picture.
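A minimal sketch of a generic sleep-based frame limiter, to illustrate both remarks. This is not OpenMW's actual limiter; the function names are made up, and the only library calls used are Python's `time.sleep` and `time.perf_counter`.

```python
import time


def sleep_needed(frame_seconds: float, limit_fps: float) -> float:
    """Seconds a limiter would sleep after a frame that took frame_seconds."""
    budget = 1.0 / limit_fps
    return max(0.0, budget - frame_seconds)


def run_limited(limit_fps: float, frames: int) -> float:
    """Run empty frames under a sleep-based limiter and return the achieved FPS."""
    budget = 1.0 / limit_fps
    start = time.perf_counter()
    for _ in range(frames):
        frame_start = time.perf_counter()
        # Real frame work (update, cull, draw) would happen here.
        remaining = budget - (time.perf_counter() - frame_start)
        if remaining > 0:
            # The OS may wake the thread later than requested, never on the dot.
            time.sleep(remaining)
    elapsed = time.perf_counter() - start
    return frames / elapsed


# A frame rendered at 129 FPS already exceeds the budget of a 255 FPS limit,
# so the limiter has nothing to sleep for.
print(sleep_needed(1 / 129, 255))

# With nearly empty frames the limiter is active on every frame.
print(f"achieved: {run_limited(100, 20):.1f} FPS with a limit of 100")
```

Because the sleep can overshoot but is not expected to undershoot, the achieved rate in the second case typically lands at or a little below the limit.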

CMAugust
Posts: 110
Joined: 10 Jan 2016, 00:13

Re: CPU and Single Core Implications

Post by CMAugust » 28 Dec 2017, 04:48

scrawl wrote:
28 Dec 2017, 01:40
(is it not fixed there?).
I'm using Ace's nightly build 7245b251e8 which is fairly recent, so probably not.
Frame limiting can do weird things because it puts the CPU to sleep, and CPUs don't like to wake up at a precise time; they like to switch to another thread and get back when they're finished. In my experience this makes the actual FPS you get a little bit lower than the limit you set.

Anyway, if your limit was 255 and the fps was 129, the framelimiter could hardly be active. But still, there seems to be some strange idle time in between the draw phases on that one picture.
I've gone back to that area several times (just north of Hla Oad) without any frame limiting and still get the unexplained idle time. Very odd. https://imgur.com/Y1KdEXe

I've since done tests in another area of the Bitter Coast (see below) where the idle doesn't appear, and the patched version is slightly ahead again.
Ok, but you can't compare just the number of textures:
(code)
vs.
(code)
Guess which one is slower.
You're right, I didn't fully understand what merits a draw. Now that you put it that way, it does make sense. Speaking of which:
Now, that scene gets a boost that's more like it:
Draw 6 vs 12.1
GPU 4.14 vs 9.94
So more than twice as fast?
Looks like it. Did some more levitation tests and the difference in Ascadian Isles is dramatic. This may be down to trees like Flora_tree_01 using only two textures instead of five.

Ascadian Isles (with patch / without patch)
Bitter Coast (with patch / without patch)

CMAugust
Posts: 110
Joined: 10 Jan 2016, 00:13

Re: CPU and Single Core Implications

Post by CMAugust » 29 Dec 2017, 14:16

scrawl wrote:
28 Dec 2017, 01:40
But still, there seems to be some strange idle time in between the draw phases on that one picture.
It's even more apparent when looking up. To be clear, vsync and the frame limiter are off, as you can see when I enter a nearby interior. From what I can tell so far, the oddity only manifests outdoors. Guess I never noticed because OpenMW rarely goes above 60 in the test areas with the full profiler active. Let me know if I should submit a bug report to the tracker.

Meanwhile, the optimization patch has updated since my last test and the Bitter Coast has seen big gains. (the visual difference in the trees between patch 1.0 and final is due to edited normals, so they are now more correctly lit by the westering sun)

Area #1: vanilla / patch 1.0 / patch final
Area #2: vanilla / patch 1.0 / patch final

All in all a great result, and demonstrates what kind of difference optimized assets can make to the game.

ajira2
Posts: 25
Joined: 30 Oct 2017, 14:27

Re: CPU and Single Core Implications

Post by ajira2 » 29 Dec 2017, 15:47

I'm now using an Nvidia GTX 690.

We'll see how it goes with an FX 9590 CPU and new liquid cooling (incoming).

I'll try Vurt's grass and that performance mod.

I'm planning on buying two GTX 1080 Ti cards in SLI and I plan to beat the crap out of OpenMW xD I hope I'll get 60 FPS everywhere with 5 cells or so loaded, or more, and a heap of mods xD Let's start slowly... (tweaking a few things at a time)

I'll post the results nevertheless. We'll see where it bottlenecks.

Take a seat though. I've saved just 300€ for the new GPUs... :mrgreen:

scrawl
Posts: 2152
Joined: 18 Feb 2012, 11:51

Re: CPU and Single Core Implications

Post by scrawl » 29 Dec 2017, 17:52

strange idle time in between the draw phases on that one picture.
I only meant on this one, the pause between each yellow bar.

On the other screens where you have all F3 panels enabled, it's understandable the FPS takes a hit because all objects in the scene (even non-visible ones) are being counted. This appears to happen between Update and Cull and is not registered on the profiling graph.

@ajira2, instead of SLI you should put the money for the 2nd GPU into a new motherboard and an Intel CPU. OpenMW is CPU-limited in most setups. Also, SLI wouldn't work at all if you don't have an SLI profile for the game, which only Nvidia knows how to make.

CMAugust
Posts: 110
Joined: 10 Jan 2016, 00:13

Re: CPU and Single Core Implications

Post by CMAugust » 31 Dec 2017, 10:53

scrawl wrote:
29 Dec 2017, 17:52
On the other screens where you have all F3 panels enabled, it's understandable the FPS takes a hit because all objects in the scene (even non-visible ones) are being counted. This appears to happen between Update and Cull and is not registered on the profiling graph.
Understood, that makes sense.

I had hoped the optimized assets would allow for a stutter-free experience with distant land, but unfortunately that's still not the case. Actually, simply turning the player character around on the spot will produce a pronounced judder with distant terrain enabled, even though the framerate is 100 or better when the hiccup is over. My only guess as to what's happening is something related to this:
Of course, what ends up being rendered is still culled by the view frustum of the currently active camera.
I'm happy to be corrected though.

ajira2
Posts: 25
Joined: 30 Oct 2017, 14:27

Re: CPU and Single Core Implications

Post by ajira2 » 08 Jan 2018, 14:05

I can't find the performance patch with optimized assets or something like that.

Used to be in this thread, right?

Please can you repost it?
