Vertex cache optimization
Posted: 23 Dec 2017, 19:17
I am the emissary of the great and mighty Greatness7, who has discovered that reordering the faces in a .nif file leads to significant performance increases in both vanilla and OpenMW (or any game; this is apparently a well-known concept). This is a feature that could be added to the mesh loader. In his test, I got 440 FPS with regular .nifs and 720 FPS with optimized ones (in a cell full of Suzannas).
Apparently the GPU keeps a cache of recently seen vertices, and it loads from that cache faster than it does from VRAM. Reordering the faces lets it hit the cache more often. Imagine a mesh of a pizza with one vertex at the center that connects all the slices. If the faces are ordered so that the slices sharing that center vertex are adjacent to each other, the GPU can keep the center vertex in the cache instead of loading it again for every slice. The same goes for the crust: slices that are next to each other share crust vertices, so drawing them back to back reuses those too.
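To make the pizza example concrete, here is a rough Python sketch that counts cache misses for the same fan mesh with its faces in order and with them shuffled. Everything in it is an assumption for illustration: the helper names `acmr` and `pizza` are made up, the cache is modeled as a simple 16-entry FIFO, and real GPUs behave differently and vary between vendors. It only measures a given face order; it does not do the reordering itself.

```python
from collections import deque
import random


def acmr(triangles, cache_size=16):
    """Average cache misses per triangle, using a toy FIFO vertex cache.

    The FIFO model and the default size of 16 are assumptions, not a
    description of any particular GPU.
    """
    cache = deque(maxlen=cache_size)
    misses = 0
    for tri in triangles:
        for v in tri:
            if v not in cache:
                misses += 1
                cache.append(v)  # oldest entry drops out once the cache is full
    return misses / len(triangles)


def pizza(slices):
    """Fan mesh: vertex 0 is the center, vertices 1..slices sit on the crust."""
    return [(0, 1 + i, 1 + (i + 1) % slices) for i in range(slices)]


ordered = pizza(64)          # neighbouring slices are adjacent in the list
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)  # same faces, arbitrary order

print("ordered :", acmr(ordered))
print("shuffled:", acmr(shuffled))
```

A lower number is better: every miss is a vertex the GPU has to load again. With the slices in order, nearly every triangle only needs one new crust vertex, while the shuffled list keeps asking for vertices that have already been pushed out of the cache.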
More specifics can be found here:
https://tomforsyth1000.github.io/papers ... e_opt.html
http://gameangst.com/?p=9
Here's the test file with instructions: https://a.safe.moe/hwS9Y.7z