Re: OpenMW 2023.04.10: crash on startup

Posted: 23 Apr 2023, 05:06
by franley
Thanks, that does make sense.

Fortunately, it does happen on every run, so I should be able to play around with it in gdb.

Here is the stack frame on catching the segv:

Code:

(gdb) where
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff67c7faf in osg::ClipControl::apply(osg::State&) const ()
   from /gnu/store/c1kn2raiqyiywacjk0vpbzxpysvmd6kp-openscenegraph-3.6-1.a827840/lib/libosgd.so.161
#2  0x0000000001895425 in osg::State::applyAttribute (this=0x3450eb0, 
    attribute=0x74f5830, as=...)
    at /gnu/store/p14inh5wi6dyq7h81kjcs2pd1a5z0lfv-openscenegraph-3.6-1.a827840/include/osg/State:1190
#3  0x00007ffff694b65f in osg::State::applyAttributeList(std::map<std::pair<osg::StateAttribute::Type, unsigned int>, osg::State::AttributeStack, std::less<std::pair<osg::StateAttribute::Type, unsigned int> >, std::allocator<std::pair<std::pair<osg::StateAttribute::Type, unsigned int> const, osg::State::AttributeStack> > >&, std::map<std::pair<osg::StateAttribute::Type, unsigned int>, std::pair<osg::ref_ptr<osg::StateAttribute>, unsigned int>, std::less<std::pair<osg::StateAttribute::Type, unsigned int> >, std::allocator<std::pair<std::pair<osg::StateAttribute::Type, unsigned int> const, std::pair<osg::ref_ptr<osg::StateAttribute>, unsigned int> > > > const&) ()
   from /gnu/store/c1kn2raiqyiywacjk0vpbzxpysvmd6kp-openscenegraph-3.6-1.a827840/lib/libosgd.so.161
#4  0x00007ffff6943457 in osg::State::apply(osg::StateSet const*) ()
   from /gnu/store/c1kn2raiqyiywacjk0vpbzxpysvmd6kp-openscenegraph-3.6-1.a827840/lib/libosgd.so.161
#5  0x00007ffff70ebd4a in osgUtil::RenderLeaf::render(osg::RenderInfo&, osgUtil::RenderLeaf*) ()
   from /gnu/store/c1kn2raiqyiywacjk0vpbzxpysvmd6kp-openscenegraph-3.6-1.a827840/lib/libosgUtild.so.161
#6  0x00007ffff70de917 in osgUtil::RenderBin::drawImplementation(osg::RenderInfo&, osgUtil::RenderLeaf*&) ()
   from /gnu/store/c1kn2raiqyiywacjk0vpbzxpysvmd6kp-openscenegraph-3.6-1.a827840/lib/libosgUtild.so.161
#7  0x00007ffff70f3425 in osgUtil::RenderStage::drawImplementation(osg::RenderInfo&, osgUtil::RenderLeaf*&) ()
   from /gnu/store/c1kn2raiqyiywacjk0vpbzxpysvmd6kp-openscenegraph-3.6-1.a827840/lib/libosgUtild.so.161
#8  0x00007ffff70de601 in osgUtil::RenderBin::draw(osg::RenderInfo&, osgUtil::RenderLeaf*&) ()
   from /gnu/store/c1kn2raiqyiywacjk0vpbzxpysvmd6kp-openscenegraph-3.6-1.a827840/lib/libosgUtild.so.161
#9  0x00007ffff70f15cc in osgUtil::RenderStage::drawInner(osg::RenderInfo&, osgUtil::RenderLeaf*&, bool&) ()
   from /gnu/store/c1kn2raiqyiywacjk0vpbzxpysvmd6kp-openscenegraph-3.6-1.a827840/lib/libosgUtild.so.161
#10 0x00007ffff70f2b1f in osgUtil::RenderStage::draw(osg::RenderInfo&, osgUtil::RenderLeaf*&) ()
   from /gnu/store/c1kn2raiqyiywacjk0vpbzxpysvmd6kp-openscenegraph-3.6-1.a827840/lib/libosgUtild.so.161
#11 0x00007ffff70edbd5 in osgUtil::RenderStage::drawPreRenderStages(osg::RenderInfo&, osgUtil::RenderLeaf*&) ()
   from /gnu/store/c1kn2raiqyiywacjk0vpbzxpysvmd6kp-openscenegraph-3.6-1.a827840/lib/libosgUtild.so.161
#12 0x00007ffff710612a in osgUtil::SceneView::draw() ()
   from /gnu/store/c1kn2raiqyiywacjk0vpbzxpysvmd6kp-openscenegraph-3.6-1.a827840/lib/libosgUtild.so.161
#13 0x00007ffff7d79411 in osgViewer::Renderer::draw() ()
   from /gnu/store/c1kn2raiqyiywacjk0vpbzxpysvmd6kp-openscenegraph-3.6-1.a827840/lib/libosgViewerd.so.161
#14 0x00007ffff7d7a89b in osgViewer::Renderer::operator()(osg::GraphicsContext*) ()
   from /gnu/store/c1kn2raiqyiywacjk0vpbzxpysvmd6kp-openscenegraph-3.6-1.a827840/lib/libosgViewerd.so.161
#15 0x00007ffff6858a2c in osg::GraphicsContext::runOperations() ()
   from /gnu/store/c1kn2raiqyiywacjk0vpbzxpysvmd6kp-openscenegraph-3.6-1.a827840/lib/libosgd.so.161
#16 0x00007ffff686300d in osg::RunOperations::operator()(osg::GraphicsContext*) ()
Nothing surprising, same as in the crash dump.

Here is the disassembly of osg::ClipControl::apply at time of crash:

Code:

(gdb) disass 0x00007ffff67c7faf
Dump of assembler code for function _ZNK3osg11ClipControl5applyERNS_5StateE:
   0x00007ffff67c7f60 <+0>:	push   %rbp
   0x00007ffff67c7f61 <+1>:	mov    %rsp,%rbp
   0x00007ffff67c7f64 <+4>:	sub    $0x20,%rsp
   0x00007ffff67c7f68 <+8>:	mov    %rdi,-0x8(%rbp)
   0x00007ffff67c7f6c <+12>:	mov    %rsi,-0x10(%rbp)
   0x00007ffff67c7f70 <+16>:	mov    -0x8(%rbp),%rax
   0x00007ffff67c7f74 <+20>:	mov    %rax,-0x20(%rbp)
   0x00007ffff67c7f78 <+24>:	mov    -0x10(%rbp),%rdi
   0x00007ffff67c7f7c <+28>:	call   0x7ffff6748ea0 <_ZN3osg5State3getINS_12GLExtensionsEEEPT_v@plt>
   0x00007ffff67c7f81 <+33>:	mov    %rax,-0x18(%rbp)
   0x00007ffff67c7f85 <+37>:	mov    -0x18(%rbp),%rax
   0x00007ffff67c7f89 <+41>:	testb  $0x1,0x2e(%rax)
   0x00007ffff67c7f8d <+45>:	jne    0x7ffff67c7f98 <_ZNK3osg11ClipControl5applyERNS_5StateE+56>
   0x00007ffff67c7f93 <+51>:	jmp    0x7ffff67c7faf <_ZNK3osg11ClipControl5applyERNS_5StateE+79>
   0x00007ffff67c7f98 <+56>:	mov    -0x20(%rbp),%rcx
   0x00007ffff67c7f9c <+60>:	mov    -0x18(%rbp),%rax
   0x00007ffff67c7fa0 <+64>:	mov    0x358(%rax),%rax
   0x00007ffff67c7fa7 <+71>:	mov    0x78(%rcx),%edi
   0x00007ffff67c7faa <+74>:	mov    0x7c(%rcx),%esi
   0x00007ffff67c7fad <+77>:	call   *%rax
   0x00007ffff67c7faf <+79>:	add    $0x20,%rsp
   0x00007ffff67c7fb3 <+83>:	pop    %rbp
   0x00007ffff67c7fb4 <+84>:	ret    
My assembly lang / gdb fu is not particularly advanced at the moment, so any guidance you have on what would be good to examine here is appreciated.

My initial thought is to find out which address is being corrupted; presumably it's somewhere in apply itself or in a value being moved into a register. Then, assuming the address is stable, I could set a watchpoint on it when the program starts and see whether the corruption happens at runtime before the crash or is already present when the library is loaded.

Re: OpenMW 2023.04.10: crash on startup

Posted: 28 Apr 2023, 19:36
by krogg
psi29a wrote: 18 Apr 2023, 12:45 Normally the git revision is listed in the launcher. But you can see that by running `git rev-parse HEAD`
Not sure what is meant by launcher?
I didn't know about that command. That's new. Thank you. Reading more, I discovered that it won't work this time because the first thing I do after fetching from git is to remove the porky .git directory. :o
psi29a wrote: 18 Apr 2023, 12:45 Interesting... have you tried one of our flatpak RCs or generic linux builds? Just to first validate if they crash too?
Flatpaks have never worked (at least for me) in LFS. I expect there are non-obvious hidden dependencies. I may have better luck with a "generic linux build," assuming it's some sort of already compiled binary?

Re: OpenMW 2023.04.10: crash on startup

Posted: 29 Apr 2023, 14:30
by AnyOldName3
The launcher should be a binary you build at the same time as OpenMW called openmw-launcher.

Re: OpenMW 2023.04.10: crash on startup

Posted: 02 May 2023, 03:05
by krogg
AnyOldName3 wrote: 29 Apr 2023, 14:30 The launcher should be a binary you build at the same time as OpenMW called openmw-launcher.
Whoops, I thought you meant some kind of git launcher thingie. I stand corrected.

EDIT: rebooted to the LFS system and snapped a pic of the launcher, in the hopes that the information is there? Going to try to attach it now...

Re: OpenMW 2023.04.10: crash on startup

Posted: 02 May 2023, 03:21
by krogg
psi29a wrote: 18 Apr 2023, 12:45 Interesting... have you tried one of our flatpak RCs or generic linux builds? Just to first validate if they crash too?
I finally got around to trying a "generic linux build." I started from the Downloads page on the OpenMW site: https://openmw.org/downloads/ I found links for "Ubuntu" and "arch" binaries, but no generic one. I'm sure they are right there but I just couldn't see them?

Re: OpenMW 2023.04.10: crash on startup

Posted: 02 May 2023, 11:09
by psi29a
Here is 0.48 RC9: viewtopic.php?p=73657#p73657

Just to verify that this works.

For your crashy set of files, can't you just clone the repo again and not nuke the .git directory? Then see what the last commit is?

Re: OpenMW 2023.04.10: crash on startup

Posted: 04 May 2023, 21:26
by krogg
psi29a wrote: 02 May 2023, 11:09 Here is 0.48 RC9: viewtopic.php?p=73657#p73657

Just to verify that this works.
That link worked. Thank you. Tried it on my LFS system and there is good and bad.

The good is that the program started and it played sound correctly.

The bad is that it crashed in the same way at the same point in the process.

If all goes well, a screen dump and the OpenMW crash log should be attached.

Re: OpenMW 2023.04.10: crash on startup

Posted: 05 May 2023, 08:21
by akortunov
franley wrote: 23 Apr 2023, 05:06 Here is the stack frame on catching the segv:
The ClipControl::apply is a quite simple function:

Code:

void ClipControl::apply(State& state) const
{
    const GLExtensions* extensions = state.get<GLExtensions>();
    
    if (!extensions->isClipControlSupported) return;

    extensions->glClipControl((GLenum)_origin, (GLenum)_depthMode);
}
If I understand correctly, OpenSceneGraph fails to handle OpenGL extensions for some reason (either extensions or extensions->glClipControl is NULL or uninitialized), which leads to the crash.

Re: OpenMW 2023.04.10: crash on startup

Posted: 05 May 2023, 08:56
by akortunov
franley wrote: 22 Apr 2023, 22:28 Summary: OSG 3.6.1 fine, OSG 3.6.5 crashes
In newer 3.6.x versions the OSG devs changed the way pointers to OpenGL extension functions are resolved on Linux (here is a related topic). Initially they used glXGetProcAddressARB; later they moved to dlsym. Is there a way to build OSG with this commit reverted and see if it makes a difference?
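
For anyone unfamiliar with dlsym-style lookup, here is a minimal sketch of the general technique. This is not OSG's actual code: resolveSymbol and GenericFn are hypothetical names made up for illustration. The point is only that a lookup by name can come back null, so the caller has to check the result before calling through it.

```cpp
#include <dlfcn.h>

// Generic function-pointer type; the caller casts it to the real signature.
using GenericFn = void (*)();

// Hypothetical helper (not part of OSG): look a symbol up by name among the
// objects already loaded into the process. Returns nullptr if it is not found.
GenericFn resolveSymbol(const char* name)
{
    // dlopen(nullptr, ...) returns a handle for the main program; dlsym on
    // that handle searches the program's global symbol scope.
    void* handle = dlopen(nullptr, RTLD_LAZY);
    if (!handle) return nullptr;

    void* sym = dlsym(handle, name);
    dlclose(handle);

    // Converting void* to a function pointer is conditionally supported in
    // C++, but it works on the usual Linux toolchains.
    return reinterpret_cast<GenericFn>(sym);
}
```

If the library that exports the function is not loaded or not visible in that scope, the result is a null pointer, and calling through it would jump to address 0.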

Re: OpenMW 2023.04.10: crash on startup

Posted: 05 May 2023, 16:19
by AnyOldName3
So firstly, I'll point out that this isn't the same crash as in the original post, as that didn't have anything to do with clip control. We've seen the clip control crash before, though. One potential trigger (which I think I incorrectly dismissed before) would be if the driver reported having clip control (either by saying it supported an OpenGL version with mandatory clip control or by supporting the extension that adds it to older versions), but then didn't actually have the function, so we ended up with a null pointer.
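
That would fit the backtrace posted earlier: frame #0 is at 0x0000000000000000 and the return address in ClipControl::apply (+79) is the instruction right after `call *%rax`, which is what a call through a null function pointer looks like. Below is a minimal sketch of that scenario with a guard added. It is not the OSG source: FakeExtensions, applyClipControlGuarded and stubClipControl are hypothetical stand-ins that model only the two members used in the function quoted above.

```cpp
#include <cstdio>

// Signature modelled on glClipControl(GLenum origin, GLenum depth).
using ClipControlFn = void (*)(unsigned int origin, unsigned int depthMode);

// Hypothetical stand-in for osg::GLExtensions (assumption, not OSG code).
struct FakeExtensions {
    bool isClipControlSupported = false;   // what the driver claims
    ClipControlFn glClipControl = nullptr; // what was actually resolved
};

// Counts calls so the behaviour can be checked without a GL context.
static int g_stubCalls = 0;

// Stand-in for a real driver entry point.
void stubClipControl(unsigned int, unsigned int) { ++g_stubCalls; }

// Returns true if the function was called, false if the call was skipped.
bool applyClipControlGuarded(const FakeExtensions* ext,
                             unsigned int origin, unsigned int depthMode)
{
    if (!ext || !ext->isClipControlSupported) return false;

    // The extra check: support was reported, but the pointer was never
    // resolved. Without this, the call below would jump to address 0.
    if (!ext->glClipControl) {
        std::fprintf(stderr,
                     "clip control reported but glClipControl is null; skipping\n");
        return false;
    }

    ext->glClipControl(origin, depthMode);
    return true;
}
```

With isClipControlSupported set to true and glClipControl left null, the guarded version returns false instead of crashing; once the pointer is set to stubClipControl it makes the call as normal.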