OpenMW crashes my system


Post by lgromanowski »

GWater wrote: The topic says it all. I've tried to debug the situation but there are some difficulties:

The program runs for 5-20 seconds, then freezes. Then my workspace freezes. Then my screen goes black/blank. Then the sound goes into a short loop. Then the sound stops.

These "symptoms" have an unfortunate side effect: I cannot read any logs afterwards because they don't even reach the hard disk.

However the extreme results point to a problem at the very bottom of the graphics-stack; somewhere in kernel-space. Strangely though other OpenGL applications work fine.

Here's my setup:
  • most recent development version of openmw (6cdb0f1)
  • OGRE-1.7.1 (self-compiled)
  • Fedora 13 (x86_64)
  • KDE 4.5 with kwin
  • Xorg 1.8.2
  • xorg-x11-drv-ati-6.13.0 (with KMS)
  • kernel-2.6.33.6-147.2.4.fc13.x86_64

Hardware:
  • 3 GiB RAM
  • 3-core processor
  • Radeon HD 3650 graphics card

Please note that I'm using the open-source graphics stack right now, not fglrx. I just noticed it's possible for me to switch to Catalyst 10.7, but I really don't want to risk my otherwise stable workspace. I assume most of you use the proprietary drivers, but I think sooner or later OpenMW should also work with the vanilla open-source drivers.

Does anyone have a similar setup to test this issue? Has anyone had similar problems?
nicolay wrote: We are currently experiencing some sound-related crashes with the latest version (or at least we think they're sound-related), see http://openmw.org/forum/viewtopic.php?f=13&t=26. This seems to be some deeper memory corruption bug, so it might very well be related to this.

If you don't mind, could you try changing sound backend to Audiere (instead of the default mpg123/libsndfile) and see if that makes any difference? You do this at the top of the CMakeLists.txt file (you might have to delete CMakeCache.txt for changes to take effect.)

If this helps then we know we are dealing with the same bug. If not then, well, it's either a new bug or both are caused by some third problem somewhere else.
Zini wrote: Actually, the right thing to do here is to edit CMakeCache.txt. No point in changing CMakeLists.txt.
GWater wrote: Hey, thanks for the fast answer. There are a few problems though:

I can't get Audiere to build. And with FFmpeg OpenMW itself fails to build.

Anyway I think this bug is different from your problem.

It's a big step from a segfault to crashing the whole system. Also, I think the problem was already there before you added any part of the sound engine. I'll try an older commit (without sound) to support that claim.

I really think this issue is in the graphics department, not audio.
nicolay wrote: Ok. If you've been able to run OpenMW successfully before, then there must be SOME commit somewhere that doesn't crash. If your crashes are consistent (ie. they happen every time) then this shouldn't be too hard to find.

Git even has a binary search mechanism to find buggy commits: git bisect. Never tried it but it looks fun (depending on how you define "fun".) :)

EDIT: Now I've also added a --nosound switch which disables the entire sound system.
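
For readers who have not used it: git bisect runs a binary search over the history between a known-good and a known-bad revision. The idea can be sketched in Python as below. This only illustrates the search, it is not git's implementation. The names `first_bad_commit`, `commits` and `is_bad` are made up for the example, and the sketch assumes a linear history whose first entry is good and whose last entry is bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit, assuming commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad too."""
    good, bad = 0, len(commits) - 1  # indices of a known-good and a known-bad commit
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):  # e.g. build and run this revision, watch for the crash
            bad = mid
        else:
            good = mid
    return commits[bad]


if __name__ == "__main__":
    history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]  # made-up commit names
    broken = {"c5", "c6", "c7"}                                  # made-up test outcome
    print(first_bad_commit(history, lambda c: c in broken))
```

Each step halves the remaining range, so the number of revisions that have to be built and tested grows only logarithmically with the length of the history.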
GWater wrote: OK, it's definitely not sound-related. I'll try to bisect the issue but it won't be fun. (I have to hard-reset my machine after every crash. Not good for the drives and filesystem consistency...)
GWater wrote: Here's the git bisect result:

7e4f6559399b74263cfccc345eaa753a026870cf is the first bad commit
commit 7e4f6559399b74263cfccc345eaa753a026870cf
Author: Marc Zinnschlag <[email protected]>
Date:   Wed Jun 16 20:15:48 2010 +0200

    pull in all bsa files instead of only the bsa file matching the master

:040000 040000 b232d6b34948783be4d78052d46ce35b8377ca9d 850b07586a9ffce479bbe96ee3e9375a59881e75 M      game
However, it is quite possible that I made a mistake while bisecting: since it takes some time until the program crashes and I couldn't wait an infinite amount of time for each revision, the mistake may well have been introduced before that commit.

Given the nature of the change it also looks as if basically the dataload was increased. Maybe the problem is somewhere else and only became apparent through the additional strain of 3 BSAs.
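
The worry about a mistaken bisect can be made concrete with a small sketch. Assume, purely for illustration, that every revision crashes eventually but older ones take longer, and that each revision is only watched for a fixed time before being called good. All numbers and names below are invented, not measurements from this thread.

```python
# Hypothetical seconds-until-crash for six consecutive revisions (oldest first).
# Every revision crashes eventually; only the time until the crash differs.
crash_after = [600, 480, 300, 15, 10, 8]

OBSERVATION_WINDOW = 60  # seconds each revision is watched before being called "good"


def verdicts(times, window):
    """Return True ("bad") for each revision that crashes within the window."""
    return [t <= window for t in times]


def first_flagged(times, window):
    """Index of the first revision judged bad, or None if none is."""
    return next((i for i, bad in enumerate(verdicts(times, window)) if bad), None)


if __name__ == "__main__":
    print(first_flagged(crash_after, OBSERVATION_WINDOW))  # short watch time
    print(first_flagged(crash_after, 3600))                # much longer watch time
```

With the short window the blame lands on revision 3 even though the defect is present in all six; with the long window it moves back to revision 0. A change that merely increases the load can therefore look like the culprit.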
Zini wrote: Well, there is nothing in this commit that could crash a whole system.

I read your first post again and suddenly noticed that it seemed vaguely familiar. I had pretty much the same symptoms at some point (but not while working with OpenMW). It turned out that the additional load I was putting on my graphics hardware was overheating it, and at some point it simply gave up. Maybe you should start monitoring your system's temperature. It could be a problem with the ventilation, or maybe a hardware component is flaky and gives out at a lower temperature than usual.
Zini wrote: Also, do you have the VSync option enabled? If not, Ogre will force your GPU to run at maximum capacity. If VSync is not enabled, you could try to enable it and see if the instability is reduced (the file you need to change is ogre.cfg).
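
A minimal sketch of that change, assuming ogre.cfg uses plain `Key=Value` lines and contains a `VSync=No` entry under the render system section. Check your own file, since the exact key spelling is an assumption here. The helper name `enable_vsync` and the sample content are made up for this example.

```python
import re


def enable_vsync(cfg_text: str) -> str:
    """Return cfg_text with every 'VSync=<value>' line set to 'VSync=Yes'.
    Assumes plain Key=Value lines; all other lines are left untouched."""
    return re.sub(r"(?im)^([ \t]*VSync[ \t]*=[ \t]*).*$", r"\1Yes", cfg_text)


if __name__ == "__main__":
    # Sample content written for this example, not copied from a real install.
    sample = (
        "Render System=OpenGL Rendering Subsystem\n"
        "\n"
        "[OpenGL Rendering Subsystem]\n"
        "Full Screen=No\n"
        "VSync=No\n"
    )
    print(enable_vsync(sample))
```

To apply it to a real file, read the file's text, pass it through `enable_vsync`, and write the result back after keeping a backup copy.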
GWater wrote: Hey, I think you're onto something here:

Firstly: I just ran another even older revision, and after about 10 minutes the same problem occurred. git-bisect is really cool but not the solution in this case.

Secondly: The open-source drivers have only recently gotten thermal sensor support, and it is not yet in my kernel. The good old "finger-on-card" thermal test revealed quite some heat.

Thirdly: I had VSync disabled the whole time. I'll check that out right away.

Anyway: I guess using fglrx will fix it, since that driver should know how to control the temperature or at least shut down more gracefully.
GWater wrote: Unfortunately VSync is not the solution. I'll try the fglrx drivers next, but that may take some time.
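
On kernels where the driver does expose a sensor, temperatures can be read from the hwmon sysfs interface instead of by touch. A minimal sketch, assuming the standard `/sys/class/hwmon/hwmon*/temp*_input` files, which report millidegrees Celsius. Whether the radeon driver on a given kernel provides such a file is not something this thread confirms. `read_temperatures` is a name made up for this example.

```python
import glob
import os


def read_temperatures(base: str = "/sys/class/hwmon") -> dict:
    """Map each temp*_input file under base to its reading in degrees Celsius."""
    readings = {}
    for path in sorted(glob.glob(os.path.join(base, "hwmon*", "temp*_input"))):
        try:
            with open(path) as f:
                readings[path] = int(f.read().strip()) / 1000.0  # millidegrees -> degrees
        except (OSError, ValueError):
            continue  # sensor not readable right now; skip it
    return readings


if __name__ == "__main__":
    for path, celsius in read_temperatures().items():
        print(f"{path}: {celsius:.1f} °C")
```

Running this in a loop while the game is open would show whether the temperature climbs steadily before a freeze. If no sensor files exist, the function simply returns an empty mapping.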

Thanks anyway for the diagnosis. Good to know at least that it's not lingering somewhere in the OpenMW code.
Zini wrote: Please note that VSync might not help immediately. At least on my system the GPU sometimes needed hours to cool down enough to lose the instability. If it's already pretty hot, the additional load might push it over the threshold even with VSync.
GWater wrote: OK, my system kept crashing so now I've switched to fglrx/catalyst again and it works fine.

I hope kernel 2.6.35 will fix the overheating problem: http://www.phoronix.com/scan.php?page=n ... &px=ODMxMQ