File Format

Zini
Posts: 5538
Joined: 06 Aug 2011, 15:16

File Format

Post by Zini »

The current discussion about the open/new file dialogue made me think about the file format again. As I mentioned before, I have put together some plans for post-1.0 development. After thinking about it a bit more, I have come to the conclusion that we should not wait until 1.0. Simple reason: the ESX files created by the editor won't be compatible with MW anyway (because we are using a different VM for scripting). Effectively we are already creating a new file format, so giving it a new filename extension now should result in a less confusing file type situation. And since we are at it anyway, we might as well make some minimal structural changes.


Here is my idea:

We create two new file types (with new icons, extensions and everything):

1. The base file. Proposed extension: .omwbase. Any stack of content files will contain exactly one of these files. A base file must not depend on anything else.

2. The extension file. Proposed extension: .omwext. Any stack of content files will contain zero or more of these files. An extension file must depend on an extension file or a base file.
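
The two rules above can be sketched as a small check over a stack of content files. This is only an illustration in Python, not editor code: the `ContentFile` class, the `validate_stack` function and the file names are made up, and the extra check that every dependency is present in the stack is my own assumption rather than part of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class ContentFile:
    name: str                                        # hypothetical file name, e.g. "World.omwbase"
    depends_on: list = field(default_factory=list)   # names of the files this file depends on


def validate_stack(files):
    """Return a list of problems; an empty list means the stack follows the rules."""
    problems = []
    bases = [f for f in files if f.name.endswith(".omwbase")]
    if len(bases) != 1:
        problems.append(f"expected exactly one base file, found {len(bases)}")
    names = {f.name for f in files}
    for f in files:
        if f.name.endswith(".omwbase") and f.depends_on:
            problems.append(f"{f.name}: a base file must not depend on anything")
        if f.name.endswith(".omwext"):
            if not f.depends_on:
                problems.append(f"{f.name}: an extension file must depend on a base or extension file")
            for dep in f.depends_on:
                # Assumption: a dependency has to be part of the same stack.
                if dep not in names:
                    problems.append(f"{f.name}: dependency {dep} is not in the stack")
    return problems
```

A stack with one base file and one extension that depends on it yields an empty list; a lone extension file yields two problems (no base file, no dependency).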

OpenMW and the editor would still be able to read legacy file formats, but the editor would only write the new file format. When loading the old file formats, these would be mapped to the new ones by the following scheme:

- If the file is an ESP, it is seen as an omwext file (all plugins).
- If the file is an ESM that depends on another file, it is seen as an omwext file (Tribunal.esm and Bloodmoon.esm).
- If the file is an ESM that does not depend on another file, it is seen as an omwbase file (Morrowind.esm).
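
As a rough sketch, the mapping scheme above boils down to a function of the legacy extension and the list of masters read from the file header. The function name `classify_legacy` and its signature are hypothetical and only serve the illustration.

```python
def classify_legacy(filename, masters):
    """Map a legacy content file to the proposed type ("omwbase" or "omwext").

    `masters` is the list of master files named in the legacy header.
    """
    lower = filename.lower()
    if lower.endswith(".esp"):
        return "omwext"                      # all plugins
    if lower.endswith(".esm"):
        # An ESM with dependencies behaves like an extension, one without like a base.
        return "omwext" if masters else "omwbase"
    raise ValueError(f"not a legacy content file: {filename}")
```

With this, Morrowind.esm (no masters) maps to omwbase, while Tribunal.esm and any ESP map to omwext.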

The actual content of the omwbase/ext files would be identical to the ESX files but for one difference:

Instead of the old header record (TES3), we have a new type of header record.

For reference here is the old record:

Code: Select all

TES3 = 1 count
	Main Header Record, 308 Bytes

	HEDR (300 bytes)
		4 bytes, float Version (1.2)
		4 bytes, long Unknown (1)
		32 Bytes, Company Name string
		256 Bytes, ESM file description?
		4 bytes, long NumRecords (48227)
	MAST = string, variable length
		Only found in ESP plugins and specifies a master file that the plugin
		requires.  Can occur multiple times.  Usually found just after the TES3
		record.
	DATA = 8 Bytes	long64 MasterSize
		Size of the previous master file in bytes (used for version tracking of plugin).
		The MAST and DATA records are always found together, the DATA following the MAST record
		that it refers to.
Some of the comments about MAST and DATA are not entirely correct, I think, but that is not important.
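
For anyone who wants to poke at the old header, here is a minimal Python sketch that unpacks the 300-byte HEDR payload as listed above. The field sizes come from the listing; little-endian byte order, Latin-1 decoding and NUL-padded strings are assumptions on my part, and `parse_hedr` is a made-up helper name.

```python
import struct

# float version, 4-byte int, 32-byte string, 256-byte string, 4-byte int = 300 bytes.
# Assumption: values are stored little-endian and without padding.
HEDR_FORMAT = "<fi32s256si"


def parse_hedr(payload):
    """Unpack a raw HEDR payload into a dict of its five entries."""
    if len(payload) != struct.calcsize(HEDR_FORMAT):
        raise ValueError(f"HEDR must be {struct.calcsize(HEDR_FORMAT)} bytes, got {len(payload)}")
    version, unknown, company, description, num_records = struct.unpack(HEDR_FORMAT, payload)

    def text(raw):
        # Assumption: strings are NUL-padded to their fixed length.
        return raw.split(b"\0", 1)[0].decode("latin-1")

    return {
        "version": version,
        "unknown": unknown,
        "company": text(company),
        "description": text(description),
        "num_records": num_records,
    }
```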

Here is the proposed replacement header:

Code: Select all

OMWH
    COMP
        int, 4 bytes: Engine compatibility version; value: 1 (replaces first entry in TES3.HEDR)
    AUTH (optional)
        string, variable length: Author name (replaces 3. entry in TES3.HEDR)
    DESC (optional)
        string, variable length: Content description (replaces 4. entry in TES3.HEDR)
    DEPE (optional, multiple instances, not allowed in omwbase files)
        FILE
            string, variable length: name of the content file this file is based on (replaces TES3.MAST)
        SIZE (optional)
            int, 4 bytes: size of the file specified in FILE (replaces TES3.DATA)
I dropped entries #2 and #5 from TES3.HEDR, because we don't know what #2 is anyway and #5 does not fulfil any purpose.
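
To make the proposed layout a bit more concrete, here is a small Python sketch that writes such a header. Note the assumptions: the proposal does not spell out the byte-level encoding, so the 4-character tag followed by a 4-byte little-endian length (borrowed from the ESX sub-record convention), the nesting of FILE and SIZE inside the DEPE payload, and UTF-8 strings without terminator are all guesses. `subrecord` and `build_omwh` are hypothetical helper names.

```python
import struct


def subrecord(tag, payload):
    """Assumed encoding: 4-character tag, 4-byte little-endian length, then the payload."""
    return tag.encode("ascii") + struct.pack("<I", len(payload)) + payload


def build_omwh(comp, author=None, description=None, dependencies=()):
    """Build an OMWH header; `dependencies` is a sequence of (file_name, size_or_None)."""
    body = subrecord("COMP", struct.pack("<i", comp))
    if author is not None:
        body += subrecord("AUTH", author.encode("utf-8"))
    if description is not None:
        body += subrecord("DESC", description.encode("utf-8"))
    for file_name, size in dependencies:
        depe = subrecord("FILE", file_name.encode("utf-8"))
        if size is not None:                 # SIZE is optional
            depe += subrecord("SIZE", struct.pack("<i", size))
        body += subrecord("DEPE", depe)
    return subrecord("OMWH", body)
```

A base file would be written with `build_omwh(1)` plus optional author and description; an extension file adds one `(file_name, size)` pair per file it depends on.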

Regarding COMP: The switch from float to int should be obvious. float is a horrible choice for specifying a version number. Note that we do not use OpenMW version numbers here. Instead we are using a flat integer file format version number. This has the advantage that we can vary the two independently. Obviously we will have to increment this number each time we release a new version of OpenMW that provides new features with new data structures. But there may also be releases that do not make any changes to the world model. In this case we can keep the COMP value.
I guess I don't need to mention that newer releases of OpenMW/CS should always be able to read file formats with lower COMP values.
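
In code, the compatibility rule is just an integer comparison. A minimal Python sketch, where `SUPPORTED_COMP` is a hypothetical value for the highest format version a given release understands:

```python
SUPPORTED_COMP = 3  # hypothetical: the newest file format version this release knows


def can_read(file_comp, supported_comp=SUPPORTED_COMP):
    """A release reads every file whose COMP value is not newer than its own."""
    # COMP starts at 1 in the proposed header, so anything below that is invalid.
    return 1 <= file_comp <= supported_comp
```

A release that did not touch the world model keeps the same `SUPPORTED_COMP` as its predecessor.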

Regarding DEPE:

This is a bit different from the original TES3 record. I reorganised it, because I am wary of the DATA/SIZE sub-record. It does not look to me like a reasonable method to track dependency version changes. I would like to keep the option to add other methods in the future. The new format should allow for easy additions of this kind.
Chris
Posts: 1626
Joined: 04 Sep 2011, 08:33

Re: File Format

Post by Chris »

Should we even stick with an ESM/P-like format? Particularly one as unsafe as Morrowind's?

The file format is heavily linked with the way the data is treated and handled internally. IMO, that should be stabilized before we start worrying about how we'll store it externally.

Though if you want my thoughts, I actually think we should go for something like XML with some kind of compression applied. XML works well for defining hierarchical structures and stuff, and being text it compresses really well. Plus, there are libs that can help parse XML into more manageable data structures.
Zini
Posts: 5538
Joined: 06 Aug 2011, 15:16

Re: File Format

Post by Zini »

Chris wrote:Should we even stick with an ESM/P-like format?
Yes? I see absolutely no reason at all to switch to a different format. The ESX format is very extensible and serves our purposes well enough. A switch to a different format would require a huge amount of work for very little gain.
Chris wrote:Particularly one as unsafe as Morrowind's?
I have no idea what you mean with that.
Chris
Posts: 1626
Joined: 04 Sep 2011, 08:33

Re: File Format

Post by Chris »

Zini wrote:Yes? I see absolutely no reason at all to switch to a different format. The ESX format is very extensible and serves our purposes well enough. A switch to a different format would require a huge amount of work for very little gain.
The format strikes me as pretty archaic. There's little-to-no defined hierarchy for the data structure (e.g. is this 'ARBA' record part of the previous 'FOOB', or is it defining its own object?). You're given flat data, and are expected to build structure out of it based on the kind of object you're defining, but you don't know what kind of object you're defining until you've read the records. The way the existence and ordering of the records matters makes it even more questionable.

EDIT
For example, look at the NPC_ and CREA records, and the way AI packages are handled. You just get a continuous array of AI sub-records until you get one you don't expect. That's pretty bad design, IMO.

Or worse, the CELL record. The sub-records to expect change based on the value of previous sub-records. There's no way to simply load it and use it, as you have to interpret it as you load it. It also has that 'loop until you find something odd' problem, which makes any kind of error recovery difficult (what if a sub-record is missing? what if you get an unexpected sub-record embedded in the object reference list of sub-records? what if the sub-records are out of order?).

Not to mention how records and sub-records are all based on a 4-character identifier, which makes it pretty restrictive and leads to some pretty nonsensical IDs.
/EDIT
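
To illustrate the 'loop until you find something odd' pattern, here is a stripped-down Python sketch. The tags `AIP1` and `AIP2` are placeholders, not the real sub-record names, and `read_ai_packages` is a hypothetical function, not what any existing loader does.

```python
AI_TAGS = {"AIP1", "AIP2"}  # placeholder tags standing in for the AI package sub-records


def read_ai_packages(subrecords, start):
    """Collect consecutive AI sub-records beginning at `start`.

    `subrecords` is a list of (tag, payload) pairs. Returns the collected
    packages and the index of the first sub-record that was not consumed.
    """
    packages = []
    i = start
    # The run ends at the first tag we do not recognise; nothing marks its end explicitly.
    while i < len(subrecords) and subrecords[i][0] in AI_TAGS:
        packages.append(subrecords[i])
        i += 1
    return packages, i
```

With the input `[("AIP1", ...), ("XXXX", ...), ("AIP2", ...)]` the loop stops after the first entry. The reader cannot tell whether the list really ended there or a stray sub-record got embedded in it, which is the error recovery problem.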

With something like XML, you know when a piece of data is part of a bigger structure. You can clearly define lists of things, and tell where one thing ends and another begins. You can have all the data and property values loaded in a single shot, and simply access the parts you need based on run-time interpretation, without having to worry about what you're building while you're loading it from disk. I imagine the various XML helper libs would also give us an easy way to update the data store, then easily write out the changes (for save files).
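
A quick Python sketch of what I mean, using only the standard library (`xml.etree.ElementTree` and `zlib`). The element and attribute names (`npc`, `ai_packages`, `package`, `type`) are invented for the example and not a proposed schema.

```python
import xml.etree.ElementTree as ET
import zlib


def encode_record(record_id, packages):
    """Serialise a record with an explicit, delimited list of AI packages, then compress it."""
    npc = ET.Element("npc", {"id": record_id})
    ai = ET.SubElement(npc, "ai_packages")       # the list has a clear start and end
    for kind in packages:
        ET.SubElement(ai, "package", {"type": kind})
    return zlib.compress(ET.tostring(npc, encoding="utf-8"))


def decode_packages(blob):
    """Decompress, parse in one shot, then pick out the part that is needed."""
    npc = ET.fromstring(zlib.decompress(blob))
    return [p.get("type") for p in npc.find("ai_packages")]
```

The reader never has to guess where the package list ends, since the closing tag says so.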

It actually appears we're already using tinyxml, which is embedded into OICS...
Zini wrote:I have no idea what you mean with that.
I suppose it's not a problem with the ESM format itself (it was fixed in later games), but it feels really bad with the way Morrowind had object records defined by a global string. Makes it too easy to inadvertently cause incompatibilities, or to make questionable changes, and puts unnecessary requirements on content developers to find some unused prefix for their records. It also kinda makes the list of Masters somewhat useless since you can just overwrite a record anyway whether the original file it's from is listed or not. It also makes it slower to look up, since you have to do a case-insensitive string compare (even a 64-bit integer compare would be better than a case-insensitive string compare).
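
Roughly, the behaviour I'm describing looks like this Python sketch. `merge_records` is a hypothetical function, and lower-casing the ID stands in for the case-insensitive compare.

```python
def merge_records(files):
    """Merge records from files in load order, keyed by a global, case-insensitive ID.

    `files` is a list of (file_name, {record_id: data}) pairs.
    """
    merged = {}
    for file_name, records in files:
        for record_id, data in records.items():
            # A later file replaces the record whether or not it lists the
            # earlier file as a master; only the ID string matters.
            merged[record_id.lower()] = (file_name, data)
    return merged
```

If a base file defines `Iron_Sword` and an unrelated plugin defines `iron_sword`, the plugin's version silently wins.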
Greendogo
Posts: 1467
Joined: 26 Aug 2011, 02:04

Re: File Format

Post by Greendogo »

I'd just like to say that I'm unsure how useful it would be to have the "omwbase" type file. Why is this new two-file system a benefit over the old two-file system we have now of ESM and ESP?

I don't know why there is a distinction between a plugin and a master (or base and extension) to begin with. Sure, there is going to be the distinction of whether a file has dependencies on other files or not (the detail you use to distinguish between omwbase and omwext), but I think we should go the opposite route and consolidate the number of file types from 3 (ESP/ESM/ESS) down to 1 (OMW).

You could just make the file extension ".omw" and allow the modder and the user the discretion to precede the extension by any distinguishing characteristics, like this: Tamriel_Rebuilt_Map3.m.omw (for a master, or base), NOM.p.omw (for a plugin), or they could choose not to as well. The "Save Game" feature could automatically add the word ".save." before the extension to signify to the user that it is a save file (ex. "Auto Save.save.omw", which would appear as "Auto Save" in the in-game menu). This would give the player the option to treat a save file in the editor as though it were just any other plugin, because all of the data would be accessible in the same format.
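
Here is a small Python sketch of how the optional markers could be read. The function name `describe_omw` and the returned labels are just for illustration.

```python
def describe_omw(filename):
    """Split a ".omw" file name into (display name, kind) using the optional marker."""
    if not filename.lower().endswith(".omw"):
        raise ValueError(f"not an .omw file: {filename}")
    stem = filename[:-len(".omw")]
    for marker, kind in ((".save", "save"), (".m", "master"), (".p", "plugin")):
        if stem.lower().endswith(marker):
            return stem[:-len(marker)], kind
    return stem, "unspecified"   # the modder chose not to add a marker
```

So "Auto Save.save.omw" would show up as "Auto Save" and be treated as a save, while a plain "Something.omw" is simply left unspecified.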

I think the only reason there is a distinction is because Bethesda didn't trust the user to correctly order their master plugin files and because they don't want the modders to alter and break their ESM files [edited for clarity]. The whole point of ESMs is dubious, and therefore so would be the omwbase file you propose. I doubt it matters now that the modding community is more sophisticated. One file type should be sufficient and give the modders a little more flexibility and provide a certain amount of simplification to the whole process.
Last edited by Greendogo on 26 Jan 2013, 10:54, edited 2 times in total.
sirherrbatka
Posts: 2159
Joined: 07 Aug 2011, 17:21

Re: File Format

Post by sirherrbatka »

Greendogo wrote:I think the only reason there is a distinction is because Bethesda treats the users like idiots who can't correctly order their plugin files and because they don't want the modders to screw up their ESM files.
I think that this discussion should remain technical. I don't feel like participating when I have no idea what I'm talking about.
Greendogo
Posts: 1467
Joined: 26 Aug 2011, 02:04

Re: File Format

Post by Greendogo »

I'm sorry Herrbatka, I was attempting to give a little background as to why I believed the Vanilla engine's format was divided into multiple file types and why that is a decision we don't necessarily need to repeat for OpenMW if it would benefit the User. Zini's idea for two new file types is what he began this discussion about, and it is what I was responding to.
Chris
Posts: 1626
Joined: 04 Sep 2011, 08:33

Re: File Format

Post by Chris »

Greendogo wrote:I think the only reason there is a distinction is because Bethesda didn't trust the user to correctly order their master plugin files and because they don't want the modders to alter and break their ESM files [edited for clarity].
I'm not really sure it's for that reason. I think it's done that way based on the development philosophy.

When they built the game, they had one master file, Morrowind.esm in this case. As they developed the game, they made patches (ESP files) with the CS. When those patches were ready, they were merged into the master. That would best explain why the CS only creates patch files (the ESM is handled by merging in patches), why the ESMs load first (patches come after the masters they patch), and why load order was based on timestamp (newest work gets priority).
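
That ordering can be sketched in a few lines of Python. This is only my reading of the behaviour: `load_order` is a hypothetical function and the timestamps are plain numbers instead of real file modification times.

```python
def load_order(files):
    """Order content files: masters first, then by timestamp, oldest first.

    `files` is a list of (file_name, modification_time) pairs. Files loaded
    later override earlier ones, so the newest patch gets priority.
    """
    def key(entry):
        name, mtime = entry
        is_patch = not name.lower().endswith(".esm")   # False sorts before True
        return (is_patch, mtime)

    return [name for name, _ in sorted(files, key=key)]
```

For example, an old ESP with timestamp 100 still loads after Morrowind.esm with timestamp 200, and a newer ESP with timestamp 300 loads last.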
Greendogo
Posts: 1467
Joined: 26 Aug 2011, 02:04

Re: File Format

Post by Greendogo »

Sure, that makes sense. But then, if that was their design philosophy, why did they restrict access to the ability to create ESMs or to modify them as the active file in the CS? Someone make a note that this ability should be more accessible in the OMW Editor. In vanilla MW it should never have required a third-party program.

But I will cede the point because I just tested the assertion that ESPs cannot be made dependent on other ESPs, which turns out to be quite true. To me this means there is a larger distinction between ESPs and ESMs than I had originally thought, and so the distinction between a master file type and its plugins makes a lot more sense to me.
Chris
Posts: 1626
Joined: 04 Sep 2011, 08:33

Re: File Format

Post by Chris »

Greendogo wrote:But then, if that was their design philosophy, why did they restrict access to the ability to create ESMs or to modify them as the active file in the CS?
Because they wouldn't create or modify ESMs with the CS. They can't restrict an ability that was never there. They'd create patches (ESPs) with the CS, having it depend on whatever master(s) they were currently developing for, then use an external tool to merge ("commit") them to the ESM.