Improved File Format for omwaddon files

bmw · Post by **bmw** » 04 Apr 2020, 00:53

I don't know what the current plans for the future of the omwaddon file format are, but I had an idea for a new file format to make mod creation/interaction simpler.

Now, I'm not as familiar with how esps/omwaddons work as I'd like to be, but current esp/omwaddon files cause a number of problems when interacting with each other. To my understanding, these are largely because records overwrite existing records of the same name: to modify a record, even if only a single subrecord, you need to include a copy of the entire record plus your modifications.

Why not create a new format that instead can modify individual subrecords? With such a format tools such as the Bashed Patcher, Mator Smash, TES3Merge, etc. wouldn't be necessary, as each mod would only make the changes they need to make, and mods would cleanly apply on top of each other (mods that modify the same subrecord would still conflict, but then it just comes down to load order).

I.e., rather than just defining records, you have an AddRecord operation, a ModifyRecord operation, and a RemoveRecord operation. AddRecord would need to have all fields, but ModifyRecord could include as many or as few fields as the author wants to modify, and all other fields would remain as they were left by whichever plugins defined or modified that record earlier in the load order.
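As a rough sketch of how these three operations could be applied in load order (Python, with plain dicts standing in for records): the operation names come from the paragraph above, while `apply_plugins`, the tuple layout and the second mod (`mod_b`) are made up purely for illustration.

```python
def apply_plugins(plugins):
    """Apply each plugin's operations in load order; later plugins win per field."""
    records = {}  # record id -> {subrecord name: value}
    for plugin in plugins:
        for op, record_id, fields in plugin:
            if op == "AddRecord":
                # A new record must carry all of its fields.
                records[record_id] = dict(fields)
            elif op == "ModifyRecord":
                # Only the listed fields change; everything else is kept.
                # (Raises KeyError if no earlier plugin defined the record.)
                records[record_id].update(fields)
            elif op == "RemoveRecord":
                records.pop(record_id, None)
            else:
                raise ValueError(f"unknown operation: {op}")
    return records


master = [("AddRecord", "apparatus_a_calcinator_01",
           {"FNAM": "Apprentice's Calcinator", "Weight": 25.0, "Value": 10})]
mod_a = [("ModifyRecord", "apparatus_a_calcinator_01",
          {"SCRI": "syc_AHA_a_calcinator_01"})]
# Hypothetical second mod that only touches the value.
mod_b = [("ModifyRecord", "apparatus_a_calcinator_01", {"Value": 15})]

merged = apply_plugins([master, mod_a, mod_b])
print(merged["apparatus_a_calcinator_01"])
```

Both mods apply cleanly because they touch different fields; had they touched the same field, the later one in the load order would simply win.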

It's worth noting that, while it would be great to have support for this in-engine, it could also be done outside the engine, with a tool that will take your plugins and merge them together into a vanilla-style plugin. Further, we could also have a tool to automatically convert old-style esps/omwaddons to this new format, in essence re-creating TES3Merge in two steps.
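The conversion from an old-style plugin could, in its simplest form, be a per-record comparison against the master. The following is only a sketch under that assumption: `to_modify_record` is a hypothetical helper, records are plain dicts, and removed subrecords are not handled. The data is the calcinator record from At Home Alchemy.

```python
def to_modify_record(master_record, override_record):
    """Return only the subrecords that the override adds or changes."""
    return {
        name: value
        for name, value in override_record.items()
        if name not in master_record or master_record[name] != value
    }


original = {
    "NAME": "apparatus_a_calcinator_01",
    "MODL": r"m\Apparatus_A_Calcinator_01.NIF",
    "FNAM": "Apprentice's Calcinator",
    "AADT": {"Type": "Calcinator", "Quality": 0.5, "Weight": 25.0, "Value": 10},
    "ITEX": r"m\Tx_calcinator_01.tga",
}
# The old-style plugin repeats the whole record and adds one subrecord.
override = dict(original, SCRI="syc_AHA_a_calcinator_01")

delta = to_modify_record(original, override)
print(delta)
```

The record id would still need to be stored next to the delta so that the ModifyRecord operation knows which record it applies to.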

A big plus for this would be that we could use a purely text-based format, meaning that mods would be much easier to read, modify and store in version control (This isn't something inherent to the proposed idea, but we may as well do it if introducing a new plugin format). It would also be possible to include comments inside the plugin.

The first example I tried was the MRM Puzzle Canal fix, as it's small, but it seems like CELL object references are already handled nicely, so it's not really worth reproducing here, given that it doesn't do anything that the current esp format doesn't already do.

The second example I tried better demonstrates the idea, and is from At Home Alchemy, which has been graciously provided under a free license by Syclonix.

Below is a mock-up of what such a file could look like, omitting most records for brevity's sake. I've used yaml as the markup language, as I find it handles this sort of thing in a clean and readable way; however, it would be relatively easy to support different input formats such as xml or json.

Fortunately the output of "tes3cmd dump" is actually in a relatively yaml-like format already.

Code:

Version: 1.3
Is_Master: false
Author: "Syclonix"
Description: |
  Version 1.0

  Allows you to finally use alchemy apparatuses without having to
  place them in your inventory. Just "fire up" the apparatuses you
  want to use to bring up the alchemy menu (mortar and pestle must be
  activated last).

  Requires Tribunal
Masters:
  - Morrowind.esm

Records:
  - Record:
      type: GLOB
      NAME: syc_AHA_Calcinator
      FNAM: short
      FLTV: 0.00

    # At Home Alchemy only adds the SCRI subrecord, but
    # also reproduces the entirety of the original record.
    # All we really need here is the id and the new SCRI
    # subrecord. All other fields could be determined using the
    # original record.
  - ModifyRecord:
      NAME: apparatus_a_calcinator_01
      SCRI: Script:syc_AHA_a_calcinator_01

    # Original: For reference.
    # Record: APPA "apparatus_a_calcinator_01" Flags:0x0000 ()
    #  NAME: ID:apparatus_a_calcinator_01
    #  MODL: Model:m\Apparatus_A_Calcinator_01.NIF
    #  FNAM: Name:Apprentice's Calcinator
    #  AADT: Type:(Calcinator)  Quality:0.50  Weight:25.00  Value:10
    #  ITEX: Icon:m\Tx_calcinator_01.tga

bmw · Post by **bmw** » 01 May 2020, 15:40

I've thrown together an implementation of this, which can be found here: https://gitlab.com/bmwinger/espmarkup.
An example of the output can be found here: https://gitlab.com/bmwinger/ncgd

I considered working from within the OpenMW tree to start with, but decided against it for now. A brief look through the OpenMW code indicated that this would require significant changes, and an external tool for translating files is going to be useful anyway. Rust also proved to be a much faster choice for implementation, particularly since, as far as I know, there isn't a C++ serialization framework that's quite as easy to work with as the serde framework (there is cpp-serde-gen, but it's not quite the same).

I've managed to get it to the point where it can create merged plugins using the results, though you should note that one of the ways I saved time in implementing this was by ignoring records other than TES3 (header), SCPT, SSCR, SPEL, SKIL, and GMST (i.e. the types used by the ncgd example). Supporting other record types won't be too difficult, but I don't want to start until I've finished playing around with how everything works (there's still at least one important abstraction missing). The merged plugin creation is brilliantly fast, taking about 1.5 seconds to merge 67 plugins (total size 207M due to two big ones). That may be partly due to ignoring most of the records, but merging is just an extension of the idea of plugins that make minimal changes, which I mentioned above, and it is a linear operation, so it is unlikely that the full implementation will be significantly slower.

xirsoi · Post by **xirsoi** » 19 May 2020, 22:30

This is a really cool project!
I think the benefits of a text based format (and thus all the benefits of version control) would be quite the boon to larger mod projects such as Tamriel Rebuilt.

Greendogo · Post by **Greendogo** » 19 May 2020, 23:32

I definitely support this. We tossed around doing a text-based format long ago, but it was never picked up. The biggest bonus here is being able to do version control adequately. I like the main focus of your proposal of course, but for me the text based aspect is the major win.

AnyOldName3 · Post by **AnyOldName3** » 19 May 2020, 23:37

A text-based representation would be good for diffing purposes, but it might be good to get people like ElminsterAU involved in that discussion as I'm sure it would be useful for the later games, too.

An ESP is already kind of a diff, though. It's reasonably feasible to pass them around like patch files during content development in a team and then merge them into each other or an ESM as you would a pull request. It's a lot more manual than something like Git, unfortunately.

EvilEye · Post by **EvilEye** » 20 May 2020, 15:53

Almost every time I do a bit of modding I think it'd be nice to be able to use git to create the ESP. And in the context of TR and PT, I'd like claims to be mergeable using PRs/MRs. Now I only did a bit of thinking and never implemented anything, but these were the steps/features I had in mind.

Create a utility to turn an ESP into a directory structure containing yaml files
Group record types into directories
Group dialogue topics into a directory per topic
Create a lock file (in the spirit of npm and yarn) to allow reproduction of the original ESP (byte for byte)
Create a utility to turn a directory structure into an ESP (with or without a lock file)
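A minimal sketch of the grouping described in the steps above, in Python. The layout, the `record_path` helper and the `type`/`id`/`topic` keys are all assumptions made for illustration, and ids containing characters that are not valid in file names would need escaping.

```python
from pathlib import PurePosixPath


def record_path(record):
    """Map a record to a relative file path (hypothetical layout)."""
    rec_type = record["type"]
    if rec_type == "INFO":
        # Dialogue responses are grouped into one directory per topic.
        return PurePosixPath("DIAL", record["topic"], f"{record['id']}.yaml")
    # Everything else is grouped by record type.
    return PurePosixPath(rec_type, f"{record['id']}.yaml")


print(record_path({"type": "GLOB", "id": "syc_AHA_Calcinator"}))
```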

Chris · Post by **Chris** » 21 May 2020, 15:40

A text-based format sounds like a really bad idea. The main benefit of being text-based, human-readability, quickly goes out the window with non-small files. Similarly if you try to enforce rules for data structuring, it's easy to create a monster that is not pleasant to create, edit/diff, or use. On top of that, the resulting file size will be significantly larger, taking more time to load, and be all-around more problematic to load (many more opportunities for invalid input that needs to be checked). Additionally, you have to be more careful of formatting; DOS or Unix line-endings? Errant formatting or escape characters? Code page encoding (ascii-7, extended latin, russian, utf-8, utf-16 BE/LE, ...)? Especially on Windows, it's easy for things to unknowingly get changed on you.

Any potential benefit is lost by the time a project gets to medium size, while also creating more baggage and additional problems.

heilkitty · Post by **heilkitty** » 21 May 2020, 18:38

I think the idea is to have a text format in addition to the binary one, for mod development purposes.

bmw · Post by **bmw** » 29 May 2020, 03:13

EvilEye wrote: ↑20 May 2020, 15:53 Almost every time I do a bit of modding I think it'd be nice to be able to use git to create the ESP. And in the context of TR and PT, I'd like claims to be mergeable using PRs/MRs. Now I only did a bit of thinking and never implemented anything, but these were the steps/features I had in mind.

Create a utility to turn an ESP into a directory structure containing yaml files
Group record types into directories
Group dialogue topics into a directory per topic
Create a lock file (in the spirit of npm and yarn) to allow reproduction of the original ESP (byte for byte)
Create a utility to turn a directory structure into an ESP (with or without a lock file)

That's also more or less what I had in mind, unless I'm misunderstanding anything.
The tool's output should be deterministic by default, except for content from master files when converting to ESPs (though, as mentioned before, it would be ideal to move away from requiring copying master information into plugins). There's also a sort of lock in esps already in the form of the masters list, but it could certainly be improved (size is a poor hash function).
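One way the masters list could be strengthened is to record a real digest next to the file size. This is just a sketch: `master_lock_entry` and its field names are hypothetical, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
from pathlib import Path


def master_lock_entry(path):
    """Describe a master file by name, size and SHA-256 digest."""
    data = Path(path).read_bytes()  # reads the whole file into memory
    return {
        "name": Path(path).name,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

Two different masters of the same size would then produce different lock entries, which the size alone cannot guarantee.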
I'd also rather not focus too much on binary-equivalent reproductions of esps. The format contains things such as filler data in fixed-length strings, which would be really annoying to have to reproduce. (I don't know how the original engine handles these, or even if there is much variation, but if the engine ignores everything after the first null byte, then there could be arbitrary data stored in the rest of the subrecord which we would need to track.) It's also not really worth it: my thought would be to use the yaml files as the authoritative version of a mod, so the only reason for such exact reproductions would be to validate the translation tool.
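To illustrate what tracking that filler would involve, here is a small sketch that splits a fixed-length field at the first null byte and keeps the remainder so the original bytes can be rebuilt. The helper names and the sample bytes are made up, and it assumes (as speculated above) that everything after the first null byte is ignored.

```python
def split_fixed_string(raw):
    """Split a fixed-length field into (text bytes, trailing filler bytes).

    The filler starts at the first null byte and runs to the end of the field.
    """
    text, terminator, rest = raw.partition(b"\x00")
    return text, terminator + rest


def join_fixed_string(text, filler):
    """Rebuild the original field from the two parts."""
    return text + filler


raw_field = b"short\x00\xcd\xcd"  # made-up 8-byte field with non-zero filler
text, filler = split_fixed_string(raw_field)
print(text, filler)
```

A byte-for-byte lock file would have to store `filler` for every such field, which is the bookkeeping I'd rather avoid.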

I've already implemented splitting out scripts into their own files (the implementation could use improvement, but it's early days yet), and my thoughts were to have support for include directives so that you could have a main file which includes a bunch of other files, or directories full of files. I don't think grouping record types into directories should be a required thing, but it could certainly be an optional way to structure a project.
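The include idea could work roughly as follows. This is not how espmarkup does it; it's a sketch in which the `Include` key is hypothetical and JSON stands in for yaml so that the example only needs the Python standard library.

```python
import json
from pathlib import Path


def load_with_includes(path, _stack=()):
    """Load a plugin file and splice in the records of every included file."""
    path = Path(path).resolve()
    if path in _stack:
        raise ValueError(f"circular include: {path}")
    doc = json.loads(path.read_text(encoding="utf-8"))
    records = []
    for entry in doc.get("Include", []):
        target = path.parent / entry
        # Including a directory pulls in every file inside it, in sorted order.
        files = sorted(target.glob("*.json")) if target.is_dir() else [target]
        for included in files:
            records.extend(load_with_includes(included, _stack + (path,)))
    # Included records come first, then the file's own records.
    records.extend(doc.get("Records", []))
    return records
```

A main file could then consist of little more than an `Include` list naming a `scripts` directory and a few per-type files.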

Chris wrote: ↑21 May 2020, 15:40 A text-based format sounds like a really bad idea. The main benefit of being text-based, human-readability, quickly goes out the window with non-small files. Similarly if you try to enforce rules for data structuring, it's easy to create a monster that is not pleasant to create, edit/diff, or use. On top of that, the resulting file size will be significantly larger, taking more time to load, and be all-around more problematic to load (many more opportunities for invalid input that needs to be checked). Additionally, you have to be more careful of formatting; DOS or Unix line-endings? Errant formatting or escape characters? Code page encoding (ascii-7, extended latin, russian, utf-8, utf-16 BE/LE, ...)? Especially on Windows, it's easy for things to unknowingly get changed on you.

Any potential benefit is lost by the time a project gets to medium size, while also creating more baggage and additional problems.

It would certainly be possible to also have a binary format that is equivalent to the text format which it could be transcoded into at release time. Cap'n Proto was something I'd briefly looked at, being a binary format with no parsing cost, but I thought I'd stick with focusing on text formats in my prototype tool (though any format supported by serde could actually be used with an extremely small code change).

As for problems such as line endings, formatting, encoding and large file sizes, those are the same problems that software development has always had to deal with, and the solutions are no different than any other situation. Large file sizes can be mitigated by breaking them up into smaller, meaningfully structured files, and using include directives of some sort to link the files together. Inline comments also help significantly in improving the readability of text files. Encoding should be standardized (the yaml spec calls for utf-8, utf-16 or utf-32, for example). Most markup parsers will ignore errant formatting, and DOS line endings are just Unix line endings with extra trailing whitespace on each line.
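For instance, a loader could normalise these things once, before parsing. A minimal sketch, assuming the format standardizes on UTF-8 (`read_plugin_text` is a hypothetical helper):

```python
def read_plugin_text(raw):
    """Decode plugin bytes as UTF-8 and normalise line endings to LF."""
    # "utf-8-sig" also drops a leading byte order mark, if present.
    # Invalid input raises UnicodeDecodeError instead of being guessed at.
    text = raw.decode("utf-8-sig")
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(read_plugin_text(b"\xef\xbb\xbfNAME: a\r\nFNAM: b\r\n")))
```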

Admittedly there may be issues when whitespace denotes scope, which makes yaml a potentially problematic language to use as it would be easy, particularly for the inexperienced, to accidentally break files with incorrect formatting. That being said, it does also support braces to denote scope, the use of which could be encouraged to avoid such problems. I don't think yaml is the perfect language for the job, but I've yet to find a suitable replacement.

AnyOldName3 wrote: ↑19 May 2020, 23:37 An ESP is already kind of a diff, though.

The trouble is that the control you have with such diffs for the current format is very coarse, as it only allows changing entire records as a whole, which leads to a bunch of issues. The sort of format I'm proposing would allow diffs at the smallest level possible (since for most fields, it's usually not meaningful to change just part of a field at a time. Scripts and books on the other hand...). Also text diffs of those diffs, but I think the only advantage there would be the ability to use existing diff tools to handle them.

AnyOldName3 wrote: ↑19 May 2020, 23:37 it might be good to get people like ElminsterAU involved in that discussion as I'm sure it would be useful for the later games, too.

Yes, feedback from people developing tools for the later games would be useful, though I don't really know what the best way to go about doing that would be.

Greendogo wrote: ↑19 May 2020, 23:32 I definitely support this. We tossed around doing a text-based format long ago, but it was never picked up. The biggest bonus here is being able to do version control adequately. I like the main focus of your proposal of course, but for me the text based aspect is the major win.

The text-based aspect is hugely significant to me too, particularly since getting mods into version control and hosted on places like GitLab and GitHub, or wherever, could also help produce a much more open modding ecosystem than we currently have. I was pushing the improved way of modifying masters in my original post mostly because I thought that was the more significant part of the suggestion (and you saying that you'd tossed around the idea of text-based plugins before confirms my suspicion that this isn't an entirely novel idea).

I don't suppose anyone knows of any other fundamental flaws with the current system that this might not address? If we're going to create a new system, it would be best to spend the time to do it right, rather than figuring out later that it's also flawed.

One further thought that has occurred to me is that including ways of linking field contents together might be useful. That is, you could have a value for an object introduced in a plugin file be dependent on a value of a different object which was introduced in its master file. That way the field could be meaningfully updated when the files are processed, even if the master value is different than it was when the plugin was written.
Instead of constants, fields could be expressions, and the "delta" records could make use of the expression in the field they are replacing as a variable in their expression. This might increase complexity significantly (and loading time), but even an extremely minimal expression language could be powerful (e.g. basic arithmetic operations, string concatenation and substitution). It might be more trouble than it's worth, but it's at least worth consideration.
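To give an idea of how minimal such an expression language could be, here is a sketch of an evaluator that supports only constants, the four basic arithmetic operators (with `+` doubling as string concatenation) and a variable `old` for the value being replaced. The syntax and the name `old` are assumptions for illustration; it borrows Python's own parser via the `ast` module rather than defining a grammar.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,  # also concatenates strings
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression, old):
    """Evaluate a tiny expression in which `old` is the value being replaced."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str)):
            return node.value
        if isinstance(node, ast.Name) and node.id == "old":
            return old
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        # Function calls, attribute access, etc. are rejected.
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression, mode="eval").body)


print(evaluate("old * 2 + 5", 10))  # 25
print(evaluate("old + ' (enchanted)'", "Apprentice's Calcinator"))
```

If an earlier plugin changes the underlying value from 10 to 12, the same delta record yields 29 instead of 25 without being rewritten.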

One example I could think of where this might be useful is relative coordinates. Instead of all objects having an absolute position (not that I'm entirely sure how this works at the moment; I haven't gotten there yet in my record implementations), you could have some objects be placed relative to another object. It certainly wouldn't be perfect, as there are still many ways that moving an object could result in invalid positions for the related objects, but objects on top of furniture comes to mind as an instance where it would almost always work properly (then again this would also require some sort of reference frame to handle rotation).
Another is typos. A string substitution operation could easily fix typos in the base Morrowind files no matter what other changes are made to the text in the record prior to the substitution being applied (and fix any replicated typos in other modifications made to the text).
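Both examples can be sketched in a few lines. Everything here is illustrative: the helper names, the coordinates and the dialogue text are made up, and the rotation handles only a counter-clockwise turn about the vertical axis, which may not match the convention the engine actually uses.

```python
import math


def place_relative(parent_pos, parent_yaw, offset):
    """Return the world position of an object placed relative to a parent.

    parent_yaw is the parent's rotation about the vertical (z) axis in
    radians; only that one axis is handled in this sketch.
    """
    px, py, pz = parent_pos
    ox, oy, oz = offset
    cos_a, sin_a = math.cos(parent_yaw), math.sin(parent_yaw)
    return (
        px + ox * cos_a - oy * sin_a,
        py + ox * sin_a + oy * cos_a,
        pz + oz,
    )


def fix_typo(text, wrong, right):
    """Replace every occurrence of a typo, whatever else the text contains."""
    return text.replace(wrong, right)


# A plate placed 10 units along the table's local x axis and 50 units above it.
table_pos, table_yaw = (100.0, 200.0, 0.0), 0.0
plate_pos = place_relative(table_pos, table_yaw, (10.0, 0.0, 50.0))
print(plate_pos)

# Made-up text: an earlier mod appended a sentence and copied the typo.
base_text = "You will recieve a reward."
after_other_mod = base_text + " You may also recieve a key."
fixed = fix_typo(after_other_mod, "recieve", "receive")
print(fixed)
```

If another plugin moves or turns the table, the plate's position is recomputed from the same offset, and the substitution fixes the typo in both the original sentence and the one added later.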

Ferk · Post by **Ferk** » 29 May 2020, 12:34

Chris wrote: ↑21 May 2020, 15:40 The main benefit of being text-based, human-readability

Imho, that would be the least of the benefits.
Being able to track the history of the changes in the file through version control tools that rely on text-based content is already a big plus.
But what's also interesting is that you can use all kinds of tools or scripts to process text files very easily, without having to add features to the OMW CS editor for every possible form of "search-and-replace" or macro to cover every scenario for making changes in bulk or automating some of the editing.

openmw.org
