A localization problem (warning! huge tl;dr and bad English)

Everything about development and the OpenMW source code.
lazydev
Posts: 68
Joined: 16 Dec 2012, 14:03


Post by lazydev »

Hello everyone, and please excuse my poor English.


There is a problem with the Russian localization: door destinations and, sometimes, the current area name are shown in English (see screenshot1, screenshot2).

This happens because in the Russian version some cell names (location cells, as far as I can see) are not translated in the ESM files.

Instead, there are special *.cel files (one for each ESM: morrowind.cel, tribunal.cel and bloodmoon.cel), which contain the translations for all the untranslated cell names.

I don't know why they did it this way, but it is a fact, so OpenMW needs to show these translations. It can keep the original names for internal purposes, but the user should see the translated ones.

A *.cel file has a very primitive format: one entry per line, each consisting of the original name, a tab character and the translation. Here is an example:
Holamayan Холамаян
Sadrith Mora Садрит Мора
Wolverine Hall Волверин Холл
Nchurdamz Нчурдамц
Tel Aruhn Тель Арун
Tel Fyr Тель Фир
The files are encoded in Windows-1251 (in the Russian version).
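To make the encoding point concrete: in Windows-1251 the Cyrillic letters А..я occupy bytes 0xC0..0xFF in the same order as Unicode U+0410..U+044F, with Ё/ё at 0xA8/0xB8, so the raw bytes of a *.cel file cannot be used as UTF-8 directly. Below is a minimal sketch that decodes only those ranges. The name win1251ToUtf8 is hypothetical; this is not OpenMW's ToUTF8 converter, and it deliberately replaces the remaining 0x80..0xBF bytes (dashes, quotes and so on) with U+FFFD instead of mapping them.

```cpp
#include <string>

// Appends the UTF-8 encoding of a code point in U+0000..U+FFFF
// (enough for Windows-1251 Cyrillic and for U+FFFD).
static void appendUtf8(std::string& out, unsigned int cp)
{
    if (cp < 0x80)
        out += static_cast<char>(cp);
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Sketch only: decodes ASCII and the Cyrillic letters of Windows-1251.
// Other bytes in 0x80..0xBF become U+FFFD instead of being mapped.
std::string win1251ToUtf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);
    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80)
            appendUtf8(out, c);                   // ASCII is unchanged
        else if (c >= 0xC0)
            appendUtf8(out, 0x0410 + (c - 0xC0)); // А..я, contiguous in both tables
        else if (c == 0xA8)
            appendUtf8(out, 0x0401);              // Ё
        else if (c == 0xB8)
            appendUtf8(out, 0x0451);              // ё
        else
            appendUtf8(out, 0xFFFD);              // not handled in this sketch
    }
    return out;
}
```

A complete converter would use a 128-entry table for the upper half of the code page; this version only illustrates why the loader has to know the file's encoding.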


This feature is very easy to implement, and I have already done it (for the door tooltips) in my local sources. But since the OpenMW sources are new to me, I need advice on how to integrate my code cleanly.


I did the following:
1. Added some data and methods to MWWorld::ESMStore (since there is one cell translation file per ESM file, I guessed it was the right place):

Code:

namespace MWWorld
{
<...>
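    class ESMStore
    {
        <...>
        // Original (English) cell name -> translated name, filled from *.cel files
        std::map<std::string, std::string> mCellNamesTranslations;

        // Reads a *.cel file in the given encoding and merges its entries
        void loadCellNamesTranslation(const std::string& path, ToUTF8::FromType encoding);
        // Returns the translated name, or cellName itself if there is none
        std::string translateCellName(const std::string& cellName) const;
        <...>
    };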
<...>
}
Implementation:

Code:

#include <cstring>
#include <fstream>
#include <map>
#include <string>

void ESMStore::loadCellNamesTranslation(const std::string& path, ToUTF8::FromType encoding)
{
    std::ifstream stream(path.c_str());
    if (!stream.is_open())
        return; // no translation file for this ESM

    std::string line;
    // Loop on the result of getline: checking eof() before reading does not
    // detect other read errors and can loop forever
    while (std::getline(stream, line))
    {
        if (line.empty())
            continue;

        // Copy the raw line into the converter's buffer and decode it to UTF-8
        char* buffer = ToUTF8::getBuffer(line.size() + 1);
        // buffer has at least line.size() + 1 bytes, so this copy is safe
        std::strcpy(buffer, line.c_str());
        line = ToUTF8::getUtf8(encoding);

        // Expected format: original<TAB>translation, both parts non-empty
        size_t tab_pos = line.find('\t');
        if (tab_pos != std::string::npos && tab_pos > 0 && tab_pos < line.size() - 1)
        {
            std::string original = line.substr(0, tab_pos);
            std::string translation = line.substr(tab_pos + 1);
            mCellNamesTranslations.insert(std::make_pair(original, translation));
        }
    }
}

std::string ESMStore::translateCellName(const std::string& cellName) const
{
    std::map<std::string, std::string>::const_iterator entry = mCellNamesTranslations.find(cellName);
    if (entry == mCellNamesTranslations.end())
        return cellName; // no translation: show the original name

    return entry->second;
}
Problem 1:
The *.cel files use \r\n line endings, but on my Linux computer std::getline does not treat \r as part of the line ending, so it stays at the end of every line.
Is there a good way to read the lines correctly other than removing the \r manually?

Problem 2:
There are more surprises from the Bethesda dev/translation team: not only *.cel files, but also, for example, *.top files (see https://bugs.openmw.org/issues/253) and *.mrk files. All of these files have a similar structure, so maybe it would be good to write a parser class or function for them in the components/ folder?


Here is the way I load the cell translation file:

Code:

World::World (OEngine::Render::OgreRenderer& renderer,
    const Files::Collections& fileCollections,
    const std::string& master, const boost::filesystem::path& resDir, const boost::filesystem::path& cacheDir, bool newGame,
    const std::string& encoding, std::map<std::string,std::string> fallbackMap)
: mPlayer (0), mLocalScripts (mStore), mGlobalVariables (0),
  mSky (true), mCells (mStore, mEsm),
  mNumFacing(0)
{
    <...>

    std::cout << "Loading ESM " << masterPath.string() << "\n";

    // This parses the ESM file and loads a sample cell
    mEsm.setEncoding(encoding);
    mEsm.open (masterPath.string());
    mStore.load (mEsm);

    // Load cell name translations from a file with the same name as the master file,
    // but in lowercase and with a .cel extension.
    // I guess this section will need changing once multi-ESM support is done.
    {
        std::string cellTranslationFileName = StringUtils::lowerCase(master);
        // Replace the extension (or append one if there is none)
        size_t dotPos = cellTranslationFileName.rfind('.');
        if (dotPos != std::string::npos)
            cellTranslationFileName.replace(dotPos, cellTranslationFileName.length() - dotPos, ".cel");
        else
            cellTranslationFileName += ".cel";

        std::string cellTranslationFilePath = fileCollections.getCollection(".cel").getPath(cellTranslationFileName).string();

        // Map the encoding option string to the converter's enum
        ToUTF8::FromType ft;
        if (encoding == "win1250")
            ft = ToUTF8::WINDOWS_1250;
        else if (encoding == "win1251")
            ft = ToUTF8::WINDOWS_1251;
        else
            ft = ToUTF8::WINDOWS_1252;

        mStore.loadCellNamesTranslation(cellTranslationFilePath, ft);
    }

    <...>
}
Problem 3:
To read the file correctly, the loadCellNamesTranslation method needs to know the file encoding (it is always win1251 for the Russian files but can, of course, differ in other versions). World::World() takes the encoding as a string, but it has to be converted to ToUTF8::FromType. I have seen at least two places in the code where this conversion is done, and there is no need to copy that code yet again.
Maybe the conversion should be done once at program start, and then the program should use ToUTF8::FromType everywhere?
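A sketch of the "convert once" idea: a single function maps the option string to the enum, and only its result is passed around afterwards. The enum mirrors the three ToUTF8 values used in the World constructor code, but it is declared in a stand-in namespace (Sketch) so the example compiles on its own; encodingFromString is a hypothetical name, and the fallback to Windows-1252 copies the behaviour of that constructor.

```cpp
#include <string>

// Stand-in for the ToUTF8 namespace so the sketch compiles on its own
namespace Sketch
{
    enum FromType { WINDOWS_1250, WINDOWS_1251, WINDOWS_1252 };

    // Hypothetical helper: the single place where the encoding option string
    // is interpreted. Unknown names fall back to Windows-1252.
    FromType encodingFromString(const std::string& name)
    {
        if (name == "win1250")
            return WINDOWS_1250; // Central European
        if (name == "win1251")
            return WINDOWS_1251; // Cyrillic (Russian version)
        return WINDOWS_1252;     // Western European default
    }
}
```

It would be called once where the command-line options are parsed, and the code that currently receives the encoding string (World, the ESM reader, the .cel loader) would take a FromType instead.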


Problem 4:
From my point of view, it is bad design to pass the encoding into the loadCellNamesTranslation method: decoding is not its job. So if we solve problem 2 by creating a special parser, we could pass the encoding to the parser and pass the parser to loadCellNamesTranslation.


Problem 5:
There are many places where (untranslated) cell names are shown to the user: door tooltips, map marker tooltips, the location title above the minimap... So far I have only found the place where door tooltips are shown.
Could you point me to the other places in the code?
psi29a
Posts: 5362
Joined: 29 Sep 2011, 10:13
Location: Belgium
Gitlab profile: https://gitlab.com/psi29a/

Re: A localization problem (warning! huge tl;dr and bad engl

Post by psi29a »

Thank you for contributing this and reporting the issue!

If you would like to contribute code, then you can also submit patches through GitHub by forking OpenMW yourself.

Follow the contributor checklist.

Welcome aboard!
Zini
Posts: 5538
Joined: 06 Aug 2011, 15:16

Re: A localization problem (warning! huge tl;dr and bad engl

Post by Zini »

Good that we finally have someone who is tackling this stuff. I need to think about it a bit more though (wasn't expecting it to be taken on at this point of the development). The places you have chosen are definitely not where the code needs to go and I do not have answers ready for most of the other points you brought up. I'll get back to you as soon as possible.
lazydev
Posts: 68
Joined: 16 Dec 2012, 14:03

Re: A localization problem (warning! huge tl;dr and bad engl

Post by lazydev »

BrotherBrick wrote:Thank you for contributing this and reporting the issue!

If you would like to contribute code, then you can also submit patches through GitHub by forking OpenMW yourself.

Follow the contributor checklist.

Welcome aboard!
Thanks.

But I have trouble committing to GitHub :( Could you help me?
I made a fork (https://github.com/lazydev2/openmw.git) and cloned it to my local PC:

Code:

git clone git://github.com/zinnschlag/openmw.git
cd openmw
git submodule update --init
Then I made my code changes and committed them:

Code:

git add -A
git commit
But when I try to push my commit to GitHub:

Code:

git push origin master
I get an error:

Code:

fatal: remote error: 
  You can't push to git://github.com/lazydev2/openmw.git
  Use git@github.com:lazydev2/openmw.git
What did I do wrong?

P.S.
By the way, I solved problem 5 (I found the tooltips for the map and minimap, and the map window title), so now all the titles are translated in my local copy :)
Ace (SWE)
Posts: 887
Joined: 15 Aug 2011, 14:56

Re: A localization problem (warning! huge tl;dr and bad engl

Post by Ace (SWE) »

lazydev wrote: But when I try to push my commit to GitHub:

Code:

git push origin master
I get an error:

Code:

fatal: remote error: 
  You can't push to git://github.com/lazydev2/openmw.git
  Use git@github.com:lazydev2/openmw.git
What did I do wrong?
You chose the wrong URL when cloning it; that's easily fixed though. All you need to do is run:

Code:

git set-url origin git@github.com:lazydev2/openmw.git
If you've added your public SSH key to your GitHub account, then that should work just fine.
Zini
Posts: 5538
Joined: 06 Aug 2011, 15:16

Re: A localization problem (warning! huge tl;dr and bad engl

Post by Zini »

Okay, here we go:

I definitely don't want the code for this feature spread out all over the code base. Also, there is the possibility that we need the feature in the editor. That means the code for reading and managing those word replacements should go into a new component.

This component needs to be configured with an encoding. It also needs a function for adding files, that is called once for each esm. If the matching *.cel (or whatever) files exist, they should be read in and merged into the list of word replacements.

Code should definitely not go into ESM storage classes.

Please pay attention to the data directories. There can be more than one, and those files can be found in any of them.
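Put together, the component described here might look like the following sketch. Several assumptions apply: TranslationStorage is a hypothetical name; it uses C++17 std::filesystem instead of the boost::filesystem and Files::Collections types OpenMW uses, so that it compiles on its own; the encoding is represented by an injected decoder function; and the *.cel file name on disk is assumed to be lowercase, as in the World constructor code. Files found in later data directories overwrite entries from earlier ones, which is one possible merge rule, not a decided one.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <fstream>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical translation component: configured once with the data
// directories and a decoder, then fed one ESM name at a time.
class TranslationStorage
{
public:
    typedef std::function<std::string (const std::string&)> Decoder;

    TranslationStorage(std::vector<fs::path> dataDirs, Decoder decoder)
        : mDataDirs(std::move(dataDirs)), mDecoder(std::move(decoder)) {}

    // "Morrowind.esm" -> look for "morrowind.cel" in every data directory and
    // merge each file that exists. A missing file is not an error.
    void addFile(const std::string& esmFileName)
    {
        std::string stem = fs::path(esmFileName).stem().string();
        std::transform(stem.begin(), stem.end(), stem.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

        for (const fs::path& dir : mDataDirs)
        {
            const fs::path candidate = dir / (stem + ".cel");
            if (fs::exists(candidate))
                loadCel(candidate);
        }
    }

    // Returns the translated name, or the original if there is no entry
    std::string translateCellName(const std::string& name) const
    {
        const auto it = mCellNames.find(name);
        return it == mCellNames.end() ? name : it->second;
    }

private:
    // Reads "original<TAB>translation" lines; later entries overwrite earlier ones
    void loadCel(const fs::path& path)
    {
        std::ifstream stream(path, std::ios::binary);
        std::string line;
        while (std::getline(stream, line))
        {
            if (!line.empty() && line.back() == '\r')
                line.pop_back(); // \r\n files read on Linux
            const std::string::size_type tab = line.find('\t');
            if (tab == std::string::npos || tab == 0 || tab + 1 == line.size())
                continue;
            mCellNames[mDecoder(line.substr(0, tab))] = mDecoder(line.substr(tab + 1));
        }
    }

    std::vector<fs::path> mDataDirs;
    Decoder mDecoder;
    std::map<std::string, std::string> mCellNames;
};
```

Keeping the lookup of data directories inside the component means the callers (game or editor) only need to call addFile() for each ESM they load.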


For performing the actual word replacement there are two options:

1. Query the replacement table from each spot where relevant user-visible text is created (that's what you are doing right now, apparently).

2. Hooking into MyGUI and doing the replacement there. We are currently doing something similar for strings from game settings. That is an option we should at least consider. Scrawl wrote the game setting implementation. Maybe he has some comments on the viability of this option.


Dialogues are a special case that needs to be considered separately, because there we may be dealing with more than just replacing output text.
Maybe the conversion should be done once at program start, and then the program should use ToUTF8::FromType everywhere?
Fine with me.
lazydev
Posts: 68
Joined: 16 Dec 2012, 14:03

Re: A localization problem (warning! huge tl;dr and bad engl

Post by lazydev »

Ace (SWE) wrote: You chose the wrong URL when cloning it; that's easily fixed though. All you need to do is run:

Code:

git set-url origin git@github.com:lazydev2/openmw.git
If you've added your public SSH key to your GitHub account, then that should work just fine.
Thanks, that worked (the correct command is "git remote set-url").


Now the changes are in my fork at https://github.com/lazydev2/openmw.
I will make a pull request later, once the discussion about the implementation is complete.
lazydev
Posts: 68
Joined: 16 Dec 2012, 14:03

Re: A localization problem (warning! huge tl;dr and bad engl

Post by lazydev »

Zini wrote:I definitely don't want the code for this feature spread out all over the code base. Also, there is the possibility that we need the feature in the editor. That means the code for reading and managing those word replacements should go into a new component.
OK, I will make a special component for that. But where should the translator object live?
Should it be a member of MWWorld::World?
Zini wrote:This component needs to be configured with an encoding. It also needs a function for adding files, that is called once for each esm. If the matching *.cel (or whatever) files exist, they should be read in and merged into the list of word replacements.
So you think the component should take an ESM file name and look for the corresponding *.cel file, right?
Zini wrote: Please pay attention to the data directories. There can be more than one, and those files can be found in any of them.
At the moment I do it like this:

Code:

std::string cellTranslationFilePath = fileCollections.getCollection(".cel").getPath(cellTranslationFileName).string();
Is that enough to handle it?
Zini wrote: For performing the actual word replacement there are two options:

1. Query the replacement table from each spot where relevant user-visible text is created (that's what you are doing right now, apparently).
You are right, that is what I am doing right now.
Zini wrote: 2. Hooking into MyGUI and doing the replacement there. We are currently doing something similar for strings from game settings. That is an option we should at least consider. Scrawl wrote the game setting implementation. Maybe he has some comments on the viability of this option.
But we don't need to translate cell names everywhere in the GUI. For example, if we want to see the real cell name in the console (for debugging purposes), the GUI must not replace it with the translated name...
Zini wrote: Dialogues are a special case that needs to be considered separately, because there we may be dealing with more than just replacing output text.
Hmm... what is the problem with dialogues? Do they have to show cell names? I thought they only show the topic texts, which are fully handwritten and do not use cell names... Or is there some script variable like %currentCell?
Zini
Posts: 5538
Joined: 06 Aug 2011, 15:16

Re: A localization problem (warning! huge tl;dr and bad engl

Post by Zini »

I see. I wrote my posting above under the assumption that you would take care of all the localisation files, not just the .cel files. Can I convince you to do that? We have a shortage of developers with enough time and the right language background. And a unified implementation would be preferable anyway.
OK, I will make a special component for that. But where should the translator object live?
Should it be a member of MWWorld::World?
I lean towards MWGui::WindowManager, since this is more a matter of presenting text to the user than of world state.
Is that enough to handle it?
Looks correct.
But we don't need to translate cell names everywhere in the GUI. For example, if we want to see the real cell name in the console (for debugging purposes), the GUI must not replace it with the translated name...
If done correctly, that should not happen. Let's wait for input from scrawl before we make a decision. The development of the new component is independent of this part anyway, so you could start with that.
scrawl
Posts: 2152
Joined: 18 Feb 2012, 11:51

Re: A localization problem (warning! huge tl;dr and bad engl

Post by scrawl »

Zini wrote: 2. Hooking into MyGUI and doing the replacement there. We are currently doing something similar for strings from game settings. That is an option we should at least consider. Scrawl wrote the game setting implementation. Maybe he has some comments on the viability of this option.
Sure, that should work. We just need to distinguish between GMST strings and cell names then. At the moment, ${text} would retrieve GMST string "text". We could change it to ${GMST_text} and then add ${Cell_cellname} to retrieve a cell's name.
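A sketch of how the ${GMST_…}/${Cell_…} distinction could be resolved. It only shows the string handling: the actual hook into MyGUI is not reproduced, and expandTags and its two lookup callbacks are hypothetical. Because the text keeps the original cell name inside the tag and only the GUI expands it, code that needs the real name (such as the console) can still use it untranslated.

```cpp
#include <functional>
#include <string>

typedef std::function<std::string (const std::string&)> Lookup;

// Hypothetical tag expansion: "${GMST_x}" -> gmst("x"), "${Cell_x}" -> cellName("x").
// Unknown tags are left as they are; an unterminated "${" is copied verbatim.
std::string expandTags(const std::string& text, const Lookup& gmst, const Lookup& cellName)
{
    std::string out;
    std::string::size_type pos = 0;
    while (true)
    {
        std::string::size_type start = text.find("${", pos);
        if (start == std::string::npos)
            break;
        std::string::size_type end = text.find('}', start + 2);
        if (end == std::string::npos)
            break; // unterminated tag: the rest is copied after the loop

        out.append(text, pos, start - pos); // text before the tag
        std::string tag = text.substr(start + 2, end - start - 2);

        if (tag.compare(0, 5, "GMST_") == 0)
            out += gmst(tag.substr(5));
        else if (tag.compare(0, 5, "Cell_") == 0)
            out += cellName(tag.substr(5));
        else
            out.append(text, start, end - start + 1); // unknown tag: leave as-is

        pos = end + 1;
    }
    out.append(text, pos, std::string::npos);
    return out;
}
```

With this split, the cell name lookup would simply be the translation component's translate function, and the GMST lookup would stay what it is today.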