GameTextCompiler RTL #1

Open
wants to merge 10 commits into
base: xezon-gametextcompiler
from

Conversation

DevGeniusCode

@DevGeniusCode DevGeniusCode commented Dec 15, 2024

Purpose of the PR

This PR introduces support for RTL (Right-to-Left) languages in the handling of generals.str files. The new functionality eliminates the need to manually reverse text when working with these files.

Key Changes

  • Added a new flag: RTL_REVERSE.
    When enabled, this flag automatically reverses the text during file processing to accommodate RTL languages.

This enhancement simplifies workflows for projects involving languages like Arabic, ensuring proper text direction without additional manual steps.

You may want to review the commit c8ce01a and see if the changes in the commit really need to be made. I removed the const so I could change the text value.

Fix TheSuperHackers#1

src/game/client/gametextfile.cpp (outdated)
src/game/client/gametextfile.cpp (outdated)
src/game/common/system/unicodestring.h (outdated)
src/game/common/system/unicodestring.cpp (outdated)
@DevGeniusCode
Author

DevGeniusCode commented Jan 27, 2025

  • Refactored Utf16String functions to use const references.
  • Updated Utf16String::Reverse with improved logic for handling ignored and special characters.
  • Updated unicodestring.h to include necessary headers to improve functionality.

Reverse

  • Added detailed handling for untranslated strings, numeric sequences, and format specifiers. See doc comments for more information.
  • Other edge cases still need to be handled:
    • Mixed RTL and LTR word order.
    • Spaces and numbers.
    • Handling of - at the beginning of a word or sentence. ("- GLA has SCUD Storm")
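The "spaces and numbers" case above could be approached by reversing the whole string and then restoring each run of digits. This is only a minimal sketch under stated assumptions: the text is held in a std::u16string, only the ASCII digits 0-9 count as numbers, and ReverseKeepingNumbers is a hypothetical name that does not exist in this PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the PR.
// Reverses the whole string, then reverses each run of ASCII digits a second
// time so that numbers such as "123" read the same as before.
std::u16string ReverseKeepingNumbers(std::u16string text)
{
    auto isDigit = [](char16_t c) { return c >= u'0' && c <= u'9'; };

    std::reverse(text.begin(), text.end());

    std::size_t i = 0;
    while (i < text.size()) {
        if (!isDigit(text[i])) {
            ++i;
            continue;
        }
        // Find the end of this digit run and restore its original order.
        std::size_t j = i;
        while (j < text.size() && isDigit(text[j])) {
            ++j;
        }
        std::reverse(text.begin() + i, text.begin() + j);
        i = j;
    }
    return text;
}
```

The sketch works on UTF-16 code units, so surrogate pairs and combining marks would need extra care in a real implementation.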

@DevGeniusCode
Author

DevGeniusCode commented Jan 28, 2025

I added support for mixed strings and refactored the function into smaller, more manageable functions for easier maintenance. However, handling all edge cases for untranslated strings and language-specific directionality manually leads to a lot of complexity.

Instead of manually addressing every single case, I propose using specific Unicode characters to mark RTL strings, such as LRM (Left-to-Right Mark), RLM (Right-to-Left Mark), and others. With these characters, the engine can automatically identify RTL segments and handle them accordingly, even in mixed or untranslated strings. And maybe we won't need to use the RTL_REVERSE flag either.

Example usage:

// context: Button to go to the game options screen
GUI:GameOptions
US: "GAME OPTIONS"
AR: "[U+200F]خيارات اللعبة" // No need for [U+200E] if RTL text is at the end.

// context: The title that will have a player's name and location added to it on the stats screen
GUI:PlayerStatistics
US: "%hs, from %ls"
AR: "%hs[U+200F], من [U+200E]%ls"

CONTROLBAR:UpgradeChinaOverlordBattleBunker
US: "Battle Bunker: &B"
AR: "[U+200F]مخبأ المعركة: [U+200E]&B"

// context: Refresh the information about NAT (Network Address Translation)
GUI:FirewallRefresh
US: "Refresh NAT"
AR: "[U+200F]تحديث [U+200E]NAT"

// context: Beacons can be placed and removed from the map
GUI:DeleteBeacon
US: "Delete\nBeacon"
AR: "[U+200F]حذف\nالمنارة[U+200E]" 

How it works:

  • The [U+200F] (RLM) character marks the start of text that the engine should display from right to left, while the [U+200E] (LRM) character marks where left-to-right rendering resumes.
  • This solution allows the engine to render RTL text correctly without needing manual intervention for each case.
  • Note that it is not enough to type the literal string "[U+200E]". You need to copy and paste the actual character (U+200E) into the file.

This makes the process much more scalable, especially for projects involving multiple languages, including mixed LTR and RTL segments. By using these characters, the reversal function becomes very simple, only needing to avoid reversing number sequences (unless explicitly marked).

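// Find the RLM (U+200F) that opens the RTL segment.
while (start <= end_line && buffer[start] != 0x200F) {
    start++;
}
// Search onward from there for the LRM (U+200E) that closes the segment.
end = start;
while (end <= end_line && buffer[end] != 0x200E) {
    end++;
}
Reverse(buffer, start, end);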
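A self-contained version of the same idea might look like the sketch below. It is an illustration only and rests on several assumptions: the text is a std::u16string rather than the PR's unichar_t buffer, "\n" has already been converted to a real line break character, a segment without a closing LRM runs to the end of the string, and ReverseMarkedSegments is a hypothetical name. Number sequences are not treated specially here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the PR.
constexpr char16_t kRlm = u'\u200F'; // Right-to-Left Mark
constexpr char16_t kLrm = u'\u200E'; // Left-to-Right Mark

// Reverses the text between each RLM and the next LRM (or the end of the
// string). The marks stay in place, and each line inside a segment is
// reversed on its own so that line breaks keep their position.
std::u16string ReverseMarkedSegments(std::u16string text)
{
    const std::size_t npos = std::u16string::npos;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t rlm = text.find(kRlm, pos);
        if (rlm == npos) {
            break; // no further RTL segment
        }
        const std::size_t begin = rlm + 1;
        std::size_t end = text.find(kLrm, begin);
        if (end == npos) {
            end = text.size();
        }
        // Reverse line by line within [begin, end).
        std::size_t line = begin;
        while (line < end) {
            std::size_t stop = text.find(u'\n', line);
            if (stop == npos || stop > end) {
                stop = end;
            }
            std::reverse(text.begin() + line, text.begin() + stop);
            line = stop + 1;
        }
        pos = end;
    }
    return text;
}
```

Strings without any RLM pass through unchanged, which is what would let untranslated English strings coexist with marked Arabic ones.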

For more information on Unicode characters for text directionality, you can refer to the following resource:


void Utf16String::Reverse()
{
// This function reverses the string for RTL languages while ensuring certain characters
Owner

If the reversal is so specific to RTL languages, then this function should not be part of this class.

I suggest creating a new class for the RTL processing.

return to_reverse;
}

Utf16String reverse_line(unichar_t *buffer, size_t *start, size_t end)
Owner

This algorithm is very complicated. I wonder if it can be simplified. Also, it is not clear to me if it is correct.

So for example this string:

AR: ": دراﻮﻟا لﺎﺳرﻹا\nقﺮﺸﻟا ﻦﻣ ﻢﮫﻤﺟﺎﮫﻨﺳ ﻦﺤﻧ\nﺔﯿﺑﻮﻨﺠﻟا ﻲﻜﻳﺮﻣﻷا ﻢﻋﺪﻟا ةﺪﻋﺎﻗ لزأ"

should simply reverse all sentences between \n

And

AR: "%d:%d : ﻲﻟﺎﺘﻟا لاﺰﻧﻺﻟ ﻲﻘﺒﺘﻤﻟا ﺖﻗﻮﻟا"

should reverse everything, except the %d:%d part in itself.

I think to program that, you can first locate the beginning and end of each % format field, then reverse the whole string, then work out where each field's beginning and end landed after the reversal, and finally reverse each field again to restore it.
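The steps described above can be sketched roughly as follows. This is a hedged illustration, not the PR's implementation: it assumes a std::u16string, and it assumes a format field is a '%' followed by optional 'h' or 'l' length modifiers and one conversion character (which covers %d, %hs, %ls, and %%). The names Span, FindFormatFields, and ReverseKeepingFormatFields are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types and helpers, not part of the PR.
struct Span {
    std::size_t begin; // index of '%'
    std::size_t end;   // one past the conversion character
};

// Step 1: locate the begin and end of every % format field.
std::vector<Span> FindFormatFields(const std::u16string& text)
{
    std::vector<Span> spans;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] != u'%') {
            ++i;
            continue;
        }
        std::size_t j = i + 1;
        // Skip length modifiers, as in %hs and %ls.
        while (j < text.size() && (text[j] == u'h' || text[j] == u'l')) {
            ++j;
        }
        // Include the conversion character if there is one.
        if (j < text.size()) {
            ++j;
        }
        spans.push_back({i, j});
        i = j;
    }
    return spans;
}

std::u16string ReverseKeepingFormatFields(std::u16string text)
{
    const std::vector<Span> fields = FindFormatFields(text);
    const std::size_t n = text.size();

    // Step 2: reverse the whole string.
    std::reverse(text.begin(), text.end());

    // Steps 3 and 4: a field at [begin, end) now sits at [n - end, n - begin);
    // reversing that range again restores the field's own character order.
    for (const Span& f : fields) {
        std::reverse(text.begin() + (n - f.end), text.begin() + (n - f.begin));
    }
    return text;
}
```

Note that the fields keep their spelling but swap places with each other, so a string with two different fields, such as %hs and %ls, would present them in the opposite order to the arguments.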

Owner

To avoid reversing strings such as

AR: "MISSION OBJECTIVE:\nTeam 2: Destroy the Power Plants"

or

AR: "*You've brought Colonel Burton to our little war, General. Even he will fall when faced with the horde"

I suggest adding a feature that applies the reversal only if the string contains at least one character from a given set.

https://en.wikipedia.org/wiki/Arabic_script_in_Unicode

So for example we could pass the code range 0600–06FF, and the reversal would be applied only if at least one character falls within that range.

Alternatively, we could also give it a range of characters not to reverse in place, such as ASCII (0000–007F).
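The range check suggested here could be as small as the sketch below. It assumes a std::u16string, and ContainsCharInRange is a hypothetical name rather than an existing function in the code base.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper, not part of the PR.
// Returns true if at least one UTF-16 code unit of text lies in [first, last].
bool ContainsCharInRange(const std::u16string& text, char16_t first, char16_t last)
{
    return std::any_of(text.begin(), text.end(),
                       [first, last](char16_t c) { return c >= first && c <= last; });
}
```

A caller would apply the reversal only when ContainsCharInRange(text, 0x0600, 0x06FF) returns true. Strings that are stored pre-shaped may instead use the Arabic Presentation Forms blocks (U+FB50–U+FDFF and U+FE70–U+FEFF), so those ranges might need to be checked as well.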
