Implement the homograph (heteronyms) #1

MaruTama · 2019-07-30T15:47:19Z

"行" is (Xíng), however when it is "银行" the pinyin is (YínHáng).
Since no ligatures are registered in this font, "银行" is displayed as (YínXíng).

NightFurySL2001 · 2020-09-01T14:30:10Z

Is it possible to generate the different pinyin glyphs with different glyph names first? E.g. putting zhang and chang for 长（長） in separate files. This can speed up the build, as the glyphs will then be available for swapping with minimal further changes. There is also contextual swapping, i.e. swapping one glyph with another depending on the context, but it'll be as hard to implement as using ligatures.

Also, ligatures require building a multi-character glyph for each word, e.g. 行啊, 行了, 银行, 行长 (which itself has 2 pronunciations: hang zhang, head of bank; hang chang, line length), which can dramatically increase file size.

MaruTama · 2020-09-22T07:33:06Z

The implementation we are considering is as follows:
Predefine a standard pinyin for each character.
Store the multi-character patterns in a lookup table.
If the pattern matches, replace it with another glyph.
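
To make the three steps concrete, here is a minimal sketch of the same idea as ordinary string processing, outside of any font. The names `STANDARD`, `PATTERNS`, and `annotate` are made up for illustration, and the readings are the ones listed in this thread. A real font would do the matching on glyphs through GSUB, not on text like this.

```python
# Step 1: one standard reading per character (values taken from this thread).
STANDARD = {"行": "xíng", "银": "yín", "道": "dào", "啊": "a", "了": "le", "得": "dé"}

# Step 2: patterns keyed by (text before, character, text after).
PATTERNS = {
    ("银", "行", ""): "háng",
    ("道", "行", ""): "héng",
    ("", "了", "得"): "liǎo",
    ("了", "得", ""): "de",
}


def annotate(text):
    """Return one reading per character, preferring a matching pattern."""
    readings = []
    for i, ch in enumerate(text):
        reading = STANDARD.get(ch, "?")  # "?" marks characters missing from the table
        for (before, target, after), alternate in PATTERNS.items():
            # Step 3: replace the standard reading when the context matches.
            if ch == target and text[:i].endswith(before) and text[i + 1:].startswith(after):
                reading = alternate
                break
        readings.append(reading)
    return readings


print(annotate("银行"))  # the context 银 selects háng
print(annotate("行啊"))  # no pattern matches, so the standard xíng is kept
```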

We think "calt" is the appropriate feature tag. It could also be done with ccmp, salt, or aalt, but we believe they are not suitable.

The reasons are as follows:

1. It can use "Chaining contextual substitution (GSUB LookupType 6)".
2. The purpose is not a ligature, but a context-dependent character substitution.
Refer to Syntax for OpenType features in CSS

This feature, in specified situations, replaces default glyphs with alternate forms which provide better joining behavior. Like ligatures (though not strictly a ligature feature), contextual alternates are commonly used to harmonize the shapes of glyphs with the surrounding context.

3. Many environments are expected to support it.
(salt is not supported; aalt requires the user to select the replacement character.)
Refer to calt

UI suggestion: This feature should be active by default.

4. It does not affect other GSUB features.

Feature interaction: This feature may be used in combination with other substitution (GSUB) features, whose results it may override.

(5. Chinese isn't an ideographic script, so I don't have to worry about the following.)

Script/language sensitivity: Not applicable to ideographic scripts.

Implementation example

Expected results
行啊 => xíng a
★银行 => yín háng
★道行 => dào héng
长城 => cháng chéng
★行长 => xíng zhǎng
(If you want the user to choose between "xíng zhǎng" or "háng cháng", I assume you can use aalt.)
☆了得 => liǎo de

Standard Pinyin
行 => xíng
长 => cháng
了 => le
得 => dé


# Glyph names are written as the characters themselves for readability;
# a real feature file needs the font's actual glyph names.
# ★ Substitution patterns for the alternate pinyin of "行" and "长".
lookup CNTXT_884C {
    substitute 银 行' by 行.ha2ng;
    substitute 道 行' by 行.he2ng;
} CNTXT_884C;
lookup CNTXT_957F {
    substitute 行 长' by 长.zha3ng;
} CNTXT_957F;

# ☆ Substitution pattern for an idiom.
lookup CNTXT_4E86_5F97 {
    substitute 了' 得 by 了.lia3o;
    substitute 了.lia3o 得' by 得.de;
} CNTXT_4E86_5F97;

# Each lookup above already carries its own context,
# so calt only needs to register the lookups.
feature calt {
    lookup CNTXT_884C;
    lookup CNTXT_957F;
    lookup CNTXT_4E86_5F97;
} calt;
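
Since the number of patterns will grow quickly, rules like the ones above would probably be generated rather than written by hand. The following is only a sketch of such a generator. `reading_suffix`, `contextual_rule`, and `lookup_block` are hypothetical helper names, the suffix convention (tone digit placed right after the toned vowel, as in `ha2ng`) is inferred from the glyph names used in this comment, and ü is not handled.

```python
import unicodedata

# Combining tone marks mapped to tone digits (macron, acute, caron, grave).
TONE_DIGITS = {"\u0304": "1", "\u0301": "2", "\u030c": "3", "\u0300": "4"}


def reading_suffix(pinyin):
    """Turn 'háng' into 'ha2ng': the tone mark becomes a digit after its vowel."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(TONE_DIGITS.get(ch, ch) for ch in decomposed)


def contextual_rule(before, target, after, pinyin):
    """Build one contextual substitution; the marked glyph (') is the one replaced."""
    parts = [*before, target + "'", *after]
    return f"substitute {' '.join(parts)} by {target}.{reading_suffix(pinyin)};"


def lookup_block(target, rules):
    """Wrap rules in a lookup named after the target's code point, e.g. CNTXT_884C."""
    name = f"CNTXT_{ord(target):04X}"
    body = "\n".join("    " + rule for rule in rules)
    return f"lookup {name} {{\n{body}\n}} {name};"


rules = [
    contextual_rule(["银"], "行", [], "háng"),
    contextual_rule(["道"], "行", [], "héng"),
]
print(lookup_block("行", rules))
```

The printed text has the same shape as the CNTXT_884C lookup above, so the pattern table can live in a plain data file and the feature code can be regenerated whenever it changes.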

Appendix

注音符號數位化顯示計畫

I found an example on the web that uses OpenType Ruby tags.
I think it's an interesting example, like the IVS approach that @NightFurySL2001 mentioned.

Recently there has been a repo that makes bopomofo with newer OpenType technology, BPMF IVS. It utilizes the IVS (ideographic variation selector) in Unicode to switch between different pinyin (e.g. providing 4 glyphs: zháo zhāo zhe zhuó for 着) and also puts the variant glyphs in stylistic sets (SS01-04). You can visit that repo and check how it works. BPMF IVS has a pinyin standard in bopomofo, so that all fonts generated with that program can be used interchangeably without losing the tonal marks (if using IVD). Is it possible to recreate it in this program? (and maybe use the same pinyin standard, which will make conversion between bopomofo and hanyu pinyin a ton easier by just changing fonts)

NightFurySL2001 · 2020-09-22T11:24:52Z

(5. Chinese isn't an ideographic script, so I don't have to worry about the following.)

Chinese is an ideographic script. Some programs may not support calt substitution of CJK characters on purpose.

Sadly, making a pinyin font that swaps out heteronyms using OpenType would be a bit far-fetched, as there are cases where even the same two-character pair produces different pinyin:

为 (wèi) 人作嫁 Marry for someone
为 (wéi) 人处事 Interaction with others as a human

or

为 (wèi) 何这样？ Why do you do that?
手机型号为 (wéi) 何？What is the phone model?

or

节目的 (de) 尾声 End of show
目的 (dì) 地 Destination

This may require exhaustively listing all possible contexts of two, three, or even four characters, which may lengthen processing time in software.
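
A small sketch of why two characters of context are not enough, using the first pair of examples above. `EXAMPLES` and `ambiguous_prefixes` are names made up for this illustration. The function reports every prefix of a given length that maps to more than one reading of the first character.

```python
# Phrase -> reading of its first character 为, as given in the examples above.
EXAMPLES = [("为人作嫁", "wèi"), ("为人处事", "wéi")]


def ambiguous_prefixes(examples, length):
    """Return the prefixes of the given length that map to more than one reading."""
    seen = {}
    for phrase, reading in examples:
        seen.setdefault(phrase[:length], set()).add(reading)
    return {prefix: readings for prefix, readings in seen.items() if len(readings) > 1}


print(ambiguous_prefixes(EXAMPLES, 2))  # 为人 alone cannot decide between wèi and wéi
print(ambiguous_prefixes(EXAMPLES, 3))  # with a third character the conflict disappears
```

Running a check like this over a full phrase dictionary would show how long the contexts in the font have to be, and which cases cannot be resolved by context at all.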

This should really be done in external software, with the result then copied and pasted where it is needed. Some basic processing could still be done using OpenType, but you will have to limit how much the font handles before external intervention is required. An example range could be all the heteronyms in HSK; heteronyms outside HSK would not be replaced and would require manual substitution.

注音符號數位化顯示計畫

This is actually not related to this project, as it promotes the use of ruby annotation instead of bopomofo in the font file.

Side notes:

The best bet for heteronyms in OpenType is actually to refer to BPMF IVS, as it puts bopomofo in the font file itself and can "remember" which pinyin was chosen by using the newer technology of the Ideographic Variation Selector （日本語：異体字セレクタ）. It also offers Stylistic Sets, but that selection is lost when copying and pasting into other software.

This requires the input text to contain the correct IVS for the pinyin to display correctly, which is impossible for text online.
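
To show what "remembering" the choice means at the text level, here is a minimal sketch. It only relies on the Unicode Variation Selectors Supplement block (U+E0100 to U+E01EF). Which selector stands for which reading is decided by the font or its registered collection, so the index used below is purely an assumption for illustration, as are the helper names.

```python
IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF  # Variation Selectors Supplement block


def with_selector(char, index):
    """Append the index-th supplementary variation selector (0-based) to char."""
    code = IVS_FIRST + index
    if not IVS_FIRST <= code <= IVS_LAST:
        raise ValueError("selector index out of range")
    return char + chr(code)


def strip_selectors(text):
    """Remove all supplementary variation selectors, leaving the plain characters."""
    return "".join(ch for ch in text if not IVS_FIRST <= ord(ch) <= IVS_LAST)


tagged = with_selector("着", 1)      # assumption: index 1 selects one reading of 着
print(len(tagged))                   # the selector travels with the text as a second code point
print(strip_selectors("看" + tagged))  # plain text is recovered by dropping selectors
```

Because the selector is part of the text itself, it survives copy and paste, unlike a stylistic set chosen in an application. Text that was never tagged this way simply has no selectors to act on.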

MaruTama · 2020-09-22T12:25:04Z

Oh... really.
Chinese can't seem to use calt...

Chinese is an ideographic script. Some programs may not support calt substitution of CJK characters on purpose.

I see... I should think about this.

Sadly, making a pinyin font that swaps out heteronyms using OpenType would be a bit far-fetched, as there are cases where even the same two-character pair produces different pinyin:

Thank you so much for your help.
I should implement it using IVS.

NightFurySL2001 · 2020-09-22T13:01:34Z

It does look like rclt could be used in place of calt, as it may be used on all scripts, but I think we could try calt anyway to determine whether CJK ideographs are really incompatible in software. Alternatively, ccmp can be used, but it will require a little modification to the OpenType specification. I will send an email to Dr. Ken Lunde, former head engineer of the Source Han series, to ask about using calt with CJK ideographs.

The first priority is to make the glyphs. There is also a limit of 65535 glyphs in an OpenType font, which may be an issue. A subset of SHS may be required to free up glyph slots for pinyin characters.
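
A rough way to check the glyph budget, as a sketch only. It assumes that every additional reading of a character costs exactly one extra glyph, which depends on how the pinyin glyphs are actually built. The function names and the numbers in the example are placeholders, not measurements of any real font.

```python
MAX_GLYPHS = 65535  # the limit mentioned above


def glyphs_needed(base_glyphs, readings_per_char):
    """Base glyph count plus one extra glyph for each reading beyond the first."""
    extra = sum(max(count - 1, 0) for count in readings_per_char.values())
    return base_glyphs + extra


def fits(base_glyphs, readings_per_char):
    """True if the font would stay within the glyph limit."""
    return glyphs_needed(base_glyphs, readings_per_char) <= MAX_GLYPHS


# Placeholder numbers: 行 with 3 readings, 长 with 2, 银 with 1.
readings = {"行": 3, "长": 2, "银": 1}
print(glyphs_needed(10, readings))
print(fits(65535, {"行": 2}))  # a font that is already full has no room for alternates
```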

MaruTama · 2020-10-25T11:01:45Z

It's an example of homograph support using calt tag and IVS.
I will continue to implement it.
However, there is a problem: calt stops working when there are many feature tags.
I plan to replace the calt tag with rclt.

NightFurySL2001 · 2020-12-19T10:34:36Z

👍

NightFurySL2001 · 2020-12-19T10:58:33Z

Is it possible to explain the text in English? I don't really understand how you did it XD
https://github.com/MaruTama/Mengshen-pinyin-font/blob/master/NOTE.md

What is the source of the homograph dictionary that the lookup uses? It doesn't seem to support homographs for Traditional Chinese (e.g. 乾（gān）淨/乾（qián）坤). Also there's this:

U+7D8F: suī,suí,shuāi,ruí,tuǒ #綏
U+7EE5: suí #绥

They are simplified/traditional, but shouldn't have that much difference... right? The sources I can access give suī,suí only.
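
For comparing entries like these in bulk, a small parser helps. This is a sketch that assumes every dictionary line has the form shown above (`U+XXXX: reading,reading #char`). `parse_entry` is a name made up here.

```python
def parse_entry(line):
    """Split 'U+7D8F: suī,suí #綏' into the character and its list of readings."""
    data, _, _comment = line.partition("#")
    code, _, readings = data.partition(":")
    char = chr(int(code.strip()[2:], 16))  # drop the 'U+' prefix, parse as hex
    return char, [r.strip() for r in readings.split(",") if r.strip()]


trad_char, trad = parse_entry("U+7D8F: suī,suí,shuāi,ruí,tuǒ #綏")
simp_char, simp = parse_entry("U+7EE5: suí #绥")

# Readings listed for the Traditional form but missing from the Simplified one.
print(sorted(set(trad) - set(simp)))
```

Running this over pairs of Simplified and Traditional characters would list every place where the two entries disagree, so they can be checked against another source.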

MaruTama · 2020-12-19T11:28:18Z

Is it possible to explain the text in English? I don't really understand how you did it XD
https://github.com/MaruTama/Mengshen-pinyin-font/blob/master/NOTE.md

It's okay. I will organize and translate.

They are simplified/traditional, but shouldn't have that much difference... right?

I referred to the following dictionary. Traditional Chinese is not yet supported.
There was no Traditional Chinese in the dictionary here.

Sorry.... I'm not very familiar so I have a question.
Is Traditional Chinese the same homographs as Simplified Chinese?

The sources I can access give suī,suí only.

I referred to here.

NightFurySL2001 · 2020-12-19T11:35:21Z

Is(Do) Traditional Chinese (have) the same homographs as Simplified Chinese?

Not exactly. Some homographs in Traditional Chinese were separated in Simplified Chinese (e.g. 乾 gān/qián -> 干gān净、乾qián坤), and some Traditional Chinese characters were combined into one homograph in Simplified Chinese (e.g. 乾gān淨、幹gàn部、支干gàn -> 干gān净、干gàn部、支干gàn). 干 is a very suitable example of how Simplified Chinese messed with the pronunciation....

I referred to here.

Well guess that'll work...

MaruTama · 2020-12-19T11:49:57Z

干 is a very suitable example of how Simplified Chinese messed with the pronunciation....

I see...
干 is messy because different characters have been merged....


MaruTama mentioned this issue Aug 2, 2020

Request on English documentation and different pinyin in font #2

Closed

MaruTama changed the title ~~Implement the ligature~~ Implement the homograph Sep 22, 2020

MaruTama changed the title ~~Implement the homograph~~ Implement the homograph (heteronyms) Sep 22, 2020

MaruTama closed this as completed Dec 19, 2020

MaruTama reopened this Dec 19, 2020
