Implement the homograph (heteronyms) #1
Comments
Is it possible to generate the different pinyin glyphs under different glyph names first? E.g. putting zhang and chang in separate files for 长(長). This would speed up the build, because the glyphs would then be available for swapping with minimal further changes. There is also contextual swapping, i.e. swapping one glyph for another depending on the context, but it will be as hard to implement as ligatures. Ligatures also require building a double-word glyph for each pair of words, e.g. 行啊, 行了, 银行, 行长 (which itself has two pronunciations: hang zhang, head of a bank; hang chang, line length), which can dramatically increase file size. |
The implementation we are considering is as follows: We think "calt" is the appropriate feature tag. We believe it could also be done with ccmp, salt, or aalt, but they are not suitable. The reasons are as follows:
(5. Chinese isn't an ideographic script, so I don't have to worry about the following.)
Implementation example Statement of Expectations Standard Pinyin
Appendix: I found an example on the web that uses OpenType Ruby tags.
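As a rough illustration of the "calt" idea above, the following Python sketch generates contextual substitution rules in OpenType feature-file syntax from a phrase table. The glyph naming scheme (`uni884C`, `uni884C.hang`) and the phrase table are assumptions made for this example, not the names used in this project's font.

```python
from typing import Dict, List, Optional


def glyph_name(ch: str, reading: Optional[str] = None) -> str:
    """Return a hypothetical glyph name such as 'uni884C' or 'uni884C.hang'."""
    base = f"uni{ord(ch):04X}"
    return f"{base}.{reading}" if reading else base


def calt_rules(phrases: Dict[str, List[Optional[str]]]) -> str:
    """Build a 'calt' feature block from phrase -> per-character readings.

    A reading of None keeps the default glyph; any other value names the
    alternate glyph to substitute when the character appears in that phrase.
    """
    lines = ["feature calt {"]
    for phrase, readings in phrases.items():
        for i, reading in enumerate(readings):
            if reading is None:
                continue
            # Mark only the target glyph with ' so its neighbours act as context.
            seq = [
                glyph_name(c) + ("'" if j == i else "")
                for j, c in enumerate(phrase)
            ]
            target = glyph_name(phrase[i], reading)
            lines.append(f"    sub {' '.join(seq)} by {target};")
    lines.append("} calt;")
    return "\n".join(lines)


# Assumed example data: in 银行 the second character takes the "hang" glyph.
print(calt_rules({"银行": [None, "hang"]}))
```

Every phrase adds at least one rule, so the size of the phrase table directly drives the size of the feature.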
|
Chinese is an ideographic script. Some programs may not support … Sadly, making a pinyin font that swaps out heteronyms using OpenType would be a bit far-fetched, as there are cases where even the same two-character pair produces different pinyin:
or
or
This may require exhaustively listing all possible combinations of two, three, or even four characters, which may take the software longer to process. It should really be done in external software, with the result then pasted where it is needed. Some basic processing can still be done with OpenType, but you will have to limit how much the font handles before external intervention is required. An example range could be all the heteronyms in HSK; heteronyms outside HSK would not be replaced and would need manual substitution. This is actually not related to this project, as it promotes the use of ruby annotation instead of bopomofo in the font file.

Side notes: The best bet for heteronyms in OpenType is to refer to BPMF IVS, which puts bopomofo in the font file itself and can "remember" which pinyin was chosen by using the newer technology of Ideographic Variation Selectors (Japanese: 異体字セレクタ). It also offers Stylistic Sets, but that selection is lost when copying and pasting into other software. The IVS approach requires the input text to contain the correct IVS for the pinyin to display correctly, which is impossible for text found online. |
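The external preprocessing step mentioned above could look roughly like the following Python sketch: a longest-match pass over the text that appends a variation selector after each heteronym whose reading is fixed by a known phrase. The phrase table and the mapping from numbers to selectors are hypothetical; a real font such as BPMF IVS defines its own selector assignments.

```python
VS17 = 0xE0100  # first code point of the Variation Selectors Supplement block

# Hypothetical data: phrase -> one entry per character. None keeps the
# default reading; an integer k appends the selector at VS17 + k.
PHRASES = {
    "银行": [None, 1],
}


def tag_readings(text, phrases=PHRASES):
    """Insert variation selectors after characters listed in `phrases`."""
    max_len = max(map(len, phrases), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            choice = phrases.get(chunk)
            if choice is not None:
                for ch, alt in zip(chunk, choice):
                    out.append(ch)
                    if alt is not None:
                        out.append(chr(VS17 + alt))
                i += n
                break
        else:
            # No phrase starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


tagged = tag_readings("去银行")
print([f"U+{ord(c):04X}" for c in tagged])
```

Because the selector travels with the text, the chosen reading survives copy and paste, but only for text that has been through a pass like this one.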
Oh... really.
I see... I should think about this.
Thank you so much for your help. |
It does look like the first priority is to make the glyphs. There is also the limit of 65535 glyphs in an OpenType font, which may be an issue. A subset of SHS may be required to free up glyph slots for the pinyin characters. |
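To make the glyph budget concrete, here is a small Python sketch of the arithmetic. It assumes each heteronym needs one extra glyph per additional reading; the base glyph count and the reading table are made-up numbers for illustration, not figures from this project.

```python
from typing import Dict, List

MAX_GLYPHS = 65535  # OpenType glyph limit mentioned above


def glyphs_needed(base_glyphs: int, readings: Dict[str, List[str]]) -> int:
    """Base glyph count plus one extra glyph per additional reading."""
    extra = sum(max(len(r) - 1, 0) for r in readings.values())
    return base_glyphs + extra


def slots_to_free(base_glyphs: int, readings: Dict[str, List[str]]) -> int:
    """How many base glyphs would have to be dropped to stay within the limit."""
    return max(glyphs_needed(base_glyphs, readings) - MAX_GLYPHS, 0)


# Hypothetical inputs: a nearly full font and two heteronyms.
readings = {"行": ["xing", "hang"], "长": ["chang", "zhang"]}
print(glyphs_needed(65534, readings), slots_to_free(65534, readings))
```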
👍 |
Is it possible to explain the text in English? I don't really understand how you did it XD What is the source of the homograph dictionary that the lookup uses? It doesn't seem to support homographs for Traditional Chinese (e.g. 乾(gān)淨/乾(qián)坤). Also there's this:
They are simplified/traditional, but shouldn't have that much difference... right? The sources I can access give suī,suí only. |
It's okay. I will organize and translate.
I referred to the following dictionary. Traditional Chinese is not yet supported. Sorry.... I'm not very familiar so I have a question.
I referred to here. |
Not exactly, some homographs in Traditional Chinese were separated in Simplified Chinese (e.g. 乾 gān/qián -> 干gān净、乾qián坤) and some Traditional Chinese characters were combined into one homograph in Simplified Chinese (e.g. 乾gān淨、幹gàn部、支干gàn -> 干gān净、干gàn部、支干gàn). 干 is a very good example of how Simplified Chinese messed with the pronunciation....
Well guess that'll work... |
I see... |
"行" is (Xíng), however when it is "银行" the pinyin is (YínHáng).
Since ligatures are not registered in this font, "银行" is displayed as (YínXíng).