The dictionary used is CC-CEDICT plus whatever node-pinyin uses behind the scenes. I'm not sure exactly how many characters are covered; I'll have to investigate this later.
Strictly speaking, node-pinyin's data is in /tools/dict2.js. After cleanup, there are 24449 character/phonetic pairs, which looks pretty close to the UNIHAN data, currently at 25500 entries.
node-pinyin's data format doesn't suit linguistic studies though, as there can be several phonetic entries paired with the same character, without any prioritization (e.g. by frequency). It therefore fits IME needs but not linguistic needs.
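To make the point about prioritization concrete, here is a minimal Python sketch. It does not read node-pinyin's actual `dict2.js`; the `READINGS` table, the `READING_WEIGHT` numbers, and the helper names are all hypothetical and only illustrate the shape of the problem: several readings per character, with an external frequency signal needed to rank them.

```python
from typing import Dict, List, Tuple

# Hypothetical sample data: character -> candidate readings, in no
# particular order. This is NOT node-pinyin's real data, just an
# illustration of a table with several readings per character.
READINGS: Dict[str, List[str]] = {
    "行": ["háng", "xíng"],
    "好": ["hào", "hǎo"],
    "人": ["rén"],
}

# Hypothetical weights (made-up numbers, not measured frequencies),
# standing in for whatever corpus frequency source one would use.
READING_WEIGHT: Dict[Tuple[str, str], int] = {
    ("行", "xíng"): 8,
    ("行", "háng"): 2,
    ("好", "hǎo"): 9,
    ("好", "hào"): 1,
    ("人", "rén"): 10,
}


def count_pairs(readings: Dict[str, List[str]]) -> int:
    """Count character/phonetic pairs (one pair per reading)."""
    return sum(len(candidates) for candidates in readings.values())


def ambiguous_chars(readings: Dict[str, List[str]]) -> List[str]:
    """Return the characters that have more than one reading."""
    return [ch for ch, candidates in readings.items() if len(candidates) > 1]


def ranked_readings(
    ch: str,
    readings: Dict[str, List[str]],
    weight: Dict[Tuple[str, str], int],
) -> List[str]:
    """Order the readings of `ch` from highest to lowest weight.

    Readings missing from the weight table are treated as weight 0.
    """
    return sorted(
        readings[ch],
        key=lambda reading: weight.get((ch, reading), 0),
        reverse=True,
    )


if __name__ == "__main__":
    print(count_pairs(READINGS))
    print(ambiguous_chars(READINGS))
    print(ranked_readings("行", READINGS, READING_WEIGHT))
```

Without something like the weight table, the first reading listed for a character is arbitrary, which is fine for an IME offering candidates but not for picking a default pronunciation.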
Thanks for this project!