You mentioned "mpc Only supports ASCII. Sorry! Writing a parser library that supports Unicode is pretty difficult. I welcome contributions!". I have a parser at https://github.com/gtoal/uparse which works entirely in Unicode: I modified a regex package to work on 32-bit code points rather than 8-bit ASCII characters. I don't expect you'll be able to take anything from that parser directly, but reading through it may give you some hints on how to update your own code to handle Unicode. Basically it boils down to reading UTF-8 on input and writing UTF-8 on output, while doing all operations inside the code on a 32-bit object rather than a char. That turns out not to be all that difficult once you get started.
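
As a rough illustration of that approach, here is a minimal sketch of the two boundary functions: decode UTF-8 bytes into a 32-bit code point on the way in, and encode a code point back to UTF-8 on the way out. This is not code taken from uparse or mpc; the names `utf8_decode` and `utf8_encode` are hypothetical, and the error handling (returning 0 on malformed input) is just one possible convention.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (n bytes available)
   into *cp. Returns the number of bytes consumed, or 0 on malformed input. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }            /* plain ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;                                        /* stray continuation or invalid lead byte */

    if (n < len) return 0;                                /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;              /* not a continuation byte */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    *cp = c;
    return len;
}

/* Hypothetical helper: encode cp as UTF-8 into out (at least 4 bytes).
   Returns the number of bytes written, or 0 if cp is not a valid scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;       /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With helpers along these lines, the matching code in between only ever compares and stores `uint32_t` values, so character classes and ranges work the same way for any code point as they do for ASCII.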
Best regards,
Graham Toal