Unicode and regexs etc #168

gtoal · 2024-01-03T01:19:41Z

You mentioned "mpc Only supports ASCII. Sorry! Writing a parser library that supports Unicode is pretty difficult. I welcome contributions!" I have a parser at https://github.com/gtoal/uparse that works entirely in Unicode: I modified a regex package to operate on 32-bit code points rather than 8-bit ASCII characters. I don't expect you'll be able to take anything from that parser directly, but reading through it may give you some hints on how to update your own code to handle Unicode.

Basically it boils down to reading UTF-8 on input and writing UTF-8 on output, while doing all internal operations on a 32-bit value rather than a char. That turns out not to be all that difficult once you get started.

Best regards,
Graham Toal
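The approach described above (decode UTF-8 at the input boundary, work on 32-bit code points internally, re-encode UTF-8 on output) can be sketched in plain C roughly as follows. This is an illustrative sketch only, not code from uparse or mpc: every function name here (utf8_decode, utf8_encode, utf8_to_codepoints, codepoints_to_utf8, cp_in_range) is hypothetical, and cp_in_range merely stands in for the kind of character-class test a regex matcher would perform on uint32_t values instead of char.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one UTF-8 sequence from s (len bytes available) into *cp.
   Returns the number of bytes consumed, or 0 if the sequence is malformed,
   truncated, overlong, a surrogate, or above U+10FFFF. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n;
    uint32_t v, min;
    if (len == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }                      /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; v = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; v = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; v = s[0] & 0x07; min = 0x10000; }
    else return 0;                                   /* invalid lead byte */
    if (len < n) return 0;
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;         /* not a continuation byte */
        v = (v << 6) | (s[i] & 0x3F);
    }
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF)) return 0;
    *cp = v;
    return n;
}

/* Encode one code point as UTF-8 into out (needs room for 4 bytes).
   Returns the number of bytes written, or 0 for an invalid code point. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;      /* surrogates are not encodable */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Input boundary: decode a NUL-terminated UTF-8 string into buf.
   Returns the code point count, or (size_t)-1 on bad input or overflow. */
size_t utf8_to_codepoints(const char *s, uint32_t *buf, size_t cap)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len = strlen(s), count = 0;
    while (len > 0) {
        uint32_t cp;
        size_t n = utf8_decode(p, len, &cp);
        if (n == 0 || count == cap) return (size_t)-1;
        buf[count++] = cp;
        p += n;
        len -= n;
    }
    return count;
}

/* Output boundary: encode n code points back to a NUL-terminated UTF-8 string.
   Returns the number of bytes written (excluding NUL), or (size_t)-1 on error. */
size_t codepoints_to_utf8(const uint32_t *cps, size_t n, char *out, size_t cap)
{
    size_t used = 0;
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char tmp[4];
        size_t k = utf8_encode(cps[i], tmp);
        if (k == 0 || used + k + 1 > cap) return (size_t)-1;
        memcpy(out + used, tmp, k);
        used += k;
    }
    if (used + 1 > cap) return (size_t)-1;
    out[used] = '\0';
    return used;
}

/* Hypothetical matcher primitive: a character-class range such as [α-ω]
   compares 32-bit code points, never individual UTF-8 bytes. */
int cp_in_range(uint32_t cp, uint32_t lo, uint32_t hi)
{
    return cp >= lo && cp <= hi;
}
```

The point of the design is that only the two boundary functions ever see UTF-8 bytes; everything in between (tokenizing, range tests, regex matching) indexes and compares whole code points, so a multi-byte character like "α" is one unit rather than two chars.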

orangeduck · 2024-01-12T15:40:54Z

Thanks Graham, I will take a look.

HalosGhost mentioned this issue on Aug 13, 2024 in utf #170 (closed).