Use UTF-8 to decode strings everywhere (and fixed locale-dependent unit tests) #25

cameron314 · 2015-03-11T16:36:47Z

I've changed the StringTable to use UTF-8 to decode strings, instead of the previous ISO-8859-1 decoding (via casting bytes to chars). While no string encoding is mentioned in the ELF specification, it seems that the de-facto standard, like for much else in the Linux-y world, is UTF-8 (e.g. this is what clang produces).
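To illustrate the difference described above, here is a minimal sketch in Python (not the library's actual C# code). The sample string is an assumption chosen for illustration: casting each byte to a char behaves like ISO-8859-1 decoding, so any multi-byte UTF-8 sequence comes out as several wrong characters.

```python
# Bytes of a string-table entry holding "naïve", encoded as UTF-8
# (the sample string is hypothetical, chosen because it contains a non-ASCII character).
raw = "na\u00efve".encode("utf-8")

# Casting each byte to a char is equivalent to ISO-8859-1 (Latin-1) decoding.
cast_decoded = "".join(chr(b) for b in raw)

# Decoding the same bytes as UTF-8 recovers the original text.
utf8_decoded = raw.decode("utf-8")

print(repr(cast_decoded))  # the two-byte sequence for "ï" becomes two separate characters
print(repr(utf8_decoded))
```

For pure ASCII names the two approaches agree, which is why the problem only shows up with non-ASCII symbol or section names.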

The StringTable is now also populated lazily, and reads the bytes (once) in a block instead of one at a time. This should improve performance.
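The lazy, single-block read could look roughly like the sketch below. It is written in Python for brevity, and the class name `LazyStringTable`, its constructor parameters, and the sample section bytes are all hypothetical; the library's real implementation may be structured differently.

```python
import io


class LazyStringTable:
    """Hypothetical sketch: reads the section bytes once, on first lookup."""

    def __init__(self, stream, offset, size):
        self._stream = stream
        self._offset = offset
        self._size = size
        self._blob = None  # nothing is read until a string is requested

    def _load(self):
        # Read the whole section in one block instead of byte by byte.
        if self._blob is None:
            self._stream.seek(self._offset)
            self._blob = self._stream.read(self._size)
        return self._blob

    def __getitem__(self, index):
        blob = self._load()
        # Strings in the table are NUL-terminated.
        end = blob.find(b"\x00", index)
        if end == -1:
            end = len(blob)
        return blob[index:end].decode("utf-8")


# A made-up section: an empty string at offset 0, then two names.
section = b"\x00.text\x00.data\x00"
table = LazyStringTable(io.BytesIO(b"HDR" + section), offset=3, size=len(section))
```

With this shape, constructing the table costs nothing, and the first `table[1]` lookup triggers the one and only read of the section.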

Also, some of the unit tests were failing locally since they were dependent on the environment of the user running the tests -- one of them depended on the current timezone, and two others depended on Environment.NewLine being "\n". I fixed that too (all the tests pass for me now).
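As a rough illustration of the two kinds of fix (in Python rather than the project's C#, with a hypothetical helper name and made-up sample values): compare text after normalizing line endings, and convert timestamps with an explicit UTC zone instead of the machine's local one.

```python
from datetime import datetime, timezone


def normalize_newlines(text):
    """Hypothetical helper: map Windows line endings to Unix ones."""
    return text.replace("\r\n", "\n")


# Expected text comes from a source file with Unix line endings.
expected = "line one\nline two\n"
# Output produced on a machine where the platform newline is "\r\n".
windows_output = "line one\r\nline two\r\n"

# Pin the timezone explicitly so the result does not depend on the
# timezone of whoever runs the test.
utc_time = datetime.fromtimestamp(0, tz=timezone.utc)
```

A test written this way compares `normalize_newlines(windows_output)` against `expected` and asserts on `utc_time` directly, so it gives the same result on any machine.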

I would be most grateful if you could accept these patches into master, and in particular update the NuGet package.

Finally, thanks for writing this library! It's extremely useful :-)

konrad-kruczynski · 2015-03-11T21:02:26Z

It's a pleasure to receive such refined and well-described patches! As for the encoding part, I definitely should have gone for the UTF-8 solution in the first place, good point. The other changes are also nice. The only problematic part is the formatting (but frankly speaking, I can't tell whether it was broken earlier or is broken now, due to the problems with formatting in older Mono versions). Anyway, I'll fix it myself, since it really just means running the formatter.

I'm not able to publish the NuGet package today, but it should be available tomorrow.

konrad-kruczynski · 2015-03-11T21:09:51Z

Since I'd also like to add you to the list of contributors (which will be created in README), please just tell me how I should name you (i.e. name, nick, mail to use).

cameron314 · 2015-03-11T21:12:23Z

Ah, sorry about the formatting. I've set my VS to use tabs and hadn't noticed the mismatch until you pointed it out. Oops!

You can put my name as simply "Cameron", and email as "cameron [@] moodycamel.com". Thanks!

konrad-kruczynski · 2015-03-12T12:29:17Z

The NuGet package should be available now.

cameron314 · 2015-03-12T13:35:26Z

Awesome, thanks!

cameron314 added 3 commits March 11, 2015 12:12

Fixed unit test dependency on user's current timezone

7468094

Fixed unit test dependency on Environment.NewLine being '\n' (sources files, and thus string constants, have Unix line endings even in Windows test environment)

d94e2b3

Changed string decoding to use UTF-8 (from a mix of ISO-8859-1 and ASCII) -- this matches the de-facto standard encoding of ELF files (that clang produces, for example)

4626cd7

konrad-kruczynski added a commit that referenced this pull request Mar 11, 2015

Merge pull request #25 from cameron314/master

3637bca

Use UTF-8 to decode strings everywhere (and fixed locale-dependent unit tests)

konrad-kruczynski merged commit 3637bca into konrad-kruczynski:master Mar 11, 2015

konrad-kruczynski added a commit that referenced this pull request Nov 6, 2015

Merge pull request #25 from cameron314/master

91622a7

Use UTF-8 to decode strings everywhere (and fixed locale-dependent unit tests)

konrad-kruczynski added a commit that referenced this pull request Nov 6, 2015

Merge pull request #25 from cameron314/master

1ae7119

Use UTF-8 to decode strings everywhere (and fixed locale-dependent unit tests)

konrad-kruczynski added a commit that referenced this pull request Nov 6, 2015

Merge pull request #25 from cameron314/master

ae5d4e2

Use UTF-8 to decode strings everywhere (and fixed locale-dependent unit tests)
