ensure asian characters are not broken #5

karussell · 2012-03-28T08:40:44Z

This is now fixed, but it still needs a unit test!

From email:

The issue is in Converter.streamToString(). It reads the HTTP data in a loop, chunk by chunk. Each chunk is converted to a String separately, but a chunk may contain only the first (or second) half of a multi-byte character, which results in corrupted data. This happens sporadically, depending on timing.

Also, the counting of bytesRead was wrong, so on a slow connection a "size exceeded" message could appear without justification.
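For anyone reading along, here is a minimal sketch of the two problems described above. It is not the actual snacktory code: the class name, method names, and the `maxBytes` parameter are made up for illustration. It also assumes the miscount came from something like not using the value returned by `read()`, which the email does not confirm. `decodePerChunk` shows the broken pattern, `decodeWhole` shows one possible way to avoid it by collecting the raw bytes and decoding once at the end.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ChunkedDecodeSketch {

    // Broken pattern: every chunk is decoded on its own, so a multi-byte
    // character that straddles two chunks is decoded as junk.
    public static String decodePerChunk(InputStream in, Charset cs, int chunkSize)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(new String(buf, 0, n, cs));
        }
        return sb.toString();
    }

    // One possible fix: keep the raw bytes, count only what read() actually
    // returned, and decode the whole buffer once at the end.
    // maxBytes is a hypothetical size limit, not a snacktory setting.
    public static String decodeWhole(InputStream in, Charset cs, int chunkSize, int maxBytes)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[chunkSize];
        int total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n; // add the bytes really read, not buf.length
            if (total > maxBytes) {
                throw new IOException("size exceeded: " + total + " > " + maxBytes);
            }
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), cs);
    }

    public static void main(String[] args) throws IOException {
        // Two 3-byte UTF-8 characters around the ASCII string "300".
        String text = "\u7d04300\u4eba";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

        // A 2-byte buffer forces the first character to be split across reads.
        String perChunk = decodePerChunk(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8, 2);
        String whole = decodeWhole(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8, 2, 1024);

        System.out.println("per-chunk matches: " + text.equals(perChunk));
        System.out.println("whole matches:     " + text.equals(whole));
    }
}
```

Wrapping the stream in an `InputStreamReader` with the right charset would be another option, since the reader keeps incomplete byte sequences until the rest arrives.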

To test this problem, I opened a Japanese article (URL below) in the browser and saved its content (e.g. to a file). I then ran the streamToString() function in a loop (with some delay) and compared its output each time with the expected output in the file. Sometimes I saw dozens of successful runs followed by several failures, so the problem is not constant, but the errors occurred often enough.
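Since the manual loop above depends on network timing, a unit test could instead force the split reads deterministically. The following is only a sketch of that idea, not the procedure from the email: `TrickleInputStream`, `Decoder`, and `firstFailingChunkSize` are hypothetical helpers, and the real test would plug in `Converter.streamToString()` (whose exact signature is not shown here) as the decoder.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SplitReadCheck {

    // Hypothetical helper: hands out at most maxPerRead bytes per read call,
    // imitating a slow connection that delivers data in small pieces.
    public static class TrickleInputStream extends FilterInputStream {
        private final int maxPerRead;

        public TrickleInputStream(InputStream in, int maxPerRead) {
            super(in);
            this.maxPerRead = maxPerRead;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxPerRead));
        }
    }

    // Stand-in for the method under test, e.g. a wrapper around streamToString().
    public interface Decoder {
        String decode(InputStream in) throws IOException;
    }

    // Tries every piece size from 1 to maxChunk and returns the first size
    // at which the decoded text differs from the expected text, or -1.
    public static int firstFailingChunkSize(byte[] raw, String expected,
                                            Decoder decoder, int maxChunk) throws IOException {
        for (int size = 1; size <= maxChunk; size++) {
            InputStream in = new TrickleInputStream(new ByteArrayInputStream(raw), size);
            if (!expected.equals(decoder.decode(in))) {
                return size;
            }
        }
        return -1;
    }

    // Reference decoder for the demo: a Reader buffers incomplete byte
    // sequences, so split characters are decoded correctly.
    public static String decodeWithReader(InputStream in, Charset cs) throws IOException {
        Reader reader = new InputStreamReader(in, cs);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        final Charset cs = StandardCharsets.UTF_8;
        // In a real test, expected and raw would come from the saved article.
        String expected = "\u7d04300\u4eba";
        byte[] raw = expected.getBytes(cs);
        int failing = firstFailingChunkSize(raw, expected, in -> decodeWithReader(in, cs), raw.length);
        System.out.println("first failing chunk size: " + failing);
    }
}
```

With the saved article as input, a result other than -1 would point to the piece size at which the output first differs from the file, which is easier to debug than an intermittent failure.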

The article I tested on is http://astand.asahi.com/magazine/wrscience/2012022900015.html, and the corruption was almost always visible in the string "300" (see the article), where some junk was displayed instead of the "3".

karussell · 2012-03-28T08:42:09Z

see 09c48a3

ghost assigned karussell Mar 28, 2012

rborer referenced this issue in finity-ai/snacktory Aug 27, 2015

Merge pull request #5 from skyshard/andres/fix_internal_server_errors

77823d1

Add validation to catch IOErrors while downloading a page
