This repository has been archived by the owner on Mar 9, 2021. It is now read-only.

ensure asian characters are not broken #5

Open
karussell opened this issue Mar 28, 2012 · 1 comment

@karussell
Owner

This is now fixed! But needs a unit test!

From email:

The issue is in Converter.streamToString(). There's a loop that reads HTTP data chunks. Each chunk is converted to a String separately, but a chunk may contain only the first (or second) half of a multi-byte character, which results in corrupted data. It happens sporadically, depending on timing.

Also, the counting of bytesRead was wrong, so on a slow connection there may be a "size exceeded" message with no justification.
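
To illustrate the two problems described above, here is a minimal sketch, not the actual snacktory code. The class name `StreamDecodeSketch`, both method names, and the `chunkSize` and `maxBytes` parameters are made up for this example, and UTF-8 is assumed as the charset. The first method decodes every chunk on its own (the failure mode from the email); the second collects the raw bytes, counts only the bytes each `read()` actually returned, and decodes once at the end.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the real Converter class.
final class StreamDecodeSketch {

    // Problematic pattern: each chunk is decoded separately, so a multi-byte
    // character that straddles a chunk boundary is decoded as junk.
    static String decodePerChunk(InputStream in, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Safer pattern: accumulate raw bytes and decode the whole array once.
    static String decodeWhole(InputStream in, int chunkSize, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[chunkSize];
        int total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            // Add the number of bytes this read() returned, not the buffer size,
            // so short reads on a slow connection do not inflate the count.
            total += n;
            if (total > maxBytes) {
                throw new IOException("size exceeded: " + total + " > " + maxBytes);
            }
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Wrapping the stream in an `InputStreamReader` with an explicit charset would be another way to get the same effect, because the reader keeps incomplete byte sequences buffered between reads.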

What I did to test this problem was to read a Japanese article (URL below) with the browser and save its content somewhere (e.g. to a file). Then I ran the streamToString() function in a loop (with some delay) and each time compared its output with the expected output in the file. Sometimes I saw dozens of successful tests and then several failures, so the problem is not very persistent, but the errors occurred often enough.
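
The manual procedure above depends on network timing. As a possible starting point for the missing unit test, the sketch below makes the chunk boundaries deterministic instead. Everything in it is an assumption for illustration: `ChunkedStreamTestSketch`, the `Decoder` interface (a stand-in for whatever method is under test), `TrickleInputStream`, and `firstFailingChunkSize` are not part of snacktory. The wrapper stream hands out at most a fixed number of bytes per `read()` call, and the helper tries every chunk size up to a limit and reports the first one for which the decoded text differs from the expected text.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical test helper; names are illustrative only.
final class ChunkedStreamTestSketch {

    // Stand-in for the stream-to-string method under test.
    interface Decoder {
        String decode(InputStream in) throws IOException;
    }

    // Simulates a slow connection: never returns more than maxPerRead bytes per call.
    static final class TrickleInputStream extends FilterInputStream {
        private final int maxPerRead;

        TrickleInputStream(InputStream in, int maxPerRead) {
            super(in);
            this.maxPerRead = maxPerRead;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxPerRead));
        }
    }

    // Returns the smallest chunk size (1..maxChunk) at which the decoder's output
    // differs from the expected text, or -1 if every chunk size round-trips.
    static int firstFailingChunkSize(Decoder decoder, String expected, int maxChunk)
            throws IOException {
        byte[] bytes = expected.getBytes(StandardCharsets.UTF_8);
        for (int size = 1; size <= maxChunk; size++) {
            InputStream in = new TrickleInputStream(new ByteArrayInputStream(bytes), size);
            if (!expected.equals(decoder.decode(in))) {
                return size;
            }
        }
        return -1;
    }
}
```

A unit test could pass the real conversion method as the `Decoder` together with a saved copy of a Japanese page and assert that the result is -1.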

The article I tested on is http://astand.asahi.com/magazine/wrscience/2012022900015.html, and the corruption was almost always visible in the string "300" (see in the article), where instead of the "3" some junk was displayed.

@ghost ghost assigned karussell Mar 28, 2012
@karussell
Owner Author

see 09c48a3

rborer referenced this issue in finity-ai/snacktory Aug 27, 2015
Add validation to catch IOErrors while downloading a page