Recode produces invalid UTF-8 given invalid UTF-8 #37
Strangely though, the bug doesn't happen with
Hi, thanks for fixing this. Sorry, I can't test it yet because the current git version does not build for me on RHEL 8.5.
The problem is most likely that help2man needs to be built with gettext support (this is documented!). Merely having the Perl module installed is not sufficient.
I'm testing the current version (62b996d). I'm not sure it works yet:
In my view, if recode is asked to produce UTF-8 output, it should always produce UTF-8 and never junk bytes; this is such a basic requirement that it shouldn't depend on any force or strict flags. I think that if the input is specified as UTF-8, recode should check that and die if the input is not valid UTF-8. It can still be useful, though, to have a lax mode in which junk bytes in the input are skipped as well as possible.
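A minimal sketch of the strict/lax behaviour proposed above, written in Python purely for illustration (recode itself is a C program, and `to_utf8` with its `lax` flag is hypothetical, not a recode option). Strict mode refuses invalid input; lax mode drops junk bytes. In both cases whatever is returned is valid UTF-8.

```python
def to_utf8(data: bytes, lax: bool = False) -> bytes:
    """Hypothetical helper: re-encode UTF-8 input so the output is always valid UTF-8."""
    # Strict mode: decoding raises UnicodeDecodeError at the first invalid byte,
    # so no output is produced at all for bad input.
    # Lax mode: errors="ignore" silently drops bytes that are not part of a
    # well-formed sequence.
    text = data.decode("utf-8", errors="ignore" if lax else "strict")
    return text.encode("utf-8")


if __name__ == "__main__":
    bad = b"abc\xffdef"  # 0xFF can never appear in UTF-8
    print(to_utf8(bad, lax=True))  # junk byte skipped
    try:
        to_utf8(bad)
    except UnicodeDecodeError as err:
        print("rejected:", err)
```

The design choice here is that validity of the output never depends on a flag; the flag only decides whether bad input is an error or is cleaned up.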
This does not happen currently because of recode's conversion optimization. When you request a conversion … If instead a conversion is forced, by e.g. …
then the input is validated, the problem is detected, and so no invalid output is produced. Also, I notice that with
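To illustrate how an optimization can let invalid bytes through, here is a hypothetical Python sketch. It assumes that a request whose source and target charsets are identical is reduced to a plain byte copy; the function `convert` and its `force` parameter are invented for this example and do not reflect recode's actual code or command-line options.

```python
def convert(data: bytes, source: str, target: str, force: bool = False) -> bytes:
    """Hypothetical converter with an identity shortcut."""
    # Assumed optimization: same charset on both sides means "copy bytes unchanged".
    # Nothing is decoded on this path, so nothing is validated.
    if source.lower() == target.lower() and not force:
        return data
    # Full path: decoding validates the input, then we re-encode to the target.
    return data.decode(source).encode(target)


if __name__ == "__main__":
    bad = b"caf\xe9"  # Latin-1 e-acute in a file claimed to be UTF-8
    print(convert(bad, "utf-8", "utf-8"))  # invalid bytes pass straight through
    try:
        convert(bad, "utf-8", "utf-8", force=True)
    except UnicodeDecodeError as err:
        print("forced conversion caught it:", err)
```

The shortcut is attractive because it is fast, but as the sketch shows, it also skips the only place where validation happens.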
Yes, I suspected it might be something like that. But surely the main reason why a user would run
It's tricky. First, recode might be invoked with … Internally, recode unfortunately has no validation of input or output that is separate from a transformation.
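One way to get validation that is independent of any transformation is a standalone checker. The Python sketch below follows the UTF-8 byte-sequence rules in RFC 3629 and reports the offset of the first byte that starts an ill-formed sequence. `first_invalid_utf8_offset` is a hypothetical helper for illustration, not part of recode.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the index where the first ill-formed sequence starts, or None if valid."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:  # ASCII
            i += 1
            continue
        # Pick the sequence length and the allowed range for the second byte.
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # excludes overlong 3-byte forms
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # excludes UTF-16 surrogates
        elif b == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # excludes overlong 4-byte forms
        elif 0xF1 <= b <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # excludes code points above U+10FFFF
        else:
            return i  # 0x80-0xC1 and 0xF5-0xFF never start a sequence
        if i + length > n:
            return i  # sequence truncated by end of input
        if not lo <= data[i + 1] <= hi:
            return i
        for k in range(2, length):
            if not 0x80 <= data[i + k] <= 0xBF:  # remaining continuation bytes
                return i
        i += length
    return None
```

Because it only reads the input, a check like this could run before or alongside any conversion, including one that has been optimized away.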
See #3. Recode versions 3.6, 3.7.9 and 3.7.11 all produce the same invalid output given invalid input:
Since the behaviour is clearly not new, it will require some study to see why recode behaves as it does (is it a long-standing bug, a deliberate choice, or a deep-seated design problem?).