Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log::Dispatch:Screen: fix encoding test #68

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mauke
Copy link

@mauke mauke commented Apr 6, 2023

The STD{IN,OUT,ERR} default handles start out in some unspecified (platform-specific) text encoding. The 'utf8' option of Log::Dispatch::Screen manually encodes the logged messages as bytes in UTF-8 format before writing them to the output handle. Thus, by default, the UTF-8 bytes get re-encoded by whatever the native text encoding is on the platform.

The correct way to handle this is to either not set 'utf8' (and rely on the encoding layer of the handle) or set 'utf8' and call binmode() on the handle (or otherwise apply a ':raw' layer) to ensure that bytes are written as is (without further re-encoding).

Fixes #32.

The STD{IN,OUT,ERR} default handles start out in some unspecified
(platform-specific) text encoding. The 'utf8' option of
Log::Dispatch::Screen manually encodes the logged messages as bytes in
UTF-8 format before writing them to the output handle. Thus, by default,
the UTF-8 bytes get re-encoded by whatever the native text encoding is
on the platform.

The correct way to handle this is to either not set 'utf8' (and rely on
the encoding layer of the handle) or set 'utf8' and call binmode() on
the handle (or otherwise apply a ':raw' layer) to ensure that bytes are
written as is (without further re-encoding).

Fixes houseabsolute#32.
@autarch
Copy link
Member

autarch commented Apr 7, 2023

Hi @mauke,

Thanks for this PR.

It's been a while since I thought about this, but looking at the discussion in #32, I'm not sure this is a good change. Here's a summary of relevant points from the discussion:

  • There's a lot of different env vars that can affect how the Perl interpreter works. In the case of this test, it seems like PERL_UNICODE is the most relevant one.
  • It's not realistic (IMO) to expect CPAN module authors to account for all of these env vars in their tests, where "account for" could mean resetting their effects as this patch does, or conversely trying to test with all possible variations of them.
  • The fact that this tests fails with certain PERL_UNICODE settings might be useful information for people. They could have a corresponding issue when using the Log::Dispatch::Screen module, where if PERL_UNICODE is set than passing utf8 => 1 will cause double encoding.

What do you think?

@mauke
Copy link
Author

mauke commented Apr 7, 2023

I have some objections.

  • My patch does not reset the effects of PERL_UNICODE.
  • PERL_UNICODE is not some weird corner case that completely changes how Perl works (and that authors have to compensate for); it is simply one way to set the default encoding of STD* streams.
  • Writing binary data to a text stream is a bug, even if it happens to "work" on some platforms some of the time.
  • The test fails because the code is wrong. People looking for examples of how to use Log::Dispatch::Screen with utf8 are better served by a working test, not a broken one, IMHO.

Quoting perldoc -f binmode:

Arranges for FILEHANDLE to be read or written in "binary" or "text" mode on systems where the run-time libraries distinguish between binary and text files.

... which is all systems that use Unicode, like mine (with LANG=en_US.UTF-8)!

For the sake of portability it is a good idea always to use it when appropriate, and never to use it when it isn't appropriate. Also, people can set their I/O to be by default UTF8-encoded Unicode, not bytes.

In other words: regardless of platform, use "binmode" on binary data, like images, for example.

My use case is explicitly supported by how binmode is supposed to be used according to the documentation. The code in the test file violates that implicit contract by writing binary data to a text stream. Making it use binmode fixes things everywhere, regardless of the value of PERL_UNICODE or whether the platform uses (a superset of) ASCII or not.

(Also, just once I'd like to be able to say cpan Dist::Zilla and have it actually go through without errors.)

@Grinnz
Copy link

Grinnz commented Jan 6, 2025

I personally prefer to avoid PERL_UNICODE, but I think that it is correct that the test should ensure it is testing byte stream handles regardless of this - the failure does not seem useful to me, particularly to the end user of Log::Dispatch or modules using it.

@Grinnz
Copy link

Grinnz commented Jan 6, 2025

Additionally, both the reason I avoid PERL_UNICODE and the reason I am hesitant to recommend changing the runtime behavior of the module is that changing the handles one way or another is global behavior which requires every part of the program to make the same correct assumptions. "Fixing" the handles to be binary at runtime could arbitrarily break the next warning another part of the program emits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

08-screen.t fails with PERL_UNICODE and 5.22.1
3 participants