should --json output always be utf-8? #1030

Open
dten opened this issue Jun 2, 2020 · 9 comments
Comments

@dten
Contributor

dten commented Jun 2, 2020

Struggling to process the json output when the hints have special characters in them.

£ is going in as UTF-8 (U+00A3, bytes 0xC2 0xA3) but coming out as 0x9C when run on Windows, a DOS encoding for £, and as M-BM-# when run on Linux, some escape sequence, presumably for £.

This makes it very difficult to process the output.
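As a rough check on the bytes involved, here is a minimal Python sketch. It assumes the Windows console was using an OEM code page such as cp850 (cp437 maps £ to the same byte) and that the Linux output was viewed through something like `cat -v`; neither is confirmed in this report. `cat_v` is a hypothetical helper that only approximates the `cat -v` notation.

```python
POUND = "\u00a3"  # the pound sign


def cat_v(data):
    """Render bytes roughly the way `cat -v` does (hypothetical helper)."""
    pieces = []
    for byte in data:
        meta = byte >= 0x80          # high bit set: shown with an "M-" prefix
        low = byte & 0x7F            # the remaining seven bits
        if low == 0x7F:
            text = "^?"
        elif low < 0x20 and (meta or low not in (0x09, 0x0A)):
            text = "^" + chr(low + 0x40)   # control characters in caret notation
        else:
            text = chr(low)          # printable ASCII, plus plain tab and newline
        pieces.append(("M-" if meta else "") + text)
    return "".join(pieces)


utf8_bytes = POUND.encode("utf-8")   # b'\xc2\xa3'
oem_bytes = POUND.encode("cp850")    # b'\x9c' (assumed console code page)
shown = cat_v(utf8_bytes)            # 'M-BM-#'
```

If those assumptions hold, the Linux output would already be valid UTF-8, and only the Windows output would be in a different encoding.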

an aside on encoding:
invalid (i assume) hints also come out malformed

  - warn: {lhs: "£", rhs: "£", name: "bad £"}

results in this weird output, which I think looks encoding related

  Aeson exception:
Error in $: Error when decoding YAML, Failed to parse <string>:1:1: error: parse error on input `o', when parsing:
 o
Along path: root 8 warn lhs
When at: String
warn:
  lhs: ¶o
  name: bad ¶o
  rhs: ¶o
)
@googleson78
Contributor

should ... output always be utf-8?
yes

@ndmitchell
Owner

Everything should always be utf-8, to a first approximation. The problem is that some consoles don't support UTF8, and some consoles have a different encoding. We write String to stdout, and rely on GHC to match the encoding you asked for, which seems correct. What are your consoles set to?

If we want to have JSON get this right, maybe we need to supply a file argument to json, so it can put the output in a file, and write that file in the only sane encoding (UTF8).

@dten
Contributor Author

dten commented Jun 3, 2020

I've seen it in 3 places so far:

I have a Sublime Text plugin (Python) that calls hlint with Popen. Sadly, encoding isn't an argument until Python 3.6, and Sublime plugins are stuck on Python 3.3.

I have a terminal UI app written in Rust that calls hlint as a std::process::Command. It was being run in bash under Ubuntu in WSL, accessed from Windows Terminal. I don't see that there's an encoding argument for Command.

And I've just tried running it myself in cmd in Windows 10 1909 and redirecting the output to a file with >.
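For the Popen case, one possible workaround that avoids the `encoding` argument is to capture raw bytes and decode them by hand. This is only a sketch: `decode_with_fallback` and `run_and_decode` are hypothetical helper names, and the cp850 fallback is an assumption about the Windows console, not something HLint reports. A guessing decoder like this can also pick the wrong encoding for other inputs.

```python
import subprocess


def decode_with_fallback(raw, encodings=("utf-8", "cp850")):
    """Try each encoding in turn and return the first successful decode."""
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Nothing matched: decode as UTF-8 and mark the undecodable bytes.
    return raw.decode("utf-8", "replace")


def run_and_decode(argv, encodings=("utf-8", "cp850")):
    """Run a command, capture stdout as bytes, and decode it manually."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    raw, _ = proc.communicate()
    return decode_with_fallback(raw, encodings)
```

A call might look like `run_and_decode(["hlint", "--json", "Main.hs"])`, with the result then handed to a JSON parser.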

@dten
Contributor Author

dten commented Jun 3, 2020

I see the environment is supposed to tell GHC what to output, and in the bash terminal I have the following, which I would expect to make it UTF-8:

> echo $LANG
en_US.UTF-8
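When hlint is launched from another program rather than from this shell, the child process may not see the same locale. A minimal sketch of passing one explicitly from Python follows. `with_utf8_locale` is a hypothetical helper, the locale name is assumed to be installed on the machine, and this only applies on POSIX-style systems.

```python
import os


def with_utf8_locale(base_env=None, locale_name="en_US.UTF-8"):
    """Return a copy of the environment that requests a UTF-8 locale."""
    env = dict(os.environ if base_env is None else base_env)
    env["LANG"] = locale_name
    env["LC_ALL"] = locale_name  # LC_ALL takes precedence over LANG
    return env
```

The result can be passed as `env=with_utf8_locale()` to `subprocess.Popen`.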

@ndmitchell
Owner

Writing stuff to stdout by default in an encoding that stdout doesn't profess to support seems like a bad idea? Is it? Why doesn't GHC get this right? I see a few options:

  1. You figure out how to make GHC believe you that your terminal is UTF8. Can you try a simple example of doing putStrLn some_unicode_char in GHCi and see if that gets it right? I.e. is it a GHC bug, or an HLint bug on top?
  2. We add a flag to make HLint treat stdout as UTF8. Useful when capturing stdout, as it's not a terminal. Or maybe if hIsTerminalDevice returns False we should be doing that already? Or maybe GHC should be doing that for us? hGetContents: invalid argument (invalid byte sequence) #96 and https://serokell.io/blog/haskell-with-utf8 seem to be useful information.
  3. We add a flag to write the JSON to a file, since if written to a file, UTF8 is obviously the right choice.
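The decision in option 2 can be illustrated with a small Python analogue. This is not HLint code: in Haskell the corresponding pieces would be `hIsTerminalDevice` and `hSetEncoding` from `System.IO`. `encode_for` is a hypothetical name, and the policy it implements (UTF-8 whenever the output is not a terminal) is just the idea floated above, not a settled design.

```python
def encode_for(is_terminal, text, locale_encoding):
    """Encode text for output: locale encoding on a terminal, UTF-8 otherwise."""
    encoding = locale_encoding if is_terminal else "utf-8"
    # Characters the chosen encoding cannot represent are replaced, not raised.
    return text.encode(encoding, "replace")
```

A caller could write the result with `sys.stdout.buffer.write(encode_for(sys.stdout.isatty(), text, locale.getpreferredencoding(False)))`.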

@dten
Contributor Author

dten commented Jun 3, 2020

yup. it does seem like a bad idea
thanks for the pointers.
i'm gonna make a simple debug program and see what the Handle comes back as when invoked by different methods

@dten
Contributor Author

dten commented Jun 3, 2020

well i narrowed one of them down to being ncurses needing wide characters to display utf8 correctly

that just leaves the default codepage being latin1 in windows, which neither rust nor python is very happy with

i tried testing whether hIsTerminalDevice is false. it is False in the cases i'm using it, but it's also False if you just run hlint | cat, where obviously you wouldn't want to change to utf8 if the terminal doesn't support it

@dten
Contributor Author

dten commented Jun 3, 2020

maybe an interesting point here, but i'm also piping in the file content and have to encode it as utf8 or hlint says no.

so that means when using hlint - and piping in the content in a console that doesn't support utf-8, hlint doesn't work right

for example in powershell when there is a lint against a unicode char

gc Main.hs | hlint -

results in different output than

hlint Main.hs
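From a calling program, the stdin side can be controlled by encoding the source explicitly before piping it in. A minimal Python sketch follows; `pipe_as_utf8` is a hypothetical helper, and it returns raw bytes because the encoding of the output is the open question in this issue.

```python
import subprocess


def pipe_as_utf8(argv, source_text):
    """Send text to a command's stdin as UTF-8 and return its raw stdout bytes."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    raw_out, _ = proc.communicate(source_text.encode("utf-8"))
    return raw_out
```

For example, `pipe_as_utf8(["hlint", "-"], source)` where `source` holds the contents of Main.hs as a string.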

@ndmitchell
Owner

Haskell code is always assumed to be in UTF8 when in a file - I guess I should probably treat it as console locale when reading from stdin. If anyone wants to shove up a patch, then I'd accept.

3 participants