Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

psql Client encoding doesn't allow Umlaute / other UTF8 special characters #76

Open
karlhorky opened this issue Jun 15, 2024 · 6 comments
Assignees

Comments

@karlhorky
Copy link
Member

Reporting on behalf of @Petzlc, text from @Petzlc:

What happened that was unexpected

Hyper somehow cannot process äöü so it always throws this Error message:

FEHLER:  Zeichen mit Byte-Folge 0x81 in Kodierung »WIN1252« hat keine Entsprechung in Kodierung »UTF8«

What were your expectations

That i can create my table with the values that contain äöü in the spelling.

Your guess of what the problem is

So on my journey of researching i ran into this message that also appears when i start psql and which seems to be a problem for other people too and seems to be the root of this problem:

WARNING: Console code page (850) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
Type "help" for help.

(mine is in german though)

with the following advice:

You should know what encoding is used in your database.

SHOW server_encoding;

When you connect to your database you can specify what encoding should your client use:

SET client_encoding TO 'UTF8';

If server and client encoding differ, the database driver tries to translate between those two encoding. When it can not find an equivalent character, the error is thrown.

A list of things you have tried to solve it

So this seems to me like a pretty forward advice of what to do when someone runs into this error. I didn't try to apply it though and just changed the äöü to ae and oe and ue. Also i'm wondering if this affects é è and so on. I just changed everything to uncomplicated and common letters.

@ProchaLu
Copy link
Member

ProchaLu commented Jul 8, 2024

After some research, it seems that there are multiple options that we need to test on @Petzlc machine.

Solution 1

We need to find out what encoding the machine uses for the database by entering SHOW server_encoding;. We can follow this Stack Overflow post to change the encoding to UTF-8.

Solution 2

You can also manually change the locale system language to use UTF-8. By following this guide, we can ensure that Windows now uses UTF-8 for language support.

@karlhorky
Copy link
Member Author

karlhorky commented Jul 8, 2024

Ok nice, go ahead with more investigation - prefer choices that:

  1. Does not change the language in the whole OS to English - only UTF-8 support for psql
  2. Works cross-platform (Windows, macOS, Linux)

@ProchaLu
Copy link
Member

ProchaLu commented Jul 8, 2024

This is the output of @Petzlc when checking the encoding with SHOW client_encoding;.

 client_encoding
-----------------
 WIN1252
(1 Zeile)

and when checking the locale:

$ locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"   
LC_NUMERIC="de_DE.UTF-8" 
LC_TIME="de_DE.UTF-8"    
LC_COLLATE="de_DE.UTF-8" 
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_ALL=

Maybe the missing LC_ALL causes this issue?

Setting the locale to en-US.UTF-8 with setx LC_ALL en_US.UTF-8, this will also change the system's default and will affect the full system and not only PostgreSQL.


To change the encoding to UTF-8 we can use SET client_encoding TO 'UTF8'; but this only sets the encoding to UTF-8 for this one Database.

To ensure that we start creating a DB with UTF-8 we can use this PostgreSQL command.

CREATE DATABASE <name> ENCODING 'UTF8'

but this command will not set the default encoding to UTF-8. To set the default encoding for PostgreSQL to UTF-8, the only solution I found so far is setting the encoding UTF-8 in the .psqlrc file for Linux/macOS, and psqlrc.conf for Windows.

Maybe I can find a way to use the Control Functions from PostgreSQL.

@karlhorky
Copy link
Member Author

karlhorky commented Jul 8, 2024

Setting the locale to en-US.UTF-8 with setx LC_ALL en_US.UTF-8, this will also change the system's default and will affect the full system and not only PostgreSQL.

what exactly is affected here, from the "full system"? is it only the command line? if so, that may be acceptable

Maybe the missing LC_ALL causes this issue?

keep in mind that we already set LC_ALL on macOS:

  • echo "\nexport PGDATA=$PGDATA_TMP\nexport LC_ALL=en_US.UTF-8\nexport PSQL_PAGER=\"less --chop-long-lines --header 1\"" >> ~/`[[ $SHELL == *"zsh" ]] && echo '.zshrc' || echo '.bash_profile'`
  • 21a1cd8

if it makes sense to also do this for the other operating systems (first figure out what is affected by it in the "full system") then we can go ahead with such a PR

(and make sure that we do it all the same way in all of Windows, macOS, Linux)

cc @Eprince-hub in case this also affects your #73 issue

@Eprince-hub
Copy link
Member

Eprince-hub commented Jul 8, 2024

What LC_ALL changes

I don't think that the LC_ALL would modify the entire system or anything other than the command line. I even think psql only (not sure about this). My system language was English but my psql was German until I set.

Read some info about Locale here.
https://docs.oracle.com/cd/E19253-01/817-2521/overview-39/index.html

@karlhorky
Copy link
Member Author

I don't think that the LC_ALL would modify the entire system or anything other than the command line. I even think psql only (not sure about this). My system language was English but my psql was German until I set.

yeah check into this a bit more, we should have A) a list of things that we are relatively certain that LC_ALL affects and B) a list of what it does not affect

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants