Special character issue on windows #10

Open
BirgerNi opened this issue Oct 15, 2018 · 5 comments
@BirgerNi

read_msg() does not work for me on Windows when the file path contains special characters. The same code works as expected on Linux.

Have a look at the example below. The path of the second mail contains a special character (the "ø" in "København.msg").

On Windows:

library(magrittr)
library(msgxtractr)

system.file("extdata/unicode.msg", package="msgxtractr") %>%
  file.copy(to = c("Copenhagen.msg", "København.msg"), overwrite = TRUE)
#> [1] TRUE TRUE

(mails <- list.files(pattern = "msg"))
#> [1] "Copenhagen.msg" "København.msg"

lapply(mails, read_msg)
#> [[1]]
#> Mon, 18 Nov 2013 10:26:24 +0200
#> From: Brian Zhou <[email protected]>
#> To: [email protected]
#> Subject: Test for TIF files
#> Attachments: 2
#> 
#> [[2]]
#> From: [Unspecified]
#> To: [Unspecified]
#> Subject: [Unspecified]

On Linux:

library(magrittr)
library(msgxtractr)

system.file("extdata/unicode.msg", package="msgxtractr") %>%
   file.copy(to = c("Copenhagen.msg", "København.msg"), overwrite = TRUE) 
#> [1] TRUE TRUE

(mails <- list.files(pattern = "msg"))
#> [1] "Copenhagen.msg" "København.msg"

lapply(mails, read_msg)
#> [[1]]
#> Mon, 18 Nov 2013 10:26:24 +0200
#> From: Brian Zhou <[email protected]>
#> To: [email protected]
#> Subject: Test for TIF files
#> Attachments: 2
#>
#> [[2]]
#> Mon, 18 Nov 2013 10:26:24 +0200
#> From: Brian Zhou <[email protected]>
#> To: [email protected]
#> Subject: Test for TIF files
#> Attachments: 2

hrbrmstr self-assigned this Oct 15, 2018
hrbrmstr added a commit that referenced this issue Oct 15, 2018
@hrbrmstr
Owner

I just added a call to normalizePath() before the file read ops. I'm AFK today but will try to reproduce on a Windows VM this week, as soon as I can.

@BirgerNi
Author

normalizePath() does not seem to help. Please let me know if I can provide further information.

library(magrittr)
library(msgxtractr)

system.file("extdata/unicode.msg", package="msgxtractr") %>%
  file.copy(to = c("Copenhagen.msg", "København.msg"), overwrite = TRUE)
#> [1] TRUE TRUE

(mails <- list.files(pattern = "msg"))
#> [1] "Copenhagen.msg" "København.msg"
(mails2 <- normalizePath(path.expand(mails)))
#> [1] "M:\\msgxtractr\\test\\Copenhagen.msg"
#> [2] "M:\\msgxtractr\\test\\København.msg"

lapply(mails2, read_msg)
#> [[1]]
#> Mon, 18 Nov 2013 10:26:24 +0200
#> From: Brian Zhou <[email protected]>
#> To: [email protected]
#> Subject: Test for TIF files
#> Attachments: 2
#> 
#> [[2]]
#> From: [Unspecified]
#> To: [Unspecified]
#> Subject: [Unspecified]

Created on 2018-10-15 by the reprex package (v0.2.1)
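
One way to see why this looks like an encoding problem is to compare the bytes of the same file name in two encodings. This is only a sketch: it assumes the Windows native encoding is a Latin-1-like code page, which is not confirmed anywhere in this thread, and the variable names are illustrative.

```r
# Build the file name with a \u escape so this script itself stays ASCII-only.
name <- enc2utf8("K\u00f8benhavn.msg")

# Bytes of the name when encoded as UTF-8 ("ø" takes two bytes).
utf8_bytes <- charToRaw(name)

# Bytes of the same name in Latin-1 ("ø" takes one byte).
# Assumption: this approximates the native Windows code page.
latin1_bytes <- charToRaw(iconv(name, from = "UTF-8", to = "latin1"))

print(Encoding(name))
print(utf8_bytes)
print(latin1_bytes)
```

If the bytes that reach the underlying file-open call are in a different encoding than the one the OS expects, the file would not be found even though list.files() reports it.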

@hrbrmstr
Owner

hrbrmstr commented Oct 15, 2018 via email

@hrbrmstr
Owner

Try doing:

original_ctype <- Sys.getlocale(category = "LC_CTYPE")
Sys.setlocale("LC_CTYPE", "UTF-8")

before the calls to read_msg()

then

Sys.setlocale("LC_CTYPE", original_ctype)

afterwards.
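
The set-and-restore steps above can be wrapped in a small helper so that the original locale comes back even if read_msg() throws an error. This is a minimal sketch: with_ctype() is a hypothetical name, not part of msgxtractr, and whether a given locale string is accepted depends on the operating system.

```r
# Hypothetical helper (not part of msgxtractr): evaluate `code` with a
# temporary LC_CTYPE and restore the previous value when the function exits.
with_ctype <- function(locale, code) {
  original_ctype <- Sys.getlocale(category = "LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", original_ctype), add = TRUE)

  # Sys.setlocale() returns "" (and warns) when the OS refuses the request.
  applied <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(applied, "")) {
    warning("LC_CTYPE could not be set to '", locale, "'; still using '",
            original_ctype, "'")
  }

  # `code` is a lazily evaluated argument, so it runs here, after the switch.
  force(code)
}
```

Usage would then look like with_ctype("UTF-8", lapply(mails, read_msg)).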

@BirgerNi
Author

I think your suggestion goes in the right direction; this seems to be an encoding issue.

> Sys.setlocale("LC_CTYPE", "UTF-8")
#> [1] ""
#> Warning message:
#> In Sys.setlocale("LC_CTYPE", "UTF-8") :
#> OS reports request to set locale to "UTF-8" cannot be honored

Unfortunately, I cannot set the locale's encoding to UTF-8. According to this topic on Stack Overflow, Windows still doesn't support UTF-8 locales.
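
Since file.copy() handled the non-ASCII name in the reprex above, one possible stopgap is to copy each file to a temporary path with a generated name and read that copy instead. This is an untested sketch, not a confirmed fix: read_via_ascii_copy() is a hypothetical helper, and it assumes that tempdir() itself contains no special characters, which may not hold if the Windows user name has them.

```r
# Hypothetical workaround (not part of msgxtractr): copy the file to a
# temporary path with a generated name, then hand that path to `reader`.
read_via_ascii_copy <- function(path, reader) {
  tmp <- tempfile(pattern = "msg_", fileext = ".msg")
  on.exit(unlink(tmp), add = TRUE)  # remove the temporary copy afterwards

  if (!file.copy(path, tmp, overwrite = TRUE)) {
    stop("could not copy '", path, "' to a temporary file")
  }
  reader(tmp)
}
```

With the files from the example, this would be called as lapply(mails, read_via_ascii_copy, reader = read_msg).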
