UTF-8 in JSON response isn't handled #97
Comments
Hi @grv87, thanks for bringing this up. I've been worried about this issue, as I hadn't set up good ways to test it just yet. Can you provide some more details on what was expected vs. actual? Did it complete parsing the request/response without errors but parse it incorrectly, or were errors thrown when parsing? Is the API public, or do you have an example that I can try locally? |
It completes parsing without errors, but incorrectly: Response.Data contains mojibake. I've put my problematic example here: https://yadi.sk/i/7_Fs3GTyf2HA7 (I hope it is not overcomplicated. If it is, let me know and I'll make a simpler one). |
I think it's like: Possible solution: |
Content is not escaped, it is in plain UTF-8. |
This function worked for me; give it a try:
Function UTF8_Decode(ByVal sStr As String) As String
Dim l As Long, sUTF8 As String, iChar As Integer, iChar2 As Integer
For l = 1 To Len(sStr)
iChar = Asc(Mid(sStr, l, 1))
If iChar > 127 Then
If Not iChar And 32 Then ' 2 chars
iChar2 = Asc(Mid(sStr, l + 1, 1))
sUTF8 = sUTF8 & ChrW$(((31 And iChar) * 64 + (63 And iChar2)))
l = l + 1
Else
Dim iChar3 As Integer
iChar2 = Asc(Mid(sStr, l + 1, 1))
iChar3 = Asc(Mid(sStr, l + 2, 1))
sUTF8 = sUTF8 & ChrW$(((iChar And 15) * 16 * 256) + ((iChar2 And 63) * 64) + (iChar3 And 63))
l = l + 2
End If
Else
sUTF8 = sUTF8 & Chr$(iChar)
End If
Next l
UTF8_Decode = sUTF8
End Function
|
Doesn't work when I use it like this:
For Each kv In Response.Data("rows")
Value = UTF8_Decode(kv("NAME"))
Next
|
Really cool library! While I don't really have VBA experience, my take on the UTF-8 issues is this: the fundamental problem is that when the actual JSON parsing happens, the text data has already been "converted" from the UTF-8 encoded data returned by the HTTP request to a VBA string. It looks like WinHttpRequest doesn't properly decode the data from the response and encode it to the VBA "unicode" encoding. You're basically not able to control the decoder encoding used to populate the IWinHttpRequest::ResponseText property. So it might make sense to base the JSON parsing and text access on the ResponseBody byte array instead. Indeed, I've been able to properly access JSON data using these modifications to WebHelpers:
The actual conversion function is gratefully taken from here (with the very basic modification of
As denoted in the snippet, there are still problems to consider with this approach:
Caution: I've no idea whatsoever about VBA/Win/Excel/Mac compatibility regarding the conversion function ;-). |
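One Windows-only way to experiment with the ResponseBody idea is to let an ADODB.Stream object decode the raw bytes as UTF-8. This is a minimal sketch and not the WebHelpers modification described in the comment above (that snippet is not reproduced in this thread). The names `Utf8BytesToString` and `DemoDecodeResponseBody`, the late-bound `ADODB.Stream` object, and the example URL are all assumptions made for illustration, and the code is not expected to work on macOS.

```vba
Option Explicit

' Decode a UTF-8 byte array (e.g. WinHttpRequest.ResponseBody) into a VBA string.
' Assumption: Windows only, because it relies on the ADODB.Stream COM object.
Public Function Utf8BytesToString(ByRef Bytes() As Byte) As String
    Const adTypeBinary As Long = 1
    Const adTypeText As Long = 2

    Dim Stream As Object
    Set Stream = CreateObject("ADODB.Stream")

    ' Write the raw bytes in binary mode...
    Stream.Type = adTypeBinary
    Stream.Open
    Stream.Write Bytes

    ' ...then rewind and read the same buffer back as UTF-8 text.
    Stream.Position = 0
    Stream.Type = adTypeText
    Stream.Charset = "utf-8"
    Utf8BytesToString = Stream.ReadText

    Stream.Close
End Function

' Hypothetical usage: fetch a JSON document and decode the body ourselves
' instead of reading ResponseText. The URL is a placeholder.
Public Sub DemoDecodeResponseBody()
    Dim Http As Object
    Dim Body() As Byte

    Set Http = CreateObject("WinHttp.WinHttpRequest.5.1")
    Http.Open "GET", "https://example.com/data.json", False
    Http.Send

    ' Note: an empty response body would make this assignment fail.
    Body = Http.ResponseBody
    Debug.Print Utf8BytesToString(Body)
End Sub
```

The decoded string could then be handed to the JSON parser in place of ResponseText. The sketch assumes the server really sends UTF-8; it does not inspect the Content-Type charset.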
Followup: Along the same lines, a rather hacky way to address the Response.Content property by resetting it in case of JSON data:
(please bear with my obvious VBA deficiencies) |
Thanks for the notes @hjoukl. This issue is still a very high priority; I just haven't been able to address it in a cross-platform manner.
|
Thanks for your clarification @timhall. I guess the problem wrt interop is the use of the "kernel32" function to power the UTF-8 decoder, then. Keep up the good work! |
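To make the interop concern concrete, here is a rough sketch of what a kernel32-based decoder guarded by conditional compilation might look like. It is not the code from the comments above; it assumes the Win32 `MultiByteToWideChar` function is the kernel32 call in question, and the wrapper name `Utf8BytesToStringApi` is hypothetical. The `Mac` and `VBA7` compiler constants keep the declaration out of macOS builds and select pointer-safe types on newer Office versions.

```vba
Option Explicit

#If Mac Then
' kernel32 does not exist on macOS, so nothing is declared here.
#ElseIf VBA7 Then
Private Declare PtrSafe Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As LongPtr, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As LongPtr, ByVal cchWideChar As Long) As Long
#Else
Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
#End If

Private Const CP_UTF8 As Long = 65001

' Decode a UTF-8 byte array into a VBA string via the Windows API.
' The caller must pass an allocated array; UBound fails on an unallocated one.
Public Function Utf8BytesToStringApi(ByRef Bytes() As Byte) As String
#If Mac Then
    Err.Raise vbObjectError + 1, "Utf8BytesToStringApi", "Not available on macOS"
#Else
    Dim ByteCount As Long
    Dim CharCount As Long
    Dim Result As String

    ByteCount = UBound(Bytes) - LBound(Bytes) + 1
    If ByteCount <= 0 Then Exit Function

    ' First call: ask how many UTF-16 code units the decoded text needs.
    CharCount = MultiByteToWideChar(CP_UTF8, 0, VarPtr(Bytes(LBound(Bytes))), _
                                    ByteCount, 0, 0)
    If CharCount = 0 Then Exit Function

    ' Second call: decode directly into a pre-sized string buffer.
    Result = String$(CharCount, vbNullChar)
    MultiByteToWideChar CP_UTF8, 0, VarPtr(Bytes(LBound(Bytes))), _
                        ByteCount, StrPtr(Result), CharCount

    Utf8BytesToStringApi = Result
#End If
End Function
```

On macOS this only avoids a compile error; a separate decoder would still be needed there, which is the cross-platform gap discussed above.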
Thank you very much for your wonderful programs!
|
@sokol92 I wasn't able to get this to work on macOS, but I may not have tried it in the correct place, so adding this info to your post would help to confirm. The only thing that did work for me on macOS was to use this function on the parsed data (i.e., what you get back from
Function Utf8ToUtf16(ByVal strText As String) As String
' macOs only: apparently, Excel uses UTF-16 to represent string literals
' Taken from https://stackoverflow.com/a/64624336/918626
Dim i&, l1&, l2&, l3&, l4&, l&
For i = 1 To Len(strText)
l1 = Asc(Mid(strText, i, 1))
If i + 1 <= Len(strText) Then l2 = Asc(Mid(strText, i + 1, 1))
If i + 2 <= Len(strText) Then l3 = Asc(Mid(strText, i + 2, 1))
If i + 3 <= Len(strText) Then l4 = Asc(Mid(strText, i + 3, 1))
Select Case l1
Case 1 To 127
l = l1
Case 194 To 223
l = ((l1 And &H1F) * 2 ^ 6) Or (l2 And &H3F)
i = i + 1
Case 224 To 239
l = ((l1 And &HF) * 2 ^ 12) Or ((l2 And &H3F) * 2 ^ 6) Or (l3 And &H3F)
i = i + 2
Case 240 To 255
l = ((l1 And &H7) * 2 ^ 18) Or ((l2 And &H3F) * 2 ^ 12) Or ((l3 And &H3F) * 2 ^ 6) Or (l4 And &H3F)
i = i + 3
Case Else
l = 63 ' question mark
End Select
Utf8ToUtf16 = Utf8ToUtf16 & IIf(l < 55296, WorksheetFunction.Unichar(l), "?")
Next i
End Function
(For Windows, #305 seems to work nicely, but it needs minor changes so that it is ignored on macOS and works across 32/64-bit Excel versions.) |
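As written, the function above replaces every code point from 55296 upward with "?", so anything decoded from a 4-byte UTF-8 sequence is lost. If keeping those characters matters, a code point above U+FFFF can be split into a UTF-16 surrogate pair. The helper below is a sketch only: the name `CodePointToUtf16` is hypothetical, it uses `ChrW$` rather than `WorksheetFunction.Unichar`, and it has not been verified on macOS.

```vba
Option Explicit

' Convert a single Unicode code point into its UTF-16 representation.
' Assumption: the code point is valid (0 to &H10FFFF, not itself a surrogate).
Public Function CodePointToUtf16(ByVal CodePoint As Long) As String
    Dim Offset As Long

    If CodePoint < &H10000& Then
        ' Basic Multilingual Plane: one UTF-16 code unit.
        CodePointToUtf16 = ChrW$(CodePoint)
    Else
        ' Supplementary planes: high surrogate followed by low surrogate.
        Offset = CodePoint - &H10000&
        CodePointToUtf16 = ChrW$(&HD800& + (Offset \ &H400)) & _
                           ChrW$(&HDC00& + (Offset And &H3FF))
    End If
End Function
```

For example, U+1F600 has offset &HF600, which yields the pair &HD83D and &HDE00. Whether a given cell or control renders such a pair correctly is a separate question that this sketch does not address.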
I use the following Request:
Response is JSON with strings in UTF-8, which aren't converted into VBA strings.