Binary resources #1127
Comments
I am trying to understand what this issue is about - please confirm or correct me if I am wrong. Does the following code - converting an xs:base64Binary to "codepoints" - have something to do with this issue, or is it something completely different?
let $hexCodepoints := function($input as xs:hexBinary) as xs:integer*
{
let $hexchars := map{
"00": 0, "01": 1, "02": 2, "03": 3, "04": 4, "05": 5, "06": 6, "07": 7, "08": 8, "09": 9, "0A": 10, "0B": 11, "0C": 12, "0D": 13, "0E": 14, "0F": 15,
"10": 16, "11": 17, "12": 18, "13": 19, "14": 20, "15": 21, "16": 22, "17": 23, "18": 24, "19": 25, "1A": 26, "1B": 27, "1C": 28, "1D": 29, "1E": 30, "1F": 31,
"20": 32, "21": 33, "22": 34, "23": 35, "24": 36, "25": 37, "26": 38, "27": 39, "28": 40, "29": 41, "2A": 42, "2B": 43, "2C": 44, "2D": 45, "2E": 46, "2F": 47,
"30": 48, "31": 49, "32": 50, "33": 51, "34": 52, "35": 53, "36": 54, "37": 55, "38": 56, "39": 57, "3A": 58, "3B": 59, "3C": 60, "3D": 61, "3E": 62, "3F": 63,
"40": 64, "41": 65, "42": 66, "43": 67, "44": 68, "45": 69, "46": 70, "47": 71, "48": 72, "49": 73, "4A": 74, "4B": 75, "4C": 76, "4D": 77, "4E": 78, "4F": 79,
"50": 80, "51": 81, "52": 82, "53": 83, "54": 84, "55": 85, "56": 86, "57": 87, "58": 88, "59": 89, "5A": 90, "5B": 91, "5C": 92, "5D": 93, "5E": 94, "5F": 95,
"60": 96, "61": 97, "62": 98, "63": 99, "64": 100, "65": 101, "66": 102, "67": 103, "68": 104, "69": 105, "6A": 106, "6B": 107, "6C": 108, "6D": 109, "6E": 110, "6F": 111,
"70": 112, "71": 113, "72": 114, "73": 115, "74": 116, "75": 117, "76": 118, "77": 119, "78": 120, "79": 121, "7A": 122, "7B": 123, "7C": 124, "7D": 125, "7E": 126, "7F": 127,
"80": 128, "81": 129, "82": 130, "83": 131, "84": 132, "85": 133, "86": 134, "87": 135, "88": 136, "89": 137, "8A": 138, "8B": 139, "8C": 140, "8D": 141, "8E": 142, "8F": 143,
"90": 144, "91": 145, "92": 146, "93": 147, "94": 148, "95": 149, "96": 150, "97": 151, "98": 152, "99": 153, "9A": 154, "9B": 155, "9C": 156, "9D": 157, "9E": 158, "9F": 159,
"A0": 160, "A1": 161, "A2": 162, "A3": 163, "A4": 164, "A5": 165, "A6": 166, "A7": 167, "A8": 168, "A9": 169, "AA": 170, "AB": 171, "AC": 172, "AD": 173, "AE": 174, "AF": 175,
"B0": 176, "B1": 177, "B2": 178, "B3": 179, "B4": 180, "B5": 181, "B6": 182, "B7": 183, "B8": 184, "B9": 185, "BA": 186, "BB": 187, "BC": 188, "BD": 189, "BE": 190, "BF": 191,
"C0": 192, "C1": 193, "C2": 194, "C3": 195, "C4": 196, "C5": 197, "C6": 198, "C7": 199, "C8": 200, "C9": 201, "CA": 202, "CB": 203, "CC": 204, "CD": 205, "CE": 206, "CF": 207,
"D0": 208, "D1": 209, "D2": 210, "D3": 211, "D4": 212, "D5": 213, "D6": 214, "D7": 215, "D8": 216, "D9": 217, "DA": 218, "DB": 219, "DC": 220, "DD": 221, "DE": 222, "DF": 223,
"E0": 224, "E1": 225, "E2": 226, "E3": 227, "E4": 228, "E5": 229, "E6": 230, "E7": 231, "E8": 232, "E9": 233, "EA": 234, "EB": 235, "EC": 236, "ED": 237, "EE": 238, "EF": 239,
"F0": 240, "F1": 241, "F2": 242, "F3": 243, "F4": 244, "F5": 245, "F6": 246, "F7": 247, "F8": 248, "F9": 249, "FA": 250, "FB": 251, "FC": 252, "FD": 253, "FE": 254, "FF": 255 },
$strInput := xs:string($input)
return
(
for $i in 1 to xs:integer(string-length($strInput) div 2),
$j in 2 * $i -1
return $hexchars(substring($strInput, $j, 2))
)
},
$invertBase64Binary := function($input as xs:base64Binary) as xs:integer*
{
let $hexBin := xs:hexBinary($input),
$codePoints := $hexCodepoints($hexBin)
return
for $cp in $codePoints
return 255 - $cp
}
return
(
"Hex-Binary(YAYBBQEBBQA=): " || xs:hexBinary(xs:base64Binary("YAYBBQEBBQA=")),
"==================================================================",
"Hex Codepoints: ",
$hexCodepoints(xs:hexBinary(xs:base64Binary("YAYBBQEBBQA="))),
"==================================================================",
"Inverted codepoints: ",
$invertBase64Binary(xs:base64Binary("YAYBBQEBBQA="))
)
And this produces the following result:
|
I think your code is converting a hexBinary value into a sequence of octets, represented as integers in the range 0-255. There's nothing I can see here that is anything to do with characters or codepoints. (The use of the term codepoints in your post is quite misleading: Unicode codepoints require multiple octets.) A simpler implementation of this function in pure XQuery 4.0 might be
That is of course sometimes a useful thing to be able to do; and it is one of the many functions available in the EXPath binary module defined at https://expath.org/spec/binary (see function bin:to-octets). This issue isn't asking for all the functionality of the EXPath binary module to be added into F+O, nor was I thinking about this specific function. Rather, it's observing that with things like
|
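As a rough illustration of the bin:to-octets route mentioned above, the octet inversion from the original post might be reduced to a few lines. This is only a sketch: it assumes a processor that implements the EXPath Binary module (the `bin` prefix is bound explicitly here), and `local:invert-octets` is a hypothetical name chosen for this example.

```xquery
xquery version "3.1";
declare namespace bin = "http://expath.org/ns/binary";

(: Hypothetical helper: returns 255 minus each octet of the input. :)
declare function local:invert-octets($input as xs:base64Binary) as xs:integer* {
  for $octet in bin:to-octets($input)
  return 255 - $octet
};

local:invert-octets(xs:base64Binary("YAYBBQEBBQA="))
```

No lookup table is needed, because bin:to-octets already returns the octets as integers in the range 0-255.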
I happily agree. This functionality is key, in my opinion, to an important new QT4 feature, where a file that is mostly text but with "bad characters" can be read as binary, fixed, then cast to a string according to a given encoding. I wrote a similar set of functions for TAN, converting between octets and UTF-8 ... https://github.com/textalign/TAN-2021/blob/master/functions/numerics/TAN-fn-octets.xsl ... in conjunction with functions supporting conversions across bits, base64Binary, octets: https://github.com/textalign/TAN-2021/blob/master/functions/numerics/TAN-fn-binary.xsl Should the function have a feature allowing the implementer to detect the encoding and use the one it deems best? |
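The read-as-binary, fix, then decode workflow described above might look roughly like this with the EXPath Binary module. It is a sketch under stated assumptions: the processor supports the `bin` functions, the intended encoding is UTF-8, and the "bad characters" are stray NUL octets. `local:strip-nul` is a hypothetical name, not part of any specification.

```xquery
xquery version "3.1";
declare namespace bin = "http://expath.org/ns/binary";

(: Hypothetical helper: drops NUL octets, then decodes the rest as UTF-8. :)
declare function local:strip-nul($input as xs:base64Binary) as xs:string? {
  let $clean := bin:to-octets($input)[. ne 0]
  return bin:decode-string(bin:from-octets($clean), "UTF-8")
};

(: "SGkAIQ==" is the base64 form of the octets 72, 105, 0, 33 :)
local:strip-nul(xs:base64Binary("SGkAIQ=="))
(: expected result: "Hi!" :)
```

The encoding is passed explicitly here; the sketch does not attempt any automatic detection.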
Agreed. Let us also specify these functions, which can simplify dealing with binary, hex and base64Binary:
I wouldn't be surprised if it might be possible to combine these three into a single function, using the union of the three types. |
For It could be helpful to extend |
Perhaps we should:
|
Encouraging it sounds good; I wouldn't mandate it. At least in our case, other non-standard modules are used much more frequently than this specific module. What will be the primary use cases for binary conversions, apart from parsing and serializing data? We shouldn’t duplicate more and more functions from other modules (in our implementation, we already have two functions for converting binary data, one in the Binary Module, one in our custom and older Conversion Module).
👍
…as well as |
I think the problem is that users are reluctant to use features that aren't guaranteed to be present in every implementation. |
As Florent Georges, the maintainer of EXPath, has been almost unreachable in recent years, chances are high that the public resources will get lost, so I wonder whether there’s a chance to move the Binary, File and possibly other modules into the W3 domain and make them mandatory in a second step? It could additionally give us the chance to revise the specs, and align them with XPath 4. |
Bringing them into the w3 domain would be ideal. EXPath is a helpful set of tools, but knowing any conformant 4.0 processor will have a given function is, as Michael said, what gives users the confidence to rely on a given feature. |
I made parse-html support binary as the HTML spec has rules for detecting the encoding and decoding a binary data stream. It therefore makes sense to support that in the API. Additionally, the EXPath extensions support reading binary data even if it is not present in the core XQT specs. Vendors can also have their own binary extensions. |
We have some functions that accept binary input (parse-html, parse-csv) and others that don't (parse-xml, parse-json). There seems to be no obvious justification for the inconsistency.
Related to this:
(a) we have no functions to convert (encode/decode) between binary and string given an encoding
(b) we have no function to read a binary resource from a URI
Both of these are available in the EXPath bin library but should perhaps be promoted to the main spec.
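For point (a), the existing EXPath functions give an idea of what a promoted API could cover. The following is a minimal sketch, assuming a processor that implements the EXPath Binary module; it shows only the current EXPath function names, not a proposed signature for the main spec.

```xquery
xquery version "3.1";
declare namespace bin = "http://expath.org/ns/binary";

(: "caf&#xE9;" is the string "café", written with a character reference :)
let $text := "caf&#xE9;"
let $binary := bin:encode-string($text, "UTF-8")
return (
  bin:to-octets($binary),              (: 99, 97, 102, 195, 169 :)
  bin:decode-string($binary, "UTF-8")  (: round-trips to the original string :)
)
```

The five octets for a four-character string show why the encoding has to be part of the conversion: the accented character occupies two octets in UTF-8.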