Is U+2028 allowed? Yes, it has been allowed since Unicode support was introduced in GEDCOM 5.3. U+2029 (paragraph separator) is also allowed.
Why not? I don't immediately see any problems caused by supporting these characters, but I'm assuming you do? My opinion is that applications should not modify user-supplied content without the user's consent unless what was supplied violates the spec. If a user knew enough to enter U+2028 in a payload, I think an application should assume the user wanted that specific character. I have a much weaker opinion about the merits of supporting U+2028 and U+2029 in future versions of the spec. They do have a distinct semantic meaning that is not explicit in CONT, and that meaning can affect semantically aware algorithms and user interfaces, but very few applications seem to take advantage of that difference today, so we could probably prohibit them with minimal disruption if necessary.
-
My view: I don't think the GEDCOM spec should change to disallow specific Unicode characters based on what appear to be bugs in individual applications or sites; that feedback should be directed to the buggy application or site. If there is an argument that a Unicode character is generically problematic for ALL apps/sites, then it should be disallowed. So far it sounds like this issue is in the former category.
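As one illustration of how an application-level bug of this kind could arise (this is an assumption for the sake of example; the thread does not say what actually goes wrong in any particular program): Python's `str.splitlines()` treats U+2028 and U+2029 as line boundaries, so a reader that relies on it would cut a NOTE payload in two, while a reader that splits only on CR/LF keeps the payload intact. The sample record below is made up.

```python
import re

# Hypothetical GEDCOM fragment: a NOTE payload containing U+2028.
record = "1 NOTE first part\u2028second part\n1 SEX M\n"

# Naive reader: str.splitlines() also splits on U+2028 and U+2029,
# so the NOTE payload is broken into two "lines".
naive = record.splitlines()

# Stricter reader: split only on CR LF, CR, or LF, and drop the
# empty string left after the final line terminator.
strict = [line for line in re.split(r"\r\n|\r|\n", record) if line]

print(len(naive), len(strict))
```

The second line produced by the naive reader ("second part") has no level number or tag, which is the sort of thing a parser might then reject as malformed.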
-
Is the Unicode character LINE SEPARATOR (U+2028, encoded in UTF-8 as 0xE2 0x80 0xA8) allowed in a UTF-encoded GEDCOM 5.5.x file?
Geni.com can produce such files for download when the character appears in an INDI.NOTE payload.
I think this should not be allowed, but what do you think?
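For anyone who wants to check whether an exported file contains these characters, here is a minimal Python sketch. The function name `find_separators` and the sample text are hypothetical, and the sketch assumes the file has already been decoded to a string.

```python
import re

LINE_SEP = "\u2028"  # LINE SEPARATOR
PARA_SEP = "\u2029"  # PARAGRAPH SEPARATOR

def find_separators(gedcom_text):
    """Return (line_number, code_point) for each U+2028/U+2029 found."""
    hits = []
    # Split only on CR/LF so the separators stay inside the payload;
    # str.splitlines() would treat them as line breaks too.
    lines = re.split(r"\r\n|\r|\n", gedcom_text)
    for number, line in enumerate(lines, start=1):
        for ch in line:
            if ch in (LINE_SEP, PARA_SEP):
                hits.append((number, f"U+{ord(ch):04X}"))
    return hits

# Made-up sample with U+2028 inside an INDI.NOTE payload.
sample = "0 @I1@ INDI\n1 NOTE first part\u2028second part\n0 TRLR\n"
print(find_separators(sample))
print(LINE_SEP.encode("utf-8").hex(" "))
```

The last line prints the UTF-8 bytes of U+2028, which can be compared against the 0xE2 0x80 0xA8 sequence mentioned in the question.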