Better support for utf-8 #105

Open
nifuki opened this issue Aug 19, 2022 · 0 comments

If I understand correctly, this line in SmartHTML is supposed to escape HTML special characters:

# Escape text into &x1234; format ignoring a alphanumerics and a few special characters
$Text =~ s{([^\:\/\.\-\?\=\+\w\s&#%;)]|&(?!#?\w+;))}{"&#x".sprintf("%x", unpack(U,$1)).";"}ge;

But it mangles non-ASCII characters in UTF-8 text, such as "é". Consider a UTF-8-friendly alternative, for instance:

use HTML::Escape;
$Text = escape_html($Text);
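In case HTML::Escape is not available, here is a minimal sketch of the same idea using only core modules. It assumes the input arrives as a UTF-8 encoded byte string; `escape_html_minimal` is a hypothetical name for illustration, not part of DocDB or HTML::Escape. It escapes only the characters that are special in HTML and leaves "é" alone:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from DocDB or HTML::Escape): escape only the
# characters that are special in HTML and leave everything else untouched.
my %HTML_ESCAPES = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

sub escape_html_minimal {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$HTML_ESCAPES{$1}/g;
    return $text;
}

# Assumption: the input is a UTF-8 encoded byte string, e.g. from a form.
my $bytes   = "caf\xC3\xA9 <b>";             # "café <b>" as UTF-8 bytes
my $chars   = decode('UTF-8', $bytes);        # work on characters, not bytes
my $escaped = escape_html_minimal($chars);    # "é" stays a single character

# Encode back to UTF-8 bytes before sending the text to the browser.
print encode('UTF-8', $escaped), "\n";
```

With this approach the page has to be served with a UTF-8 charset, since non-ASCII characters are sent as-is rather than as numeric entities.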

Things get more complicated with this code:

sub is_valid {
    my $self = shift;
    my $EscapedHTML = encode_entities_numeric($self->value);
    $self->value($EscapedHTML);
}

When this is chained with SmartHTML, we get some kind of double encoding, and characters end up rendered literally as &x1234; in the browser. What is the purpose of encode_entities_numeric here? Is it sanitizing, similar to the regex in SmartHTML above? If so, and if is_valid is always followed by SmartHTML, then encode_entities_numeric could probably just be removed. But I'm no expert in Perl or DocDB and need advice on this.
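To make the double-encoding effect concrete, here is a small self-contained sketch. Both functions are hypothetical stand-ins written for this example (they are not the DocDB code and not HTML::Entities), and this only illustrates the general effect; the exact path through is_valid and SmartHTML may well differ:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a numeric-entity encoder: every non-ASCII
# character and every HTML special character becomes &#xHEX;.
sub to_numeric_entities {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7E\n\t]|[&<>"'])/sprintf('&#x%X;', ord($1))/ge;
    return $text;
}

# Hypothetical second pass that escapes every "&", including the ones
# that already start an entity.
sub escape_ampersands {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    return $text;
}

my $value = "caf\x{E9}";                    # "café" as a character string
my $once  = to_numeric_entities($value);    # caf&#xE9;     (browser shows "café")
my $twice = escape_ampersands($once);       # caf&amp;#xE9; (browser shows "caf&#xE9;")

print "$once\n$twice\n";
```

After the second pass the browser no longer sees an entity, only the literal text of one, which would match the symptom described above. Escaping exactly once, at output time, avoids this.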

Also, is_valid is probably a confusing name: it doesn't actually check that something is valid, but seems to modify the value instead.
