Better support for utf-8 #105

nifuki · 2022-08-19T09:15:10Z

If I understand correctly, this line in SmartHTML is supposed to escape HTML special characters:

Lines 33 to 34 in 4e5b406

    
           # Escape text into &x1234; format ignoring a alphanumerics and a few special characters 
        
           $Text =~ s{([^\:\/\.\-\?\=\+\w\s&#%;)]|&(?!#?\w+;))}{"&#x".sprintf("%x", unpack(U,$1)).";"}ge;

But it mangles non-ASCII characters in UTF-8 text, such as "é". Consider using a UTF-8-friendly alternative, for instance:

use HTML::Escape;
$Text = escape_html($Text);
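
To make the failure concrete, here is a minimal sketch that runs the quoted substitution on the same word twice: once as raw UTF-8 bytes and once after decoding to a character string. The helper name smart_escape and the sample word are my own, and I am assuming (not confirming) that SmartHTML receives undecoded bytes. On a reasonably recent Perl, the byte string should have each byte of "é" escaped separately, while the decoded string should pass through unchanged because \w then matches "é".

```perl
use strict;
use warnings;
use Encode qw(decode);

# The substitution quoted above, wrapped in a helper so it can be called twice.
# The name smart_escape is hypothetical; only the regex comes from DocDB.
sub smart_escape {
    my ($Text) = @_;
    $Text =~ s{([^\:\/\.\-\?\=\+\w\s&#%;)]|&(?!#?\w+;))}{"&#x".sprintf("%x", unpack("U",$1)).";"}ge;
    return $Text;
}

my $bytes = "caf\xC3\xA9";             # "café" as raw UTF-8 bytes
my $chars = decode('UTF-8', $bytes);   # the same text as a character string

my $from_bytes = smart_escape($bytes); # each byte of "é" is escaped on its own
my $from_chars = smart_escape($chars); # "é" matches \w and is left alone

binmode(STDOUT, ':encoding(UTF-8)');
print "bytes in: $from_bytes\n";
print "chars in: $from_chars\n";
```

If that assumption holds, decoding the input before it reaches SmartHTML may be enough to fix the symptom, independent of which escaping function is used.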

Things get more complicated with this code:

DocDB/DocDB/cgi/UntaintHTML.pm

Lines 32 to 36 in 4e5b406

    sub is_valid {
      my $self = shift;
      my $EscapedHTML = encode_entities_numeric($self->value);
      $self->value($EscapedHTML);
    }

When this is chained with SmartHTML, we get some kind of double encoding and end up with symbols rendered as &x1234; in a browser. What is the purpose of encode_entities_numeric here: is it sanitizing, similar to the regex in SmartHTML above? If so, and if is_valid is always followed by SmartHTML, then encode_entities_numeric could probably just be removed. But I'm no expert in Perl or DocDB and need advice on this.
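
As an illustration of how double encoding can arise in general, the sketch below chains two escaping passes. Both functions are hypothetical stand-ins written for this example, not the DocDB code: numeric_entities imitates what a numeric entity encoder does, and escape_all imitates an escaper that rewrites every ampersand, including those that already start an entity. I have not confirmed that this is the exact path taken in DocDB.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a numeric entity encoder: anything outside
# printable ASCII, plus the HTML specials, becomes a &#xHEX; entity.
sub numeric_entities {
    my ($text) = @_;
    $text =~ s{([^\x20-\x7E]|[<>&"'])}{sprintf("&#x%X;", ord($1))}ge;
    return $text;
}

# Hypothetical stand-in for an escaper that rewrites every ampersand,
# even one that already begins an entity.
sub escape_all {
    my ($text) = @_;
    my %map = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
               '"' => '&quot;', "'" => '&#39;');
    $text =~ s{([&<>"'])}{$map{$1}}g;
    return $text;
}

my $input = "caf\x{E9}";               # "café" as a character string
my $once  = numeric_entities($input);  # first pass: entity for "é"
my $twice = escape_all($once);         # second pass: the "&" is escaped again

print "once:  $once\n";
print "twice: $twice\n";
```

After the second pass a browser would show the entity text literally instead of "é". The quoted SmartHTML regex skips ampersands that already start an entity via &(?!#?\w+;), so a replacement that escapes unconditionally would need the earlier encoding step removed, or the input decoded first.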

Also, is_valid is probably a confusing name, as it doesn't actually check that something is valid; it seems to modify the value instead.
