-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove optional tags for mathml that immediately precede the closing </math>
tag
#128
Comments
Hi @universemaster, Thanks for creating the issue. This sounds like a request to support MathML. Is this what you mean? Please keep in mind, that this library is mainly for HTML and can be extended via the configuration to support additional tags in some optimizations. |
Hi, From here: So, I (perhaps erroneously) thought Is that not correct? |
As the name suggests html-minifier is only for html tag. There are some hardcoded lists: https://github.com/terser/html-minifier-terser/blob/master/src/htmlparser.js#L70-L81 https://github.com/terser/html-minifier-terser/blob/master/src/htmlminifier.js#L397-L418
MathML Specification contains different tags, which are probably not part of the normal HTML Specification. |
I have read w3c/mathml#375 and the code example does not make much sense to me. XML and HTML normally only allow these variants (and so do parsers in most cases, with a few very specific exceptions):
There are only very specific tags like |
Regarding this:
brotli can drastically reduce the size of xml and probably MathML (I did not test or verify if this would work with MathML). I mainly agree with the opinions in the linked issue. This would break many parsers. |
Here is the hardcoded list of html tags with optional end tags: https://github.com/terser/html-minifier-terser/blob/master/src/htmlminifier.js#L403 const optionalEndTags = new Set(['html', 'head', 'body', 'li', 'dt', 'dd', 'p', 'rb', 'rt', 'rtc', 'rp', 'optgroup', 'option', 'colgroup', 'caption', 'thead', 'tbody', 'tfoot', 'tr', 'td', 'th']); |
There are not optional tags for MathML and SVG namespaces, spec cleary say:
Ref: https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inforeign Look at:
The parser expects the end tag to follow the opening tag and they should have the same name, otherwise it is a parser error. Yes, HTML parser is working in recovery mode and will restore DOM in most of cases, but itself it is unsafe compression (and yes, there are some edge cases when DOM was broken, if you need I can provide). I agree we should compress MathML in this package (for example - we can collapse whitespaces), because it is embeded MathML in HTML (we are html minfier so embed content should be compressed too, like we do with JS/CSS) and the same with SVG element in HTML (as minimum whitespaces too). But SVG and MathML don't have omiting tag concept, it is XML in HTML with some limitation/abilities (depending on how you look at the problems 😄 ), so I don't think it is a good idea to omit end tag for SVG and MathML namespaces |
Thanks for your references for this.
With regard to the specific case from the title of this issue - closing tags that immediately precede the closing math I've been experimenting with the regex find and replace approach below on a massive 9Mb page (before compression) with 27,000 It eliminates 0.5Mb (before compression). However, I haven't yet done performance tests for parse time against network download times. const RegexToSelectClosingTagsForDeletion =
/(<\/[^>]*>)+(?=<\/(?:math)>)/g;
const target = src
.replaceAll(RegexToSelectClosingTagsForDeletion, '') But, it seems from your references that this approach will cause parse errors. I do appreciate your opinion not to add unsafe minifications to this library that causes parse errors (even if the DOM is identical).
Agree, thanks for that. (Edited post to remove mtd, mtr from code snippet. This part needs more work - this bit does silently hide parts of math tables). |
Can you provide your source code before compression? There are two unsafe cases I see (but they can have more):
Shortly - SVG and MathML is XML syntax inside HTML with recoverable logic from HTML (because HTML parser is recovarable 😄). This parser error is not fatal, so maybe it is safe for you, but note - many robots/bots/etc which parse your page and allow to show your work/site/application/information in search systems can have more strict behaviour and stop parsing context after this error. |
In mathml a sequence of closing tags that immediately precede the closing
</math>
element can be omitted. See w3c Mathml spec discussion.Since mathml is so verbose, even omitting these perhaps 2-3 closing tags per
<math>
equation will help, particularly when writing inline such asLet $g$ be an element of $G$.
I suspect the existing (html) parser may allow some further tags to be omitted, but I haven't investigated further.
Chrome will soon support mathml - with an intent-to-ship to come in the next month or so.
Resources that may be useful are the mathml core spec and to create some mathml try temml.
The text was updated successfully, but these errors were encountered: