-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent detection script injection from breaking import maps in classic themes #1084
Prevent detection script injection from breaking import maps in classic themes #1084
Conversation
fa03924
to
e7c450b
Compare
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.
To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonruter One thing I don't understand (maybe I'm not understanding your explanation fully) is why the script is now being injected via replacing a placeholder.
Instead of injecting a placeholder into wp_print_footer_scripts
and replacing it as part of output buffering, why can't we simply inject the actual script into wp_print_footer_scripts
?
@felixarntz Right, because we should only inject the detection script if we need it. If detection isn't needed, then the script should be omitted. Whether detection is needed is determined at the beginning of output buffer processing in performance/plugins/optimization-detective/optimization.php Lines 186 to 196 in 1b4485f
If the script injection were to be moved outside of the output buffer callback, it would look something like this: add_action(
'wp_print_footer_scripts',
static function () {
$slug = od_get_url_metrics_slug( od_get_normalized_query_vars() );
$post = OD_URL_Metrics_Post_Type::get_post( $slug );
$group_collection = new OD_URL_Metrics_Group_Collection(
$post ? OD_URL_Metrics_Post_Type::get_url_metrics_from_post( $post ) : array(),
od_get_breakpoint_max_widths(),
od_get_url_metrics_breakpoint_sample_size(),
od_get_url_metric_freshness_ttl()
);
if ( ! $group_collection->is_every_group_complete() ) {
echo od_get_detection_script( $slug, $group_collection );
}
}
); Nevertheless, as is, this would entail getting the same data twice. But assuming that is resolved with a class refactor to remember that state... there is another downside to doing this, and that is mentioned in this todo: performance/plugins/optimization-detective/optimization.php Lines 344 to 348 in 1b4485f
During optimization, it may be that we find that XPaths in the URL Metrics don't match against something in the document. When this happens, it really should be forcing |
Hold on. In talking with Dennis apparently there is a way to inject tags with the tag processor. |
…tag processor Co-authored-by: dmsnell <[email protected]>
5d6e760
to
69f8629
Compare
@dmsnell It seems like the tag insertion in WP 6.4 isn't working. It's resulting in a document like this: <html lang="en">
<head>
<meta charset="utf-8">
<title>...</title>
</head>
<body>
<img data-od-xpath="/*[0][self::HTML]/*[1][self::BODY]/*[0][self::IMG]" src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" loading="lazy">
<script type="module">/* import detect ... */</script>
<html lang="en">
<head>
<meta charset="utf-8">
<title>...</title>
</head>
<body>
<img data-od-xpath="/*[0][self::HTML]/*[1][self::BODY]/*[0][self::IMG]" src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" loading="lazy">
</body>
</html> When it should be resulting in the following, which it does in WP 6.5: <html lang="en">
<head>
<meta charset="utf-8">
<title>...</title>
</head>
<body>
<img data-od-xpath="/*[0][self::HTML]/*[1][self::BODY]/*[0][self::IMG]" src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" loading="lazy">
<script type="module">/* import detect ... */</script>
</body>
</html> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in WordPress 6.5 we did change the span representation and maybe that's in play here, since we're creating the private internal classes. if this code needs to work with both versions then it would likely have to detect that.
alternatively you could also try another approach that I've done, which is to trap externally the edits you want to make. that would ensure you don't have to worry about versioning.
so for example, once you get the head-end and body-end bookmarks, hold on to them until the end and then use them to perform your text operations.
now at the end you can perform your script and link injection in one fell swoop and not worry about the internal WP_HTML_Text_Replacement
class changes.
new class ( $html ) extends WP_HTML_Tag_Processor {
public function split_at_start_of_bookmarks( ...$names ) {
$splits = array();
$at = 0;
foreach ( $names as $name ) {
$mark = $this->bookmarks[ $name ];
$splits[] = substr( $this->html, $at, $mark->start );
$at = $mark->start;
}
if ( $at < strlen( $this->html ) ) {
$splits[] = substr( $this->html, $at );
}
return $splits;
}
}
$links = array();
$scripts = array();
while ( $processor->next_tag( … ) ) {
if ( $need_a_link ) {
$links[] = "<link rel=anything>";
}
…
if ( ! $found_head && 'HEAD' === $tag_name && $processor->is_tag_closer() ) {
$processor->set_bookmark( HEAD_BOOKMARK );
}
…
}
// Processing is done, inject content.
list( $head, $body, $footer ) = $processor->split_at_start_of_bookmarks( HEAD_MARK, BODY_MARK );
return $head . implode( "\n", $links ) . $body . implode( "\n", $scripts ) . $footer;
again, this is only one way of doing it to avoid trying to detect the running version of WordPress. it's probably easiest to perform that version check and adjust whether you add 0
or the start of the bookmark again (length vs. ending location).
} | ||
}; | ||
|
||
$this->processor = new $processor( $html ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when creating the anonymous class, isn't it already instantiated? I don't think we want this second new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right. Fixed in 7a6debc
$this->lexical_updates[] = new WP_HTML_Text_Replacement( | ||
$this->bookmarks[ $bookmark ]->start, | ||
0, | ||
$html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh shoot, this may be a result of the change of signature of the span/range functions from (start, end)
to (start, length)
.
on WordPress 6.4 the span would be from $this->bookmarks[ $bookmark ]->start
to $this->bookmarks[ $bookmark ]->end
I think. I wouldn't understand why a 0
value here though would duplicate the string.
also it's suspiciously close to what would happen if we weren't to shadow the $html
from line 183, but that seems impossible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Easily fixed then in 03ad8f9.
Co-authored-by: dmsnell <[email protected]>
Co-authored-by: dmsnell <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonruter I don't feel like I'm the right person to review this, would be great to get @dmsnell's approval as he's far more familiar with the internals of the HTML tag processor.
I only have one high level feedback, but overall it looks good to me.
@@ -160,7 +179,32 @@ final class OD_HTML_Tag_Processor { | |||
* @param string $html HTML to process. | |||
*/ | |||
public function __construct( string $html ) { | |||
$this->processor = new WP_HTML_Tag_Processor( $html ); | |||
|
|||
$this->processor = new class( $html ) extends WP_HTML_Tag_Processor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit fragile to me. Can we make it an actual named class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fragile how? It's what @dmsnell suggested so I follow his lead for the use of the anonymous class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've invited @dmsnell to the Performance team on GitHub for proper approval granting 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about what kind of fragility we're concerned about here. Extending WP_HTML_Tag_Processor
provides guarantees about the behavior of the underlying methods.
The anonymous class was more or less an example I tossed out as a convenient way to create or expose risky behaviors from the HTML API without exporting that to the outside world. It creates these methods but they shouldn't be instantiable or callable from outside, and won't show up in documentation. Also they are convenient if your styling rules forbid having more than one class in the same file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just questioning why we use an anonymous class here. We haven't done that anywhere, even with other classes we consider private. So it's a new pattern and I'm not sure we want to go that route, as it would lead to more such cases in the future.
As @dmsnell says, the anonymous class was just "a convenient way". Changing it to a regular class will probably take just a few minutes, but follow our patterns established so far.
I just don't see any good reason for an anonymous class, other than it was very quick to throw in here and we don't have to think about naming. We can just mark it as @access private
like we do with other classes, then it doesn't matter if someone else tries to use it - that's on them, and there's no support or guarantees for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer not 😄
I guess you feel strongly then. I'll do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @westonruter, I think this is in line with how we've been writing classes so far. FWIW there are no anonymous classes in WP core either, and since this is a feature plugin I think it makes sense to follow core's patterns too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but this was intended to anticipate WP_HTML_Processor
having such node insertion functionality. By the time this is proposed for core I was expecting the anonymous class would no longer be needed. Anyway, let's move on 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be good to consider anonymous classes at times when we have something dangerous like this that we don't want to expose. it's unfortunate we have no way of hiding things otherwise.
<link as="image" data-od-added-tag="" fetchpriority="high" href="https://example.com/foo.jpg" rel="preload" media="screen"> | ||
<link data-od-added-tag rel="preload" fetchpriority="high" as="image" href="https://example.com/foo.jpg" media="screen"> | ||
</head> | ||
<body> | ||
<img src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" fetchpriority="high" data-od-added-fetchpriority data-od-removed-loading="lazy"> | ||
<img src="https://example.com/bar.jpg" alt="Bar" width="10" height="10" loading="lazy" data-od-removed-fetchpriority="high"> | ||
<img data-od-added-fetchpriority data-od-removed-loading="lazy" fetchpriority="high" src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" > | ||
<img data-od-removed-fetchpriority="high" src="https://example.com/bar.jpg" alt="Bar" width="10" height="10" loading="lazy" > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: these changes are purely syntactic and not semantic. The equivalent DOM tree is produced. The difference is made here because no longer is DOMDocument being used for comparison in PHPUnit. Using DOMDocument normalized differences in attribute order. But with the removal of DOMDocument, the attribute order and value syntax has to match.
The same goes for the other changes below, other than the move of the script
from the head
to the footer, which is the only semantic change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any obvious issues with the HTML API use. Thanks for continuing to lean into it and explore its boundaries.
$this->lexical_updates[] = new WP_HTML_Text_Replacement( | ||
$start, | ||
// In WordPress 6.5, the signature was changed from $end to $length. | ||
version_compare( get_bloginfo( 'version' ), '6.5', '<' ) ? $start : 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this probably doesn't matter much, but there's no need to recompute this on every call. the WordPress version won't change during the runtime (at least outside of the Playground where it could), so this could be set in the constructor or as a method-level static var.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Although this will only be called twice at most, once for the head insertion and once for body. I'll add a static var.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in b3fbf5c
4d89f64
to
a715290
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, just one question to make sure we didn't miss it.
* @since 0.1.1 Renamed from OD_HTML_Tag_Processor to OD_HTML_Tag_Walker | ||
* @access private | ||
*/ | ||
final class OD_HTML_Tag_Walker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double checking as this is hard to review: I think you renamed the class that was originally wrapping WP_HTML_Tag_Processor
from OD_HTML_Tag_Processor
to this, and the new OD_HTML_Tag_Processor
is the new extending class for WP_HTML_Tag_Processor
?
That would make sense to me. I assume there are no further notable changes in this class here that haven't been reviewed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I noted that in the @since
tag and I also updated the PR description to explain this change.
Summary
Move detection script module from being injected at the end of
head
to instead happen at the end ofbody
. This prevents the script module from breaking import maps, which are printed in the footer in classic themes rather than in the head (as done in block themes). The import map script must be printed before all script modules.Fixes #1083
Relevant technical choices
Outdated technical choices
Previously the detection script module was injected just before the
</head>
closing tag. This broke import maps in classic themes, as an import map script must be added before any script modules, and classic themes output import map scripts in the footer. So instead of injecting the detection script at the end of thehead
, this PR changes it so that they get injected at the point wherewp_print_footer_scripts
happens. This injection happens by means of printing a placeholder string during template rendering and then this placeholder is replaced with the script module during output buffer processing. The placeholder is randomized to prevent accidental replacement of content. A placeholder is used instead of injecting before</body>
since it may be that this exists inside of a comment, like<!--</body>-->
in which case the replacement would be invalid.Note that the injection of the preload links continues to happen just before
</head>
. The reason that a similar placeholder string is not used is that when a string appears in thehead
, the DOM specifies that this should cause thehead
to immediately close and implicitly open thebody
. In case there is anyDOMDocument
processing being done on the page in addition to our output buffer, the presence of a string placeholder would break the DOM.Additionally, a mustache-ish tag is used for the placeholder instead of an HTML comment because other plugins may try to optimize the page by removing all HTML comments to reduce page weight.
Ultimately, the injection of the preload links in the
head
and the detection script in thebody
should leverage the HTML API (viaWP_HTML_Processor
) once it is updated to support tag node insertions, which it does not support yet.Previously the detection script module was injected just before the
</head>
closing tag (using search/replace). This broke import maps in classic themes, as an import map script must be added before any script modules, and classic themes output import map scripts in the footer. Since it was using search/replace, it was also vulnerable to injecting the script in the wrong place, for example if the HTML contained<!--</head>-->
. Now the HTML Tag Processor is extended (thanks to @dmsnell) to inject the script tag right before the</body>
closing tag (and not confused by<!--</body>-->
). The HTML Tag Processor is also used to inject any preload links at the end of the</head>
. Note also that what was known asOD_HTML_Tag_Processor
has been renamed toOD_HTML_Tag_Walker
, and a new classOD_HTML_Tag_Processor
is introduced which directly extendsWP_HTML_Tag_Processor
. TheOD_HTML_Tag_Walker
class (and the oldOD_HTML_Tag_Processor
class) did not extendWP_HTML_Tag_Processor
but used it by composition. The newOD_HTML_Tag_Processor
includes a method for injecting HTML in thehead
andbody
.Finally, this PR removes the use of
ext-dom
to do equality assertions in PHPUnit. This was overkill and it made it difficult to debug when the assertions started failing.This also fixes (46ca5f8) a trivial bug where an extra slash was present in the path to the web-vitals.js library (e.g.
optimization-detective//build/web-vitals.js
).