Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent detection script injection from breaking import maps in classic themes #1084

Merged
merged 24 commits into from
Apr 11, 2024

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Mar 23, 2024

Summary

Move detection script module from being injected at the end of head to instead happen at the end of body. This prevents the script module from breaking import maps, which are printed in the footer in classic themes rather than in the head (as done in block themes). The import map script must be printed before all script modules.

Fixes #1083

Relevant technical choices

Outdated technical choices

Previously the detection script module was injected just before the </head> closing tag. This broke import maps in classic themes, as an import map script must be added before any script modules, and classic themes output import map scripts in the footer. So instead of injecting the detection script at the end of the head, this PR changes it so that they get injected at the point where wp_print_footer_scripts happens. This injection happens by means of printing a placeholder string during template rendering and then this placeholder is replaced with the script module during output buffer processing. The placeholder is randomized to prevent accidental replacement of content. A placeholder is used instead of injecting before </body> since it may be that this exists inside of a comment, like <!--</body>--> in which case the replacement would be invalid.

Note that the injection of the preload links continues to happen just before </head>. The reason that a similar placeholder string is not used is that when a string appears in the head, the DOM specifies that this should cause the head to immediately close and implicitly open the body. In case there is any DOMDocument processing being done on the page in addition to our output buffer, the presence of a string placeholder would break the DOM.

Additionally, a mustache-ish tag is used for the placeholder instead of an HTML comment because other plugins may try to optimize the page by removing all HTML comments to reduce page weight.

Ultimately, the injection of the preload links in the head and the detection script in the body should leverage the HTML API (via WP_HTML_Processor) once it is updated to support tag node insertions, which it does not support yet.

Previously the detection script module was injected just before the </head> closing tag (using search/replace). This broke import maps in classic themes, as an import map script must be added before any script modules, and classic themes output import map scripts in the footer. Since it was using search/replace, it was also vulnerable to injecting the script in the wrong place, for example if the HTML contained <!--</head>-->. Now the HTML Tag Processor is extended (thanks to @dmsnell) to inject the script tag right before the </body> closing tag (and not confused by <!--</body>-->). The HTML Tag Processor is also used to inject any preload links at the end of the </head>. Note also that what was known as OD_HTML_Tag_Processor has been renamed to OD_HTML_Tag_Walker, and a new class OD_HTML_Tag_Processor is introduced which directly extends WP_HTML_Tag_Processor. The OD_HTML_Tag_Walker class (and the old OD_HTML_Tag_Processor class) did not extend WP_HTML_Tag_Processor but used it by composition. The new OD_HTML_Tag_Processor includes a method for injecting HTML in the head and body.

Finally, this PR removes the use of ext-dom to do equality assertions in PHPUnit. This was overkill and it made it difficult to debug when the assertions started failing.


This also fixes (46ca5f8) a trivial bug where an extra slash was present in the path to the web-vitals.js library (e.g. optimization-detective//build/web-vitals.js).

@westonruter westonruter added [Type] Bug An existing feature is broken [Plugin] Optimization Detective Issues for the Optimization Detective plugin labels Mar 23, 2024
@westonruter westonruter force-pushed the fix/optimization-detective-import-maps branch from fa03924 to e7c450b Compare April 4, 2024 16:04
@westonruter westonruter marked this pull request as ready for review April 4, 2024 20:33
@westonruter westonruter requested a review from felixarntz as a code owner April 4, 2024 20:33
Copy link

github-actions bot commented Apr 4, 2024

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: westonruter <[email protected]>
Co-authored-by: felixarntz <[email protected]>
Co-authored-by: dmsnell <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter One thing I don't understand (maybe I'm not understanding your explanation fully) is why the script is now being injected via replacing a placeholder.

Instead of injecting a placeholder into wp_print_footer_scripts and replacing it as part of output buffering, why can't we simply inject the actual script into wp_print_footer_scripts?

@westonruter
Copy link
Member Author

@felixarntz Right, because we should only inject the detection script if we need it. If detection isn't needed, then the script should be omitted. Whether detection is needed is determined at the beginning of output buffer processing in od_optimize_template_output_buffer():

$post = OD_URL_Metrics_Post_Type::get_post( $slug );
$group_collection = new OD_URL_Metrics_Group_Collection(
$post ? OD_URL_Metrics_Post_Type::get_url_metrics_from_post( $post ) : array(),
od_get_breakpoint_max_widths(),
od_get_url_metrics_breakpoint_sample_size(),
od_get_url_metric_freshness_ttl()
);
// Whether we need to add the data-od-xpath attribute to elements and whether the detection script should be injected.
$needs_detection = ! $group_collection->is_every_group_complete();

If the script injection were to be moved outside of the output buffer callback, it would look something like this:

add_action(
	'wp_print_footer_scripts',
	static function () {
		$slug = od_get_url_metrics_slug( od_get_normalized_query_vars() );
		$post = OD_URL_Metrics_Post_Type::get_post( $slug );

		$group_collection = new OD_URL_Metrics_Group_Collection(
			$post ? OD_URL_Metrics_Post_Type::get_url_metrics_from_post( $post ) : array(),
			od_get_breakpoint_max_widths(),
			od_get_url_metrics_breakpoint_sample_size(),
			od_get_url_metric_freshness_ttl()
		);
		
		if ( ! $group_collection->is_every_group_complete() ) {
			echo od_get_detection_script( $slug, $group_collection );
		}
	}
);

Nevertheless, as is, this would entail getting the same data twice. But assuming that is resolved with a class refactor to remember that state... there is another downside to doing this, and that is mentioned in this todo:

// Inject detection script.
// TODO: When optimizing above, if we find that there is a stored LCP element but it fails to match, it should perhaps set $needs_detection to true and send the request with an override nonce. However, this would require backtracking and adding the data-od-xpath attributes.
if ( $needs_detection ) {
$head_injection .= od_get_detection_script( $slug, $group_collection );
}

During optimization, it may be that we find that XPaths in the URL Metrics don't match against something in the document. When this happens, it really should be forcing $needs_detection to be true so that we can start collecting data. More would be involved there, as we'd have to have a way to make sure the endpoint allows receiving a new URL Metric even though the groups are complete, but that was in part why I injected the detection script as part of the output buffer processing. Also, the preload links can only be injected during output buffer processing as well.

@westonruter
Copy link
Member Author

Hold on. In talking with Dennis apparently there is a way to inject tags with the tag processor.

@westonruter westonruter marked this pull request as draft April 4, 2024 21:12
@westonruter westonruter force-pushed the fix/optimization-detective-import-maps branch from 5d6e760 to 69f8629 Compare April 4, 2024 21:48
@westonruter westonruter changed the base branch from trunk to release/3.0.0 April 4, 2024 23:52
@westonruter
Copy link
Member Author

westonruter commented Apr 5, 2024

@dmsnell It seems like the tag insertion in WP 6.4 isn't working. It's resulting in a document like this:

<html lang="en">
        <head>
                <meta charset="utf-8">
                <title>...</title>
        </head>
        <body>
                <img data-od-xpath="/*[0][self::HTML]/*[1][self::BODY]/*[0][self::IMG]" src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" loading="lazy">
        <script type="module">/* import detect ... */</script>

<html lang="en">
        <head>
                <meta charset="utf-8">
                <title>...</title>
        </head>
        <body>
                <img data-od-xpath="/*[0][self::HTML]/*[1][self::BODY]/*[0][self::IMG]" src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" loading="lazy">
        </body>
</html>

When it should be resulting in the following, which it does in WP 6.5:

<html lang="en">
    <head>
        <meta charset="utf-8">
        <title>...</title>
    </head>
    <body>
        <img data-od-xpath="/*[0][self::HTML]/*[1][self::BODY]/*[0][self::IMG]" src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" loading="lazy">
        <script type="module">/* import detect ... */</script>
    </body>
</html>

Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in WordPress 6.5 we did change the span representation and maybe that's in play here, since we're creating the private internal classes. if this code needs to work with both versions then it would likely have to detect that.

alternatively you could also try another approach that I've done, which is to trap externally the edits you want to make. that would ensure you don't have to worry about versioning.

so for example, once you get the head-end and body-end bookmarks, hold on to them until the end and then use them to perform your text operations.

now at the end you can perform your script and link injection in one fell swoop and not worry about the internal WP_HTML_Text_Replacement class changes.

new class ( $html ) extends WP_HTML_Tag_Processor {
	public function split_at_start_of_bookmarks( ...$names ) {
		$splits = array();
		$at     = 0;

		foreach ( $names as $name ) {
			$mark     = $this->bookmarks[ $name ];
			$splits[] = substr( $this->html, $at, $mark->start );
			$at       = $mark->start;
		}

		if ( $at < strlen( $this->html ) ) {
			$splits[] = substr( $this->html, $at );
		}

		return $splits;
	}
}

$links    = array();
$scripts  = array();

while ( $processor->next_tag( … ) ) {
	if ( $need_a_link ) {
		$links[] = "<link rel=anything>";
	}

	…

	if ( ! $found_head && 'HEAD' === $tag_name && $processor->is_tag_closer() ) {
		$processor->set_bookmark( HEAD_BOOKMARK );
	}

	…
}

// Processing is done, inject content.
list( $head, $body, $footer ) = $processor->split_at_start_of_bookmarks( HEAD_MARK, BODY_MARK );
return $head . implode( "\n", $links ) . $body . implode( "\n", $scripts ) . $footer;

again, this is only one way of doing it to avoid trying to detect the running version of WordPress. it's probably easiest to perform that version check and adjust whether you add 0 or the start of the bookmark again (length vs. ending location).

}
};

$this->processor = new $processor( $html );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when creating the anonymous class, isn't it already instantiated? I don't think we want this second new

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right. Fixed in 7a6debc

$this->lexical_updates[] = new WP_HTML_Text_Replacement(
$this->bookmarks[ $bookmark ]->start,
0,
$html
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh shoot, this may be a result of the change of signature of the span/range functions from (start, end) to (start, length).

on WordPress 6.4 the span would be from $this->bookmarks[ $bookmark ]->start to $this->bookmarks[ $bookmark ]->end I think. I wouldn't understand why a 0 value here though would duplicate the string.

also it's suspiciously close to what would happen if we weren't to shadow the $html from line 183, but that seems impossible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Easily fixed then in 03ad8f9.

@westonruter westonruter marked this pull request as ready for review April 5, 2024 17:50
Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter I don't feel like I'm the right person to review this, would be great to get @dmsnell's approval as he's far more familiar with the internals of the HTML tag processor.

I only have one high level feedback, but overall it looks good to me.

@@ -160,7 +179,32 @@ final class OD_HTML_Tag_Processor {
* @param string $html HTML to process.
*/
public function __construct( string $html ) {
$this->processor = new WP_HTML_Tag_Processor( $html );

$this->processor = new class( $html ) extends WP_HTML_Tag_Processor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit fragile to me. Can we make it an actual named class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fragile how? It's what @dmsnell suggested so I follow his lead for the use of the anonymous class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've invited @dmsnell to the Performance team on GitHub for proper approval granting 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about what kind of fragility we're concerned about here. Extending WP_HTML_Tag_Processor provides guarantees about the behavior of the underlying methods.

The anonymous class was more or less an example I tossed out as a convenient way to create or expose risky behaviors from the HTML API without exporting that to the outside world. It creates these methods but they shouldn't be instantiable or callable from outside, and won't show up in documentation. Also they are convenient if your styling rules forbid having more than one class in the same file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just questioning why we use an anonymous class here. We haven't done that anywhere, even with other classes we consider private. So it's a new pattern and I'm not sure we want to go that route, as it would lead to more such cases in the future.

As @dmsnell says, the anonymous class was just "a convenient way". Changing it to a regular class will probably take just a few minutes, but follow our patterns established so far.

I just don't see any good reason for an anonymous class, other than it was very quick to throw in here and we don't have to think about naming. We can just mark it as @access private like we do with other classes, then it doesn't matter if someone else tries to use it - that's on them, and there's no support or guarantees for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not 😄

I guess you feel strongly then. I'll do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @westonruter, I think this is in line with how we've been writing classes so far. FWIW there are no anonymous classes in WP core either, and since this is a feature plugin I think it makes sense to follow core's patterns too.

Copy link
Member Author

@westonruter westonruter Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but this was intended to anticipate WP_HTML_Processor having such node insertion functionality. By the time this is proposed for core I was expecting the anonymous class would no longer be needed. Anyway, let's move on 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be good to consider anonymous classes at times when we have something dangerous like this that we don't want to expose. it's unfortunate we have no way of hiding things otherwise.

Comment on lines -472 to +476
<link as="image" data-od-added-tag="" fetchpriority="high" href="https://example.com/foo.jpg" rel="preload" media="screen">
<link data-od-added-tag rel="preload" fetchpriority="high" as="image" href="https://example.com/foo.jpg" media="screen">
</head>
<body>
<img src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" fetchpriority="high" data-od-added-fetchpriority data-od-removed-loading="lazy">
<img src="https://example.com/bar.jpg" alt="Bar" width="10" height="10" loading="lazy" data-od-removed-fetchpriority="high">
<img data-od-added-fetchpriority data-od-removed-loading="lazy" fetchpriority="high" src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" >
<img data-od-removed-fetchpriority="high" src="https://example.com/bar.jpg" alt="Bar" width="10" height="10" loading="lazy" >
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: these changes are purely syntactic and not semantic. The equivalent DOM tree is produced. The difference is made here because no longer is DOMDocument being used for comparison in PHPUnit. Using DOMDocument normalized differences in attribute order. But with the removal of DOMDocument, the attribute order and value syntax has to match.

The same goes for the other changes below, other than the move of the script from the head to the footer, which is the only semantic change.

Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any obvious issues with the HTML API use. Thanks for continuing to lean into it and explore its boundaries.

$this->lexical_updates[] = new WP_HTML_Text_Replacement(
$start,
// In WordPress 6.5, the signature was changed from $end to $length.
version_compare( get_bloginfo( 'version' ), '6.5', '<' ) ? $start : 0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably doesn't matter much, but there's no need to recompute this on every call. the WordPress version won't change during the runtime (at least outside of the Playground where it could), so this could be set in the constructor or as a method-level static var.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Although this will only be called twice at most, once for the head insertion and once for body. I'll add a static var.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b3fbf5c

@westonruter westonruter force-pushed the fix/optimization-detective-import-maps branch from 4d89f64 to a715290 Compare April 11, 2024 00:09
@westonruter westonruter requested a review from felixarntz April 11, 2024 00:17
Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, just one question to make sure we didn't miss it.

* @since 0.1.1 Renamed from OD_HTML_Tag_Processor to OD_HTML_Tag_Walker
* @access private
*/
final class OD_HTML_Tag_Walker {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checking as this is hard to review: I think you renamed the class that was originally wrapping WP_HTML_Tag_Processor from OD_HTML_Tag_Processor to this, and the new OD_HTML_Tag_Processor is the new extending class for WP_HTML_Tag_Processor?

That would make sense to me. I assume there are no further notable changes in this class here that haven't been reviewed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I noted that in the @since tag and I also updated the PR description to explain this change.

@westonruter westonruter merged commit fc2f45f into release/3.0.0 Apr 11, 2024
33 checks passed
@westonruter westonruter deleted the fix/optimization-detective-import-maps branch April 11, 2024 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Plugin] Optimization Detective Issues for the Optimization Detective plugin [Type] Bug An existing feature is broken
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Detection script injected by Optimization Detective breaks import maps for classic themes
3 participants