Prevent detection script injection from breaking import maps in classic themes #1084

westonruter · 2024-03-23T05:08:56Z

Summary

Move the injection of the detection script module from the end of the head to the end of the body. This prevents the script module from breaking import maps, which are printed in the footer in classic themes rather than in the head (as done in block themes). The import map script must be printed before all script modules.

Fixes #1083

Relevant technical choices

Outdated technical choices

Previously the detection script module was injected just before the </head> closing tag. This broke import maps in classic themes, as an import map script must be added before any script modules, and classic themes output import map scripts in the footer. So instead of injecting the detection script at the end of the head, this PR changes it so that it gets injected at the point where wp_print_footer_scripts happens. This injection happens by printing a placeholder string during template rendering; the placeholder is then replaced with the script module during output buffer processing. The placeholder is randomized to prevent accidental replacement of content. A placeholder is used instead of injecting before </body> since that string may exist inside of a comment, like <!-- </body> -->, in which case the replacement would be invalid.
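To illustrate the placeholder approach described above, here is a minimal sketch in plain PHP. The function names and the placeholder format are hypothetical and are not taken from the plugin; the sketch only shows the idea of printing a randomized mustache-style token during rendering and swapping it out during output buffer processing.

```php
<?php
// Hypothetical helper names; this is not the plugin's actual code.

/** Builds a mustache-style placeholder with a random suffix to avoid colliding with page content. */
function example_make_placeholder(): string {
	return '{{{od_detection_script_' . bin2hex( random_bytes( 8 ) ) . '}}}';
}

/** Replaces the placeholder with the script, or removes it when detection is not needed. */
function example_fill_placeholder( string $buffer, string $placeholder, bool $needs_detection, string $script ): string {
	return str_replace( $placeholder, $needs_detection ? $script : '', $buffer );
}

$placeholder = example_make_placeholder();

// During template rendering, the placeholder is printed where the footer scripts go.
$buffer = '<body><p>Hello</p>' . $placeholder . '</body>';

// During output buffer processing, the placeholder is replaced or removed.
$script       = '<script type="module">/* detect */</script>';
$with_script  = example_fill_placeholder( $buffer, $placeholder, true, $script );
$without_script = example_fill_placeholder( $buffer, $placeholder, false, $script );
```

Because the decision is deferred to output buffer processing, the same rendered template can end up with or without the script.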

Note that the injection of the preload links continues to happen just before </head>. The reason that a similar placeholder string is not used is that when a string appears in the head, the DOM specifies that this should cause the head to immediately close and implicitly open the body. In case there is any DOMDocument processing being done on the page in addition to our output buffer, the presence of a string placeholder would break the DOM.

Additionally, a mustache-ish tag is used for the placeholder instead of an HTML comment because other plugins may try to optimize the page by removing all HTML comments to reduce page weight.

Ultimately, the injection of the preload links in the head and the detection script in the body should leverage the HTML API (via WP_HTML_Processor) once it is updated to support tag node insertions, which it does not support yet.

Previously the detection script module was injected just before the </head> closing tag (using search/replace). This broke import maps in classic themes, as an import map script must be added before any script modules, and classic themes output import map scripts in the footer. Since it was using search/replace, it was also vulnerable to injecting the script in the wrong place, for example if the HTML contained <!-- </head> -->. Now the HTML Tag Processor is extended (thanks to @dmsnell) to inject the script tag right before the </body> closing tag (and it is not confused by <!-- </body> -->). The HTML Tag Processor is also used to inject any preload links right before the </head> closing tag. Note also that what was known as OD_HTML_Tag_Processor has been renamed to OD_HTML_Tag_Walker, and a new class OD_HTML_Tag_Processor is introduced which directly extends WP_HTML_Tag_Processor. The OD_HTML_Tag_Walker class (and the old OD_HTML_Tag_Processor class) did not extend WP_HTML_Tag_Processor but used it by composition. The new OD_HTML_Tag_Processor includes a method for injecting HTML in the head and body.
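The weakness of search/replace can be reproduced with a few lines of plain PHP. This is only a sketch with made-up markup, assuming a document in which the closing tag text also appears inside an HTML comment; it is not the plugin's previous implementation.

```php
<?php
// Example markup (an assumption for illustration): the text "</head>" also occurs inside a comment.
$html = '<html><head><!-- </head> --><title>t</title></head><body></body></html>';
$link = '<link rel="preload" href="https://example.com/foo.jpg" as="image">';

// Replacing every occurrence touches the comment as well as the real closing tag.
$count        = 0;
$replaced_all = str_replace( '</head>', $link . '</head>', $html, $count );

// Replacing only the first occurrence is no better: the first match is the one inside the comment.
$pos            = strpos( $html, '</head>' );
$replaced_first = substr_replace( $html, $link . '</head>', $pos, strlen( '</head>' ) );

// True when the link ended up inside the comment, where it has no effect.
$link_in_comment = false !== strpos( $replaced_first, '<!-- <link' );
```

A tokenizer that knows where comments begin and end does not have this problem, which is the motivation for using the tag processor here.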

Finally, this PR removes the use of ext-dom to do equality assertions in PHPUnit. This was overkill and it made it difficult to debug when the assertions started failing.

This also fixes (46ca5f8) a trivial bug where an extra slash was present in the path to the web-vitals.js library (e.g. optimization-detective//build/web-vitals.js).


github-actions · 2024-04-04T20:33:20Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: westonruter <[email protected]>
Co-authored-by: felixarntz <[email protected]>
Co-authored-by: dmsnell <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

felixarntz

@westonruter One thing I don't understand (maybe I'm not understanding your explanation fully) is why the script is now being injected via replacing a placeholder.

Instead of injecting a placeholder into wp_print_footer_scripts and replacing it as part of output buffering, why can't we simply inject the actual script into wp_print_footer_scripts?

westonruter · 2024-04-04T21:02:26Z

@felixarntz Right, because we should only inject the detection script if we need it. If detection isn't needed, then the script should be omitted. Whether detection is needed is determined at the beginning of output buffer processing in od_optimize_template_output_buffer():

performance/plugins/optimization-detective/optimization.php

Lines 186 to 196 in 1b4485f

    
$post = OD_URL_Metrics_Post_Type::get_post( $slug );
$group_collection = new OD_URL_Metrics_Group_Collection(
	$post ? OD_URL_Metrics_Post_Type::get_url_metrics_from_post( $post ) : array(),
	od_get_breakpoint_max_widths(),
	od_get_url_metrics_breakpoint_sample_size(),
	od_get_url_metric_freshness_ttl()
);
// Whether we need to add the data-od-xpath attribute to elements and whether the detection script should be injected.
$needs_detection = ! $group_collection->is_every_group_complete();

If the script injection were to be moved outside of the output buffer callback, it would look something like this:

add_action(
	'wp_print_footer_scripts',
	static function () {
		$slug = od_get_url_metrics_slug( od_get_normalized_query_vars() );
		$post = OD_URL_Metrics_Post_Type::get_post( $slug );

		$group_collection = new OD_URL_Metrics_Group_Collection(
			$post ? OD_URL_Metrics_Post_Type::get_url_metrics_from_post( $post ) : array(),
			od_get_breakpoint_max_widths(),
			od_get_url_metrics_breakpoint_sample_size(),
			od_get_url_metric_freshness_ttl()
		);
		
		if ( ! $group_collection->is_every_group_complete() ) {
			echo od_get_detection_script( $slug, $group_collection );
		}
	}
);

Nevertheless, as is, this would entail getting the same data twice. But assuming that is resolved with a class refactor to remember that state... there is another downside to doing this, and that is mentioned in this todo:

performance/plugins/optimization-detective/optimization.php

Lines 344 to 348 in 1b4485f

    
// Inject detection script.
// TODO: When optimizing above, if we find that there is a stored LCP element but it fails to match, it should perhaps set $needs_detection to true and send the request with an override nonce. However, this would require backtracking and adding the data-od-xpath attributes.
if ( $needs_detection ) {
	$head_injection .= od_get_detection_script( $slug, $group_collection );
}

During optimization, it may be that we find that XPaths in the URL Metrics don't match against something in the document. When this happens, it really should be forcing $needs_detection to be true so that we can start collecting data. More would be involved there, as we'd have to have a way to make sure the endpoint allows receiving a new URL Metric even though the groups are complete, but that was in part why I injected the detection script as part of the output buffer processing. Also, the preload links can only be injected during output buffer processing as well.

westonruter · 2024-04-04T21:12:46Z

Hold on. In talking with Dennis apparently there is a way to inject tags with the tag processor.


westonruter · 2024-04-05T00:06:23Z

@dmsnell It seems like the tag insertion in WP 6.4 isn't working. It's resulting in a document like this:

<html lang="en">
        <head>
                <meta charset="utf-8">
                <title>...</title>
        </head>
        <body>
                <img data-od-xpath="/*[0][self::HTML]/*[1][self::BODY]/*[0][self::IMG]" src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" loading="lazy">
        <script type="module">/* import detect ... */</script>

<html lang="en">
        <head>
                <meta charset="utf-8">
                <title>...</title>
        </head>
        <body>
                <img data-od-xpath="/*[0][self::HTML]/*[1][self::BODY]/*[0][self::IMG]" src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" loading="lazy">
        </body>
</html>

When it should be resulting in the following, which it does in WP 6.5:

<html lang="en">
    <head>
        <meta charset="utf-8">
        <title>...</title>
    </head>
    <body>
        <img data-od-xpath="/*[0][self::HTML]/*[1][self::BODY]/*[0][self::IMG]" src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" loading="lazy">
        <script type="module">/* import detect ... */</script>
    </body>
</html>

dmsnell

in WordPress 6.5 we did change the span representation and maybe that's in play here, since we're creating the private internal classes. if this code needs to work with both versions then it would likely have to detect that.

alternatively you could also try another approach that I've done, which is to trap externally the edits you want to make. that would ensure you don't have to worry about versioning.

so for example, once you get the head-end and body-end bookmarks, hold on to them until the end and then use them to perform your text operations.

now at the end you can perform your script and link injection in one fell swoop and not worry about the internal WP_HTML_Text_Replacement class changes.

$processor = new class ( $html ) extends WP_HTML_Tag_Processor {
	// Splits the HTML at the start offset of each named bookmark, given in document order.
	public function split_at_start_of_bookmarks( ...$names ) {
		$splits = array();
		$at     = 0;

		foreach ( $names as $name ) {
			$mark     = $this->bookmarks[ $name ];
			// The third argument of substr() is a length, not an end offset.
			$splits[] = substr( $this->html, $at, $mark->start - $at );
			$at       = $mark->start;
		}

		if ( $at < strlen( $this->html ) ) {
			$splits[] = substr( $this->html, $at );
		}

		return $splits;
	}
};

$links    = array();
$scripts  = array();

while ( $processor->next_tag( … ) ) {
	if ( $need_a_link ) {
		$links[] = "<link rel=anything>";
	}

	…

	if ( ! $found_head && 'HEAD' === $tag_name && $processor->is_tag_closer() ) {
		$processor->set_bookmark( HEAD_MARK );
	}

	…
}

// Processing is done, inject content.
list( $head, $body, $footer ) = $processor->split_at_start_of_bookmarks( HEAD_MARK, BODY_MARK );
return $head . implode( "\n", $links ) . $body . implode( "\n", $scripts ) . $footer;

again, this is only one way of doing it to avoid trying to detect the running version of WordPress. it's probably easiest to perform that version check and adjust whether you add 0 or the start of the bookmark again (length vs. ending location).
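For anyone who wants to try the split-and-join idea outside WordPress, here is a small standalone sketch. It is an assumption-laden illustration: example_split_at_offsets() is a hypothetical helper, and strpos() is used only as a stand-in for the bookmark start offsets that the tag processor would provide (strpos() itself would be fooled by closing tags inside comments).

```php
<?php
/**
 * Splits $html at each byte offset. Offsets must be ascending.
 * Hypothetical helper; mirrors the bookmark-based split without needing WordPress.
 */
function example_split_at_offsets( string $html, int ...$offsets ): array {
	$splits = array();
	$at     = 0;
	foreach ( $offsets as $offset ) {
		// substr() takes a length, so subtract the current position.
		$splits[] = substr( $html, $at, $offset - $at );
		$at       = $offset;
	}
	// Always append the remainder so the caller gets count( $offsets ) + 1 pieces.
	$splits[] = (string) substr( $html, $at );
	return $splits;
}

$html = '<html><head></head><body><p>Hi</p></body></html>';

// Stand-ins for the start offsets of the head-end and body-end bookmarks.
$head_end = strpos( $html, '</head>' );
$body_end = strpos( $html, '</body>' );

list( $head, $body, $footer ) = example_split_at_offsets( $html, $head_end, $body_end );

$result = $head . '<link rel="preload">' . $body . '<script type="module"></script>' . $footer;
```

Since only the start offsets are read, this approach does not depend on whether a span is stored as (start, end) or (start, length).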

dmsnell · 2024-04-05T07:46:59Z

plugins/optimization-detective/class-od-html-tag-processor.php

+			}
+		};
+
+		$this->processor = new $processor( $html );


when creating the anonymous class, isn't it already instantiated? I don't think we want this second new

Oh, right. Fixed in 7a6debc

dmsnell · 2024-04-05T07:50:44Z

plugins/optimization-detective/class-od-html-tag-processor.php

+				$this->lexical_updates[] = new WP_HTML_Text_Replacement(
+					$this->bookmarks[ $bookmark ]->start,
+					0,
+					$html


oh shoot, this may be a result of the change of signature of the span/range functions from (start, end) to (start, length).

on WordPress 6.4 the span would be from $this->bookmarks[ $bookmark ]->start to $this->bookmarks[ $bookmark ]->end I think. I wouldn't understand why a 0 value here though would duplicate the string.

also it's suspiciously close to what would happen if we weren't to shadow the $html from line 183, but that seems impossible.

Great! Easily fixed then in 03ad8f9.


felixarntz

@westonruter I don't feel like I'm the right person to review this, would be great to get @dmsnell's approval as he's far more familiar with the internals of the HTML tag processor.

I only have one high level feedback, but overall it looks good to me.

felixarntz · 2024-04-09T21:01:09Z

plugins/optimization-detective/class-od-html-tag-processor.php

@@ -160,7 +179,32 @@ final class OD_HTML_Tag_Processor {
 	 * @param string $html HTML to process.
 	 */
 	public function __construct( string $html ) {
-		$this->processor = new WP_HTML_Tag_Processor( $html );
+
+		$this->processor = new class( $html ) extends WP_HTML_Tag_Processor {


This seems a bit fragile to me. Can we make it an actual named class?

Fragile how? It's what @dmsnell suggested so I follow his lead for the use of the anonymous class.

I've invited @dmsnell to the Performance team on GitHub for proper approval granting 😄

I'm not sure about what kind of fragility we're concerned about here. Extending WP_HTML_Tag_Processor provides guarantees about the behavior of the underlying methods.

The anonymous class was more or less an example I tossed out as a convenient way to create or expose risky behaviors from the HTML API without exporting that to the outside world. It creates these methods but they shouldn't be instantiable or callable from outside, and won't show up in documentation. Also they are convenient if your styling rules forbid having more than one class in the same file.

I'm just questioning why we use an anonymous class here. We haven't done that anywhere, even with other classes we consider private. So it's a new pattern and I'm not sure we want to go that route, as it would lead to more such cases in the future.

As @dmsnell says, the anonymous class was just "a convenient way". Changing it to a regular class will probably take just a few minutes, but follow our patterns established so far.

I just don't see any good reason for an anonymous class, other than it was very quick to throw in here and we don't have to think about naming. We can just mark it as @access private like we do with other classes, then it doesn't matter if someone else tries to use it - that's on them, and there's no support or guarantees for it.

I'd prefer not 😄

I guess you feel strongly then. I'll do it.

OK, this is done.

Thanks @westonruter, I think this is in line with how we've been writing classes so far. FWIW there are no anonymous classes in WP core either, and since this is a feature plugin I think it makes sense to follow core's patterns too.

Right, but this was intended to anticipate WP_HTML_Processor having such node insertion functionality. By the time this is proposed for core I was expecting the anonymous class would no longer be needed. Anyway, let's move on 😄

might be good to consider anonymous classes at times when we have something dangerous like this that we don't want to expose. it's unfortunate we have no way of hiding things otherwise.

westonruter · 2024-04-09T22:55:28Z

tests/plugins/optimization-detective/optimization-tests.php

-							<link as="image" data-od-added-tag="" fetchpriority="high" href="https://example.com/foo.jpg" rel="preload" media="screen">
+							<link data-od-added-tag rel="preload" fetchpriority="high" as="image" href="https://example.com/foo.jpg" media="screen">
 						</head>
 						<body>
-							<img src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" fetchpriority="high" data-od-added-fetchpriority data-od-removed-loading="lazy">
-							<img src="https://example.com/bar.jpg" alt="Bar" width="10" height="10" loading="lazy" data-od-removed-fetchpriority="high">
+							<img data-od-added-fetchpriority data-od-removed-loading="lazy" fetchpriority="high" src="https://example.com/foo.jpg" alt="Foo" width="1200" height="800" >
+							<img data-od-removed-fetchpriority="high" src="https://example.com/bar.jpg" alt="Bar" width="10" height="10" loading="lazy" >


Note: these changes are purely syntactic and not semantic. The equivalent DOM tree is produced. The expected markup differs here because DOMDocument is no longer being used for comparison in PHPUnit. Using DOMDocument normalized away differences in attribute order. With DOMDocument removed, the attribute order and value syntax have to match exactly.

The same goes for the other changes below, other than the move of the script from the head to the footer, which is the only semantic change.

dmsnell

I don't see any obvious issues with the HTML API use. Thanks for continuing to lean into it and explore its boundaries.

dmsnell · 2024-04-10T08:25:20Z

plugins/optimization-detective/class-od-html-tag-processor.php

+				$this->lexical_updates[] = new WP_HTML_Text_Replacement(
+					$start,
+					// In WordPress 6.5, the signature was changed from $end to $length.
+					version_compare( get_bloginfo( 'version' ), '6.5', '<' ) ? $start : 0,


this probably doesn't matter much, but there's no need to recompute this on every call. the WordPress version won't change during the runtime (at least outside of the Playground where it could), so this could be set in the constructor or as a method-level static var.

Good point. Although this will only be called twice at most, once for the head insertion and once for body. I'll add a static var.

Done in b3fbf5c
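A minimal sketch of the method-level static var idea, written so that it runs outside WordPress. The function names are hypothetical, and example_get_wp_version() is an assumed stand-in for get_bloginfo( 'version' ); the actual change is in the commit referenced above.

```php
<?php
/**
 * Hypothetical stand-in for get_bloginfo( 'version' ) so the sketch runs without WordPress.
 */
function example_get_wp_version(): string {
	return '6.4.3';
}

/**
 * Returns the second constructor argument for a zero-width text replacement at $start.
 * Before WordPress 6.5 that argument is an end offset (equal to $start for an insertion);
 * from 6.5 onward it is a length (0 for an insertion).
 */
function example_insertion_second_arg( int $start ): int {
	static $is_pre_65 = null; // Computed on the first call and reused afterwards.
	if ( null === $is_pre_65 ) {
		$is_pre_65 = version_compare( example_get_wp_version(), '6.5', '<' );
	}
	return $is_pre_65 ? $start : 0;
}

$first  = example_insertion_second_arg( 42 );
$second = example_insertion_second_arg( 7 );
```

With the stubbed version string above, both calls take the pre-6.5 branch and return the start offset unchanged.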


felixarntz

This LGTM, just one question to make sure we didn't miss it.

felixarntz · 2024-04-11T00:48:33Z

plugins/optimization-detective/class-od-html-tag-walker.php

+ * @since 0.1.1 Renamed from OD_HTML_Tag_Processor to OD_HTML_Tag_Walker
+ * @access private
+ */
+final class OD_HTML_Tag_Walker {


Just double checking as this is hard to review: I think you renamed the class that was originally wrapping WP_HTML_Tag_Processor from OD_HTML_Tag_Processor to this, and the new OD_HTML_Tag_Processor is the new extending class for WP_HTML_Tag_Processor?

That would make sense to me. I assume there are no further notable changes in this class here that haven't been reviewed?

Yes, I noted that in the @since tag and I also updated the PR description to explain this change.

westonruter added [Type] Bug An existing feature is broken [Plugin] Optimization Detective Issues for the Optimization Detective plugin labels Mar 23, 2024

westonruter added this to the optimization-detective n.e.x.t milestone Mar 23, 2024


Prevent detection script injection from breaking import maps

e7c450b

westonruter force-pushed the fix/optimization-detective-import-maps branch from fa03924 to e7c450b Compare April 4, 2024 16:04

westonruter added 6 commits April 4, 2024 09:06

Use mustache-style placeholder instead of comment

d17aac8

Use randomized placeholder

537bc19

Remove script placeholder when detection script not needed

402e3be

Update tests

Add missing since and access tags

7e45933

Locate the get/print functions together

ec2a7c9

Add tests

b847139

westonruter marked this pull request as ready for review April 4, 2024 20:33

westonruter requested a review from felixarntz as a code owner April 4, 2024 20:33

felixarntz reviewed Apr 4, 2024


westonruter marked this pull request as draft April 4, 2024 21:12

Inject preload links in HEAD and detection script module in BODY via …

69f8629

…tag processor Co-authored-by: dmsnell <[email protected]>

westonruter force-pushed the fix/optimization-detective-import-maps branch from 5d6e760 to 69f8629 Compare April 4, 2024 21:48

westonruter added 6 commits April 4, 2024 15:30

Remove phpstan-ignore-next-line

95b1961

Add closing tags in HTML comments to demonstrate successful injection

11276c6

Use internal libxml errors in tests

2e247e6

Move ext-dom to require-dev

cebb35d

Run composer update

99b4d08

Eliminate use of DOMDocument for PHPUnit equality assertions

7e4cae7

westonruter changed the base branch from trunk to release/3.0.0 April 4, 2024 23:52

Update readme

6b11f45

dmsnell reviewed Apr 5, 2024


westonruter and others added 2 commits April 5, 2024 10:37

Remove redundant anonymous class instantiation

7a6debc

Co-authored-by: dmsnell <[email protected]>

Fix back-compat for WP_HTML_Text_Replacement

03ad8f9

Co-authored-by: dmsnell <[email protected]>

westonruter marked this pull request as ready for review April 5, 2024 17:50

westonruter added 2 commits April 5, 2024 12:54

Merge branch 'release/3.0.0' into fix/optimization-detective-import-maps

6eaf68e

Remove extra slash in path to web-vitals.js

46ca5f8

westonruter requested review from felixarntz and adamsilverstein April 8, 2024 17:40

felixarntz reviewed Apr 9, 2024



dmsnell approved these changes Apr 10, 2024


westonruter and others added 3 commits April 10, 2024 10:11

Use static var to prevent recomputing version needlessly

b3fbf5c

Co-authored-by: dmsnell <[email protected]>

Rename OD_HTML_Tag_Processor to OD_HTML_Tag_Walker

3aefe25

Split anonymous class into OD_HTML_Tag_Processor

a715290

westonruter force-pushed the fix/optimization-detective-import-maps branch from 4d89f64 to a715290 Compare April 11, 2024 00:09

westonruter added 2 commits April 10, 2024 17:11

Refine phpdoc

75a048e

Merge branch 'release/3.0.0' of https://github.com/WordPress/performance

6c4580e

into fix/optimization-detective-import-maps

westonruter requested a review from felixarntz April 11, 2024 00:17

felixarntz approved these changes Apr 11, 2024


westonruter merged commit fc2f45f into release/3.0.0 Apr 11, 2024
33 checks passed

westonruter deleted the fix/optimization-detective-import-maps branch April 11, 2024 01:16
