\Dom\HTMLDocument querySelector attribute name is case sensitive in HTML #17802

Closed
momala454 opened this issue Feb 14, 2025 · 4 comments

@momala454

Description

I don't see any mention of this in the various DOM RFCs (the documentation is incomplete), but querySelector/querySelectorAll are case-sensitive in PHP, while in browsers such as Firefox they are case-insensitive.

The following code:

<?php

$text = <<<TEXT
 <html>
<head>
<meta charset="Windows-1252">
</head>
<body>
</body>
</html>
TEXT;

$dom = \Dom\HTMLDocument::createFromString($text, options: LIBXML_NOERROR);
var_dump($dom->querySelector('meta[charseT]'));

Resulted in this output:

NULL

But I expected this output instead:

object(Dom\HTMLElement)#2 (29) {
  ["namespaceURI"]=>
  string(28) "http://www.w3.org/1999/xhtml"
  ["prefix"]=>
  NULL
  ["localName"]=>
  string(4) "meta"
  ["tagName"]=>
  string(4) "META"
  ["id"]=>
  string(0) ""
  ["className"]=>
  string(0) ""
  ["classList"]=>
  string(22) "(object value omitted)"
  ["attributes"]=>
  string(22) "(object value omitted)"
  ["firstElementChild"]=>
  NULL
  ["lastElementChild"]=>
  NULL
  ["childElementCount"]=>
  int(0)
  ["previousElementSibling"]=>
  NULL
  ["nextElementSibling"]=>
  NULL
  ["innerHTML"]=>
  string(0) ""
  ["substitutedNodeValue"]=>
  string(0) ""
  ["nodeType"]=>
  int(1)
  ["nodeName"]=>
  string(4) "META"
  ["baseURI"]=>
  string(11) "about:blank"
  ["isConnected"]=>
  bool(true)
  ["ownerDocument"]=>
  string(22) "(object value omitted)"
  ["parentNode"]=>
  string(22) "(object value omitted)"
  ["parentElement"]=>
  string(22) "(object value omitted)"
  ["childNodes"]=>
  string(22) "(object value omitted)"
  ["firstChild"]=>
  NULL
  ["lastChild"]=>
  NULL
  ["previousSibling"]=>
  string(22) "(object value omitted)"
  ["nextSibling"]=>
  string(22) "(object value omitted)"
  ["nodeValue"]=>
  NULL
  ["textContent"]=>
  string(0) ""
}

PHP Version

8.4.4

Operating System

No response

@nielsdos
Member

nielsdos commented Feb 14, 2025

Good find. It's of course more complex in the spec than one would hope...

Relevant part of spec: https://html.spec.whatwg.org/#case-sensitivity-of-selectors

Basically, for HTML elements the attribute name in the CSS selector must be converted to lowercase and then compared case-sensitively to the attribute name on the element...

Also likely an issue in upstream Lexbor as the CSS selector code was adapted from there...
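A minimal sketch of the rule as summarized above, written in plain PHP rather than the C code that php-src and Lexbor actually use. The function name `selectorAttrNameMatches` and the `$isHtmlElementInHtmlDocument` flag are hypothetical and exist only for illustration; the point is that only the selector side is lowercased, while the element's attribute name is compared as stored.

```php
<?php

/**
 * Hypothetical helper (not part of ext/dom): decide whether the attribute
 * name written in a selector matches an attribute name on an element.
 */
function selectorAttrNameMatches(
    string $selectorName,
    string $elementAttrName,
    bool $isHtmlElementInHtmlDocument
): bool {
    if ($isHtmlElementInHtmlDocument) {
        // Lowercase the selector side only. strtolower() is ASCII-only
        // and locale-insensitive as of PHP 8.2.
        $selectorName = strtolower($selectorName);
    }

    // The element's attribute name is compared exactly as stored.
    return $selectorName === $elementAttrName;
}

var_dump(selectorAttrNameMatches('charseT', 'charset', true));  // bool(true)
var_dump(selectorAttrNameMatches('charseT', 'charset', false)); // bool(false)
```

Under this reading, `meta[charseT]` matches a parsed `charset` attribute because the HTML parser already stores attribute names in lowercase.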

@nielsdos nielsdos changed the title from "\Dom\HTMLDocument querySelector is case sensitive, unlike in browser" to "\Dom\HTMLDocument querySelector attribute name is case sensitive in HTML" on Feb 14, 2025
nielsdos added a commit to nielsdos/php-src that referenced this issue Feb 15, 2025
…se sensitive in HTML

According to https://html.spec.whatwg.org/#case-sensitivity-of-selectors,
the CSS selector attribute name must be converted to lowercase in HTML elements,
and then compared case-sensitive to the attribute name in the element.
We implement this not by doing the explicit conversion, but by a manual
loop using a function that first converts the rhs characters to
lowercase and keeps the lhs characters the same, achieving the same
effect.
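
The loop described in this commit message could look roughly like the following. This is a PHP sketch of the idea, not the C function from the commit: the names `asciiLowerByte` and `equalsWithRhsLowercased` are made up for illustration, and it assumes the right-hand side is the selector's attribute name and the left-hand side is the element's.

```php
<?php

/** Lowercase a single ASCII byte; every other byte passes through unchanged. */
function asciiLowerByte(string $byte): string
{
    $code = ord($byte);
    return ($code >= 0x41 && $code <= 0x5A) ? chr($code + 0x20) : $byte;
}

/**
 * Compare $lhs to $rhs byte by byte, lowercasing only the $rhs bytes.
 * Assumption: $lhs is the element's attribute name, $rhs the selector's.
 */
function equalsWithRhsLowercased(string $lhs, string $rhs): bool
{
    $length = strlen($lhs);
    if ($length !== strlen($rhs)) {
        return false;
    }

    for ($i = 0; $i < $length; $i++) {
        // No lowercased copy of $rhs is built; each byte is folded on the fly.
        if ($lhs[$i] !== asciiLowerByte($rhs[$i])) {
            return false;
        }
    }

    return true;
}

var_dump(equalsWithRhsLowercased('charset', 'charseT')); // bool(true)
var_dump(equalsWithRhsLowercased('charseT', 'charset')); // bool(false)
```

Folding per byte avoids allocating a lowercased copy of the selector name, which is presumably why the commit prefers it over an explicit conversion.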
@momala454
Author

Thanks for working on a fix. I'm not sure I understood you correctly, but does your fix also work for this?

$text = <<<TEXT
 <html>
<head>
<meta charset="Windows-1252">
</head>
<body>
</body>
</html>
TEXT;

$dom = \Dom\HTMLDocument::createFromString($text, options: LIBXML_NOERROR);
var_dump($dom->querySelector('meta[charseT="windows-1252"]'));

@nielsdos
Member

Ah great, even more exceptions to the rule 🤔
No, it does not fix that yet. I'll get to it, but I'm finishing another task first.

Attribute selectors on an HTML element in an HTML document must treat the values of attributes with the following names as ASCII case-insensitive:
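
The list of names that follows this sentence in the spec is not reproduced in the comment. As a rough PHP sketch of how such an exception could be applied, the snippet below uses a deliberately partial list; `CASE_INSENSITIVE_VALUE_ATTRS` and `selectorAttrValueMatches` are hypothetical names, and the real list should be taken from the HTML spec section linked earlier in the thread.

```php
<?php

// Partial, illustrative subset only. The full list lives in the HTML spec.
const CASE_INSENSITIVE_VALUE_ATTRS = ['charset', 'lang', 'type'];

/**
 * Hypothetical helper: compare an attribute value from a selector with the
 * value on an HTML element in an HTML document.
 *
 * $attrName is assumed to be lowercased already.
 */
function selectorAttrValueMatches(
    string $attrName,
    string $selectorValue,
    string $elementValue
): bool {
    if (in_array($attrName, CASE_INSENSITIVE_VALUE_ATTRS, true)) {
        // ASCII case-insensitive comparison for the listed attribute names.
        return strtolower($selectorValue) === strtolower($elementValue);
    }

    // Every other attribute value is compared case-sensitively.
    return $selectorValue === $elementValue;
}

var_dump(selectorAttrValueMatches('charset', 'windows-1252', 'Windows-1252')); // bool(true)
var_dump(selectorAttrValueMatches('id', 'main', 'Main'));                      // bool(false)
```

With a rule like this in place, `meta[charseT="windows-1252"]` would match `<meta charset="Windows-1252">` once the attribute name has been lowercased.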

@nielsdos
Member

Should be good now

nielsdos added a commit that referenced this issue Feb 17, 2025
* PHP-8.4:
  Fix lowercase HTML attribute exceptions
  Fix GH-17802: \Dom\HTMLDocument querySelector attribute name is case sensitive in HTML