Custom Parser #38

Draft: wants to merge 7 commits into main

@tyler (Member) commented Dec 20, 2024

Just a draft to start with.

This is an attempt at a parser built specifically for ESI, one that largely ignores the surrounding document. As a few issues have noted, we currently trip over some invalid (and some valid) HTML because we use an XML parser to find the ESI tags.

That leaves us in the unfortunate position of not being able to use an XML parser to find XML, and so needing a heavyweight HTML parser just so we can ignore the HTML. Somehow I've developed the audacity to think that we could get around this by building a custom parser that knows just enough about HTML to get around its foibles while also finding the ESI XML tags. We'll see.

Upsides of this, in addition to not tripping on < inside JavaScript blocks: we can integrate the tag parsing and the expression parsing and turn the whole thing into a single stream of tokens. Each token either gets splatted out onto the wire as-is, or gets run through an interpreter and then splatted out onto the wire. All of that can be done in a completely streaming fashion.
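To make the token-stream idea concrete, here is a rough Python sketch. It is not the code in this PR. The token shape, the `tokenize` and `render` names, and the `handle_esi` callback are all made up for illustration, and the regex assumes attribute values never contain `>`.

```python
import re
from typing import Callable, Iterator, Tuple

# Hypothetical token shape: ("raw", text) or ("esi", tag_text).
Token = Tuple[str, str]

# Matches an opening, closing, or self-closing tag in the esi: namespace.
# Simplification (assumption): attribute values never contain '>'.
ESI_TAG = re.compile(r"</?esi:[A-Za-z]+(?:\s[^>]*)?/?>")


def tokenize(doc: str) -> Iterator[Token]:
    """Split a document into raw spans and ESI tags, ignoring all other markup."""
    pos = 0
    for m in ESI_TAG.finditer(doc):
        if m.start() > pos:
            yield ("raw", doc[pos:m.start()])
        yield ("esi", m.group(0))
        pos = m.end()
    if pos < len(doc):
        yield ("raw", doc[pos:])


def render(doc: str, handle_esi: Callable[[str], str]) -> Iterator[str]:
    """Raw tokens go out untouched; ESI tokens go through the interpreter callback."""
    for kind, text in tokenize(doc):
        yield text if kind == "raw" else handle_esi(text)
```

Because the scanner only looks for `<esi:` and `</esi:`, a bare `<` inside a script block is just more raw text. It would still pick up an ESI-looking tag inside a script string or an HTML comment, which is one of the places where a real parser needs to know a little HTML.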

Downsides are the 40 years of shenanigans inside HTML. This is going to require some serious testing to have any confidence in.

Goals:

  • Documents without ESI should make it through completely unchanged
  • Don't trip on HTML shenanigans when parsing ESI
  • Correctly identify interpolated expressions inside ESI blocks
  • Everything should be streaming: no blocking while the whole document buffers
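The first and last goals can be checked together with a chunked scanner. The sketch below is a hypothetical illustration in Python, not the implementation in this PR. `StreamingScanner`, `feed`, and `finish` are invented names, and the regex again assumes attribute values never contain `>`. The scanner holds back only the bytes that might still turn into an ESI tag and emits everything else immediately.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # hypothetical: ("raw", text) or ("esi", tag_text)

TAG_START = re.compile(r"</?esi:")
# Simplification (assumption): attribute values never contain '>'.
ESI_TAG = re.compile(r"</?esi:[A-Za-z]+(?:\s[^>]*)?/?>")


def _held_tail_len(buf: str) -> int:
    """Length of the longest suffix of buf that could still begin an ESI tag."""
    for n in range(min(5, len(buf)), 0, -1):
        tail = buf[-n:]
        if "<esi:".startswith(tail) or "</esi:".startswith(tail):
            return n
    return 0


class StreamingScanner:
    def __init__(self) -> None:
        self._buf = ""

    def feed(self, chunk: str) -> List[Token]:
        """Consume one chunk and return every token that is already decidable."""
        self._buf += chunk
        out: List[Token] = []
        while self._buf:
            start = TAG_START.search(self._buf)
            if start is None:
                # No tag in sight: emit all but a possible partial "<esi:" tail.
                emit_to = len(self._buf) - _held_tail_len(self._buf)
                if emit_to:
                    out.append(("raw", self._buf[:emit_to]))
                self._buf = self._buf[emit_to:]
                break
            if start.start():
                out.append(("raw", self._buf[:start.start()]))
                self._buf = self._buf[start.start():]
            tag = ESI_TAG.match(self._buf)
            if tag:
                out.append(("esi", tag.group(0)))
                self._buf = self._buf[tag.end():]
            elif ">" in self._buf:
                # Malformed tag: pass its opening through as raw text.
                n = start.end() - start.start()
                out.append(("raw", self._buf[:n]))
                self._buf = self._buf[n:]
            else:
                break  # tag is incomplete; wait for more input
        return out

    def finish(self) -> List[Token]:
        """Flush whatever is still held back once the input ends."""
        rest, self._buf = self._buf, ""
        return [("raw", rest)] if rest else []
```

Feeding the same document in every possible chunk size and checking that the concatenated output equals the input is a cheap property test for "unchanged" and "streaming" at once. Raw text may arrive as several adjacent tokens, which is fine for passthrough. An unterminated `<esi:` with no later `>` would still buffer until `finish`, so a real implementation would probably want a cap on tag length.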

Non-goals:

  • Parsing HTML. We need to identify it and ignore it, not validate it and not break it.
  • ...?

Base automatically changed from tyler/vars to main January 15, 2025 10:22