refactor(parser): reduce `Token` size from 32 to 16 bytes #1962

Boshen · 2024-01-09T06:26:10Z

Part of #1880

`Token` size is reduced from 32 to 16 bytes by replacing the token value,
previously stored as `Option<&'a str>`, with a `u32` index handle.
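
To illustrate where the 16 bytes come from, here is a minimal sketch that compares the two layouts. The struct and field names (`OldToken`, `NewToken`, `Span`, `is_on_new_line`, `escaped_string_id`) and the use of `Option<NonZeroU32>` for the handle are assumptions made for this example, not the actual definitions in `oxc_parser`; the sizes assume a 64-bit target.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// Hypothetical span type: two u32 byte offsets into the source text (8 bytes).
#[allow(dead_code)]
#[derive(Clone, Copy)]
pub struct Span {
    pub start: u32,
    pub end: u32,
}

// Hypothetical "before" layout: the token carries a string slice.
// `Option<&str>` is a fat pointer (16 bytes on 64-bit) and forces 8-byte alignment.
#[allow(dead_code)]
pub struct OldToken<'a> {
    pub kind: u8,
    pub span: Span,
    pub is_on_new_line: bool,
    pub value: Option<&'a str>,
}

// Hypothetical "after" layout: the slice is replaced by a 4-byte handle.
// `Option<NonZeroU32>` uses 0 as the `None` niche, so it stays 4 bytes.
#[allow(dead_code)]
pub struct NewToken {
    pub kind: u8,
    pub span: Span,
    pub is_on_new_line: bool,
    pub escaped_string_id: Option<NonZeroU32>,
}

// Returns (size of the old layout, size of the new layout) in bytes.
pub fn token_sizes() -> (usize, usize) {
    (size_of::<OldToken<'static>>(), size_of::<NewToken>())
}
```

In this sketch the old layout needs 26 bytes of fields rounded up to 8-byte alignment, while the new one needs 14 bytes rounded up to 4-byte alignment.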

It would be nice to eliminate this handle entirely, because in the normal case
a token's string is simply `&source_text[token.span.start..token.span.end]`.

Unfortunately, JavaScript allows escaped characters to appear in
identifiers, strings and templates. These strings need to be unescaped
for equality checks, e.g. `"\a" === "a"`.

This leads us to add an `escaped_strings` vec that stores these unescaped,
allocated strings.

The performance regression from adding this vec should be minimal because escaped strings are rare.
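
The lookup described above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the code in this PR: `Lexer`, `token`, `string_value`, and the 1-based `Option<NonZeroU32>` handle are hypothetical names and choices, and the caller is assumed to have already computed the unescaped string.

```rust
use std::num::NonZeroU32;

// Hypothetical token: byte offsets plus an optional handle into `escaped_strings`.
#[derive(Clone, Copy)]
pub struct Token {
    pub start: u32,
    pub end: u32,
    pub escaped_string_id: Option<NonZeroU32>,
}

// Hypothetical lexer state: the source text plus a side table of unescaped strings.
pub struct Lexer<'a> {
    source_text: &'a str,
    escaped_strings: Vec<String>,
}

impl<'a> Lexer<'a> {
    pub fn new(source_text: &'a str) -> Self {
        Self { source_text, escaped_strings: Vec::new() }
    }

    // Creates a token. If the lexer saw an escape sequence, the unescaped text
    // is pushed to the side table and the token stores a 1-based handle to it.
    pub fn token(&mut self, start: u32, end: u32, unescaped: Option<String>) -> Token {
        let escaped_string_id = unescaped.and_then(|s| {
            self.escaped_strings.push(s);
            // After the push, len() >= 1, so this is a valid non-zero handle.
            NonZeroU32::new(self.escaped_strings.len() as u32)
        });
        Token { start, end, escaped_string_id }
    }

    // Resolves a token's string: the rare escaped case goes through the side
    // table, the common case is a plain slice of the source text.
    pub fn string_value(&self, token: Token) -> &str {
        match token.escaped_string_id {
            Some(id) => &self.escaped_strings[id.get() as usize - 1],
            None => &self.source_text[token.start as usize..token.end as usize],
        }
    }
}
```

Only tokens that contain an escape sequence allocate, so the common path stays a plain slice of `source_text`.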

Background Reading:

https://floooh.github.io/2018/06/17/handles-vs-pointers.html


codspeed-hq · 2024-01-09T06:37:05Z

CodSpeed Performance Report

Merging #1962 will improve performance by 8.76%

Comparing `01-08-wip` (24cfd1b) with `main` (66e95a5)

Summary

⚡ 1 improvement
✅ 13 untouched benchmarks

Benchmarks breakdown

| | Benchmark | `main` | `01-08-wip` | Change |
|---|---|---|---|---|
| ⚡ | `semantic[pdf.mjs]` | 123.1 ms | 113.2 ms | +8.76% |

Boshen · 2024-01-09T07:05:01Z

@overlookmotel I think this is the furthest I can go 😅

github-actions bot added the A-parser Area - Parser label Jan 9, 2024

Boshen force-pushed the 01-08-wip branch from 58d7ab2 to 474554d Compare January 9, 2024 06:47

Boshen force-pushed the 01-08-wip branch from 474554d to 24cfd1b Compare January 9, 2024 06:50

Boshen requested a review from Dunqing January 9, 2024 07:01

This was referenced Jan 9, 2024

refactor(lexer): Improve template tokenization code #1963

Closed

Reduce size of Token? #1880

Closed

Boshen merged commit 4706765 into main Jan 9, 2024
18 checks passed

Boshen deleted the 01-08-wip branch January 9, 2024 07:17
