-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize parsing of OSC_STRING to minimize string concatenation. #1822
Changes from 1 commit
1c4a71f
2cb293c
2d8f420
6a2e6a5
bbbfc04
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -391,7 +391,11 @@ export class EscapeSequenceParser extends Disposable implements IEscapeSequenceP | |
} | ||
|
||
// normal transition & action lookup | ||
transition = (code < 0xa0) ? (table[currentState << 8 | code]) : DEFAULT_TRANSITION; | ||
transition = (code < 0xa0 | ||
? (table[currentState << 8 | code]) | ||
: currentState === ParserState.OSC_STRING | ||
? (ParserAction.OSC_PUT << 4) | ParserState.OSC_STRING | ||
: DEFAULT_TRANSITION); | ||
switch (transition >> 4) { | ||
case ParserAction.PRINT: | ||
print = (~print) ? print : i; | ||
|
@@ -423,10 +427,6 @@ export class EscapeSequenceParser extends Disposable implements IEscapeSequenceP | |
case ParserState.GROUND: | ||
print = (~print) ? print : i; | ||
break; | ||
case ParserState.OSC_STRING: | ||
osc += String.fromCharCode(code); | ||
transition |= ParserState.OSC_STRING; | ||
break; | ||
case ParserState.CSI_IGNORE: | ||
transition |= ParserState.CSI_IGNORE; | ||
break; | ||
|
@@ -517,7 +517,16 @@ export class EscapeSequenceParser extends Disposable implements IEscapeSequenceP | |
osc = ''; | ||
break; | ||
case ParserAction.OSC_PUT: | ||
osc += data.charAt(i); | ||
for (let j = i + 1; ; j++) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we simplify the check by doing an including check against printables (range check? and maybe unicode if we have to cover those)? This would avoid the table lookup which is kinda costly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about this:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh wait this is not quite correct, put is defined as 0x20 - 0x7f, seems 0x20 and 0x7f should be added to the string. I wonder about 0x7f (DEL) though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Definitely should be Less sure about There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I think we should simply add it for now (S.E.P. for the parser 😄). DEC VTs handled this differently for the GROUND state as decribed here https://vt100.net/emu/dec_ansi_parser#STGRO. The transition table can be overridden as ctor argument. But this is currently not exposed, we had no reason so far to do so. In theory the VT models would need slightly different tables (DEC changed alot for >VT300), the table currently resembles VT500 generation (thus not perfectly VT100 compliant). |
||
if (j >= l | ||
|| ((code = data.charCodeAt(j)) <= 0x9f | ||
&& (table[ParserState.OSC_STRING << 8 | code] >> 4 | ||
!== ParserAction.OSC_PUT))) { | ||
osc += data.substring(i, j); | ||
i = j - 1; | ||
break; | ||
} | ||
} | ||
break; | ||
case ParserAction.OSC_END: | ||
if (osc && code !== 0x18 && code !== 0x1a) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm - can OSC contain higher chars at all? vt100.net does not cover this as it does not cover unicode at all.
If we have to handle unicode here maybe extend the error state? Note that I am not very fond of the way the parser handles unicode atm, maybe we even find a better way than misusing the error state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can OSC contain higher chars at all? vt100.net does not cover this as it does not cover unicode at all.
More relevant is what xterm does. The code is a bit hard to read, but I think we're safe treating at least codes above 0xFF as printable. Codes between 0xA0 and 0xFF are probably ok too, though xterm's parse tables are indexed by 8-bit (Latin-1?) codes, not 7-bit ANSII. (Other modern well-maintained terminal emulators such as gnome-terminal are probably also worth looking at.)
(I assume you know xterm.js does a pretty poor job with vttest, so if we're concerned about compatibility there are higher priorities.)
A quick hack for characters above 0x9f is to use some suitable printable character instead:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the error state treats any code >0x9f as printable but limits printables to certain states. This could be extended by the OSC_STRING state.
About vttest - yeah, we have several issues open regarding this (e.g. #1434 this is a really wicked one), also we have test cases that compare the buffer state with xterm (https://github.com/xtermjs/xterm.js/tree/master/fixtures/escape_sequence_files many are still disabled since xterm.js does not the right thing yet). Basically vttest compliance boils down to
InputHandler
doing things slightly different still.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a special pseudo-character for code above 0x9f:
and then:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yepp I like that idea 👍
Maybe we can even apply this to the other unicode aware states later on, would greatly simplify the error state (and restore its original purpose lol).