Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize parsing of OSC_STRING to minimize string concatenation. #1822

Merged
merged 5 commits into from
Dec 11, 2018
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 15 additions & 6 deletions src/EscapeSequenceParser.ts
Original file line number Diff line number Diff line change
Expand Up @@ -391,7 +391,11 @@ export class EscapeSequenceParser extends Disposable implements IEscapeSequenceP
}

// normal transition & action lookup
transition = (code < 0xa0) ? (table[currentState << 8 | code]) : DEFAULT_TRANSITION;
transition = (code < 0xa0
? (table[currentState << 8 | code])
: currentState === ParserState.OSC_STRING
? (ParserAction.OSC_PUT << 4) | ParserState.OSC_STRING
Copy link
Member

@jerch jerch Dec 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - can OSC contain higher chars at all? vt100.net does not cover this as it does not cover unicode at all.
If we have to handle unicode here maybe extend the error state? Note that I am not very fond of the way the parser handles unicode atm, maybe we even find a better way than misusing the error state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can OSC contain higher chars at all? vt100.net does not cover this as it does not cover unicode at all.

More relevant is what xterm does. The code is a bit hard to read, but I think we're safe treating at least codes above 0xFF as printable. Codes between 0xA0 and 0xFF are probably ok too, though xterm's parse tables are indexed by 8-bit (Latin-1?) codes, not 7-bit ANSII. (Other modern well-maintained terminal emulators such as gnome-terminal are probably also worth looking at.)

(I assume you know xterm.js does a pretty poor job with vttest, so if we're concerned about compatibility there are higher priorities.)

A quick hack for characters above 0x9f is to use some suitable printable character instead:

table[currentState << 8 | (code < 0xa0 ? code : 97)]

Copy link
Member

@jerch jerch Dec 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the error state treats any code >0x9f as printable but limits printables to certain states. This could be extended by the OSC_STRING state.
About vttest - yeah, we have several issues open regarding this (e.g. #1434 this is a really wicked one), also we have test cases that compare the buffer state with xterm (https://github.com/xtermjs/xterm.js/tree/master/fixtures/escape_sequence_files many are still disabled since xterm.js does not the right thing yet). Basically vttest compliance boils down to InputHandler doing things slightly different still.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a special pseudo-character for code above 0x9f:

const NON_ASCII_PRINTABLE = 0xA0;

table.add(NON_ASCII_PRINTABLE, ParserState.OSC_STRING, ParserAction.OSC_PUT, ParserState.OSC_STRING);

and then:

table[currentState << 8 | (code < 0xa0 ? code : NON_ASCII_PRINTABLE)]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yepp I like that idea 👍
Maybe we can even apply this to the other unicode aware states later on, would greatly simplify the error state (and restore its original purpose lol).

: DEFAULT_TRANSITION);
switch (transition >> 4) {
case ParserAction.PRINT:
print = (~print) ? print : i;
Expand Down Expand Up @@ -423,10 +427,6 @@ export class EscapeSequenceParser extends Disposable implements IEscapeSequenceP
case ParserState.GROUND:
print = (~print) ? print : i;
break;
case ParserState.OSC_STRING:
osc += String.fromCharCode(code);
transition |= ParserState.OSC_STRING;
break;
case ParserState.CSI_IGNORE:
transition |= ParserState.CSI_IGNORE;
break;
Expand Down Expand Up @@ -517,7 +517,16 @@ export class EscapeSequenceParser extends Disposable implements IEscapeSequenceP
osc = '';
break;
case ParserAction.OSC_PUT:
osc += data.charAt(i);
for (let j = i + 1; ; j++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we simplify the check by doing an including check against printables (range check? and maybe unicode if we have to cover those)? This would avoid the table lookup which is kinda costly.
Note an including check does not have to cover all chars, we might get away with the most common. All others will still be handled by the switch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this:

    case ParserAction.OSC_PUT:
      for (let j = i + 1; ; j++) {
        if (j >= l
            || (code = data.charCodeAt(j)) <= 0x20
            || (code >= 0x7f && code <= 0x9f)) {
          osc += data.substring(i, j);
          i = j - 1;
          break;
        }
      }
      break;

Copy link
Member

@jerch jerch Dec 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait this is not quite correct, put is defined as 0x20 - 0x7f, seems 0x20 and 0x7f should be added to the string. I wonder about 0x7f (DEL) though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely should be < 0x20 rather than <= 0x20.

Less sure about code >= 0x7f vs code > 0x7f. Clearly 0x7F isn't "printable". One might argue that while it should be allowed in an OSC string it is sufficiently dubious that we don't want to hardwire it the code, but instead have it be controlled by the transition table. But as long as we don't have a mechanism for overriding the transition table (and are not planning one) we might as well go with the more efficient code > 0x7f. I guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we should simply add it for now (S.E.P. for the parser 😄). DEC VTs handled this differently for the GROUND state as decribed here https://vt100.net/emu/dec_ansi_parser#STGRO.

The transition table can be overridden as ctor argument. But this is currently not exposed, we had no reason so far to do so. In theory the VT models would need slightly different tables (DEC changed alot for >VT300), the table currently resembles VT500 generation (thus not perfectly VT100 compliant).

if (j >= l
|| ((code = data.charCodeAt(j)) <= 0x9f
&& (table[ParserState.OSC_STRING << 8 | code] >> 4
!== ParserAction.OSC_PUT))) {
osc += data.substring(i, j);
i = j - 1;
break;
}
}
break;
case ParserAction.OSC_END:
if (osc && code !== 0x18 && code !== 0x1a) {
Expand Down