Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stabilize IParser API #2607

Closed
Tyriar opened this issue Dec 5, 2019 · 26 comments · Fixed by #2689
Closed

Stabilize IParser API #2607

Tyriar opened this issue Dec 5, 2019 · 26 comments · Fixed by #2689
Assignees
Labels
area/api type/debt Technical debt that could slow us down in the long run
Milestone

Comments

@Tyriar
Copy link
Member

Tyriar commented Dec 5, 2019

It's currently flagged experimental:

xterm.js/typings/xterm.d.ts

Lines 418 to 422 in bd0d267

/**
* (EXPERIMENTAL) Get the parser interface to register
* custom escape sequence handlers.
*/
readonly parser: IParser;

@jerch any concerns about calling this stable? Any more tweaks in mind?

Not sure how many users there are but here's VS Code's single usage which seems to work well:

https://github.com/microsoft/vscode/blob/1beea2f864c1f403ab225dbadfdc398d76faef94/src/vs/workbench/contrib/terminal/browser/terminalInstance.ts#L499-L505

My only thought is we should think about the verb (add, register, etc.) and figure out where we want to go. I think add feels a little strange without a remove, VS Code uses register for providers/handlers/factories:

Screen Shot 2019-12-05 at 2 35 11 PM

@Tyriar Tyriar added type/debt Technical debt that could slow us down in the long run area/api labels Dec 5, 2019
@jerch
Copy link
Member

jerch commented Dec 5, 2019

@Tyriar Sounds good to me, also using register instead of add.

About loose ends - well I dont like the fact that we have a hard limit on OSC/DCS, but thats not really a concern atm. If we have to support arbitrary length sequences in the future, we could always expose the chunked interfaces as well.
@sdegutis had a request for addressing individual sequence params here, but I currently think we should not get into subsequence propagation business. For most sequences params have a meaning coupled to the position, thus it makes not much sense to go that route. Sequences with freely stacking params like DECSET or SGR it is still possible to hook in with a custom handler and just deal with the params your interested in.

All in all I think the API works great so far.

@Tyriar
Copy link
Member Author

Tyriar commented Dec 5, 2019

well I dont like the fact that we have a hard limit on OSC/DCS, but thats not really a concern atm

Can you elaborate a little on what the limit is?

@sdegutis I'm interested if you had any more feedback after playing with the API.

@jerch
Copy link
Member

jerch commented Dec 6, 2019

@Tyriar The limit is a simple hard limit of max payload length of 10 MB. Any sequence beyond that will be silently swallowed.

@thesoftwarephilosopher
Copy link
Contributor

thesoftwarephilosopher commented Dec 6, 2019

Oh hey @Tyriar and @jerch how's it goin. So my use-case was that I just wanted to create an onScreenChanged callback capture ?1049h and ?1049l for of my basic "widget overlay" feature, because I have to hide the departing widget-container overlay for the given screen when it's switched away from, and show the arriving widget-container for the new current screen/buffer/thing (what's the right name for these things?).

Right now I have to do this:

  term.parser.addCsiHandler({ prefix: '?', final: 'h' }, (params) => {
    if (params.includes(1049)) {
      altScreen = true;
      toggledAltScreen();
    }
    return false;
  });

  term.parser.addCsiHandler({ prefix: '?', final: 'l' }, (params) => {
    if (params.includes(1049)) {
      altScreen = false;
      toggledAltScreen();
    }
    return false;
  });

That's fine enough I guess, but there's a couple ways it could maybe be improved:

  • Matching a whole string, like { whole: '?1049h' } -- this would still need two callbacks though, but it may not be an issue. Not sure about performance but I don't assume it should be worse since you have the whole string length and can quickly switch on that before doing string comparison.

  • Matching a regex like { regex: /?1049(h|l)/ } -- this is probably worse performance since you can't just switch on the string length ahead of time, so you'd have to test the regex every time you get a new character, unless it works the same as whole would and only tries to match after the sequence is done.

@thesoftwarephilosopher
Copy link
Contributor

This is the nicest version I can think of, for my highly specific use-case:

  // not sure the \? is right, forgot how JS regex works, but ignore that
  term.parser.addCsiHandler({ regex: /\?1049(h|l)/ }, (match) => {
    altScreen = match[1] === 'h';
    toggledAltScreen();
    return false;
  });

@thesoftwarephilosopher
Copy link
Contributor

@Tyriar VS Code's single usage would also work well with the regex approach, using { regex: /H$/ }. And probably just as performant, while being more flexible. In fact, may not even need an option map anymore with prefix/suffix/etc; the first argument could just simply be a regex.

@jerch
Copy link
Member

jerch commented Dec 6, 2019

@sdegutis Thx for the feedback. A few notes from my side:

Right now I have to do this: [..code]

Yes thats currently the only way to hook up something like a state change listener. But keep in mind that DECSET can freely stack param values, thus you will miss 1049 if its not the first param, e.g. CSI ? ... ; 1049 ; ... h.

About regexp:
Your regexp idea will not work for a simple reason - the parser acts on UTF32 codepoints not strings (and I am not keen to write a regexp parser for this lol). The function identifier gets translated into numbers which form a jump table key of handlers in the parser. Thats a quite low level processing of data, but the only way to keep the parser fast (in fact the key -> target jump is still quite costly).

A more general note on regexp and ANSI sequences - its almost impossible to get this right. There are simply too many ifs and elses in the parsing state that you would have to pull into the regexp itself. I think all regexp based packages trying to strip sequences from data get this wrong. To correctly grab DECSET 1049 by a regexp it would look more like:

/(([\x1b][\[])|[\x9b])[?][0-9;]*?1049((h|l)|(;[0-9;]+?)(h|l))/

(This is still slightly wrong but who cares, guess you get my point...)

So imho regexp doesnt work here for those two reasons.

@thesoftwarephilosopher
Copy link
Contributor

Makes sense, thanks for the info.

What about allowing params to be arbitrary number of bytes, e.g. { params: '1049' }? But if that's also too inefficient then I'm fine to keep it as-is, since that would only be a (very) minor improvement over what I have now.

@jerch
Copy link
Member

jerch commented Dec 6, 2019

Well the parser does not slice into sequences, it acts on sequences as a whole (pretty much again for perf reasons). So the parser would trigger you handler for all of these:

  • CSI ? h --> param: [0]
  • CSI ? 1 ; 2 ; h --> param: [1, 2, 0]
  • CSI ? 99999999999999999999 h --> param: [2^31 - 1] (clamped to a int32_t)

Ofc it would be possible to do a slicing into single params with a next step, but its simply not worth it (as I wrote above): most sequences are position aware for their params, thus cannot be sliced at all. Other like DECSET can combine the params freely and are still valid, only those could be sliced, but are simply not worth the overhead atm.

Some background on this: a sequence can be seen as function provided by the terminal, where params translate into function arguments, thus CSI ? 1;2;3;4;5 h translates into:

  • function: prefix '?' (DEC private) + final 'h' (SM) --> DECSET
  • call function with params: for (p in params) DECSET p;

We only provide the function interface, thus you get all arguments.

@Tyriar
Copy link
Member Author

Tyriar commented Dec 6, 2019

I think @sdegutis's problem could be solved with a proper example in xterm.d.ts, what he ended up landing on was how to use it correctly (though a return true might be better in the if?).

@thesoftwarephilosopher
Copy link
Contributor

@Tyriar Well also that the params key confused me because I wasn't sure what it meant by bytes at first, partially due to my misunderstanding of DECSET stuff. So looking back, I'm pretty sure that if I wanted to capture 1049, then since params takes any substring, up to two, then I could just pass 10 or 04 or 49 and it would basically limit which callbacks I get to matching that substring (which means 1000 and 49 would match). So yeah I think just a simple example in the docs would clarify. I have no problem with the API as-is.

@thesoftwarephilosopher
Copy link
Contributor

@Tyriar Also a return true in the if would stop xterm.js from switching buffers for me, I wanted it to still switch buffers like normal, I just wanted to capture that event and do extra stuff on top of it.

@Tyriar
Copy link
Member Author

Tyriar commented Dec 6, 2019

@sdegutis oh ok, you're using it exactly the same as me then. Just listening, not touching/canceling anything 🙂

@jerch
Copy link
Member

jerch commented Dec 6, 2019

... could be solved with a proper example in xterm.d.ts...

Yepp, could be part of the hooks cleanup and experimental remove.

though a return true might be better in the if?

Deleted (already answered 😄).

@thesoftwarephilosopher
Copy link
Contributor

@sdegutis oh ok, you're using it exactly the same as me then. Just listening, not touching/canceling anything 🙂

Yep! Really everything I've been talking about in any of these threads is all working towards an "add arbitrary DOM widgets to the terminal" feature. In this thread, it's that when you start vim, then a DOM element on the main buffer needs to be hidden, and shown again when you quit vim, so I needed a hook for that.

@thesoftwarephilosopher
Copy link
Contributor

My only thought is we should think about the verb (add, register, etc.) and figure out where we want to go. I think add feels a little strange without a remove, VS Code uses register for providers/handlers/factories:

Personally I think add is fine.

@Tyriar
Copy link
Member Author

Tyriar commented Dec 6, 2019

@sdegutis sounds like buffer decorations should be bound to a particular buffer so you don't need to juggle it like that.

@thesoftwarephilosopher
Copy link
Contributor

@Tyriar Yeah if it were a built-in feature then that would be pretty much needed. Imagine switching to vim while any decoration is on screen. It'd have to hide until vim quits or ctrl-z (SIGTSTP), right?

@Tyriar
Copy link
Member Author

Tyriar commented Dec 6, 2019

👍 moved discussion to #1852 (comment)

@thesoftwarephilosopher
Copy link
Contributor

Oh one thing to consider for this API: say you get CSI ? 1;2;3;4;5 h like in @jerch 's original example. You may want to react to 3 and prevent default behavior for that, but allow it for 1,2,4,5. So maybe this should return an array of events you want it to continue to try to handle?

@jerch
Copy link
Member

jerch commented Dec 6, 2019

@sdegutis Ah good catch, thats indeed a limitation currently and not possible by API means. Only way to get this behavior is to call the befault handler from the hook with just the params you want to have default actions. Hmm.

@thesoftwarephilosopher
Copy link
Contributor

Or why not dispatch at the moment you'd normally call a handler? Why deal with the array of codes as a whole? For example if CSI ? 1049 ; 2004 h is called, the code already runs an internal function to handle 1049, then one to handle 2004. So at those call sites (or rather just above them) why not just look for existing function in callback array? Is that possible? Would it be too inefficient? Pardon if it's a dumb question.

@jerch
Copy link
Member

jerch commented Dec 6, 2019

Well it simply was no issue so far. Just checked xterm docs - out of 90 listed CSI commands only 14 can be stacked with no positional meaning. Currently we only implement 5 of those (and 50 in total). All others either cannot be split (due to positional meaning) or have zero/one params.

Or why not dispatch at the moment you'd normally call a handler? Why deal with the array of codes as a whole?

Because that would not work for commands that must not be split, thus we would have to maintain a list that separates splitables from the others, even if we dont implement that command ourselves (just in case someone wants to hook into that one). This seems rather unmaintainable to me (and the param descent would be rather ugly).

To get something close to that imho an altered params list as return value is easier to go with (just drop params that you already handled, keep others for the next handler).

@thesoftwarephilosopher
Copy link
Contributor

Got it, thanks for the info. I'll stop with the dumb questions now :) The existing API works fine enough for me.

@jerch
Copy link
Member

jerch commented Dec 6, 2019

There are no dumb ..., whatever - I admit the hooks interface should get better docs, its not obvious how to use it atm.

@jerch
Copy link
Member

jerch commented Dec 21, 2019

@Tyriar To sum this up here - from my side this looks ready to get "official". Gonna address the documentation issue separately with an article about sequences and hooks on the website.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api type/debt Technical debt that could slow us down in the long run
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants