Performance - Data batching #3336
Diff (changes from all commits):

```diff
@@ -21,8 +21,60 @@ try {
 const envFromConfig = config.getConfig().env || {};
 
+// Max duration to batch session data before sending it to the renderer process.
+const BATCH_DURATION_MS = 16;
+
+// Max size of a session data batch. Note that this value can be exceeded by ~4k
+// (chunk sizes seem to be 4k at the most)
+const BATCH_MAX_SIZE = 200 * 1024;
+
+// Data coming from the pty is sent to the renderer process for further
+// vt parsing and rendering. This class batches data to minimize the number of
+// IPC calls. It also reduces GC pressure and CPU cost: each chunk is prefixed
+// with the window ID which is then stripped on the renderer process and this
+// overhead is reduced with batching.
+class DataBatcher extends EventEmitter {
+  constructor(uid) {
+    super();
+    this.uid = uid;
+    this.decoder = new StringDecoder('utf8');
```
Review thread on the `new StringDecoder('utf8')` line:

Comment: UTF8? hyper is already using binary transport for the pty? We might add this eventually to xterm.js in the future; I had some good results with it lifting some burden from the input parsing (currently there are some nonsense back-and-forth conversions going on).

Reply: Not sure I follow this one. IIUC electron can't send …

Reply: But then why are you dealing with UTF8 at all? node-pty is supposed to return JS strings already if encoding is omitted (defaults to UTF8, decoded by the underlying …). So how do you send data over to the window process? Is it not able to handle binary data? Not a websocket? (I'm not used to electron internals.)

Reply: Bummer - electron's IPC mechanism supports ArrayBuffers since v3, but they get BASE64-encoded and wrapped in a JSON envelope (electron/electron#13055). From my own tests electron performs worse than our demo app based on websockets (about half as fast); maybe the IPC mechanism is the culprit? Even strings get the JSON envelope, which is no good from a perf perspective. Maybe you want to try out our demo app and compare yourself; not sure if directly invoking node-pty in the renderer or falling back to websockets would solve this. Ok, enough thread hijacking 😸

Reply: Exactly :(

IPC doesn't seem to cost that much though, V8 strings are pretty fast. In this profile of one frame, IPC only takes a tiny sliver (<0.5ms), so there aren't a lot of gains there. It might become more prominent when we do WebWorkers though, with an extra hop, but then again, we'll probably use …

Reply: I just realized that the real gains are hidden in the string-parsing part (i.e. your fix xtermjs/xterm.js#1796), so ignore that last part 😅

Reply: Yeah, that's the basic idea behind my optimization plan (xtermjs/xterm.js#1811) - by getting rid of any string handling in the input chain and working directly on arraybuffers we can lower the runtime needed for input handling by >50%.

Reply: If you cannot get it to run, here are my results: xtermjs/xterm.js#791 (comment) 😄
```diff
+    this.reset();
+  }
+
+  reset() {
+    this.data = this.uid;
+    this.timeout = null;
+  }
+
+  write(chunk) {
+    if (this.data.length + chunk.length >= BATCH_MAX_SIZE) {
+      // We've reached the max batch size. Flush it and start another one
+      if (this.timeout) {
+        clearTimeout(this.timeout);
+        this.timeout = null;
+      }
+      this.flush();
+    }
+
+    this.data += this.decoder.write(chunk);
+
+    if (!this.timeout) {
+      this.timeout = setTimeout(() => this.flush(), BATCH_DURATION_MS);
+    }
+  }
+
+  flush() {
+    // Reset before emitting to allow for potential reentrancy
+    const data = this.data;
+    this.reset();
+
+    this.emit('flush', data);
+  }
+}
+
 module.exports = class Session extends EventEmitter {
-  constructor({rows, cols: columns, cwd, shell, shellArgs}) {
+  constructor({uid, rows, cols: columns, cwd, shell, shellArgs}) {
     const osLocale = require('os-locale');
     super();
     const baseEnv = Object.assign(
@@ -45,8 +97,6 @@ module.exports = class Session extends EventEmitter {
       delete baseEnv.GOOGLE_API_KEY;
     }
 
-    const decoder = new StringDecoder('utf8');
-
     const defaultShellArgs = ['--login'];
 
     try {
@@ -64,11 +114,16 @@ module.exports = class Session extends EventEmitter {
       }
     }
 
-    this.pty.on('data', data => {
+    this.batcher = new DataBatcher(uid);
+    this.pty.on('data', chunk => {
       if (this.ended) {
         return;
       }
-      this.emit('data', decoder.write(data));
+      this.batcher.write(chunk);
     });
 
+    this.batcher.on('flush', data => {
+      this.emit('data', data);
+    });
+
     this.pty.on('exit', () => {
```
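The class comment above notes that each batch is prefixed with the window ID, which is stripped again in the renderer. As a rough illustration of that hand-off (not code from this PR: the `'session data'` channel name, the `forwardBatch` helper and the fixed 36-character uid length are all assumptions), the two sides could look roughly like this:

```js
// Hypothetical sketch of the IPC hand-off; channel name, helper name and the
// 36-character uid length are assumptions, not taken from this diff.

// Main process: forward each flushed batch as a single string.
function forwardBatch(win, batch) {
  // `batch` already starts with the session uid added by DataBatcher.reset().
  win.webContents.send('session data', batch);
}

// Renderer process: strip the uid prefix and route the rest to the right terminal.
const {ipcRenderer} = require('electron');
const UID_LENGTH = 36; // assuming a uuid-v4 style session id

ipcRenderer.on('session data', (event, batch) => {
  const uid = batch.slice(0, UID_LENGTH);
  const data = batch.slice(UID_LENGTH);
  // look up the terminal registered for `uid` and write `data` to it
});
```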
Review discussion on the batching constants:

Comment: I tried different values, but this one (200k) had the best throughput. Using e.g. 100k can lower the throughput by ~25% 🔥, but increasing it further (say 300k) lowers the framerate to <30.
Reply: A future improvement could determine this value dynamically.
Reply: @juancampa what strategies would you use for dynamic computation of this value? Maybe we can create an RFC issue?
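Nothing like this exists in the PR, but one possible shape for a dynamically computed batch size is to let the renderer report how long each batch took to parse/render and adjust the cap from that. The function name, the 8 ms budget and the bounds below are all invented for illustration:

```js
// Hypothetical sketch of a self-tuning batch size (not in this PR). The renderer
// would call reportRenderTime(ms) after handling each batch.
const MIN_BATCH_SIZE = 32 * 1024;
const MAX_BATCH_SIZE = 512 * 1024;
const FRAME_BUDGET_MS = 8; // assumed per-frame budget left for parsing

let batchMaxSize = 200 * 1024;

function reportRenderTime(ms) {
  if (ms > FRAME_BUDGET_MS && batchMaxSize > MIN_BATCH_SIZE) {
    // Batches are too heavy to handle within one frame: shrink the cap.
    batchMaxSize = Math.max(MIN_BATCH_SIZE, Math.floor(batchMaxSize * 0.75));
  } else if (ms < FRAME_BUDGET_MS / 2 && batchMaxSize < MAX_BATCH_SIZE) {
    // Plenty of headroom: allow bigger batches for better throughput.
    batchMaxSize = Math.min(MAX_BATCH_SIZE, Math.floor(batchMaxSize * 1.25));
  }
}
```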
Reply: A few notes from my side:
- Keep in mind that the 16 ms stacks on top of the input roundtrip latency, and people might start complaining about lousy terminal response while typing. In the xterm.js demo app I found 5 ms helped a lot while not yet being noticeable; 15 ms already introduced a tiny lag when typing really fast.
- 4k is the default pty buffer size on many Linux distros, but you cannot rely on that (it can be changed per pty and system-wide). BSDs (and macOS) use an adaptive buffer size, typically ranging from 16k to 64k depending on the current data pressure.
Reply: Btw, 200k seems quite big to me; you might want to test this with `yes`, which triggers a very expensive 'lineFeed' event in the terminal for every second byte. Additionally, the event can be hooked by external handlers, which might introduce even more workload (not sure if you allow plugins to consume it).
Reply: I had the same concern (main reason I use kitty) but I haven't felt any lag so far. We'll have to keep an eye on feedback.

Good point, I actually have to remove the "Note..." part. It was happening in my original implementation, but it didn't feel right to go over a "max", so I fixed it to never exceed `BATCH_MAX_SIZE`.
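The "never exceed `BATCH_MAX_SIZE`" variant isn't shown in this diff. A minimal sketch of what it could look like, reusing `DataBatcher` and the constants from the diff above and splitting the decoded text across batches (decoding first so a multi-byte character can't be cut in half):

```js
// Sketch only, not the actual change: a write() that keeps every batch at or
// below BATCH_MAX_SIZE by splitting the decoded text across batches.
class StrictDataBatcher extends DataBatcher {
  write(chunk) {
    let text = this.decoder.write(chunk);

    while (text.length > 0) {
      const remaining = BATCH_MAX_SIZE - this.data.length;

      if (remaining <= 0) {
        // Current batch is full: send it out and start a fresh one.
        if (this.timeout) {
          clearTimeout(this.timeout);
          this.timeout = null;
        }
        this.flush();
        continue;
      }

      this.data += text.slice(0, remaining);
      text = text.slice(remaining);

      if (!this.timeout) {
        this.timeout = setTimeout(() => this.flush(), BATCH_DURATION_MS);
      }
    }
  }
}
```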
Reply: I was thinking the parser could generate "back pressure". Basically: "Hey, the chunks you are sending are too big for me to parse/render in one frame, please send smaller ones." This could be less important if we run the parser on a WebWorker, which I think should be the next step (the limiting factor is on the renderer process).

UPDATE: Might be better to implement this in xterm.js since it overlaps a lot with xtermjs/xterm.js#1818 (comment)
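To make the back-pressure idea concrete, here is a sketch only (none of this is in the PR): the ack channel, the limit of two in-flight batches, and the `pty.pause()`/`pty.resume()` calls, which only exist in newer node-pty releases, are all assumptions. The main process would stop reading from the pty while the renderer is behind:

```js
// Hypothetical back-pressure sketch, not part of this PR. Assumes `batcher`,
// `win` (a BrowserWindow) and `pty` (node-pty with pause()/resume()) in scope.
const {ipcMain} = require('electron');

const MAX_UNACKED_BATCHES = 2; // assumed limit on batches the renderer may owe
let unacked = 0;

batcher.on('flush', data => {
  unacked++;
  win.webContents.send('session data', data);
  if (unacked >= MAX_UNACKED_BATCHES) {
    // Renderer is falling behind: stop pulling data from the pty for now.
    pty.pause();
  }
});

// The renderer would send this ack once a batch has been parsed and rendered.
ipcMain.on('session data ack', () => {
  unacked = Math.max(0, unacked - 1);
  if (unacked < MAX_UNACKED_BATCHES) {
    pty.resume();
  }
});
```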
Reply: I just tested `yes` and the buffering definitely doesn't help (drops FPS from ~50 to ~30). Basically 200k is a good value for `find ~`, which is arguably more common output than `yes`. This is all more evidence for implementing @jerch's proposal here: xtermjs/xterm.js#1818 (comment)
Reply: Yeah, I think it's better suited in `Terminal.write`. We already do xon/xoff flow control there; imho it's better to keep it in one place. Otherwise every buffering step involved would have to implement flow control on its own to get a working back-pressure chain, which gets quite cumbersome with node's async nature.
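For reference, the "keep flow control in one place" direction roughly matches the write-callback pattern that later xterm.js releases expose. A sketch under those assumptions (the watermark values and the `pty.pause()`/`resume()` calls are illustrative, and `Terminal.write(data, callback)` is only available in newer xterm.js versions):

```js
// Sketch of consumer-driven flow control; watermarks and pty.pause()/resume()
// are assumptions. Assumes `pty` (node-pty) and `term` (xterm.js Terminal)
// are in scope, and that term.write(data, callback) is available.
const HIGH_WATERMARK = 100 * 1024; // pause the pty above this much pending data
const LOW_WATERMARK = 10 * 1024;   // resume once the parser has caught up

let pendingBytes = 0;

pty.on('data', data => {
  pendingBytes += data.length;
  term.write(data, () => {
    // Invoked by xterm.js after this chunk has been parsed.
    pendingBytes -= data.length;
    if (pendingBytes < LOW_WATERMARK) {
      pty.resume();
    }
  });
  if (pendingBytes > HIGH_WATERMARK) {
    pty.pause();
  }
});
```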