Performance - Data batching #3336

Merged (19 commits) on Dec 28, 2018

6 changes: 5 additions & 1 deletion app/rpc.js
@@ -34,7 +34,11 @@ class Server extends EventEmitter {
}

emit(ch, data) {
this.wc.send(this.id, {ch, data});
// This check is needed because data-batching can cause extra data to be
// emitted after the window has already closed
if (!this.win.isDestroyed()) {
this.wc.send(this.id, {ch, data});
}
}

destroy() {
65 changes: 60 additions & 5 deletions app/session.js
@@ -21,8 +21,60 @@ try {

const envFromConfig = config.getConfig().env || {};

// Max duration to batch session data before sending it to the renderer process.
const BATCH_DURATION_MS = 16;

// Max size of a session data batch. Note that this value can be exceeded by ~4k
// (chunk sizes seem to be 4k at the most)
const BATCH_MAX_SIZE = 200 * 1024;
@juancampa (Contributor, Author) commented on Dec 16, 2018:

I tried different values but this one (200k) had the best throughput. Using, e.g., 100k can lower the throughput by ~25% 🔥, but increasing it further (say 300k) lowers the framerate to below 30 FPS.

@juancampa (Contributor, Author) commented:

A future improvement could determine this value dynamically

Member commented:

@juancampa what strategies would you use for dynamic computation of this value? Maybe we can create an RFC issue?

Comment:

A few notes from my side:
Keep in mind that the 16 ms stacks on top of the input roundtrip latency, and people might start complaining about lousy terminal response while typing. In the xterm.js demo app I found 5 ms helped a lot while not yet being noticeable; 15 ms already introduced a tiny lag when typing really fast.

4k is the default pty buffer size on many Linux distros, but you cannot rely on that (it can be changed per pty and system-wide). BSDs (and macOS) use an adaptive buffer size, typically ranging from 16k to 64k depending on the current data pressure.

Comment:

Btw, 200k seems quite big to me; you might want to test this with `yes`, which triggers a very expensive 'lineFeed' event in the terminal for every second byte. Additionally, the event can be hooked by an external handler, which might introduce even more workload (not sure if you allow plugins to consume it).

@juancampa (Contributor, Author) commented:

> Keep in mind that the 16 ms stacks on top of the input roundtrip latency, and people might start complaining about lousy terminal response while typing. In the xterm.js demo app I found 5 ms helped a lot while not yet being noticeable; 15 ms already introduced a tiny lag when typing really fast.

I had the same concern (main reason I use kitty) but I haven't felt any lag so far. We'll have to keep an eye on feedback.

> 4k is the default pty buffer size on many Linux distros, but you cannot rely on that (it can be changed per pty and system-wide). BSDs (and macOS) use an adaptive buffer size, typically ranging from 16k to 64k depending on the current data pressure.

Good point. I actually have to remove the "Note..." part. It was happening in my original implementation, but it didn't feel right to go over a "max", so I fixed it to never exceed BATCH_MAX_SIZE.

@juancampa (Contributor, Author) commented on Dec 17, 2018:

> what strategies would you use for dynamic computation of this value? Maybe we can create an RFC issue?

I was thinking the parser could generate "back pressure". Basically: "Hey, the chunks you are sending are too big for me to parse/render in one frame, please send smaller ones." This could be less important if we run the parser on a WebWorker, which I think should be the next step (parsing is the limiting factor on the renderer process).

UPDATE:

Might be better to implement this in xterm.js since it overlaps a lot with xtermjs/xterm.js#1818 (comment)
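
A rough sketch of what that could look like on the main-process side; the feedback channel, the bounds, and the `AdaptiveDataBatcher` name are all hypothetical, not part of this PR:

```js
// Hypothetical sketch: the renderer reports how long the last batch took to
// parse/render, and the batcher shrinks or grows its max batch size in response.
class AdaptiveDataBatcher extends DataBatcher {
  constructor(uid) {
    super(uid);
    this.maxSize = BATCH_MAX_SIZE; // write() would compare against this instead of the constant
  }

  // Would be wired to an IPC message from the renderer, e.g.
  // ipcMain.on('batch-feedback', (event, frameMs) => batcher.onFeedback(frameMs));
  onFeedback(frameMs) {
    if (frameMs > 16) {
      // Renderer missed a frame: halve the batch size (floor at 16k).
      this.maxSize = Math.max(16 * 1024, Math.floor(this.maxSize / 2));
    } else {
      // Renderer had headroom: grow slowly back toward the static max.
      this.maxSize = Math.min(BATCH_MAX_SIZE, Math.floor(this.maxSize * 1.25));
    }
  }
}
```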

@juancampa (Contributor, Author) commented:

> Btw, 200k seems quite big to me; you might want to test this with `yes`, which triggers a very expensive 'lineFeed' event in the terminal for every second byte

I just tested `yes` and the buffering definitely doesn't help (it drops FPS from ~50 to ~30). Basically, 200k is a good value for `find ~`, which is arguably more common output than `yes`. This is all more evidence for implementing @jerch's proposal here: xtermjs/xterm.js#1818 (comment)

Comment:

Yeah, I think it's better suited to Terminal.write. We already do XON/XOFF flow control there; imho it's better to keep it in one place. Otherwise every buffering step involved would have to implement flow control on its own to get a working back-pressure chain, which gets quite cumbersome with Node's async nature.
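
For illustration only, here is roughly what one of those per-hop flow-control layers would look like on Hyper's side; the ack channel, the watermark value, and `pty.pause()`/`pty.resume()` are assumptions, not existing APIs in this codebase:

```js
// Hedged sketch of per-hop flow control (the duplication being argued against):
// stop reading from the pty once too many unacknowledged bytes are in flight,
// and resume when the renderer acknowledges parsed batches.
const HIGH_WATERMARK = 1024 * 1024; // 1 MB in flight before pausing (arbitrary)
let inFlight = 0;

batcher.on('flush', data => {
  inFlight += data.length;
  rpc.emit('session data', data);
  if (inFlight > HIGH_WATERMARK) {
    pty.pause(); // assumed pausable pty stream
  }
});

// Hypothetical ack sent by the renderer after Terminal.write has consumed a batch.
rpc.on('session data ack', bytes => {
  inFlight -= bytes;
  if (inFlight <= HIGH_WATERMARK) {
    pty.resume();
  }
});
```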


// Data coming from the pty is sent to the renderer process for further
// vt parsing and rendering. This class batches data to minimize the number of
// IPC calls. It also reduces GC pressure and CPU cost: each chunk is prefixed
// with the window ID which is then stripped on the renderer process and this
// overhead is reduced with batching.
class DataBatcher extends EventEmitter {
constructor(uid) {
super();
this.uid = uid;
this.decoder = new StringDecoder('utf8');
@jerch commented on Dec 17, 2018:

UTF8? Hyper is already using binary transport for the pty? We might add this to xterm.js eventually; I had some good results with it, lifting some burden from the input parsing (currently there are some nonsense back-and-forth conversions going on).

@juancampa (Contributor, Author) commented on Dec 17, 2018:

Not sure I follow this one. IIUC, Electron can't send Buffer objects to the renderer process, so we turn the binary input into a String. If we move parsing to a worker, we might be able to use SharedArrayBuffer to minimize conversions.
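
Roughly something like this, as a sketch (the worker file name and message shape are made up, and none of this is part of the PR):

```js
// Renderer side: hand raw bytes to a parser worker without copying, by
// transferring the underlying ArrayBuffer. A SharedArrayBuffer + Atomics ring
// buffer would avoid even the postMessage hop, at the cost of manual syncing.
const parserWorker = new Worker('vt-parser-worker.js'); // hypothetical file

function forwardChunk(bytes /* Uint8Array */) {
  // Ownership of bytes.buffer moves to the worker; no copy, no base64/JSON envelope.
  parserWorker.postMessage({buffer: bytes.buffer}, [bytes.buffer]);
}

// vt-parser-worker.js (hypothetical):
// self.onmessage = ({data}) => {
//   const text = new TextDecoder().decode(new Uint8Array(data.buffer));
//   // ...feed `text` (or the raw bytes) into the vt parser...
// };
```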

@jerch commented on Dec 17, 2018:

But then why are you dealing with UTF8 at all? node-pty is supposed to return JS strings already if the encoding is omitted (it defaults to UTF8, decoded by the underlying Socket).

So how do you send data over to the window process? Is it not able to handle binary data? Not a websocket? (I'm not used to Electron internals.)

Comment:

Bummer: Electron's IPC mechanism has supported ArrayBuffers since v3, but they get BASE64-encoded and wrapped in a JSON envelope (electron/electron#13055).

From my own tests, Electron performs worse than our demo app based on websockets (about half as fast); maybe the IPC mechanism is the culprit? Even strings get the JSON envelope, which isn't any good from a perf perspective. Maybe you want to try out our demo app and compare yourself; I'm not sure whether directly invoking node-pty in the renderer or falling back to websockets would solve this.

Ok, enough thread hijacking 😸

@juancampa (Contributor, Author) commented:

> but gets BASE64 encoded and a JSON envelope

Exactly :(

> falling back to websockets

IPC doesn't seem to cost that much though; V8 strings are pretty fast. In this profile of one frame, IPC only takes a tiny sliver (<0.5 ms), so there isn't a lot to gain there. It might become more prominent when we do WebWorkers, with the extra hop, but then again, we'll probably use SharedArrayBuffer for that.

[screenshot: profile of a single frame]

@juancampa (Contributor, Author) commented:

I just realized that the real gains are hidden in the string-parsing part (i.e. your fix, xtermjs/xterm.js#1796), so ignore that last part 😅

@jerch commented on Dec 17, 2018:

Yeah, that's the basic idea behind my optimization plan (xtermjs/xterm.js#1811): by getting rid of any string handling in the input chain and working directly on ArrayBuffers, we can lower the runtime needed for input handling by >50%.
This gets even better if we change to direct UTF8 binary input, as it would save the nonsense conversions. If you are interested, I have a heavily optimized playground branch where you can test it in the demo app (https://github.com/jerch/xterm.js/tree/tttt, also needs https://github.com/jerch/node-textdecode).
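
To make that concrete, a minimal sketch of what "no string handling" could mean: decoding UTF-8 bytes straight into a typed array of code points. The function name and the caller-provided output buffer are illustrative only, not xterm.js code:

```js
// Illustrative only: decode UTF-8 bytes directly into a Uint32Array of code
// points, skipping JS string allocation entirely. Error handling and chunk
// boundaries (multi-byte sequences split across chunks) are omitted for brevity.
function utf8ToCodePoints(bytes /* Uint8Array */, out /* Uint32Array */) {
  let written = 0;
  for (let i = 0; i < bytes.length; ) {
    const b = bytes[i];
    if (b < 0x80) {
      out[written++] = b; // 1-byte (ASCII)
      i += 1;
    } else if (b < 0xe0) {
      out[written++] = ((b & 0x1f) << 6) | (bytes[i + 1] & 0x3f); // 2-byte
      i += 2;
    } else if (b < 0xf0) {
      out[written++] =
        ((b & 0x0f) << 12) | ((bytes[i + 1] & 0x3f) << 6) | (bytes[i + 2] & 0x3f); // 3-byte
      i += 3;
    } else {
      out[written++] =
        ((b & 0x07) << 18) | ((bytes[i + 1] & 0x3f) << 12) |
        ((bytes[i + 2] & 0x3f) << 6) | (bytes[i + 3] & 0x3f); // 4-byte
      i += 4;
    }
  }
  return written; // number of code points written into `out`
}
```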

Comment:

If you cannot get it to run, here are my results: xtermjs/xterm.js#791 (comment) 😄


this.reset();
}

reset() {
this.data = this.uid;
this.timeout = null;
}

write(chunk) {
if (this.data.length + chunk.length >= BATCH_MAX_SIZE) {
// We've reached the max batch size. Flush it and start another one
if (this.timeout) {
clearTimeout(this.timeout);
this.timeout = null;
}
this.flush();
}

this.data += this.decoder.write(chunk);

if (!this.timeout) {
this.timeout = setTimeout(() => this.flush(), BATCH_DURATION_MS);
}
}

flush() {
// Reset before emitting to allow for potential reentrancy
const data = this.data;
this.reset();

this.emit('flush', data);
}
}

module.exports = class Session extends EventEmitter {
constructor({rows, cols: columns, cwd, shell, shellArgs}) {
constructor({uid, rows, cols: columns, cwd, shell, shellArgs}) {
const osLocale = require('os-locale');
super();
const baseEnv = Object.assign(
@@ -45,8 +97,6 @@ module.exports = class Session extends EventEmitter {
delete baseEnv.GOOGLE_API_KEY;
}

const decoder = new StringDecoder('utf8');

const defaultShellArgs = ['--login'];

try {
@@ -64,11 +114,16 @@ module.exports = class Session extends EventEmitter {
}
}

this.pty.on('data', data => {
this.batcher = new DataBatcher(uid);
this.pty.on('data', chunk => {
if (this.ended) {
return;
}
this.emit('data', decoder.write(data));
this.batcher.write(chunk);
});

this.batcher.on('flush', data => {
this.emit('data', data);
});

this.pty.on('exit', () => {
4 changes: 2 additions & 2 deletions app/ui/window.js
@@ -111,7 +111,7 @@ module.exports = class Window {
function createInitialSession() {
let {session, options} = createSession();
const initialEvents = [];
const handleData = data => initialEvents.push(['session data', options.uid + data]);
const handleData = data => initialEvents.push(['session data', data]);
const handleExit = () => initialEvents.push(['session exit']);
session.on('data', handleData);
session.on('exit', handleExit);
@@ -148,7 +148,7 @@
}

session.on('data', data => {
rpc.emit('session data', options.uid + data);
rpc.emit('session data', data);
});

session.on('exit', () => {