
Remove inner setTimeout(), prevents high CPU usage. #1447

Closed
wants to merge 3 commits

Conversation

@tmpfs (Contributor) commented May 12, 2018

It appears that the recursive timeout was leaking memory and causing
high CPU usage. Prior to this change, running top (via xterm) showed the
CPU usage for the WebKitProcess around 30-50%. Afterwards CPU usage is
between 5-20% and rendering is visually much faster, which is
particularly useful for ncurses applications.

All tests continue to pass.

@tmpfs (Contributor Author) commented May 12, 2018

Note that some long-running commands such as find . do not stream so well, as the rendering is too fast while it blocks on each batch (now that the inner setTimeout() is gone). To mitigate this I throttle a little on the server:

reader := func(ru rune, n int) (err error) {
  // Pause to give rendering some time
  time.Sleep(time.Microsecond)
  buffer := []byte(string(ru))
  return conn.WriteMessage(websocket.TextMessage, buffer)
}
p.Pipe(reader)

Thanks for xterm, I now have it running embedded in webkit2 using golang to proxy to a pty, and performance is not far off my actual terminal now 👍

@tmpfs (Contributor Author) commented May 12, 2018

BTW, if I switch the outer setTimeout for requestAnimationFrame then it feels even smoother, with less flickering of the cursor during render.
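For readers following along, the swap being described can be sketched as below. This is a hypothetical helper, not actual xterm.js code; `pickScheduler` and its injectable `raf` parameter exist only for illustration (and to make the fallback path explicit for environments without requestAnimationFrame, such as web workers):

```typescript
// Hypothetical sketch: prefer requestAnimationFrame for scheduling the
// next write batch, falling back to setTimeout(cb, 0) otherwise.
type Scheduler = (cb: () => void) => void;

function pickScheduler(raf: Scheduler | undefined): Scheduler {
  if (typeof raf === 'function') {
    // Aligning batch processing with the browser's paint cycle can
    // reduce cursor flicker during heavy output.
    return cb => raf(cb);
  }
  // Fallback where requestAnimationFrame is unavailable.
  return cb => setTimeout(cb, 0);
}
```

In a browser one would pass `window.requestAnimationFrame.bind(window)`; injecting the scheduler also makes the loop easy to unit-test.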

@Tyriar (Member) left a comment

@@ -1324,7 +1324,7 @@ export class Terminal extends EventEmitter implements ITerminal, IDisposable, II
}
if (this.writeBuffer.length > 0) {
// Allow renderer to catch up before processing the next batch
setTimeout(() => this._innerWrite(), 0);
@Tyriar (Member):

This timeout allows the renderer to catch up; without it there may not be a draw for some time, i.e. the frame rate can drop significantly.

@jerch (Member) May 12, 2018

Yes, this change basically reduces screen updates by 90%; now I can count the updates for my typical benchmark with ls -lR /usr/lib. Overall runtime is 10% better with less CPU usage, saved by less time spent painting (the greenish pie part). JS runtime is the same (so it is not the fault of the timeout per se).
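For context, the loop under discussion can be sketched roughly as follows. This is a simplified, hypothetical version with illustrative names, not the actual xterm.js source (the real loop lives in Terminal._innerWrite, shown in the diff above); the `handle` and `defer` parameters stand in for the parser call and the deferred self-invocation:

```typescript
// Simplified, hypothetical sketch of the batched write loop.
const WRITE_BATCH_SIZE = 300;

function innerWrite(
  writeBuffer: string[],
  handle: (data: string) => void,
  defer: (cb: () => void) => void  // e.g. cb => setTimeout(cb, 0)
): void {
  // Process one batch of pending writes.
  const batch = writeBuffer.splice(0, WRITE_BATCH_SIZE);
  for (const data of batch) {
    handle(data);
  }
  if (writeBuffer.length > 0) {
    // Yield to the event loop between batches so the renderer can paint;
    // this deferred call is the setTimeout the diff above removes.
    defer(() => innerWrite(writeBuffer, handle, defer));
  }
}
```

Removing the `defer` step makes parsing continuous, which is why the frame rate drops: the renderer never gets a turn until the buffer drains.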

@@ -1290,7 +1290,7 @@ export class Terminal extends EventEmitter implements ITerminal, IDisposable, II
// Kick off a write which will write all data in sequence recursively
this._writeInProgress = true;
// Kick off an async innerWrite so more writes can come in while processing data
setTimeout(() => {
@Tyriar (Member):

This could delay parsing by 1+ frames. We could maybe just remove this timeout instead?

@jerch (Member) May 12, 2018

This can't be removed as it is now; it is what gives the renderer time to do anything atm. (Removing it will completely disable updates while a command is running, which is really fast but probably not wanted, lol.)

@Tyriar (Member):

This is the timeout that starts a batch; would starting the batch immediately be no good?

@jerch (Member):

For my tests with ls -lR /usr/lib and find / the emulator went silent for a couple of seconds and only updated the screen once in a while (like every 3s; for ls it stayed black until the command had finished). I did not step-debug this; it seems the updates fall way behind without the timeout here.

@Tyriar Tyriar self-assigned this May 12, 2018
@jerch (Member) commented May 12, 2018

This PR tackles a tricky topic that might need further work and benchmarks to get right. It boils down to the following requirements:

  • xterm.js should do screen updates while a command is running
  • it must be interruptible as early as possible (and thus not accumulate tons of data before the interrupt is handled)
  • all of the above should have as little impact as possible on overall data throughput

@tmpfs
This was addressed several times in the past, and tbh I don't see the improvements you describe above with your changes (saving only 10% runtime while dropping 90% of the screen updates is questionable). In fact, on my machine the adjustments Tyriar made regarding this topic in the past are quite suitable; the emulator runs smoothly with screen updates and is interruptible at any time. Maybe your machine has a hard time rendering the stuff? Could you post the summary pie from the developer tab? What is your machine setup (electron/chrome version, OS, CPU, graphics)?

Imho this is also a matter of preference: do we want pure performance (fewer screen updates) vs. reliable behavior (more screen updates as feedback to the user)? Also, if the performance differs a lot between different setups, do we need a runtime benchmark to adjust it on the fly?

Edit:
This issue is also a candidate that could benefit from a webworker separation someday.

@tmpfs (Contributor Author) commented May 13, 2018

Thanks for looking into this and for the information. Clearly I am only just starting to understand the xterm code, and it is quite possible that the performance I was experiencing was very specific to my setup, which I will describe in more detail. If it turns out to be specific to my use case, maybe we could add this behaviour via option flags for the moment? @jerch, I like the webworker idea; maybe that will help.

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.0 Safari/605.1.15

My machine is fairly powerful (XPS 13 with 16GB RAM and 4 i7 cores):

CPU(s):                    4
Model name:            Intel(R) Core(TM) i7-7560U CPU @ 2.40GHz

Here is how my app works:

  1. Embed webkit2 as a GUI window using Golang (testing on Linux with GTK+3)
  2. Launch the application and load xterm from the internal web server
  3. Open a pty via HTTP GET
  4. Stream stdout/stderr from the pty to xterm via websocket
  5. Write the xterm input to the pty using webview bindings

Note that I am launching the GUI window fullscreen to really give it a workout ;)

I think the big difference is that I am not using websockets for writing to the pty, to get as close to near-native performance as possible.

Soon, I hope a friend will be compiling and running the app on OSX and then we can see the behaviour with webkit on OSX.

Update: Sorry, I just found out that the performance improvements I was seeing only apply when these changes are applied to the 3.3.0 tag; I can't get master anywhere near performant with my setup, with or without these changes.

@tmpfs (Contributor Author) commented May 13, 2018

Ok, so I've been looking into this much deeper and found out some stuff; it clearly appears to be webkit-specific.

I think we need to test for isSafari as well as isFirefox, because webkit does us the favour of "implementing" createImageBitmap as a stub that just throws promise rejections for each argument type (ie, it is a placeholder method). I have modified src/shared/utils/Browser.ts to add:

export const isSafari = !!~userAgent.indexOf('Safari');

And then updated generateCharAtlas with:

  if (!('createImageBitmap' in context) || isFirefox || isSafari) {

In combination with allowing WRITE_BATCH_SIZE to be set (ideally via an option), this gives me decent performance in webkit in my scenario.
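As a self-contained illustration of the guard described above (a hypothetical, testable variant; the real src/shared/utils/Browser.ts code reads navigator.userAgent at module load):

```typescript
// Hypothetical helper mirroring the isFirefox/isSafari guard discussed
// in this thread; parameters are injected so the logic is testable.
function shouldUseImageBitmap(
  hasCreateImageBitmap: boolean,
  userAgent: string
): boolean {
  const isFirefox = userAgent.indexOf('Firefox') !== -1;
  // WebKit ships createImageBitmap as a stub that rejects for every
  // argument type, so presence of the API alone is not enough.
  // Caveat: Chrome's UA string also contains "Safari", so a production
  // check would need to exclude Chrome; this crude match mirrors the
  // hack quoted above.
  const isSafari = userAgent.indexOf('Safari') !== -1;
  return hasCreateImageBitmap && !isFirefox && !isSafari;
}
```

The Chrome caveat is why a substring match on "Safari" is best treated as a stopgap rather than a final fix.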

If I put these less intrusive changes in a separate PR could you take a look please?

@Tyriar (Member) commented May 13, 2018

Imho this is also a matter of favour - do we want pure performance (less screen updates) vs. reliable behavior (more screen updates as feedback to the user).

When I optimize I tend to try to get the maximum frame rate/responsiveness while also minimizing the time it takes to respond to keystrokes (like ^C). The actual time it takes to run commands suffers directly because of my focus on FPS, but it never seemed to be that bad (+10-20% run time?).

It looks like @tmpfs is optimizing for CPU and run time which is kind of the opposite 😄, that's all about minimizing draws which is a big hit to perceived performance.

if (!('createImageBitmap' in context) || isFirefox || isSafari) {

Problems with Safari just came up in #1441; I removed Firefox from this since it was much slower via createImageBitmap. 👍 to changing this in another PR

In combination with allowing setting WRITE_BATCH_SIZE (ideally via an option) gives me decent performance in webkit in my scenario.

What approximate size seems to work best for you? I'd rather not expose such a low level setting as an option.

@jerch (Member) commented May 13, 2018

@tmpfs
Since you use websockets as transport, you might see a performance boost by debouncing/batching the data on the server side before sending it to the webkit component. At least with nodejs as the server this can speed things up 2-3x if the pty slave process sends small data packets (see #1402 for some details).

@Tyriar
+1 for keeping things snappy. Imho a runtime penalty of 10% is not such a big deal compared to the benefits of snappy output that can react to a keystroke in reasonable time. Btw I think the 10% I saw is because of the developer tab and perf measuring; without it the difference seems to be smaller.

@tmpfs (Contributor Author) commented May 14, 2018

@jerch, thanks so much for the pointer, that was indeed the problem!

So my mistake was treating it as a character stream. I've switched the read logic to read a buffer (with some additional logic to validate the buffer as utf8, otherwise I get websocket utf8 decode errors) and it is working very fast using vanilla xterm@3.3.0.

I am going to close this as unnecessary; however, there is one small issue related to the fix I mentioned above: I still see the promise rejections on createImageBitmap, like this:

Unhandled Promise Rejection: TypeError: createImageBitmap with ImageData is not implemented

The crude isSafari hack above removes these errors, but I wonder if there is a better way to do it. @jerch, maybe you could advise on the best way to fix these unwanted errors in webkit?

Thanks everyone for your help resolving this issue.

@tmpfs tmpfs closed this May 14, 2018
@Tyriar (Member) commented May 14, 2018

@tmpfs thanks for sparking the interesting discussion 😃

isSafari is good enough imo; we're not feature detecting at this point. For Safari it just seems slower to do it this way, right?

@vincentwoo (Contributor):

I would like to see these Safari bitmap issues resolved! Did we decide to ship an exception for Safari users, like Firefox?

@Tyriar (Member) commented May 23, 2018

@vincentwoo I don't think there's been a PR yet for it but I'm all for adding it.

@Tyriar (Member) commented May 24, 2018

#1469
