
Remove inner setTimeout(), prevents high CPU usage. #1447

Closed
wants to merge 3 commits

Conversation

@tmpfs (Contributor) commented May 12, 2018

It appears that the recursive timeout was leaking memory and causing
high CPU usage. Prior to this change, running top (via xterm) showed the
CPU usage for the WebKitProcess around 30-50%. Afterwards CPU usage is
between 5-20% and rendering is visually much faster, which is
particularly useful for ncurses applications.

All tests continue to pass.

@tmpfs (Contributor Author) commented May 12, 2018

Note that some long-running commands such as find . do not stream so well, as the rendering is too fast while it blocks on each batch (now that the inner setTimeout() is gone). To mitigate this I throttle a little on the server:

reader := func(ru rune, n int) (err error) {
  // Pause to give rendering some time
  time.Sleep(time.Microsecond)
  buffer := []byte(string(ru))
  return conn.WriteMessage(websocket.TextMessage, buffer)
}
p.Pipe(reader)

Thanks for xterm, I now have it running embedded in webkit2 using golang to proxy to a pty, and performance is not far off my actual terminal now 👍

@tmpfs (Contributor Author) commented May 12, 2018

BTW, if I switch the outer setTimeout for requestAnimationFrame then it feels even smoother, with less flickering of the cursor during render.
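For readers following along, the swap being described can be sketched as below. This is a hypothetical helper, not actual xterm.js code; `pickScheduler` and its injectable `raf` parameter exist only for illustration (and to make the fallback path explicit for environments without requestAnimationFrame, such as web workers):

```typescript
// Hypothetical sketch: prefer requestAnimationFrame for scheduling the
// next write batch, falling back to setTimeout(cb, 0) otherwise.
type Scheduler = (cb: () => void) => void;

function pickScheduler(raf: Scheduler | undefined): Scheduler {
  if (typeof raf === 'function') {
    // Aligning batch processing with the browser's paint cycle can
    // reduce cursor flicker during heavy output.
    return cb => raf(cb);
  }
  // Fallback where requestAnimationFrame is unavailable.
  return cb => setTimeout(cb, 0);
}
```

In a browser one would pass `window.requestAnimationFrame.bind(window)`; injecting the scheduler also makes the loop easy to unit-test.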

@Tyriar (Member) left a comment

@@ -1324,7 +1324,7 @@ export class Terminal extends EventEmitter implements ITerminal, IDisposable, II
}
if (this.writeBuffer.length > 0) {
// Allow renderer to catch up before processing the next batch
setTimeout(() => this._innerWrite(), 0);
@Tyriar (Member):

This timeout allows the renderer to catch up; without it there may not be a draw for some time, i.e. the frame rate can drop significantly.

@jerch (Member) May 12, 2018

Yes, this change basically reduces screen updates by 90%; now I can count the updates for my typical benchmark with ls -lR /usr/lib. Overall runtime is 10% better with less CPU usage, saved by less time spent painting (the greenish pie part). JS runtime is the same (so it is not the fault of the timeout per se).
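For context, the loop under discussion can be sketched roughly as follows. This is a simplified, hypothetical version with illustrative names, not the actual xterm.js source (the real loop lives in Terminal._innerWrite, shown in the diff above); the `handle` and `defer` parameters stand in for the parser call and the deferred self-invocation:

```typescript
// Simplified, hypothetical sketch of the batched write loop.
const WRITE_BATCH_SIZE = 300;

function innerWrite(
  writeBuffer: string[],
  handle: (data: string) => void,
  defer: (cb: () => void) => void  // e.g. cb => setTimeout(cb, 0)
): void {
  // Process one batch of pending writes.
  const batch = writeBuffer.splice(0, WRITE_BATCH_SIZE);
  for (const data of batch) {
    handle(data);
  }
  if (writeBuffer.length > 0) {
    // Yield to the event loop between batches so the renderer can paint;
    // this deferred call is the setTimeout the diff above removes.
    defer(() => innerWrite(writeBuffer, handle, defer));
  }
}
```

Removing the `defer` step makes parsing continuous, which is why the frame rate drops: the renderer never gets a turn until the buffer drains.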

@@ -1290,7 +1290,7 @@ export class Terminal extends EventEmitter implements ITerminal, IDisposable, II
// Kick off a write which will write all data in sequence recursively
this._writeInProgress = true;
// Kick off an async innerWrite so more writes can come in while processing data
setTimeout(() => {
@Tyriar (Member):

This could delay parsing by 1+ frames. We could maybe just remove this timeout instead?

@jerch (Member) May 12, 2018

This can't be removed as it is now; it is what gives the renderer time to do anything atm. (Removing it will completely disable updates while a command is running, which is really fast but probably not wanted, lol.)

@Tyriar (Member):

This is the timeout that starts a batch; would starting the batch immediately be no good?

@jerch (Member):

For my tests with ls -lR /usr/lib and find / the emulator went silent for a couple of seconds and only updated the screen once in a while (like every 3s; for ls it stayed black until the command had finished). I did not step-debug this; it seems the updates fall way behind without the timeout here.

@Tyriar Tyriar self-assigned this May 12, 2018
@jerch (Member) commented May 12, 2018

This PR tackles a tricky topic that might need further work and benchmarks to get right. It boils down to the following requirements:

  • xterm.js should do screen updates while a command is running
  • it must be interruptible as early as possible (and thus not accumulate tons of data before the interrupt is handled)
  • all of the above should have as little impact as possible on overall data throughput

@tmpfs
This was addressed several times in the past, and tbh I don't see the improvements you describe above with your changes (saving only 10% runtime while dropping 90% of the screen updates is questionable). In fact, on my machine the adjustments Tyriar made regarding this topic in the past are quite suitable; the emulator runs smoothly with screen updates and is interruptible at any time. Maybe your machine has a hard time rendering the stuff? Could you post the summary pie from the developer tab? What is your machine setup (electron/chrome version, OS, CPU, graphics)?

Imho this is also a matter of preference: do we want pure performance (fewer screen updates) vs. reliable behavior (more screen updates as feedback to the user)? Also, if the performance differs a lot between different setups, do we need a runtime benchmark to adjust it on the fly?

Edit:
This issue is also a candidate that could benefit from a webworker separation someday.

@tmpfs (Contributor Author) commented May 13, 2018

Thanks for looking into this and for the information. Clearly I am only just starting to understand the xterm code, and it is quite possible that the performance I was experiencing was very specific to my setup, which I will describe in more detail. If it turns out to be specific to my use case, maybe we could add this behaviour via option flags for the moment? @jerch, I like the webworker idea; maybe that will help.

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.0 Safari/605.1.15

My machine is fairly powerful (XPS 13 with 16GB RAM and 4 i7 cores):

CPU(s):                    4
Model name:            Intel(R) Core(TM) i7-7560U CPU @ 2.40GHz

Here is how my app works:

  1. Embed webkit2 as a GUI window using Golang (testing on Linux with GTK+3)
  2. Launch the application and load xterm from the internal web server
  3. Open a pty via HTTP GET
  4. Stream stdout/stderr from the pty to xterm via websocket
  5. Write the xterm input to the pty using webview bindings

Note that I am launching the GUI window fullscreen to really give it a workout ;)

I think the big difference is that I am not using websockets for writing to the pty, to get as close to near-native performance as possible.

Soon, I hope a friend will be compiling and running the app on OSX and then we can see the behaviour with webkit on OSX.

Update: Sorry, I just found out that the performance improvements I was seeing only apply when these changes are applied to the 3.3.0 tag; I can't get master anywhere near performant with my setup, with or without these changes.

@tmpfs (Contributor Author) commented May 13, 2018

Ok, so I've been looking into this much deeper and found out some stuff; it clearly appears to be webkit-specific.

I think we need to test for isSafari as well as isFirefox, because webkit does us the favour of "implementing" createImageBitmap as a stub that just throws promise rejections for each argument type (ie, it is a placeholder method). I have modified src/shared/utils/Browser.ts to add:

export const isSafari = !!~userAgent.indexOf('Safari');

And then updated generateCharAtlas with:

  if (!('createImageBitmap' in context) || isFirefox || isSafari) {

In combination with allowing WRITE_BATCH_SIZE to be set (ideally via an option), this gives me decent performance in webkit in my scenario.
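As a self-contained illustration of the guard described above (a hypothetical, testable variant; the real src/shared/utils/Browser.ts code reads navigator.userAgent at module load):

```typescript
// Hypothetical helper mirroring the isFirefox/isSafari guard discussed
// in this thread; parameters are injected so the logic is testable.
function shouldUseImageBitmap(
  hasCreateImageBitmap: boolean,
  userAgent: string
): boolean {
  const isFirefox = userAgent.indexOf('Firefox') !== -1;
  // WebKit ships createImageBitmap as a stub that rejects for every
  // argument type, so presence of the API alone is not enough.
  // Caveat: Chrome's UA string also contains "Safari", so a production
  // check would need to exclude Chrome; this crude match mirrors the
  // hack quoted above.
  const isSafari = userAgent.indexOf('Safari') !== -1;
  return hasCreateImageBitmap && !isFirefox && !isSafari;
}
```

The Chrome caveat is why a substring match on "Safari" is best treated as a stopgap rather than a final fix.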

If I put these less intrusive changes in a separate PR could you take a look please?

@Tyriar (Member) commented May 13, 2018

Imho this is also a matter of favour - do we want pure performance (less screen updates) vs. reliable behavior (more screen updates as feedback to the user).

When I optimize I tend to try to get the maximum frame rate/responsiveness while also minimizing the time it takes to respond to keystrokes (like ^C). The actual time it takes to run commands suffers directly because of my focus on FPS, but it never seemed to be that bad (+10-20% run time?).

It looks like @tmpfs is optimizing for CPU and run time which is kind of the opposite 😄, that's all about minimizing draws which is a big hit to perceived performance.

if (!('createImageBitmap' in context) || isFirefox || isSafari) {

Problems with Safari just came up in #1441; I removed Firefox from this since it was much slower via createImageBitmap. 👍 to changing this in another PR

In combination with allowing setting WRITE_BATCH_SIZE (ideally via an option) gives me decent performance in webkit in my scenario.

What approximate size seems to work best for you? I'd rather not expose such a low level setting as an option.

@jerch (Member) commented May 13, 2018

@tmpfs
Since you use websockets as transport, you might see a performance boost by debouncing/batching the data on the server side before sending it to the webkit component. At least with nodejs as the server this can speed things up 2-3x if the pty slave process sends small data packets (see #1402 for some details).

@Tyriar
+1 for keeping things snappy. Imho a runtime penalty of 10% is not such a big deal compared to the benefits of snappy output that can react to a keystroke in reasonable time. Btw I think the 10% I saw is because of the developer tab and perf measuring; without it the difference seems to be smaller.

@tmpfs (Contributor Author) commented May 14, 2018

@jerch, thanks so much for the pointer, that was indeed the problem!

So my mistake was treating it as a character stream. I've switched the read logic to read a buffer (with some additional logic to validate the buffer as utf8, otherwise I get websocket utf8 decode errors) and it is working very fast using vanilla xterm@3.3.0.

I am going to close this as unnecessary; however, there is one small issue related to the fix I mentioned above: I still see the promise rejections on createImageBitmap, like this:

Unhandled Promise Rejection: TypeError: createImageBitmap with ImageData is not implemented

The crude isSafari hack above removes these errors, but I wonder if there is a better way to do it. @jerch, maybe you could advise on the best way to fix these unwanted errors in webkit?

Thanks everyone for your help resolving this issue.

@tmpfs tmpfs closed this May 14, 2018
@Tyriar (Member) commented May 14, 2018

@tmpfs thanks for sparking the interesting discussion 😃

isSafari is good enough imo; we're not feature detecting at this point. For Safari it just seems slower to do it this way, right?

@vincentwoo (Contributor):

I would like to see these Safari bitmap issues resolved! Did we decide to ship an exception for Safari users, like Firefox?

@Tyriar (Member) commented May 23, 2018

@vincentwoo I don't think there's been a PR yet for it but I'm all for adding it.

@Tyriar (Member) commented May 24, 2018

#1469
