Ideas to improve performance of Ferrisetw #25

daladim · 2022-08-12T15:16:40Z

Here are a few ideas I had when reading the code before profiling it.
Feel free to add any remarks and comment :)

How important and efficient are all these ideas?
TODO: use a profiler to benchmark the few places that look time-consuming.

When a callback is called (how is the Schema built?)

EVENT_RECORD is Copy. Depending on how the compiler optimizes it, it may be copied at every function call:

- ctx.on_event(*event_record)
- prov.on_event(record, locator), once for each provider
- callbacks.iter_mut().for_each(|cb| cb(record, locator)), once for each callback of each provider

Even in cases where there is a single provider with a single callback, that may be quite a lot of copies (especially as EVENT_RECORD is quite large).

Solution:

To be sure we avoid copies (regardless of how the compiler might happen to optimize it), we could pass a &EVENT_RECORD instead.
This would require Schema to not own the record.

That's for the event payload part.
Considering the ETW schema, it is properly cached in the SchemaLocator and is retrieved quickly.
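
To illustrate the by-reference idea, here is a minimal sketch. FakeEventRecord, Provider, Callback and dispatch are hypothetical stand-ins and not the actual ferrisetw or Windows types; the point is only that a borrowed record cannot be copied anywhere along the call chain, whatever the optimizer does.

```rust
// Hypothetical stand-in for the Windows EVENT_RECORD struct. The real one
// comes from the Windows bindings; this one only mimics "large and Copy".
#[derive(Clone, Copy)]
struct FakeEventRecord {
    event_id: u16,
    payload: [u8; 256],
}

// Hypothetical callback type: it borrows the record instead of taking it by value.
type Callback = Box<dyn FnMut(&FakeEventRecord)>;

// Hypothetical provider holding its user callbacks.
struct Provider {
    callbacks: Vec<Callback>,
}

impl Provider {
    // The record is borrowed, so forwarding it to each callback only passes a pointer.
    fn on_event(&mut self, record: &FakeEventRecord) {
        self.callbacks.iter_mut().for_each(|cb| cb(record));
    }
}

// Dispatches one record to every callback of every provider, by reference.
fn dispatch(providers: &mut [Provider], record: &FakeEventRecord) {
    for prov in providers.iter_mut() {
        prov.on_event(record);
    }
}
```

With this shape, the cost of a dispatch no longer depends on the size of the record, only on the number of providers and callbacks.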

When a Parser is created

One of the first steps in the callback is to call Parser::create(&schema). This:

- copies the user_buffer (i.e. the actual event payload). This might be avoided (only take a reference to it?)
- calls PropertyIter::enum_properties() for every event record, although this only depends on the schema, not on the record itself!
  - that's costly (because enum_properties() builds a Vec<Property>)
  - (BTW that's not what Rust usually calls an "Iterator", as this does not implement Iterator. Change its name?)
  - Solutions:
    - ~~either build it once per actual schema (not per RecordAndSchema)~~ (not possible, see next comment)
      ~~Here as well, splitting Schema from the EventRecord would be a good thing~~
    - do this lazily, only when/if we require a property to be parsed (maybe too much work for little benefit: there should not be tons of different SchemaKeys for a given trace, so having a little work done at the first item of every kind should be kinda OK)
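
The lazy option could look roughly like the sketch below. It assumes Rust 1.70 or later for std::cell::OnceCell. LazyParser and its simplified Property are hypothetical, not the ferrisetw types: the property list is only built the first time something asks for it, so callbacks that never parse a property never pay for the Vec.

```rust
use std::cell::OnceCell;

// Hypothetical, simplified property description (not the ferrisetw Property).
#[derive(Debug, Clone, PartialEq)]
struct Property {
    name: String,
}

// Hypothetical parser that defers the costly enumeration until first use.
struct LazyParser<'a> {
    // Stand-in for whatever the schema exposes about its properties.
    property_names: &'a [&'a str],
    // Empty until `properties()` is called for the first time.
    properties: OnceCell<Vec<Property>>,
}

impl<'a> LazyParser<'a> {
    // Creating the parser is cheap: nothing is enumerated here.
    fn new(property_names: &'a [&'a str]) -> Self {
        LazyParser {
            property_names,
            properties: OnceCell::new(),
        }
    }

    // Builds the Vec on the first call, then returns the cached one.
    fn properties(&self) -> &[Property] {
        self.properties.get_or_init(|| {
            self.property_names
                .iter()
                .map(|n| Property { name: (*n).to_owned() })
                .collect()
        })
    }

    // Tells whether the enumeration has already happened.
    fn is_enumerated(&self) -> bool {
        self.properties.get().is_some()
    }
}
```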

When a Property is accessed

parser.try_parse(...) does many things, but most of the work is done in find_property().

- Fortunately, its result is cached in the Parser... which depends on the event record
- Could it (or most of its event-independent work) be cached in the ETW schema instead?
  This would require reviewing the code and splitting it into two parts: the record-dependent and the record-independent code
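
As a rough sketch of that split, the record-independent part could be something like a name-to-index table built once per kind of event and shared between parsers. SchemaInfo and RecordParser are hypothetical names, and whether the real find_property() can be split this cleanly is an assumption.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical record-independent data, built once per kind of event.
struct SchemaInfo {
    index_by_name: HashMap<String, usize>,
}

impl SchemaInfo {
    // Maps every property name to its position in the schema.
    fn new(names: &[&str]) -> Self {
        let index_by_name = names
            .iter()
            .enumerate()
            .map(|(i, n)| ((*n).to_owned(), i))
            .collect();
        SchemaInfo { index_by_name }
    }
}

// Hypothetical record-dependent part: one per event record, cheap to create.
struct RecordParser<'a> {
    schema: Arc<SchemaInfo>,
    payload: &'a [u8],
}

impl<'a> RecordParser<'a> {
    // The name lookup never touches the payload, so it can use the shared table.
    fn find_property(&self, name: &str) -> Option<usize> {
        self.schema.index_by_name.get(name).copied()
    }

    // The payload itself stays per-record.
    fn payload_len(&self) -> usize {
        self.payload.len()
    }
}
```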

Note: API changes

Currently, the callbacks are passed an EVENT_RECORD and a SchemaLocator.
As stated in a TODO in the code, this is not straightforward. We could/should:

- (Pass an &EVENT_RECORD, see above)
- ~~Do not pass the SchemaLocator, but the Schema directly~~ (bad idea, some callbacks do not need the Schema. Let's keep giving them the ability to retrieve it or not)
- This Schema would probably not own the event record (nor a ref to it).
- Note: passing an already built Parser instead of a Schema is probably not a good idea. The end user may want to avoid its creation on most events, and create it only for e.g. the event IDs that interest them
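
The last point can be sketched as follows. FakeRecord, FakeParser, INTERESTING_EVENT_ID and on_event_callback are hypothetical stand-ins rather than the ferrisetw API: the callback filters on the event ID first and only creates a parser for the events it cares about.

```rust
// Hypothetical event ID the user cares about.
const INTERESTING_EVENT_ID: u16 = 4688;

// Hypothetical stand-in for an event record.
struct FakeRecord {
    event_id: u16,
    payload: Vec<u8>,
}

// Hypothetical parser that borrows the payload rather than copying it.
struct FakeParser<'a> {
    payload: &'a [u8],
}

impl<'a> FakeParser<'a> {
    fn create(record: &'a FakeRecord) -> Self {
        FakeParser { payload: &record.payload }
    }

    // Reads the first four bytes as a little-endian u32, if present.
    fn first_u32(&self) -> Option<u32> {
        let b = self.payload.get(..4)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}

// Hypothetical user callback: uninteresting events return before any parser exists.
fn on_event_callback(record: &FakeRecord) -> Option<u32> {
    if record.event_id != INTERESTING_EVENT_ID {
        return None;
    }
    let parser = FakeParser::create(record);
    parser.first_u32()
}
```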

daladim · 2022-09-16T12:32:47Z

Most of this is done in upcoming MRs.
I overlooked the fact that events may have variable-length properties. Thus, a Parser cannot simply cache the offsets of a property once and reuse them for every similar EventRecord. The Parser does need to hold a reference to the EventRecord, and dissect each one differently.

There is one optimization we could do however. For later:

Detect whether the first properties of a kind of EventRecord have a fixed length. Cache their locations and share them with all the Parsers of the same kind (but let the Parsers extract all properties past these "fixed location properties" separately for every EventRecord).
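
A minimal sketch of this idea, under simplifying assumptions: properties are either fixed-size or NUL-terminated byte strings, and Len, FixedPrefix, fixed_prefix and property_offset are hypothetical names. The offsets of the leading fixed-length properties are computed once per kind of event; anything past them is walked again for each payload.

```rust
// Hypothetical property layouts: a known size, or a NUL-terminated byte string.
#[derive(Clone, Copy)]
enum Len {
    Fixed(usize),
    NulTerminated,
}

// Record-independent cache: offsets of the leading fixed-length properties,
// plus the offset at which the first variable-length property starts.
struct FixedPrefix {
    offsets: Vec<usize>,
    end: usize,
}

// Computed once per kind of event and shared between parsers.
fn fixed_prefix(layout: &[Len]) -> FixedPrefix {
    let mut offsets = Vec::new();
    let mut cursor = 0;
    for len in layout {
        match *len {
            Len::Fixed(n) => {
                offsets.push(cursor);
                cursor += n;
            }
            Len::NulTerminated => break,
        }
    }
    FixedPrefix { offsets, end: cursor }
}

// Returns the offset of property `index` in `payload`, using the cache where
// possible and walking the record-dependent part otherwise.
fn property_offset(
    layout: &[Len],
    prefix: &FixedPrefix,
    payload: &[u8],
    index: usize,
) -> Option<usize> {
    if index >= layout.len() {
        return None;
    }
    if let Some(&offset) = prefix.offsets.get(index) {
        return Some(offset);
    }
    let mut cursor = prefix.end;
    for len in &layout[prefix.offsets.len()..index] {
        cursor += match *len {
            Len::Fixed(n) => n,
            // Length of the string including its terminating NUL byte.
            Len::NulTerminated => payload.get(cursor..)?.iter().position(|&b| b == 0)? + 1,
        };
    }
    Some(cursor)
}
```

The benefit depends on how many properties precede the first variable-length one, which is presumably why this was left for later.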

daladim · 2023-01-13T16:52:23Z

Most of it has been done, and is included in ferrisetw 1.0.
See #39 (comment): parsing events is now up to 4 or 12 times faster, depending on their content.

I'm leaving this issue open because there is still a little room for improvement. But gaining performance is becoming harder and harder now.

daladim mentioned this issue on Aug 12, 2022:

- Ideas for the next major release #18 (closed, 9 tasks)

This was referenced on Sep 19, 2022:

- Callbacks now use a reference to an EVENT_RECORD #36 (merged)
- Reviewed parser structs #39 (merged)

daladim mentioned this issue on Oct 7, 2022:

- Improved perfs #42 (merged)
