Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ideas to improve performance of Ferrisetw #25

Open
4 of 8 tasks
daladim opened this issue Aug 12, 2022 · 2 comments
Open
4 of 8 tasks

Ideas to improve performance of Ferrisetw #25

daladim opened this issue Aug 12, 2022 · 2 comments

Comments

@daladim
Copy link
Collaborator

daladim commented Aug 12, 2022

Here are a few ideas I had when reading the code before profiling it.
Feel free to add any remarks and comment :)

How important and efficient are all these ideas?
TODO: use a profiler to benchmark the few places that look time-consuming.

When a callback is called (how is the Schema built?)

EVENT_RECORD is Copy. Depending on how the compiler optimizes it, it is possibly copied at every function call:

  • ctx.on_event(*event_record);
  • prov.on_event(record, locator);, once for each provider
  • callbacks.iter_mut().for_each(|cb| cb(record, locator)), once for each callback of each provider
    Even in cases where there is a single provider with a single callback, that may be quite a lot of copies (especially as EVENT_RECORD is quite large).

Solution:

  • To be sure we avoid copies (regardless of how the compiler might happen to optimize it), we could change this to a &EVENT_RECORD.
    This would require Schema to not own it

That's for the event payload part.
Considering the ETW schema, it is properly cached in the SchemaLocator and is retrieved quickly.

When a Parser is created

One of the first steps in the callback is to call Parser::create(&schema). This

  • copies the user_buffer (i.e. the actual event payload). This might be avoided (only take a reference to it?)
  • calls PropertyIter::enum_properties() for every event record, although this only depends on the schema, not on the record itself!
    • that's costly (because enum_properties() builds a Vec<Property>)
    • (BTW that's not what Rust usally calls an "Iterator", as this does not implement Iterator. Change its name?)
    • Solutions:
      • either build it one per actual schema (not per RecordAndSchema) (not possible, see next comment)
        Here as well, splitting Schema from the EventRecord would be a good thing
      • do this lazily only when/if we require a property to be parsed (maybe too much work for little benefit, there should not be tons of different SchemaKeys for a given trace, having a little work done at the first item of every kind should be kinda OK)

When a Property is accessed

parser.try_parse(...) does many things. But most work is done in find_property()

  • Hopefully, it is cached in the Parser...which depends on the event record
  • Could it be cached (or most of its event-independant work) in the ETW schema instead?
    This would require reviewing the code, and split it into two parts: the record-dependant and record-independant code

Note: API changes

Currently, the callbacks are passed an EVENT_RECORD and a SchemaLocator.
As stated in a TODO in the code, this is not straighforward. We could/should:

  • (Pass an &EVENT_RECORD, see above)
  • Do not pass the SchemaLocator, but the Schema directly (bad idea, some callbacks do not need the Schema. Let's keep giving them the ability to retrieve it or not)
  • This Schema would probably not own the event record (nor a ref to it).
  • Note: passing an already built Parser instead of a Schema is probably not a good idea. The end user may want to avoid its creation on most events, and create it only for e.g. event IDs that interest him
@daladim
Copy link
Collaborator Author

daladim commented Sep 16, 2022

Most of this is done in upcoming MRs.
I overlooked the fact that events may have variable-length properties. Thus, a Parser cannot simply cache the offsets of a property once, and use them for every similar EventRecord. The Parser does need to have a reference on the EventRecord, and dissect each one differently.

There is one optimization we could do however. For later:

  • Detect if the first properties of a kind of EventRecord have fixed length. Cache them and share them with all the Parsers of a same kind. (but leave the Parsers extract all properties past these "fixed location properties" differently for every EventRecord

@daladim
Copy link
Collaborator Author

daladim commented Jan 13, 2023

Most of it has been done, and is included in ferrisetw 1.0.
See #39 (comment), parsing events are up to 4 or 12 times faster, depending on their content.

I'm leaving this issue open because there is still little room for improvement. But gaining perfs is becoming harder and harder now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant