Performance enhancements #557
On the topic of performance, there is a disconnect between ELF parsing proper and DWARF parsing. ELF parsing, at least in the typical use case, happens against a file stream. DWARF parsing is all in memory (barring exotic scenarios where users monkeypatch or subclass But that would pretty much mean abandoning
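To make the distinction concrete, here is a minimal sketch. It does not reproduce pyelftools' actual code: the section is described by a hypothetical file offset and size, and the record is a made-up 6-byte header loosely modeled on the start of a 32-bit DWARF unit. Stream-style parsing seeks in the file and reads only the bytes it needs. In-memory parsing first slurps the whole section into a `bytes` object and then parses from a `BytesIO` over it.

```python
import io
import struct

# Hypothetical fixed-size record: 4-byte unit length + 2-byte version,
# little-endian, loosely modeled on a 32-bit DWARF unit header.
HEADER = struct.Struct("<IH")


def parse_from_file_stream(stream, section_offset):
    """Stream style: seek into the file and read just this record."""
    stream.seek(section_offset)
    return HEADER.unpack(stream.read(HEADER.size))


def parse_slurped(stream, section_offset, section_size):
    """In-memory style: read the entire section, then parse from a BytesIO."""
    stream.seek(section_offset)
    section_data = stream.read(section_size)  # whole section now lives in memory
    mem_stream = io.BytesIO(section_data)
    return HEADER.unpack(mem_stream.read(HEADER.size))


# Usage: a fake "file" with 16 bytes of padding before the section.
payload = b"\x00" * 16 + HEADER.pack(0x1234, 4) + b"\xff" * 10
f = io.BytesIO(payload)
print(parse_from_file_stream(f, 16))            # (4660, 4)
print(parse_slurped(f, 16, len(payload) - 16))  # (4660, 4)
```

Both return the same values. The difference is memory footprint: the slurping variant holds the whole section, which is what becomes a problem for very large binaries.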
Yes, I'd rather keep this capability. A more interesting direction would be to enable more incremental DWARF parsing without slurping whole sections into memory.
Reliance on construct is not a capability per se; it's more of an implementation detail. The DWARF parser mostly spits out native Python structures: lists,

OBTW, construct technically has a parse-from-buffer method. Only it works by constructing a

Anyway, were I to implement buffer-style parsing, I wouldn't be getting rid of construct altogether; compound datatypes can stay. I'd implement a buffer+position object that walks like a stream but doesn't quite quack like one, and I'd teach the primitive type parsers (there are just a handful) to recognize it. The compound parsers -

I gave some thought to incremental parsing of DWARF. One big obstacle is the set of transforms that DWARF sections undergo: two decompression hooks, and relocations (there's also the phantom bytes thing, but that's very much an edge case). Compressed streams are not seekable. Also, I'm assuming we are talking about support for extra large binaries here; were we to implement a no-slurp mode, I'm afraid it would have to be accompanied by a no-cache mode, at least no CU/DIE cache. Dropping the abbrev cache, I'm afraid, would be too costly. But let's assume there are no transforms. I had three possible designs in mind:
Which one do you think sounds the least crazy?
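The "buffer+position object" idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed implementation: `BufferCursor`, `parse_u8`/`parse_u16`/`parse_u32` and `parse_uleb128` are hypothetical names, byte order is fixed to little-endian for brevity, and compound parsers (which would stay on construct) are left out. The cursor keeps an offset like a stream does, but primitives decode in place with `struct.unpack_from` instead of calling `read()` and allocating a new `bytes` object per field.

```python
import struct


class BufferCursor:
    """Hypothetical buffer+position object: has tell()/seek() like a stream,
    but primitives decode straight from the underlying buffer."""

    __slots__ = ("buf", "pos")

    def __init__(self, data, pos=0):
        self.buf = memoryview(data)  # zero-copy view over bytes/bytearray/mmap
        self.pos = pos

    def tell(self):
        return self.pos

    def seek(self, pos):
        self.pos = pos


def _make_fixed_parser(fmt):
    """Build a primitive parser for a fixed-size type that advances the cursor."""
    s = struct.Struct(fmt)

    def parse(cur):
        (value,) = s.unpack_from(cur.buf, cur.pos)
        cur.pos += s.size
        return value

    return parse


# The "handful" of fixed-size primitives (little-endian only in this sketch).
parse_u8 = _make_fixed_parser("<B")
parse_u16 = _make_fixed_parser("<H")
parse_u32 = _make_fixed_parser("<I")


def parse_uleb128(cur):
    """Decode an unsigned LEB128 value, the variable-length encoding DWARF uses."""
    result = 0
    shift = 0
    while True:
        byte = cur.buf[cur.pos]  # indexing a memoryview of bytes yields an int
        cur.pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


# Usage: a u8, a u16, then the ULEB128 encoding of 624485.
cur = BufferCursor(bytes([0x05, 0x34, 0x12, 0xE5, 0x8E, 0x26]))
print(parse_u8(cur))       # 5
print(parse_u16(cur))      # 4660
print(parse_uleb128(cur))  # 624485
print(cur.tell())          # 6
```

Because the cursor wraps anything supporting the buffer protocol, the same primitives would work over a `bytes` object or an `mmap`, which is what makes this design compatible with a no-slurp mode when there are no transforms.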
The mmap idea sounds intriguing.
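For reference, the core of an mmap-based approach in plain Python might look like the sketch below. `map_section` is a hypothetical helper and the section offset/size are made up; the point is that a section becomes a zero-copy `memoryview` over a read-only mapping, so the OS pages data in on access instead of the parser reading the whole section up front. As noted above, this only helps for sections that need no decompression or relocation.

```python
import mmap
import os
import struct
import tempfile


def map_section(path, offset, size):
    """Hypothetical helper: map a file read-only and return the mmap plus a
    zero-copy view of one section. Nothing is read until bytes are accessed."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The mapping remains valid after the file object is closed.
    return mm, memoryview(mm)[offset:offset + size]


# Usage: build a throwaway file and decode a u32 from the "section".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 8 + struct.pack("<I", 0xDEADBEEF) + b"\x00" * 4)
    path = tmp.name

mm, section = map_section(path, 8, 8)
(value,) = struct.unpack_from("<I", section, 0)
section.release()  # exported views must be released before closing the mmap
mm.close()
os.remove(path)
print(hex(value))  # 0xdeadbeef
```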
Basically already done this in #481 :)
I made the following enhancements, roughly in order of impact:
- `struct_parse` calls, and replaced all points of usage
- `stream_preserve` for streams in auxiliary sections
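The exact wording of the first item did not survive, so the following only illustrates one plausible reading: taking a per-call parse wrapper out of a hot loop. `wrapped_parse`, `ParseError` and the two `read_u32s_*` functions are hypothetical and not pyelftools code. The wrapper pays for an extra Python call, a `try` block and an optional seek on every field; the inlined loop does the read and unpack directly at the call site and produces the same values.

```python
import io
import struct

U32 = struct.Struct("<I")


class ParseError(Exception):
    """Hypothetical library-level parse error."""


def wrapped_parse(fmt, stream, stream_pos=None):
    """Hypothetical generic wrapper: optional seek, parse, translate errors."""
    try:
        if stream_pos is not None:
            stream.seek(stream_pos)
        return fmt.unpack(stream.read(fmt.size))[0]
    except struct.error as e:
        raise ParseError(str(e)) from e


def read_u32s_wrapped(stream, count):
    # One wrapper call (and its try/except overhead) per element.
    return [wrapped_parse(U32, stream) for _ in range(count)]


def read_u32s_inlined(stream, count):
    # Inlined at the call site; a short read still surfaces as struct.error.
    unpack = U32.unpack
    read = stream.read
    return [unpack(read(U32.size))[0] for _ in range(count)]


# Usage: both variants decode the same three values.
data = struct.pack("<3I", 1, 2, 3)
print(read_u32s_wrapped(io.BytesIO(data), 3))  # [1, 2, 3]
print(read_u32s_inlined(io.BytesIO(data), 3))  # [1, 2, 3]
```

The trade-off is that error translation now has to happen once at a higher level (or not at all) rather than per field.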