-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store resources in provenance #996
Conversation
9ba826e
to
fadaa35
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #996 +/- ##
==========================================
- Coverage 93.17% 93.16% -0.02%
==========================================
Files 18 18
Lines 6374 6462 +88
Branches 1088 1097 +9
==========================================
+ Hits 5939 6020 +81
- Misses 296 300 +4
- Partials 139 142 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. I think we can make the changes less intrusive and build a little bit more helper infrastructure. Generally mucking about with dictionaries is something to be minimised.
tsinfer/provenance.py
Outdated
|
||
def __exit__(self, exc_type, exc_val, exc_tb): | ||
end_times = self.start_process.cpu_times() | ||
self.metrics = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well make this a dataclass,
@dataclasses.dataclass
class ResourceMetrics:
elapsed_time: float # etc
then, this can support aggregation, so you can sum over a bunch of them.
Can add the asdict()
method for getting out the dict, as usual.
tsinfer/provenance.py
Outdated
usage = resource.getrusage(resource.RUSAGE_SELF) | ||
max_rss = usage.ru_maxrss | ||
|
||
if sys.platform == "darwin": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could avoid the missed coverage via
if sys.platform == "linux":
max_rss *= 1024
return max_rss
tsinfer/inference.py
Outdated
if record_provenance: | ||
tables = ts.dump_tables() | ||
# We don't have a source here because tree sequence files don't have a UUID yet. | ||
record = provenance.get_provenance_dict( | ||
command="match_samples", | ||
resources=timing.get_metrics(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timing.metrics.asdict()
results=results, | ||
**wd.common_params(), | ||
) | ||
# Rewrite the last provenance with the correct info |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to rewrite this using the dataclass, creating them with fromdict(), summing them totether and taking asdict of the result. That'll be simpler and less fragile.
66e0e2b
to
4743033
Compare
@jeromekelleher Good call, 4743033 adds a dataclass and de-duplicates the combination logic. |
4743033
to
041b1d7
Compare
041b1d7
to
329c814
Compare
Fixes #975
Note that this adds
psutil
as a dep.