-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/eckit stats #154
base: develop
Are you sure you want to change the base?
Feature/eckit stats #154
Conversation
…, remove Auto as it is configurable
…ce/std deviation/energy norm
This reverts commit 52a2178. # Conflicts: # src/mir/data/MIRField.cc
…op-convert, cppcoreguidelines-avoid-non-const-global-variables, readability-qualified-auto, bugprone-fold-init-type, bugprone-macro-parentheses
…alised non-POD object)
…-make-unique, modernize-use-auto, readability-container-data-pointer, readability-implicit-bool-conversion, readability-size-empty
…eature/eckit-stats
Again please do not merge this, before we have a wider discussion with other eckit devs. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #154 +/- ##
========================================
Coverage 63.74% 63.74%
========================================
Files 1066 1066
Lines 55360 55360
Branches 4086 4086
========================================
Hits 35289 35289
Misses 20071 20071 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions in here. Looks good overall.
namespace eckit::stats { | ||
|
||
|
||
static util::recursive_mutex* local_mutex = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be a recursive_mutex?
Can we not use the idiom, used elsewhere, with a static instance() method on the Factory type, and the mutex/map being local members. It's a bit cleaner than using posix once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a wrapper around the eckit types that have been superceded by stl; There's a define above that controls this, and I've put it in one place only so it's easy to change.
I called it recursive mutex because that's what the stl calls it, and what the eckit::Mutex really is. But this can of course change, at the moment this is only for controlling access to the factories. On that note, the factories could move to a eckit-based implementation -- which I recently changed to behave correctly for concurrent access, but the change is non-trivial. I think this is what should likely happen first? In any case, happy to update
protected: | ||
// -- Members | ||
|
||
const Parametrisation& parametrisation1_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a touch dangerous. For general use, can we be confident that the supplied Parametrisation will outlive the Comparator instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I'm not happy about this. Currently the Comparator is a class supporting comparing two statistics (say mean of two fields), but its use is quite specific to pgen-compare fields1 fields2 (which is in fact mir-compare). I probably should take the Comparator hierarchy out? I don't see that MultIO would use this.
namespace eckit::stats { | ||
|
||
|
||
static util::once_flag once; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment RE instance() method rather than static global variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed -- maybe factories refactoring come first though.
namespace eckit::stats { | ||
|
||
|
||
class Distribution { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does a Distribution do in this context. Not immediately obvious from the name - I think this needs some documentation comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right -- this is a class to generate noise onto a field according to specific distributions. It might not have a direct use here, but it is important functionality to support statistics testing (incl. downstream packages -- I use this in MIR to generate training data, although this isn't the greatest use). I used this to test the statistics when developing, and I haven't ported those tests as they should be done anew (they aren't useful as is)...
src/eckit/stats/Field.h
Outdated
using MIRValuesVector = std::vector<double>; | ||
|
||
|
||
class Field { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit odd. It isn't declared as an abstract base, but all of the methods are NOTIMP, and it doesn't carry any data. Huh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you caught on quickly -- this was meant for Mirco to have a look for MultIO first and see what the interface should look like. I've just pushed an update supporting owned and referenced fields, should be much more adequate for MultIO, and fully implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is to serve as an argument to calculate field statistics in (for a definition of Field.) You're just too early on the PR as I intended to discuss it beforehand -- but now I've done a proper implementation that can be used in both MIR and MultIO
|
||
// -- Methods | ||
|
||
virtual std::string execute(const Field&, const Field&) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string is a very non-obvious return type for a Comparator. Unless it is doing something weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed this sholuld change -- it is a list of reasons why a comparison failed, and if empty is it a successful comparison. I should bring in more (a lot more) semantics here. Good catch!
min_(std::numeric_limits<double>::quiet_NaN()), | ||
max_(std::numeric_limits<double>::quiet_NaN()), | ||
hasMissing_(hasMissing), | ||
hasLowerLimit_(lowerLimit_ == lowerLimit_), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks odd. I assume it is essentially testing if things are NaN, or similar. But I worry that the equality comparison could be optimised away to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this level, to support the statistics, they operate on real values and NaN if not assigned/not-definable (there's no "finite missing value"). The assumption here is the NaN support -- and you're right, this can be compiled out. I know that MIR and Atlas will fail in specific places to to similar optimisations. Maybe I should protect this with a static_assert? But the problem is widespread.
src/eckit/stats/method/MethodT.cc
Outdated
|
||
|
||
static const MethodBuilder<MethodT<detail::AngleT<double, detail::AngleScale::DEGREE, detail::AngleSpace::ASYMMETRIC>>> | ||
__stats1("angle.degree.asymmetric"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I like the variable names starting with __. Those are supposed to be reserved symbols.
If you really want to ensure non-clashing, you can put them in an anonymous namespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed
namespace eckit::stats::util { | ||
|
||
|
||
#if defined(ECKIT_GEO_ECKIT_THREADS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very odd, and somewhat worrying block of code. Why do we need to do special stuff. Why us std::mutex insufficient. And what is the logic here.
If it really is necessary, this needs to be heavily documented as to why (in comments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is due to the current threading model supprted in eckit (and the rest of our stacks), to be it's own implementation based on eckit::Mutex. This provides a #define that can change the threading model to stl, but for the stack we should do this in one go -- to me this is a separate discussion. It doesn't work, mixing the models. I've raised this issue over the last few years, several times, but gave up...
|
||
constexpr double EPS = 1e-6; | ||
|
||
#define EXPECTV(a) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding these to Test.h
Mirco, I've ported 90% of the mir statistics functionality here, it's not a full PR to be merged before much discussion but rather a draft.
For MIR, the requirements were:
I'm sure for other packages it will be different, let's see if they converge to something useful :-)