Output Struct Overhaul #445
base: v4-prep
Conversation
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files
@@            Coverage Diff            @@
##           v4-prep     #445      +/-   ##
===========================================
- Coverage    79.56%   76.88%    -2.69%
===========================================
  Files           24       27        +3
  Lines         3803     3747       -56
  Branches       647      611       -36
===========================================
- Hits          3026     2881      -145
- Misses         558      648       +90
+ Partials       219      218        -1
☔ View full report in Codecov by Sentry.
Looks great! I had a few questions and minor points, but this looks like a huge improvement.
pf = pf2
_bt = None
hb = hb2
st = st2
How are we separating the node redshifts from the output redshifts here? Previously we only created the coevals on the outputs and only updated the previous snapshot on the nodes.
Hmmm I think I may have slightly broken this. The idea is still to only evolve based on the node redshifts, but to yield on every redshift (either `out_redshift` or `node_redshift`). Currently, it looks like I might be evolving on everything, so I should check and fix that.
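For concreteness, the intended control flow is roughly the sketch below (illustrative only, with a made-up `evolve_step` callable; not the actual generator in this PR): evolve only at node redshifts, yield at every redshift.

```python
def redshift_loop(node_redshifts, out_redshifts, evolve_step):
    """Sketch: evolve only at node redshifts, but yield at node and output redshifts."""
    all_z = sorted(set(node_redshifts) | set(out_redshifts), reverse=True)
    state = None
    for z in all_z:
        if z in node_redshifts:
            # Only node redshifts advance the evolution (i.e. update the previous snapshot).
            state = evolve_step(state, z)
        # Every redshift, node or output, is yielded to the caller.
        yield z, state


# Toy usage: evolve at z = 20, 15, 10, and also report the extra output redshift z = 12.
for z, state in redshift_loop([20.0, 15.0, 10.0], [12.0], lambda s, z: (s or 0) + 1):
    print(z, state)
```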
if inputs is not None:
    return inputs

outputs = single_field_func._get_all_output_struct_inputs(kwargs)
Can you explain how the inheritance works here? It looks like `single_field_func` is a subclass of this one.
Ah, this is a holdover from a refactor... I've fixed it up now.
    v
    for k, v in outputs.items()
    if not k.startswith("previous_") and not k.startswith("descendant_")
]:
I haven't got to the single fields yet, but I'm curious how this works with the `XraySourceBox`, which needs the whole `HaloBox` history.
Yes, I'd appreciate a closer look at that, as it's not something I'm as familiar with.
 if descendant_halos is None:
-    descendant_halos = HaloField(
+    descendant_halos = HaloField.new(
         redshift=0.0,
         inputs=inputs,
         dummy=True,
     )
This reminds me that in the backend, the sampling from grid/descendants is controlled by the redshift of this object being <=0.
I wonder if having a `.dummy()` constructor method would be neater (auto-setting the redshift to, say, -1 and setting `dummy=True`).
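Something like the toy sketch below, perhaps (illustrative names only; not the real `HaloField`, and the real `.new()` signature may differ):

```python
import attrs


@attrs.define
class ToyHaloField:
    """Toy stand-in for HaloField, just to illustrate the suggested constructor."""

    redshift: float
    is_dummy: bool = False  # the real class takes a ``dummy=True`` keyword

    @classmethod
    def new(cls, *, redshift, dummy=False, **kwargs):
        # Stand-in for HaloField.new(); extra kwargs (e.g. inputs) are ignored here.
        return cls(redshift=redshift, is_dummy=dummy)

    @classmethod
    def dummy(cls, **kwargs):
        # Auto-set a sentinel redshift (the backend checks redshift <= 0, per the
        # comment above) and mark the object as a dummy.
        return cls.new(redshift=-1.0, dummy=True, **kwargs)


descendant_halos = ToyHaloField.dummy()
```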
elif perturbed_field:
    inputs = perturbed_field[0].inputs

if not out_redshifts and not perturbed_field and not inputs.node_redshifts:
What do we want to happen when `out_redshifts` is `None` and we have some inputs with node redshifts?
The current behaviour (and I'm happy to discuss this) is to yield on all node redshifts and `out_redshifts`. So as long as at least one of them is non-empty, everything is fine.
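To make the edge cases concrete, here is a tiny sketch of that rule (illustrative, not the actual validation code):

```python
def redshifts_to_yield(node_redshifts, out_redshifts):
    """Yield on the union of node and output redshifts; error only if both are empty."""
    node_redshifts = list(node_redshifts or [])
    out_redshifts = list(out_redshifts or [])
    if not node_redshifts and not out_redshifts:
        raise ValueError("Either node_redshifts or out_redshifts must be non-empty.")
    return sorted(set(node_redshifts) | set(out_redshifts), reverse=True)


print(redshifts_to_yield([20.0, 15.0], None))  # out_redshifts=None is fine here
```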
@classmethod
def new(cls, x: dict | InputStruct | None = None, **kwargs):
    """
I should update this docstring
I updated it modestly -- is that what you were thinking?
def __init_subclass__(cls) -> None:
    """Store each subclass for easy access."""
    cls._subclasses[cls.__name__] = cls
I don't fully understand this. Since `cls` is the same, are we storing the dict of subclass definitions in the subclasses themselves?
I think you're right, this should be `InputStruct._subclasses`. Updated.
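For reference, a small self-contained illustration of the registry pattern being discussed (toy class names, not the real `InputStruct`):

```python
class RegistryBase:
    """Toy version of the subclass registry."""

    _subclasses: dict = {}  # defined once, on the base class

    def __init_subclass__(cls) -> None:
        # ``cls._subclasses`` resolves (via the MRO) to the dict defined on
        # RegistryBase, so every subclass lands in the same registry; writing
        # ``RegistryBase._subclasses[...]`` just makes that explicit.
        RegistryBase._subclasses[cls.__name__] = cls


class Child(RegistryBase):
    pass


print(RegistryBase._subclasses)  # {'Child': <class '__main__.Child'>}
```

Either spelling writes to the same shared dict; being explicit just avoids surprises if a subclass ever shadows `_subclasses` with its own attribute.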
for k in self.struct.primitive_fields:
    if getattr(self, k) is not None:
        setattr(self.struct.cstruct, k, getattr(self, k))
Can be shortened to `self.cstruct`?
Yup, good catch.
"""The random seed for this particular instance.""" | ||
return self.inputs.random_seed | ||
|
||
def sync(self): |
Just so I understand what's happening here:
- make sure arrays are initialised
- expose all initialised arrays python --> C
- primitives which aren't `None` go python --> C
- primitives which are `None` go C --> python
Yup!
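For the record, a toy version of that flow (purely illustrative; plain dicts stand in for the real arrays and C struct):

```python
from dataclasses import dataclass, field


@dataclass
class ToyStruct:
    """Minimal stand-in for an OutputStruct-like object."""

    arrays: dict = field(default_factory=lambda: {"density": None})
    primitives: dict = field(default_factory=lambda: {"redshift": 7.0, "n_halos": None})
    cstruct: dict = field(default_factory=dict)  # pretend C-side storage


def sync(s: ToyStruct) -> None:
    for name, arr in s.arrays.items():
        # 1. make sure arrays are initialised
        if arr is None:
            arr = s.arrays[name] = [0.0] * 4
        # 2. expose all initialised arrays python --> C
        s.cstruct[name] = arr
    for name, val in s.primitives.items():
        if val is not None:
            # 3. primitives which aren't None go python --> C
            s.cstruct[name] = val
        else:
            # 4. primitives which are None come back C --> python
            s.primitives[name] = s.cstruct.get(name)


s = ToyStruct()
s.cstruct["n_halos"] = 42  # pretend the C code filled this in
sync(s)
print(s.primitives["n_halos"])  # 42
```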
default_input_struct.check_output_compatibility([example_ib])

default_input_struct.check_output_compatibility([perturbed_field])
# def test_inputstruct_outputs(
Do we want to rewrite this test to test the compatibility checks?
Probably a good idea. I can't remember what all I've covered in tests now, but will have another think tomorrow.
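For reference, a rough sketch of what such a test could look like (the `mismatched_inputs` fixture and the exception type are assumptions here, not the actual API):

```python
import pytest


def test_check_output_compatibility(default_input_struct, mismatched_inputs, example_ib):
    # Outputs generated from the same inputs should pass the check silently.
    default_input_struct.check_output_compatibility([example_ib])

    # Outputs generated from different inputs should be rejected
    # (assuming a ValueError; adjust to whatever the check actually raises).
    with pytest.raises(ValueError):
        mismatched_inputs.check_output_compatibility([example_ib])
```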
One of the current failing tests (the macOS 3.12 run) is the same issue of workflows losing one of the temp directories, but there's a GSL error in the Ubuntu 3.12 run which I haven't seen before. Nikos found something similar when running the database; I'm curious what's causing this, since sometimes just rerunning makes it work again. I don't think it has much to do with this PR, but we should look into it.
Summary
This changes the output structure interface to be simpler and more streamlined.
It is quite a comprehensive set of changes that touch a lot of things on the Python side. I'll try to list as many as I can here for easy reference:
Arrays and Backend mapping
- New `arrays.py` module that implements an `Array` object. This object knows about the shape and dtype of an array, without necessarily having it instantiated, but also knows how to instantiate it and pass it to C, and keeps track of the `ArrayState`.
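Roughly, the idea is something like the toy version below (illustrative only; the real `Array` class tracks more state and handles the mapping to C):

```python
import numpy as np


class ToyArray:
    """Knows its shape/dtype and state without holding data until asked."""

    def __init__(self, shape, dtype=np.float32):
        self.shape = shape
        self.dtype = dtype
        self.value = None
        self.state = "uninitialized"

    def initialize(self):
        # Allocate lazily, and record the state change.
        if self.value is None:
            self.value = np.zeros(self.shape, dtype=self.dtype)
            self.state = "initialized"
        return self.value


lowres_density = ToyArray(shape=(64, 64, 64))
print(lowres_density.state)         # 'uninitialized' -- no memory allocated yet
lowres_density.initialize()
print(lowres_density.value.nbytes)  # allocated on demand
```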
OutputStructs
- `OutputStruct` is now an `attrs` class. More importantly, all of the arrays that it needs to handle are defined directly on the class as `Array` parameters, making it easier to track them.
- A new `.new()` classmethod instantiates it from an `InputParameters` object, getting the shape/dtype info (and which arrays need to be present) from the `inputs`.
- One downside of the `Array` objects is that the attributes of the `OutputStruct` are no longer numpy arrays, so you can't do, for example, `np.mean(ics.lowres_density)` any more. This is smoothed over a bit by new `get()` and `set()` methods specifically for the arrays, so you can do `np.mean(ics.get('lowres_density'))`. This has the added advantage of transparently loading the array from disk if it exists there. Note that on a `Coeval` object, any field of any `OutputStruct` can be accessed directly via attribute name, as an array.
- I/O has been removed from the `OutputStruct` class and moved to the new `io` subpackage.
- A new `_compat_hash` attribute on each `OutputStruct` tells it the level of input-hash required.
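To make the new access pattern concrete, here is a toy version of the idea (not the real classes; the actual `Array` descriptors, disk loading, and hashing are omitted):

```python
from typing import Optional

import attrs
import numpy as np


@attrs.define
class ToyInputs:
    box_len: int = 64


@attrs.define
class ToyOutputStruct:
    inputs: ToyInputs
    lowres_density: Optional[np.ndarray] = None  # stands in for an Array field

    @classmethod
    def new(cls, inputs: ToyInputs):
        # Shape/dtype information comes from the inputs, as described above.
        shape = (inputs.box_len,) * 3
        return cls(inputs=inputs, lowres_density=np.zeros(shape, dtype=np.float32))

    def get(self, name: str) -> np.ndarray:
        # The real method would also transparently load from disk if needed.
        return getattr(self, name)


ics = ToyOutputStruct.new(ToyInputs(box_len=32))
print(np.mean(ics.get("lowres_density")))
```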
Caching / IO of single-fields (OutputStruct)
- A new `io.caching` module implements classes/functions for dealing with the cache. I think this is a bit more intuitive than in previous versions.
- An `OutputCache` object has methods for introspecting a particular cache (defined by some directory the user gives at runtime) and reading/writing OutputStructs to it.
- A `RunCache` manages full runs (i.e. all boxes belonging to a full redshift-evolved simulation), allowing simple determination of which cache files are present and which haven't yet been run (useful for checkpointing).
- The `CacheConfig` class simply defines a namespace for specifying which boxes to write to cache during a larger run (coeval/lightcone).
- The `cache_tools` module has been removed as it is redundant with the above module.
- The actual reading/writing code now lives in `io/h5.py`, and so is separated from the `OutputStruct` class definitions themselves. This might facilitate implementing different cache formats in the future. The file format is also slightly different (I think it's slightly better now -- the format is specified in the docstring of the module, so you can check).
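As a rough picture of the division of labour (attribute and method names here are made up for illustration, not the real `py21cmfast` API):

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ToyCacheConfig:
    """Namespace saying which box types to write during a run."""

    perturb_field: bool = True
    halobox: bool = False


@dataclass
class ToyRunCache:
    """Knows which files of a full run are present (useful for checkpointing)."""

    direc: Path

    def missing(self, expected: list[str]) -> list[str]:
        # Which of the expected boxes haven't been written to the cache yet.
        return [name for name in expected if not (self.direc / f"{name}.h5").exists()]


cache = ToyRunCache(direc=Path("."))
print(cache.missing(["PerturbedField_z7.0", "PerturbedField_z8.0"]))
```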
Single-Field Computations
- The `single_field` module is a lot simpler. I have moved most of the boiler-plate logic to a class-style decorator in `_param_config`.

Lightcone / Coeval
- The common logic of `run_coeval` and `run_lightcone` has been factored out into a set of external functions: `evolve_perturb_halos` and `_redshift_loop_generator`.
- The `Coeval` and `Lightcone` objects are much more slim now. I removed the ability to "gather" the cached files associated with a coeval/lightcone, instead relying on the improved caching module to let people deal with their full-run caches.
- Use `Coeval.from_file` instead of `Coeval.read()`, which I think is more intuitive.
Configuration
- … (`CacheConfig`).

Other Stuff
- Moved `InputParameters` from `param_config` to `inputs`, just because I was getting circular imports.

Meta-info:
Issues Solved