Output Struct Overhaul #445
base: v4-prep
Conversation
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files
@@            Coverage Diff            @@
##           v4-prep     #445      +/-   ##
===========================================
- Coverage    79.56%   76.88%    -2.69%
===========================================
  Files           24       27        +3
  Lines         3803     3747       -56
  Branches       647      611       -36
===========================================
- Hits          3026     2881      -145
- Misses         558      648       +90
+ Partials       219      218        -1
☔ View full report in Codecov by Sentry.
Looks great! I had a few questions and minor points, but this looks like a huge improvement.
pf = pf2
_bt = None
hb = hb2
st = st2
How are we separating the node redshifts from the output redshifts here? Previously we only created the coevals on the outputs and only updated the previous snapshot on the nodes.
Hmmm I think I may have slightly broken this. The idea is still to only evolve based on the node redshifts, but to yield on every redshift (either `out_redshift` or `node_redshift`). Currently, it looks like I might be evolving on everything, so I should check and fix that.
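For concreteness, the intended control flow is roughly the sketch below (illustrative only, with a made-up `evolve_step` callable; not the actual generator in this PR): evolve only at node redshifts, yield at every redshift.

```python
def redshift_loop(node_redshifts, out_redshifts, evolve_step):
    """Sketch: evolve only at node redshifts, but yield at node and output redshifts."""
    all_z = sorted(set(node_redshifts) | set(out_redshifts), reverse=True)
    state = None
    for z in all_z:
        if z in node_redshifts:
            # Only node redshifts advance the evolution (i.e. update the previous snapshot).
            state = evolve_step(state, z)
        # Every redshift, node or output, is yielded to the caller.
        yield z, state


# Toy usage: evolve at z = 20, 15, 10, and also report the extra output redshift z = 12.
for z, state in redshift_loop([20.0, 15.0, 10.0], [12.0], lambda s, z: (s or 0) + 1):
    print(z, state)
```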
if inputs is not None:
    return inputs

outputs = single_field_func._get_all_output_struct_inputs(kwargs)
Can you explain how the inheritance works here? It looks like `single_field_func` is a subclass of this one.
Ah, this is a holdover from a refactor... I've fixed it up now.
    v
    for k, v in outputs.items()
    if not k.startswith("previous_") and not k.startswith("descendant_")
]:
I haven't got to the single fields yet, but I'm curious how this works with the `XraySourceBox`, which needs the whole `HaloBox` history.
Yes, I'd appreciate a closer look at that, as it's not something I'm as familiar with.
 if descendant_halos is None:
-    descendant_halos = HaloField(
+    descendant_halos = HaloField.new(
         redshift=0.0,
         inputs=inputs,
         dummy=True,
     )
This reminds me that in the backend, the sampling from grid/descendants is controlled by the redshift of this object being <=0.
I wonder if having a `.dummy()` constructor method would be neater (auto-setting the redshift to, say, -1 and setting `dummy=True`).
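Something like the toy sketch below, perhaps (illustrative names only; not the real `HaloField`, and the real `.new()` signature may differ):

```python
import attrs


@attrs.define
class ToyHaloField:
    """Toy stand-in for HaloField, just to illustrate the suggested constructor."""

    redshift: float
    is_dummy: bool = False  # the real class takes a ``dummy=True`` keyword

    @classmethod
    def new(cls, *, redshift, dummy=False, **kwargs):
        # Stand-in for HaloField.new(); extra kwargs (e.g. inputs) are ignored here.
        return cls(redshift=redshift, is_dummy=dummy)

    @classmethod
    def dummy(cls, **kwargs):
        # Auto-set a sentinel redshift (the backend checks redshift <= 0, per the
        # comment above) and mark the object as a dummy.
        return cls.new(redshift=-1.0, dummy=True, **kwargs)


descendant_halos = ToyHaloField.dummy()
```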
elif perturbed_field:
    inputs = perturbed_field[0].inputs

if not out_redshifts and not perturbed_field and not inputs.node_redshifts:
What do we want to happen when `out_redshifts` is `None` and we have some inputs with node redshifts?
The current behaviour (and I'm happy to discuss this) is to yield on all node redshifts and `out_redshifts`. So as long as at least one of them is non-empty, everything is fine.
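To make the edge cases concrete, here is a tiny sketch of that rule (illustrative, not the actual validation code):

```python
def redshifts_to_yield(node_redshifts, out_redshifts):
    """Yield on the union of node and output redshifts; error only if both are empty."""
    node_redshifts = list(node_redshifts or [])
    out_redshifts = list(out_redshifts or [])
    if not node_redshifts and not out_redshifts:
        raise ValueError("Either node_redshifts or out_redshifts must be non-empty.")
    return sorted(set(node_redshifts) | set(out_redshifts), reverse=True)


print(redshifts_to_yield([20.0, 15.0], None))  # out_redshifts=None is fine here
```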
@classmethod
def new(cls, x: dict | InputStruct | None = None, **kwargs):
    """
I should update this docstring
I updated it modestly -- is that what you were thinking?
def __init_subclass__(cls) -> None:
    """Store each subclass for easy access."""
    cls._subclasses[cls.__name__] = cls
I don't fully understand this. Since `cls` is the same, are we storing the dict of subclass definitions in the subclasses themselves?
I think you're right, this should be `InputStruct._subclasses`. Updated.
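For reference, a small self-contained illustration of the registry pattern being discussed (toy class names, not the real `InputStruct`):

```python
class RegistryBase:
    """Toy version of the subclass registry."""

    _subclasses: dict = {}  # defined once, on the base class

    def __init_subclass__(cls) -> None:
        # ``cls._subclasses`` resolves (via the MRO) to the dict defined on
        # RegistryBase, so every subclass lands in the same registry; writing
        # ``RegistryBase._subclasses[...]`` just makes that explicit.
        RegistryBase._subclasses[cls.__name__] = cls


class Child(RegistryBase):
    pass


print(RegistryBase._subclasses)  # {'Child': <class '__main__.Child'>}
```

Either spelling writes to the same shared dict; being explicit just avoids surprises if a subclass ever shadows `_subclasses` with its own attribute.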
for k in self.struct.primitive_fields:
    if getattr(self, k) is not None:
        setattr(self.struct.cstruct, k, getattr(self, k))
Can be shortened to `self.cstruct`?
Yup, good catch.
"""The random seed for this particular instance.""" | ||
return self.inputs.random_seed | ||
|
||
def sync(self): |
Just so I understand what's happening here:
- make sure arrays are initialised
- expose all initialised arrays python --> C
- primitives which aren't `None` go python --> C
- primitives which are `None` go C --> python
Yup!
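For the record, a toy version of that flow (purely illustrative; plain dicts stand in for the real arrays and C struct):

```python
from dataclasses import dataclass, field


@dataclass
class ToyStruct:
    """Minimal stand-in for an OutputStruct-like object."""

    arrays: dict = field(default_factory=lambda: {"density": None})
    primitives: dict = field(default_factory=lambda: {"redshift": 7.0, "n_halos": None})
    cstruct: dict = field(default_factory=dict)  # pretend C-side storage


def sync(s: ToyStruct) -> None:
    for name, arr in s.arrays.items():
        # 1. make sure arrays are initialised
        if arr is None:
            arr = s.arrays[name] = [0.0] * 4
        # 2. expose all initialised arrays python --> C
        s.cstruct[name] = arr
    for name, val in s.primitives.items():
        if val is not None:
            # 3. primitives which aren't None go python --> C
            s.cstruct[name] = val
        else:
            # 4. primitives which are None come back C --> python
            s.primitives[name] = s.cstruct.get(name)


s = ToyStruct()
s.cstruct["n_halos"] = 42  # pretend the C code filled this in
sync(s)
print(s.primitives["n_halos"])  # 42
```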
default_input_struct.check_output_compatibility([example_ib])

default_input_struct.check_output_compatibility([perturbed_field])
# def test_inputstruct_outputs(
Do we want to rewrite this test to test the compatibility checks?
Probably a good idea. I can't remember what all I've covered in tests now, but will have another think tomorrow.
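For reference, a rough sketch of what such a test could look like (the `mismatched_inputs` fixture and the exception type are assumptions here, not the actual API):

```python
import pytest


def test_check_output_compatibility(default_input_struct, mismatched_inputs, example_ib):
    # Outputs generated from the same inputs should pass the check silently.
    default_input_struct.check_output_compatibility([example_ib])

    # Outputs generated from different inputs should be rejected
    # (assuming a ValueError; adjust to whatever the check actually raises).
    with pytest.raises(ValueError):
        mismatched_inputs.check_output_compatibility([example_ib])
```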
One of the current failing tests (the macOS 3.12 run) is the same issue of workflows losing one of the temp directories, but there's a GSL error in the Ubuntu 3.12 run which I haven't seen before. Nikos found something similar when running the database; I'm curious what's causing this, since sometimes just rerunning makes it work again. I don't think it has much to do with this PR, but we should look into it.
Summary
This changes the output structure interface to be simpler and more streamlined.
It is quite a comprehensive set of changes that touch a lot of things on the Python side. I'll try to list as many as I can here for easy reference:
Arrays and Backend mapping
- New `arrays.py` module that implements an `Array` object. This object knows about the shape and dtype of an array, without necessarily having it instantiated, but also knows how to instantiate it and pass it to C, and keeps track of the `ArrayState`.
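Roughly, the idea is something like the toy version below (illustrative only; the real `Array` class tracks more state and handles the mapping to C):

```python
import numpy as np


class ToyArray:
    """Knows its shape/dtype and state without holding data until asked."""

    def __init__(self, shape, dtype=np.float32):
        self.shape = shape
        self.dtype = dtype
        self.value = None
        self.state = "uninitialized"

    def initialize(self):
        # Allocate lazily, and record the state change.
        if self.value is None:
            self.value = np.zeros(self.shape, dtype=self.dtype)
            self.state = "initialized"
        return self.value


lowres_density = ToyArray(shape=(64, 64, 64))
print(lowres_density.state)         # 'uninitialized' -- no memory allocated yet
lowres_density.initialize()
print(lowres_density.value.nbytes)  # allocated on demand
```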
OutputStructs
- `OutputStruct` is now an `attrs` class. More importantly, all of the arrays that it needs to handle are defined directly on the class as `Array` parameters, making it easier to track them.
- A new `.new()` classmethod instantiates it from an `InputParameters` object, getting the shape/dtype info (and which arrays need to be present) from the `inputs`.
- One downside of the `Array` objects is that the attributes of the `OutputStruct` are no longer numpy arrays, so you can't do, for example, `np.mean(ics.lowres_density)` any more. This is smoothed over a bit by new `get()` and `set()` methods specifically for the arrays, so you can do `np.mean(ics.get('lowres_density'))`. This has the added advantage of transparently loading the array from disk if it exists there. Note that on a `Coeval` object, any field of any `OutputStruct` can be accessed directly via attribute name, as an array.
- I/O has been removed from the `OutputStruct` class and moved to the new `io` subpackage.
- A new `_compat_hash` attribute on each `OutputStruct` tells it the level of input-hash required.
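To make the new access pattern concrete, here is a toy version of the idea (not the real classes; the actual `Array` descriptors, disk loading, and hashing are omitted):

```python
from typing import Optional

import attrs
import numpy as np


@attrs.define
class ToyInputs:
    box_len: int = 64


@attrs.define
class ToyOutputStruct:
    inputs: ToyInputs
    lowres_density: Optional[np.ndarray] = None  # stands in for an Array field

    @classmethod
    def new(cls, inputs: ToyInputs):
        # Shape/dtype information comes from the inputs, as described above.
        shape = (inputs.box_len,) * 3
        return cls(inputs=inputs, lowres_density=np.zeros(shape, dtype=np.float32))

    def get(self, name: str) -> np.ndarray:
        # The real method would also transparently load from disk if needed.
        return getattr(self, name)


ics = ToyOutputStruct.new(ToyInputs(box_len=32))
print(np.mean(ics.get("lowres_density")))
```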
Caching / IO of single-fields (OutputStruct)
- A new `io.caching` module implements classes/functions for dealing with the cache. I think this is a bit more intuitive than in previous versions.
- An `OutputCache` object has methods for introspecting a particular cache (defined by some directory the user gives at runtime) and reading/writing OutputStructs to it.
- A `RunCache` manages full runs (i.e. all boxes belonging to a full redshift-evolved simulation), allowing simple determination of which cache files are present and which haven't yet been run (useful for checkpointing).
- The `CacheConfig` class simply defines a namespace for specifying which boxes to write to cache during a larger run (coeval/lightcone).
- The `cache_tools` module has been removed as it is redundant with the above module.
- The actual reading/writing code now lives in `io/h5.py`, and so is separated from the `OutputStruct` class definitions themselves. This might facilitate implementing different cache formats in the future. The file format is also slightly different (I think it's slightly better now -- the format is specified in the docstring of the module, so you can check).
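As a rough picture of the division of labour (attribute and method names here are made up for illustration, not the real `py21cmfast` API):

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ToyCacheConfig:
    """Namespace saying which box types to write during a run."""

    perturb_field: bool = True
    halobox: bool = False


@dataclass
class ToyRunCache:
    """Knows which files of a full run are present (useful for checkpointing)."""

    direc: Path

    def missing(self, expected: list[str]) -> list[str]:
        # Which of the expected boxes haven't been written to the cache yet.
        return [name for name in expected if not (self.direc / f"{name}.h5").exists()]


cache = ToyRunCache(direc=Path("."))
print(cache.missing(["PerturbedField_z7.0", "PerturbedField_z8.0"]))
```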
Single-Field Computations
- The `single_field` module is a lot simpler. I have moved most of the boiler-plate logic to a class-style decorator in `_param_config`.

Lightcone / Coeval
- The common logic of `run_coeval` and `run_lightcone` has been factored out into a set of external functions: `evolve_perturb_halos` and `_redshift_loop_generator`.
- The `Coeval` and `Lightcone` objects are much more slim now. I removed the ability to "gather" the cached files associated with a coeval/lightcone, instead relying on the improved caching module to let people deal with their full-run caches.
- Use `Coeval.from_file` instead of `Coeval.read()`, which I think is more intuitive.
Configuration
- … (`CacheConfig`).

Other Stuff
- Moved `InputParameters` from `param_config` to `inputs`, just because I was getting circular imports.

Meta-info:
Issues Solved