Issue/18/dataset types (#24)
* Added RailDataset, to allow for strong checking of type-matching between plotters and the datasets they use

* isort

* fixing up docs

* use classes own name in generate_dataset_dict()
eacharles authored Feb 6, 2025
1 parent b5c282c commit 6c8a0ca
Showing 49 changed files with 545 additions and 673 deletions.
21 changes: 9 additions & 12 deletions docs/conf.py
@@ -16,14 +16,14 @@
import subprocess
import sys
import pkgutil
import rail.projects
import rail.plotting
import rail.cli.rail_plot
import rail.cli.rail_project
import rail
#import rail.plotting
#import rail.cli.rail_project
#import rail.cli.rail_plot


sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('../src/rail/cli'))
sys.path.insert(0, os.path.abspath('../src'))

print(sys.path)

@@ -114,11 +114,6 @@
nbsphinx_allow_errors = True


autodoc_default_options = {
'special-members': '__call__',
}


# use type hints in autodoc
autodoc_typehints = "description"

@@ -193,8 +188,10 @@ def run_apidoc(_):
cur_dir = os.path.normpath(os.path.dirname(__file__))
output_path = os.path.join(cur_dir, 'api')

src_path = os.path.normpath(os.path.join(os.path.dirname(__file__), '..', 'src', 'rail'))
paramlist = ['--separate', '--implicit-namespaces', '-M', '-o', output_path, '-f', src_path]
base_path = os.path.normpath(os.path.join(os.path.dirname(__file__), '..', 'src'))

srcpath = os.path.normpath(os.path.join(base_path, 'rail'))
paramlist = ['--separate', '--implicit-namespaces', '--no-toc', '-M', '-o', output_path, '-f', srcpath]
print(f"running {paramlist}")
apidoc_main(paramlist)

9 changes: 6 additions & 3 deletions docs/index.rst
@@ -62,8 +62,8 @@ guidance on citing RAIL and the underlying algorithms.

source/contributing
source/fix_an_issue
source/new_dataset
source/new_plotter
source/new_data_extractor
source/new_dataset_holder

.. toctree::
@@ -73,7 +73,10 @@ guidance on citing RAIL and the underlying algorithms.
demos

.. toctree::
:maxdepth: 2
:maxdepth: 4
:caption: API

api/modules
api/rail



14 changes: 7 additions & 7 deletions docs/source/analysis_components.rst
@@ -67,49 +67,49 @@ There are several sub-classes of `RailAlgorithmHolder` for different types of al
PZAlgorithm
-----------

.. autoclass:: rail.projects. algorithm_holder.RailPZAlgorithmHolder
.. autoclass:: rail.projects.algorithm_holder.RailPZAlgorithmHolder
:noindex:


Summarizer
----------

.. autoclass:: rail.projects. algorithm_holder.RailSummarizerAlgorithmHolder
.. autoclass:: rail.projects.algorithm_holder.RailSummarizerAlgorithmHolder
:noindex:


Classifier
----------

.. autoclass:: rail.projects. algorithm_holder.RailClassificationAlgorithmHolder
.. autoclass:: rail.projects.algorithm_holder.RailClassificationAlgorithmHolder
:noindex:


SpecSelection
-------------

.. autoclass:: rail.projects. algorithm_holder.RailSpecSelectionAlgorithmHolder
.. autoclass:: rail.projects.algorithm_holder.RailSpecSelectionAlgorithmHolder
:noindex:


ErrorModel
----------

.. autoclass:: rail.projects. algorithm_holder.RailErrorModelAlgorithmHolder
.. autoclass:: rail.projects.algorithm_holder.RailErrorModelAlgorithmHolder
:noindex:


Reducer
-------

.. autoclass:: rail.projects. algorithm_holder.RailReducerAlgorithmHolder
.. autoclass:: rail.projects.algorithm_holder.RailReducerAlgorithmHolder
:noindex:


Subsampler
----------

.. autoclass:: rail.projects. algorithm_holder.RailSubsamplerAlgorithmHolder
.. autoclass:: rail.projects.algorithm_holder.RailSubsamplerAlgorithmHolder
:noindex:


19 changes: 0 additions & 19 deletions docs/source/cli.rst

This file was deleted.

2 changes: 1 addition & 1 deletion docs/source/contributing.rst
@@ -145,7 +145,7 @@ We anticipate a few types of contributions, and provide separate instructions
for those workflows:

* :ref:`Fix an Issue` in the codebase
* :ref:`Adding a new RailDataset type`
* :ref:`Adding a new RailPlotter`
* :ref:`Adding a new DataExtractor`
* :ref:`Adding a new RailDatasetHolder`

61 changes: 0 additions & 61 deletions docs/source/new_data_extractor.rst

This file was deleted.

33 changes: 33 additions & 0 deletions docs/source/new_dataset.rst
@@ -0,0 +1,33 @@
=============================
Adding a new RailDataset type
=============================

Because of the variety of formats of files in RAIL, and the variety of analysis flavors
in a ``RailProject``, it is useful to be able to define the particular types of
datasets that are needed to make specific plots. These are implemented as subclasses of the :py:class:`rail.plotting.dataset.RailDataset` class.
A ``RailDataset`` is intended to define the quantities needed to make a particular type of plot.


New RailDataset Example
-----------------------

The following example has all of the required pieces of a ``RailDataset`` and almost nothing else.

.. code-block:: python
class RailPZPointEstimateDataset(RailDataset):
"""Dataset to hold a vector of p(z) point estimates and the corresponding
true redshifts
"""
data_types = dict(
truth=np.ndarray,
pointEstimate=np.ndarray,
)
The required pieces, in the order that they appear, are:

#. The ``class RailPZPointEstimateDataset(RailDataset):`` line defines a class called ``RailPZPointEstimateDataset`` and specifies that it inherits from ``RailDataset``.

#. The ``data_types`` dictionary defines the names and expected data types of the required data.
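
As a rough illustration of why declaring ``data_types`` is useful, the sketch below shows one way such a declaration could be checked against actual data. It does not use the real ``RailDataset`` class: ``SketchDataset``, ``SketchPointEstimateDataset`` and the ``validate()`` method are hypothetical names, and ``list`` stands in for ``np.ndarray`` so that the sketch has no dependencies.

```python
from typing import Any


class SketchDataset:
    """Hypothetical stand-in for RailDataset (not the real RAIL class)."""

    # Maps each required item name to its expected Python type
    data_types: dict[str, type] = {}

    @classmethod
    def validate(cls, **kwargs: Any) -> None:
        """Raise if a required item is missing or has the wrong type."""
        for key, expected in cls.data_types.items():
            if key not in kwargs:
                raise KeyError(f"{cls.__name__} is missing required item '{key}'")
            if not isinstance(kwargs[key], expected):
                raise TypeError(
                    f"{cls.__name__} item '{key}' should be {expected.__name__}, "
                    f"got {type(kwargs[key]).__name__}"
                )


class SketchPointEstimateDataset(SketchDataset):
    """Mirrors the example above, with list standing in for np.ndarray."""

    data_types = dict(truth=list, pointEstimate=list)


# Passes silently: both required items are present with the declared type
SketchPointEstimateDataset.validate(truth=[0.1, 0.5], pointEstimate=[0.12, 0.48])
```

Keeping the declaration on the class means a consumer can inspect the required items without instantiating anything.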
38 changes: 25 additions & 13 deletions docs/source/new_dataset_holder.rst
@@ -18,14 +18,11 @@ The following example has all of the required pieces of a ``RailDatasetHolder``

.. code-block:: python
class RailProjectDatasetHolder(RailDatasetHolder):
class RailPZPointEstimateDataHolder(RailDatasetHolder):
"""Simple class for holding a dataset for plotting data that comes from a RailProject"""
config_options: dict[str, StageParameter] = dict(
name=StageParameter(str, None, fmt="%s", required=True, msg="Dataset name"),
extractor=StageParameter(
str, None, fmt="%s", required=True, msg="Dataset extractor class name"
),
project=StageParameter(
str, None, fmt="%s", required=True, msg="RailProject name"
),
@@ -45,17 +42,17 @@ The following example has all of the required pieces of a ``RailDatasetHolder``
extractor_inputs: dict = {
"project": RailProject,
"extractor": RailProjectDataExtractor,
"selection": str,
"flavor": str,
"tag": str,
"algo": str,
}
output_type: type[RailDataset] = RailPZPointEstimateDataset
def __init__(self, **kwargs: Any):
RailDatasetHolder.__init__(self, **kwargs)
self._project: RailProject | None = None
self._extractor: RailProjectDataExtractor | None = None
def __repr__(self) -> str:
ret_str = (
@@ -69,14 +66,9 @@ The following example has all of the required pieces of a ``RailDatasetHolder``
def get_extractor_inputs(self) -> dict[str, Any]:
if self._project is None:
self._project = RailDatasetFactory.get_project(self.config.project)()
if self._extractor is None:
self._extractor = RailProjectDataExtractor.create_from_dict(
dict(name=self.config.name, class_name=self.config.extractor),
)
self._project = RailDatasetFactory.get_project(self.config.project).resolve()
the_extractor_inputs = dict(
project=self._project,
extractor=self._extractor,
selection=self.config.selection,
flavor=self.config.flavor,
tag=self.config.tag,
@@ -85,6 +77,15 @@ The following example has all of the required pieces of a ``RailDatasetHolder``
self._validate_extractor_inputs(**the_extractor_inputs)
return the_extractor_inputs
def _get_data(self, **kwargs: Any) -> dict[str, Any] | None:
return get_pz_point_estimate_data(**kwargs)
@classmethod
def generate_dataset_dict(
cls,
**kwargs: Any,
) -> list[dict[str, Any]]:
The required pieces, in the order that they appear are:

@@ -94,8 +95,19 @@ The required pieces, in the order that they appear are:

#. The ``extractor_inputs = [('input', PqHandle)]`` and ``outputs = [('output', PqHandle)]`` define the inputs that will be passed to the

#. The ``output_type: type[RailDataset] = RailPZPointEstimateDataset``
line specifies that this class will return a
RailPZPointEstimateDataset dataset.

#. The ``__init__`` method does any class-specific initialization, in this case defining that this class will store a project and extractor

#. The ``__repr__`` method is optional; here it gives a useful representation of the class

#. The ``get_extractor_inputs()`` method does the actual work, note that it doesn't take any arguments, that it uses the factories to find the helper objects and passes algo it's configuration and validates it's outputs
#. The ``get_extractor_inputs()`` method does the first part of the actual work. Note
that it doesn't take any arguments, that it uses the factories to
find the helper objects, and that it passes along its configuration and
validates its outputs

#. The ``_get_data()`` method does the rest of the actual work (in this case it passes it off to a utility function, ``get_pz_point_estimate_data``, which knows how to extract data from the ``RailProject``)

#. The ``generate_dataset_dict()`` class method can scan a ``RailProject`` and generate a dictionary of all the available datasets
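
The division of labor between ``get_extractor_inputs()`` and ``_get_data()`` described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the RAIL implementation: ``SketchDatasetHolder``, its ``get_data()`` method, the plain-dict ``config`` and the returned values are all hypothetical, and the real class resolves a ``RailProject`` through ``RailDatasetFactory`` instead of reading strings directly.

```python
from typing import Any


class SketchDatasetHolder:
    """Hypothetical stand-in for a RailDatasetHolder subclass."""

    # Names and expected types of the values handed to _get_data()
    extractor_inputs: dict[str, type] = {"selection": str, "flavor": str}

    def __init__(self, **config: Any) -> None:
        self.config = config
        self._data: dict[str, Any] | None = None  # filled on first request

    def _validate_extractor_inputs(self, **kwargs: Any) -> None:
        # Reject missing values and values of the wrong type
        for key, expected in self.extractor_inputs.items():
            if not isinstance(kwargs.get(key), expected):
                raise TypeError(f"'{key}' should be of type {expected.__name__}")

    def get_extractor_inputs(self) -> dict[str, Any]:
        # Takes no arguments: everything comes from the stored configuration
        inputs = {key: self.config.get(key) for key in self.extractor_inputs}
        self._validate_extractor_inputs(**inputs)
        return inputs

    def _get_data(self, **kwargs: Any) -> dict[str, Any]:
        # Placeholder for a utility such as get_pz_point_estimate_data()
        return {"truth": [0.1, 0.5], "pointEstimate": [0.12, 0.48], "source": kwargs}

    def get_data(self) -> dict[str, Any]:
        # Extract once, then reuse the cached result
        if self._data is None:
            self._data = self._get_data(**self.get_extractor_inputs())
        return self._data


holder = SketchDatasetHolder(selection="gold", flavor="baseline")
data = holder.get_data()
```

Separating the two steps lets the inputs be validated before any potentially expensive extraction runs.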
11 changes: 6 additions & 5 deletions docs/source/new_plotter.rst
@@ -29,10 +29,7 @@ The following example has all of the required pieces of a ``RailPlotter`` and al
n_zbins=StageParameter(int, 150, fmt="%i", msg="Number of z bins"),
)
inputs: dict = {
"truth": np.ndarray,
"pointEstimate": np.ndarray,
}
input_type = RailPZPointEstimateDataset
def _make_2d_hist_plot(
self,
@@ -90,7 +87,11 @@ The required pieces, in the order that they appear are:

#. The ``config_options`` lines define the configuration parameters for this class, as well as their default values. Note that here we are copying the configuration parameters from the ``RailPlotter`` as well as defining some new ones.

#. The ``inputs: dict = ...`` define the inputs and expected data types for those, in this case two numpy arrays
#. The ``input_type = RailPZPointEstimateDataset`` line specifies that this
plotter expects a :py:class:`rail.plotting.pz_plotters.RailPZPointEstimateDataset` type dataset, which in
this case is a dict with one item (called ``truth``) that is a
numpy array, and a second item (called ``pointEstimate``) that is
also a numpy array.

#. The ``__init__`` method does any class-specific initialization. In this case there isn't any and the method is superfluous.
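
Declaring ``input_type`` on a plotter and ``output_type`` on a dataset holder makes it possible to check that the two match before any plot is attempted. The sketch below shows one way such a check could look; every class and the ``is_compatible()`` helper are hypothetical stand-ins, and the actual check performed by RAIL may differ.

```python
class SketchDataset:
    """Hypothetical base class for dataset types."""

    data_types: dict[str, type] = {}


class SketchPointEstimateDataset(SketchDataset):
    data_types = dict(truth=list, pointEstimate=list)


class SketchNzDataset(SketchDataset):
    data_types = dict(nz=list)


class SketchPointEstimatePlotter:
    # The dataset type this plotter knows how to draw
    input_type: type[SketchDataset] = SketchPointEstimateDataset


class SketchPointEstimateHolder:
    # The dataset type this holder produces
    output_type: type[SketchDataset] = SketchPointEstimateDataset


class SketchNzHolder:
    output_type: type[SketchDataset] = SketchNzDataset


def is_compatible(plotter_cls: type, holder_cls: type) -> bool:
    """True if the holder produces the dataset type (or a subclass of it)
    that the plotter expects."""
    return issubclass(holder_cls.output_type, plotter_cls.input_type)


ok = is_compatible(SketchPointEstimatePlotter, SketchPointEstimateHolder)
mismatch = is_compatible(SketchPointEstimatePlotter, SketchNzHolder)
```

Using ``issubclass`` rather than strict equality would let a plotter accept more specialized datasets that extend the type it asks for; whether that is desirable is a design choice.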
