From f601b2699abb0bd0a2dab0896a02214c4de5deca Mon Sep 17 00:00:00 2001 From: Eric Charles Date: Mon, 3 Feb 2025 01:09:42 -0800 Subject: [PATCH] Eac/reredoc (#13) * updated docs * updating docs --- docs/index.rst | 28 +++--- docs/source/components.rst | 139 ++++++++++++++++++++++++++- docs/source/contributing.rst | 18 ++-- docs/source/factories.rst | 76 --------------- docs/source/fix_an_issue.rst | 4 +- docs/source/flavors.rst | 87 +++++++++++++++++ docs/source/installation.rst | 13 ++- docs/source/new_data_extractor.rst | 9 +- docs/source/new_dataset_holder.rst | 8 +- docs/source/new_plotter.rst | 8 +- docs/source/overview.rst | 62 +++--------- docs/source/pipelines.rst | 145 +++++++++++++++++++++++++++++ docs/source/rail_project.rst | 136 ++++++++++++++++++++++----- tests/ci_project.yaml | 4 +- 14 files changed, 539 insertions(+), 198 deletions(-) delete mode 100644 docs/source/factories.rst create mode 100644 docs/source/flavors.rst create mode 100644 docs/source/pipelines.rst diff --git a/docs/index.rst b/docs/index.rst index e50508e..a8970c0 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,10 +1,14 @@ -========================================================================= +######################################################################### rail_projects: a toolkit for managing `RAIL`-based data analysis projects -========================================================================= +######################################################################### + +*********** +Description +*********** + +`rail_projects` is a toolkit to manage RAIL-based data analysis +projects. ----- -RAIL ----- RAIL is a flexible open-source software library providing tools to produce at-scale photometric redshift data products, including uncertainties and summary statistics, and stress-test them under realistically complex systematics. @@ -19,15 +23,6 @@ See `guideline for citing RAIL guidance on citing RAIL and the underlying algorithms. 
---------------- -`rail_projects` ---------------- - -`rail_projects` is a tool-kit to manage RAIL-baseed data analysis -projects. - - - .. toctree:: :maxdepth: 1 :caption: Getting Started @@ -39,9 +34,10 @@ projects. :maxdepth: 1 :caption: Concepts - source/rail_project + source/rail_project + source/pipelines + source/flavors source/components - source/factories .. toctree:: :maxdepth: 1 diff --git a/docs/source/components.rst b/docs/source/components.rst index 027b239..40acd8e 100644 --- a/docs/source/components.rst +++ b/docs/source/components.rst @@ -1,8 +1,39 @@ +************************************ +components, factories, and libraries +************************************ + +**components** + +Doing a series of related studies using RAIL requires many pieces, such +as the lists of algorithms available, sets of analysis pipelines we +might run, types of plots we might make, types of data we can extract +from our analyses, references to particular files or sets of files we +want to use for our analyses, and so forth. In general we call these +analysis components, and we need ways to keep track of them. + +We have implemented interfaces to allow us to read and write +components to yaml files. + + +**factories** + +A Factory is a python class that can make specific type or types of +components, assign names to each, and keep track of what it has made. + + +**libraries** + +A library is the collection of all the components that have been +loaded. Typically these are collected into one or a few yaml +configuration files to allow users to load them easily. + + ******************* -Analysis components +Analysis Components ******************* +========================= Analysis component basics ========================= The basic interface to analysis components is the :py:class:`rail.projects.confi 4. 
mechansims to read/write the component to yaml, including the ``yaml_tag`` class member defining the yaml tag that marks a block of yaml as defining an object of a particular type of component. - +============================ File and Catalog definitions ============================ @@ -76,7 +107,7 @@ When called with a dict such as `{flavor: baseline, healpix : [3433, 3344]}` the `a_file/3344/baseline_data.hdf5` - +===================== Algorithm definitions ===================== @@ -178,7 +209,7 @@ Subsample :py:class:`rail.projects.subsample_factor.RailSubsample` just provides parameters such as the random number seed and number of object requested need by subsamplers. - +================ Plot definitions ================ @@ -201,6 +232,7 @@ types of plots. +=========================== Plotting dataset defintions =========================== @@ -227,7 +259,7 @@ Project - +====================== Plot Group definitions ====================== @@ -236,3 +268,100 @@ PlotGroup --------- :py:class:`rail.plotting.plot_group.RailPlotGroup` defines a set of plots to make by iterating over a `PlotterList` and a `DatasetList`. + + + +********* +Factories +********* + + +============== +Factory basics +============== + +A Factory is a python class that can make specific type or types of +components, assign names to each, and keep track of what it has made. + +The basic interface to Factories is the :py:class:`rail.projects.factory_mixin.FactoryMixin` class, which defines a few things, + +1. The "Factory pattern" of having a singleton instance of the factory that manages all the components of particular types, and class methods to interact with the instance. +2. A `client_classes` class member object specifying what types of components a particular factory manages. +3. Methods to add objects to a factory, and reset the factory contents. +4. Interfaces for reading and writing objects to and from yaml files. +5. 
Type validation, to ensure that only the correct types of objects are created or added to factories. + + +================== +Specific Factories +================== + +.. list-table:: Factories + :widths: 40 10 10 40 + :header-rows: 1 + + * - Factory Class + - Yaml Tag + - Example Yaml File + - Managed Classes + * - :py:class:`rail.projects.project_file_factory.RailProjectFileFactory` + - `Files` + - `tests/ci_project_files.yaml `_ + - `RailProjectFileInstance`, `RailProjectFileTemplate` + * - :py:class:`rail.projects.catalog_factory.RailCatalogFactory` + - `Catalogs` + - `tests/ci_catalogs.yaml `_ + - `RailProjectCatalogInstance`, `RailProjectCatalogTemplate` + * - :py:class:`rail.projects.subsample_factory.RailSubsampleFactory` + - `Subsamples` + - `tests/ci_subsamples.yaml `_ + - `RailSubsample` + * - :py:class:`rail.projects.selection_factory.RailSelectionFactory` + - `Selections` + - `tests/ci_selections.yaml `_ + - `RailSelection` + * - :py:class:`rail.projects.algorithm_factory.RailAlgorithmFactory` + - `PZAlgorithms` + - `tests/ci_algorithms.yaml `_ + - `RailPZAlgorithmHolder` + * - + - `Classifiers` + - + - `RailClassificationAlgorithmHolder` + * - + - `Summarizers` + - + - `RailSummarizerAlgorithmHolder` + * - + - `SpecSelections` + - + - `RailSpecSelectionAlgorithmHolder` + * - + - `ErrorModels` + - + - `RailErrorModelAlgorithmHolder` + * - + - `Subsamplers` + - + - `RailSubsamplerAlgorithmHolder` + * - + - `Reducers` + - + - `RailReducerAlgorithmHolder` + * - :py:class:`rail.projects.pipeline_factory.RailPipelineFactory` + - `Pipelines` + - `tests/ci_pipelines.yaml `_ + - `RailPipelineTemplate`, `RailPipelineInstance` + * - :py:class:`rail.plotting.plotter_factory.RailPlotterFactory` + - `Plots` + - `tests/ci_plots.yaml `_ + - `RailPlotter`, `RailPlotterList` + * - :py:class:`rail.plotting.dataset_factory.RailDatasetFactory` + - `Data` + - `tests/ci_datasets.yaml `_ + - `RailDatasetHolder`, `RailDatasetListHolder`, `RailProjectHolder` + * - 
:py:class:`rail.plotting.plot_group_factory.RailPlotGroupFactory` + - `PlotGroups` + - `tests/ci_plot_groups.yaml `_ + - `RailPlotGroup` + diff --git a/docs/source/contributing.rst b/docs/source/contributing.rst index f5fb7e6..5ef9827 100644 --- a/docs/source/contributing.rst +++ b/docs/source/contributing.rst @@ -5,7 +5,7 @@ Contribution Overview RAIL is a constellation of multiple packages developed publicly on GitHub and welcomes all interested developers, regardless of DESC membership or LSST data rights. - +==================== Contributing to RAIL ==================== @@ -14,9 +14,9 @@ algorithms or similar analysis tools, please visit `contributing to RAIL `_ ---------------------------------- +================================= Contributing to ``rail_projects`` ---------------------------------- +================================= If you're interested in contributing to `rail_projects`, but don't know where to start, take a look at the @@ -36,6 +36,7 @@ Those without data rights who wish to gain access to the Slack channel should the team leads initiate the process for adding a DESC External Collaborator. +==================== Where to contribute: ==================== @@ -43,6 +44,8 @@ In all cases, begin by following the developer installation instructions :ref:`Developer Installation` and follow the contribution workflow instructions below. + +===================== Contribution workflow ===================== @@ -113,27 +116,28 @@ It is also considered good practice to make suggestions for optional improvement such as adding a one-line comment before a clever block of code or including a demonstration of new functionality in the example notebooks. + Naming conventions -================== +------------------ We follow the `pep8 `_ recommendations for naming new modules. Modules -------- +^^^^^^^ Modules should use all lowercase, with underscores where it aids the readability of the module name. 
Classes -------- +^^^^^^^ Python classes and so should use the CapWords convention. - +================== Contribution Types ================== diff --git a/docs/source/factories.rst b/docs/source/factories.rst deleted file mode 100644 index 7bc9e14..0000000 --- a/docs/source/factories.rst +++ /dev/null @@ -1,76 +0,0 @@ -********* -Factories -********* - - -Factory basics -============== - -A Factory is a python class that can make specific type or types of -components, assign names to each, and keep track of what it has made. - -The basic interface to Factories is the :py:class:`rail.projects.factory_mixin.FactoryMixin` class, which defines a few things, - -1. The "Factory pattern" of having a singleton instance of the factory that manages all the components of particular types, and class methods to interact with the instance. -2. A `client_classes` class member object specifying what types of components a particular factory manages. -3. Methods to add objects to a factory, and reset the factory contents. -4. Interfaces for reading and writing objects to and from yaml files. -5. Type validation, to ensure that only the correct types of objects are created or added to factories. - - - -Specific Factories -================== - -.. 
list-table:: Title - :widths: 45 5 45 - :header-rows: 1 - - * - Factory Class - - Yaml Tag - - Managed Classes - * - :py:class:`rail.projects.project_file_factory.RailProjectFileFactory` - - `Files` - - `RailProjectFileInstance`, `RailProjectFileTemplate` - * - :py:class:`rail.projects.catalog_factory.RailCatalogFactory` - - `Catalogs` - - `RailProjectCatalogInstance`, `RailProjectCatalogTemplate` - * - :py:class:`rail.projects.subsample_factory.RailSubsampleFactory` - - `Subsamples` - - `RailSubsample` - * - :py:class:`rail.projects.selection_factory.RailSelectionFactory` - - `Selections` - - `RailSelection` - * - :py:class:`rail.projects.algorithm_factory.RailAlgorithmFactory` - - `PZAlgorithms` - - `RailPZAlgorithmHolder` - * - - - `Classifiers` - - `RailClassificationAlgorithmHolder` - * - - - `Summarizers` - - `RailSummarizerAlgorithmHolder` - * - - - `SpecSelections` - - `RailSpecSelectionAlgorithmHolder` - * - - - `ErrorModels` - - `RailErrorModelAlgorithmHolder` - * - - - `Subsamplers` - - `RailSubsamplerAlgorithmHolder` - * - - - `Reducers` - - `RailReducerAlgorithmHolder` - * - :py:class:`rail.projects.pipeline_factory.RailPipelineFactory` - - `Pipelines` - - `RailPipelineTemplate`, `RailPipelineInstance` - * - :py:class:`rail.plotting.plotter_factory.RailPlotterFactory` - - `Plots` - - `RailPlotter`, `RailPlotterList` - * - :py:class:`rail.plotting.dataset_factory.RailDatasetFactory` - - `Data` - - `RailDatasetHolder`, `RailDatasetListHolder`, `RailProjectHolder` - * - :py:class:`rail.plotting.plot_group_factory.RailPlotGroupFactory` - - `PlotGroups` - - `RailPlotGroup` diff --git a/docs/source/fix_an_issue.rst b/docs/source/fix_an_issue.rst index 6fbecae..1ded4a8 100644 --- a/docs/source/fix_an_issue.rst +++ b/docs/source/fix_an_issue.rst @@ -1,6 +1,6 @@ -************ +============ Fix an Issue -************ +============ The typical workflow for fixing a specific issue will look something like the following: diff --git a/docs/source/flavors.rst 
b/docs/source/flavors.rst new file mode 100644 index 0000000..3ed2b80 --- /dev/null +++ b/docs/source/flavors.rst @@ -0,0 +1,87 @@ +**************** +Analysis Flavors +**************** + +A key concept in `rail_projects` is analysis `Flavors`, which are versions of +similar analyses with slightly different parameter settings and/or +input files. + +============= +Flavor basics +============= + +A `RailProject` will contain several `Flavors` and one special `Flavor`, called `Baseline`, +which is included in all the other `Flavors`. + +When various `RailProject` functions are called, the user can specify which `Flavors` should be +used. `RailProject` also has tools to make it easy to iterate over several `Flavors`, or even all +the available flavors, and call the functions for each one. + + +================== +Flavor definitions +================== + + +The parameters needed to define a `Flavor` are given in :py:class:`rail.projects.project.RailFlavor` + +.. code-block:: python + + config_options: dict[str, StageParameter] = dict( + name=StageParameter(str, None, fmt="%s", required=True, msg="Flavor name"), + catalog_tag=StageParameter( + str, None, fmt="%s", msg="tag for catalog being used" + ), + pipelines=StageParameter(list, ["all"], fmt="%s", msg="pipelines being used"), + file_aliases=StageParameter(dict, {}, fmt="%s", msg="file aliases used"), + pipeline_overrides=StageParameter(dict, {}, fmt="%s", msg="file aliases used"), + ) + + +These are: + +a `name` for the variant, used to construct filenames + +a `catalog_tag`, which identifies the format of the data being used, and sets +the expected names of columns accordingly + +a list of `pipelines` that can be run in this variant + +a dict of `file_aliases` that can be used to specify the +input files used in this variant + +a dict of `pipeline_overrides` that modify the behavior of the various pipelines + + +Here is an example: + +.. 
code-block:: yaml + + # Baseline configuration, included in others by default + Baseline: + catalog_tag: roman_rubin + pipelines: ['all'] + file_aliases: # Set the training and test files + test: test_file_100k + train: train_file_100k + train_zCOSMOS: train_file_zCOSMOS_100k + + # These define the variant configurations for the various parts of the analysis + Flavors: + - Flavor: + name: train_cosmos + pipelines: ['pz', 'tomography'] # only run the pz and tomography pipelines + file_aliases: # Set the training and test files + test: test_file_100k + train: train_file_zCOSMOS_100k + - Flavor: + name: gpz_gl + pipelines: ['pz'] # only run the pz pipeline + pipeline_overrides: # Override specifics for particular pipelines + default: + kwargs: + algorithms: ['gpz'] # Only run gpz + pz: # overrides for the 'pz' pipeline + inform_gpz: # overrides for the 'inform_gpz' stage + gpz_method: GL + diff --git a/docs/source/installation.rst b/docs/source/installation.rst index e517687..457066b 100644 --- a/docs/source/installation.rst +++ b/docs/source/installation.rst @@ -26,8 +26,9 @@ algorithms) instructions `on the RAIL installation page `_ +======================= Production Installation ------------------------ +======================= Here we will be installing ``rail_projects`` into an existing conda environment "[env]". @@ -37,8 +38,9 @@ Here we will be installing ``rail_projects`` into an existing conda environment pip install pz-rail-projects +======================== Exploration Installation ------------------------- +======================== Here we will be installing the source code from `rail `_ to access all of the @@ -60,9 +62,9 @@ At that point you should be able to run the demonstration notebooks, e.g.; jupyter-notebook examples - +====================== Developer Installation ----------------------- +====================== Here we will be installing the source code from `rail `_ to be able to develop @@ -91,7 +93,7 @@ the source code. 
pip install -e '.[dev]' - +============= RAIL packages ============= @@ -99,6 +101,7 @@ Depending on how you want to use RAIL you will be installing one or more `RAIL packages `_ +============================= Adding your kernel to jupyter ============================= If you want to use the kernel that you have just created to run RAIL example demos, then you may need to explicitly add an ipython kernel. You may need to first install ipykernel with `conda install ipykernel`. You can do then add your kernel with the following command, making sure that you have the conda environment that you wish to add activated. From your environment, execute the command: diff --git a/docs/source/new_data_extractor.rst b/docs/source/new_data_extractor.rst index c736f2d..1b04306 100644 --- a/docs/source/new_data_extractor.rst +++ b/docs/source/new_data_extractor.rst @@ -1,6 +1,6 @@ -************************** +========================== Adding a new DataExtractor -************************** +========================== Because of the variety of formats of files in RAIL, and the variety of analysis flavors in a ``RailProject``, it is useful to be able to have re-usable tools that extract particular @@ -10,8 +10,9 @@ extract a particular set of data from the ``RailProject``. The inputs and outpu are all defined in particular ways to allow ``RailProjectDataExtractor`` objects to be integrated into larger data analysis pipelines. -Example -======= + +New DataExtractor Example +------------------------- The following example has all of the required pieces of a ``RailProjectDataExtractor`` and almost nothing else. 
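As a rough sketch of the shape such a subclass takes (the class, attribute, and data names below are illustrative stand-ins, not the actual rail API), the pattern is: declare the expected inputs, then implement a single method that pulls the requested data out of the project:

```python
# Illustrative sketch only: stand-in names, not the real rail interface.

class DataExtractor:
    """Toy base class: subclasses declare `inputs` and implement `_get_data`."""

    inputs: dict = {}

    def __call__(self, **kwargs):
        # Validate that all declared inputs were provided before extracting.
        missing = set(self.inputs) - set(kwargs)
        if missing:
            raise KeyError(f"missing inputs: {sorted(missing)}")
        return self._get_data(**kwargs)


class PZPointEstimateExtractor(DataExtractor):
    """Toy extractor pulling point estimates out of a nested results mapping."""

    inputs = dict(project=dict, flavor=str, algo=str)

    def _get_data(self, project, flavor, algo):
        return project["flavors"][flavor]["pz"][algo]["zmode"]


# A stand-in for the data a RailProject would provide.
project = {"flavors": {"baseline": {"pz": {"knn": {"zmode": [0.1, 0.5]}}}}}
result = PZPointEstimateExtractor()(project=project, flavor="baseline", algo="knn")
```

Declaring the inputs up front is what lets extractors be wired into larger analysis pipelines: the caller can check what an extractor needs without running it.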
diff --git a/docs/source/new_dataset_holder.rst b/docs/source/new_dataset_holder.rst index 4639d5e..23f208d 100644 --- a/docs/source/new_dataset_holder.rst +++ b/docs/source/new_dataset_holder.rst @@ -1,6 +1,6 @@ -****************************** +============================== Adding a new RailDatasetHolder -****************************** +============================== Because of the variety of formats of files in RAIL, and the variety of analysis flavors in a ``RailProject``, it is useful to be able to have re-usable tools that wrap particular types @@ -11,8 +11,8 @@ are all defined in particular ways to allow ``RailDatasetHolder`` objects to be integrated into larger data analysis pipelines. -Example -======= +New RailDatasetHolder Example +----------------------------- The following example has all of the required pieces of a ``RailDatasetHolder`` and almost nothing else. diff --git a/docs/source/new_plotter.rst b/docs/source/new_plotter.rst index 22c9465..d5bde56 100644 --- a/docs/source/new_plotter.rst +++ b/docs/source/new_plotter.rst @@ -1,6 +1,6 @@ -************************ +======================== Adding a new RailPlotter -************************ +======================== All of the various plotting classes are implemented as subclasses of the :py:class:`rail.plotting.plotter.RailPlotter` class. @@ -10,8 +10,8 @@ and configuration parameters are all defined in particular ways to allow ``RailP objects to be integrated into larger data analysis pipelines. -Example -======= +New RailPlotter Example +----------------------- The following example has all of the required pieces of a ``RailPlotter`` and almost nothing else. 
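As a loose illustration of the pattern (stand-in names only; real plotters build matplotlib figures through the rail.plotting interfaces), a plotter declares its expected inputs and returns named plots built from them:

```python
# Illustrative sketch only: stand-in names, not the real rail.plotting API.

class Plotter:
    """Toy base class: subclasses declare `inputs` and implement `_make_plots`."""

    inputs: dict = {}

    def __call__(self, prefix, **kwargs):
        return self._make_plots(prefix, **kwargs)


class ZEstimateVsZTrue(Plotter):
    """Toy plotter: returns stand-in 'figure' dicts keyed by a prefixed name."""

    inputs = dict(ztrue=list, zest=list)

    def _make_plots(self, prefix, ztrue, zest):
        # The prefix keeps plot names unique when iterating over flavors.
        name = f"{prefix}zestimate_vs_ztrue"
        return {name: {"x": ztrue, "y": zest}}


plots = ZEstimateVsZTrue()("baseline_", ztrue=[0.1, 0.2], zest=[0.12, 0.18])
```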
diff --git a/docs/source/overview.rst b/docs/source/overview.rst index 19b4d08..cc94370 100644 --- a/docs/source/overview.rst +++ b/docs/source/overview.rst @@ -2,62 +2,28 @@ Overview ******** -------------- +============= RAIL Overview -------------- +============= If you are interested in RAIL itself, please visit `RAIL overiew `_ ----------------------- -rail_projects Overview ----------------------- +====================== +rail.projects Overview +====================== +The :py:mod:`rail.projects` sub-package collects a set of tools to manage RAIL-based data analysis +studies. These tools help users define the common pieces of analyses, +while also making it quick to test many analysis variants with slight +configuration modifications. +====================== +rail.plotting Overview +====================== -Introduction to components, factories, libraries, and projects -************************************************************** +The :py:mod:`rail.plotting` sub-package collects a set of tools to make +plots from the data generated by using the :py:mod:`rail.projects` tools. -**components** - -Doing series of related studies using RAIL requires many pieces, such -as the lists of algorithms available, sets of analysis pipelines we -might run, types of plots we might make, types of data we can extract -from out analyses, references to particular files or sets of files we -want to use for out analysses, and so for. In general we call these -analysis components, and we need ways to keep track of them. - -We have implemented interfaces to allow us to read and write -components to yaml files. - - -**factories** - -A Factory is a python class that can make specific type or types of -components, assign names to each, and keep track of what it has made. - - -**libraries**: - -A library is the collection of all the components that have been -loaded. Typically there are collected into one, or a few yaml -configuration files to allow users to load them easily. 
- - -**projects**: - -A ``RailProject`` is the basic user interface class, which lets users -define and run a series of analysis pipelines in a set of different -analysis variants (called 'flavors'). - - - -`projects` -========== - - - -`plotting` -========== diff --git a/docs/source/pipelines.rst b/docs/source/pipelines.rst new file mode 100644 index 0000000..ef10b55 --- /dev/null +++ b/docs/source/pipelines.rst @@ -0,0 +1,145 @@ +********* +Pipelines +********* + +A key concept in `rail_projects` is ceci `Pipelines`, which run blocks of analysis code using `ceci`. + + +================ +Pipelines basics +================ + +A `RailProject` will have access to several `PipelineTemplates` and +use these to define the `Pipelines` that it runs. + +To do this, it will need some additional information. + +1. What `Flavor` to run the `Pipeline` with. This is specified by the + user, and will set up the `Pipeline` to expect the correct column + names. +2. What options to use to construct the `Pipeline`, such as which + algorithms to use, or additional parameters. This is done by merging + default information in the `PipelineTemplate` with `Flavor` specific information. +3. Any specific overrides for any of the `Pipeline` stages given in + the `Flavor` definitions. +4. How to find the input files. How this is done depends on the type + of `Pipeline` and if it is being run on a single file or an entire catalog. +5. Where to write the output data. How this is done depends on the type + of `Pipeline` and if it is being run on a single file or an entire catalog. 
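The merging in steps 2 and 3 above behaves like a recursive dictionary update: flavor-specific values win, and anything the flavor does not mention keeps its template default. A generic sketch of that idea (not the actual rail.projects implementation) is:

```python
# Hedged sketch: a generic nested-dict merge illustrating how template
# defaults could be combined with flavor overrides.

def deep_update(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` merged in recursively."""
    out = dict(base)
    for key, val in overrides.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            # Both sides are mappings: recurse instead of replacing wholesale.
            out[key] = deep_update(out[key], val)
        else:
            out[key] = val
    return out


# Defaults from a PipelineTemplate's kwargs block...
template_kwargs = {"algorithms": ["all"], "blending": True}
# ...merged with a flavor's overrides for the same pipeline.
flavor_kwargs = {"algorithms": ["gpz"]}
merged = deep_update(template_kwargs, flavor_kwargs)
```

Here the flavor replaces `algorithms` but inherits `blending` from the template, matching the `pipeline_overrides` examples shown elsewhere in these docs.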
+ + Running a Pipeline on a Catalog +------------------------------- + +When a `Pipeline` is run on a catalog of files, the +`input_catalog_template`, user supplied interpolants and possibly the `input_catalog_basename` parameters are +used to construct the list of input files + +The `input_catalog_template` should refer to a `CatalogTemplate` that +will give the template for the catalog, e.g., +`{catalogs_dir}/{project}_{sim_version}/{healpix}/part-0.parquet`. + +All of the interpolants must be given in one of three places: + +1. The IterationVars block of the `RailProject` +2. The CommonPaths block of the `RailProject` +3. Explicitly in the keyword arguments provided to the call to run the catalog + + +The `output_catalog_template` is used to define the output directory +in much the same way. + + +Note that some catalogs have `{basename}` as an interpolant. Since +`ceci` writes all of its output files to the same directory by +default, if we want to, say, create a different version of a +`degraded` catalog by selecting the output of a different degrader, we +can simply do so by picking a different file from the same directory. +We can specify this by setting the `input_catalog_basename` +parameter. + + +Running a Pipeline on a single set of inputs +-------------------------------------------- + +When a `Pipeline` is run on a single set of inputs, the +`input_file_templates` parameter and user supplied interpolants are +used to construct the list of input files. + +For example, the input_file_templates might look like this: + + +.. code-block:: yaml + + input_file_templates: + input_train: + flavor: baseline + tag: train + input_test: + flavor: baseline + tag: test + + +This would specify that there are two inputs `input_train` and +`input_test` and that they should be resolved by getting the `test` +and `train` tags from the `FileAliases` block of the current `Flavor`, +and resolving them using `flavor=baseline` as an interpolant. + +This sounds a bit complicated. 
The idea here is that this is a +mechanism that allows one `Flavor` to use input files created by +another `Flavor`. E.g., we can make testing / training files in the +baseline `Flavor` and then use them in many other `Flavors`. + + + +==================== +Pipeline definitions +==================== + +Here is an example of a `Pipeline` that we typically only run on catalogs. + +.. code-block:: yaml + + - PipelineTemplate: + name: truth_to_observed + pipeline_class: rail.pipelines.degradation.truth_to_observed.TruthToObservedPipeline + input_catalog_template: reduced + output_catalog_template: degraded + kwargs: + error_models: ['all'] + selectors: ['all'] + blending: true + + +Here is an example of a `Pipeline` that we typically run on individual +input files. + +.. code-block:: yaml + + - PipelineTemplate: + name: pz + pipeline_class: rail.pipelines.estimation.pz_all.PzPipeline + input_file_templates: + input_train: + flavor: baseline + tag: train + input_test: + flavor: baseline + tag: test + kwargs: + algorithms: ['all'] + + + +===================================== +Building pipelines with rail.projects +===================================== + + + + +==================================== +Running pipelines with rail.projects +==================================== + + diff --git a/docs/source/rail_project.rst b/docs/source/rail_project.rst index ad2d859..fd5190c 100644 --- a/docs/source/rail_project.rst +++ b/docs/source/rail_project.rst @@ -2,7 +2,7 @@ RailProject *********** - +================== RailProject basics ================== @@ -23,20 +23,31 @@ A `RailProject` basically specifies which `Pipelines` to run under which `Flavors`, and keeps track of the outputs. 
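The path bookkeeping behind this can be pictured with plain Python string formatting. The template strings below mirror the `PathTemplates` defaults quoted later in this document; the interpolant values (`demo`, `gold`, etc.) are made-up examples:

```python
# Hedged sketch: resolving path templates with str.format.
# Template strings mirror the documented PathTemplates defaults;
# the interpolant values are hypothetical.
pipeline_path = "{pipelines_dir}/{pipeline}_{flavor}.yaml"
ceci_output_dir = "{project_dir}/data/{selection}_{flavor}"

interpolants = dict(
    project_dir="./projects/demo",
    pipelines_dir="./projects/demo/pipelines",
    pipeline="pz",
    flavor="gpz_gl",
    selection="gold",
)

# Each template only consumes the interpolants it names; extras are ignored.
resolved_pipeline = pipeline_path.format(**interpolants)
resolved_output = ceci_output_dir.format(**interpolants)
```

Every distinct combination of flavor and selection thus lands in its own pipeline yaml and output directory, which is how a `RailProject` keeps the variants from clobbering each other.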
- +========================== Rail Project Functionality ========================== +Source code: :py:class:`rail.projects.project.RailProject` + +Once the analysis setup and analysis flavors are defined, +most of what users will do comes down to running a small set of +functions in `RailProject`, which we describe here. + + RailProject.load_config ----------------------- +Source code: :func:`rail.projects.project.RailProject.load_config` + Read a yaml file and create a RailProject RailProject.reduce_data ----------------------- +Source code: :func:`rail.projects.project.RailProject.reduce_data` + Make a reduced catalog from an input catalog by applying a selction and trimming unwanted colums. This is run before the analysis pipelines. @@ -44,34 +55,53 @@ and trimming unwanted colums. RailProject.subsample_data -------------------------- +Source code: :func:`rail.projects.project.RailProject.subsample_data` + Subsample data from a catalog to make a testing or training file. -This is run after catalog level pipelines, but before pipeliens run +This is run after catalog level pipelines, but before pipelines run on indvidudal training/ testing samples. RailProject.build_pipelines --------------------------- -Build ceci pipeline yaml files. +Source code: :func:`rail.projects.project.RailProject.build_pipelines` + +Build ceci pipeline yaml files for a particular set of analysis +flavors. This will build all the pipelines that are defined for that +analysis flavor. RailProject.run_pipeline_single ------------------------------- -Run a pipeline on a single file +Source code: :func:`rail.projects.project.RailProject.run_pipeline_single` + +Run a pipeline on a single file for a specific analysis flavor. +This will require the user to specify any additional interpolants, +such as the selection name, that are needed to uniquely specify the +input files. 
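As an illustration of how interpolants pin down the input files, expanding a catalog template over an `IterationVars`-style list is again plain string formatting. The directory and version values below are made up; the healpix values match the example used earlier in these docs:

```python
# Hedged sketch: expanding a catalog template over iteration variables
# with str.format; the catalogs_dir and sim_version values are hypothetical.
catalog_template = "{catalogs_dir}/{project}_{sim_version}/{healpix}/part-0.parquet"
fixed = dict(catalogs_dir="./catalogs", project="roman_rubin", sim_version="v1")
healpix_pixels = [3433, 3344]  # would come from the IterationVars block

# One concrete input path per iteration value.
paths = [catalog_template.format(**fixed, healpix=pix) for pix in healpix_pixels]
```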
+ RailProject.run_pipeline_catalog -------------------------------- -Run a pipeline on a catalog of files +Source code: :func:`rail.projects.project.RailProject.run_pipeline_catalog` +Run a pipeline on a catalog of files for a specific analysis flavor. +This will require the user to specify any additional interpolants, +such as the selection name, that are needed to uniquely specify the +input files. + +========================== Rail Project Configuration ========================== Most of these element come from the shared library of elements, -which is accesible from rail.projects.library +which is accessible from the :py:mod:`rail.projects.library` module. + Rail Project shared configuration files --------------------------------------- @@ -81,8 +111,13 @@ Rail Project shared configuration files List of shared configuration files to load -Rail Project analysis flavors ------------------------------ +Rail Project analysis flavor definitions +---------------------------------------- + +See :ref:`Flavor definitions` or +:py:class:`rail.projects.project.RailFlavor` for the parameters needed to define an +analysis `Flavor`. + + `Baseline: dict[str, Any]` @@ -95,24 +130,63 @@ This is included in all the other analysis flavors List of all the analysis flavors that have been defined in this project + Rail Project bookkeeping elements --------------------------------- -These used to define the file paths for the project. +These are used to define the file paths for the project. `PathTemplates: dict[str, str]` -Overrides for templates used to construct file paths +Overrides for templates used to construct file paths. The defaults +are given in :py:mod:`rail.projects.name_utils` + +.. 
code-block:: python + + PathTemplates = dict( + pipeline_path="{pipelines_dir}/{pipeline}_{flavor}.yaml", + ceci_output_dir="{project_dir}/data/{selection}_{flavor}", + ceci_file_path="{tag}_{stage}.{suffix}", + ) + `CommonPaths: dict[str, str]` -Defintions of common paths used to construct file paths +Definitions of common paths used to construct file paths. The defaults +are given in :py:mod:`rail.projects.name_utils` + +.. code-block:: python + + CommonPaths = dict( + root=".", # needs to be overridden + scratch_root=".", # needs to be overridden + project="", # needs to be overridden + project_dir="{root}/projects/{project}", + project_scratch_dir="{scratch_root}/projects/{project}", + catalogs_dir="{root}/catalogs", + pipelines_dir="{project_dir}/pipelines", + ) `IterationVars: dict[str, list[str]]` -Iteration variables to construct the catalogs +Iteration variables to construct the catalogs. For example, the +roman-rubin catalog is split by healpix pixel, and to get the whole +catalog you have to iterate over all the healpix pixels, so this would +look like + +.. code-block:: yaml + + IterationVars: + healpix: [all_the_pixels] + + +Note that if you want to set up a project to only use some of the +available data, that is perfectly fine. All you have to do is shorten +the list. + + Rail Project shared elements @@ -123,15 +197,27 @@ of the names of things that are defined in the library that can be used in this project. The default is to use all the items defined in the library. 
-`Catalogs: list[str]` These are actually CatalogTemplates -`Files: list[str]` These are actually FileTemplates -`Pipelines: list[str]` These are actually PipelineTemplates -`Reducers: list[str]` These reduce the input data catalog -`Subsamplers: list[str]` These subsample catalogs to get individual files -`Selections: list[str]` These are the selection parameters -`Subsamples: list[str]` These are the subsample parameters -`PZAlgorithms: list[str]` -`SpecSelections: list[str]` -`Classifiers: list[str]` -`Summarizers: list[str]` -`ErrorModels: list[str]` +`Catalogs: list[str] = ['all']` These are actually CatalogTemplates + +`Files: list[str] = ['all']` These are actually FileTemplates + +`Pipelines: list[str] = ['all']` These are actually PipelineTemplates + +`Reducers: list[str] = ['all']` These reduce the input data catalog + +`Subsamplers: list[str] = ['all']` These subsample catalogs to get individual +files + +`Selections: list[str] = ['all']` These are the selection parameters + +`Subsamples: list[str] = ['all']` These are the subsample parameters + +`PZAlgorithms: list[str] = ['all']` + +`SpecSelections: list[str] = ['all']` + +`Classifiers: list[str] = ['all']` + +`Summarizers: list[str] = ['all']` + +`ErrorModels: list[str] = ['all']` diff --git a/tests/ci_project.yaml b/tests/ci_project.yaml index 3fc0d83..e23ea6b 100644 --- a/tests/ci_project.yaml +++ b/tests/ci_project.yaml @@ -34,11 +34,11 @@ Project: train: train_file_zCOSMOS_100k - Flavor: name: gpz_gl - pipelines: ['inform', 'estimate', 'evaluate', 'pz'] + pipelines: ['pz'] # only run the pz pipeline pipeline_overrides: # Override specifics for particular pipelines default: kwargs: - algorithms: ['gpz'] + algorithms: ['gpz'] # Only run gpz inform: inform_gpz: gpz_method: GL