Merge pull request #93 from roocs/decadal_fixes
Decadal fixes
ellesmith88 authored Dec 2, 2021
2 parents e556186 + a1f33bd commit dfc4d29
Showing 51 changed files with 5,550 additions and 369 deletions.
182 changes: 178 additions & 4 deletions README.rst
@@ -9,9 +9,9 @@ dachar (pron. "day-car")



-.. image:: https://img.shields.io/travis/roocs/dachar.svg
-   :target: https://travis-ci.org/github/roocs/dachar
-   :alt: Travis
+.. image:: https://github.com/roocs/dachar/workflows/build/badge.svg
+   :target: https://github.com/roocs/dachar/actions
+   :alt: Build Status



@@ -37,7 +37,6 @@ Examples ESGF data sets are:
* **CMIP5**\ : ``cmip5.output1.MPI-M.MPI-ESM-LR.decadal1995.mon.land.Lmon.r5i1p1.v20120529``
* **CORDEX**\ : ``cordex.output.AFR-44.DMI.ECMWF-ERAINT.evaluation.r1i1p1.HIRHAM5.v2.day.uas.v20140804``


* Free software: BSD
* Documentation: https://dachar.readthedocs.io.

@@ -54,6 +53,181 @@ There are three main stages to the characterisation process:
#. **Define Fixes**\ : Suggest fixes required to individual data sets to overcome the
irregularities. Write the required fixes to a new set of files (JSON).

See below for how to use the CLI to scan, analyse, propose fixes and process fixes.
Character, analysis, fix and fix proposal records are stored in Elasticsearch indices.
Creating, deleting and writing to indices is described below. The Elasticsearch API token must be set in ``etc/roocs.ini`` to perform these actions.

Characterising
==============

Scanning
--------

.. code-block::

    $ dachar scan <project> -l <location>

e.g. ``dachar scan c3s-cmip6 -l ceda``. This will scan all c3s-cmip6 datasets.

Two scanning modes are available: quick and full. Select one with ``-m quick`` or ``-m full``. A quick scan can be overwritten by a full scan using ``-m full-force``.
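
As a rough illustration of how the scan options combine, the sketch below assembles a ``dachar scan`` argument list in Python. The helper name ``build_scan_command`` and the ``quick`` default are assumptions made for this example; only the ``-l`` and ``-m`` flags and the three mode names come from the text above.

```python
import shlex

# Modes named in the text above; "full-force" overwrites an existing quick scan.
SCAN_MODES = ("quick", "full", "full-force")


def build_scan_command(project, location, mode="quick"):
    """Return the argument list for a ``dachar scan`` call.

    Hypothetical helper: the default mode chosen here is an assumption,
    not necessarily the default used by the dachar CLI.
    """
    if mode not in SCAN_MODES:
        raise ValueError(f"mode must be one of {SCAN_MODES}, got {mode!r}")
    return ["dachar", "scan", project, "-l", location, "-m", mode]


cmd = build_scan_command("c3s-cmip6", "ceda", mode="full")
print(shlex.join(cmd))
```

The list form can be handed to ``subprocess.run(cmd, check=True)`` so that no shell quoting is needed.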

Use ``dachar scan -h`` to see the options available for scanning specific datasets.


Analysing
---------

To analyse a population of datasets, provide the sample id, which identifies the population to analyse.

.. code-block::

    $ dachar analyse -s <sample-id> <project> -l <location>

Using the ``-f`` flag will overwrite existing analysis records for the sample id.

Proposing Fixes
---------------

Analysis will automatically propose fixes if any are found. Fixes identified by another source can also be proposed.

There are two ways of proposing fixes:

1. By providing a JSON file of the fix. More than one JSON file can be provided.

.. code-block::

    $ dachar propose-fixes -f <json_file>,<json_file2>,<json_file3>

2. By providing a JSON template and a list of datasets that the fix should be proposed for.

.. code-block::

    $ dachar propose-fixes -t <json_template> -d <dataset_list>

See the directory ``tests/test_fixes/decadal_fixes`` for examples.
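
When several fix files are proposed at once, the ``-f`` value is a single comma-separated string. A minimal sketch of building that argument is shown here; ``build_propose_fixes_command`` is a hypothetical helper and the file names are placeholders, not files from the repository.

```python
def build_propose_fixes_command(json_files):
    """Return the argument list for ``dachar propose-fixes -f`` (hypothetical helper)."""
    files = [str(path) for path in json_files]
    if not files:
        raise ValueError("at least one JSON fix file is required")
    # The CLI usage above separates multiple files with commas, without spaces.
    return ["dachar", "propose-fixes", "-f", ",".join(files)]


# Placeholder file names, used only for illustration.
print(build_propose_fixes_command(["fix_a.json", "fix_b.json"]))
```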

Note that if CMIP6 fixes are intended to be used for CDS datasets, the ds ids for the datasets must start with ``c3s-cmip6`` instead of ``CMIP6``.
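
The prefix swap can be sketched as follows. ``to_c3s_cmip6_id`` is a hypothetical helper written for this example; it only rewrites the leading facet, so a ``CMIP6`` string appearing elsewhere in the id is left untouched.

```python
def to_c3s_cmip6_id(ds_id):
    """Swap a leading ``CMIP6`` facet for ``c3s-cmip6`` (hypothetical helper)."""
    prefix, sep, rest = ds_id.partition(".")
    if prefix == "c3s-cmip6":
        return ds_id  # already in the form needed for CDS datasets
    if prefix != "CMIP6":
        raise ValueError(f"not a CMIP6 dataset id: {ds_id!r}")
    return "c3s-cmip6" + sep + rest


print(to_c3s_cmip6_id(
    "CMIP6.DCPP.CMCC.CMCC-CM2-SR5.dcppA-hindcast.s1960-r1i1p1f1.Amon.tas.gn.v20210312"
))
```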

Processing Fixes
----------------

To publish or reject proposed fixes, use:

.. code-block::

    $ dachar process-fixes -a process

To process specific fixes, provide their dataset ids:

.. code-block::

    $ dachar process-fixes -a process -d <dataset-id>,<dataset-id>

To withdraw existing fixes, use:

.. code-block::

    $ dachar process-fixes -a withdraw -d <dataset-id>,<dataset-id>

To publish all fixes, use:

.. code-block::

    $ dachar process-fixes -a publish-all

To reject all fixes, use:

.. code-block::

    $ dachar process-fixes -a reject-all

In this case you will be prompted to give a reason for rejection. This will be applied to all fixes.
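
The four actions share one command shape, which the following sketch puts together. ``build_process_fixes_command`` is a hypothetical helper; only the action names and the ``-a`` and ``-d`` flags are taken from the commands above.

```python
# Actions listed in this section of the README.
ACTIONS = ("process", "withdraw", "publish-all", "reject-all")


def build_process_fixes_command(action, dataset_ids=()):
    """Return the argument list for ``dachar process-fixes`` (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {ACTIONS}, got {action!r}")
    cmd = ["dachar", "process-fixes", "-a", action]
    if dataset_ids:
        # Dataset ids are passed as one comma-separated value.
        cmd += ["-d", ",".join(dataset_ids)]
    return cmd


print(build_process_fixes_command("publish-all"))
```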

Adding to elasticsearch
=======================
When a new version of the index is being created:

1. A new index must be created with a new date. This can be done by creating an empty index or cloning the old one.
   Creating an empty index makes a new index with the date of creation and, if desired, updates the alias to point to it.
   Cloning creates a new index with the date of creation, fills it with all documents from the old index and, if desired, updates the alias to point to it.

2. It can then be populated either with all documents in the local store or one document at a time.
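
To make the dated naming concrete, here is a small sketch of how such an index name could be formed. The ``roocs-<store>-<YYYY-MM-DD>`` pattern is an assumption inferred from example names such as ``roocs-fix-2020-12-21`` in this README, and ``dated_index_name`` is a hypothetical helper, not part of ``dachar/index/cli.py``.

```python
from datetime import date


def dated_index_name(store, prefix="roocs", day=None):
    """Build an index name ending in an ISO date (hypothetical helper).

    The name pattern is an assumption based on the README examples.
    """
    day = day or date.today()
    return f"{prefix}-{store}-{day.isoformat()}"


print(dated_index_name("fix", day=date(2020, 12, 21)))  # roocs-fix-2020-12-21
```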


Cloning an index
----------------
To create an index with today's date and populate it with all documents from another index:

.. code-block::

    $ python dachar/index/cli.py clone -i <index-to-create> -c <index-to-clone>

e.g. ``python dachar/index/cli.py clone -i fix -c roocs-fix-2020-12-21``

To update the alias to point to this new index, provide the ``-u`` flag.

.. code-block::

    $ python dachar/index/cli.py clone -i <index-to-create> -c <index-to-clone> -u

Creating an empty index
-----------------------
To create an empty index with today's date:

.. code-block::

    $ python dachar/index/cli.py create -i <index-to-create>

e.g. ``python dachar/index/cli.py create -i fix``

To update the alias to point to this new index, provide the ``-u`` flag.

.. code-block::

    $ python dachar/index/cli.py create -i <index-to-create> -u

Deleting an index
-----------------
To delete an index:

.. code-block::

    $ python dachar/index/cli.py delete -i <index-to-delete>

e.g. ``python dachar/index/cli.py delete -i roocs-fix-2020-12-21``


Populating an index from a local json store
-------------------------------------------
Populate an Elasticsearch index with the contents of a local store.

.. code-block::

    $ python dachar/index/cli.py populate -s <store> -i <index-to-populate>

The store must be one of ``fix``, ``fix-proposal``, ``analysis`` or ``character``.

e.g. ``python dachar/index/cli.py populate -s fix -i roocs-fix-2020-12-21``


Adding one document to an existing index
----------------------------------------
To add one document from any file path to an index:

.. code-block::

    $ python dachar/index/cli.py add-document -f <file-path> -d <drs-id> -i <index>

``<drs-id>`` is the id of the document. This id is called ``dataset_id`` in the fix, character and fix proposal stores and ``sample_id`` in the analysis store.

e.g. ``python dachar/index/cli.py add-document -f /path/to/doc.json -d c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.Amon.rlds.gr1.v20190619 -i roocs-fix-2020-12-21``
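
The relationship between a store and the name of its id field can be written down as a simple lookup. This is only a sketch: ``ID_FIELDS`` and ``id_field_for`` are hypothetical names, and the mapping is taken from the description above rather than from the dachar source.

```python
# Name of the id field used by each store, as described above.
ID_FIELDS = {
    "fix": "dataset_id",
    "fix-proposal": "dataset_id",
    "character": "dataset_id",
    "analysis": "sample_id",
}


def id_field_for(store):
    """Return the id field name for a store (hypothetical helper)."""
    try:
        return ID_FIELDS[store]
    except KeyError:
        raise ValueError(f"unknown store: {store!r}") from None


print(id_field_for("analysis"))  # sample_id
```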


Credits
=======

92 changes: 92 additions & 0 deletions create-decadal-fixes-and-test.py
@@ -0,0 +1,92 @@
#!/usr/bin/env python

import time
import os

from dachar.utils.switch_ds import get_grouped_ds_id

basedir = "/badc/cmip6/data"

dsids = """
/badc/cmip6/data/CMIP6/DCPP/CMCC/CMCC-CM2-SR5/dcppA-hindcast/s1960-r10i1p1f1/Amon/pr/gn/v20210719
/badc/cmip6/data/CMIP6/DCPP/CMCC/CMCC-CM2-SR5/dcppA-hindcast/s1960-r10i1p1f1/Amon/psl/gn/v20210719
/badc/cmip6/data/CMIP6/DCPP/CMCC/CMCC-CM2-SR5/dcppA-hindcast/s1960-r10i1p1f1/Amon/tas/gn/v20210719
/badc/cmip6/data/CMIP6/DCPP/CMCC/CMCC-CM2-SR5/dcppA-hindcast/s1960-r1i1p1f1/Amon/pr/gn/v20210312
/badc/cmip6/data/CMIP6/DCPP/CMCC/CMCC-CM2-SR5/dcppA-hindcast/s1960-r1i1p1f1/Amon/psl/gn/v20210312
/badc/cmip6/data/CMIP6/DCPP/CMCC/CMCC-CM2-SR5/dcppA-hindcast/s1960-r1i1p1f1/Amon/tas/gn/v20210312
/badc/cmip6/data/CMIP6/DCPP/CMCC/CMCC-CM2-SR5/dcppA-hindcast/s1960-r2i1p1f1/Amon/pr/gn/v20210312
/badc/cmip6/data/CMIP6/DCPP/CMCC/CMCC-CM2-SR5/dcppA-hindcast/s1960-r2i1p1f1/Amon/psl/gn/v20210312
/badc/cmip6/data/CMIP6/DCPP/CMCC/CMCC-CM2-SR5/dcppA-hindcast/s1960-r2i1p1f1/Amon/tas/gn/v20210312
""".strip().replace(basedir + "/", "").replace("/", ".").split()


DS_LIST_FILE = "DSET_IDS.txt"
FIX_DIR = "./decadal_fixes"


def sleep():
    time.sleep(1)


def prep_dir(fpath):
    dr = os.path.dirname(fpath)
    if not os.path.isdir(dr):
        os.makedirs(dr)


def write_fix_file(dsid):
    fix_file_path = os.path.join(FIX_DIR, get_grouped_ds_id(dsid) + ".json")
    prep_dir(fix_file_path)
    cmd = f"ROOCS_CONFIG=MY_roocs.ini python generate_decadal_fix.py -f {fix_file_path} -d {dsid}"
    print(f"Running: {cmd}")
    os.system(cmd)
    return fix_file_path


def propose_fix(dsid):
    ds_file = "./dsid.txt"
    with open(ds_file, "w") as w:
        w.write(dsid)

    print(f"Wrote dsid to: {ds_file}")
    fix_file_path = write_fix_file(dsid)

    cmd = f"ROOCS_CONFIG=MY_roocs.ini dachar propose-fixes --template {fix_file_path} --dataset-list {ds_file}"
    print(f"[INFO] Running: {cmd}")
    os.system(cmd)
    sleep()


def main():

    prep_dir(FIX_DIR)

    print("Deleting and regenerating index (so that it deals with mapping issues and creates alias)")
    indexes = ("c3s-roocs-fix-prop", "c3s-roocs-fix")

    for indx in indexes:
        os.system(f"ROOCS_CONFIG=MY_roocs.ini python dachar/index/cli.py delete -i {indx}")
        sleep()

    for indx in ("fix", "fix-proposal"):
        os.system(f"ROOCS_CONFIG=MY_roocs.ini python dachar/index/cli.py create -u -i {indx}")
        sleep()

    for dsid in dsids:
        p = os.path.join(basedir, dsid.replace(".", "/"))

        print(f"[INFO] Checking: {dsid}")
        if not os.path.isdir(p):
            raise Exception(f"[ERROR] {p} does not exist!")

        c3s_cmip6_dsid = dsid.replace("CMIP6", "c3s-cmip6")
        propose_fix(c3s_cmip6_dsid)

    print("FINALLY: run this to publish the fixes:\n"
          "ROOCS_CONFIG=MY_roocs.ini dachar process-fixes -a publish-all")


if __name__ == "__main__":

    main()

6 changes: 4 additions & 2 deletions dachar/analyse/analysis_store.py
@@ -11,8 +11,10 @@ class LocalAnalysisRecordsStore(_LocalBaseJsonStore):
     store_name = "Analysis Results Store"
     config = {
         "store_type": "local",
-        "local.base_dir": "/tmp/an-res-store",
-        "local.dir_grouping_level": 4,
+        "local.base_dir": CONFIG["dachar:store"].get("analysis_store",
+                                                     "/tmp/an-res-store"),
+        "local.dir_grouping_level": CONFIG["dachar:settings"].get(
+            "dir_grouping_level", 4)
     }
     id_mappers = {"*": "__ALL__"}
     required_fields = [
8 changes: 7 additions & 1 deletion dachar/analyse/checks/_base_check.py
@@ -2,7 +2,7 @@
import pprint as pp

from dachar.utils import UNDEFINED, nested_lookup, JDict

from dachar import __version__ as version
from dachar.utils.get_stores import get_fix_prop_store, get_dc_store

from dachar.fixes.fix_api import get_fix
@@ -53,6 +53,12 @@ class _BaseCheck(object):
    typical_threshold = 0.41
    atypical_threshold = 0.15

    source = {
        "name": "dachar",
        "version": f"{version}",
        "comment": "No specific source provided - link to all fixes in dachar",
        "url": "https://github.com/roocs/dachar/tree/master/dachar/fixes"}

    def __init__(self, sample):
        self.sample = sample
        self._load()