Parameter validation in the presence of {}-substitutions #710

o-smirnov · 2021-04-27T12:18:52Z

o-smirnov
Apr 27, 2021
Maintainer

I have realized that a single up-front validation step (before the recipe is run) can't be sufficient. Some inputs for intermediate steps (files etc.) may not exist, {}-substitutions may evaluate to different things at runtime (depending on outputs of previous steps), globs like "{self.prefix}-????-image.fits" can't be expanded into file lists until the wsclean step is actually run, etc. etc.

Therefore, I propose to implement three validation phases for each step:

1. Pre-validation. The recipe has not been run yet, so parameter settings may be invalid. Simple parameters, and those without substitutions, are validated, the rest are marked as "unresolved". This phase is intended to catch obvious errors only.

   When pre-validating a recipe, only validated aliases are propagated up to the steps. Unresolved aliases are not propagated. Child steps are allowed to have unresolved parameters (but they can't be missing required parameters, this can already be detected at this phase).
2. Runtime inputs validation, just before a step is run. Substitutions are done on inputs, and all inputs are validated. Files must exist, etc. A recipe's top-level input aliases (now all validated) are propagated up to its steps.
3. Runtime outputs validation, after a step has been run. Substitutions are done on outputs, and all outputs are validated. Files must exist, etc. A recipe's top-level output aliases are gathered from its steps.
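
As a rough illustration of the pre-validation phase, here is a minimal sketch that assumes parameter values are plain Python objects and that any string containing a {}-field counts as "unresolved". The names `prevalidate_params`, `has_substitution` and `UNRESOLVED` are hypothetical and are not Stimela's API.

```python
import string

UNRESOLVED = object()  # hypothetical marker for values that need runtime substitution


def has_substitution(value):
    """Return True if a string value contains at least one {}-field."""
    if not isinstance(value, str):
        return False
    # Formatter.parse yields (literal, field_name, spec, conversion) tuples;
    # field_name is None for purely literal chunks.
    return any(field is not None for _, field, _, _ in string.Formatter().parse(value))


def prevalidate_params(params, required):
    """Phase 1: check only what can be checked before the recipe runs."""
    missing = [name for name in required if name not in params]
    if missing:
        # Missing required parameters can already be detected at this phase.
        raise ValueError(f"missing required parameters: {missing}")
    return {
        name: (UNRESOLVED if has_substitution(value) else value)
        for name, value in params.items()
    }


result = prevalidate_params(
    {"ms": "demo.ms", "prefix": "{recipe.dir}/im-{recipe.scan}", "niter": 1000},
    required=["ms"],
)
```

Here `ms` and `niter` come back unchanged, while `prefix` is replaced by the `UNRESOLVED` marker and left for the runtime phases.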

I have a pretty clear idea how to implement this succinctly, but @SpheMakh please try to poke some holes in this logic first.

o-smirnov · 2021-04-29T17:38:29Z

o-smirnov
Apr 29, 2021
Maintainer Author

Anyway, I made it so: ratt-ru/scabha#5 #712

In the end the fat burned off, and the logic distilled to some checks (for 2 and 3, above) inside Step.run(). As for (1), it's implemented as a prevalidate() method, but this is called by run() automatically if needed, and only once.
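
The shape of that logic might look roughly like the toy class below. It is only a sketch under assumptions: inputs and outputs are file-path templates, the substitution dictionary is a flat dict, and everything except the names `run()` and `prevalidate()` is invented for illustration.

```python
import os
import tempfile


class Step:
    """Toy step: inputs/outputs are file paths that may contain {}-fields."""

    def __init__(self, inputs, outputs, action):
        self.inputs = inputs
        self.outputs = outputs
        self.action = action            # callable that produces the outputs
        self._prevalidated = False
        self.prevalidate_calls = 0

    def prevalidate(self):
        # Phase 1: only cheap checks that need no substitutions.
        self.prevalidate_calls += 1
        if not self.inputs:
            raise ValueError("step has no inputs")
        self._prevalidated = True

    @staticmethod
    def _resolve_files(templates, subst):
        # Substitute, then insist that every resulting file exists.
        paths = [t.format(**subst) for t in templates]
        for path in paths:
            if not os.path.exists(path):
                raise FileNotFoundError(path)
        return paths

    def run(self, subst):
        if not self._prevalidated:      # called automatically, and only once
            self.prevalidate()
        inputs = self._resolve_files(self.inputs, subst)      # phase 2
        self.action(inputs, [t.format(**subst) for t in self.outputs])
        return self._resolve_files(self.outputs, subst)       # phase 3


def copy_file(inputs, outputs):
    with open(inputs[0]) as fin, open(outputs[0], "w") as fout:
        fout.write(fin.read())


workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "in.txt"), "w") as f:
    f.write("data")

step = Step(["{dir}/in.txt"], ["{dir}/out.txt"], copy_file)
outputs = step.run({"dir": workdir})
step.run({"dir": workdir})              # second run does not prevalidate again
```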

0 replies

SpheMakh · 2021-05-05T09:25:26Z

SpheMakh
May 5, 2021
Maintainer

I ran the stimela/tests/test_cubical_recipe.py file and saw that you don't have "{ }" around values that are references. In fact, when I tried it, I got this error:

2021-05-05 11:12:25 STIMELA.cubical-image ERROR: error in recipe definition: alias image refers to unknown step '{image'

which indicates that this type of substitution is not recognised.
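
One plausible reading of that message, assuming alias targets are parsed by splitting "step.param" on the first dot: with braces added, the text before the dot is `{image`, which is not a step label. The helper `resolve_alias` below is hypothetical and the real parsing code may differ.

```python
def resolve_alias(target, steps):
    """Split 'step.param' on the first dot and look up the step label."""
    step_label, _, param = target.partition(".")
    if step_label not in steps:
        raise KeyError(f"alias refers to unknown step '{step_label}'")
    return step_label, param


steps = {"image": {"ms": None}, "calibrate": {"ms": None}}

ok = resolve_alias("image.ms", steps)

try:
    resolve_alias("{image.ms}", steps)
    error = None
except KeyError as exc:
    error = exc.args[0]
```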

1 reply

o-smirnov May 5, 2021
Maintainer Author

Please try these branches: https://github.com/o-smirnov/Stimela/tree/configuratt-oms and https://github.com/o-smirnov/scabha/tree/configuratt-oms, and if they work, approve the PRs. You may also need this: pip install git+https://github.com/omry/omegaconf.git. The master version is much faster, and has some bugs fixed.

SpheMakh · 2021-05-06T11:23:03Z

SpheMakh
May 6, 2021
Maintainer

On the substitutions. I think they should be explicit, and should only be relative within the current context. For example, here

https://github.com/ratt-ru/Stimela/blob/bf504702f7b8f5d3ea4c09aea3b305850943cc35/stimela/tests/test_recipe.yml#L50-L52

it should be

aliases: 
   msname: steps.selfcal.ms 
   telescope: steps.makems.tel

Also, self should refer to the class that the attribute belongs to (in this case, self should be a Parameter instance of the parameter ms), so this

https://github.com/ratt-ru/Stimela/blob/bf504702f7b8f5d3ea4c09aea3b305850943cc35/stimela/tests/test_recipe.yml#L16-L18

should be

 ms: 
   dtype: MS 
   implicit: "{inputs.msname}"

This is more intuitive and doesn't require the user to look up what self is, it can be treated as it would in python.

0 replies

o-smirnov · 2021-05-06T11:41:45Z

o-smirnov
May 6, 2021
Maintainer Author

On the substitutions. I think they should be explicit, and should only be relative within the current context

Well, an alias is not a substitution, it's a slightly more special beast. An alias can only ever refer to a step's parameter. So in steps.selfcal.ms, steps. is completely redundant and thus in my view unnecessary. The lack of {} should cue the user that this is not a "substitution" as such.

Also, self should refer to the class that the attribute belongs to

Yeah, I feel a little uneasy about self. Anyway, there's also a difference between OmegaConf DictConfigs and the dataclass objects, so don't assume you can just {}-substitute anything anywhere.

Maybe we should do away with self entirely. Maybe {params.msname}? Again, noting the difference between inputs and outputs per se (which are parameter schemas), and specific parameter values. {params.msname} refers to a value. So params would be a namespace containing the current cargo's parameter values?
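
To make the schema-versus-value distinction concrete, here is a small sketch that assumes a `params` namespace holding values only, resolved with Python's built-in `str.format`. The layout of `inputs` and `params` is an assumption for illustration, not the actual scabha data model.

```python
from types import SimpleNamespace

# Schemas describe parameters; they are not values.
inputs = {"msname": {"dtype": "MS", "required": True}}

# Values supplied for the current cargo (hypothetical "params" namespace).
params = SimpleNamespace(msname="demo.ms")

# {params.msname} refers to a value, so it can be substituted directly.
implicit_ms = "{params.msname}".format(params=params)

# A name that has a schema but no value (or no schema at all) cannot be resolved.
try:
    "{params.column}".format(params=params)
    unknown = None
except AttributeError as exc:
    unknown = str(exc)
```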

4 replies

SpheMakh May 6, 2021
Maintainer

Well, an alias is not a substitution, it's a slightly more special beast. An alias can only ever refer to a step's parameter. So in steps.selfcal.ms, steps. is completely redundant and thus in my view unnecessary. The lack of {} should cue the user that this is not a "substitution" as such.

I see.

Yeah I feel a little uneasy about self. Anyway there's also difference between OmegaConf DictConfigs and the dataclass objects, don't assume you can just {}-substitute anything anywhere.

Maybe we should do away with self entirely. Maybe {params.msname}? Again, noting the difference between inputs and outputs per se (which are parameter schemas), and specific parameter values. {params.msname} refers to a value. So params would be a namespace containing the current cargo's parameter values?

It should be possible to cross-reference to a configs section and children via {}-substitutions without having some global namespace like self or params that has all the parameters. We want inputs.foo, outputs.foo, steps.foo.bar

SpheMakh May 6, 2021
Maintainer

Let me implement something and get back to you.

o-smirnov May 6, 2021
Maintainer Author

The substitution dictionary is assembled here: https://github.com/ratt-ru/Stimela/blob/configuratt/stimela/kitchen/recipe.py#L515

o-smirnov May 6, 2021
Maintainer Author

Also please note #713. Logfile names also need substitutions...

o-smirnov · 2021-05-06T18:33:34Z

o-smirnov
May 6, 2021
Maintainer Author

OK, no plan survives contact with the enemy, but check it out @SpheMakh, I now have a working for-loop actually running wsclean over a bunch of MSs.

In the process, I realized that

- It is handy to be able to define multiple recipes in a YML file, and select which one to run from the command line
- It is handy to change Stimela settings from that same YML file
- There are actually FOUR kinds of substitutions going on, and all have their uses. I illustrate this with comments below.

Here's my recipe.yml, which I run with stimela exec recipe.yml -r img1gc.

## this augments the standard 'cabs' config section
cabs:
  wsclean:
    # disable container mode
    image: ''
    ## uncomment this for debugging, to echo command line instead of running wsclean
    # command: echo

## this augments the standard 'opts' config section to tweak logging settings
opts:
  log:
#    dir: logs/log-{datetime}
    dir: logs
    nest: 1

## this augments the standard 'vars' config section, which is completely free-form
## (in this case supplying a variable for use later on)
vars:
  scans: [ '04', '06', '08', '11', '13', '15', '18', '20', '22', '24', '27', '29', '31', '34', 
           '36', '38', '41', '43', '45', '48', '50', '52', '55', '57', '59', '61', '64', '65' ]


## 'clean_image' is not a standard config section, therefore stimela exec treats this as a recipe definition
clean_image:
  name: clean_image
  info: "runs a single deconvolution step"

  aliases:
    ## This is a reference of the FIRST kind: an alias is a "hard" link between an inner step parameter and an outer recipe parameter.
    ## this defines an input named "column" that has the same schema as steps.clean.column, and its value is automatically propagated to the 'clean' step
    column: [clean.column]
    
    ## note that all other step parameters (which are not set explicitly) are automatically aliased as if you said
    # clean_niter: [clean.niter]

  defaults:
    column: CORRECTED_DATA
    clean_niter: 50000
    clean_fit_spectral_pol: 4
    clean_nchan: 8

  steps:
    clean: 
      cab: wsclean
      params:
        weight: 'briggs 0'
        size: 10000 
        scale: 0.8asec
        mgain: 0.9
        join_channels: true
        padding: 1.3
        nwlayers_factor: 3
        fit_beam: true
        elliptical_beam: true


## 'img1gc' is not a standard config section, therefore stimela exec treats this as a recipe definition
img1gc:
  name: "img1gc"
  info: "makes 1GC images for all scans"

  for_loop:
    var: scan
    ## This is a reference of the SECOND kind: an OmegaConf ${}-substitution. This is handled by OmegaConf at load time (so is kind of "static")
    ## This sets the value of "over" to be an exact copy of "vars.scans" above (i.e. a list!)
    over: ${vars.scans}

  inputs:
    scan:
      dtype: str

  steps:
    image: 
      recipe: 
        ## This is a reference of the THIRD kind: a _use clause. This _copies_ the definition of 'clean_image' above into the section
        _use: clean_image
        # This is just here to illustrate that the copied-over definition can be augmented by adding fields
        info: "my modified version of clean_image"

      params:
        ## These are references of the FOURTH kind: {}-substitutions. This is updated by Stimela at run time (so is fully dynamic!)
        ## The substitution is evaluated as a "dumb" string operation
        clean_ms: '../msdir/1608538564_sdp_l0-Jupiter-scan{recipe.scan}.ms'
        clean_prefix: 'img1/im1-s{recipe.scan}'
        column: DATA
        # clean_niter: 100 # 50000
        # clean_fit_spectral_pol: 2 # 4
        # clean_nchan: 2 # 8


  # another reference of the fourth kind (NB: "params" will probably be renamed to "recipe"?)
  logname: '{name}.scan{params.scan}'
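
The fourth kind is the one that changes on every loop iteration. A minimal sketch of that behaviour, assuming the substitution is plain `str.format` over a `recipe` namespace carrying the loop variable (the `substitute` helper and the shortened scan list are illustrative only):

```python
from types import SimpleNamespace

scans = ["04", "06", "08"]          # shortened stand-in for vars.scans
step_params = {
    "clean_ms": "../msdir/1608538564_sdp_l0-Jupiter-scan{recipe.scan}.ms",
    "clean_prefix": "img1/im1-s{recipe.scan}",
    "column": "DATA",
}


def substitute(params, **namespaces):
    """Apply {}-substitution to string values; leave other values untouched."""
    return {
        key: value.format(**namespaces) if isinstance(value, str) else value
        for key, value in params.items()
    }


runs = []
for scan in scans:
    # The loop variable becomes visible to templates as {recipe.scan}.
    recipe = SimpleNamespace(scan=scan)
    runs.append(substitute(step_params, recipe=recipe))
```

Each entry of `runs` holds the parameters for one iteration, so the same step definition yields a different MS name and image prefix per scan.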

20 replies

SpheMakh May 7, 2021
Maintainer

https://gist.github.com/SpheMakh/ac8a45deeeaa2ff941635633e4a2474e

o-smirnov May 7, 2021
Maintainer Author

Ok, the error makes perfect sense. You put the params section right under steps, so it's trying to interpret it as a "Step" object and refusing:

  steps: 
      makems:
          cab: simms
          params:
              msname: vars.msname
              synthesis: 0.128
      params:
        ms: "{steps.makems.ms}"

You need to put params under plotobs:

      plotobs:
          params:
            ms: "{steps.makems.ms}"
          recipe:
              name: "Plotobs"

SpheMakh May 10, 2021
Maintainer

One more error. I'm trying to reference a directory in line 83, the substitution doesn't seem to work leading to a validation error.

 71 recipe:
 72   name: "demo recipe"
 73   info: 'top level recipe definition'
 74   dirs:
 75     input: input
 76     output: output
 77     log: logs
 78   aliases:
 79     telescope: makems.tel
 80     msname: makems.name
 81   defaults:
 82     telescope: kat-7
 83     msname: "{dirs.input}/myms.ms"
 84   steps:
 85       makems:
 86         cab: simms
 87         params:
 88           synthesis: 0.128

o-smirnov May 10, 2021
Maintainer Author

At this line here: https://github.com/ratt-ru/Stimela/blob/configuratt/stimela/kitchen/recipe.py#L574

Add

subst1.vars = self.vars
subst1.dirs = self.dirs

and see if that works?
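
The underlying failure mode can be reproduced with plain `str.format`: a template can only use namespaces that are present in the substitution dictionary. This is a sketch with made-up values, not the actual contents of `subst1`.

```python
from types import SimpleNamespace

dirs = SimpleNamespace(input="input", output="output", log="logs")
template = "{dirs.input}/myms.ms"

# Without a "dirs" entry in the substitution dictionary the lookup fails.
subst = {"recipe": SimpleNamespace(telescope="kat-7")}
try:
    template.format(**subst)
    failed = False
except KeyError:
    failed = True

# After adding the namespace, the same template resolves.
subst["dirs"] = dirs
msname = template.format(**subst)
```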

SpheMakh May 10, 2021
Maintainer

It works, but only after I changed the recipe to be (see lines 75, 79, and 89)

71 recipe:
 72   name: "demo recipe"
 73   info: 'top level recipe definition'
 74   dirs:
 75     input: input
 76     output: output
 77     log: logs
 78   vars:
 79     msname: myms.ms
 80   aliases:
 81     telescope: makems.tel
 82   defaults:
 83     telescope: kat-7
 84   steps:
 85       makems:
 86         cab: simms
 87         params:
 88           synthesis: 0.128
 89           name: "{dirs.input}/{vars.msname}"
 90       plotobs:

This is fine, but it would be more convenient to set special dirs globally, something like this

dirs: 
  input:
    recipe_indir: true

or

recipe:
  name: "demo recipe"
  info: 'top level recipe definition'
  dirs:
    input: input
    output: output
    log: logs
  recipe_indir: dirs.input
  recipe_outdir: dirs.output

o-smirnov · 2021-05-08T14:56:45Z

o-smirnov
May 8, 2021
Maintainer Author

@SpheMakh please sync up both Stimela and scabha, I pushed some minor tweaks I needed to do for Jupiter. In particular, I added the -s option to exec to run a subset of steps.

Here's my latest Jupiter recipe: https://gist.github.com/o-smirnov/d39775215a9674595a37d4d5e3f9d889

I realized that single-step subrecipes (such as what I had initially for e.g. making an image) were an unnecessary complication. Much simpler to define standard step templates under lib.steps, and insert them into the recipe via _use. See my recipe for how nicely this works for wsclean (there's even "inheritance" going on, so that I only need to define common wsclean settings in one place).
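
For readers unfamiliar with the idea, a `_use` clause with "inheritance" can be thought of as a recursive dictionary merge. The sketch below is an assumption about the general technique, not the actual implementation; the flat `library` dict and the template names are made up.

```python
import copy


def merge(base, override):
    """Recursively merge override into a copy of base (override wins)."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def resolve_use(section, library):
    """Expand a '_use' key by copying the named template, then applying local fields."""
    if "_use" not in section:
        return copy.deepcopy(section)
    local = {k: v for k, v in section.items() if k != "_use"}
    template = resolve_use(library[section["_use"]], library)  # templates may chain
    return merge(template, local)


library = {
    "wsclean_base": {"cab": "wsclean", "params": {"mgain": 0.9, "padding": 1.3}},
    "wsclean_deep": {"_use": "wsclean_base", "params": {"niter": 50000}},
}
step = resolve_use({"_use": "wsclean_deep", "params": {"prefix": "im1"}}, library)
```

The resulting `step` carries the common settings from `wsclean_base`, the extra `niter` from `wsclean_deep`, and the locally added `prefix`.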

1 reply

SpheMakh May 10, 2021
Maintainer

This is very nice! https://gist.github.com/o-smirnov/d39775215a9674595a37d4d5e3f9d889#file-recipe2-yml-L158

o-smirnov · 2021-07-05T12:01:22Z

o-smirnov
Jul 5, 2021
Maintainer Author

@SpheMakh please update both scabha and Stimela, I have cleaned up and streamlined the substitution mechanism quite a bit.

Also, after much circling around the issue, I've decided to propose the concept of recipe variables (in addition to, and similar to, inputs and outputs, which are parameters. The direct analogy is a function's parameters vs. local variables.) I feel the concept has been knocking on the door anyway. For example, in a for_loop kind of recipe, the loop index is a variable. Input/output dirs are essentially variables (though some recipes could choose to treat them as parameters, too). So, I propose an assign section which unifies all this.

The recipe writer doesn't need to use variables, it's an optional feature. The point of them is that they allow for cleaner and more flexible substitutions, and let you do things like set up naming conventions in one place. Like so -- read the comments in the YML:

opts:
  log:
    dir: logs-{config.run.datetime}    # note this is new -- logfile names can do substitutions!
    nest: 3
    symlink: logs                  # this is also new. Akin to the caracal logs symlink, which I've grown to like a lot


cubical_image:
  name: "cubical_image"
  info: 'does one step of cubical, followed by one step of imaging'

  # an "assign" section just assigns a bunch of variables, which then show up in the "recipe" substitution namespace
  # (along with a recipe's parameters -- if there's a name clash between variables and parameters, an error is thrown) 
  assign:
    # nothing special about "dir", it's just a name, but the recipe can now use a {recipe.dir.out} in substitutions
    dir: 
      out: 'output'
    x: 1
    y: 2
    # This recipe is a for-loop over the variable "loop" -- so this forms up another variable based on the current value of "loop".
    # Just showing off substitutions here. But this is handy for use in filenames.
    # BTW I noticed "-" is a legit character in attribute names, so just playing with it here.
    loop-name: "s{recipe.loop:02d}"
    # Next, we form up a variable based on:
    #   * dir.out defined above
    #   * loop-name defined above
    #   * "{info.suffix}" -- "info" is a namespace with info about the current recipe step (for "image-1", suffix is "1")
    # The point of this construct is that I can now have steps like "image-1", "image-a", "image-deep", and
    # use the same filename convention throughout, without changing any step parameters
    image-prefix: "{recipe.dir.out}/im{info.suffix}-{recipe.loop-name}/im{info.suffix}-{recipe.loop-name}"
    # "log" is the only variable that gets special treatment. Assignments to "log" get propagated into opts.log, i.e. affect logfile naming
    # the below will change log file names as "loop-name" changes, causing every iteration of the "loop" for-loop to produce
    # its own logfiles 
    log:
      dir: logs-{config.run.datetime}
      # "{info.fqname}" is the fully-qualified name of the current step, i.e. "recipe_name.step_name"
      name: log-{recipe.loop-name}-{info.fqname}.txt

  for_loop:  # this makes a for-loop, this is as before. "loop" becomes a variable
    var: loop
    over: [1,2,3]

  aliases:
    ms: [calibrate.ms, image-1.ms]

  steps: 
    calibrate: 
        cab: cubical
    image-1:
        cab: myclean
        params:
          prefix: "{recipe.image-prefix}"
        # just to demonstrate that a step can update a recipe's variables. This variable is not used anywhere here though.
        assign:
          foo: "bar"
    image-deep:
        cab: myclean
        params:
          # note that {recipe.image-prefix} will have changed, as the "loop" value and step name change
          prefix: "{recipe.image-prefix}"
          # ...
        assign:
          foo: "bar"

Note also that substitution namespaces are not quite the same thing as the YML config namespace anymore, since their contents change dynamically as you run the recipe. At the moment the main one is formed up here: https://github.com/ratt-ru/Stimela/blob/configuratt/stimela/kitchen/recipe.py#L594. The following top-level names are currently used (but I'm not too attached to the names, so we can discuss):

- config contains the entire configuration dict
- recipe contains the recipe's parameters and variables
- info contains some naming info about the current step (name, fully-qualified name, name suffix if any, etc.)
- current contains the parameters of the current step
- previous contains the parameters of the previous step
- steps.step_label contains the parameters of all (previously defined) steps, using the step label as the attribute
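
To show how variables in an assign section can build on one another as the namespaces change, here is a minimal sketch. It assumes assignments are evaluated in order with `str.format` against `recipe` and `info` namespaces; the `evaluate_assign` helper is hypothetical and the image prefix is shortened relative to the recipe above.

```python
from types import SimpleNamespace

assign = {
    "loop-name": "s{recipe.loop:02d}",
    "image-prefix": "{recipe.dir.out}/im{info.suffix}-{recipe.loop-name}",
}


def evaluate_assign(assign, recipe, info):
    """Evaluate assignments in order, so later ones can use earlier ones."""
    for name, template in assign.items():
        # setattr accepts names with "-", and str.format looks them up via getattr.
        setattr(recipe, name, template.format(recipe=recipe, info=info))
    return recipe


prefixes = []
for loop in [1, 2]:
    for suffix in ["1", "deep"]:      # e.g. steps "image-1" and "image-deep"
        recipe = SimpleNamespace(loop=loop, dir=SimpleNamespace(out="output"))
        info = SimpleNamespace(suffix=suffix)
        evaluate_assign(assign, recipe, info)
        prefixes.append(getattr(recipe, "image-prefix"))
```

Re-evaluating the assignments for each loop value and step gives a distinct prefix every time, without touching the step parameters themselves.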

0 replies

bennahugo · 2021-07-05T13:18:29Z

bennahugo
Jul 5, 2021
Maintainer

Just note that pre-validation must allow a file not to exist. As we have currently, it must be possible to do pattern substitution for beamfiles: myfits-{reim}-{corr}.fits, and for recipe strings containing a mix of files, columns and tags, as we currently have for cubical and will have for quartical. So you would want the capacity to specify raw strings which should not be substituted automatically as discussed above.

1 reply

o-smirnov Jul 5, 2021
Maintainer Author

Just note that pre validation must allow a file not to exist

Yep, there's a must_exist property in the schema that governs this: https://github.com/ratt-ru/scabha/blob/configuratt/scabha/cargo.py#L105

So you would want the capacity to specify raw strings which should not be substituted automatically

Well, as per standard Python practice, {{ and }} substitute to { and }.

If that doesn't cover your needs, I'm not above adding an optional nosubst flag to the schema.
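
Both options can be sketched in a few lines. The brace doubling is standard `str.format` behaviour; the `nosubst` schema flag is the hypothetical one floated above, and the `substitute` helper is invented for illustration.

```python
from types import SimpleNamespace

# Doubled braces survive substitution as single literal braces.
literal = "myfits-{{reim}}-{{corr}}.fits".format()

# Escaped and substituted fields can be mixed in one string.
mixed = "{recipe.dir}/myfits-{{reim}}-{{corr}}.fits".format(
    recipe=SimpleNamespace(dir="beams")
)


def substitute(value, schema, **namespaces):
    """Skip substitution entirely when the (hypothetical) nosubst flag is set."""
    if schema.get("nosubst"):
        return value
    return value.format(**namespaces)


raw = substitute("myfits-{reim}-{corr}.fits", {"nosubst": True})
```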
