Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could cut rules be more modular? #1105

Closed
wilsonmr opened this issue Feb 22, 2021 · 10 comments
Closed

Could cut rules be more modular? #1105

wilsonmr opened this issue Feb 22, 2021 · 10 comments

Comments

@wilsonmr
Copy link
Contributor

Not entirely detached from #818

Would it make more sense to store rules like

<rule key>:
 reason: <some reason>
 rule: <some rule>

where at the moment rule key would be either the dataset name or the process_type. Then the production rule which produces the Rule objects would just return the relevant rules for a single dataset

def produce_processed_rules_dataset(self, dataset, default_rules, explicit_rules, theoryid, defaults):
    dataset_rules = []
    ...
    keys = [str(dataset), dataset.commondata.process_type]
    for key in keys:
        if key in explicit_rules:
            dataset_rules.append(Rule(
                    initial_data=explicit_rules[str(dataset)],
                    defaults=defaults, # these should be renamed considering we will have many defaults
                    theory_parameters=theory_parameters,
                    loader=self.loader,
                ))
        elif key in default_rules:
            dataset_rules.append(Rule(
                    initial_data=default_rules[str(dataset)],
                    defaults=defaults,
                    theory_parameters=theory_parameters,
                    loader=self.loader,
                ))
        return dataset_rules

I think this would make the rules a bit more modular because I could redefine a subset of rules in explicit_rules but not have to redefine all of them.

Obvious trip ups are:

  • if any process_type == str(dataset)
  • if there is another way in which a rule is linked to a dataset other than by the name or process type
@Zaharid
Copy link
Contributor

Zaharid commented Feb 22, 2021

If we do this it would make sense to add another field called name or something. Or make rules a dictionary rather than a list. That said it was the case of rules that made me think that it was so much easier to just dump everything (note that some processing on dataset_rules would be required to write the defaults); and I probably stand by that in that case (sorry that I am saying seeming contradictory things).

@wilsonmr
Copy link
Contributor Author

Here I was saying I want to make the rules a dictionary, where the key can either be dataset name or process type however I see from the code that you probably want to be able to distinguish between the two. Also I guess there are some datasets which have multiple rules so clearly that doesn't work.

That said it was the case of rules that made me think that it was so much easier to just dump everything

The point here is that default_rules could (and probably would) be an exclusive list of rules so I agree we dump everything but explicit_rules could be None or just a subset however I now realise that since we have datasets with multiple rules there wouldn't be an easy way of deciding when to use the explicit rule or when to use (or not use) the default so I think my suggestion doesn't work.

@wilsonmr
Copy link
Contributor Author

Either way we might want to change dataset -> dataset_name or something which doesn't collide with production rule.

@Zaharid
Copy link
Contributor

Zaharid commented Feb 23, 2021

I am not sure I understand what the issue is. AFAICT we never expect the filter rules to act as namespaces; they are always parsed as rules and so the dataset key is fine.

@wilsonmr
Copy link
Contributor Author

The issue comes when you dump those rules to a lockfile and then try and use the lockfile as a runcard, then they are treated as namespaces

@Zaharid
Copy link
Contributor

Zaharid commented Feb 23, 2021

But why is that? Seems to me it should go trough the parsers.

@wilsonmr
Copy link
Contributor Author

wilsonmr commented Feb 23, 2021

X_recorded_spec_ has no parser and then it just appears as a namespace list which might have the key dataset the "hack" is to add a parser for the X_recorded_spec_

you could add the parser with the decorator..

@wilsonmr
Copy link
Contributor Author

and the rules themselves don't have a parser, they're read in as raw yaml and then passed straight to class

@siranipour
Copy link
Contributor

The dataset used in a rule definition is treated simply as a string in the filtering code. I think the best solution is to simply rename it. I'll take care of this in a new PR once we've figured out NNPDF/reportengine#41

@Zaharid
Copy link
Contributor

Zaharid commented Feb 23, 2021

I'd rather not TBH. I'd much prefer that the defaults are treated as a plain dictionary instead of a namespace (as done in the last commit in the reportengine PR).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants