-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: ✨ put required fields into constants #961
Conversation
Co-authored-by: Luke W. Johnston <[email protected]>
class RequiredFieldType(str, Enum): | ||
"""A class enumerating allowed types for required fields.""" | ||
|
||
str = "str" | ||
list = "list" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just to be very clear about what types we and any calling code expect to deal with. If we find other required types when we look into lower-level properties, we can add them.
|
||
RESOURCE_REQUIRED_FIELDS = { | ||
"name": RequiredFieldType.str, | ||
"path": RequiredFieldType.str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add the data
field here as well since data
OR path
is required according to the DataPackage standard?
I guess this is a question of the scope of this package (checks
): do we want it to be usable for everyone who uses/follows the DataPackage standard or only useable within the Seedcase Project specifically (where we won't allow inline data)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, exactly, I was also asking myself this!
I think it would be nice to include data
here for checks
to be more general.
Drawbacks:
- A resource properties with no
data
and nopath
would trigger 2 required errors, even though setting just one of these is enough. Other OR requirements behave the same way, so maybe that's okay. It's what we have when we get rid of nested errors. - We would need to define
RESOURCE_SPROUT_REQUIRED_FIELDS
in a way that excludesdata
. It would be nice if we could use set difference, but sadly that doesn't quite work. So it would look a bit uglier than how it is now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, again, it's all about the scope of this package. If we want it to be usable for others that follow the DataPackage (I think that was the last thing I heard about this from @lwjohnst86 ), then we should include it in checks
and remove it again in RESOURCE_SPROUT_REQUIRED_FIELDS
About the drawbacks:
- Could we add something like "either
data
orpath
is required" to thedata
andpath
error messages, maybe? Or will that be too messy? - I guess it's not going to be too bad, since it's not too complex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm now also leaning towards adding it!
My only problem with trying to account for the data
OR path
situation is that then, to be consistent, we would need to account for all cases like this. I.e. for all errors that are nested under an anyOf
or oneOf
validator in jsonschema
's output. We can locate these errors, but then I'm wondering what to add to them to highlight that they are part of a group of alternatives.
Would an output like this be helpful or just confusing:
[
{
"message": "'data' is a required property",
"json_path": "$.data",
"validator": "required",
"group": "oneOf-1",
},
{
"message": "'path' is a required property",
"json_path": "$.path",
"validator": "required",
"group": "oneOf-1",
},
]
Here each error can be marked as belonging to a group, so you (or any calling code) can recreate the relationships between errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@signekb I added data
and excluded it in Sprout.
As for improving how we handle these OR-type errors, I think we should let the discussion happen here but then make a new issue for the changes (if any)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, @martonvago !
I don't have the full overview of the more complex error structures, but if it's not too much of a hassle, then maybe adding the "group" to indicate cases like this might be fine for now, as you have shown above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just linking the issue: #969
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it doesn't take too much work to add the data
, then do it. But I also think it's completely reasonable to include in the design docs that we assume the datapackage only uses data stored as files, not within the json file itself. And if someone wants to do that, they figure it out on their own.
Co-authored-by: Signe Kirk Brødbæk <[email protected]>
# Sprout-specific required fields and their types | ||
|
||
PACKAGE_SPROUT_REQUIRED_FIELDS = ( | ||
omit_keys(checks.PACKAGE_REQUIRED_FIELDS, ["resources"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this to make the structures of PACKAGE_SPROUT_REQUIRED_FIELDS
and RESOURCE_SPROUT_REQUIRED_FIELDS
consistent.
As there is only one checks.PACKAGE_REQUIRED_FIELDS
(resources
), this is perhaps unnecessary. Happy to remove if you think so!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it definitely makes it cleaner to remove the logic out in a helper function and I also like the alignment between package and resource required fields. But it also seems like "a lot" for just this.
Could an alternative be to do a
PACKAGE_SPROUT_REQUIRED_FIELDS.pop("resources")
and
RESOURCE_SPROUT_REQUIRED_FIELDS.pop("data")
after each has been defined instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can have that! (It would work because when we take the union of the dictionaries, we create a copy, so we don't end up modifying the original constant accidentally.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. But, I mean, either works. I would also be fine with keeping what you did originally.
"created": RequiredFieldType.str, | ||
} | ||
) | ||
PACKAGE_SPROUT_REQUIRED_FIELDS.pop("resources", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The None
is for preventing a KeyError
if the key is missing. We don't expect this, but 🤷♀️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nice!
|
||
RESOURCE_REQUIRED_FIELDS = { | ||
"name": RequiredFieldType.str, | ||
"path": RequiredFieldType.str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it doesn't take too much work to add the data
, then do it. But I also think it's completely reasonable to include in the design docs that we assume the datapackage only uses data stored as files, not within the json file itself. And if someone wants to do that, they figure it out on their own.
Description
This PR creates some constants for organising how we keep track of required fields in Sprout and outside of Sprout (so in the
checks
package).This is useful particularly when we check if required fields are present and not blank in Sprout.
This PR needs an in-depth review.
Checklist
just run-all