Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: ✨ put required fields into constants #961

Merged
merged 19 commits into from
Jan 20, 2025

Conversation

martonvago
Copy link
Contributor

@martonvago martonvago commented Jan 10, 2025

Description

This PR creates some constants for organising how we keep track of required fields in Sprout and outside of Sprout (so in the checks package).

This is useful particularly when we check if required fields are present and not blank in Sprout.

This PR needs an in-depth review.

Checklist

  • Added or updated tests
  • Updated documentation
  • Ran just run-all

@martonvago martonvago self-assigned this Jan 10, 2025
Comment on lines +6 to +10
class RequiredFieldType(str, Enum):
"""A class enumerating allowed types for required fields."""

str = "str"
list = "list"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to be very clear about what types we and any calling code expect to deal with. If we find other required types when we look into lower-level properties, we can add them.

@martonvago martonvago changed the title feat: ✨ put required fields into constants feat: ✨ put required fields into constants [1/7] Jan 10, 2025
@martonvago martonvago marked this pull request as ready for review January 10, 2025 23:20
@martonvago martonvago requested a review from a team as a code owner January 10, 2025 23:20
seedcase_sprout/core/checks/__init__.py Outdated Show resolved Hide resolved

RESOURCE_REQUIRED_FIELDS = {
"name": RequiredFieldType.str,
"path": RequiredFieldType.str,
Copy link
Member

@signekb signekb Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add the data field here as well since data OR path is required according to the DataPackage standard?

I guess this is a question of the scope of this package (checks): do we want it to be usable for everyone who uses/follows the DataPackage standard or only useable within the Seedcase Project specifically (where we won't allow inline data)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, exactly, I was also asking myself this!

I think it would be nice to include data here for checks to be more general.

Drawbacks:

  • A resource properties with no data and no path would trigger 2 required errors, even though setting just one of these is enough. Other OR requirements behave the same way, so maybe that's okay. It's what we have when we get rid of nested errors.
  • We would need to define RESOURCE_SPROUT_REQUIRED_FIELDS in a way that excludes data. It would be nice if we could use set difference, but sadly that doesn't quite work. So it would look a bit uglier than how it is now.

Copy link
Member

@signekb signekb Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, again, it's all about the scope of this package. If we want it to be usable for others that follow the DataPackage (I think that was the last thing I heard about this from @lwjohnst86 ), then we should include it in checks and remove it again in RESOURCE_SPROUT_REQUIRED_FIELDS

About the drawbacks:

  • Could we add something like "either data or path is required" to the data and path error messages, maybe? Or will that be too messy?
  • I guess it's not going to be too bad, since it's not too complex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm now also leaning towards adding it!

My only problem with trying to account for the data OR path situation is that then, to be consistent, we would need to account for all cases like this. I.e. for all errors that are nested under an anyOf or oneOf validator in jsonschema's output. We can locate these errors, but then I'm wondering what to add to them to highlight that they are part of a group of alternatives.

Would an output like this be helpful or just confusing:

[
    {
        "message": "'data' is a required property",
        "json_path": "$.data",
        "validator": "required",
        "group": "oneOf-1",
    },
    {
        "message": "'path' is a required property",
        "json_path": "$.path",
        "validator": "required",
        "group": "oneOf-1",
    },
]

Here each error can be marked as belonging to a group, so you (or any calling code) can recreate the relationships between errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@signekb I added data and excluded it in Sprout.
As for improving how we handle these OR-type errors, I think we should let the discussion happen here but then make a new issue for the changes (if any)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, @martonvago !
I don't have the full overview of the more complex error structures, but if it's not too much of a hassle, then maybe adding the "group" to indicate cases like this might be fine for now, as you have shown above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just linking the issue: #969

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it doesn't take too much work to add the data, then do it. But I also think it's completely reasonable to include in the design docs that we assume the datapackage only uses data stored as files, not within the json file itself. And if someone wants to do that, they figure it out on their own.

# Sprout-specific required fields and their types

PACKAGE_SPROUT_REQUIRED_FIELDS = (
omit_keys(checks.PACKAGE_REQUIRED_FIELDS, ["resources"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this to make the structures of PACKAGE_SPROUT_REQUIRED_FIELDS and RESOURCE_SPROUT_REQUIRED_FIELDS consistent.
As there is only one checks.PACKAGE_REQUIRED_FIELDS (resources), this is perhaps unnecessary. Happy to remove if you think so!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it definitely makes it cleaner to remove the logic out in a helper function and I also like the alignment between package and resource required fields. But it also seems like "a lot" for just this.

Could an alternative be to do a

PACKAGE_SPROUT_REQUIRED_FIELDS.pop("resources")

and

RESOURCE_SPROUT_REQUIRED_FIELDS.pop("data") 

after each has been defined instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can have that! (It would work because when we take the union of the dictionaries, we create a copy, so we don't end up modifying the original constant accidentally.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. But, I mean, either works. I would also be fine with keeping what you did originally.

"created": RequiredFieldType.str,
}
)
PACKAGE_SPROUT_REQUIRED_FIELDS.pop("resources", None)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The None is for preventing a KeyError if the key is missing. We don't expect this, but 🤷‍♀️

@martonvago martonvago requested a review from signekb January 14, 2025 14:58
Base automatically changed from feat/check-error to main January 20, 2025 08:27
Copy link
Member

@lwjohnst86 lwjohnst86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nice!


RESOURCE_REQUIRED_FIELDS = {
"name": RequiredFieldType.str,
"path": RequiredFieldType.str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it doesn't take too much work to add the data, then do it. But I also think it's completely reasonable to include in the design docs that we assume the datapackage only uses data stored as files, not within the json file itself. And if someone wants to do that, they figure it out on their own.

@lwjohnst86 lwjohnst86 merged commit 34d8bad into main Jan 20, 2025
3 checks passed
@lwjohnst86 lwjohnst86 changed the title feat: ✨ put required fields into constants [1/7] feat: ✨ put required fields into constants Jan 20, 2025
@lwjohnst86 lwjohnst86 deleted the feat/sprout-checks-1-required-fields branch January 20, 2025 08:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants