Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[proposal] refactor config field validation #788

Closed
wants to merge 4 commits into from

Conversation

alicewriteswrongs
Copy link
Contributor

What are the relevant tickets?

none

What's this PR do?

This refactors the config field validation so that instead of having just one type of config field on which we validate all of the fields, we instead declare the various possible config field types (i.e. markdown, string, boolean, etc) separately, so that for each of them we can specify which props are required for them.

This is useful because for instance the relation field must have the collection prop defined. If the config field is defined as simply one type, all of the props which aren't present on every variant must be listed as optional (required=False). This works, but it leaves us with no way to validate that if a config author adds, for instance, a relation field they must define what collection it is supposed to pull from. Pulling these things out into separate objects lets us do that.

It is a bit funky though! I feel I just learned more about yaml than I really want to know. It may turn out for this reason that we don't want to make this change, but I thought I would propose it to see what everyone thinks.

How should this be manually tested?

Make some changes to the site configs that will exercise the new validation thing (like removing a collection from a relation field should now invalidate the config, whereas on the main branch that won't cause an error).

Copy link
Contributor

@gsidebo gsidebo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alicewriteswrongs This is great! As far as defining an validating different types of config fields, this is a huge improvement, but the messaging is difficult to parse. If you have a single invalid config field, you'll be shown a message like this:

yamale.yamale_error.YamaleError: Error validating data with schema '/src/websites/config_schema/site-config-schema.yml'
	collections.0.category: Required field missing
	collections.0.fields.1.minimal: 'abcd' is not a bool.
	collections.0.fields.1.minimal: Unexpected element
	collections.0.fields.1.minimal: Unexpected element
	collections.0.fields.1.minimal: Unexpected element
	collections.0.fields.1.minimal: Unexpected element
	collections.0.fields.1.minimal: Unexpected element
	collections.0.fields.1.minimal: Unexpected element
	collections.0.fields.1.minimal: Unexpected element
	collections.0.fields.1.fields: Required field missing
	collections.0.fields.1.minimal: Unexpected element
	collections.0.fields.1.minimal: Unexpected element
	collections.0.fields.1.collection: Required field missing
	collections.0.fields.1.minimal: Unexpected element

It's still clear enough that the offending item is a value in the second field in the first collection. Not sure if there's a clean solution that the yaml validation library could provide. Maybe we can prefix those lines with the field type? Then those messages would look more like:

yamale.yamale_error.YamaleError: Error validating data with schema '/src/websites/config_schema/site-config-schema.yml'
	collections.0.category: Required field missing
	(markdown_config_field) collections.0.fields.1.minimal: 'abcd' is not a bool.
	(file_config_field) collections.0.fields.1.minimal: Unexpected element

@alicewriteswrongs
Copy link
Contributor Author

mmm yeah it would be good to have the error message be more readable...I'll play around with that a bit

@alicewriteswrongs
Copy link
Contributor Author

ok I came up with I think a better error message setup, basically I'm catching the YamaleError and then doing some fiddling with the results array of errors to dedupe errors and error locations. Then I can print the path at which the error occurred, the error messages at that location, and then also print the context. Here's how it looks:

Screen Shot 2021-11-12 at 11 04 01 AM

there is still some extraneous info, but it's much easier to see where in the config the error is happening and, based on that, go fix it I think. What do you think?

@alicewriteswrongs
Copy link
Contributor Author

@gsidebo wondering if you have thoughts / feedback about the above :)

@alicewriteswrongs alicewriteswrongs force-pushed the ap/refactor-config-field-validation branch from b215af6 to 1f5bb9f Compare November 17, 2021 00:26
@codecov-commenter
Copy link

Codecov Report

Merging #788 (1f5bb9f) into master (eab7fc3) will increase coverage by 3.07%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #788      +/-   ##
==========================================
+ Coverage   91.67%   94.75%   +3.07%     
==========================================
  Files         192       96      -96     
  Lines        7245     2724    -4521     
  Branches     1140      589     -551     
==========================================
- Hits         6642     2581    -4061     
+ Misses        524      141     -383     
+ Partials       79        2      -77     
Impacted Files Coverage Δ
websites/config_schema/api.py
users/factories.py
...ev/management/commands/generate_example_configs.py
websites/messages.py
gdrive_sync/factories.py
websites/urls.py
videos/management/commands/youtube_tokens.py
websites/__init__.py
main/permissions.py
main/utils.py
... and 86 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eab7fc3...1f5bb9f. Read the comment docs.

@alicewriteswrongs
Copy link
Contributor Author

I pushed up the hacky formatting code I wrote to get that output, if we want to go forward with it I'll clean it up and whatnot

Copy link
Contributor

@gsidebo gsidebo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the output I get when I put "minimal": "abcdef" in a markdown widget field config item:

Validation failed!

Error validating data with '/src/websites/config_schema/site-config-schema.yml'

Found errors:
	Required field missing
At path:
	collections.0.fields.0.fields
In context:
	{
	    "label": "Body",
	    "minimal": "abcdef",
	    "name": "body",
	    "widget": "markdown"
	}

Found errors:
	Unexpected element
	'abcdef' is not a bool.
At path:
	collections.0.fields.0.minimal
In context:
	{
	    "label": "Body",
	    "minimal": "abcdef",
	    "name": "body",
	    "widget": "markdown"
	}

Found errors:
	Required field missing
At path:
	collections.0.fields.0.collection
In context:
	{
	    "label": "Body",
	    "minimal": "abcdef",
	    "name": "body",
	    "widget": "markdown"
	}

The deduping is nice, but I think this is still too confusing. It's verbose, and more than that it still looks like I have to make 3 changes to this config item to get it working. The more I think about this, the more I think we need some logic that is aware of the fact that the field that failed validation can match one of several schemas. To me, the core problem with the error message is that it's "flat". Those 3 errors are presented one after the other at the top-level, and it's up to the reader to figure out that it has to do with a config item that could match one of several schemas, so some of those errors can be ignored.

This language might be clunky but I think this sort of message would make the error more clear:

Error validating data ...
  root-url-path: Required field missing
  collections.0.fields.0: Failed to match one of multiple possible schemas. At least one of the errors below must be corrected.
    .minimal: Unexpected element
    .minimal: 'abcdef' is not a bool
    .collection: Required field missing
    .fields: Required field missing

I tinkered with Yamale a bit, and found that you can inspect the contents of the schema. I can send you some scratch code I ran in a shell if you want to look into this. The rough validation logic might look something like this:

  1. Catch validation error

  2. Introspect validation errors and build a map of paths that have multiple errors for the same element. For example...

    This error message...

    collections.0.category: Required field missing
    collections.0.fields.1.minimal: 'abcd' is not a bool.
    collections.0.fields.1.minimal: Unexpected element
    collections.0.fields.1.fields: Required field missing
    collections.0.fields.1.collection: Required field missing
    collections.0.fields.2.derp: Some other error
    

    ...would produce something like this...

    {
        "collections.0.fields.1": [
            ("minimal", "'abcd' is not a bool."),
            ("minimal", "Unexpected element"),
            ("fields", "Required field missing"),
            ("collections", "Required field missing"),
        ]
    }
  3. Call an API method that takes a path (collections.0.fields.1), dives through the schema to find the definition for the item at that path, and returns True if the given config item is an any() type (this method could be smart enough to ignore the .1 from the end of the path since that just indicates a position in a list)

  4. If that API method indicates that we're dealing with an any() type schema, replace the error text for all of those lines with the more helpful error message

I know that's a decent chunk of work, and could end up being bug-prone. As this stands, though, I just see too much potential confusion over that kind of "flat" validation message

@alicewriteswrongs
Copy link
Contributor Author

I opened an issue (#973) to talk about this more - I think, in short, that the Yamale tool is limited in some key ways and in the future we'll likely want to move to using a more general schema like JSONSchema to describe the shape of our data, which will let us use a wider variety of validation tools, generate typedefs for TypeScript automatically, and so on. I think while there is some value in creating better error messages for Yamale, our time would be better spent moving to a more widely supported schema format, so I'm going to close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants