api.main: preserve model parsing and fix kver job #477

Conversation
When a submodel like `Checkout` or `Test` is submitted, explicit validation takes place in the `parse_node_obj` method. The method also converts request parameters to the defined data types, e.g. for storing the kernel version, the `version` and `patchlevel` fields are converted to `int`. Losing these conversions and storing the object as it was received causes issues such as the `kver` job failure. In order to preserve the type casting that happens during validation in `parse_node_obj`, store the parsed object back to `node`. If we try to store the parsed node object directly, it raises an issue during JSON serialization of the `Node.data` field in `_get_node_event_data`. Also, the DB will not be able to map a collection name from the submodel type, as the collection dictionary uses only one collection for all kinds of nodes, i.e. `node`. Fixing the above issues also fixes the `kver` job failure.

Fixes kernelci/kernelci-pipeline#403
Fixes: b785e19 ("api.main: use node endpoints for all type of Node subtypes")

Signed-off-by: Jeny Sadadia <[email protected]>
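As a rough illustration of the change described above, here is a minimal, runnable sketch with stand-in models and a stubbed `parse_node_obj` (these names mirror the thread's walkthrough but are assumptions, not the actual kernelci-api code):

```python
from typing import Optional
from pydantic import BaseModel

class Node(BaseModel):          # stand-in for the kernelci-api Node model
    kind: str
    data: Optional[dict] = None

class Checkout(Node):
    pass                        # the real submodel adds a typed `data` schema

def parse_node_obj(node: Node) -> Node:   # stub for the real helper
    return Checkout(**node.dict())

def post_node(node: Node) -> Node:
    parsed_node = parse_node_obj(node)    # validates and type-casts
    # Convert back to the base model so JSON serialization of nested
    # models and the single `node` DB collection keep working:
    return Node(**parsed_node.dict())

print(type(post_node(Node(kind='checkout'))))  # <class '__main__.Node'>
```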
@JenySadadia There are some things about this rationale that I don't understand and some others that I don't think are quite right. Let's see if we can both get to the same point:
> When a submodel like `Checkout` or `Test` is submitted, explicit validation takes place in the `parse_node_obj` method.

Correct.

> The method also converts request parameters to the defined data types, e.g. for storing the kernel version, the `version` and `patchlevel` fields are converted to `int`.
Not exactly. `node` in `post_node()` should already be a `Node` object and its fields are parsed from JSON into the `Node` field datatypes. That has been like this since the beginning.

The call to `parse_node_obj()` in `post_node()` doesn't convert or change anything in the object, since its return value isn't used. The purpose of this call, as the comment explains, is to validate the data according to a specific model.
> Losing these conversions and storing the object as it was received causes issues such as the `kver` job failure.

When/where is this conversion lost? And what do you mean by storing the object as received?
Just to clarify: in terms of node parsing and storage in the DB, `parse_node_obj()` does nothing. The only thing it does is validate an object using pydantic. The object is kept intact and the storage in the DB is unaffected by it. Actually, the way the object is received and stored is 100% the same as it always was, so I don't see how this code is now affecting the `kver` jobs.
> If we try to store the parsed node object directly, it raises an issue during JSON serialization of the `Node.data` field in `_get_node_event_data`.

But `_get_node_event_data()` doesn't do anything JSON-related. It simply builds a dict from some selected fields of a node. Do you mean that it fails on some nodes because they don't have a `data` field? If that's the case, then the fix should be to check whether the node has `data` in this line. This is a real bug: the `data` field is optional in a `Node`, so we can't assume that all nodes will have it.
> Also, the DB will not be able to map a collection name from the submodel type, as the collection dictionary uses only one collection for all kinds of nodes, i.e. `node`.

But `db.create()` is called on the original `Node` object as received by `post_node()`, not on a sub-type of `Node`.
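For context, a hypothetical sketch of the kind of collection dictionary being described (the mapping and lookup are assumptions, not the actual kernelci-api code):

```python
class Node: ...
class Checkout(Node): ...

# Model-to-collection mapping: only the base Node type is listed, so a
# lookup keyed on a subtype such as Checkout would fail.
COLLECTIONS = {Node: 'node'}

def collection_for(obj):
    return COLLECTIONS[type(obj)]

print(collection_for(Node()))   # 'node'
try:
    collection_for(Checkout())  # subtype has no entry in the mapping
except KeyError as exc:
    print('no collection for', exc)
```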
So, I'll try to do a walkthrough of the changes:
```python
# Initially, <node> is received and parsed as a Node object
parsed_node = parse_node_obj(node)  # <node> is cast into <parsed_node> as a subtype of Node, containing the same contents

# Convert again to parent model `Node` in order to enable JSON
# serialization of nested models (such as `CheckoutData`) and
# map model against existing DB collection i.e. `Node`
node = Node(**parsed_node.dict())  # <parsed_node> is cast back to a Node object. This <node> and the one at the beginning of the function should be equivalent
```
So I think this patch shouldn't have any effect at all. Am I missing something?
@hardboprobot Thanks for the reviews and comments. I'll try to explain things in detail.
Please see below logs on POST
See the difference in
Parsing is based on
In the previous implementation, the parsed node obj was not being stored and that's where the conversion necessary for further processing was lost.
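To illustrate the conversion that gets lost, here is a self-contained sketch with hypothetical minimal models mirroring the `Node`/`Checkout` relationship discussed here (simplified stand-ins, not the real kernelci-api models):

```python
from typing import Optional
from pydantic import BaseModel

class KernelVersion(BaseModel):
    version: int
    patchlevel: int

class CheckoutData(BaseModel):
    kernel_version: KernelVersion

class Node(BaseModel):
    kind: str
    data: Optional[dict] = None  # base model keeps `data` untyped

class Checkout(Node):
    data: CheckoutData           # subtype enforces (and coerces) the schema

# Strings in the request payload, as the pipeline used to send them:
node = Node(kind='checkout',
            data={'kernel_version': {'version': '6', 'patchlevel': '1'}})

parsed = Checkout(**node.dict())  # what parse_node_obj() boils down to
print(type(parsed.data.kernel_version.version))      # <class 'int'> - coerced
print(type(node.data['kernel_version']['version']))  # <class 'str'> - raw
# Storing `node` instead of `parsed` discards the int conversion.
```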
It means using data received by the handler again i.e. using
In the previous implementation, a single
I also tried to check for adding JSON encoders in
After node parsing, it becomes an instance of
I hope the above explanations make things clearer.
@JenySadadia ok, we both understand the same about what happens technically there. Now, there's probably part of the story that I'm missing:
Yes, all of that is intentional, and it's the key to implementing an easy mechanism to store and retrieve different types of Nodes with automatic model validation without complicating the rest of the underlying API implementation. What's the purpose of changing it?
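A hypothetical sketch of the dispatch mechanism being referred to (the mapping and model names are assumptions; the real implementation lives in kernelci-api):

```python
from pydantic import BaseModel

class Node(BaseModel):
    kind: str

class Checkout(Node): ...
class Test(Node): ...

# Map a node's `kind` to the submodel used for validation.
_SUBMODELS = {'checkout': Checkout, 'test': Test}

def parse_node_obj(node: Node) -> Node:
    submodel = _SUBMODELS.get(node.kind)
    if submodel is None:
        raise ValueError(f"Unsupported node kind: {node.kind}")
    # Re-validating against the submodel gives automatic per-kind model
    # validation without touching the generic API plumbing.
    return submodel(**node.dict())

print(type(parse_node_obj(Node(kind='checkout'))))  # <class '__main__.Checkout'>
```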
Aaah, so the problem is specifically in the data types of the submitted JSON. That's a lot of work for something that should be simpler, but I think I get your point. Ok, so first of all, you can skip all this trouble if you send the JSON object in the request with integer fields being integers instead of strings, as sketched below.
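A hedged sketch of the payload difference being described (the field layout is an assumption):

```python
import json

# Integer fields sent as integers - no coercion needed server-side:
good = {'kind': 'checkout',
        'data': {'kernel_version': {'version': 6, 'patchlevel': 1}}}

# The same fields sent as strings - pydantic v1 silently coerces them
# during submodel validation, which is the conversion at stake here:
bad = {'kind': 'checkout',
       'data': {'kernel_version': {'version': '6', 'patchlevel': '1'}}}

print(json.dumps(good))
print(json.dumps(bad))
```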
But this should actually be enforced by pydantic. That's why we're using it in the first place: if pydantic is silently converting data types, what's the point of model validation? Pydantic v2 introduced a strict mode that does just that, but it's not in v1, inexplicably. Oh wait, I found an explanation:
But then, right at the top of the pydantic docs:
This is so, so Python.

Anyway, I'm not against this change, but I don't think it's the right solution. What I think we should do is migrate to pydantic v2 (v1 is soon to be deprecated) and make the model validations strict, so that when anyone pushes an object with the wrong data types (like in the example you posted) the API request fails and the user gets a clear message about the cause. We don't want the API doing hidden magic on us; we want the data to be precisely formatted and enforced in every step (once the models are fixed).

If you still want to go ahead with this change, then consider this an approval, and we can always remember later to migrate to v2 and rework this if needed.

PS: Sorry about the rant.
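To make the coercion concrete, a small sketch of the difference (the first part runs on pydantic 1.x; the v2 strict variant is shown in comments since the two versions can't coexist in one script):

```python
from pydantic import BaseModel

class KernelVersion(BaseModel):
    version: int
    patchlevel: int

# Pydantic v1 (lax by default) silently coerces compatible strings:
m = KernelVersion(version='6', patchlevel='1')
print(m.version, type(m.version))  # 6 <class 'int'>

# Pydantic v2 can reject that outright with strict mode:
#
#   from pydantic import BaseModel, ConfigDict
#
#   class KernelVersion(BaseModel):
#       model_config = ConfigDict(strict=True)
#       version: int
#       patchlevel: int
#
#   KernelVersion(version='6', patchlevel=1)  # raises ValidationError
```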
Model validations are also there for pydantic v1. It does validate
Yes, that seems like the right thing to do atm, as the version upgrade will take some time and it needs to be compatible with

@hardboprobot How does it sound?
Of course, but not strict validations.
Incorrect. The current code does validate the request against a
What
As I said, if you want to merge this I won't stop it. I already gave all the explanations I could, and I tested all of them just to be sure. With all that on the table, it's up to you to decide what to do.
I meant the validation that is being performed in the request handler automatically.
The above code validates the request model, i.e.
Nope, I am running the same pydantic version. Below is the test scenario:
Sent POST node request:
API logs:
We're talking about different cases.
Thanks for the links @hardboprobot
This will also need some fixes in the pipeline as we are receiving
Ah, nice find! Ok, so that's the way to go IMO. Sane data from the beginning, everything automatically checked, clear error messages, no surprises.
Created kernelci/kernelci-core#2382
Closing this as an alternate solution got merged.