
api.main: preserve model parsing and fix kver job #477

Closed
wants to merge 1 commit

Conversation


@JenySadadia JenySadadia commented Feb 12, 2024

Fixes kernelci/kernelci-pipeline#403

When a submodel node such as `Checkout` or `Test` is submitted, explicit validation takes place in the `parse_node_obj` method. The method also converts request parameters to the defined data types, e.g. for storing the kernel version, the `version` and `patchlevel` fields are converted to `int`.
Losing these conversions and storing the object as it was received raises issues such as the `kver` job failure.
In order to preserve the type casting that happened during validation in `parse_node_obj`, store the parsed result back to `node`.
If we try to store this parsed node object directly, it raises an issue during JSON serialization of the `Node.data` field in `_get_node_event_data`. Also, the DB will not be able to map a collection name from the submodel type, as the collection dictionary uses a single collection for all kinds of nodes, i.e. `node`. Fixing the above issues also fixes the `kver` job failure.

Fixes: b785e19 ("api.main: use node endpoints for all type of Node subtypes")

Signed-off-by: Jeny Sadadia <[email protected]>
@JenySadadia JenySadadia requested a review from r-c-n February 12, 2024 12:51

@r-c-n r-c-n left a comment


@JenySadadia There are some things about this rationale that I don't understand and some others that I don't think are quite right. Let's see if we can both get to the same point:

When submodel like Checkout or Test node is submitted, explicit validation takes place in parse_node_obj

Correct.

The method also converts request parameters to defined data types e.g. for storing kernel version, version and patchlevel fields will be converted to int.

Not exactly.
node in post_node() should already be a Node object, and its fields are parsed from JSON into the Node field datatypes. That has been like this since the beginning.
The call to parse_node_obj() in post_node() doesn't convert or change anything in the object, since its return value isn't used. The purpose of this call, as the comment explains, is to validate the data according to a specific model.

Losing all these conversion and storing object as it received will raise issues like kver job failure.

When/where is this conversion lost? And what do you mean by storing the object as received?

Just to clarify: in terms of node parsing and storage in the DB, parse_node_obj() does nothing. The only thing it does is to validate an object using pydantic. The object is kept intact and the storage in the DB is unaffected by it. Actually, the way the object is received and stored is 100% the same as it always was, so I don't see how this code is now affecting the kver jobs.

If we try to store this node object directly, it will raise issue while JSON serialization of Node.data field in _get_node_event_data

But _get_node_event_data() doesn't do anything JSON-related. It simply builds a dict from some selected fields of a node. Do you mean that it fails on some nodes because they don't have a data field? If that's the case, then the fix should be to check if the node has data in this line. This is a real bug, the data field is optional in a Node, so we can't assume that all nodes will have it.
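The guard suggested here can be sketched as follows. This is a hypothetical minimal version (`FakeNode` and the selected fields are illustrative, not the actual `_get_node_event_data` implementation): the event payload only includes `data` when the node actually carries it.

```python
class FakeNode:
    """Stand-in for a Node; `data` is optional, mirroring the real model."""
    def __init__(self, id, kind, name, data=None):
        self.id, self.kind, self.name, self.data = id, kind, name, data

def get_node_event_data(node):
    """Build the event payload, including `data` only when present."""
    event = {'id': node.id, 'kind': node.kind, 'name': node.name}
    if node.data is not None:  # don't assume every node has a data field
        event['data'] = node.data
    return event

assert 'data' not in get_node_event_data(FakeNode('1', 'node', 'n'))
assert get_node_event_data(FakeNode('2', 'checkout', 'c', {'k': 'v'}))['data'] == {'k': 'v'}
```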

Also, DB will not be able to map collection name from submodel type as the collection dictionary uses only one collection for all kind of nodes i.e. node

But db.create() is called on the original Node object as received by post_node(), not on a sub-type of Node.

So, I'll try to do a walkthrough of the changes:

    # Initially, <node> is received and parsed as a Node object
parsed_node = parse_node_obj(node)    # <--- <node> is cast into <parsed_node> as a subtype of Node, containing the same contents
# Convert again to parent model `Node` in order to enable JSON
# serialization of nested models (such as `CheckoutData`) and
# map model against existing DB collection i.e. `Node`
node = Node(**parsed_node.dict())     # <-- <parsed_node> is cast again back to a Node object. This <node> and the one at the beginning of the function should be equivalent

So I think this patch shouldn't have any effect at all. Am I missing something?

@JenySadadia
Collaborator Author

@hardboprobot Thanks for the reviews and comments. I'll try to explain things in detail.

Not exactly. node in post_node() should already be a Node object and its fields are parsed from json into the Node field datatypes. That has been like this since the beginning. The call to parse_node_obj() in post_node() doesn't convert anything or changes anything in the object, since it's return value isn't used. The purpose of this call, as the comment explains, is to validate the data according to a specific model.

Please see below logs on POST /node request:

kernelci-api | Node received: id=None kind='checkout' name='checkout' path=['checkout'] group=None parent=None state='done' result='pass' artifacts={'tarball': AnyHttpUrl('https://kciapistagingstorage1.file.core.windows.net/staging/linux-kernelci-staging-stable-staging-stable-20240209.6.tar.gz?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D', )} data={'kernel_revision': {'tree': 'kernelci', 'url': 'https://github.com/kernelci/linux.git', 'branch': 'staging-stable', 'commit': '67aaa1a214b14950ccbc782d4f5b712946f07441', 'describe': 'staging-stable-20240209.6', 'version': {'version': '6', 'patchlevel': '1', 'sublevel': '77', 'extra': '-1-g67aaa1a214b14'}}} debug=None error_code=None created=datetime.datetime(2024, 2, 13, 4, 27, 22, 126844) updated=datetime.datetime(2024, 2, 13, 4, 27, 22, 126851) timeout=datetime.datetime(2024, 2, 13, 10, 27, 22, 126856) holdoff=None owner=None user_groups=[]

kernelci-api | Parsed node: id=None kind='checkout' name='checkout' path=['checkout'] group=None parent=None state='done' result='pass' artifacts={'tarball': AnyHttpUrl('https://kciapistagingstorage1.file.core.windows.net/staging/linux-kernelci-staging-stable-staging-stable-20240209.6.tar.gz?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D', )} data=CheckoutData(kernel_revision=Revision(tree='kernelci', url=AnyUrl('https://github.com/kernelci/linux.git', scheme='https', host='github.com', tld='com', host_type='domain', path='/kernelci/linux.git'), branch='staging-stable', commit='67aaa1a214b14950ccbc782d4f5b712946f07441', describe='staging-stable-20240209.6', version=KernelVersion(version=6, patchlevel=1, sublevel=77, extra='-1-g67aaa1a214b14', name=None), patchset=None)) debug=None error_code=None created=datetime.datetime(2024, 2, 13, 4, 27, 22, 126844) updated=datetime.datetime(2024, 2, 13, 4, 27, 22, 126851) timeout=datetime.datetime(2024, 2, 13, 10, 27, 22, 126856) holdoff=None owner=None user_groups=[]

kernelci-api | INFO:     172.25.0.1:49798 - "POST /latest/node HTTP/1.1" 200 OK

See the difference in Node.data field value before and after parsing. Node.data is just a dictionary. So, the request model will have it like data={'kernel_revision': {'tree': 'kernelci', 'url': 'https://github.com/kernelci/linux.git', 'branch': 'staging-stable', 'commit': '67aaa1a214b14950ccbc782d4f5b712946f07441', 'describe': 'staging-stable-20240209.6', 'version': {'version': '6', 'patchlevel': '1', 'sublevel': '77', 'extra': '-1-g67aaa1a214b14'}}}.

Parsing is based on submodel. In this case, data becomes an instance of CheckoutData and that will have data types converted as per model definition i.e. it will be converted to data=CheckoutData(kernel_revision=Revision(tree='kernelci', url=AnyUrl('https://github.com/kernelci/linux.git', scheme='https', host='github.com', tld='com', host_type='domain', path='/kernelci/linux.git'), branch='staging-stable', commit='67aaa1a214b14950ccbc782d4f5b712946f07441', describe='staging-stable-20240209.6', version=KernelVersion(version=6, patchlevel=1, sublevel=77, extra='-1-g67aaa1a214b14', name=None), patchset=None)).
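The conversion described above can be reproduced with a minimal sketch (hypothetical model names, not the actual KernelCI classes): casting into the submodel converts the string fields to `int`, and those conversions survive the cast back to the parent model.

```python
from typing import Optional
from pydantic import BaseModel

class KernelVersion(BaseModel):
    version: int
    patchlevel: int

class Node(BaseModel):
    kind: str = 'node'
    data: Optional[dict] = None   # parent model keeps data as a plain dict

class Checkout(BaseModel):
    kind: str = 'checkout'
    data: KernelVersion           # submodel gives data a concrete type

node = Node(kind='checkout', data={'version': '6', 'patchlevel': '1'})
assert node.data['version'] == '6'        # still a str in the parent model

parsed = Checkout(**node.dict())          # explicit submodel validation
assert parsed.data.version == 6           # pydantic cast the str into an int

node2 = Node(**parsed.dict())             # cast back to the parent model
assert node2.data == {'version': 6, 'patchlevel': 1}  # ints are preserved
```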

When/where is this conversion lost?

In the previous implementation, the parsed node obj was not being stored and that's where the conversion necessary for further processing was lost.

and what do you mean with storing the object as received?

It means using the data received by the handler, i.e. using node after parse_node_obj rather than the parsed object. The implementation used to be as below:

parse_node_obj(node)
if node.parent:

Just to clarify: in terms of node parsing and storage in the DB, parse_node_obj() does nothing. The only thing it does is to validate an object using pydantic. The object is kept intact and the storage in the DB is unaffected by it. Actually, the way the object is received and stored is 100% the same as it always was, so I don't see how this code is now affecting the kver jobs.

In the previous implementation, a single Node model was used for all kinds of nodes, and Node.data was the same for all of them. That's why the data type conversion for the KernelVersion fields took place from the request itself, and the node data before any kind of parsing was the same as what we now receive after parse_node_obj.

But _get_node_event_data() doesn't do anything JSON-related. It simply builds a dict from some selected fields of a node. Do you mean that it fails on some nodes because they don't have a data field? If that's the case, then the fix should be to check if the node has data in this line. This is a real bug, the data field is optional in a Node, so we can't assume that all nodes will have it.

_get_node_event_data has 'data': node.data. That's where the issue lies.
While publishing events, JSON encoding fails for pydantic fields containing submodels. It raises the below error when Node.data is of any subclass type, i.e. CheckoutData in this case:

kernelci-api |   File "/home/kernelci/.local/lib/python3.10/site-packages/cloudevents/sdk/event/base.py", line 223, in MarshalJSON
kernelci-api |     return json.dumps(props)
kernelci-api |   File "/usr/local/lib/python3.10/json/__init__.py", line 231, in dumps
kernelci-api |     return _default_encoder.encode(obj)
kernelci-api |   File "/usr/local/lib/python3.10/json/encoder.py", line 199, in encode
kernelci-api |     chunks = self.iterencode(o, _one_shot=True)
kernelci-api |   File "/usr/local/lib/python3.10/json/encoder.py", line 257, in iterencode
kernelci-api |     return _iterencode(o, 0)
kernelci-api |   File "/usr/local/lib/python3.10/json/encoder.py", line 179, in default
kernelci-api |     raise TypeError(f'Object of type {o.__class__.__name__} '
kernelci-api | TypeError: Object of type CheckoutData is not JSON serializable

I also looked into adding JSON encoders in the Config class but found out that we would need a custom JSON encoder there. However, converting the object back to Node keeps Node.data as a plain dictionary, which solves the encoding issue.
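The failure can be reproduced in isolation: the stdlib json encoder rejects pydantic model instances but accepts plain dicts (a hypothetical minimal model below, not the real CheckoutData).

```python
import json
from pydantic import BaseModel

class CheckoutData(BaseModel):
    version: int

# A nested pydantic model instance is not JSON serializable by stdlib json
event_with_model = {'data': CheckoutData(version=6)}
try:
    json.dumps(event_with_model)
    serialized = True
except TypeError:  # "Object of type CheckoutData is not JSON serializable"
    serialized = False
assert serialized is False

# With `data` as a plain dict (as after casting back to Node), it serializes
event_with_dict = {'data': dict(CheckoutData(version=6))}
assert json.dumps(event_with_dict) == '{"data": {"version": 6}}'
```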

But db.create() is called on the original Node object as received by post_node(), not on a sub-type of Node.

After node parsing, it becomes an instance of the Checkout or Test model, and db.create will try to find a mapping for Checkout in the COLLECTIONS dictionary. I converted it back to Node to resolve the below error:

kernelci-api |   File "/home/kernelci/./api/main.py", line 566, in post_node
kernelci-api |     obj = await db.create(node)
kernelci-api |   File "/home/kernelci/./api/db.py", line 160, in create
kernelci-api |     col = self._get_collection(obj.__class__)
kernelci-api |   File "/home/kernelci/./api/db.py", line 59, in _get_collection
kernelci-api |     col = self.COLLECTIONS[model]
kernelci-api | KeyError: <class 'kernelci.api.models.Checkout'>
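The lookup failure can be sketched like this, with simplified stand-ins for the classes involved (the real mapping lives in api/db.py): the collection dict is keyed by the exact class, so a Node subtype has no entry.

```python
class Node:
    pass

class Checkout(Node):
    pass

# One collection for all kinds of nodes, keyed by the exact model class
COLLECTIONS = {Node: 'node'}

def get_collection(obj):
    return COLLECTIONS[obj.__class__]   # raises KeyError for subtypes

assert get_collection(Node()) == 'node'
try:
    get_collection(Checkout())          # KeyError: <class 'Checkout'>
    missing = False
except KeyError:
    missing = True
assert missing
```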

So I think this patch shouldn't have any effect at all. Am I missing something?

I hope the above explanations make things clearer.


r-c-n commented Feb 13, 2024

@JenySadadia ok, we both understand the same about what happens technically there. Now, there's probably part of the story that I'm missing:

In the previous implementation, the parsed node obj was not being stored and that's where the conversion necessary for further processing was lost.

The implementation used to be as below:

parse_node_obj(node)
if node.parent:

In the previous implementation, a single Node model was used for all kinds of nodes. Also, Node.data was same for all nodes.

Yes, all of that is intentional and that's the key to implement an easy mechanism to store and retrieve different types of Nodes with automatic model validation without complicating the rest of the underlying api implementation. What's the purpose of changing it?

That's why from the request itself, data type conversion for KernelVersion fields took place and the node data before any kind of parsing was the same as what we received with parse_node_obj now.

Aaah, so the problem is specifically in the KernelVersion fields? I guess you mean the parameters that are originally strings in JSON but are then cast into int automatically by pydantic? And you're explicitly casting the node to a submodel to have those fields cast into integers and then have the object turned back into a Node to keep everything working?

That's a lot of work for something that should be simpler but I think I get your point.

Ok, so first of all, you can skip all this trouble if you send the JSON object in the request with integer fields as integers instead of strings:

"version": {
	"version": 6, 
	"patchlevel": 1,
	"sublevel": 77,
	"extra": "-1-g67aaa1a214b14"
}

instead of

"version": {
	"version": "6", 
	"patchlevel": "1",
	"sublevel": "77",
	"extra": "-1-g67aaa1a214b14"
}

But this should actually be enforced by pydantic. That's why we're using it in the first place; if pydantic is silently converting data types, what's the point of model validation?

Pydantic v2 introduced a strict mode that does just that, but it's not in v1, inexplicably.

Oh wait, I found an explanation:

Pydantic is not a validation library, it's a parsing library.
It makes no guarantee whatsoever about checking the form of data you give to it; instead it makes a guarantee about the form of data you get out of it.
This sounds like an esoteric difference, but it has real practical consequences, eg.

  • if you pass "3" (which is not an int) to an int field, pydantic will convert it to an int
  • if you pass 3.14 (which again is not an int) to an int field, pydantic will convert it to an int (thereby "losing information")

I think this is correct and I'm not interested in changing it

But then, right at the top of the pydantic docs:

Pydantic is the most widely used data validation library for Python.

This is so, so python.

Anyway, I'm not against this change, but I don't think it's the right solution. What I think we should do is to migrate to pydantic v2 (v1 is soon to be deprecated) and make the model validations strict so that when anyone pushes an object with the wrong data types (like in the example you posted) the API request fails and the user gets a clear message about the cause. We don't want the API doing hidden magic on us, we want the data to be precisely formatted, and enforced in every step (once the models are fixed).

If you still want to go ahead with this change, then consider this an approval and we can always remember later to migrate to V2 and rework this if needed.

PS.: Sorry about the rant.

@JenySadadia
Collaborator Author

Anyway, I'm not against this change, but I don't think it's the right solution. What I think we should do is to migrate to pydantic v2 (v1 is soon to be deprecated) and make the model validations strict so that when anyone pushes an object with the wrong data types (like in the example you posted) the API request fails and the user gets a clear message about the cause. We don't want the API doing hidden magic on us, we want the data to be precisely formatted, and enforced in every step (once the models are fixed).

Model validations are also there in pydantic v1, and it does validate the request model atm. For the POST node request, it validates the request model to be a Node object. The issue is that the child model has a different data type set for Node.data. In the case of a Checkout node, it validates the data successfully, as the received type of the data field is dict, and it doesn't validate it against the CheckoutData model. And that is expected behavior.
Based on another test I performed for the validation: if you add an int field to the Node model and try to send a str value for it in the request, pydantic will raise a ValidationError. So validation is working as expected.

If you still want to go ahead with this change, then consider this an approval and we can always remember later to migrate to V2 and rework this if needed.

Yes, that seems like the right thing to do atm, as the version upgrade will take some time and needs to be compatible with the FastAPI version. I'll add a comment in #321 to keep track, and we'll rework this when we upgrade the other packages.

@hardboprobot How does it sound?


r-c-n commented Feb 14, 2024

Model validations are also there for pydantic v1

Of course, but not strict validations.

For the POST node request, it validates the request model to be a Node object. The issue is child model has a different data type set for Node.data. In the case of Checkout node, it validates the data successfully as the received type of data field is dict and doesn't validate it against CheckoutData model

Incorrect. The current code does validate the request against a Node model, but then it also performs the validation against a Node subtype (parse_node_obj()), so for checkout nodes it does validate the data field against the CheckoutData model.

Based on another test I performed for the validation If you add an int field to Node model and try to send str value for it in the request, the pydantic will raise ValidationError. So validation is working as expected.

What ValidationError exactly? Because I just tested the same thing and the API silently accepted the string and parsed it into an integer, as per pydantic's documentation. Maybe we're running different pydantic versions?

Yes, that seems like the right thing to do atm as the version upgrade will take some time and it needs to be compatible with FastAPI version. I'll add a comment in #321 to keep track and we'll rework this when we upgrade other packages.

@hardboprobot How does it sound?

As I said, if you want to merge this I won't stop it. I already gave all the explanations I could, and I tested all of them just to be sure. With all that on the table, it's up to you to decide what to do.

@JenySadadia
Collaborator Author

JenySadadia commented Feb 14, 2024

Incorrect. The current code does validate the request against a Node model, but then it also performs the validation against a Node subtype (parse_node_obj()), so for checkout nodes it does validate the data field against the CheckoutData model.

I meant the validation that is performed automatically in the request handler.

async def post_node(node: Node,
                    current_user: User = Depends(get_current_user)):

The above code validates the request model, i.e. node: Node.

What ValidationError exactly? Because I just tested the same thing and the API silently accepted the string and parsed it into an integer, as per pydantic's documentation. Maybe we're running different pydantic versions?

Nope, I am running the same pydantic version. Below is the test scenario.
I added a test field to the Node model:

class Node(DatabaseModel):
    """KernelCI primitive object to model a node in a hierarchy"""
    class_kind: ClassVar[str] = 'node'
    kind: str = Field(
        default='node',
        description="Type of the object"
    )
    ..... 
    user_groups: List[str] = Field(
        default=[],
        description="User groups that are permitted to update node"
    )
    test: Optional[int]

Sent POST node request:

$ cat node-data.json 
{"kind":"checkout","test": "test-data","name":"checkout","path":["checkout"],"group":null,"parent":null,"state":"done","result":"pass","artifacts":{"tarball":"https://kciapistagingstorage1.file.core.windows.net/staging/linux-kernelci-staging-stable-staging-stable-20240209.6.tar.gz?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D"},"data":{"kernel_revision":{"tree":"kernelci","url":"https://github.com/kernelci/linux.git","branch":"staging-stable","commit":"67aaa1a214b14950ccbc782d4f5b712946f07441","describe":"staging-stable-20240209.6","version":{"version":"6","patchlevel":"1","sublevel":"77","extra":"-1-g67aaa1a214b14"}}},"debug":null,"error_code":null}

$ ./kci node submit node-data.json 
Error: 422 Client Error: Unprocessable Entity for url: http://172.17.0.1:8001/latest/node
[{'loc': ['body', 'test'], 'msg': 'value is not a valid integer', 'type': 'type_error.integer'}]

API logs:

kernelci-api | INFO:     172.19.0.1:59756 - "POST /latest/node HTTP/1.1" 422 Unprocessable Entity


r-c-n commented Feb 14, 2024

test: Optional[int]
{"kind":"checkout","test": "test-data" ...

We're talking about different cases. "test-data" is obviously not a parseable int. Refer to this long comment above and the links I posted explaining the problem. pydantic v1 will silently parse strings into numbers if the string can be parsed as a number. Try feeding it something like "test": "1" and see how the validator happily turns it into "test": 1.
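The distinction can be shown with a minimal sketch (hypothetical model, using pydantic v1's default behavior): a numeric string is silently parsed into an int field, while a string that cannot be parsed as a number is rejected.

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class Node(BaseModel):
    test: Optional[int] = None

# A parseable numeric string is silently coerced into an int
assert Node(test='1').test == 1

# A non-parseable string raises ValidationError (the 422 seen in the API)
try:
    Node(test='test-data')
    rejected = False
except ValidationError:
    rejected = True
assert rejected
```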

@JenySadadia
Collaborator Author

Thanks for the links @hardboprobot
It turns out that Strict types also work with Pydantic v1. See https://docs.pydantic.dev/1.10/usage/types/#strict-types.
I tested with StrictInt and it works.

$ cat node-data.json 
{"kind":"node","test": "2","name":"checkout","path":["checkout"],"group":null,"parent":null,"state":"done","result":"pass","artifacts":{"tarball":"https://kciapistagingstorage1.file.core.windows.net/staging/linux-kernelci-staging-stable-staging-stable-20240209.6.tar.gz?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D"},"data":{"kernel_revision":{"tree":"kernelci","url":"https://github.com/kernelci/linux.git","branch":"staging-stable","commit":"67aaa1a214b14950ccbc782d4f5b712946f07441","describe":"staging-stable-20240209.6","version":{"version":"6","patchlevel":"1","sublevel":"77","extra":"-1-g67aaa1a214b14"}}},"debug":null,"error_code":null}

$ ./kci node submit node-data.json 
Error: 422 Client Error: Unprocessable Entity for url: http://172.17.0.1:8001/latest/node
[{'loc': ['body', 'test'], 'msg': 'value is not a valid integer', 'type': 'type_error.integer'}]

This will also need some fixes in the pipeline, as we are receiving str version information from the tarball service atm.
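The StrictInt behavior can be sketched with a hypothetical minimal model: a plain int field silently coerces the string "2", while a StrictInt field rejects it, which is what surfaces as the 422 response above.

```python
from typing import Optional
from pydantic import BaseModel, StrictInt, ValidationError

class Loose(BaseModel):
    test: Optional[int] = None

class Strict(BaseModel):
    test: Optional[StrictInt] = None

assert Loose(test='2').test == 2   # silent coercion with a plain int field

try:
    Strict(test='2')               # "value is not a valid integer"
    accepted = True
except ValidationError:
    accepted = False
assert accepted is False
```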


r-c-n commented Feb 15, 2024

Ah nice find! Ok so that's the way to go IMO. Sane data from the beginning, everything automatically checked, clear error messages, no surprises.

@JenySadadia
Collaborator Author

Created kernelci/kernelci-core#2382

@JenySadadia JenySadadia added the staging-skip Don't test automatically on staging.kernelci.org label Feb 16, 2024
@JenySadadia
Collaborator Author

Closing this, as an alternative solution got merged.

@JenySadadia JenySadadia deleted the fix-model-parsing branch February 20, 2024 09:42
Labels
staging-skip Don't test automatically on staging.kernelci.org

Successfully merging this pull request may close these issues.

kver failure on staging