How should we suggest updating data in a readonly namespace? #188

westonsteimel · 2022-09-09T16:57:05Z

westonsteimel
Sep 9, 2022

I would like to start proposing improvements to the NVD data from the nvd.nist.gov namespace on GSD entries in the hope that someday we can find a way to provide those updates back to the source. The nvd.nist.gov namespace is readonly and populated by the GSD bot, so editing it directly is not supported. I know at some point @kurtseifried had provided a correction to a CPE and I think that was done by copying the affected property into a namespace nvd.nist.gov under the GSD namespace, but I'd like to understand if that is really how we want this to work moving forward.

I do have an example of a small proposed GSD entry change at CloudSecurityAlliance/gsd-database#2400

kurtseifried · 2022-09-09T17:06:10Z

kurtseifried
Sep 9, 2022
Maintainer

So several questions/comments:

- We want to import the data automatically from various sources as much as possible (scaling, efficiency, etc.).
- What do we do when that data is broken or incorrect? Do we fix it? Where?
- What happens when (if?) the original source changes it? Was that a fix, or is it more broken? Do we want someone to review it? Do we flag it?
- As for where we fix the data: if we write the NVD data in and then fix it, we have that trail in git, i.e. we still have the original data (if anyone cares to go look for it).
- If we fix it by overwriting it, how do we indicate that we changed it from the original? E.g. "hey GSD, you have a typo, NVD says CVSS score of X, you say Y".
- How do we indicate WHY we changed it, i.e. what evidence caused us to change this data?
- If we fix the data outside the namespace where it exists, and then, for example, have the API overlay our improved data when serving the file, do we put the overlay data in the root namespace, or somewhere else?

kurtseifried · 2022-09-09T17:11:58Z

kurtseifried
Sep 9, 2022
Maintainer

Also, your commit seems to assume the NVD namespace is empty, which it isn't, so the commit needs to be reworked to overwrite (merge? overlay? some word like that) the data into the existing "nvd.nist.gov" space. Otherwise the JSON breaks, because an object can't have multiple identically named keys.

This also all assumes we want to overwrite the "broken" entry and not just insert a more correct one.
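To illustrate the duplicate-key concern: a minimal sketch, assuming a simplified and hypothetical entry layout rather than the real GSD schema. Python's standard `json` module does not reject a repeated key by default; it keeps only the last value, so the earlier "nvd.nist.gov" object would be dropped silently. An `object_pairs_hook` can turn that into an explicit error:

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


# Hypothetical, simplified entry in which "nvd.nist.gov" appears twice.
raw = '{"namespaces": {"nvd.nist.gov": {"id": 1}, "nvd.nist.gov": {"id": 2}}}'

# The default parser accepts the text and keeps only the last value.
lenient = json.loads(raw)

# A strict parse surfaces the problem instead of hiding it.
try:
    json.loads(raw, object_pairs_hook=reject_duplicate_keys)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```

A check like this could run in CI on proposed changes, so that a commit adding a second copy of a namespace is caught before it is merged.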

kurtseifried · 2022-09-09T17:17:26Z

kurtseifried
Sep 9, 2022
Maintainer

More thinking out loud:

This is why we picked git initially. We can let a random person overwrite a "read only" namespace and 1) we can revert it easily and 2) we know exactly who did what and when, and, with a good commit message, why.

So I'm inclined, for now, to allow overwrites of these "read only" spaces and see where this goes and how best to deal with it. Worst case, we clean it up by overwriting it with the NVD data and move the altered data somewhere else.

westonsteimel · 2022-09-09T17:24:47Z

westonsteimel
Sep 9, 2022
Author

I was basing it off of what you'd done at CloudSecurityAlliance/gsd-database@1efab96, though once it was fixed in NVD someone else reverted that change. The bot that populates the nvd.nist.gov namespace is definitely going to clobber anything we put there, so I guess it doesn't really make sense to suggest changes there at all.

joshbressers · 2022-09-09T18:43:06Z

joshbressers
Sep 9, 2022
Maintainer

The whole point to a namespace is keeping it off limits to others. I like that we can't monkey with the NVD or CVE data. What we have is what they have. If they update something, we pick up that update quickly.

We know we want to be able to add enrichment data; that's basically the whole point right now. There's no good way to enrich the existing NVD upstream data without a lot of pain.

I think there are two types of enrichment (let's just think about this in the context of NVD for the moment; it will help keep the scope sane):

- Correcting an existing read-only namespace
- Adding new data

For adding new data, I think adding something to either the GSD namespace or your own namespace would be fine.

For corrections, there isn't a great way that I'm aware of to correct portions of the NVD data.

kurtseifried · 2022-09-09T22:34:07Z

kurtseifried
Sep 9, 2022
Maintainer

Ok so adding data is easy if you use your own namespace, the trick becomes knowing when/how to overlay it, e.g. let's assume I use seifried.org, and people trust/want to use my data in my namespace. If I have something like (again for the sake of argument):

overlay: { namespaces: { cve.org: [some CVE data like an affects set of data] } }

are we adding my data to the cve.org data? Replacing it entirely? Because two very common cases are "they got it wrong, here's the correct one" and "they are incomplete, here's more data". When you can only have one item, like a description, it's easy: it overwrites the existing one. But when you have lists (e.g. affects, or references), then what?

So we may also need a way to indicate that this data is "in addition to" or "replaces" whatever keys are in the same space. We also need a way to specify what we're adding or overwriting (originally I used the term "overlay", I still can't think of a better one).

One option would be:

overlay: { replace: { namespaces: { cve.org: [some CVE data like an affects set of data] } } }

overlay: { addto: { namespaces: { cve.org: [some CVE data like CVSS environmental data] } } }
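The replace/addto idea above can be sketched as follows. This is only an illustration under stated assumptions: the function name `apply_overlay`, the field names `description` and `references`, and the rule that "addto" appends to lists but never overwrites an existing scalar are all hypothetical, not an agreed GSD format.

```python
import copy


def apply_overlay(entry, overlay):
    """Return a copy of entry with the overlay's replace/addto sections applied."""
    result = copy.deepcopy(entry)
    namespaces = result.setdefault("namespaces", {})

    # "replace": the overlay value wins outright, lists included.
    for ns, fields in overlay.get("replace", {}).get("namespaces", {}).items():
        namespaces.setdefault(ns, {}).update(copy.deepcopy(fields))

    # "addto": list items are appended (skipping ones already present);
    # keys that do not exist yet are set, existing scalars are left alone.
    for ns, fields in overlay.get("addto", {}).get("namespaces", {}).items():
        target = namespaces.setdefault(ns, {})
        for key, value in fields.items():
            if isinstance(value, list) and isinstance(target.get(key), list):
                for item in value:
                    if item not in target[key]:
                        target[key].append(item)
            else:
                target.setdefault(key, copy.deepcopy(value))
    return result


# Hypothetical, heavily simplified entry and overlay.
entry = {
    "namespaces": {
        "cve.org": {
            "description": "old text",
            "references": ["https://example.com/a"],
        }
    }
}
overlay = {
    "replace": {"namespaces": {"cve.org": {"description": "corrected text"}}},
    "addto": {"namespaces": {"cve.org": {"references": ["https://example.com/b"]}}},
}
merged = apply_overlay(entry, overlay)
```

Because the function works on a deep copy, the stored cve.org data stays exactly as imported; only the served or synthesized view changes.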

Now having said this all, there may be a better solution:

We populate the root osv:{} based on data in the root and in namespaces (e.g. CVE, NVD, etc.). We can basically just sort it out and write "the best truth" in osv:{}. If people don't like it, they can choose to have their own parsing rules (e.g. "do we trust the seifried.org namespace to overlay for cve.org?") and so on.

My vote, for now:

People write stuff to their namespaces. We parse it and write it to the root osv:{}. This will be a natural extension of the GSD to osv:{} conversion I'm working on (it already has this mindset).
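A rough sketch of what "parse the namespaces and write the best truth to osv:{}" could look like, with the trust decision expressed as an ordered list of namespaces. Everything here is an assumption for illustration: the function name `synthesize_osv`, the idea that each namespace has already been reduced to flat OSV-like fields, and the simple "first trusted namespace wins per field" rule. It is not the actual GSD-to-OSV conversion.

```python
def synthesize_osv(entry, trusted_order):
    """Build a root "osv" object, taking each field from the first
    namespace in trusted_order that supplies it."""
    osv = {}
    namespaces = entry.get("namespaces", {})
    for ns in trusted_order:
        for key, value in namespaces.get(ns, {}).items():
            # setdefault keeps the value from the more trusted namespace.
            osv.setdefault(key, value)
    return osv


# Hypothetical entry: the namespaces are assumed to hold OSV-like fields.
entry = {
    "namespaces": {
        "cve.org": {"summary": "upstream summary", "details": "upstream details"},
        "seifried.org": {"summary": "corrected summary"},
    }
}

# Default view: trust seifried.org over cve.org.
default_view = synthesize_osv(entry, ["seifried.org", "cve.org"])

# A consumer with different parsing rules can leave seifried.org out.
upstream_only = synthesize_osv(entry, ["cve.org"])
```

The point of the design is that the namespaces stay untouched and disagreement is resolved at read time, so a consumer who does not like the default ordering only has to supply a different list.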

joshbressers · 2022-09-12T03:20:21Z

joshbressers
Sep 12, 2022
Maintainer

I've been putting some thought into this, and I think I have some ideas.

The GSD namespace is really what we want to be THE source of information. If we want to see updates and/or corrections, that's really what we want. I think we should figure out how to turn existing NVD entries into GSD (OSV) format.

If we want to modify the data, that's the place to do it. There will be some issues with keeping NVD changes current, but that is a different discussion, so let's just ignore that point for the moment.

If someone wants to add their own data, that's where a namespace makes sense I think. Some namespaces will be merged into GSD, some won't. It'll just depend, but the point is your namespace is for whatever you want.

raphaelahrens · 2022-09-12T07:36:35Z

raphaelahrens
Sep 12, 2022

To describe changes to JSON there is JSON Patch, defined in RFC 6902, and there is a definition for merging JSON objects (JSON Merge Patch) in RFC 7396.
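For reference, the merge algorithm in RFC 7396 is short enough to sketch in a few lines. The function below follows the pseudocode in that RFC; the sample document and its field names are hypothetical and only loosely resemble vulnerability data.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) and return the merged value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target entirely (arrays included).
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for name, value in patch.items():
        if value is None:
            # null in the patch means "remove this member".
            result.pop(name, None)
        else:
            result[name] = merge_patch(result.get(name), value)
    return result


# Hypothetical upstream data and a correction expressed as a merge patch.
original = {
    "description": "old text",
    "references": ["https://example.com/a"],
    "cvss": {"score": 7.5, "vector": "placeholder"},
}
patch = {
    "description": "corrected text",
    "references": ["https://example.com/a", "https://example.com/b"],
    "cvss": {"vector": None},
}
merged = merge_patch(original, patch)
```

Note that a merge patch can only replace an array as a whole, so "add one more reference" has to restate the full list. Appending a single array element is something JSON Patch (RFC 6902) can express and Merge Patch cannot.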

kurtseifried · 2022-09-12T14:47:54Z

kurtseifried
Sep 12, 2022
Maintainer

The challenge here is that we then... store the original JSON files, a series of patches, and the final file (e.g. so we don't make everyone apply the series to get up to date), or... something else? For now, the solution is git: it gives us the history, the ability to roll back, easy hosting (GitHub), and distribution (git clone/git pull), all in one tidy bundle.

raphaelahrens · 2022-09-12T22:14:46Z

raphaelahrens
Sep 12, 2022

First, let me clarify: I am not proposing that JSON Patch or other forms of diffs should be used. But the question was raised of how this could be encoded, and before a new format is defined I wanted to mention that others have already worked on this problem.

It is also possible to include a JSON Patch inside the object that should be changed.

{
  "foo":  1,
  "patch": [
    { "op": "replace", "path": "/foo", "value": 2 }
  ]
}

So the patch could be put into a namespace and there is no need to manage multiple files for one GSD entry.
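A minimal sketch of how a reader of such an object might apply the embedded patch. It is deliberately incomplete: only the "replace" operation is handled, the key name "patch" is taken from the example above, and dropping that key from the result is a design choice of this sketch, not something RFC 6902 prescribes.

```python
import copy


def _tokens(pointer):
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    # "~1" must be decoded before "~0", as RFC 6901 requires.
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def apply_embedded_patch(doc, patch_key="patch"):
    """Apply the "replace" operations stored under doc[patch_key].

    A full RFC 6902 implementation also needs add, remove, move, copy and test.
    """
    result = copy.deepcopy(doc)
    operations = result.pop(patch_key, [])
    for op in operations:
        if op["op"] != "replace":
            raise NotImplementedError(op["op"])
        tokens = _tokens(op["path"])
        if not tokens:
            # An empty path addresses the whole document.
            result = copy.deepcopy(op["value"])
            continue
        *parents, last = tokens
        node = result
        for token in parents:
            node = node[int(token)] if isinstance(node, list) else node[token]
        if isinstance(node, list):
            node[int(last)] = op["value"]
        elif last in node:
            node[last] = op["value"]
        else:
            # "replace" requires the target location to exist.
            raise KeyError(f"path does not exist: {op['path']}")
    return result


doc = {"foo": 1, "patch": [{"op": "replace", "path": "/foo", "value": 2}]}
patched = apply_embedded_patch(doc)
```

The original value stays in the stored object, and the patched view is computed when the entry is read.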

But this is only a viable solution if you want to store changes to the read-only data, maybe to highlight false data in CVE and co. or to clarify contradictions between GSD and CVE data.

I agree with @joshbressers that improving the GSD namespace is the most sensible approach.

kurtseifried · 2023-02-02T16:59:46Z

kurtseifried
Feb 2, 2023
Maintainer

So there are two issues here:

- What technical method do we use to update the JSON data (e.g. direct overwrite? patch?)
- Depending on the technical method: if we overwrite the data, how do we handle updates from NIST, for example? If we use patch(es), how do we apply them and in what order? What happens when a patch gets out of sync with the data (e.g. NIST updates it to delete an entry or something)?

My thinking here is that we don't touch cve.org/nist; we synthesize that data, patch it, whatever, and put the result into the GSD namespace. Then, for example, when the API is serving the data, the requestor gets the best, up-to-date, complete data from GSD.
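One possible way to handle a correction that no longer matches upstream, sketched under loud assumptions: each correction records the upstream value it was written against (the same idea as the "test" operation in RFC 6902), and a correction whose expected value no longer matches is skipped and reported for review instead of being applied. The record layout (`field`, `expected`, `value`, `reason`) and the field name `cvss_score` are hypothetical.

```python
def apply_corrections(upstream, corrections):
    """Apply top-level field corrections and return (result, stale).

    A correction is applied only if upstream still holds the value it
    expected; otherwise it is returned in the stale list for review.
    """
    result = dict(upstream)
    stale = []
    for correction in corrections:
        field = correction["field"]
        if field in upstream and upstream[field] == correction["expected"]:
            result[field] = correction["value"]
        else:
            stale.append(correction)
    return result, stale


# Hypothetical correction, including the reason it was made.
corrections = [
    {
        "field": "cvss_score",
        "expected": 9.8,
        "value": 7.5,
        "reason": "hypothetical: vendor advisory gives a lower score",
    }
]

# Upstream still has the value the correction was written against.
synthesized, stale = apply_corrections({"cvss_score": 9.8}, corrections)

# Upstream has changed in the meantime: the correction is flagged, not applied.
synthesized_later, stale_later = apply_corrections({"cvss_score": 7.5}, corrections)
```

The upstream object is never modified; the returned result is what would be written to the GSD namespace, and the stale list is what a reviewer would look at.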
