How should we suggest updating data in a readonly namespace? #188
Replies: 11 comments
-
So several questions/comments:
-
Also, your commit seems to assume the NVD namespace is empty, which it isn't, so the commit needs to be reworked to overwrite (merge? overlay? some word like that) the data into the existing "nvd.nist.gov" space (or else the JSON breaks; you can't have multiple identically named keys). This all also assumes we want to overwrite the "broken" entry and not just insert a more correct one.
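To illustrate the duplicate-key problem, here is a minimal Python sketch. The entry layout (a top-level `namespaces` object) and the field names `a` and `b` are assumptions made up for the example, not the actual GSD schema. Python's `json` module keeps only the last of any repeated keys by default, so a second "nvd.nist.gov" key would silently drop the first one; merging into the existing object avoids that.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key}")
        seen[key] = value
    return seen

# Hypothetical entry text with the namespace key written twice.
broken = '{"namespaces": {"nvd.nist.gov": {"a": 1}, "nvd.nist.gov": {"b": 2}}}'

# Default behaviour: no error, the last duplicate silently wins.
lossy = json.loads(broken)

def merge_into_namespace(entry, namespace, new_data):
    """Merge new_data into an existing namespace instead of adding a second key."""
    target = entry.setdefault("namespaces", {}).setdefault(namespace, {})
    target.update(new_data)  # shallow merge: keys in new_data overwrite
    return entry

entry = {"namespaces": {"nvd.nist.gov": {"a": 1}}}
merge_into_namespace(entry, "nvd.nist.gov", {"b": 2})
```

Passing `object_pairs_hook=reject_duplicates` to `json.loads` is one way a check could catch the duplicate before it is silently collapsed.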
-
More thinking out loud: this is why we picked git initially. We can do things like have a random person overwrite a "read only" namespace, and 1) we can revert it easily and 2) we know exactly who did what and when, and, with a good commit message, why. So I'm inclined, for now, to allow overwrites of these "read only" spaces and see where this goes and how best to deal with it. Worst case, we clean it up by overwriting it with the NVD data and move the altered data somewhere else.
-
I was basing it off of what you'd done at CloudSecurityAlliance/gsd-database@1efab96, though once it was fixed in NVD someone else reverted that change. The bot that runs to populate the
-
The whole point of a namespace is keeping it off limits to others. I like that we can't monkey with the NVD or CVE data. What we have is what they have. If they update something, we pick up that update quickly. We know we want to be able to add enrichment data; that's basically the whole point right now. There's no good way to enrich the existing NVD upstream data without a lot of pain. I think there are two types of enrichment (let's just think about this in the context of NVD for the moment, it will help keep the scope sane): adding new data and correcting existing data.
For adding new data, I think either adding something to the GSD namespace or to your own namespace would be fine. For corrections, there isn't a great way that I'm aware of to correct portions of the NVD data.
-
OK, so adding data is easy if you use your own namespace; the trick becomes knowing when and how to overlay it. For example, let's assume I use seifried.org, and people trust and want to use the data in my namespace. If I have something like (again, for the sake of argument):
overlay: { namespaces: { cve.org: [some CVE data like an affects set of data] } }
are we adding my data to the cve.org data, or replacing it entirely? Two very common cases are "they got it wrong, here's the correct one" and "they are incomplete, here's more data". When you can only have one item, like a description, it's easy: it overwrites the existing one. But when you have lists (e.g. affects, or references), then what? So we may also need a way to indicate that this data is "in addition to" or "replaces" whatever keys are in the same space. We also need a way to specify what we're adding or overwriting (originally I used the term "overlay"; I still can't think of a better one). One option would be:
overlay: { replace: { namespaces: { cve.org: [some CVE data like an affects set of data] } } }
overlay: { addto: { namespaces: { cve.org: [some CVE data like CVSS environmental data] } } }
Now, having said all this, there may be a better solution: we populate the root osv:{} based on data in the root and in namespaces (e.g. CVE, NVD, etc.). We can basically just sort it out and write "the best truth" in osv:{}; if people don't like it, they can choose to have their own parsing rules (e.g. "do we trust the seifried.org namespace to overlay for cve.org?") and so on.
My vote, for now: people write stuff to their namespaces. We parse it and write it to the root osv:{}. This will be a natural extension of the GSD to osv:{} conversion I'm working on (it already has this mindset).
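As a rough illustration of the "replace" versus "addto" idea, here is a minimal Python sketch. The function name `apply_overlay`, the field names (`description`, `references`), and the example URLs are all hypothetical; the exact semantics (in particular, de-duplicating list items on "addto") are one possible choice, not something agreed on in this thread.

```python
import copy

def apply_overlay(namespaces, overlay):
    """Return a new namespaces dict with "replace" and "addto" sections applied.

    "replace" overwrites a key outright. "addto" appends to an existing
    list (skipping items already present), sets the key if it is absent,
    and leaves an existing non-list value alone.
    """
    result = copy.deepcopy(namespaces)
    for namespace, data in overlay.get("replace", {}).get("namespaces", {}).items():
        result.setdefault(namespace, {}).update(copy.deepcopy(data))
    for namespace, data in overlay.get("addto", {}).get("namespaces", {}).items():
        target = result.setdefault(namespace, {})
        for key, value in data.items():
            if isinstance(value, list) and isinstance(target.get(key), list):
                for item in value:
                    if item not in target[key]:
                        target[key].append(copy.deepcopy(item))
            else:
                target.setdefault(key, copy.deepcopy(value))
    return result

# Hypothetical upstream data and an overlay from another namespace.
namespaces = {
    "cve.org": {
        "description": "old text",
        "references": ["https://example.com/a"],
    }
}
overlay = {
    "replace": {"namespaces": {"cve.org": {"description": "corrected text"}}},
    "addto": {
        "namespaces": {
            "cve.org": {"references": ["https://example.com/a", "https://example.com/b"]}
        }
    },
}
merged = apply_overlay(namespaces, overlay)
```

The input is deep-copied first, so the upstream namespace data stays untouched and the merged view can be written elsewhere (for example, into osv:{}).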
-
I've been putting some thought into this, and I think I have some ideas. The GSD namespace is really what we want to be THE source of information. If we want to see updates and/or corrections, that's really where we want them. I think we should figure out how to turn existing NVD entries into GSD (OSV) format. If we want to modify the data, that's the place to do it. Keeping NVD changes current will raise some issues, but that's a different discussion, so let's just ignore that point for the moment. If someone wants to add their own data, that's where a namespace makes sense, I think. Some namespaces will be merged into GSD, some won't. It'll just depend, but the point is your namespace is for whatever you want.
-
To describe changes to JSON, there is JSON Patch, defined in RFC 6902, and there is also JSON Merge Patch, defined in RFC 7396, for merging JSON objects.
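For reference, the merge algorithm in RFC 7396 is short enough to sketch in a few lines of Python. The function name `merge_patch` is my own; the example target and patch are illustrative values. Note that `None` stands in for JSON `null`, which in a merge patch means "delete this key".

```python
def merge_patch(target, patch):
    """Apply an RFC 7396 JSON Merge Patch and return the result.

    A non-object patch replaces the target entirely. In an object patch,
    a null (None) value removes the key; any other value is merged
    recursively. The top level of the target is copied, not mutated.
    """
    if not isinstance(patch, dict):
        return patch
    if not isinstance(target, dict):
        target = {}
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result

target = {"a": "b", "c": {"d": "e", "f": "g"}}
patch = {"a": "z", "c": {"f": None}}
patched = merge_patch(target, patch)
# patched == {"a": "z", "c": {"d": "e"}}
```

One limitation relevant here: a merge patch replaces arrays wholesale, so it cannot express "append one more reference" to a list. That case needs something like JSON Patch (RFC 6902) or a custom rule.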
-
The challenge here is that we then... store the original JSON files, a series of patches, and the final file (e.g. so we don't make everyone apply the series to get up to date), or... something else? For now, the solution is git: it gives us the history, the ability to roll back, easy hosting (GitHub), and distribution (git clone/git pull), all in one tidy bundle.
-
First, let me clarify: I am not proposing that JSON Patch or other forms of diffs should be used. But the question was raised of how this could be encoded, and before a new format is defined I wanted to mention that others have already worked on this problem. It is also possible to include a JSON patch inside the object that should be changed.
So the patch could be put into a namespace, and there is no need to manage multiple files for one GSD entry. But this is only a viable solution if you want to store changes to the read-only data, maybe to highlight false data in CVE and co. or to clarify data that contradicts between GSD and CVE. I agree with @joshbressers that improving the GSD namespace is the most sensible approach.
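A rough Python sketch of what "a patch stored in a namespace" could look like. Everything about the layout is an assumption for illustration: the `example.org` namespace, the `patch` key, and the field names are hypothetical. The applier below handles only the "add" and "replace" operations of RFC 6902 with non-empty paths; it is not a complete JSON Patch implementation.

```python
import copy

def _resolve_parent(doc, pointer):
    """Walk an RFC 6901 JSON Pointer; return (parent container, final token)."""
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]
    parent = doc
    for token in tokens[:-1]:
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    return parent, tokens[-1]

def apply_patch(doc, patch):
    """Apply the "add" and "replace" operations of a JSON Patch to a copy of doc."""
    result = copy.deepcopy(doc)
    for op in patch:
        parent, token = _resolve_parent(result, op["path"])
        value = copy.deepcopy(op["value"])
        if op["op"] == "replace":
            if isinstance(parent, list):
                parent[int(token)] = value
            elif token in parent:
                parent[token] = value
            else:
                raise KeyError(token)  # "replace" requires an existing target
        elif op["op"] == "add":
            if isinstance(parent, list):
                # "-" means "append to the end of the array"
                parent.insert(len(parent) if token == "-" else int(token), value)
            else:
                parent[token] = value
        else:
            raise NotImplementedError(op["op"])
    return result

# Hypothetical entry: the read-only data plus a patch kept in another namespace.
entry = {
    "namespaces": {
        "nvd.nist.gov": {"description": "wrong text", "references": ["https://example.com/a"]},
        "example.org": {
            "patch": [
                {"op": "replace", "path": "/namespaces/nvd.nist.gov/description", "value": "corrected text"},
                {"op": "add", "path": "/namespaces/nvd.nist.gov/references/-", "value": "https://example.com/b"},
            ]
        },
    }
}
patched = apply_patch(entry, entry["namespaces"]["example.org"]["patch"])
```

The stored entry is left as-is; the patched view is computed on demand, so the read-only namespace still matches upstream.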
-
So there's two issues here:
My thinking here is that we don't touch cve.org/nist; we synthesize that data, patch it, whatever, and put the result into the GSD namespace. Then, for example, when the API is serving the data, the requestor gets the best, most up-to-date, complete data from GSD.
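One way to picture this, as a minimal Python sketch. The function name `synthesize_gsd`, the top-level `GSD` key, and the field names and values are all assumptions for illustration; real entries would need smarter per-field rules than "later source wins".

```python
import copy

def synthesize_gsd(entry, source_order, corrections):
    """Return a copy of entry whose "GSD" key holds the merged, corrected view.

    Sources are layered in source_order (later namespaces win on key
    conflicts), then corrections are applied on top. The upstream
    namespaces themselves are copied through unchanged.
    """
    merged = {}
    for namespace in source_order:
        merged.update(copy.deepcopy(entry["namespaces"].get(namespace, {})))
    merged.update(copy.deepcopy(corrections))
    result = copy.deepcopy(entry)
    result["GSD"] = merged
    return result

# Hypothetical entry with two upstream namespaces.
entry = {
    "namespaces": {
        "cve.org": {"description": "short text"},
        "nvd.nist.gov": {"description": "longer text", "severity": "HIGH"},
    }
}
served = synthesize_gsd(entry, ["cve.org", "nvd.nist.gov"], {"severity": "MEDIUM"})
```

An API could then serve `served["GSD"]` while the cve.org and nvd.nist.gov namespaces remain exact copies of upstream.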
-
I would like to start proposing improvements to the NVD data from the nvd.nist.gov namespace on GSD entries, in the hope that someday we can find a way to provide those updates back to the source. The nvd.nist.gov namespace is readonly and populated by the GSD bot, so editing it directly is not supported. I know at some point @kurtseifried had provided a correction to a CPE, and I think that was done by copying the affected property into a namespace nvd.nist.gov under the GSD namespace, but I'd like to understand if that is really how we want this to work moving forward.
I do have an example of a small proposed GSD entry change at CloudSecurityAlliance/gsd-database#2400