[8] Find a way how to get project name from NVD CVE data #2485
Comments
A sub task has been added. In order to be able to at least estimate the accuracy of a model, we need an accessible toy set of labeled data.
@CermakM can you please share the approach for retrieving the project name? Which ecosystems are being considered for this work?
- commit jupyter notebook

DISCLAIMER: the code in the notebook is in NON-production quality and serves only as sketch of possible solution
- the notebook provides a POC for project name inference from cpe description
- the notebook is supposed to visualize possible results when implementing such kind of classifier for the task

GitHub issue: openshiftio/openshift.io#2485
Signed-off-by: Marek Cermak <[email protected]>
new file: cve-desrciption-cracker.ipynb
@krishnapaparaju sure thing. The notebook suggests an approach that could greatly improve on our current one (it also includes a brief comparison of the two).
This experiment will continue in Sprint 147. @CermakM please update this issue, thanks 👍
Conclusion for the current sprint #2433
We were able to show that the description data exhibit a pattern. With a suitable feature extractor, a classifier can be trained to provide decent predictions of project name candidates. Each candidate is assigned a numeric confidence score and can be further processed (ordered, filtered, etc.).
To be done in sprint #2775:
cc @msrb
Description
Mapping CVE entries to actual package names is much easier when we at least know the name of the project (e.g. "Apache NiFi" or "Apache POI") that is affected by a given vulnerability. Knowing the project name will help us get better results and fewer false positives.
The output of this task should be a function that takes one NVD CVE record as input and returns a list of possible project name candidates. Having a confidence score for each candidate would be nice, but is not necessary.
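To make the expected interface concrete, here is a minimal sketch of such a function. It is not the implementation from the notebook: the function names, the capitalized-token heuristic, and the rank-based score are all placeholders made up for illustration, and the record layout (`cve.description.description_data[].value`) is assumed to follow the NVD JSON feed.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical heuristic: a candidate is a run of capitalized tokens,
# e.g. "Apache NiFi". This is a stand-in for a real classifier.
_NAME_RE = re.compile(r"\b[A-Z][A-Za-z0-9]*(?: [A-Z][A-Za-z0-9]*)*")


def get_description(record: Dict) -> str:
    """Join all description texts of one record (assumed NVD JSON layout)."""
    data = record.get("cve", {}).get("description", {}).get("description_data", [])
    return " ".join(item.get("value", "") for item in data)


def project_name_candidates(record: Dict) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs, best candidate first."""
    seen: List[str] = []
    for match in _NAME_RE.finditer(get_description(record)):
        name = match.group(0)
        if name not in seen:
            seen.append(name)
    # Placeholder score: earlier mentions rank higher (1, 1/2, 1/3, ...).
    # It is not a calibrated confidence.
    return [(name, 1.0 / (rank + 1)) for rank, name in enumerate(seen)]


if __name__ == "__main__":
    record = {"cve": {"description": {"description_data": [
        {"lang": "en", "value": "Apache NiFi before 1.0.0 allows remote attackers to inject script via XSS."}
    ]}}}
    print(project_name_candidates(record))
```

The point is only the signature: one record in, an ordered list of scored candidates out, so that a trained model can later replace the heuristic without changing callers.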
Sub tasks for sprint #2433
Update: It is possible to label the NVD feed entries that reference GitHub, and hence the project name can be inferred from the description of the CPE. This set, however, might not be sufficient, and the approach should be discussed further.
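One way such labels could be obtained is sketched below, assuming the label is simply the repository name taken from the first GitHub URL among a record's references. Both helper names are hypothetical, and the `cve.references.reference_data[].url` layout is assumed from the NVD JSON feed; the notebook may derive labels differently.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


def project_from_github_url(url: str) -> Optional[str]:
    """Return the repository name of a github.com URL, or None."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:  # need at least /<owner>/<repo>
        return None
    repo = parts[1]
    return repo[:-4] if repo.endswith(".git") else repo


def label_record(record: Dict) -> Optional[str]:
    """Label a record with the first GitHub repository it references."""
    refs = record.get("cve", {}).get("references", {}).get("reference_data", [])
    for ref in refs:
        name = project_from_github_url(ref.get("url", ""))
        if name:
            return name
    return None  # the record stays unlabeled
```

Records without a GitHub reference stay unlabeled, which is why the resulting set may turn out too small.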
Update: Based on the description properties, Naive Bayes classifier was selected for the implementation.
Update: Accuracy has been evaluated on a relatively small dataset (approximately 20% of the real data) due to a lack of labeled data.
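For readers unfamiliar with the technique, the following is a small, self-contained sketch of a token-level Naive Bayes classifier with Laplace smoothing. It is an assumption about how such a classifier might look, not the notebook's code: the features, the class `TinyNaiveBayes`, and the helper names are invented here, and the two training sentences are made-up toy text rather than real NVD descriptions.

```python
import math
from collections import Counter, defaultdict


def token_features(tokens, i):
    """Hypothetical features of the i-th token of a description."""
    return {
        "capitalized": tokens[i][:1].isupper(),
        "first_three": i < 3,
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "<end>",
    }


class TinyNaiveBayes:
    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.values = defaultdict(set)              # feature -> observed values

    def train(self, samples):
        for feats, label in samples:
            self.label_counts[label] += 1
            for name, value in feats.items():
                self.feature_counts[(label, name)][value] += 1
                self.values[name].add(value)

    def prob(self, feats):
        """Return the posterior probability of each label."""
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            for name, value in feats.items():
                seen = self.feature_counts[(label, name)][value]
                # Laplace smoothing, with one extra slot for unseen values.
                score += math.log((seen + 1) / (count + len(self.values[name]) + 1))
            log_scores[label] = score
        top = max(log_scores.values())
        exps = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(exps.values())
        return {label: e / norm for label, e in exps.items()}


def make_samples(description, project):
    """Label each token True if it is part of the project name."""
    tokens = description.split()
    name_tokens = set(project.split())
    return [(token_features(tokens, i), tok in name_tokens)
            for i, tok in enumerate(tokens)]


def predict_name(model, description):
    """Join tokens classified as name tokens; score = mean posterior."""
    tokens = description.split()
    picked = []
    for i, tok in enumerate(tokens):
        p = model.prob(token_features(tokens, i))[True]
        if p > 0.5:
            picked.append((tok, p))
    if not picked:
        return None, 0.0
    name = " ".join(tok for tok, _ in picked)
    return name, sum(p for _, p in picked) / len(picked)


# Toy, invented training sentences (NOT real NVD data).
TOY_DATA = [
    ("Apache NiFi before 1.0 allows remote attackers to inject script", "Apache NiFi"),
    ("Apache POI before 3.15 allows remote attackers to cause a denial of service", "Apache POI"),
]

model = TinyNaiveBayes()
for description, project in TOY_DATA:
    model.train(make_samples(description, project))

name, confidence = predict_name(model, "Apache Tika before 1.2 allows attackers to read files")
```

With only two training sentences the numbers mean very little; a held-out portion of the labeled records would be needed to estimate accuracy in any meaningful way.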