fix: per year bug fixes #268

Open
wants to merge 3 commits into base: scratch/memory-usage-per-year
Conversation

jeremylong (Owner)

  • ensure the correct lastUpdated is captured per year
  • clean up code duplication
  • ensure CVE-1988 - CVE-2001 are stored in year 2002

@jeremylong (Owner, Author)

With the existing cache from yesterday, running the update took 2632 seconds (~43 minutes).

@jeremylong (Owner, Author)

Fixed one minor bug and re-ran the update. 1943 seconds (~32 minutes) later the cache is updated (see the log). It takes 149 requests to the NVD to update the cache when, in reality, if I updated the cache an hour ago, it should be a single API call to get all the data.

@jeremylong jeremylong marked this pull request as ready for review February 11, 2025 13:38
@jeremylong jeremylong linked an issue Feb 11, 2025 that may be closed by this pull request
@EugenMayer (Contributor) left a comment

nice, just docs and this is a good addition for sure

@@ -72,6 +72,7 @@ public class CveCommand extends AbstractNvdCommand {
* Start year (until today) to cache CVEs for.
*/
private static final int START_YEAR = 2002;
private static final int EARLIEST_CVE_YEAR = 1988;

Contributor:

We should either name the variables in a way that makes this obvious, or document why both exist; otherwise the next time it will not be implemented the right way again :)

private void storeToCache(int year, CvesNvdPojo cves, CacheProperties properties) {
int target = year;
if (target < START_YEAR) {
target = START_YEAR;
Contributor:

We should add docs on why this is needed and also reference the NVD docs you found on that. In the end, we fetch 1988-2002 as a special case and store them altogether in the year 2002.
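A documented version of that clamping could look roughly like the sketch below. This is not the PR's actual code: the class, method and constant names mirror the diff snippet above, but `targetCacheYear` is a hypothetical helper extracted purely for illustration, and the actual cache writing is omitted.

```java
// Sketch: folding pre-2002 CVE years into the 2002 cache file.
// START_YEAR / EARLIEST_CVE_YEAR mirror the constants shown in the diff.
public final class CacheYearExample {

    /** First year for which the NVD provides a per-year data set. */
    static final int START_YEAR = 2002;

    /** Oldest year appearing in a CVE identifier (CVE-1988-...). */
    static final int EARLIEST_CVE_YEAR = 1988;

    /**
     * The NVD per-year feeds begin at 2002; entries for
     * CVE-1988 through CVE-2001 are stored together in the
     * 2002 cache file, so earlier years are clamped to START_YEAR.
     */
    static int targetCacheYear(int cveYear) {
        return Math.max(cveYear, START_YEAR);
    }

    public static void main(String[] args) {
        System.out.println(targetCacheYear(1999)); // clamped to 2002
        System.out.println(targetCacheYear(2024)); // unchanged
    }
}
```

The Javadoc on the constant pair and on the helper is the kind of documentation the comment asks for: it states why both `START_YEAR` and `EARLIEST_CVE_YEAR` exist and why the clamp is not a bug.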

@EugenMayer (Contributor) commented Feb 14, 2025

Fixed one minor bug and re-ran the update. 1943 seconds (32 minutes) later and the cache is updated (see the log). It is taking 149 requests to the NVD to update the cache when in reality if I updated the cache an hour ago - it should be a single API call to get all the data.

Are you able to compare this with the old implementation? (Not memory, rather a real test.) I am asking because I cannot do it myself; it never finishes, neither on my k8s cluster nor locally (it either OOMs or fails due to API errors). It would be nice to have a comparison.

Current:
23 years * 4 requests (120-day slices) = 132 requests.

We could use entirely separate fetch logic for the modified file, which might be an easy fix and easy to implement. Idea:

  1. Update years
  • if the cache file for a year already
    • exists and size > 0: update it only if the c-time is older than 8 days. The update is year-based, 4 requests per year
    • does not exist: fetch for all years, year by year, 4 requests per year (same as above)
  2. Update modified file
  • if the cache
    • exists: get the c-time and use it as $start, and pre-load the existing file into the result
    • does not exist: $start = today - 8 days, nothing to preload
  • Make a single request, fetching everything with changedSinceStart=$start and changedSinceTo=today, no year limitations; fetch them all at once
    • merge the result with the preloaded cache
  • write the file
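The decision logic of the two steps above could be sketched as follows. This is only an illustration of the proposal, not project code: the class and both methods are hypothetical, the 8-day threshold comes from the idea above, and the actual NVD requests and file I/O are left out.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the proposed split-update strategy: year files and the
// "modified" file are refreshed independently, with different rules.
public final class UpdateStrategyExample {

    /** Maximum age of a cached year file before it is refetched. */
    static final Duration MAX_AGE = Duration.ofDays(8);

    /** Step 1: a year file is refetched only if missing, empty, or stale. */
    static boolean yearNeedsRefresh(boolean exists, long sizeBytes,
                                    Instant cTime, Instant now) {
        if (!exists || sizeBytes == 0) {
            return true; // no usable cache file: fetch the whole year
        }
        // usable file: refresh only when older than the threshold
        return Duration.between(cTime, now).compareTo(MAX_AGE) > 0;
    }

    /** Step 2: the $start of the single "modified" request window. */
    static Instant modifiedWindowStart(Instant cacheCTime, Instant now) {
        // existing cache: resume from its c-time; otherwise look back 8 days
        return cacheCTime != null ? cacheCTime : now.minus(MAX_AGE);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2025-02-14T00:00:00Z");
        System.out.println(yearNeedsRefresh(false, 0, null, now));
        System.out.println(yearNeedsRefresh(true, 10,
                now.minus(Duration.ofDays(2)), now));
        System.out.println(modifiedWindowStart(null, now));
    }
}
```

Because the two predicates share nothing, the order in which the year fetches and the modified fetch run does not matter, which is the point of the proposal.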

So we split both steps entirely; whether we fetch all years first or last will not matter, since "year fetching" and "modified fetching" are separated entirely. With the current code structure, this can already be done fairly DRY.

So what we do is split the efforts and use different strategies. This makes sense since step 1 will load a huge amount of items (a magnitude more than step 2), so there we should rather focus on reliability, memory usage and consistency.

Step 2 loads an expectedly low amount of items, so preferring speed to keep daily updates "fast" might be viable, using one request to load all items at once. I expect the NVD to close this kind of request down with limits sooner or later, but as long as it lasts, it saves time.

// rant on
The main reason for the entire slowdown is the speed at which the NVD API answers. This is hands down the slowest API I have ever used (besides some hacked PoCs). Considering they update the data once a night, they could hard-cache the persistence layer to a read cluster and easily scale the application, since there is zero relation between requests. I have to say it how it is: it is silly to offer this as an API for anybody to use, let alone offering it with this global usage scope.
// rant off

Still, if it is more reliable, I do not care that it takes 30 minutes once a night.

@jeremylong jeremylong changed the base branch from main to scratch/memory-usage-per-year February 16, 2025 12:22

Successfully merging this pull request may close these issues.

NVD CVE Cache years 1999-2001