fix: per year bug fixes #268

Open
wants to merge 3 commits into base: scratch/memory-usage-per-year
Conversation

jeremylong (Owner)

  • ensure the correct lastUpdated is captured per year
  • clean up code duplication
  • ensure CVE-1988 - CVE-2001 are stored in year 2002

@jeremylong (Owner, Author)

With the existing cache from yesterday, running the update took 2632 seconds (~43 minutes).

@jeremylong (Owner, Author)

Fixed one minor bug and re-ran the update. 1943 seconds (~32 minutes) later the cache is updated (see the log). It takes 149 requests to the NVD to update the cache when, in reality, if I updated the cache an hour ago, it should be a single API call to get all the data.

@jeremylong jeremylong marked this pull request as ready for review February 11, 2025 13:38
@jeremylong jeremylong linked an issue Feb 11, 2025 that may be closed by this pull request
@EugenMayer (Contributor) left a comment

nice, just docs and this is a good addition for sure

@@ -72,6 +72,7 @@ public class CveCommand extends AbstractNvdCommand {
* Start year (until today) to cache CVEs for.
*/
private static final int START_YEAR = 2002;
private static final int EARLIEST_CVE_YEAR = 1988;

Contributor:

We should either name the variables in a way that makes this obvious, or document why both exist; otherwise the next time it will not be implemented the right way again :)

private void storeToCache(int year, CvesNvdPojo cves, CacheProperties properties) {
int target = year;
if (target < START_YEAR) {
target = START_YEAR;
Contributor:

We should add docs on why this is needed and also reference the NVD docs you found on that. In the end, we fetch 1988-2002 as a special case and store them altogether in the year 2002.
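A documented version of that clamping could look roughly like the sketch below. This is not the PR's actual code: the class, method and constant names mirror the diff snippet above, but `targetCacheYear` is a hypothetical helper extracted purely for illustration, and the actual cache writing is omitted.

```java
// Sketch: folding pre-2002 CVE years into the 2002 cache file.
// START_YEAR / EARLIEST_CVE_YEAR mirror the constants shown in the diff.
public final class CacheYearExample {

    /** First year for which the NVD provides a per-year data set. */
    static final int START_YEAR = 2002;

    /** Oldest year appearing in a CVE identifier (CVE-1988-...). */
    static final int EARLIEST_CVE_YEAR = 1988;

    /**
     * The NVD per-year feeds begin at 2002; entries for
     * CVE-1988 through CVE-2001 are stored together in the
     * 2002 cache file, so earlier years are clamped to START_YEAR.
     */
    static int targetCacheYear(int cveYear) {
        return Math.max(cveYear, START_YEAR);
    }

    public static void main(String[] args) {
        System.out.println(targetCacheYear(1999)); // clamped to 2002
        System.out.println(targetCacheYear(2024)); // unchanged
    }
}
```

The Javadoc on the constant pair and on the helper is the kind of documentation the comment asks for: it states why both `START_YEAR` and `EARLIEST_CVE_YEAR` exist and why the clamp is not a bug.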

@EugenMayer (Contributor) commented Feb 14, 2025

Fixed one minor bug and re-ran the update. 1943 seconds (32 minutes) later and the cache is updated (see the log). It is taking 149 requests to the NVD to update the cache when in reality if I updated the cache an hour ago - it should be a single API call to get all the data.

Are you able to compare this with the old implementation? (Not memory, rather a real test.) I am asking because I cannot do it myself; it never finishes, neither on my k8s cluster nor locally (it either OOMs or fails due to API errors). It would be nice to have a comparison.

Current:
23 years * 4 requests (120-day slices) = 132 requests.

We could use entirely separate fetch logic for the modified file, which might be an easy fix and easy to implement. Idea:

  1. Update years
  • if the cache file for a year already
    • exists and size > 0: update it only if the c-time is older than 8 days. The update is year-based, 4 requests per year
    • does not exist: fetch for all years, year by year, 4 requests per year (same as above)
  2. Update modified file
  • if the cache
    • exists: get the c-time and use it as $start, and pre-load the existing file into the result
    • does not exist: $start = today - 8 days, nothing to preload
  • Make a single request, fetching everything with changedSinceStart=$start and changedSinceTo=today, no year limitations; fetch them all at once
    • merge the result with the preloaded cache
  • write the file
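The decision logic of the two steps above could be sketched as follows. This is only an illustration of the proposal, not project code: the class and both methods are hypothetical, the 8-day threshold comes from the idea above, and the actual NVD requests and file I/O are left out.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the proposed split-update strategy: year files and the
// "modified" file are refreshed independently, with different rules.
public final class UpdateStrategyExample {

    /** Maximum age of a cached year file before it is refetched. */
    static final Duration MAX_AGE = Duration.ofDays(8);

    /** Step 1: a year file is refetched only if missing, empty, or stale. */
    static boolean yearNeedsRefresh(boolean exists, long sizeBytes,
                                    Instant cTime, Instant now) {
        if (!exists || sizeBytes == 0) {
            return true; // no usable cache file: fetch the whole year
        }
        // usable file: refresh only when older than the threshold
        return Duration.between(cTime, now).compareTo(MAX_AGE) > 0;
    }

    /** Step 2: the $start of the single "modified" request window. */
    static Instant modifiedWindowStart(Instant cacheCTime, Instant now) {
        // existing cache: resume from its c-time; otherwise look back 8 days
        return cacheCTime != null ? cacheCTime : now.minus(MAX_AGE);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2025-02-14T00:00:00Z");
        System.out.println(yearNeedsRefresh(false, 0, null, now));
        System.out.println(yearNeedsRefresh(true, 10,
                now.minus(Duration.ofDays(2)), now));
        System.out.println(modifiedWindowStart(null, now));
    }
}
```

Because the two predicates share nothing, the order in which the year fetches and the modified fetch run does not matter, which is the point of the proposal.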

So we split both steps entirely; whether we fetch all years first or last will not matter, since "year fetching" and "modified fetching" are separated entirely. With the current code structure, this can already be done fairly DRY.

So what we do is split the efforts and use different strategies. This makes sense since step 1 will load a huge amount of items (a magnitude more than step 2), so there we should rather focus on reliability, memory usage and consistency.

Step 2 loads an expectedly low amount of items, so preferring speed to keep daily updates "fast" might be viable, using one request to load all items at once. I expect the NVD to close this kind of request down with limits sooner or later, but as long as it lasts, it saves time.

// rant on
The main reason for the entire slowdown is the speed at which the NVD API answers. This is hands down the slowest API I have ever used (besides some hacked PoCs). Considering they update the data once a night, they could hard-cache the persistence layer to a read cluster and easily scale the application, since there is zero relation between requests. I have to say it how it is: it is silly to offer this as an API for anybody to use, let alone offering it with this global usage scope.
// rant off

Still, if it is more reliable, I do not care that it takes 30 minutes once a night.

@jeremylong jeremylong changed the base branch from main to scratch/memory-usage-per-year February 16, 2025 12:22

Successfully merging this pull request may close these issues.

NVD CVE Cache years 1999-2001