Implement resilient and memory sparing mirror process #279

Open · wants to merge 22 commits into main
Conversation

EugenMayer (Contributor) commented Feb 17, 2025

I would love to work on a patch to improve the situation:

Separate the process into two parts:

  1. The full fetch / year updates: use the current approach with 120-day slices. This heavily reduces memory usage and increases reliability. New: each year file is checked for its c-time; if it is not older than 8 days and its size is > 0, it is skipped. Thus we only update the year files every 8 days (folding in all the entries we had so far only picked up via the modified file). One can force-update the years by passing --forceUpdate (off by default).
  2. The modified-file update: re-implement it as a fetch across all years, but restricted to a "recently changed" slice covering the most recent 8 days or, if the modified file already exists, everything since its last c-time. This task now always fetches those >= 8 days in 1 API call, for all years, and replaces the current modified file. We can do this as an all-years fetch since the expected number of items, compared to the full fetch, is a magnitude lower. The current 8-day slice is 24666 CVEs.

These processes run one after the other.
With this, we combine the positive effects of both: reduced memory hogging and speedy regular modified updates.
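
For illustration, a minimal Java sketch of this two-phase flow, under my own assumptions: the method names (fetchYearInSlices, fetchModifiedSince), file names and the use of the file's last-modified time instead of c-time are placeholders, not the actual CLI code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.time.Year;

public class MirrorSketch {

    static final Duration YEAR_FILE_MAX_AGE = Duration.ofDays(8);

    /** Phase 1: refresh the per-year files, skipping anything fresh enough. */
    static void updateYearFiles(Path cacheDir, boolean forceUpdate) throws IOException {
        for (int year = 2002; year <= Year.now().getValue(); year++) {
            Path yearFile = cacheDir.resolve("nvdcve-" + year + ".json.gz");
            if (!forceUpdate && isFresh(yearFile)) {
                continue; // younger than 8 days and non-empty: the modified pass covers the gap
            }
            fetchYearInSlices(year, yearFile); // e.g. 120-day pubStartDate/pubEndDate windows
        }
    }

    /** Phase 2: one all-years request for everything modified recently. */
    static void updateModifiedFile(Path cacheDir) throws IOException {
        Path modifiedFile = cacheDir.resolve("nvdcve-modified.json.gz");
        Instant since = Files.exists(modifiedFile)
                ? Files.getLastModifiedTime(modifiedFile).toInstant()
                : Instant.now().minus(Duration.ofDays(8));
        fetchModifiedSince(since, modifiedFile); // single API call covering all years
    }

    static boolean isFresh(Path file) throws IOException {
        // the PR talks about c-time; plain Java only exposes last-modified time portably
        return Files.exists(file)
                && Files.size(file) > 0
                && Files.getLastModifiedTime(file).toInstant()
                        .isAfter(Instant.now().minus(YEAR_FILE_MAX_AGE));
    }

    static void fetchYearInSlices(int year, Path target) { /* placeholder */ }

    static void fetchModifiedSince(Instant since, Path target) { /* placeholder */ }
}
```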

@jeremylong what are your thoughts?

Stats:

Initial population:

  • the years are outdated, so we fetch 8 days for each year, meaning 4 (slices) x 23 years = 92 API requests fetching the most recent 8 days for each year
  • 1 API call for the modified file
    = 93 API calls

After the first cache population, let's say I run it the next day:

  • all years are skipped (zero API requests)
  • the modified file is fetched, covering 24 hours of recent changes, in 1 API request
    = 1 API request

After 9 days have passed:

  • the years are outdated, so we fetch 8 days for each year, meaning 4 (slices) x 23 years = 92 API requests fetching the most recent 8 days for each year
  • 1 API call for the modified file
    = 93 API calls

Thus the more expensive API calls are only triggered every 8 days.
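
For reference, the request-count arithmetic above written out; the "4 slices per year" and "23 years" figures are taken from the stats as stated, not derived here.

```java
public class RequestCountMath {
    public static void main(String[] args) {
        int slicesPerYear = 4;   // figure quoted in the stats above
        int years = 23;
        int modifiedCalls = 1;

        int fullRefresh = slicesPerYear * years + modifiedCalls; // 4 * 23 + 1 = 93
        int freshCacheRun = modifiedCalls;                       // all years skipped, only modified

        System.out.println("initial population / after 9 days: " + fullRefresh + " API calls");
        System.out.println("run on the next day:               " + freshCacheRun + " API call");
    }
}
```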

Further options:

  • implement an optimized "year update" process where we no longer fetch per year but across all years, limited to, let's say, the last 4 days (if the cache is not older) - this would reduce the "after 9 days passed" run to 2 API calls with slightly more memory usage. But I think this optimization might just not be worth it.

EugenMayer and others added 5 commits February 10, 2025 06:29
jeremylong#259)

* Reimplement caching to use a per-year scope
* Add forgiveness if a year fails, continue to the next one
* Reimplement how the lastUpdated date is used and stored per year
* Add lockfile
* fix: preserve modified entries if year fails
* polish docs, add exit code

---------

Co-authored-by: Jeremy Long <[email protected]>
* fix: bugfix in yearly cache update

* fix: ensure sorted

* fix: updated modified date in properties
* feat: retry nvd cache on failure

* fix: updated per peer review
EugenMayer (Contributor, Author) commented Feb 17, 2025

I found out that the slice size is something that makes the NVD API crash. For example:

https://services.nvd.nist.gov/rest/json/cves/2.0?pubStartDate=2021-01-01T00%3A00%3A00Z&pubEndDate=2021-04-28T00%3A00%3A00Z&resultsPerPage=200&startIndex=0

is a 115-day slice that consistently crashes.

https://services.nvd.nist.gov/rest/json/cves/2.0?pubStartDate=2021-01-01T00%3A00%3A00Z&pubEndDate=2021-03-28T00%3A00%3A00Z&resultsPerPage=200&startIndex=0

So 30 days less works. Also, using it to 'cross the 04/28' boundary works:

https://services.nvd.nist.gov/rest/json/cves/2.0?pubStartDate=2021-03-01T00%3A00%3A00Z&pubEndDate=2021-04-28T00%3A00%3A00Z&resultsPerPage=200&startIndex=0

So the API, maybe under load, decides to simply not handle requests over a specific size. This crash has been consistent for 6 hours.

I thus implemented a new CLI parameter, maxDaysOfYearRange.
By default it is 115 (for the good days); I tried 90 and could finally finish my mirror sync.

This is most probably one of those things that make the current main implementation so fragile: depending on the API load status, the API will not return more than X items and fails with a 503 if a request would still return that many items. 'Fetch them all' is just not something that works as expected, neither for memory nor for the API.
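
To make the maxDaysOfYearRange idea concrete, here is an illustrative Java snippet (my own sketch, not the PR's code) that cuts a year into pubStartDate/pubEndDate windows of at most that many days, producing request URLs in the same shape as the ones quoted above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class YearSlicer {

    static final String API = "https://services.nvd.nist.gov/rest/json/cves/2.0";
    static final DateTimeFormatter ISO = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'00:00:00'Z'");

    static List<String> sliceUrls(int year, int maxDaysOfYearRange, int resultsPerPage) {
        List<String> urls = new ArrayList<>();
        LocalDate start = LocalDate.of(year, 1, 1);
        LocalDate yearEnd = LocalDate.of(year + 1, 1, 1);
        while (start.isBefore(yearEnd)) {
            LocalDate end = start.plusDays(maxDaysOfYearRange);
            if (end.isAfter(yearEnd)) {
                end = yearEnd; // last window of the year is shorter
            }
            urls.add(API
                    + "?pubStartDate=" + URLEncoder.encode(ISO.format(start), StandardCharsets.UTF_8)
                    + "&pubEndDate=" + URLEncoder.encode(ISO.format(end), StandardCharsets.UTF_8)
                    + "&resultsPerPage=" + resultsPerPage
                    + "&startIndex=0"); // further result pages would bump startIndex
            start = end;
        }
        return urls;
    }

    public static void main(String[] args) {
        // with maxDaysOfYearRange=90 a year splits into 5 windows instead of 4 at 115 days
        sliceUrls(2021, 90, 200).forEach(System.out::println);
    }
}
```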

jeremylong (Owner) commented:
NVD is likely under load right now because of the error and users not upgrading and instead just trying again. It should level off soon.

EugenMayer (Contributor, Author) commented Feb 17, 2025

This is not an exception at all. It has been the case for a long, long time (weeks) - it was the motivation for my work in the first place. And on several days around the first implementation the API was down regularly.

This happens day in, day out IMHO.

jeremylong (Owner) commented:
If you are running into this daily, you are doing something wrong. Once you build the cache, don't throw it away. For instance, https://github.com/dependency-check/DependencyCheck_Builder/actions/workflows/cache.yml has been running for a long time and has only had a few occasions where it ran into issues.

EugenMayer changed the base branch from scratch/memory-usage-per-year to main on February 18, 2025 07:41
EugenMayer (Contributor, Author) commented Feb 18, 2025

The current status of this PR, aka 'API resilience and memory', finally works with the current ODC integrations like Gradle.

Features are:

  • reduced memory by fetching year-wise [memory]
  • years are saved one by one, so continuing with the last year that failed is possible [resilience]
  • if one year fails, the next try will continue from that year instead of restarting from scratch [resilience]
  • the process will not re-fetch a year if the cache is not older than 8 days (unless --forceUpdate has been used). The 8-day 'diff' is fetched via the modified-changes process [resilience]
  • modified changes are not re-fetched if they are not older than 1 hour (unless --forceUpdate has been used) [resilience]
  • docker: restart mirroring if the process fails. Restarting is debounced (1 min * retryAttempt); see the sketch after this list [resilience]
  • docker: you can define how often the restart should happen using MAX_MIRROR_RETRIES [resilience]
  • you can modify the max-days-fetched slice using maxDaysOfYearRange - the NVD API blocks requests with larger values earlier, so reducing it can help with 'getting through' [resilience]
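
The docker restart/debounce behaviour is a wrapper around the mirror run. Purely as an illustration of the back-off idea (not the actual entrypoint; the default of 5 retries and the runMirror() helper are my own placeholders), it boils down to something like:

```java
import java.time.Duration;

public class RetryLoop {

    public static void main(String[] args) throws InterruptedException {
        // MAX_MIRROR_RETRIES caps the number of attempts; default here is an assumption
        int maxRetries = Integer.parseInt(System.getenv().getOrDefault("MAX_MIRROR_RETRIES", "5"));
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            if (runMirror()) {
                return; // success, nothing more to do
            }
            // debounce grows linearly with the attempt number (1 min, 2 min, 3 min, ...)
            Thread.sleep(Duration.ofMinutes(attempt).toMillis());
        }
        System.exit(1); // give up after MAX_MIRROR_RETRIES failed attempts
    }

    static boolean runMirror() {
        // placeholder for invoking the mirror process and checking its exit code
        return false;
    }
}
```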

Drawbacks:

  • It uses more API calls (since we are doing smaller chunks to improve resilience/progress and the memory footprint), which can prolong the process. AFAICS a full fetch takes about 30 minutes longer (since one API request takes about 30-60s on NVD's side), an incremental 'recent-modified' run is as fast as before, and the 8-day year-diff refresh takes approx. 10 minutes. Effectively though, on a full fetch, the current process of gigantic fetches plus 'succeed all or fail all' often does not finish at all, since the NVD API throws random 503/403 errors more often than not (the 12 retries do not help; they happen way too fast while the error persists), restarting the entire process.

IMHO this approach finishes more reliably and deterministically, even with NVD API errors, while it can be slower than the current process if the NVD API does not fail at all. For me, neither from the dev machine nor from our DC k8s cluster have I ever seen a run finish without NVD failing to answer (even with the 12 retries). So IMHO, betting on that horse does not make sense for me.

You can try it yourself using ghcr.io/eugenmayer/vulnz:7.3.0-1 or the chart, or build the CLI version yourself from this branch.

EugenMayer changed the title from "Optimize modified cache rebuild api calls" to "Implement resilient and memory sparing mirror process" on Feb 20, 2025
EugenMayer (Contributor, Author) commented:
@jeremylong am I right to assume that you are not considering merging this PR?
