DAOS-16866 bio: prealloc less DMA buffer for sys tgt #15674
Conversation
Ticket title is 'pre-allocate more DMA buffer on engine start'
The ticket number should be DAOS-16866
fixes engine start issue for me, thanks
The sys target needs only a very limited DMA buffer for the RDB WAL, so we can pre-allocate just 1 chunk for the sys target and leave more hugepages for the regular VOS targets. Signed-off-by: Niu Yawei <[email protected]>
Branch updated from c60648e to a247fca (Compare).
@tanabarr, @wangshilong, just updated copyright.
Test stage Python Bandit check completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15674/2/execution/node/133/log
@daos-stack/daos-gatekeeper could you force land this? CI failed for the known NLT warning of dfuse_cb_release().
but no other tests ran here? shouldn't this run at least some md-on-ssd stages?
Skip-func-hw-test-medium-md-on-ssd: false Signed-off-by: Niu Yawei <[email protected]>
Sure, I merged master and added commit pragma to run some md-on-ssd tests.
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15674/4/execution/node/1461/log
Reduce DMA buffer pre-allocation from 60% to 50%, because we observed pre-allocation failures in CI tests (probably due to memory fragmentation on some test nodes). Signed-off-by: Niu Yawei <[email protected]>
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15674/5/execution/node/319/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15674/5/execution/node/314/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15674/5/execution/node/388/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15674/5/execution/node/387/log
src/bio/bio_xstream.c (outdated)
@@ -31,7 +32,7 @@
 /* SPDK blob parameters */
 #define DAOS_BS_CLUSTER_SZ (1ULL << 25) /* 32MB */
 /* DMA buffer parameters */
-#define DAOS_DMA_CHUNK_INIT_PCT 60 /* Default pre-xstream init chunks, in percentage */
+#define DAOS_DMA_CHUNK_INIT_PCT 50 /* Default pre-xstream init chunks, in percentage */
Should this read "per-xstream"?
If we are seeing out-of-memory errors when trying to grow the buffer, why does it help to allocate only 50% initially? Won't we end up seeing the same out-of-memory issues after multiple buffer growths?
Yes, sorry for the typo.
I was thinking that the DAOS engine usually starts as soon as the server node boots, so we'd try to hold as many hugepages as possible while the memory isn't fragmented and there are no other potential hugepage consumers.
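For context, here is a minimal sketch of how an init percentage such as DAOS_DMA_CHUNK_INIT_PCT is assumed to translate into the number of DMA chunks pre-allocated per xstream; the helper name and the tot_chunks parameter are illustrative, not the actual bio_xstream.c code path.

#define DAOS_DMA_CHUNK_INIT_PCT	50	/* Default per-xstream init chunks, in percentage */

/*
 * tot_chunks: upper bound of DMA chunks an xstream may own, derived
 * elsewhere from the hugepage budget (assumed input for this sketch).
 */
static inline unsigned int
dma_init_chunk_cnt(unsigned int tot_chunks)
{
	unsigned int init_cnt = tot_chunks * DAOS_DMA_CHUNK_INIT_PCT / 100;

	/* Pre-allocate at least one chunk so the xstream can start I/O;
	 * the remainder is grown on demand, which is where the growth-time
	 * allocation failures discussed above could still occur.
	 */
	return init_cnt > 0 ? init_cnt : 1;
}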
Fix typo. Signed-off-by: Niu Yawei <[email protected]>
This reverts commit 23518fa.
The sys target needs only a very limited DMA buffer for the RDB WAL, so we can pre-allocate just 1 chunk for the sys target and leave more hugepages for the regular VOS targets.
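A minimal sketch of that idea, assuming hypothetical names (is_sys_tgt and prealloc_chunk_cnt() are illustrative, not the exact DAOS symbols): the special case amounts to returning a single chunk for the sys target while keeping the percentage-based pre-allocation for regular VOS targets.

#include <stdbool.h>

#define DAOS_DMA_CHUNK_INIT_PCT	50	/* Default per-xstream init chunks, in percentage */

static unsigned int
prealloc_chunk_cnt(bool is_sys_tgt, unsigned int tot_chunks)
{
	/* The sys target only backs the RDB WAL, so a single DMA chunk is
	 * enough; the hugepages saved here remain available to the regular
	 * VOS targets.
	 */
	if (is_sys_tgt)
		return 1;

	/* Regular VOS targets pre-allocate a percentage of their share and
	 * grow the rest on demand.
	 */
	return tot_chunks * DAOS_DMA_CHUNK_INIT_PCT / 100;
}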
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: