DAOS-16866 bio: prealloc less DMA buffer for sys tgt #15674

NiuYawei · 2025-01-02T13:09:19Z

The sys target needs only a very limited DMA buffer for the RDB WAL, so we can pre-allocate just 1 chunk for the sys target and leave more hugepages for the regular VOS targets.

Before requesting gatekeeper:

Two review approvals and any prior change requests have been resolved.
Testing is complete and all tests passed, or there is a reason documented in the PR why it should be force landed and the forced-landing tag is set.
Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.
Commit messages follow the guidelines outlined here.
Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

github-actions · 2025-01-02T13:09:39Z

Ticket title is 'pre-allocate more DMA buffer on engine start'
Status is 'Reopened'
https://daosio.atlassian.net/browse/DAOS-16866

NiuYawei · 2025-01-02T13:11:11Z

The ticket number should be DAOS-16866

tanabarr

fixes engine start issue for me, thanks


NiuYawei · 2025-01-06T00:57:44Z

@tanabarr, @wangshilong, I just updated the copyright.

daosbuild1 · 2025-01-06T00:59:05Z

Test stage Python Bandit check completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15674/2/execution/node/133/log

NiuYawei · 2025-01-08T03:14:20Z

@daos-stack/daos-gatekeeper could you force land this? CI failed for the known NLT warning of dfuse_cb_release().

mchaarawi · 2025-01-08T14:02:22Z

> @daos-stack/daos-gatekeeper could you force land this? CI failed for the known NLT warning of dfuse_cb_release().

But no other tests ran here? Shouldn't this run at least some md-on-ssd stages?


NiuYawei · 2025-01-08T14:18:56Z

> > @daos-stack/daos-gatekeeper could you force land this? CI failed for the known NLT warning of dfuse_cb_release().
>
> But no other tests ran here? Shouldn't this run at least some md-on-ssd stages?

Sure, I merged master and added a commit pragma to run some md-on-ssd tests.

daosbuild1 · 2025-01-09T00:47:32Z

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15674/4/execution/node/1461/log


daosbuild1 · 2025-01-10T02:00:25Z

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15674/5/execution/node/319/log

daosbuild1 · 2025-01-10T02:01:59Z

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15674/5/execution/node/314/log

daosbuild1 · 2025-01-10T02:06:08Z

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15674/5/execution/node/388/log

daosbuild1 · 2025-01-10T02:08:54Z

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15674/5/execution/node/387/log

tanabarr · 2025-01-10T10:25:00Z

src/bio/bio_xstream.c

```diff
@@ -31,7 +32,7 @@
 /* SPDK blob parameters */
 #define DAOS_BS_CLUSTER_SZ	(1ULL << 25)	/* 32MB */
 /* DMA buffer parameters */
-#define DAOS_DMA_CHUNK_INIT_PCT	60	/* Default pre-xstream init chunks, in percentage */
+#define DAOS_DMA_CHUNK_INIT_PCT 50      /* Default pre-xstream init chunks, in percentage */
```


Should this read "per-xstream"?
If we are seeing out-of-memory errors when trying to grow the buffer, why does allocating only 50% initially help? Won't we end up hitting the same out-of-memory issues after the buffer has grown several times?

NiuYawei · 2025-01-10

Yes, sorry for the typo.

My thinking was that the DAOS engine usually starts right after the server node boots, so we should try to grab as many hugepages as possible while memory is not yet fragmented and there are no other potential hugepage consumers.



NiuYawei requested review from a team as code owners January 2, 2025 13:09

NiuYawei requested review from tanabarr and wangshilong January 2, 2025 13:09

wangshilong previously approved these changes Jan 3, 2025


tanabarr previously approved these changes Jan 5, 2025


DAOS-16894 bio: prealloc less DMA buffer for sys tgt

a247fca

Sys target needs only very limited DMA buffer for the WAL of RDB, we can prealloc only 1 chunk for sys target so that leave more hugepages for regular VOS targets.

Signed-off-by: Niu Yawei <[email protected]>

NiuYawei dismissed stale reviews from tanabarr and wangshilong via a247fca January 6, 2025 00:57

NiuYawei force-pushed the niu/DAOS-16894-fix branch from c60648e to a247fca January 6, 2025 00:57

NiuYawei requested review from wangshilong and tanabarr January 6, 2025 00:57

wangshilong previously approved these changes Jan 6, 2025


tanabarr previously approved these changes Jan 6, 2025


NiuYawei changed the title from "DAOS-16894 bio: prealloc less DMA buffer for sys tgt" to "DAOS-16866 bio: prealloc less DMA buffer for sys tgt" Jan 7, 2025

NiuYawei requested a review from a team January 8, 2025 03:13

Merge remote-tracking branch 'origin/master' into niu/DAOS-16894-fix

76d881c

Skip-func-hw-test-medium-md-on-ssd: false
Signed-off-by: Niu Yawei <[email protected]>

DAOS-16866 bio: reduce pre-alloc

2414cc2

Reduce DMA buffer pre-alloc from 60% to 50%, because we observed pre-alloc failures in CI tests (probably because of memory fragmentation on some test nodes).

Signed-off-by: Niu Yawei <[email protected]>

NiuYawei dismissed stale reviews from tanabarr and wangshilong via 2414cc2 January 10, 2025 01:54

NiuYawei requested review from tanabarr and wangshilong January 10, 2025 03:11

wangshilong previously approved these changes Jan 10, 2025


tanabarr approved these changes Jan 10, 2025


tanabarr previously approved these changes Jan 10, 2025


DAOS-16866 bio: fix typo

a10fc00

Fix typo.

Signed-off-by: Niu Yawei <[email protected]>

NiuYawei dismissed stale reviews from tanabarr and wangshilong via a10fc00 January 10, 2025 13:57

tanabarr approved these changes Jan 10, 2025


NiuYawei requested a review from wangshilong January 13, 2025 01:27

wangshilong approved these changes Jan 13, 2025


NiuYawei merged commit 23518fa into master Jan 13, 2025
56 of 57 checks passed

NiuYawei deleted the niu/DAOS-16894-fix branch January 13, 2025 03:42

daltonbohning added a commit that referenced this pull request Jan 16, 2025

Revert "DAOS-16866 bio: prealloc less DMA buffer for sys tgt (#15674)"

7934b6f

This reverts commit 23518fa.

daltonbohning added a commit that referenced this pull request Jan 16, 2025

Revert "DAOS-16866 bio: prealloc less DMA buffer for sys tgt (#15674)"

bc50d5f

This reverts commit 23518fa.

daltonbohning added a commit that referenced this pull request Jan 16, 2025

Revert "DAOS-16866 bio: prealloc less DMA buffer for sys tgt (#15674)"

fb4fc12

This reverts commit 23518fa.
