-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-16469 dtx: optimize DTX CoS cache - b26 #15085
Conversation
Ticket title is 'IOR Easy performance low with EC_16P2GX' |
c47f421
to
e69b1dd
Compare
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15085/2/testReport/ |
e69b1dd
to
9f67bb3
Compare
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15085/3/testReport/ |
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15085/4/testReport/ |
9f67bb3
to
3bcd242
Compare
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15085/5/execution/node/1424/log |
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15085/5/execution/node/1470/log |
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15085/5/execution/node/1616/log |
b04cca9
to
b73cec8
Compare
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15085/7/execution/node/1509/log |
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15085/7/execution/node/1493/log |
b73cec8
to
a6a2a8d
Compare
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15085/8/execution/node/1446/log |
3e26b41
to
862fa8e
Compare
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15085/10/testReport/ |
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15085/11/execution/node/1551/log |
de20884
to
28c00bd
Compare
If there are a lot of committable DTX entries in DTX CoS cache, then it may be inefficient to locate the DTX entry in CoS cache with given oid + dkey_hash, that may happen under the case of that DTX batched commit is blocked (such as because of network trouble) as to trigger DTX refresh (for DTX cleanup) on other related engines. If that happened, it will increase the system load on such engine and slow down DTX commit further more. The patch reduces unnecessary search operation inside CoS cache. Other changes: 1. Metrics (io/dtx/async_cmt_lat/tgt_id) for DTX asynchronously commit latency (with unit ms). 2. Fix a bug in sched_ult2xs() with multiple numa sockets for DSS_XS_OFFLOAD case. 3. Delay commit (or abort) collective DTX on the leader target to handle resent race. 4. Avoid blocking dtx_req_wait() if chore failed to send out some DTX RPC. 5. Some cleanup for error handling. Signed-off-by: Fan Yong <[email protected]>
28c00bd
to
f2f0cac
Compare
Ping reviewers, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ftest LGTM
Ping reviewers, thanks! |
If there are a lot of committable DTX entries in DTX CoS cache, then it may be inefficient to locate the DTX entry in CoS cache with given oid + dkey_hash, that may happen under the case of that DTX batched commit is blocked (such as because of network trouble) as to trigger DTX refresh (for DTX cleanup) on other related engines. If that happened, it will increase the system load on such engine and slow down DTX commit further more. The patch reduces unnecessary search operation inside CoS cache.
Other changes:
Metrics (io/dtx/async_cmt_lat/tgt_id) for DTX asynchronously commit latency (with unit ms).
Fix a bug in sched_ult2xs() with multiple numa sockets for DSS_XS_OFFLOAD case.
Delay commit (or abort) collective DTX on the leader target to handle resent race.
Avoid blocking dtx_req_wait() if chore failed to send out some DTX RPC.
Some cleanup for error handling.
Before requesting gatekeeper:
Features:
(orTest-tag*
) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.Gatekeeper: