DAOS-18367 vos: evict self-created object when transaction failure #17320
base: master
Ticket title is 'Enhance dtx_act_ent_cleanup() to only evict self-created object when transaction failure'
18f7850 to 9536e36
Test stage NLT on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17320/6/execution/node/682/log
9536e36 to 065d404
Test stage NLT on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17320/7/execution/node/681/log
065d404 to 7f5575c
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17320/8/testReport/
Test stage Unit Test bdev on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17320/8/testReport/
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17320/8/testReport/
Test stage Unit Test bdev with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17320/8/testReport/
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17320/8/testReport/
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17320/8/execution/node/1362/log
7f5575c to a7f09c4
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17320/9/testReport/
a7f09c4 to 657d99b
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17320/10/testReport/
Currently, if a transaction fails for some reason, the cleanup logic tries to evict the related vos objects from the cache to avoid leaving stale information there. That logic works well on systems with PMEM. Under md-on-ssd mode, however, it may cause trouble if the cleanup logic evicts an object that was not created by the failed transaction: one vos modification may hold the same object multiple times, and the CPU may yield during these object holds, which opens race windows for other concurrent operations against the same object.

This patch changes the logic: when a transaction creates new object(s), it records the related oid(s); if the transaction fails later, cleanup evicts only these newly created object(s). For a dkey or lower-level component newly created under an existing object, the related object cache is not affected during transaction cleanup.

In addition, under md-on-ssd mode the CPU may yield during backend TX start, and the object held by the current modification may be marked as evicted in that race window. So add logic to check whether the related object has been evicted after the backend TX has started; if it has, restart the current transaction.

Signed-off-by: Fan Yong <[email protected]>
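The two rules in the commit message can be illustrated with a small, self-contained C sketch. It is not the DAOS code: every type and function name below (`obj_cache`, `tx`, `tx_obj_hold`, `tx_fail_cleanup`, `tx_check_after_begin`) is hypothetical, and the cache is a fixed-size array invented for illustration. It only shows the idea of recording the OIDs a transaction created, evicting just those on failure, and requesting a restart when a held object turns out to have been evicted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJS 8 /* arbitrary limit for this sketch */

enum { TX_OK = 0, TX_RESTART = 1 };

/* Hypothetical stand-in for a cached object; not the real vos type. */
struct cached_obj {
	uint64_t oid;
	bool     in_use;
	bool     evicted;
};

struct obj_cache {
	struct cached_obj objs[MAX_OBJS];
};

/* Hypothetical transaction handle: remembers only OIDs it created. */
struct tx {
	uint64_t created[MAX_OBJS];
	size_t   nr_created;
};

static struct cached_obj *cache_find(struct obj_cache *c, uint64_t oid)
{
	for (size_t i = 0; i < MAX_OBJS; i++)
		if (c->objs[i].in_use && c->objs[i].oid == oid)
			return &c->objs[i];
	return NULL;
}

/* Mark the object evicted so that existing holders can notice it. */
static void cache_evict(struct obj_cache *c, uint64_t oid)
{
	struct cached_obj *obj = cache_find(c, oid);

	if (obj != NULL) {
		obj->evicted = true;
		obj->in_use = false;
	}
}

/* Hold an object; if this call creates it, record its OID in the tx. */
static struct cached_obj *tx_obj_hold(struct obj_cache *c, struct tx *tx,
				      uint64_t oid)
{
	struct cached_obj *obj = cache_find(c, oid);

	if (obj != NULL)
		return obj; /* pre-existing object: not recorded */
	if (tx->nr_created == MAX_OBJS)
		return NULL;
	for (size_t i = 0; i < MAX_OBJS; i++) {
		obj = &c->objs[i];
		if (!obj->in_use) {
			obj->oid = oid;
			obj->in_use = true;
			obj->evicted = false;
			tx->created[tx->nr_created++] = oid;
			return obj;
		}
	}
	return NULL; /* cache full */
}

/* On failure, evict only the objects this transaction created. */
static void tx_fail_cleanup(struct obj_cache *c, struct tx *tx)
{
	for (size_t i = 0; i < tx->nr_created; i++)
		cache_evict(c, tx->created[i]);
	tx->nr_created = 0;
}

/* Call after the backend TX has started (a point where the CPU may yield). */
static int tx_check_after_begin(const struct cached_obj *held)
{
	return held->evicted ? TX_RESTART : TX_OK;
}
```

In this sketch, a failed transaction that merely touched an existing object leaves that object in the cache, while a holder whose object was evicted in the meantime gets `TX_RESTART` and is expected to retry from the beginning.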
657d99b to 248715a
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17320/11/testReport/
Test stage Unit Test bdev with memcheck on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17320/11/testReport/
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17320/11/execution/node/1364/log
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17320/11/execution/node/1405/log