-
Notifications
You must be signed in to change notification settings - Fork 184
tests: fix failed ostree.sync on c10s/rhcos10.1
#4328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the ostree.sync test to use a dedicated systemd service for continuous file writing, instead of systemd-run. This is a good improvement for robustness and clarity. The associated path changes from /var/tmp to /var/mnt are applied consistently. My review includes a suggestion to simplify the writer script by removing redundant sudo calls, as the script is already executed with root privileges by the systemd service.
06ea6c4 to
77c6901
Compare
|
Run |
ostree.sync on c10s/rhcos10.1
bd975fd to
b975df9
Compare
jbtrystram
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. minor suggestion
|
/hold |
16a85b0 to
56aa961
Compare
marmijo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great overall and matches what I saw during my local testing of ostree.sync: the rpm-ostree transaction completes fine, but systemd is failing to unmount the NFS client mounts under /var/mnt/data*.
One small change, I think, is to uncomment the ExecStopPost so the write unit cleans up it's own NFS mounts on stop. I tested the changes from this PR, with that line enabled, and ostree.sync test passed reliably on both rhel-10.1 and c10s (5 consecutive runs on each). The rest of the PR LGTM!
| After=remote-fs.target | ||
| [Service] | ||
| ExecStart=/usr/local/bin/nfs-random-write.sh %I | ||
| #ExecStopPost=/bin/umount -lf /var/mnt/data%I || true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| #ExecStopPost=/bin/umount -lf /var/mnt/data%I || true | |
| ExecStopPost=/bin/umount -lf /var/mnt/data%I || true |
I think we should uncomment the ExecStopPost here so the service can clean up its own NFS mount which avoids the later var-mnt-data*.mount/var.mount shutdown unmount failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should uncomment the ExecStopPost here so the service can clean up its own NFS mount which avoids the later var-mnt-data*.mount/var.mount shutdown unmount failures.
Thank you @marmijo for your review and comments, I am trying to test to see how long the vm will start. Hopefully will find out how long we need for the client to boot, but agree this is a workaround.
Copy from comment
the client-10 hang about 806s ([23910.934397] ~ [23104.029251]) and finally success to reboot. While the client-9 hang 313s ([18009.805121] ~ [17696.012479]) which is much shorter.
56aa961 to
92e0280
Compare
Move the continuous writing to service instead of systemd-run. Fixes openshift/os#1751
92e0280 to
3d45308
Compare
|
@HuijingHei: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Move the continuous writing to service instead of systemd-run.
Fixes openshift/os#1751