tests: fix failed `ostree.sync` on c10s/rhcos10.1 #4328

HuijingHei · 2025-09-19T05:49:13Z

Move the continuous writing to service instead of systemd-run.
Fixes openshift/os#1751

gemini-code-assist

Code Review

This pull request refactors the ostree.sync test to use a dedicated systemd service for continuous file writing, instead of systemd-run. This is a good improvement for robustness and clarity. The associated path changes from /var/tmp to /var/mnt are applied consistently. My review includes a suggestion to simplify the writer script by removing redundant sudo calls, as the script is already executed with root privileges by the systemd service.

mantle/kola/tests/ostree/sync.go

HuijingHei · 2025-09-19T06:00:49Z

Run ostree.sync manually on c10s/rhcos10, the result is passed.

[coreos-assembler]$ ./bin/kola run -p qemu -b rhcos --qemu-image /srv/data/scos-10.0.20250918-0-qemu.x86_64.qcow2 ostree.sync --multiply 5
=== RUN   ostree.sync0
=== RUN   ostree.sync1
=== RUN   ostree.sync2
=== RUN   ostree.sync3
=== RUN   ostree.sync4
--- PASS: ostree.sync0 (258.91s)
        sync.go:221: Got NFS mount.
        sync.go:265: Found test=1 in kernel argument after rebooted.
--- PASS: ostree.sync3 (258.19s)
        sync.go:221: Got NFS mount.
        sync.go:265: Found test=1 in kernel argument after rebooted.
--- PASS: ostree.sync4 (257.61s)
        sync.go:221: Got NFS mount.
        sync.go:265: Found test=1 in kernel argument after rebooted.
--- PASS: ostree.sync2 (258.52s)
        sync.go:221: Got NFS mount.
        sync.go:265: Found test=1 in kernel argument after rebooted.
--- PASS: ostree.sync1 (258.35s)
        sync.go:221: Got NFS mount.
        sync.go:265: Found test=1 in kernel argument after rebooted.
PASS, output in tmp/kola/qemu-2025-09-19-0338-7444

[coreos-assembler]$ ../coreos-assembler/bin/kola run -p qemu -b rhcos --qemu-image /srv/data/rhcos-10.1.20250911-0-qemu.x86_64.qcow2 ostree.sync --multiply 5
=== RUN   ostree.sync2
=== RUN   ostree.sync3
=== RUN   ostree.sync4
=== RUN   ostree.sync0
=== RUN   ostree.sync1
--- PASS: ostree.sync2 (258.38s)
        sync.go:221: Got NFS mount.
        sync.go:265: Found test=1 in kernel argument after rebooted.
--- PASS: ostree.sync0 (258.06s)
        sync.go:221: Got NFS mount.
        sync.go:265: Found test=1 in kernel argument after rebooted.
--- PASS: ostree.sync1 (260.50s)
        sync.go:221: Got NFS mount.
        sync.go:265: Found test=1 in kernel argument after rebooted.
--- PASS: ostree.sync4 (260.28s)
        sync.go:221: Got NFS mount.
        sync.go:265: Found test=1 in kernel argument after rebooted.
--- PASS: ostree.sync3 (263.38s)
        sync.go:221: Got NFS mount.
        sync.go:265: Found test=1 in kernel argument after rebooted.
PASS, output in tmp/kola/qemu-2025-09-19-0522-8047

jbtrystram

LGTM. minor suggestion

mantle/kola/tests/ostree/sync.go

HuijingHei · 2025-09-22T07:45:30Z

/hold
Do more testing and check that this is not stable.

marmijo

This looks great overall and matches what I saw during my local testing of ostree.sync: the rpm-ostree transaction completes fine, but systemd is failing to unmount the NFS client mounts under /var/mnt/data*.

One small change, I think, is to uncomment the ExecStopPost so the write unit cleans up it's own NFS mounts on stop. I tested the changes from this PR, with that line enabled, and ostree.sync test passed reliably on both rhel-10.1 and c10s (5 consecutive runs on each). The rest of the PR LGTM!

marmijo · 2025-12-12T18:51:01Z

mantle/kola/tests/ostree/sync.go

+        After=remote-fs.target
+        [Service]
+        ExecStart=/usr/local/bin/nfs-random-write.sh %I
+        #ExecStopPost=/bin/umount -lf /var/mnt/data%I || true


Suggested change

#ExecStopPost=/bin/umount -lf /var/mnt/data%I || true

ExecStopPost=/bin/umount -lf /var/mnt/data%I || true

I think we should uncomment the ExecStopPost here so the service can clean up its own NFS mount which avoids the later var-mnt-data*.mount/var.mount shutdown unmount failures.

I think we should uncomment the ExecStopPost here so the service can clean up its own NFS mount which avoids the later var-mnt-data*.mount/var.mount shutdown unmount failures.

Thank you @marmijo for your review and comments, I am trying to test to see how long the vm will start. Hopefully will find out how long we need for the client to boot, but agree this is a workaround.

Copy from comment
the client-10 hang about 806s ([23910.934397] ~ [23104.029251]) and finally success to reboot. While the client-9 hang 313s ([18009.805121] ~ [17696.012479]) which is much shorter.

Move the continuous writing to service instead of systemd-run. Fixes openshift/os#1751

openshift-ci · 2025-12-18T14:03:02Z

@HuijingHei: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/rhcos	`3d45308`	link	true	`/test rhcos`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

gemini-code-assist bot reviewed Sep 19, 2025

View reviewed changes

mantle/kola/tests/ostree/sync.go Outdated Show resolved Hide resolved

HuijingHei force-pushed the fix-sync-rhel10 branch from 06ea6c4 to 77c6901 Compare September 19, 2025 05:53

HuijingHei requested a review from jbtrystram September 19, 2025 06:01

HuijingHei changed the title ~~tests: fix ostree.sync test on c10s/rhcos10.1~~ tests: fix failed ostree.sync on c10s/rhcos10.1 Sep 19, 2025

HuijingHei force-pushed the fix-sync-rhel10 branch 2 times, most recently from bd975fd to b975df9 Compare September 19, 2025 08:30

jbtrystram previously approved these changes Sep 19, 2025

View reviewed changes

mantle/kola/tests/ostree/sync.go Outdated Show resolved Hide resolved

openshift-ci bot added the do-not-merge/hold label Sep 22, 2025

HuijingHei dismissed jbtrystram’s stale review via 44ceaef September 23, 2025 07:52

HuijingHei force-pushed the fix-sync-rhel10 branch 5 times, most recently from 16a85b0 to 56aa961 Compare September 25, 2025 09:13

marmijo reviewed Dec 12, 2025

View reviewed changes

HuijingHei force-pushed the fix-sync-rhel10 branch from 56aa961 to 92e0280 Compare December 15, 2025 05:10

tests: fix failed ostree.sync on c10s/rhcos10.1

3d45308

Move the continuous writing to service instead of systemd-run. Fixes openshift/os#1751

HuijingHei force-pushed the fix-sync-rhel10 branch from 92e0280 to 3d45308 Compare December 18, 2025 06:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

tests: fix failed `ostree.sync` on c10s/rhcos10.1 #4328

tests: fix failed `ostree.sync` on c10s/rhcos10.1 #4328

HuijingHei commented Sep 19, 2025

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

HuijingHei commented Sep 19, 2025 •

edited

Loading

Uh oh!

jbtrystram left a comment

Uh oh!

Uh oh!

HuijingHei commented Sep 22, 2025

Uh oh!

marmijo left a comment

Uh oh!

marmijo Dec 12, 2025

Uh oh!

HuijingHei Dec 15, 2025

Uh oh!

openshift-ci bot commented Dec 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	#ExecStopPost=/bin/umount -lf /var/mnt/data%I \|\| true
	ExecStopPost=/bin/umount -lf /var/mnt/data%I \|\| true

tests: fix failed ostree.sync on c10s/rhcos10.1 #4328

Are you sure you want to change the base?

tests: fix failed ostree.sync on c10s/rhcos10.1 #4328

Conversation

HuijingHei commented Sep 19, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

HuijingHei commented Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jbtrystram left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

HuijingHei commented Sep 22, 2025

Uh oh!

marmijo left a comment

Choose a reason for hiding this comment

Uh oh!

marmijo Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

HuijingHei Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

openshift-ci bot commented Dec 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tests: fix failed `ostree.sync` on c10s/rhcos10.1 #4328

tests: fix failed `ostree.sync` on c10s/rhcos10.1 #4328

HuijingHei commented Sep 19, 2025 •

edited

Loading