
v560tu seems slow on IO, needs confirmation #1894

Open

tlaurion opened this issue Jan 20, 2025 · 38 comments

@tlaurion
Collaborator

tlaurion commented Jan 20, 2025

The only thing I can do is compare across devices.

This is systemd-analyze blame output, for reference
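For anyone reproducing this, a minimal sketch of how that output can be collected in dom0 (the output file name here is just an example, not the attachment below):

systemd-analyze blame > systemd-analyze_blame.txt   # slowest units first, as in the screenshots
systemd-analyze critical-chain                      # units actually gating boot time, for context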

My nv41 has bigger templates and is my main driver.

nv41 64 GB ram, drive: Samsung SSD 980 PRO 2TB

Image

v560tu 64gb ram, drive: SSDPR-PX700-04T-80

Image

cc @macpijan

@macpijan
Collaborator

macpijan commented Jan 20, 2025

I assume it can be useful to document disk/memory in both setups as well

@tlaurion
Collaborator Author

I assume it can be useful to document disk/memory in both setups as well

Modified OP @macpijan

@tlaurion
Collaborator Author

Not sure. Reencryption runs at 1.4 GiB/s, with a power draw of 35 W and no fan spinning.

PXL_20250120_230242522.jpg

@mkopec
Contributor

mkopec commented Jan 21, 2025

What are the models of the SSDs in both laptops? The V560TU could have a DRAM-less disk, which is much slower in random I/O.

@macpijan
Collaborator

I think it's reasonable to assume the disks are what @wessel-novacustom is offering in the configurator.

V56 offers only one option: https://novacustom.com/product/v56-series/

While NV41 used to offer Samsung disks: https://novacustom.com/product/nv41-series/

Can we please summarize the exact hw configurations being compared here?

@mkopec
Contributor

mkopec commented Jan 21, 2025

NV41 can come with the 980 Pro (with DRAM cache) or the 980 (DRAM-less). Apparently the PX700 disk in the V560TU does not have a DRAM cache, but Goodram sells it as HMB 3.0, which uses host RAM for caching. I have no idea whether or not that works in Qubes.
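A hedged way to check whether HMB is actually in use on the Linux side (assuming nvme-cli is available; hmpre/hmmin are the Host Memory Buffer fields in the identify-controller data, 0 meaning no HMB support):

nvme id-ctrl /dev/nvme0 -H | grep -iE 'hmpre|hmmin'   # HMB sizes advertised by the controller
dmesg | grep -i 'host memory buffer'                  # kernel log line if the driver allocated an HMB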

@wessel-novacustom

@macpijan @mkopec

I'm not sure that's actually the issue.

If I compare I/O write and especially read operations (especially with small files), the -TU models perform badly compared to the NVIDIA variants. That's for UEFI of course, but it might still make sense to compare to see if there is a difference and why.

@mkopec
Contributor

mkopec commented Jan 21, 2025

Here's what it looks like with V560TU on my side (96GB / 2TB)

20250121_133744.jpg

@tlaurion
Collaborator Author

@mkopec @macpijan @wessel-novacustom : updated OP

nv41 64 GB ram, drive: Samsung SSD 980 PRO 2TB

and

v560tu 96gb ram, drive: SSDPR-PX700-04T-80

Opened issue QubesOS/qubes-issues#9723

Want me to run some tests? Please detail.

Also note that, as pointed out at #1889 (comment):

Dasharo/coreboot@94e5f5d...048ca83

When we attempted that coreboot version bump, performance worsened to the point of systemd services timing out during the first stage of the QubesOS installation, and template installation took more than an hour.

@tlaurion
Collaborator Author

tlaurion commented Jan 21, 2025

Also note that, as pointed out at #1889 (comment):

Dasharo/coreboot@94e5f5d...048ca83

When we attempted that coreboot version bump, performance worsened to the point of systemd services timing out during the first stage of the QubesOS installation, and template installation took more than an hour.

Some additional notes from testing, will edit (re-testing https://github.com/tlaurion/heads/tree/perf_comparison_with_reverted_coreboot_version_bump_v560tu)

  • RAM init takes around 2 minutes with 64 GB of RAM, not 1
  • Heads framebuffer refresh (fbwhiptail drawing) is slower.
  • Heads hashing of /boot content is slower.
  • Resealing the TPM Disk Unlock Key (CPU-based derivation of the DRK to validate the slot + adding a new key from the LUKS DRK into a new slot) is way slower.

The insight is that not only is IO slower; it's as if CPU speed were limited to turtle speed as well.
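A quick, hedged way to check the CPU-throttling hypothesis from a booted dom0 (standard cpufreq sysfs, values in kHz; inside Heads itself these nodes may not be exposed):

grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq       # current frequency per core
grep . /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq       # advertised maximum per core
grep . /sys/devices/system/cpu/cpu0/thermal_throttle/* 2>/dev/null # throttle counters, if present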

systemd-analyze blame with branch including changes to coreboot version bump:

Image

full log:

systemd-analyze_blame.txt

CC @macpijan @mkopec

@tlaurion
Collaborator Author

tlaurion commented Jan 21, 2025

systemd-analyze blame with 36e30d0 for comparison.

Excerpt:

32.838s [email protected]
26.821s [email protected]
26.016s dev-disk-by\x2duuid-9254e32c\x2df3a1\x2d4ddf\x2d917d\x2d1496ed47d5d6.device
26.016s dev-disk-by\x2dpartuuid-3f923856\x2dba50\x2d4c86\x2d9d79\x2dcb60c453af91.device
26.016s dev-nvme0n1p3.device
26.016s sys-devices-pci0000:00-0000:00:06.0-0000:01:00.0-nvme-nvme0-nvme0n1-nvme0n1p3.device
26.016s dev-disk-by\x2did-nvme\x2dnvme.1e4b\x2d473441303037343132\x2d53534450522d50583730302d3034542d3830\x2d00000001\x2dpart3.device
26.016s dev-disk-by\x2dpath-pci\x2d0000:01:00.0\x2dnvme\x2d1\x2dpart3.device
26.016s dev-disk-by\x2did-nvme\x2dSSDPR\x2dPX700\x2d04T\x2d80_G4A007412\x2dpart3.device
25.993s dev-disk-by\x2dpartuuid-3a30b820\x2d3a74\x2d41fb\x2d938d\x2d5e0288966fd4.device
25.993s sys-devices-pci0000:00-0000:00:06.0-0000:01:00.0-nvme-nvme0-nvme0n1-nvme0n1p2.device
25.993s dev-nvme0n1p2.device
25.993s dev-disk-by\x2did-nvme\x2dnvme.1e4b\x2d473441303037343132\x2d53534450522d50583730302d3034542d3830\x2d00000001\x2dpart2.device
25.993s dev-disk-by\x2did-nvme\x2dSSDPR\x2dPX700\x2d04T\x2d80_G4A007412\x2dpart2.device
25.993s dev-disk-by\x2duuid-10f43b72\x2d51a0\x2d40ce\x2d8407\x2dc31d8e29b67e.device
25.993s dev-disk-by\x2dpath-pci\x2d0000:01:00.0\x2dnvme\x2d1\x2dpart2.device
25.980s dev-disk-by\x2dpath-pci\x2d0000:01:00.0\x2dnvme\x2d1\x2dpart4.device
25.980s dev-disk-by\x2duuid-fb504cfb\x2d542a\x2d4b62\x2da176\x2d515781072d00.device
25.980s dev-nvme0n1p4.device
25.980s sys-devices-pci0000:00-0000:00:06.0-0000:01:00.0-nvme-nvme0-nvme0n1-nvme0n1p4.device
25.980s dev-disk-by\x2did-nvme\x2dSSDPR\x2dPX700\x2d04T\x2d80_G4A007412\x2dpart4.device
25.980s dev-disk-by\x2dpartuuid-6675f426\x2d7f58\x2d47ec\x2dba1a\x2d54a321a95c54.device
25.980s dev-disk-by\x2did-nvme\x2dnvme.1e4b\x2d473441303037343132\x2d53534450522d50583730302d3034542d3830\x2d00000001\x2dpart4.device
25.832s dev-disk-by\x2did-nvme\x2dSSDPR\x2dPX700\x2d04T\x2d80_G4A007412\x2dpart1.device
25.832s dev-nvme0n1p1.device
25.832s dev-disk-by\x2dpartuuid-bf182adf\x2da263\x2d4c81\x2da12a\x2d789f69fb6263.device
25.832s sys-devices-pci0000:00-0000:00:06.0-0000:01:00.0-nvme-nvme0-nvme0n1-nvme0n1p1.device
25.832s dev-disk-by\x2did-nvme\x2dnvme.1e4b\x2d473441303037343132\x2d53534450522d50583730302d3034542d3830\x2d00000001\x2dpart1.device
25.832s dev-disk-by\x2dpath-pci\x2d0000:01:00.0\x2dnvme\x2d1\x2dpart1.device
25.745s dev-ttyS4.device
25.745s sys-devices-pci0000:00-0000:00:1e.0-dw\x2dapb\x2duart.3-dw\x2dapb\x2duart.3:0-dw\x2dapb\x2duart.3:0.0-tty-ttyS4.device
25.738s dev-ttyS19.device
25.738s sys-devices-platform-serial8250-serial8250:0-serial8250:0.19-tty-ttyS19.device
25.737s dev-ttyS16.device
25.737s sys-devices-platform-serial8250-serial8250:0-serial8250:0.16-tty-ttyS16.device
25.736s sys-devices-platform-serial8250-serial8250:0-serial8250:0.1-tty-ttyS1.device
25.736s dev-ttyS1.device
25.735s sys-devices-platform-serial8250-serial8250:0-serial8250:0.13-tty-ttyS13.device
25.735s dev-ttyS13.device
25.734s dev-ttyS10.device
25.734s sys-devices-platform-serial8250-serial8250:0-serial8250:0.10-tty-ttyS10.device
25.733s sys-devices-platform-serial8250-serial8250:0-serial8250:0.18-tty-ttyS18.device
25.733s dev-ttyS18.device
25.733s sys-devices-platform-serial8250-serial8250:0-serial8250:0.15-tty-ttyS15.device
25.733s dev-ttyS15.device
25.729s sys-devices-platform-serial8250-serial8250:0-serial8250:0.0-tty-ttyS0.device
25.729s dev-ttyS0.device
25.729s sys-devices-platform-serial8250-serial8250:0-serial8250:0.20-tty-ttyS20.device
25.729s dev-ttyS20.device

So, similar to (but still slower than) @mkopec's dump at #1894 (comment):
20250121_133744.jpg
I guess some discard/trim operations changed the numbers. Will reinstall with Qubes defaults and LVM and dump the latest numbers on my side in the next comment.
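If discard/trim state is suspected of skewing numbers between runs, a hedged way to normalize before benchmarking (standard dom0 tooling):

systemctl status fstrim.timer   # is periodic trim enabled at all?
sudo fstrim -av                 # trim all supported mounted filesystems, then re-run the benchmark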

systemd-analyze_blame_master.txt

CC @macpijan @mkopec

@macpijan
Collaborator

If I compare I/O write and especially read operations (especially with small files), the -TU models perform badly compared to the NVIDIA variants. That's for UEFI of course, but it might still make sense to compare to see if there is a difference and why.

I agree this should be checked. Do we have an issue tracking this problem already?

Also, do we want to do it right now?
Should this gate the Heads release if (to be confirmed) performance turns out to be similar on the same device across UEFI and Heads firmware?

Ideally, we build the Heads release from the same (or similar) coreboot base as the previous UEFI release.

@macpijan
Collaborator

macpijan commented Jan 22, 2025

Some additional notes from testing, will edit (re-testing https://github.com/tlaurion/heads/tree/perf_comparison_with_reverted_coreboot_version_bump_v560tu)

* RAM init takes around 2 minutes with 96 GB of RAM, not 1

* Heads framebuffer refresh (fbwhiptail drawing) is slower.

* Heads hashing of /boot content is slower.

* Resealing the TPM Disk Unlock Key (CPU-based derivation of the DRK to validate the slot + adding a new key from the LUKS DRK into a new slot) is way slower.

The insight is that not only is IO slower; it's as if CPU speed were limited to turtle speed as well.

systemd-analyze blame with branch including changes to coreboot version bump:

Thanks for this test; it will be a useful testing point for the future. However, we should focus on testing what we are aiming to release right now.

We propose that we both confirm once again that the binary from this commit is "OK" in terms of performance as reported by us here previously: #1894 (comment)

Where "OK" means "comparable to the existing UEFI release in the same hardware specification", not "strictly better than previous laptop model in this specific benchmark", especially that devices with different hardware configurations are being compared here.

If confirmed, we propose that we use this commit as a release for Dasharo+heads @tlaurion @wessel-novacustom to not postpone it any longer.

We can continue investigating the performance concerns, such as raised by @wessel-novacustom here #1894 (comment) in individual dasharo issues.

@wessel-novacustom

Ideally, we build the Heads release from the same (or similar) coreboot base as the previous UEFI release.

  • I can see that, but this performance issue is one of the biggest issues the -TU series has.

Ideally, it would be fixed, despite the coreboot base being slightly different in that case.

If a coreboot patch can be made to fix this issue, I'm interested in getting a link to that patch, so I can assist customers with UEFI firmware who are complaining about this.

My suggestion is to make a v0.9.2-rc1/v1.0.0-rc1 untested UEFI release with just this fix patch. We can then use that version as the base for Heads.

@macpijan
Collaborator

Ideally, it would be fixed, despite the coreboot base being slightly different in that case.

I will move this off-channel to discuss it more quickly and come back here with conclusions.

@mkopec
Contributor

mkopec commented Jan 22, 2025

We propose that we both confirm once again that the binary from this commit is "OK" in terms of performance as reported by us here previously: #1894 (comment)

  • 28.983s - 155H, 32G RAM, 1TB SSD

@mkopec
Contributor

mkopec commented Jan 22, 2025

...isn't [email protected] dependent on network connection anyway? So we're no longer comparing only I/O speeds. It also continues starting after I've already logged in, confirmed in systemctl list-jobs. It starts up in the background.

@mkopec
Contributor

mkopec commented Jan 22, 2025

Here's my kdiskmark result in personal Qube on the same laptop with 155H, 32GB RAM, 1TB SSD, AC in:

Image

All with default settings. Of course, with LUKS, LVM, and Xen between the disk and the benchmark, this isn't measuring just raw I/O, but it should be more objective.

@tlaurion can you share yours for comparison?

@tlaurion
Collaborator Author

Here's my kdiskmark result in personal Qube on the same laptop with 155H, 32GB RAM, 1TB SSD, AC in:

Image

All with default settings. Of course, with LUKS, LVM, and Xen between the disk and the benchmark, this isn't measuring just raw I/O, but it should be more objective.

@tlaurion can you share yours for comparison?

Sorry, 64 GB, I will edit prior posts :(
The past v560tu had 96 GB and a 2 TB drive. This one has 64 GB and a 4 TB M.2 drive.

PXL_20250122_171917296.jpg

PXL_20250122_165619680.MP.jpg

PXL_20250122_170010896.jpg

@mkopec
Contributor

mkopec commented Jan 22, 2025

Thanks for testing! So you are seeing significantly worse performance than me... I think next we will test on the same 4 TB disk model as in your V560TU.

@tlaurion
Collaborator Author

tlaurion commented Jan 22, 2025

It's really hard to compare things, intra-group and inter-group.

For example, the nv41 doesn't require being installed with kernel-latest. Also, my nv41 setup is btrfs-based with heavy optimizations, as opposed to the default LVM setup: no revisions to keep (leaving wyng to handle the one revision to keep, the snapshot corresponding to the last backup), discard=async in fstab (see the hypothetical example below), and bees deployed (/var/lib/qubes fully deduplicated). So I cannot directly compare stats between the nv41 and v560tu setups on my side.
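For context, a hypothetical fstab line illustrating the discard=async part (device, mount point, and options are placeholders, not my exact entry):

/dev/mapper/luks-xxxx  /  btrfs  defaults,noatime,discard=async  0 0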

Still, here are some stats that could be eye-opening when considering what needs to be improved next.

nv41 stats

snap: let's not use that. debian-template, stable kernel (6.6.68)

Image

fedora-40-dvm, stable kernel (6.6.68):

Image

fedora-40-dvm, latest kernel (after dom0 sudo qubes-dom0-update kernel-latest kernel-latest-qubes-vm deploying 6.12.9, rebooting and launching kdiskmark)

Image

Needless to say: kernel-latest is a necessity, but not necessarily good news for performance.
I will personally revert to the non-latest kernel on the nv41, because I can, as opposed to the v560tu, which requires kernel-latest.

CC: @macpijan @mkopec @marmarek
flag raised in qubes-public matrix channel

@wessel-novacustom

We should mark this as a known issue for the Dasharo coreboot+Heads release. @macpijan @tlaurion

@tlaurion
Collaborator Author

tlaurion commented Jan 23, 2025

We should mark this as a known issue for the Dasharo coreboot+Heads release. @macpijan @tlaurion

How is this a Heads issue, as opposed to Dasharo UEFI (or simply coreboot), the M.2 drive, or the QubesOS latest kernel? I'm not sure I follow.

As depicted in the 3rd picture at #1894 (comment):

Switching the nv41 to the QubesOS latest kernel showed perf issues similar to the v560tu, which (unlike the nv41) requires installing and using kernel-latest at install time. No? Then there is the 4 TB drive, which has no DRAM and is offered as an option, as compared to the tests of @mkopec. Sorry, but this cannot be Heads-specific, nor Heads' fault.

I'm OK with listing this as a known issue if sub-issues are opened and referred to in downstream releases. This issue will stay open and be referenced in the created sub-issues.

My gut feeling here is that it's either an important perf issue caused by the latest kernel on which the v560tu depends, the 4 TB drive lacking DRAM, or coreboot doing something funky, but Heads has nothing to do with what is observed here. Heads' job is long done by then and irrelevant to the observed issue, unless the coreboot commit used differs between the two versions and is the cause for the same HCL.

@wessel-novacustom

It's not specifically a Heads issue, @tlaurion. I just mean that we won't further investigate this for the Dasharo version that @macpijan is about to release.

@tlaurion
Collaborator Author

tlaurion commented Jan 23, 2025

It's not specifically a Heads issue, @tlaurion. I just mean that we won't further investigate this for the Dasharo version that @macpijan is about to release.

So I guess what you mean is that the same notes should be present for both Dasharo UEFI and Dasharo Heads, since it's not Heads-specific, and if it is, it's because of a coreboot commit difference.

@wessel-novacustom

Issue has been raised separately for UEFI: Dasharo/dasharo-issues#1216

@marmarek
Contributor

@tlaurion do you have any CLI version of a benchmark that shows this issue? Something that would produce text output that I can then parse with a script, compare with plain diff, etc. I would like to add a disk performance test to our CI, but manually comparing graphical screenshots is not going to fly.

@marmarek
Contributor

Maybe some specific configs to the fio tool?

@tlaurion
Collaborator Author

@tlaurion do you have any CLI version of a benchmark that shows this issue? Something that would produce text output that I can then parse with a script, compare with plain diff, etc. I would like to add a disk performance test to our CI, but manually comparing graphical screenshots is not going to fly.

I don't have insights on this matter, unfortunately.

@tlaurion
Collaborator Author

Maybe some specific configs to the fio tool?

@marmarek: any command line you propose, I can reuse. kdiskmark is what end users on the forum use to show things visually. Replicating what kdiskmark does, from a command line perspective, could help here, yes.

@tlaurion
Collaborator Author

Maybe some specific configs to the fio tool?

Untested, please advise @marmarek

Random Read Test

fio --name=randread-test --filename=/path/to/testfile --rw=randread --bs=4k \
    --size=1G --numjobs=4 --time_based --runtime=60 --iodepth=32 \
    --ioengine=libaio --direct=1 --group_reporting

Random Write Test

fio --name=randwrite-test --filename=/path/to/testfile --rw=randwrite --bs=4k \
    --size=1G --numjobs=4 --time_based --runtime=60 --iodepth=32 \
    --ioengine=libaio --direct=1 --group_reporting

Mixed Random Read/Write Test

fio --name=randrw-test --filename=/path/to/testfile --rw=randrw --bs=4k \
    --size=1G --numjobs=4 --time_based --runtime=60 --iodepth=32 \
    --ioengine=libaio --direct=1 --group_reporting

Sequential Read Test

fio --name=seqread-test --filename=/path/to/testfile --rw=read \
    --bs=256k --size=1G --numjobs=4 --time_based --runtime=60 \
    --iodepth=32 --ioengine=libaio --direct=1 --group_reporting

Sequential Write Test

fio --name=seqwrite-test --filename=/path/to/testfile --rw=write \
    --bs=256k --size=1G --numjobs=4 --time_based --runtime=60 \
    --iodepth=32 --ioengine=libaio --direct=1 --group_reporting

Explanation of Parameters:

  • --rw: Specifies the type of I/O pattern (randread, randwrite, randrw, read, or write).
  • --bs: Block size (e.g., 4k for random, 256k for sequential).
  • --iodepth: Queue depth for I/O operations.
  • --numjobs: Number of parallel jobs.
  • --runtime: Duration of the test in seconds.
  • --direct: Bypasses OS caching for direct disk access.

These commands will output metrics such as IOPS, bandwidth (throughput), average latency, and queue depth, allowing you to replicate the performance insights provided by kdiskmark.
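Since the goal is text output that can be parsed and diffed, the same runs can be made machine-readable; a hedged sketch using the mixed test above (fio's terse format is one semicolon-separated line per reporting group):

fio --name=randrw-test --filename=/path/to/testfile --rw=randrw --bs=4k \
    --size=1G --numjobs=4 --time_based --runtime=60 --iodepth=32 \
    --ioengine=libaio --direct=1 --group_reporting \
    --output-format=terse --output=randrw-test.terse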


@marmarek
Contributor

I prepared something similar based on example configs provided with the fio tool: QubesOS/qubes-core-admin#649
First results: https://openqa.qubes-os.org/tests/126917/file/system_tests-tests-qubes.tests.integ.storage_perf.log
https://openqa.qubes-os.org/tests/126917/logfile?filename=system_tests-perf_test_results.txt (import to oocalc or any other spreadsheet for easier reading)

This is running on a "hw1" runner (some HP laptop), and I'm just repeating the test with kernel-latest. So far, the results are pretty much the same (no regression detected). I'll repeat the tests on an NV41 or v56. If the difference is confirmed with this tool, I can, for example, run an automated git bisect on the kernel between the good and bad versions to identify the specific change causing the regression.
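For reference, a minimal sketch of what that bisect could look like (assuming a kernel git checkout; the build/boot/benchmark wrapper script is hypothetical and must exit 0 on a good run, non-zero on a regression, 125 to skip):

git bisect start
git bisect bad v6.12    # kernel version showing the regression
git bisect good v6.6    # kernel version known to be fine
git bisect run ./build-boot-and-benchmark.sh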

@tlaurion
Collaborator Author

tlaurion commented Jan 26, 2025

I prepared something similar based on example configs provided with the fio tool: QubesOS/qubes-core-admin#649
First results: https://openqa.qubes-os.org/tests/126917/file/system_tests-tests-qubes.tests.integ.storage_perf.log
https://openqa.qubes-os.org/tests/126917/logfile?filename=system_tests-perf_test_results.txt (import to oocalc or any other spreadsheet for easier reading)

This is running on a "hw1" runner (some HP laptop), and I'm just repeating the test with kernel-latest. So far, the results are pretty much the same (no regression detected). I'll repeat the tests on an NV41 or v56. If the difference is confirmed with this tool, I can, for example, run an automated git bisect on the kernel between the good and bad versions to identify the specific change causing the regression.

@marmarek, note that my results for the nv41 were on btrfs, as noted above the screenshots in the comment. It was only a reminder that, if it is the cause, the v560tu's dependency on kernel-latest might partly explain the perf degradation; unfortunately, one cannot test kernel-stable on the v560tu, which requires kernel-latest at initial install for Meteor Lake support, as opposed to the nv41.

@marmarek
Contributor

I did a test with fio on NV41, using very similar config to "Mixed Random Read/Write Test" above (difference: I let it run for 90s, not 60s; and iodepth 16 not 32) with different kernel versions. I do see a small slowdown with 6.12 kernel in dom0, but nowhere near the effect you see. I'm looking at "read_bandwidth_kb" and "write_bandwidth_kb" columns (column 7 and 48 respectively):

fio in dom0:
fio-kernel-6.6.log
fio-kernel-6.11.log
fio-kernel-6.12.log
fio in VM:
fio-dom0-6.6-vm-6.11.log
fio-dom0-6.6-vm-6.6.log
fio-dom0-6.12-vm-6.12.log

@tlaurion can you check if you see the drastic difference with fio on your side? I did the test on LVM (a very default installation of R4.2); if fio reports a larger difference for you, then maybe it's btrfs-specific? But if you don't see the difference either, then we need another test. In that case, can you check the other examples you provided above? Do any of them show the 2x+ difference between kernel versions for you?
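A hedged one-liner to pull just those two columns out of the attached logs for a quick diff (assuming they are fio's semicolon-separated terse format, which the fixed column numbers suggest):

awk -F';' '{print FILENAME, $7, $48}' fio-kernel-6.6.log fio-kernel-6.12.log   # read/write bandwidth in KB/s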

@tlaurion
Collaborator Author

tlaurion commented Jan 27, 2025

I did a test with fio on NV41, using very similar config to "Mixed Random Read/Write Test" above (difference: I let it run for 90s, not 60s; and iodepth 16 not 32) with different kernel versions. I do see a small slowdown with 6.12 kernel in dom0, but nowhere near the effect you see. I'm looking at "read_bandwidth_kb" and "write_bandwidth_kb" columns (column 7 and 48 respectively):

fio in dom0: fio-kernel-6.6.log fio-kernel-6.11.log fio-kernel-6.12.log fio in VM: fio-dom0-6.6-vm-6.11.log fio-dom0-6.6-vm-6.6.log fio-dom0-6.12-vm-6.12.log

@tlaurion can you check if you see the drastic difference with fio on your side? I did the test on LVM (a very default installation of R4.2); if fio reports a larger difference for you, then maybe it's btrfs-specific? But if you don't see the difference either, then we need another test. In that case, can you check the other examples you provided above? Do any of them show the 2x+ difference between kernel versions for you?

dom0 comparatives and the script used, @marmarek. btrfs, no revisions to keep, 2 TB drive with DRAM cache, 64 GB RAM, on USB-C power with all qubes shut down:

dom0_6.6.68-1.qubes.fc37.x86_64_additional-seq-read_output.txt
dom0_6.6.68-1.qubes.fc37.x86_64_additional-rand-read_output.txt
dom0_6.6.68-1.qubes.fc37.x86_64_additional-rand-write_output.txt
dom0_6.6.68-1.qubes.fc37.x86_64_additional-seq-write_output.txt
dom0_6.6.68-1.qubes.fc37.x86_64_seq-read_output.txt
dom0_6.6.68-1.qubes.fc37.x86_64_rand-read_output.txt
dom0_6.6.68-1.qubes.fc37.x86_64_rand-write_output.txt
dom0_6.6.68-1.qubes.fc37.x86_64_seq-write_output.txt
dom0_6.12.9-1.qubes.fc37.x86_64_additional-seq-read_output.txt
dom0_6.12.9-1.qubes.fc37.x86_64_additional-rand-read_output.txt
dom0_6.12.9-1.qubes.fc37.x86_64_additional-rand-write_output.txt
dom0_6.12.9-1.qubes.fc37.x86_64_additional-seq-write_output.txt
dom0_6.12.9-1.qubes.fc37.x86_64_seq-read_output.txt
dom0_6.12.9-1.qubes.fc37.x86_64_rand-read_output.txt
dom0_6.12.9-1.qubes.fc37.x86_64_rand-write_output.txt
dom0_6.12.9-1.qubes.fc37.x86_64_seq-write_output.txt

tar.gz with script used inside:
nv41_fio_tests.tar.gz

Let me know if I should go deeper with this and test fio from within qubes.

@marmarek
Contributor

So, I did the test on NV41 with LVM using kdiskmark in a fedora-40-xfce VM to have the same testing methodology. In my case I see a slight performance improvement (at least in read speeds) with the 6.12 kernel in dom0...

Dom0 6.6:
Image
Image

Dom0 6.12:
Image
Image

The laptop has Dasharo (coreboot+heads) 0.9.0. Is there any relevant change in 0.9.1 that could affect results?

@tlaurion
Collaborator Author

tlaurion commented Jan 28, 2025

For the sake of automated testing, here are the fio commands kdiskmark calls under the hood with the default profile, @marmarek:

/usr/bin/fio --output-format=json --create_only=1 --filename=/kdiskmark-skCGNj.tmp --size=1024m --zero_buffers=0 --name=prepare
/usr/bin/fio --output-format=json --ioengine=libaio --randrepeat=0 --refill_buffers --end_fsync=1 --direct=1 --rwmixread=70 --filename=/kdiskmark-skCGNj.tmp --name=randread --size=1024m --zero_buffers=0 --bs=4k --runtime=5 --rw=randread --iodepth=1 --numjobs=1
/usr/bin/fio --output-format=json --ioengine=libaio --randrepeat=0 --refill_buffers --end_fsync=1 --direct=1 --rwmixread=70 --filename=/kdiskmark-skCGNj.tmp --name=randread --size=1024m --zero_buffers=0 --bs=4k --runtime=5 --rw=randread --iodepth=32 --numjobs=1
/usr/bin/fio --output-format=json --ioengine=libaio --randrepeat=0 --refill_buffers --end_fsync=1 --direct=1 --rwmixread=70 --filename=/kdiskmark-skCGNj.tmp --name=randwrite --size=1024m --zero_buffers=0 --bs=4k --runtime=5 --rw=randwrite --iodepth=1 --numjobs=1
/usr/bin/fio --output-format=json --ioengine=libaio --randrepeat=0 --refill_buffers --end_fsync=1 --direct=1 --rwmixread=70 --filename=/kdiskmark-skCGNj.tmp --name=randwrite --size=1024m --zero_buffers=0 --bs=4k --runtime=5 --rw=randwrite --iodepth=32 --numjobs=1
/usr/bin/fio --output-format=json --ioengine=libaio --randrepeat=0 --refill_buffers --end_fsync=1 --direct=1 --rwmixread=70 --filename=/kdiskmark-skCGNj.tmp --name=read --size=1024m --zero_buffers=0 --bs=1024k --runtime=5 --rw=read --iodepth=1 --numjobs=1
/usr/bin/fio --output-format=json --ioengine=libaio --randrepeat=0 --refill_buffers --end_fsync=1 --direct=1 --rwmixread=70 --filename=/kdiskmark-skCGNj.tmp --name=read --size=1024m --zero_buffers=0 --bs=1024k --runtime=5 --rw=read --iodepth=8 --numjobs=1
/usr/bin/fio --output-format=json --ioengine=libaio --randrepeat=0 --refill_buffers --end_fsync=1 --direct=1 --rwmixread=70 --filename=/kdiskmark-skCGNj.tmp --name=write --size=1024m --zero_buffers=0 --bs=1024k --runtime=5 --rw=write --iodepth=1 --numjobs=1
/usr/bin/fio --output-format=json --ioengine=libaio --randrepeat=0 --refill_buffers --end_fsync=1 --direct=1 --rwmixread=70 --filename=/kdiskmark-skCGNj.tmp --name=write --size=1024m --zero_buffers=0 --bs=1024k --runtime=5 --rw=write --iodepth=8 --numjobs=1
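To compare such runs without screenshots, the JSON each command prints can be reduced to a couple of numbers; a hedged jq sketch (assumes the JSON was saved to result.json, e.g. via --output=result.json; field paths follow fio's JSON schema, bw is in KiB/s):

jq '.jobs[0] | {read_bw: .read.bw, read_iops: .read.iops, write_bw: .write.bw, write_iops: .write.iops}' result.json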

qubes-public matrix channel thread

So we should abort this and simply consider it a btrfs perf regression in 6.12.9 compared to 6.6.68.
Irrelevant for this issue and for QubesOS, which deploys on thin LVM by default.

The laptop has Dasharo (coreboot+heads) 0.9.0. Is there any relevant change in 0.9.1 that could affect results?

I do not know what to do about nv41 vs v560tu performance, where the nv41 performs better.
That is a question for @macpijan; it is tracked in Dasharo/dasharo-issues#1216 and currently planned to be marked as a "known issue" for the v560tu release, @wessel-novacustom @marmarek, while not being Heads-specific.

@marmarek
Contributor

Ok, so I reinstalled the NV41 with BTRFS, and I got different results. In the order of running:

Image
Image
Image
Image
Image
Image

So, first of all, most results are a lot slower than on LVM. Same hardware, also a fresh install (almost empty disk, etc.). I also added discard=async as suggested in QubesOS/qubes-issues#6476 (comment), but later tests were still quite slow even with this option.
But more importantly, two tests with the 6.12 kernel in dom0 are "fast"; in fact, those are the best results of all 6 runs, better than 6.6. But then, after restarting the VM (and later dom0 too), I couldn't replicate the good results anymore. So, there is more to it than just the kernel version.

The next tests I need to do with a script (updated with the new fio commands, etc.), to be sure I haven't missed any step. But that will need to wait for next week, unfortunately.
