macOS kernel panic during boot due to non-monotonic TSC #15
High IO on the host during the upgrade might be triggering a kernel panic due to timeouts in the guest. Try adding the parameters I identified here to your boot-args:
OK, thanks, will try in the morning and report back.
OK, tried that, and I thought it was going to make a difference, as CPU usage went through the roof (like 100% on half my 16 cores!), but it still ended up at the Mach reboot crash. It seems to reboot a lot at this disk crypto stage (not a crash, just a reboot back to the OpenCore chooser). I might try reducing the vCPUs to 4; maybe it's a thread timeout/race or something?
That's curious: if CPU usage increased, it seems like a kernel thread is spinning in an infinite loop. tlbto_us=0 causes a core's failure to respond to a TLB flush in a timely fashion to be ignored completely, instead of triggering a panic. I'll check out your VM config.
Where did you get your OVMF image, by the way? You might try switching to one provided by your distro, just in case.
I think the OVMF was from OSX-KVM. I'll try the Debian one, and I might also try a fresh install instead of an upgrade, or try without GPU passthrough. Reducing the core count made no difference, nor did switching to virtio-net from vmxnet3 (didn't realize that worked). It seems to stall at various points for a few minutes and then reboot, but once it gets to Mach reboot it's definitely dead.
I've never observed behaviour like that, so I'm a bit in the dark about what might cause it, sorry! If you've got any passthrough devices defined, does it boot if they're removed?
OVMF_CODE.fd or OVMF_CODE_4M.fd from Debian seem to reduce CPU usage to almost nothing and also reduced the reboots, but it still ends up at Mach reboot. I'll try removing the GPU next, as that seemed to work for someone on Reddit.
I definitely recommend removing passthrough GPUs during upgrades, because the repeated restarts ask a lot of the shitty AMD GPU drivers.
I'm going to close this and give up: a completely fresh install with a fresh OC15 and OVMF and no passthrough doesn't even get as far as Disk Utility, so I'm assuming Monterey is a lot more fussy about hardware or software (as Big Sur runs fine on the same VM). Thanks for your time.
The only real hardware the guest can even see is your CPU, which is perfectly compatible (same generation as mine).
Oh, your CPU argument is missing +hypervisor. The macOS kernel gives you all sorts of timing slack if it knows it's running in a VM, which requires +hypervisor.
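A quick way to confirm that a QEMU `-cpu` option string really enables `hypervisor` (and, for example, `invtsc`) is to split it on commas and look for either the `+flag` or the `flag=on` spelling. The sketch below is a hypothetical shell helper written for this thread, not part of QEMU or OpenCore; it only inspects the string it is given and cannot tell whether the host actually supports the flag.

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU): succeed if a QEMU -cpu option
# string enables the given feature flag, spelled either "+flag" or "flag=on".
cpu_has_flag() {
    cpu_opts=$1
    flag=$2
    # Pad with commas so the first and last entries match the same patterns.
    case ",${cpu_opts}," in
        *",+${flag},"* | *",${flag}=on,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: the -cpu string used by a working Monterey VM in this thread.
opts='host,kvm=on,vendor=GenuineIntel,+kvm_pv_unhalt,+kvm_pv_eoi,+hypervisor,+invtsc'
for f in hypervisor invtsc; do
    if cpu_has_flag "$opts" "$f"; then
        echo "$f: enabled"
    else
        echo "$f: missing"
    fi
done
```

For the example string it prints `hypervisor: enabled` followed by `invtsc: enabled`.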
I tried adding that to my existing flags, and I also tried changing them completely to:
but it didn't make any difference. This is really confusing, as I've never really had a problem before (other than Big Sur, which just needed a new OpenCore), but Monterey seems insurmountable to me: I can't even get to the installer, let alone an upgrade!
@thenickdude do you have a Monterey VM running with this EFI?
Absolutely. I installed using a recovery, using a full installer, and by upgrading from Big Sur, and didn't have any problems with any of those scenarios. Passthrough of an RX580 was successful. QEMU 6.0.0-4, edk2-stable202108, pc-q35-6.0.
Here's my QEMU command line for my VM with passthrough:
/usr/bin/kvm \
-no-shutdown \
-smbios 'type=1,uuid=...' \
-drive 'if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' \
-drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/zvol/rpool/vms/vm-110-disk-1' \
-smp '16,sockets=1,cores=16,maxcpus=16' \
-nodefaults \
-boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
-vga none \
-nographic \
-cpu 'Penryn,enforce,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt,vendor=GenuineIntel' \
-m 16384 \
-object 'memory-backend-file,id=ram-node0,size=16384M,mem-path=/run/hugepages/kvm/1048576kB,share=on,prealloc=yes' \
-numa 'node,nodeid=0,cpus=0-15,memdev=ram-node0' \
-readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
-device 'vfio-pci,host=0000:03:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' \
-device 'vfio-pci,host=0000:03:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' \
-device 'vfio-pci,host=0000:00:1a.0,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0' \
-device 'vfio-pci,host=0000:00:1d.0,id=hostpci2,bus=ich9-pcie-port-3,addr=0x0' \
-drive 'file=/dev/zvol/rpool/vms/vm-111-disk-0,if=none,id=drive-virtio0,cache=unsafe,discard=on,format=raw,aio=io_uring,detect-zeroes=unmap' \
-device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap110i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=...,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' \
-machine 'type=q35+pve0' \
-device 'isa-applesmc,osk=...' \
-smbios 'type=2' \
-cpu 'host,kvm=on,vendor=GenuineIntel,+kvm_pv_unhalt,+kvm_pv_eoi,+hypervisor,+invtsc'
(Duplicate args are due to Proxmox config restrictions.)
@thenickdude I just did the upgrade from 11.6.1 to 12.0.1 and everything went great! Here's my configuration if anyone is interested.
libvirt config:
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
<name>macOS</name>
<uuid>2aca0dd6-cec9-4717-9ab2-0b7b13d111c3</uuid>
<title>macOS</title>
<memory unit='KiB'>8388608</memory>
<currentMemory unit='KiB'>8388608</currentMemory>
<vcpu placement='static'>8</vcpu>
<vcpus>
<vcpu id='0' enabled='yes' hotpluggable='no' order='1'/>
<vcpu id='1' enabled='yes' hotpluggable='yes' order='2'/>
<vcpu id='2' enabled='yes' hotpluggable='yes' order='3'/>
<vcpu id='3' enabled='yes' hotpluggable='yes' order='4'/>
<vcpu id='4' enabled='yes' hotpluggable='yes' order='5'/>
<vcpu id='5' enabled='yes' hotpluggable='yes' order='6'/>
<vcpu id='6' enabled='yes' hotpluggable='yes' order='7'/>
<vcpu id='7' enabled='yes' hotpluggable='yes' order='8'/>
</vcpus>
<cputune>
<vcpupin vcpu='0' cpuset='2'/>
<vcpupin vcpu='1' cpuset='8'/>
<vcpupin vcpu='2' cpuset='3'/>
<vcpupin vcpu='3' cpuset='9'/>
<vcpupin vcpu='4' cpuset='4'/>
<vcpupin vcpu='5' cpuset='10'/>
<vcpupin vcpu='6' cpuset='5'/>
<vcpupin vcpu='7' cpuset='11'/>
<emulatorpin cpuset='0-1,6-7'/>
</cputune>
<os>
<type arch='x86_64' machine='pc-q35-6.0'>hvm</type>
<loader readonly='yes' type='pflash'>/usr/share/edk2-ovmf/x64/OVMF_CODE.fd</loader>
<nvram>/home/kuasha/OSX-KVM/OVMF_VARS-1024x768.fd</nvram>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
</features>
<cpu mode='host-passthrough' check='none' migratable='on'>
<topology sockets='1' dies='1' cores='4' threads='2'/>
</cpu>
<clock offset='utc'>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='pit' tickpolicy='delay'/>
<timer name='hpet' present='no'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>restart</on_crash>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/home/kuasha/macOS/OpenCore-v15.img'/>
<target dev='sda' bus='sata'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<controller type='sata' index='0'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
</controller>
<controller type='pci' index='0' model='pcie-root'/>
<controller type='pci' index='1' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='1' port='0x8'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
</controller>
<controller type='pci' index='2' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='2' port='0x9'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
</controller>
<controller type='pci' index='3' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='3' port='0xa'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
</controller>
<controller type='pci' index='4' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='4' port='0xb'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
</controller>
<controller type='pci' index='5' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='5' port='0xc'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
</controller>
<controller type='pci' index='6' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='6' port='0xd'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
</controller>
<controller type='pci' index='7' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='7' port='0xe'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
</controller>
<controller type='pci' index='8' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='8' port='0xf'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x7'/>
</controller>
<controller type='usb' index='0' model='qemu-xhci'>
<address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
</controller>
<interface type='bridge'>
<mac address='52:54:00:8e:e2:66'/>
<source bridge='br0'/>
<target dev='tap0'/>
<model type='vmxnet3'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
</interface>
<input type='mouse' bus='virtio'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x0e' function='0x0'/>
</input>
<input type='keyboard' bus='virtio'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x0f' function='0x0'/>
</input>
<input type='mouse' bus='ps2'/>
<input type='keyboard' bus='ps2'/>
<sound model='ich9'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</sound>
<audio id='1' type='none'/>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
</source>
<address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='no'>
<source>
<address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</source>
<address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0' multifunction='on'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='no'>
<source>
<address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
</source>
<address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
</hostdev>
<memballoon model='none'/>
</devices>
<qemu:commandline>
<qemu:arg value='-device'/>
<qemu:arg value='isa-applesmc,osk=ourhardworkbythesewordsguardedpleasedontsteal(c)AppleComputerInc'/>
<qemu:arg value='-cpu'/>
<qemu:arg value='host,vendor=GenuineIntel,+hypervisor,+invtsc,kvm=on,+fma,+avx,+avx2,+aes,+ssse3,+sse4_2,+popcnt,+sse4a,+bmi1,+bmi2'/>
<qemu:arg value='-device'/>
<qemu:arg value='hda-micro,audiodev=hda'/>
<qemu:arg value='-audiodev'/>
<qemu:arg value='pa,id=hda,server=unix:/run/user/1000/pulse/native'/>
<qemu:arg value='-object'/>
<qemu:arg value='input-linux,id=mouse1,evdev=/dev/input/by-id/usb-Logitech_Gaming_Mouse_G502_1263366D3336-event-mouse'/>
<qemu:arg value='-object'/>
<qemu:arg value='input-linux,id=kbd1,evdev=/dev/input/by-id/ckb-Corsair_STRAFE_RGB_Gaming_Keyboard_vKB_-event,grab_all=on,repeat=on'/>
</qemu:commandline>
</domain>
@kuasha420 could you list the libvirt/QEMU versions you're using? I'm wondering if it's a QEMU 6.1 issue, as @thenickdude is using 6.0 and it looks like you are too.
I am using QEMU 6.1 as well, but the machine type should be 6.0; machine type 6.1 has issues.
I just compiled QEMU emulator version 6.1.50 (v6.1.0-1735-gc52d69e7) and that barely even starts macOS. Changing the machine type doesn't seem to make any difference for me, and I also tried your command line. I'm lost; I wonder if it's because I'm using virtio-blk instead of SATA.
@sej7278 I've also experienced the maxed-out 100% CPU usage. It seems to be related to macOS trying to do a full APFS fsck or some similar check after a busted shutdown. I think it's just Monterey.
@sickcodes yes, it's definitely doing that, but I think I'm making it past that stage.
So that seems to be panicking due to non-monotonic time (the clock going backwards). I wonder if you're getting a warning at VM launch time that "invtsc" isn't actually available on your system. Try removing that from your CPU args if it's currently there. Can you post the VM command/config you're currently using, and also the output of this on the host:
(You only need to paste the output from a single one of the cores.)
This user has the same panic on bare metal: https://www.reddit.com/r/hackintosh/comments/qhjnly/random_kernel_panics_on_x79/ OpenCore bug tracker:
Although in the case of QEMU I think it's QEMU's job to present a consistent timestamp counter, so in theory TSCAdjustReset shouldn't be needed...
Also, your host isn't going to sleep during the install because you aren't moving the mouse to keep it awake, is it?
Ah, I had a problem with kvm-pit with Catalina hard-crashing the host, and the fix was to remove this lot. But I don't have that in my Monterey config, so I wonder what the defaults are:
<clock offset='utc'>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='pit' tickpolicy='delay'/>
<timer name='hpet' present='no'/>
</clock>
cpuinfo:
The host is a desktop, so it's not going to sleep. When I got that screenshot I was running OpenCore-boot.sh instead of virt-manager. I do get this in dmesg, though I don't know how to fix it:
Removing +invtsc isn't making any difference.
Okay, I'm pretty sure that's your problem. On my system the clocksource is set to tsc, but on yours it doesn't even get offered as an option. What does "dmesg | grep -i -e tsc -e clocksource" return? Mine reports:
I think on my system there was an option buried in the host UEFI settings for TSC synchronisation between sockets. If yours has that too, make sure it's turned on, because otherwise the TSC might be rejected. Since you only have two clocksources to choose from, you could also try switching to the other one and see if things improve:
Finally, check for a BIOS update for your motherboard, since this is the sort of thing they fix there.
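On a Linux host, the active and the offered clocksources can also be read directly from sysfs, in `current_clocksource` and `available_clocksource` under `/sys/devices/system/clocksource/clocksource0/`, which complements the dmesg grep above. A minimal sketch, assuming a POSIX shell; the function name `check_clocksource` and its optional directory argument are hypothetical and only exist so it can be pointed at test files:

```shell
#!/bin/sh
# Hypothetical helper: print the host's current and available clocksources
# and return non-zero (with a warning) if "tsc" is not offered at all.
# $1 optionally overrides the sysfs directory (useful for testing).
check_clocksource() {
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    current=$(cat "$dir/current_clocksource")
    available=$(cat "$dir/available_clocksource")
    echo "current: $current"
    echo "available: $available"
    # available_clocksource is a space-separated list, e.g. "tsc hpet acpi_pm".
    case " $available " in
        *" tsc "*) return 0 ;;
        *) echo "warning: tsc is not an available clocksource" >&2; return 1 ;;
    esac
}

# Usage on the host: check_clocksource
```

Switching to another listed clocksource at runtime is done as root by writing its name to `current_clocksource` (for example `echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource`); that change does not persist across reboots.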
Also, it sounds like recent Linux kernels (5.13, 5.14) have made timing changes that cause TSC to be disabled in more situations:
If you can try 5.12 and see whether the tsc clocksource comes back, that would be interesting.
I'll have a look in my BIOS tomorrow (doing a massive backup right now), but I'm in BIOS mode, not UEFI, and it is the latest version (the Dell T5610 doesn't get updated very often!). I do recall some time settings, but I think it was just UTC.
I've had this TSC issue since the 5.10 kernel as I recall, so it's not new to 5.14. I also noticed this: https://www.dell.com/community/Precision-Fixed-Workstations/TSC-warp-on-T5610-running-Linux/td-p/7820763 I might try CPU pinning to only use a single socket, but that usually kills performance (oddly enough!).
Yeah, if the TSC only differs between sockets, then pinning to a single socket should solve this. It sounds like the sockets need their RESET signal to be delivered at the same time to have in-sync TSCs, and that the BIOS can ruin the sync by attempting to set the TSC register. I noticed there is an Intel erratum for the whole E5 v2 line that says the TSC won't be reset by a warm reboot. If this is true, then once your TSCs go out of sync they would stay out of sync until you perform a cold boot.
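Before pinning vCPUs to one socket, it helps to know which host CPU numbers belong to which physical package. Linux exposes this per CPU in `/sys/devices/system/cpu/cpuN/topology/physical_package_id` (`lscpu -e` shows the same information in its SOCKET column). The sketch below groups CPU numbers by socket so that one socket's list can be copied into libvirt `<vcpupin>`/`<emulatorpin>` cpusets; `cpus_by_socket` and its optional root-directory argument are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "socket <id>: <cpu,cpu,...>" for each physical
# package, read from sysfs. $1 optionally overrides the sysfs CPU directory.
cpus_by_socket() {
    root=${1:-/sys/devices/system/cpu}
    for topo in "$root"/cpu[0-9]*/topology/physical_package_id; do
        [ -r "$topo" ] || continue          # skip if the glob matched nothing
        cpu=${topo#"$root"/cpu}             # -> "N/topology/physical_package_id"
        cpu=${cpu%%/*}                      # -> "N"
        echo "$(cat "$topo") $cpu"          # "<socket> <cpu>"
    done | sort -n -k1,1 -k2,2 | awk '
        # Start a new output line whenever the socket id changes.
        NR == 1 || $1 != last {
            if (NR > 1) printf "\n"
            printf "socket %s: %s", $1, $2
            last = $1
            next
        }
        { printf ",%s", $2 }
        END { if (NR > 0) printf "\n" }'
}

# Usage on the host: cpus_by_socket
```

On a dual-socket host this should print one line per socket; pinning all vCPUs (and ideally the emulator threads) to CPUs from a single line keeps the guest on one package's TSC.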
This kernel bug report has a patch attached that tries to better sync up the TSCs in this case, if firmware updates are not available to fix the core issue:
CPU pinning made some sort of difference (different crash points?). I'll try looking into TSC/kernel next. There's nothing in the BIOS relating to TSC.
I am also having trouble with Monterey on my VM, albeit the issue seems to be different. Using QEMU 6.0.0 with these options:
I'm using CPU passthrough here, but I have also tried emulating Penryn and Skylake, with zero effect. Most of the time it comes to the kernel panic above (I had to turn on full debugging to get that message; otherwise it was never shown and macOS would usually reboot around the SMC step). But sometimes it actually boots macOS, and then crashes almost immediately. Big Sur works perfectly fine on both with the same OC and boot options, however. My OVMFs are from OSX-KVM. As for configs, I've already tried different ones, including yours and the OSX-KVM ones.
@thenickdude @sej7278 it seems my problem was also related to the clocksource, and I was able to solve it, so I'll leave my notes here in case they help someone. With macOS Monterey on an Intel 3.0 GHz Core i7-4578U, after many runs, most of the time it would crash with the error from the previous message, which is:
Sometimes it would actually get stuck around the SMC step, and even more rarely it would fully boot into macOS but then almost immediately reboot. So when I dug into the boot log on the host, I found a problem similar to what you discussed above:
Meanwhile, on the working hardware (Intel 2.6 GHz Core i5-4278U) the clocksource would always be tsc. From digging around, I found out that newer kernel versions disable the TSC clocksource watchdog during boot if the CPU supports the TSC features (constant_tsc, nonstop_tsc and tsc_adjust, to be precise) and the system has two or fewer sockets, which is the case for my CPU. So adding
This solved the hanging problem completely, and now I was getting an IOPolledFileWrite panic with a clock sync problem 100% of the time. This is where a thread linked above came to help, as it also has a great explanation of why it happens on Monterey: acidanthera/bugtracker#1676 (comment) A few extra notes:
Thanks for those details!
Looks like my BIOS is buggy, as I tried the
Make sure your CPU has the tsc_adjust feature, which seems to be the most important one for Monterey (according to the CpuTscSync README):
Also, perhaps some methods from here can help as well: https://aws.amazon.com/premiumsupport/knowledge-center/manage-ec2-linux-clock-source/
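The three flags mentioned above (`constant_tsc`, `nonstop_tsc` and `tsc_adjust`) show up in the `flags` line of `/proc/cpuinfo` on the host. A minimal sketch that reports which of them are missing, assuming all cores report the same flags (so only the first `flags` line is read); `missing_tsc_flags` and its optional file argument are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list which TSC-related flags are missing from the
# first "flags" line of /proc/cpuinfo (or of the file given as $1).
missing_tsc_flags() {
    cpuinfo=${1:-/proc/cpuinfo}
    # All cores normally report the same flags, so one line is enough.
    flags=$(grep -m1 '^flags' "$cpuinfo")
    missing=""
    for f in constant_tsc nonstop_tsc tsc_adjust; do
        # Pad with spaces so e.g. "tsc" never matches inside "constant_tsc".
        case " $flags " in
            *" $f "*) ;;
            *) missing="$missing $f" ;;
        esac
    done
    echo "missing:${missing:- none}"
}

# Usage on the host (not in the guest): missing_tsc_flags
```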
I seem to only have this lot:
Well, I tried everything, but without success. From what I could read, I think it comes from Lenovo, so I'm waiting for a fix.
A user on Reddit reported that they had this issue and fixed it by fully shutting down their host and starting it again (warm reboots didn't fix it). Maybe a power cycle is required to resync the TSC between cores.
Hmm, maybe. I have given up on this, given that my GT710 won't even work with Monterey without complicated fudging, but I might give it another go once I've found all my notes! To me the power-cycle thing sounds a bit more like a vendor-reset issue with the graphics card (which mine won't have), although I do have a similar problem with a QNAP 2.5GbE card: it loses its flash settings or something when you just power off the PC, but if you pull the cable it boots fine again afterwards!
Some data points from me: I can install both Big Sur and Monterey without problems after a host cold boot, and after installation they boot fine when the host has just been cold booted. After a host cold boot, dmesg reports
So it seems a combination of suspend/resume/reboot makes the TSC unstable (I do a lot of suspending, so I'm not sure yet whether a warm reboot alone, without suspending, makes it bad as well). Either way, when my system is in this unstable-TSC condition I can install Big Sur just fine, although it won't boot after installation. Monterey won't even install. I tried CPU pinning, but no dice.
Motherboard: ASUS x570 Pro Creator
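To check whether a host is currently in the unstable-TSC state described above (for example after suspend/resume or a warm reboot), one option is to search the kernel log for the clocksource watchdog's messages, which on recent kernels include phrases such as "Marking TSC unstable" and "Marking clocksource 'tsc' as unstable". A hedged sketch; the function name is hypothetical, the exact wording may differ between kernel versions, and reading `dmesg` may require root:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and succeed (printing the
# matching lines) if the kernel has marked the TSC as unstable.
# Usage on the host: dmesg | tsc_unstable && echo "TSC unstable; try a cold boot"
tsc_unstable() {
    grep -i -E "marking tsc unstable|marking clocksource 'tsc' as unstable"
}
```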
Sounds like some way of making the Linux kernel (or QEMU?) ignore the TSC problem and allow it to be assigned as a stable clocksource is the workaround we need.
No, that won't fix the problem, since the guest will still observe the skewed TSC and panic. The TSCs need to be resynced on the host.
Ah, I see. I assumed it was just the kernel/QEMU not passing it to the VM at all.
Due to other issues I had with my PC, I've replaced my Corsair Vengeance LPX DDR4 sticks with G.Skill Ripjaws V ones, and suddenly I'm not experiencing any of the above issues anymore. I still see messages about an unstable clocksource in dmesg output after warm reboots, but they don't seem to affect the ability to boot macOS whatsoever. I can now boot and reboot Big Sur without a problem every single time, so I'll assume the clocksource thing was a red herring in my case.
Yeah, Big Sur is not a problem for me, just Monterey (or later?).
I just successfully resolved this issue on my own system by updating my BIOS. For me it was indeed a BIOS TSC sync problem combined with a lack of TSC Adjust on the affected CPU. I could see my Linux kernel avoiding TSC specifically because of this, but macOS would hit the non-monotonic panic. I didn't have to make any further adjustments to my BIOS, and I went to stock settings from the downstream project I'm using to launch, Docker-OSX. No extra CPU flags for hypervisor or the like.
Same issue on my Lenovo Legion laptop, although I don't expect to get a BIOS update with the fix, since Lenovo seems to focus its patches on the "Linux certified" laptops. Fortunately there is a workaround, but it requires a bit of tinkering: a set of kernel patches that implement an alternative method of TSC synchronization, which fixes this issue on my system. More details in this Reddit post.
Hi, this is possibly not even an OpenCore issue, but does anything jump out at you as an obvious problem?
I've got a VM running 11.6.1 with your OC 0.7.4 install on my Xeon E5-2650 v2, but it simply will not take the Monterey update (it also didn't work with one of the betas I tried, or with a fresh install). It starts the upgrade, but after a couple of reboots it seems to panic and get stuck at Mach reboot.
It gets past the timeout here:
But then it panics here:
And finally it hangs completely here, trying to reboot, I guess:
My config.plist is basically the same as yours, but with some SMBIOS stuff added by OpenCore Configurator (serials etc.). oc-validator had nothing bad to say about it.
libvirt XML is: libvirt.txt
Host is Debian Sid, QEMU 6.1.0, libvirt 7.6.0, kernel 5.14.12.