April 11, 2025

Dear forum,

For the past two days, my server has been crashing randomly. When it happens, the only thing I can do is press the physical power button on the case and wait for the graceful shutdown timeout (90 seconds, if I recall correctly) to power off the machine. After restarting, it boots normally but eventually crashes again at unpredictable intervals.

I've been on version 6.12.15 since its release, and the server had been perfectly stable for months until this started.

During one of the restarts, I saw error messages related to /dev/sda1, so I took the following steps:
- Replaced the USB drive with a new one.
- Plugged the USB drive into a different port on the back panel, in case the original motherboard connector was faulty.

However, the server crashed again this morning. Any ideas on what to troubleshoot next? I'm inclined to think it's a hardware fault, but I'm not sure where to begin. I'm attaching the diagnostics file for reference.

The server is a 4770K with a Gigabyte UDH3 and 16 GB DDR3 RAM. The power supply is a Be Quiet Pure Power 12 (550-650 W) that is at most 3 years old. There have been no hardware changes in at least the past year.

Thanks in advance,

nas-diagnostics-20250410-2241.zip

Edited April 11, 2025 by SP67 (clarity in header)
April 11, 2025 - Community Expert

Apr 10 21:52:28 NAS kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
Apr 10 21:52:28 NAS kernel: TDH <c>
Apr 10 21:52:28 NAS kernel: TDT <4c>
Apr 10 21:52:28 NAS kernel: next_to_use <4c>
Apr 10 21:52:28 NAS kernel: next_to_clean <b>
Apr 10 21:52:28 NAS kernel: buffer_info[next_to_clean]:
Apr 10 21:52:28 NAS kernel: time_stamp <10069caee>
Apr 10 21:52:28 NAS kernel: next_to_watch <c>
Apr 10 21:52:28 NAS kernel: jiffies <10069d0c0>
Apr 10 21:52:28 NAS kernel: next_to_watch.status <0>
Apr 10 21:52:28 NAS kernel: MAC Status <80083>
Apr 10 21:52:28 NAS kernel: PHY Status <796d>
Apr 10 21:52:28 NAS kernel: PHY 1000BASE-T Status <3800>
Apr 10 21:52:28 NAS kernel: PHY Extended Status <3000>
Apr 10 21:52:28 NAS kernel: PCI Status <10>

These are NIC issues. Try disabling TSO; this is a known issue with these NICs: https://bugzilla.kernel.org/show_bug.cgi?id=118721#c11
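For anyone following along, here is a minimal sketch of how the TSO suggestion could be checked and applied. It assumes the interface is eth0 (the name shown in the log above) and that ethtool is available on the server; `tso_state` is a hypothetical helper name, not part of ethtool.

```shell
#!/bin/bash
# Print the tcp-segmentation-offload state (for example "on" or "off")
# from `ethtool -k` output supplied on stdin.
# tso_state is a hypothetical helper name used only for this sketch.
tso_state() {
    awk -F': ' '$1 == "tcp-segmentation-offload" { print $2 }'
}

# Typical use on the server (needs root and the ethtool binary):
#   ethtool -k eth0 | tso_state    # show the current state
#   ethtool -K eth0 tso off        # disable TSO until the next reboot
#   ethtool -k eth0 | tso_state    # confirm the change
```

Note that lowercase `-k` only reads the offload settings, while uppercase `-K` changes them.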
April 11, 2025 - Author - Solution

Hi again,

I have restarted the system and run the command you suggested:

ethtool -K ethX tso off

I've also changed the network type of my Home Assistant VM from VirtIO to VirtIO-Net, as I've read it might cause instabilities.

I'm currently running the parity check, but I'm seeing some warnings in the logs related to ata7.00:

Quote
Apr 11 18:25:25 NAS kernel: ata7.00: exception Emask 0x10 SAct 0x7f0000 SErr 0x300000 action 0x6 frozen
Apr 11 18:25:25 NAS kernel: ata7.00: irq_stat 0x08000000, interface fatal error
Apr 11 18:25:25 NAS kernel: ata7: SError: { Dispar BadCRC }
Apr 11 18:25:25 NAS kernel: ata7.00: failed command: READ FPDMA QUEUED
Apr 11 18:25:25 NAS kernel: ata7.00: cmd 60/40:80:08:0b:96/05:00:57:00:00/40 tag 16 ncq dma 688128 in
Apr 11 18:25:25 NAS kernel: res 40/00:00:48:10:96/00:00:57:00:00/40 Emask 0x10 (ATA bus error)
Apr 11 18:25:25 NAS kernel: ata7.00: status: { DRDY }
Apr 11 18:25:25 NAS kernel: ata7.00: failed command: READ FPDMA QUEUED
Apr 11 18:25:25 NAS kernel: ata7.00: cmd 60/80:88:48:10:96/00:00:57:00:00/40 tag 17 ncq dma 65536 in
Apr 11 18:25:25 NAS kernel: res 40/00:00:48:10:96/00:00:57:00:00/40 Emask 0x10 (ATA bus error)
Apr 11 18:25:25 NAS kernel: ata7.00: status: { DRDY }
Apr 11 18:25:25 NAS kernel: ata7.00: failed command: READ FPDMA QUEUED
Apr 11 18:25:25 NAS kernel: ata7.00: cmd 60/40:90:c8:10:96/05:00:57:00:00/40 tag 18 ncq dma 688128 in
Apr 11 18:25:25 NAS kernel: res 40/00:00:48:10:96/00:00:57:00:00/40 Emask 0x10 (ATA bus error)
Apr 11 18:25:25 NAS kernel: ata7.00: status: { DRDY }
Apr 11 18:25:25 NAS kernel: ata7.00: failed command: READ FPDMA QUEUED
Apr 11 18:25:25 NAS kernel: ata7.00: cmd 60/40:98:08:16:96/05:00:57:00:00/40 tag 19 ncq dma 688128 in
Apr 11 18:25:25 NAS kernel: res 40/00:00:48:10:96/00:00:57:00:00/40 Emask 0x10 (ATA bus error)
Apr 11 18:25:25 NAS kernel: ata7.00: status: { DRDY }
Apr 11 18:25:25 NAS kernel: ata7.00: failed command: READ FPDMA QUEUED
Apr 11 18:25:25 NAS kernel: ata7.00: cmd 60/40:a0:48:1b:96/05:00:57:00:00/40 tag 20 ncq dma 688128 in
Apr 11 18:25:25 NAS kernel: res 40/00:00:48:10:96/00:00:57:00:00/40 Emask 0x10 (ATA bus error)
Apr 11 18:25:25 NAS kernel: ata7.00: status: { DRDY }
Apr 11 18:25:25 NAS kernel: ata7.00: failed command: READ FPDMA QUEUED
Apr 11 18:25:25 NAS kernel: ata7.00: cmd 60/40:a8:88:20:96/05:00:57:00:00/40 tag 21 ncq dma 688128 in
Apr 11 18:25:25 NAS kernel: res 40/00:00:48:10:96/00:00:57:00:00/40 Emask 0x10 (ATA bus error)
Apr 11 18:25:25 NAS kernel: ata7.00: status: { DRDY }
Apr 11 18:25:25 NAS kernel: ata7.00: failed command: READ FPDMA QUEUED
Apr 11 18:25:25 NAS kernel: ata7.00: cmd 60/40:b0:c8:25:96/05:00:57:00:00/40 tag 22 ncq dma 688128 in
Apr 11 18:25:25 NAS kernel: res 40/00:00:48:10:96/00:00:57:00:00/40 Emask 0x10 (ATA bus error)
Apr 11 18:25:25 NAS kernel: ata7.00: status: { DRDY }
Apr 11 18:25:25 NAS kernel: ata7: hard resetting link
Apr 11 18:25:25 NAS kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 11 18:25:25 NAS kernel: ata7.00: supports DRM functions and may not be fully accessible
Apr 11 18:25:25 NAS kernel: ata7.00: supports DRM functions and may not be fully accessible
Apr 11 18:25:25 NAS kernel: ata7.00: configured for UDMA/133
Apr 11 18:25:25 NAS kernel: ata7: EH complete
Apr 11 18:25:27 NAS kernel: usb 4-1: new SuperSpeed USB device number 5 using xhci_hcd
Apr 11 18:25:27 NAS kernel: usb-storage 4-1:1.0: USB Mass Storage device detected
Apr 11 18:25:27 NAS kernel: scsi host9: usb-storage 4-1:1.0
Apr 11 18:25:28 NAS kernel: scsi 9:0:0:0: Direct-Access Samsung Flash Drive 1100 PQ: 0 ANSI: 6
Apr 11 18:25:28 NAS kernel: sd 9:0:0:0: Attached scsi generic sg9 type 0
Apr 11 18:25:28 NAS kernel: sd 9:0:0:0: [sdj] 62656641 512-byte logical blocks: (32.1 GB/29.9 GiB)
Apr 11 18:25:28 NAS kernel: sd 9:0:0:0: [sdj] Write Protect is off
Apr 11 18:25:28 NAS kernel: sd 9:0:0:0: [sdj] Mode Sense: 43 00 00 00
Apr 11 18:25:28 NAS kernel: sd 9:0:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 11 18:25:29 NAS kernel: sdj: sdj1
Apr 11 18:25:29 NAS kernel: sd 9:0:0:0: [sdj] Attached SCSI removable disk
Apr 11 18:25:30 NAS unassigned.devices: Disk with ID 'Samsung_Flash_Drive_0312720100006541-0:0 ()' is not set to auto mount.
Apr 11 18:25:30 NAS emhttpd: Unregistered Plus - invalid key (EGUID)
Apr 11 18:25:52 NAS emhttpd: Samsung_Flash_Drive_0312720100006541-0:0 (sdj) 512 62656641
Apr 11 18:25:52 NAS emhttpd: read SMART /dev/sdj
Apr 11 18:37:55 NAS flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
Apr 11 18:46:37 NAS emhttpd: Plus key detected, GUID: 090C-1000-0309-222050002927 FILE: /boot/config/Plus.key
Apr 11 18:51:32 NAS kernel: usb 4-1: USB disconnect, device number 5
Apr 11 18:51:32 NAS kernel: sd 9:0:0:0: [sdj] Synchronizing SCSI cache
Apr 11 18:51:32 NAS kernel: sd 9:0:0:0: [sdj] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=DRIVER_OK
Apr 11 19:07:55 NAS flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update

Regards,

Edited April 11, 2025 by SP67
April 11, 2025 - Community Expert

20 minutes ago, SP67 said:
Dispar BadCRC

This is typically caused by a bad SATA cable. Replace the cable for ata7.
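One possible way to confirm whether the cable swap helped is to watch the drive's interface CRC counter, sketched below. This assumes smartctl is installed and that the drive reports SMART attribute 199 (commonly named UDMA_CRC_Error_Count), which not every drive does. `crc_error_count` is a hypothetical helper name, and /dev/sdX is a placeholder because the thread does not say which disk sits on ata7.

```shell
#!/bin/bash
# Extract the raw value (10th column) of SMART attribute 199 from
# `smartctl -A` output supplied on stdin.
# crc_error_count is a hypothetical helper name used only for this sketch.
crc_error_count() {
    awk '$1 == 199 { print $10 }'
}

# Typical use (replace /dev/sdX with the disk attached to ata7):
#   smartctl -A /dev/sdX | crc_error_count
```

The counter is cumulative and does not reset after a cable change, so the useful signal is whether the number keeps rising after the new cable is in place.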
April 11, 2025 - Author

OK, I've just done that. Do I need to run ethtool -K ethX tso off every time I restart the server?

EDIT: I've seen that I can add the command to the go file at /boot/config/go so that it runs automatically at start-up.

Regards!

Edited April 11, 2025 by SP67
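As a rough sketch of the go-file approach mentioned above, the helper below appends the ethtool line only if it is not already there, so running it twice does not duplicate the entry. `persist_tso_off` is a hypothetical helper name, and eth0 is an assumption taken from the earlier kernel log; the post itself only says ethX.

```shell
#!/bin/bash
# Append "ethtool -K <iface> tso off" to a go file unless that exact
# line is already present.
# persist_tso_off is a hypothetical helper; arguments: go-file path, interface name.
persist_tso_off() {
    local go_file="$1" iface="$2"
    local line="ethtool -K ${iface} tso off"
    # -q quiet, -x whole-line match, -F fixed string: prevents duplicate entries
    if ! grep -qxF -- "$line" "$go_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$go_file"
    fi
}

# Typical use on the server (eth0 is an assumption, adjust as needed):
#   persist_tso_off /boot/config/go eth0
```

It may be worth backing up /boot/config/go before editing it, since the file runs on every boot.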
April 11, 2025 - Author

Hello again,

One of my VMs won't start. Here's the XML file. The only thing I did was change the network type to VirtIO-Net, as I said.

Thanks!

<?xml version='1.0' encoding='UTF-8'?>
<domain type='kvm'>
  <name>Home Assistant</name>
  <uuid>950bf7bc-b0ee-0fde-2ca3-22f921ce8069</uuid>
  <description>VM for Home Assistant</description>
  <metadata>
    <vmtemplate xmlns="unraid" name="Linux" icon="default.png" os="linux"/>
  </metadata>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='2'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/qemu/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram>/etc/libvirt/qemu/nvram/950bf7bc-b0ee-0fde-2ca3-22f921ce8069_VARS-pure-efi.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='1' threads='2'/>
    <cache mode='passthrough'/>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/local/sbin/qemu</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='writeback'/>
      <source file='/mnt/user/domains/home_assistant/haos_ova-9.0.qcow2'/>
      <target dev='hdc' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:d2:23:76'/>
      <source bridge='br0'/>
      <model type='virtio-net'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' websocket='-1' listen='0.0.0.0' keymap='es'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <audio id='1' type='none'/>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </video>
    <hostdev mode='subsystem' type='usb' managed='no'>
      <source>
        <vendor id='0x10c4'/>
        <product id='0xea60'/>
      </source>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
</domain>
April 11, 2025 - Author

OK, I've re-created the VM with the same qcow file and it's working fine again!