October 21, 2024 I was able to get the syslog server set up yesterday, and everything was working fine, then it crashed today. It has been crashing frequently. Like, full stop. I have no idea why, though if I'm reading this correctly, is it because of an issue with my GPU? Currently on version 6.12.13. Thanks. UnRaid Syslog.txt
October 21, 2024 Community Expert SVThuh said: is it because of an issue with my GPU?
Looks like it may be:
192.168.1.101 Oct 20 11:54:38 Tower kern warning kernel NVRM: Xid (PCI:0000:01:00): 140, pid='<unknown>', name=<unknown>, An uncorrectable ECC error detected (possible firmware handling failure) DRAM:1227137661, LTC:0, MMU:0, PCIE:0
This device is the GPU, and after that there are NVIDIA-related call traces. Retest without the GPU (or just without the NVIDIA driver installed).
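As a rough sketch (not an Unraid or NVIDIA tool, and assuming the remote syslog has been saved as a plain text file), the NVIDIA Xid reports can be pulled out of a long syslog so you can see which PCI device they refer to and how often they occur. The names `find_xid_events` and `XidEvent` are hypothetical, and the regex only covers the message format quoted above.

```python
import re
from typing import Iterable, NamedTuple

# Matches NVIDIA driver Xid reports in the format seen above, e.g.
#   NVRM: Xid (PCI:0000:01:00): 140, pid='<unknown>', ...
# Other driver versions may format this line differently (assumption).
XID_RE = re.compile(r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<code>\d+)")


class XidEvent(NamedTuple):
    pci_address: str  # PCI address of the GPU reporting the error
    xid: int          # numeric Xid code
    line: str         # the full original log line


def find_xid_events(lines: Iterable[str]) -> list[XidEvent]:
    """Return every NVIDIA Xid report found in the given syslog lines."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            events.append(
                XidEvent(match.group("pci"), int(match.group("code")), line.rstrip("\n"))
            )
    return events
```

For example, `find_xid_events(open("UnRaid Syslog.txt", errors="replace"))` would list each Xid with its PCI address, which can then be matched against the device list in the diagnostics.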
October 21, 2024 Author @JorgeB Will do. Thank you for the quick response. Hoping to get this figured out.
October 22, 2024 Author @JorgeB, thank you for the advice about the GPU. After removing the GPU, the system seemed a bit more stable, but then I had a drive become disabled. At least I was able to pull diagnostics this time. It seemed to lock up hard, but then it came back and the disk was gone. The disks are all connected via an HBA. I also keep getting this error in my syslog, repeatedly: "pcieport 0000:02:02.0: Unable to change power state from D3cold to D0, device inaccessible" Thank you for your help! tower-diagnostics-20241022-1338.zip Edited October 22, 2024 by SVThuh
October 23, 2024 Community Expert
Oct 22 06:13:53 Tower kernel: mpt2sas_cm0: mpt3sas_base_hard_reset_handler: FAILED
Oct 22 06:13:53 Tower kernel: mpt2sas_cm0: _base_fault_reset_work: hard reset: failed
Problem with the HBA; try using a different PCIe slot.
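A minimal sketch for checking whether the HBA keeps failing its resets after being moved to another slot: it counts failed mpt2sas/mpt3sas controller resets per controller in a saved syslog. The helper name `failed_hba_resets` is hypothetical, and the pattern only matches the two message formats quoted above; other driver versions may word them differently.

```python
import re
from typing import Iterable

# Matches the two failed-reset messages quoted above, e.g.
#   mpt2sas_cm0: mpt3sas_base_hard_reset_handler: FAILED
#   mpt2sas_cm0: _base_fault_reset_work: hard reset: failed
# The "ctrl" group captures the controller instance name (e.g. mpt2sas_cm0).
RESET_FAIL_RE = re.compile(
    r"(?P<ctrl>mpt[23]sas_cm\d+):.*(?:hard_reset_handler: FAILED|hard reset: failed)",
    re.IGNORECASE,
)


def failed_hba_resets(lines: Iterable[str]) -> dict[str, int]:
    """Count failed hard-reset messages per HBA controller instance."""
    counts: dict[str, int] = {}
    for line in lines:
        match = RESET_FAIL_RE.search(line)
        if match:
            ctrl = match.group("ctrl")
            counts[ctrl] = counts.get(ctrl, 0) + 1
    return counts
```

An empty result on a log covering a crash would suggest the HBA reset failures are no longer the trigger, though it does not rule out other hardware problems.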
October 25, 2024 Author Thanks @JorgeB. I moved the HBA, but I'm still seeing a disabled drive. Is my HBA borked? Should I try that drive with a SATA cable? Thanks for all your help! ETA: I ran the filesystem check with the -n flag and got the following: tower-diagnostics-20241024-1930.zip Edited October 25, 2024 by SVThuh
October 25, 2024 Community Expert Once a disk gets disabled, it needs to be rebuilt. If the emulated disk is still mounting and its contents look correct, you can rebuild on top of it: https://docs.unraid.net/unraid-os/manual/storage-management#rebuilding-a-drive-onto-itself
October 26, 2024 Author @JorgeB, sorry to keep bothering you, but I'm still having hard crashes. I don't know what else to do. Thanks for all your help! 10.27.24 syslog.txt
October 27, 2024 Community Expert Solution
192.168.1.101 Oct 26 04:21:57 Tower user notice root cp: cannot create regular file '/boot/config': Input/output error
192.168.1.101 Oct 26 04:21:57 Tower kern err kernel FAT-fs (sda1): Directory bread(block 30584) failed
192.168.1.101 Oct 26 04:21:57 Tower kern err kernel FAT-fs (sda1): Directory bread(block 30585) failed
Flash drive issues. You can try recreating it first; if the issues persist, replace it.
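As a hedged sketch, assuming the Unraid flash drive is mounted at /boot (as the cp error above suggests), the following scans a saved syslog for the two symptoms quoted above: FAT filesystem errors from the kernel and I/O errors on paths under /boot. The function name `flash_errors` is hypothetical and only these two message shapes are recognized.

```python
import re
from typing import Iterable

# Kernel FAT driver errors, e.g. "FAT-fs (sda1): Directory bread(block 30584) failed";
# the "dev" group captures the partition name.
FAT_ERR_RE = re.compile(r"FAT-fs \((?P<dev>[^)]+)\)")
# Userspace I/O errors on the flash mount (assumed to be /boot), e.g.
# "cp: cannot create regular file '/boot/config': Input/output error"
BOOT_IO_RE = re.compile(r"'(?P<path>/boot[^']*)': Input/output error")


def flash_errors(lines: Iterable[str]) -> tuple[set[str], set[str]]:
    """Return (devices with FAT-fs errors, /boot paths that hit I/O errors)."""
    fat_devices: set[str] = set()
    boot_paths: set[str] = set()
    for line in lines:
        if match := FAT_ERR_RE.search(line):
            fat_devices.add(match.group("dev"))
        if match := BOOT_IO_RE.search(line):
            boot_paths.add(match.group("path"))
    return fat_devices, boot_paths
```

If these sets stay empty after the flash drive is recreated or replaced, the flash is probably no longer the source of the errors.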