July 12, 2024 Hi all, About a week ago we had a power outage. Unfortunately, both of my UPSes were (unknown to me) having issues and failed, so my server lost power. Since then I have been having some issues. The first issue is that my server has been randomly locking up. The WebUI becomes inaccessible, and if I log in to my KVM there is video output but the server takes no input from the keyboard or mouse. I have to hard reset the machine to get it back up. It has happened probably two or three times this week, at random times. The second issue is that I have a PCIe NIC in the server with two 10G SFP ports, but Unraid only sees one of the ports in the networking section. (This could be a holdover from replacing my USB drive the week before, but I didn't notice that my link aggregation wasn't working and the interface was missing until the outage.) The strange thing is that I can see both ports in System Devices. I have VMs disabled and no devices assigned to IOMMU groups. If anyone has any suggestions on where to start looking, I would be very grateful. Anonymized diagnostics attached. (I am aware that I have a couple of disks on their way out. I am in the process of moving data off disk 11 so I can swap it with a cold spare I have.) glados-diagnostics-20240712-1732.zip
July 13, 2024 Community Expert For the NIC: Jul 12 14:36:05 GLaDOS kernel: ixgbe 0000:08:00.1: failed to load because an unsupported SFP+ or QSFP module type was detected. Regarding the crashing, enable the syslog server and post the resulting log after a crash.
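
For anyone who wants to check their own log for the same message, here is a minimal sketch in Python. It assumes the log uses the same wording as the line quoted above; the function name `find_unsupported_sfp` and the idea of feeding it a list of log lines are illustrative choices, not anything Unraid ships.

```python
import re

# Pattern built from the kernel message quoted in the post above.
# The PCI address (e.g. 0000:08:00.1) identifies which port was rejected.
UNSUPPORTED_SFP = re.compile(
    r"ixgbe (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"failed to load because an unsupported SFP\+ or QSFP module type was detected"
)


def find_unsupported_sfp(lines):
    """Return (line_number, pci_address) for every matching log line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = UNSUPPORTED_SFP.search(line)
        if match:
            hits.append((number, match.group("pci")))
    return hits


# Example with the line from this thread; a real run would pass the
# lines of the syslog file from the diagnostics zip instead.
sample = [
    "Jul 12 14:36:05 GLaDOS kernel: ixgbe 0000:08:00.1: failed to load "
    "because an unsupported SFP+ or QSFP module type was detected."
]
for number, pci in find_unsupported_sfp(sample):
    print(f"line {number}: port {pci} rejected its SFP module")
```

If the message shows up on one boot and not on another, comparing the hits across several saved logs may help tell a flaky module from a driver or naming problem.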
July 13, 2024 Author Weird that it's saying it's unsupported, because I believe both SFP modules have been installed and working for months. I just swapped the module out for another one and it appears to be working. I have enabled the syslog server and will post another update when/if it crashes again. I have swapped out disk 11 because the reallocated sector count was so high. The array is rebuilding now, but that typically takes a few days. I also have another cold spare and will probably swap out one of the other two disks that have lower reallocated sector counts. Thanks for the follow-up.
July 16, 2024 Author @JorgeB Thanks for your help. No further random crashes, so I have no idea what is going on there. New UPS batteries are in transit, so hopefully no more power outages. I did lose another disk tonight and ended up using my last cold spare for it. When the disk went down I rebooted, just to make sure it wasn't a random power issue or something, and when the server came back up the other NIC port was missing again. One more reboot and it came back, but as eth2 this time for some reason. I have no eth1 now, just eth0, eth2, and eth3. I checked the syslog for the kernel error you originally replied with, and nothing remotely close to it was in the log, so I have no idea what is going on there.
July 24, 2024 Author @JorgeB The server finally had another random crash, of course with my data rebuild 98% complete. I have attached the syslog as requested. It's kind of weird. I am still confirming the exact time, but it looks like the lockup happened at line 3070. The timestamp on that line aligns with the hang, after which I had to manually hit the reset button on the server. The next line isn't logged until 15 minutes later, and there is nothing to indicate what caused the actual issue. Any ideas? I REALLY need this data rebuild to finish. syslog-127.0.0.1.log
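
A quick way to locate a hang like this in a long syslog is to look for unusually large jumps between consecutive timestamps. The sketch below is only an illustration: it assumes classic syslog stamps of the form `Jul 12 14:36:05` at the start of each line (as in the kernel line quoted earlier in the thread), an English locale for month names, and a caller-supplied year, since that format does not record one. The name `find_gaps` and the 5-minute threshold are arbitrary choices.

```python
from datetime import datetime, timedelta


def find_gaps(lines, year, threshold=timedelta(minutes=5)):
    """Yield (line_number, before, after) for each timestamp gap above threshold.

    line_number is the 1-based number of the last line logged before the gap.
    """
    previous = None  # (line_number, timestamp) of the last parsed line
    for number, line in enumerate(lines, start=1):
        try:
            # The first 15 characters hold the stamp, e.g. "Jul 12 14:36:05".
            current = datetime.strptime(
                f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a timestamped line; ignore it
        if previous is not None and current - previous[1] > threshold:
            yield previous[0], previous[1], current
        previous = (number, current)


# Made-up lines that mimic a 15-minute silence followed by a reboot.
sample = [
    "Jul 24 01:00:00 GLaDOS kernel: last message before the hang",
    "Jul 24 01:15:00 GLaDOS kernel: first message after the reset",
]
for number, before, after in find_gaps(sample, 2024):
    print(f"gap after line {number}: {before} -> {after}")
```

The lines just before a reported gap are the ones worth reading closely. If nothing unusual appears there, as in this case, the log alone cannot say what caused the hang.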
July 24, 2024 Community Expert Unfortunately there's nothing relevant logged, so this could be a hardware issue. One thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.