February 23, 2024
Last week I found that one of my disks was disabled. I checked the SMART reports and didn't see any warnings or errors. The system kept running as usual, as I assume that disk was now emulated. Later that night I found that my server had gone offline (Docker services), and I was only able to get into the GUI for a short period of time. The first thing I noticed was tons of filesystem errors being reported across random drives. Secondly, the log was full. A very short time later, the system froze. I rebooted, formatted the failed disk, and started to rebuild. Everything was going fine until 8 hours into uptime. Same result: XFS errors across the board, the log blows out, and everything becomes unresponsive. All the array drives connect to the LSI card. This machine has been running without a hitch for a few months. I understand things run great until they don't, but it's unusual how it will run for a few hours and then start crapping out. Any ideas of where I should start looking? Has anyone else had this issue after upgrading? About the NIC error: I had port-based VLANs and need to disconnect that second card, as I reconfigured the switch. Diagnostics attached. Thanks in advance :) phobos-diagnostics-20240222-2012.zip
February 23, 2024 (Community Expert)
medium-tray6321 said: formatted the failed disk, and started to rebuild
If you formatted the disk in the array, the only thing it can rebuild is a formatted disk. Format is never part of a rebuild. You should never format a disk that has data you want to keep. Disks 8 and 9 have little or no data. Did you format these? Maybe high-water allocation just hadn't gotten to them yet, since they are so much smaller. All of your disks are mounted; disk8 is disabled. There have been no XFS errors since the reboot, but the previous syslog showed problems with multiple disks, including the write errors that disabled disk8. This suggests controller or possibly power problems. You also have macvlan call traces: https://docs.unraid.net/unraid-os/release-notes/6.12.8/#call-traces-and-crashes-related-to-macvlan
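Why a rebuild can only give back a formatted disk can be seen in a toy model of single parity. The sketch below assumes byte-wise XOR parity across the data disks (how a single parity disk is usually described) and uses tiny byte strings in place of real disks; the function names are hypothetical. While a disk is disabled, writes to it, including a format, are applied to parity instead, so parity ends up describing the formatted disk and the rebuild faithfully reproduces that.

```python
from functools import reduce


def xor_parity(disks):
    # Byte-wise XOR across all given disks (toy single-parity model).
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def reconstruct(disks, parity, idx):
    # Emulate/rebuild disk `idx` from parity plus the surviving disks.
    survivors = [d for i, d in enumerate(disks) if i != idx]
    return xor_parity(survivors + [parity])


def write_emulated(disks, parity, idx, new_data):
    # A write to a disabled disk only updates parity: P' = P ^ old ^ new.
    old = reconstruct(disks, parity, idx)
    return bytes(p ^ o ^ n for p, o, n in zip(parity, old, new_data))


disks = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = xor_parity(disks)

failed = 1
disks[failed] = None  # the disabled disk can no longer be read

# Before the format, the emulated disk still returns the original data.
emulated = reconstruct(disks, parity, failed)

# "Format" the emulated disk: an empty filesystem, modelled here as zeros.
parity = write_emulated(disks, parity, failed, b"\x00\x00")

# The rebuild now reproduces the formatted (empty) contents.
rebuilt = reconstruct(disks, parity, failed)
print(emulated, rebuilt)
```

In this model the emulated disk reads back as the original bytes until the format, after which the rebuild returns only zeros. The original data is gone from parity, not merely hidden.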
February 24, 2024 (Author)
disk8 was disabled. I brought the array down and added it back later. You're correct, disks 8 and 9 hold little data due to their size. We did have a power "blink" prior to all of this mess. I don't even know where to begin testing for that: motherboard? SAS adapter? PSU? Any ideas?
February 24, 2024 (Community Expert)
Looks like this controller is where the trouble is:
    05:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
        Subsystem: Broadcom / LSI 9207-8i SAS2.1 HBA [1000:3020]
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas
Maybe reseat it. Is it overheating?
February 24, 2024 (Author)
It has been running fine for over a year. I'll reseat it and try another slot as well.
February 25, 2024 (Author, Solution)
I pulled the 2nd GPU out and used that slot for the LSI card. So far, after a rebuild, everything is stable. I ordered a newer card just in case, but so far so good. Thank you for the suggestions. Yeah, we'll fix that macvlan issue too.