March 20, 2025 I did an initial search for something similar, but the closest match turned out to be a power connection problem for someone else. I suspect a bad SAS controller on my Supermicro X10DRH-CT, but would like someone with better eyes to take a look. After logging in, I noticed that not only are both of my parity drives disabled, but so is a recently installed 14TB drive that replaced a drive that had failed earlier. The 24TB parity drives are also only a few months old. Granted, a high number of errors have been logged since I last looked at the dashboard, but that was only a few days ago. Thinking back, the board-mounted SAS controller may be failing: I had been using an old SAS2 9200 card to rule out any odd behavior, and it wasn't until after I connected everything back to the board a few days ago that I noticed this. Would anyone mind confirming? Also, is there any chance I can just boot into maintenance mode, unmount the parity drives, and then remount them to rebuild parity, or, because of the order of things, are they in fact cooked? mainframe-diagnostics-20250320-1759.zip
March 20, 2025 Community Expert You don't have to do the rebuild in maintenance mode, but you do have to get the drives connected again, obviously.
10 minutes ago, Mtrx said: "also a recently installed 14TB drive"
Are you referring to disk8?
March 20, 2025 Author I had a feeling it wouldn’t be too involved, but the last time this happened I ended up rebuilding parity multiple times after losing the fs partition on a few different drives. I’m not sure whether a bad connection ultimately corrupted the fs or what, but this looks very similar. The fact that both drives logged multiple errors despite being so new leaves me skeptical. Yes, that’s correct. I believe it was disk 8 and disk 6 that are 14TB. Both were recent replacements for failed drives. So I just unassign the drives in maintenance mode and then reassign them after starting the array again? I’m making sure I don’t bork something like last time, “smh”.
March 20, 2025 Community Expert The thing is, I don't see parity, parity2, or disk8 in the smart folder of your diagnostics, nor any unassigned disks that might have been assigned to those slots, which usually means they aren't connected.
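For anyone who wants to repeat that check on their own diagnostics, here is a minimal Python sketch. It assumes the zip holds one SMART report per file under a folder named smart and that each file name contains the slot name (such as parity2 or disk8) as a separate token. That naming is an assumption, not something confirmed in this thread, and `missing_smart_slots` and `missing_from_zip` are hypothetical helper names.

```python
import re
import zipfile


def missing_smart_slots(names, expected_slots):
    """Return the expected slots that have no file in a 'smart' folder.

    names: archive member paths, e.g. from ZipFile.namelist().
    expected_slots: slot names such as "parity", "parity2", "disk8".
    Assumption: the slot name appears in the file name as a whole token.
    """
    smart_files = []
    for name in names:
        parts = name.replace("\\", "/").split("/")
        # Keep only real files whose parent folder is called "smart".
        if len(parts) >= 2 and parts[-2] == "smart" and parts[-1]:
            smart_files.append(parts[-1].lower())

    missing = []
    for slot in expected_slots:
        # Whole-token match, so "parity" does not match "parity2"
        # and "disk1" does not match "disk10".
        pattern = re.compile(
            r"(?<![a-z0-9])" + re.escape(slot.lower()) + r"(?![a-z0-9])"
        )
        if not any(pattern.search(f) for f in smart_files):
            missing.append(slot)
    return missing


def missing_from_zip(zip_path, expected_slots):
    """Run the same check directly against a diagnostics zip on disk."""
    with zipfile.ZipFile(zip_path) as zf:
        return missing_smart_slots(zf.namelist(), expected_slots)
```

Calling `missing_from_zip` with the path to the zip and the slots you expect returns the ones with no SMART report, which is a hint (not proof) that those disks were not visible when the diagnostics were taken.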
March 20, 2025 Author Interesting… I’m a first-time poster and long-time reader, so I didn’t actually know what was included in the diagnostics, but I saw they were always requested to help narrow down troubleshooting. I did not reboot the server, as I know the logs get wiped on reboot since the OS lives in memory. Would an intermittent connection cause errors? I.e., Unraid trying to “ping” or write to the drive but not receiving an I/O response or return signal? Both are still displayed in the dashboard, and I can kind of read the disk log, or at least the last log available, but they show up as emulated/disabled.
March 20, 2025 Community Expert
8 minutes ago, Mtrx said: "didn’t know what was attached in the diagnostics"
They are mostly text, and I encourage everyone to examine them. Some logs are in the logs folder, the output of some commands is in the system folder, user share settings are in the shares folder, and other settings are in the config folder.
Both parity disks are disabled, but of course it can't disable disk8 as well, since Unraid will not disable more disks than there are parity disks.
Your syslog has rotated more times than the diagnostics include, probably because of all the errors being logged for those disks. The oldest, syslog2.txt, starts with problems communicating with one parity disk, then soon after has problems connecting to the other parity disk. The most recent, syslog.txt, has similar problems communicating with disk8. These do indeed look like controller issues.
In the diagnostics, system/lsscsi.txt shows the disks and how they are connected, and system/lspci.txt shows your PCI hardware. All 3 of those disks (in fact, it looks like all HDDs) were using this controller:
01:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] [1000:005d] (rev 02)
    Subsystem: Super Micro Computer Inc Device [15d9:0a09]
    Kernel driver in use: megaraid_sas
    Kernel modules: megaraid_sas
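As a starting point for exploring the archive layout described above, this small Python sketch lists the files in a diagnostics zip grouped by the folder they sit in. It only relies on the standard `zipfile` module; `files_by_folder` is a hypothetical helper name, and the assumption that every file sits directly inside a folder such as logs, system, shares, config, or smart is taken from the description in this thread rather than verified.

```python
import zipfile
from collections import defaultdict


def files_by_folder(zip_source):
    """Map each folder name in the archive to a sorted list of its files.

    zip_source: a path or a file-like object, as accepted by ZipFile.
    Only the immediate parent folder of each file is used as the key.
    """
    grouped = defaultdict(list)
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries, keep real files only
            parts = info.filename.split("/")
            folder = parts[-2] if len(parts) >= 2 else "(top level)"
            grouped[folder].append(parts[-1])
    return {folder: sorted(files) for folder, files in grouped.items()}
```

Printing the result for the `system` key is a quick way to confirm that files such as lsscsi.txt and lspci.txt are present before opening them in a text editor.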
March 21, 2025 Author As I think we have both possibly concluded, it could very well be the controller, since all the disks except the UA devices are connected to the MegaRAID SAS controller via two SAS HD connections. I was unaware that Unraid could only disable a few devices before ceasing to function. However, can you confirm that there was some sort of communication interruption with the parity drives? Tbh I wouldn’t necessarily know what that looks like, apart from a simple “error” that looks like some sort of corruption or failure. I have hardly delved into how the logs are written. From here I will attempt to swap in the older LSI controller and then see about switching to maintenance mode to reassign the drives. Hopefully that is all there is to it. If both parity drives are “reassigned”, will it then do a parity rebuild or sync, or just pick up where it left off?
March 21, 2025 Community Expert
43 minutes ago, Mtrx said: "If both parity are “reassigned”, then it then do a parity rebuild"
Yes.
March 21, 2025 Community Expert Solution MegaRAID cards are not recommended; see this quote from the linked post:
"Note: RAID controllers are not recommended for Unraid, this includes all LSI MegaRAID models, doesn't mean they cannot be used but there could be various issues because of that, like no SMART info and/or temps being displayed, disks not being recognized by Unraid if the controller is replaced with a different model, and in some cases the partitions can become invalid, requiring rebuilding all the disks."
March 21, 2025 Author
1 hour ago, MowMdown said: "MegaRAID cards are not recommended, see quote from the linked post:"
I do recall reading about RAID cards vs. HBAs when I first got into Unraid. However, I was able to change the internal “RAID” controller config to JBOD, and I could see each drive and read SMART data, so I figured all was good. Granted, the controller is built into the board, which was one of the reasons I picked the board, but I never thought too much about it. It wasn’t until very recently (the last few months) that I started getting these weird errors and failures.
March 21, 2025 Author
13 hours ago, trurl said: "yes"
Well, in order to restore parity again, I “must” reassign them… unless there is a different course of action? I also wonder what it will do about disk 8, which is supposedly “disabled” 🤷♂️
March 21, 2025 Author Update: After disconnecting the onboard MegaRAID connections and reinstalling the LSI SAS2008 card, everything was detected upon reboot. Disk 8 seems to have cleared whatever errors it had. The two parity drives were still disabled, but I could read all of their SMART data without issues. I stopped the array, then unassigned and reassigned the two parity drives in exactly the same slots, for whatever it was worth. I do have to do a parity sync/rebuild, but I kind of figured that. I can definitely tell the difference between SAS2 and SAS3, but only by a little bit: rebuild time is about 1.5 days vs. 20 hrs or so, but that is more than adequate. Will report back and close this issue out once the rebuild is done... wish me luck.
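To put those two rebuild times in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes the rebuild covers the full 24TB parity size mentioned earlier in the thread, uses decimal units, and treats "1.5 days" as 36 hours. The result is only an average: real throughput varies across the disk and depends on the slowest drive in the array. `average_rate_mb_s` is a hypothetical helper name.

```python
def average_rate_mb_s(capacity_tb, hours):
    """Average throughput in MB/s, using 1 TB = 10**12 bytes and 1 MB = 10**6 bytes."""
    total_bytes = capacity_tb * 1e12
    seconds = hours * 3600
    return total_bytes / seconds / 1e6


# 24 TB parity, roughly 20 hours vs. roughly 1.5 days (36 hours)
print(f"~20 h rebuild: {average_rate_mb_s(24, 20):.0f} MB/s")  # ~333 MB/s
print(f"~36 h rebuild: {average_rate_mb_s(24, 36):.0f} MB/s")  # ~185 MB/s
```

Under those assumptions the difference works out to roughly 333 MB/s versus 185 MB/s on average.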
March 23, 2025 Author Parity check/rebuild complete! No errors detected. In hindsight, I had figured that the onboard MegaRAID controller on my Supermicro X10 board would be fine since I went through the settings to configure it as JBOD, but the evidence suggests that either one or more cables to the controller are bad or the controller on the motherboard itself is fading. Either way, the flaky connection introduced substantial errors, and I would not be surprised if it did indeed cause the FS corruption that made two drives fail in the past. For now, I'm sticking with a SAS2 2008 card that seems to be chugging along just fine. Remember: "If it ain't broke, don't fix it."