mganoe Posted November 21, 2021
As the title describes, I was in the middle of a drive rebuild on a precleared disk when my motherboard raised a high-temperature alarm on 2 CPUs. I paused the rebuild and shut the system down, thinking perhaps I had dust buildup in the case. After blowing it out I fired it back up to find another drive failing and 5 disks just missing from the system. I double-checked the cables and everything seems to be fine. My attached cards all seem to see the drives, but they do not show at all in Unraid (6.9.2) under the array devices. I have not tried to restart the array, because I am afraid it might do irreversible damage. Any thoughts would be most welcome at this point.
trurl Posted November 22, 2021
Attach diagnostics to your NEXT post in this thread.
mganoe Posted November 22, 2021 (Author)
Sorry, I should have included those in the first post.
gtower-diagnostics-20211121-1902.zip
trurl Posted November 22, 2021
Looks like you have dual parity and 17 data disks in the array, plus 2 cache and flash, for a total of 22 disks. Disk9 was rebuilding, but disk4 is missing. With dual parity you should be able to rebuild both.
Since you rebooted before getting the diagnostics, the syslog doesn't tell us anything before that. And I can't tell what disk was assigned as disk4, but there are several unassigned disks in your smart folder.
    09:00.0 IDE interface [0101]: JMicron Technology Corp. JMB368 IDE controller [197b:2368]
        Subsystem: JMicron Technology Corp. JMB368 IDE controller [197b:2368]
        Kernel driver in use: pata_jmicron
        Kernel modules: pata_jmicron
Do you actually have any IDE drives? If not, check in BIOS to use AHCI for all your disks.
    85:0e.0 RAID bus controller [0104]: Areca Technology Corp. ARC-1220 8-Port PCI-Express to SATA RAID Controller [17d3:1220]
        Subsystem: Areca Technology Corp. ARC-1220 8-Port PCI-Express to SATA RAID Controller [17d3:1220]
        Kernel driver in use: arcmsr
        Kernel modules: arcmsr
    87:00.0 RAID bus controller [0104]: Areca Technology Corp. ARC-1680 series PCIe to SAS/SATA 3Gb RAID Controller [17d3:1680]
        Subsystem: Areca Technology Corp. ARC-1222 8-Port PCIe to SAS/SATA 3Gb RAID Controller [17d3:1222]
        Kernel driver in use: arcmsr
        Kernel modules: arcmsr
RAID controllers are not recommended, and I suspect this is the main problem.
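For anyone who wants to repeat the check above on their own diagnostics, here is a minimal Python sketch that scans lspci-style text for the two controller classes called out in this post. The function name `flag_controllers` and the `SUSPECT_CLASSES` tuple are made up for illustration, and the sketch assumes the device header lines look like the ones quoted above (slot, class name, class code in brackets, then the device name).

```python
import re

# Controller classes to flag (assumption for this sketch: match on the class
# name text exactly as it appears in the lspci output quoted in the post).
SUSPECT_CLASSES = ("RAID bus controller", "IDE interface")

# Matches a device header line such as:
# "85:0e.0 RAID bus controller [0104]: Areca Technology Corp. ... [17d3:1220]"
DEVICE_RE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s+"
    r"(?P<cls>[^\[]+?)\s+\[[0-9a-f]{4}\]:\s+(?P<name>.+)$"
)


def flag_controllers(lspci_text):
    """Return (slot, class, name) tuples for RAID and IDE controllers."""
    flagged = []
    for line in lspci_text.splitlines():
        match = DEVICE_RE.match(line.strip())
        if match and match.group("cls") in SUSPECT_CLASSES:
            flagged.append(
                (match.group("slot"), match.group("cls"), match.group("name"))
            )
    return flagged


# Header and driver lines copied from the diagnostics quoted in this thread.
SAMPLE = """\
09:00.0 IDE interface [0101]: JMicron Technology Corp. JMB368 IDE controller [197b:2368]
    Kernel driver in use: pata_jmicron
85:0e.0 RAID bus controller [0104]: Areca Technology Corp. ARC-1220 8-Port PCI-Express to SATA RAID Controller [17d3:1220]
    Kernel driver in use: arcmsr
87:00.0 RAID bus controller [0104]: Areca Technology Corp. ARC-1680 series PCIe to SAS/SATA 3Gb RAID Controller [17d3:1680]
    Kernel driver in use: arcmsr
"""

if __name__ == "__main__":
    for slot, cls, name in flag_controllers(SAMPLE):
        print(f"{slot}: {cls}: {name}")
```

A flagged entry only means the controller is worth a second look; it does not by itself show that the controller is the cause of the missing disks.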
mganoe Posted November 22, 2021 (Author)
The array has all 24 slots filled with drives, which is what concerns me. The BIOS is set to use AHCI, and the RAID cards are only being used to connect all the drives, not as an actual RAID configuration. I moved to Unraid a few years back when I began to get concerned about the age of the cards. What is odd is that the part of the bank that has fallen off is disks 18, 19, 20, 21, 22, which now that I look are all on one RAID card, but 17 is registered just fine and it's also on that same RAID card. The disks you see as unassigned have always been unassigned, but were used in VMs once things spin up normally.
trurl Posted November 22, 2021
Post a screenshot of Main - Array Devices.
mganoe Posted November 22, 2021 (Author)
I mapped these out shortly before everything went south, so it should correctly show how things are supposed to look.
trurl Posted November 22, 2021
According to that screenshot of Main - Array Devices, nothing was assigned after disk17. Did you do New Config at some point?
mganoe Posted November 22, 2021 (Author)
No, that is how it came up on the very first reboot after the alarm. When I log into the BIOS of the RAID card I can see all the same disks are still attached, but for some reason they are just not visible to Unraid.
JorgeB Posted November 22, 2021
8 hours ago, mganoe said:
What is odd is the part of the bank that has fallen off are disks 18, 19, 20, 21, 22, which now that I look are all from one RAID card, but 17 is registered just fine and it's also on that same RAID card. The disks you see as unassigned have always been unassigned, but were used in VMs once things spin up normal.
JorgeB replied:
Disk 17 is on an LSI controller together with 7 other devices, and all are being detected. Also, unless you loaded an older Unraid config, there are only 17 data disks. Look at the screenshot you posted: there are 24 devices total, including cache devices, unassigned SSDs and disks, and an empty slot. You don't even have disk 17 there.
mganoe Posted November 22, 2021 (Author)
My issue is that 18, 19, 20, 21, 22 as well as 4 all show as unassigned, with no option to select the drives that are actually connected to them. I did not load a new config; what you see is how it came back online after the reboot. Prior to that reboot every slot had a drive assigned and nothing showed as unassigned. That said, I believe I did have 3 drives that were not part of the array. Now I'm just trying to determine how I get the missing drives back so I can restart the array.
mganoe Posted November 23, 2021 (Author)
I now see what everyone has been talking about as far as the counts go. I just finished going through and pulling all the drives to verify what was where. It appears the only drive Unraid is not picking up is the 4TB drive with serial number WD-WCC4E3CECN91, which I suspect should be in Disk 4. I have another drive on the way in the event the drive just died. Should I wait to receive the drive before spinning up the array and restarting the rebuild process on my other failed drive? Also, can I simply change the slot count on the Main page without any negative side effects? I have no idea how it got like that.
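One way to double-check from the console whether the kernel sees a drive with a given serial is to look through the names under /dev/disk/by-id. The Python sketch below does that. The helper names `find_serial` and `list_by_id` are hypothetical, and the sketch assumes the controller passes the drive's serial through to the operating system, which may not hold for disks behind a RAID card.

```python
import os

# Standard udev symlink directory on Linux; entries usually embed the
# drive's model and serial number in the file name.
BY_ID_DIR = "/dev/disk/by-id"


def find_serial(serial, names):
    """Return whole-disk by-id names containing the serial (partitions skipped)."""
    return sorted(n for n in names if serial in n and "-part" not in n)


def list_by_id(path=BY_ID_DIR):
    """List by-id entries, or return an empty list if the directory is absent."""
    return os.listdir(path) if os.path.isdir(path) else []


if __name__ == "__main__":
    # Serial number taken from the post above.
    hits = find_serial("WD-WCC4E3CECN91", list_by_id())
    print(hits if hits else "serial not detected by the kernel")
```

An empty result is consistent with a dead drive, but it could equally point to a cable, backplane, or controller problem, so it is only a first check.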
trurl Posted November 23, 2021
If you wait for the other drive, you can rebuild both at the same time. Actually, you can use your server with both disabled, though there is no protection. Or go ahead and rebuild one; then you would be back to having dual parity with only one disk disabled, and so have single protection. You can change the slot count.
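The protection arithmetic in this reply can be written down as a tiny Python sketch. The function name `remaining_protection` is made up for illustration, and the sketch assumes the simple rule described in the thread: each parity disk covers one missing or disabled array disk.

```python
def remaining_protection(parity_disks, disabled_disks):
    """Number of further disk failures the array could still absorb.

    Assumes each parity disk covers exactly one disabled or missing disk.
    """
    if parity_disks < 0 or disabled_disks < 0:
        raise ValueError("counts must not be negative")
    if disabled_disks > parity_disks:
        raise ValueError("more disks disabled than parity can reconstruct")
    return parity_disks - disabled_disks


if __name__ == "__main__":
    # Dual parity with disk4 and disk9 both disabled: no protection left.
    print(remaining_protection(2, 2))
    # Dual parity after rebuilding one of them: single protection.
    print(remaining_protection(2, 1))
```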
mganoe Posted November 23, 2021 (Author)
I think I'll wait for the other drive and then for the preclear to complete. Thanks for all the help. You both talked me off the ledge.