August 27, 2018
Hoping someone with more expertise can help figure out why the same 2 slots keep dropping out of the array.

Running unRaid Pro 6.5.3 on:
Supermicro SC847
X8-DT6 mobo, 2 x E5530
32GB ECC RAM
SAS2-846EL1 backplane
9211-8i in IT mode
Dual 8TB parity

I have run Extended SMART tests on both drives and no errors were reported. I just replaced my SAS1 backplane with the SAS2 one, hoping that would solve my issue. The data is still on the drives: I can move them to another server and view all the contents, and no errors are reported on the other machine. I swapped out the HBA for a different 9211-8i with the same result. Moving the drives to the 12-bay backplane made no difference.

I keep dropping Disk 8 and Disk 5. Disk 8 is a 3 month old Iron Wolf Pro 6TB and Disk 5 is a WD RE 4TB. Both were purchased new. I swapped the 4TB RE for a different 4TB RE and it dropped from the same array slot number even though it was in a different hot-swap slot.

Any help would be greatly appreciated.

hoskins-fs-diagnostics-20180827-1220.zip
August 27, 2018
SMART reports are fine for both Disk 5 and Disk 8, though Disk 8 is getting rather hot. Your diagnostics were grabbed only shortly after a reboot, so the syslog covers less than a minute and shows a normal startup. Try grabbing diagnostics again after the disks are dropped, but I expect they will show the HBA losing communication with the disks. It's going to be a problem with something physical between the HBA and the disks, or else with the power supply to the disks.
August 27, 2018 Author
Thanks John. I noticed as well that my drives were getting awfully warm. I am only using 14 out of 24 bays so I need to spread them out in my case to get better airflow over each drive. It is currently rebuilding both drives, so once that completes I will wait until they drop and grab diagnostics as soon as I get the email notifying me of the error state.

Not sure how I will test my power supply since it is a redundant hot-swap power supply with a distributor that connects everything. I made sure all 6 power connectors were fully seated when I replaced the backplane. I will double check each connector again to be sure they are all fully connected. Since they are both NAS/Enterprise drives, getting power from a backplane shouldn't be an issue, should it?
August 28, 2018
15 hours ago, kevin_h said: "I am only using 14 out of 24 bays so I need to spread them out in my case to get better airflow over each drive."

In some situations, empty slots can give worse airflow: there is less resistance for the air to move through the empty slots, so it goes that way instead of being forced through the narrow channels between disks.
August 29, 2018 Author
So apparently I tried to post my reply last night during the maintenance period. The same 2 disk slots dropped again last night. I noticed that the HBA did lose communication with the disks and re-added them under different sdX disk names. I am able to successfully rebuild the disks each time. Attached are the diagnostics from right after they dropped.

hoskins-fs-diagnostics-20180828-2008.zip
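Since the dropped disks came back under different sdX names, it can help to track them by a persistent identifier rather than by kernel name. The following is a minimal sketch, assuming a Linux system that exposes `/dev/disk/by-id` symlinks (udev-based systems generally do). The function names `map_disks_by_id` and `renamed` are hypothetical helpers for illustration, not part of unRaid.

```python
import os


def map_disks_by_id(by_id_dir="/dev/disk/by-id"):
    """Return {persistent id: kernel device name}, e.g. {'ata-...': 'sdc'}."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        # Skip partition links such as '...-part1'; keep whole disks only.
        if "-part" in name:
            continue
        # Each entry is a symlink to the current kernel device node.
        target = os.path.realpath(os.path.join(by_id_dir, name))
        mapping[name] = os.path.basename(target)
    return mapping


def renamed(before, after):
    """Return ids whose kernel name differs between two snapshots."""
    return {
        disk_id: (before[disk_id], after[disk_id])
        for disk_id in before
        if disk_id in after and before[disk_id] != after[disk_id]
    }


# Usage (assumption: run once while the array is healthy, once after a drop):
#   snapshot_a = map_disks_by_id()
#   ... disks drop and reconnect ...
#   snapshot_b = map_disks_by_id()
#   print(renamed(snapshot_a, snapshot_b))
```

Comparing a snapshot taken while the array is healthy with one taken after a drop would show which physical disks were re-enumerated, independent of the sdX letters they happen to receive.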
September 1, 2018
On 8/29/2018 at 8:16 PM, kevin_h said: "I noticed that the HBA did lose communication with the disks and re-added them under different sdX disk names."

That means the communication was broken and then later restored, so you have something intermittent, either in the data link between the HBA and the disks or in the power supply to the disks. A momentary loss of power would have a similar effect - dropped disks followed by reconnection.

Because of the changes to the forum I can't access the diagnostics you attached to your OP (I can access the latest ones just fine though). It might be worth comparing the SMART reports of the two disks in question to see if they have racked up more power cycles (SMART parameter 12) and/or power-off head retracts (SMART parameter 192), as these might indicate a power problem. Also, an increase in the UDMA CRC error count (SMART parameter 199) would indicate a data cable problem.

I think, from your description, you have quite a lot in the path between the HBA and the disks. Can you bypass any of it temporarily, or use a motherboard SATA port for the purpose of testing?
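The comparison described above can be sketched as a small script. It assumes the two SMART reports are the plain-text attribute tables printed by `smartctl -A`, where each row starts with the attribute ID and the raw value is the tenth column. The names `WATCHED`, `parse_smart_attributes` and `watched_deltas` are hypothetical, and the labels are taken from this post rather than from any vendor documentation.

```python
# Attributes suggested in the post, with the meaning given there.
WATCHED = {
    12: "power cycles",
    192: "power-off head retracts",
    199: "UDMA CRC errors",
}


def parse_smart_attributes(text):
    """Extract {attribute id: raw value} from `smartctl -A` table text."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with 'ID#' and is skipped by the isdigit check.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            # Some raw values are not plain integers; those rows are ignored.
            if raw.isdigit():
                values[int(fields[0])] = int(raw)
    return values


def watched_deltas(before_text, after_text):
    """Return {attribute id: increase} for the watched attributes."""
    before = parse_smart_attributes(before_text)
    after = parse_smart_attributes(after_text)
    return {
        attr: after[attr] - before[attr]
        for attr in WATCHED
        if attr in before and attr in after
    }
```

A rise in 12 or 192 beyond the number of deliberate reboots would point towards power, while any rise in 199 would point towards the data path.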
September 2, 2018 Author
Thanks again John. I have moved the 2 slots to be directly connected to the motherboard SATA connectors, with a Molex>SATA power dongle, to help eliminate any issues that the SAS backplane may introduce, and I am in the process of rebuilding the 2 disks again. It'll probably be ~24 hrs before I know if the change had any effect.

I tried rolling back from 6.5.3 -> 6.5.2 and that did not help. Once the rebuild is complete I will upgrade back to 6.5.3. I also tried adjusting the timeout settings within the 9211 firmware and that had no effect.

I compared the SMART attributes you mentioned and I am not seeing a dramatic difference. I have rebooted a few times since the diagnostics were first taken. From first diagnostic to current:

Iron Wolf Pro (ST6000NE0021)
SMART Attribute 12: First 10, Current 14
SMART Attribute 192: First 29, Current 33
SMART Attribute 199: First 0, Current 0

WD RE (WD4000FYYZ *W7)
SMART Attribute 12: First 71, Current 75
SMART Attribute 192: First 64, Current 68
SMART Attribute 199: First 0, Current 0

I guess my next test is to use my 1U server as the host and my 4U as a DAS with a 9211-8e to see if it's motherboard related. I have a power board that can boot the backplanes/HDs with no mobo.

I am reattaching my original diagnostics zip file so it is accessible again after the forum change.

hoskins-fs-diagnostics-20180827-1220.zip
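For reference, the arithmetic on the figures posted above can be written out as follows. This is only a sketch over the numbers in this post; the dictionary layout and the `deltas` helper are hypothetical.

```python
# (first, current) raw values as posted, keyed by drive model and attribute ID.
readings = {
    "ST6000NE0021": {12: (10, 14), 192: (29, 33), 199: (0, 0)},
    "WD4000FYYZ": {12: (71, 75), 192: (64, 68), 199: (0, 0)},
}


def deltas(readings):
    """Return {drive: {attribute id: current - first}}."""
    return {
        drive: {attr: current - first for attr, (first, current) in attrs.items()}
        for drive, attrs in readings.items()
    }


for drive, change in deltas(readings).items():
    print(drive, change)
# ST6000NE0021 {12: 4, 192: 4, 199: 0}
# WD4000FYYZ {12: 4, 192: 4, 199: 0}
```

Both drives gained 4 on attributes 12 and 192 and nothing on 199. That is at least consistent with the handful of reboots mentioned, and the unchanged CRC count gives no sign of a data cable fault, though these counters alone may not separate a deliberate reboot from a brief power loss.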