galways Posted September 4, 2015

Hi, ever since upgrading to 6.1.0 I have had multiple instances of faulty disks. I rebuild the faulty disk, and soon after I start a parity check another disk shows as faulty. I have now rebuilt 3 different disks, and sure enough another faulty disk has shown up. I replaced the first disk reported as faulty with a new precleared disk that I had on hand. For the subsequent rebuilds I reused the disks I had pulled, since they appeared to be fine after preclearing. The last disk to show as faulty, disk 5, was one of the previously faulty disks that I had precleared.

I have obviously screwed something up in what I have been doing, so I am now seeking advice on how to get the array stable.

Running on an Asus X99-A motherboard, Intel i7-5820K, 32M RAM, Corsair HX1000i power supply. I checked the SATA cables when the first disk went; they appear to be OK, and power should be more than adequate.

Attaching diagnostics and the preclear report of the disk that just reported as faulty. Thanks in advance.

preclear_start_5XW14C4P_2015-09-02.txt
tower-diagnostics-20150903-2151.zip
preclear_rpt_5XW14C4P_2015-09-02.txt
preclear_finish_5XW14C4P_2015-09-02.txt
RobJ Posted September 4, 2015

I don't think you have or have had any faulty disks. I think it's the system itself that is unstable. Something is really wrong, with numerous kernel crashes and drives completely dropping out and then coming back. One drive (sdc) was seen, identified and set up, then suddenly the drive completely lost contact with the system, as if its cables had been disconnected. Then later it showed up again, cables reconnected! That would seem to be either severe vibration, loose backplane connections, or unreliable power.

The kernel crashes and other instability could be memory, so that's the first test: start Memtest from the unRAID boot menu and run it for several passes. Check all connections and make sure they are tight and can't vibrate loose. Check the power and SATA cable connectors, and for power splitters, make sure there are no loose connections at all.

The motherboard seems odd. There are 2 onboard SATA controllers: a special 4 port controller with 2 ports unusable, and the normal 6 port controller with the first 2 ports unusable. You have been provided with SATA controllers that support 10 SATA ports, yet only have 6 usable ones! That seems strange.

The drive you Precleared seems fine. I wouldn't bother Preclearing anything more until the system is stable.

If you have any overclocking, turn it off. Set any BIOS settings to safe defaults.

With those kernel crashes, I would not trust the system at all until after a reboot. And do not run any parity checks, parity builds or drive rebuilds until the system can be trusted. They are only causing additional problems.
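As a rough illustration of the kind of syslog review described above, here is a minimal Python sketch that counts lines suggesting dropped links or kernel crashes. The patterns are assumptions based on commonly seen Linux kernel message fragments, not taken from the diagnostics attached to this thread, and `scan_syslog` is a hypothetical helper name. Drives behind an add-in controller may log different messages, so the patterns would likely need adjusting.

```python
import re
from collections import Counter

# Assumed patterns: common Linux kernel message fragments. They are NOT
# copied from this thread's diagnostics and may need adjusting per system.
PATTERNS = {
    "link_down": re.compile(r"ata\d+(\.\d+)?: SATA link down"),
    "link_reset": re.compile(r"ata\d+(\.\d+)?: hard resetting link"),
    "kernel_crash": re.compile(r"Call Trace:|BUG: |Oops"),
}


def scan_syslog(lines):
    """Return a Counter of how many lines match each named pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

To try it, pass any iterable of log lines, for example an open file handle for a syslog extracted from a diagnostics zip. Non-zero counts are only a hint about where to look; they do not distinguish a bad cable from a bad controller or power problem.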
galways Posted September 5, 2015

Rob, I removed and reinserted every cable connection. While they all appeared to be connected, the 24 pin power connector didn't have its clip fully engaged, so it may have been the problem. I haven't run a memory test yet. I rebuilt the supposedly faulty disk and restarted the parity check at 19:39. At 19:59 it was back to a failed parity check. I have attached the diagnostic report. Are you able to advise whether it appears to be the same issue? If so, I'll run the memory test. I use an SAS2LP-MV8, which was feeding the drive in question. Could that possibly be the problem? I have another and could swap it out if warranted. Please advise, thanks.

tower-diagnostics-20150904-2006.zip
galways Posted September 6, 2015

Ended up doing a new config. After 12 hours it completed with disk two showing 117 errors. All drives were green. As per the advice in http://lime-technology.com/forum/index.php?topic=40106.0 I didn't attempt to rebuild the disk until parity was verified. I ran a parity check; it failed and now shows disk 5 as faulty. I'm almost at the end of my rope. I am now going to run the memtest, which I hadn't done because I had convinced myself that it was a loose power cable. The fact that the parity sync seemed to run fine gave me false hope that I was out of the woods. Guess not! Reports attached, any further advice appreciated.

tower-diagnostics-20150906-0720.zip
galways Posted September 8, 2015

I determined that the errors were being caused by my controller, an AOC-SAS2LP-MV8, dropping out. I pulled the controller and connected my parity and 8 data drives to the motherboard (no room for dual cache). The parity sync completed with no errors, and the system is now halfway through a parity check without errors. It was definitely the SAS2LP: before pulling it, I tried running my array on two SAS2LPs to see whether the problem was my motherboard ports. The sync ran less than 20 minutes before it slowed to a crawl, indicating 600 plus days to complete.
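To put the "600 plus days" estimate in perspective, the following Python sketch works out the throughput such an estimate would imply. The 4 TB disk size is purely an assumption for illustration (the thread does not state the disk sizes), and `eta_days` and `implied_speed` are hypothetical helper names.

```python
SECONDS_PER_DAY = 86_400


def eta_days(remaining_bytes, bytes_per_second):
    """Days needed to process remaining_bytes at a constant speed."""
    if bytes_per_second <= 0:
        raise ValueError("speed must be positive")
    return remaining_bytes / bytes_per_second / SECONDS_PER_DAY


def implied_speed(remaining_bytes, eta_in_days):
    """Bytes per second implied by a reported ETA in days."""
    if eta_in_days <= 0:
        raise ValueError("ETA must be positive")
    return remaining_bytes / (eta_in_days * SECONDS_PER_DAY)


# Assumption: a 4 TB disk with nearly all of it still to process.
ASSUMED_DISK_BYTES = 4 * 10**12

speed = implied_speed(ASSUMED_DISK_BYTES, 600)
print(f"{speed / 1000:.0f} KB/s")  # prints "77 KB/s"
```

Under that assumption, a 600 day estimate corresponds to well under 100 KB/s, which points to an effectively stalled sync and not just a slow one.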
RobJ Posted September 8, 2015

There are quite a few reports of trouble with the SAS2LP and v6, including yours. The problems differ, and can be separated (I think) into 3 categories:

* Marvell disk controller chipsets and virtualization - no attached drives available on the affected controller unless IOMMU is disabled
* slow parity checks with the SAS2LP - long thread of attempts to understand the problem
* other misbehavior, such as randomly dropping drives - reports like yours and bkastner's

I think you will find the reports and experiences of bkastner especially interesting, as they bear some resemblance to yours.

Because this may be a driver or firmware problem, I would hang onto the card. When/if the fix appears, the card may be a great option again.
galways Posted September 8, 2015

Thanks for the update.