deionmann Posted August 12, 2015

I've had 3 drives fail in the past few months. Each time I replaced the drive, rebuilt it, and moved on. After a fourth failure I thought about the issue and decided that my M1015 controller was going bad, so I exchanged it for a 2008 PIKE controller and started a data rebuild on my currently red-balled drive. Stupidly, I forgot to turn off all my Dockers and VMs, and I had a preclear in progress in the background, smh... I know, stupid. Now the GUI is saying that neither drive is mountable and that they have no filesystems. Honestly, at this point I'm lost and caught with my trousers down. Help?

Current syslog and SMART reports for the failed drives are attached:
godzilla-syslog-20150812-0128.zip
godzilla-smart-report-disk5-20150812-0113.txt
godzilla-smart-report-disk6-20150812-0113.txt
deionmann Posted August 12, 2015

The screenshots from the GUI, if that helps at all.
trurl Posted August 12, 2015

Is this the server in your sig? What is the power supply? SMART looks good. How did those other drives fail?

Preclear shouldn't impact a rebuild in any way, and technically, other reads/writes to the array should only have the effect of slowing things down.

Was it rebuilding disk6 when disk5 was disabled? Check your cables.
deionmann Posted August 12, 2015

trurl wrote: "Is this the server in your sig? What is the power supply? SMART looks good. How did those other drives fail? Preclear shouldn't impact a rebuild in any way, and technically, other reads/writes to the array should only have the effect of slowing things down. Was it rebuilding disk6 when disk5 was disabled? Check your cables."

trurl, thanks for the reply.

It's my sig; I have to update it. It's now 18TB of WD Reds (all 3TB) with one 1TB WD Black cache.

The power supply is a 500W Corsair: http://m.newegg.com/Product/index?itemnumber=N82E16817139050&nm_mc=KNC-GoogleAdwords-Mobile&cm_mmc=KNC-GoogleAdwords-Mobile-_-pla-_-Power+Supplies-_-N82E16817139050&gclid=Cj0KEQjw3auuBRDj1LnQyLjy-4sBEiQAKPU_vU9Z6E355pPAEtYMTKSarEMw5ZCNHcsTVwoi6YnmdykaAuA58P8HAQ&gclsrc=aw.ds

The other drives failed during normal operation: first drive 4, then drive 6, then drive 5, and again drive 6. Now drive 5 again.

Yes, drive 5 failed while rebuilding drive 6.

All the cables are brand new from Newegg with clips on both ends, all of which are clipped and secure... maybe I'll try replacing them just in case.

Since the GUI is saying that they're not mountable, does that mean that my data on drive 5 is toast? Is there any way to still save the data from disk 6?
itimpi Posted August 12, 2015

deionmann wrote: "The power supply is a 500W Corsair"

I think that is rather borderline for that number of drives. A power supply that is slightly under-specced can cause random drives to appear to fail as they temporarily get starved of power.

deionmann wrote: "The other drives failed during normal operation: first drive 4, then drive 6, then drive 5, and again drive 6. Now drive 5 again."

This is exactly the sort of behaviour that one can expect if the power supply is borderline for the number of drives. I had a Corsair 550W power supply with 20 drives and was getting this sort of behaviour. When I swapped it for a 750W one the problems disappeared.

deionmann wrote: "Since the GUI is saying that they're not mountable, does that mean that my data on drive 5 is toast? Is there any way to still save the data from disk 6?"

It is very rare for an unmountable disk to mean data loss unless the disk has physically failed. It normally means that a write went wrong and there is some sort of file system corruption that can easily be repaired using the appropriate recovery program.
trurl Posted August 12, 2015

Seems like he only has parity, 6 data, and 2 cache, so 9 drives. Should be enough power for that, I would think.
itimpi Posted August 12, 2015

trurl wrote: "Seems like he only has parity, 6 data, and 2 cache, so 9 drives. Should be enough power for that, I would think."

Could well be right. I had read the 18TB as 18 drives, which is not quite the same.
deionmann Posted August 12, 2015

My apologies for the slow response (at work) and the confusing state of my sig. I haven't updated it in a while.

Currently my array has only:
1 parity, 3TB WD Red
1 cache, 1TB WD Black
6 data, 3TB WD Red
8 disks total

All my Dockers and VMs reside on the cache, and the appdata share is cache-only. All data goes to the cache as well, with mover doing its magic in the wee hours of the night.
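The power question above comes down to simple arithmetic: worst-case draw when every drive spins up at once, plus the rest of the system, compared against the PSU rating. Below is a minimal Python sketch of that estimate. The per-drive and base-system wattages are assumed placeholders, not figures from this thread or from any datasheet; substitute the numbers for your own hardware.

```python
def estimate_peak_watts(drive_count, spinup_watts_per_drive, base_system_watts):
    """Worst case: every drive spins up at the same moment."""
    return drive_count * spinup_watts_per_drive + base_system_watts


def headroom_watts(psu_rating_watts, peak_watts):
    """Positive means spare capacity; negative means the PSU is undersized."""
    return psu_rating_watts - peak_watts


# ASSUMED placeholder values, not measurements from this server.
ASSUMED_SPINUP_WATTS_PER_DRIVE = 30.0
ASSUMED_BASE_SYSTEM_WATTS = 150.0

# 8 disks and a 500W PSU are the figures given in the thread.
peak = estimate_peak_watts(8, ASSUMED_SPINUP_WATTS_PER_DRIVE, ASSUMED_BASE_SYSTEM_WATTS)
print(f"estimated peak: {peak:.0f} W, headroom on 500 W: {headroom_watts(500.0, peak):.0f} W")
```

A total-wattage check like this is only a first pass. It says nothing about how much current the PSU can deliver on the 12V rail specifically, or about power lost in backplanes and splitters.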
deionmann Posted August 13, 2015

Tell me if this sounds accurate.

This failure has occurred across several separate drives and three separate drive bays, and the SATA cables were replaced in the interim, so I'm thinking that the failure is either the PSU or the enclosures that I'm using (Norco SS-500).

I'll catalog the drives tonight. I'm thinking that if the failures are in both enclosures (I use two) then it's the PSU, and if they're in one enclosure then it's most likely the enclosure?

Also, I'm pretty sure the data is gone on disk 6 since I was in the middle of a rebuild, correct? (It was about 60% done when disk 5 failed.)

Disks 5 and 6 were both btrfs... how would I go about checking and repairing the filesystem on disk 5?
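The triage rule proposed in this post can be written down as a small Python sketch. The function name, the cage labels, and the failure counts are hypothetical, made up for illustration; the rule itself is only the poster's heuristic, not a definitive diagnosis.

```python
def triage(failures_by_cage):
    """failures_by_cage maps a cage label to the number of drive failures seen in it."""
    affected = sorted(cage for cage, count in failures_by_cage.items() if count > 0)
    if len(affected) > 1:
        # Failures in more than one cage point at something the cages share.
        return "suspect a shared component such as the PSU"
    if len(affected) == 1:
        return f"suspect cage {affected[0]}"
    return "no failures recorded"


# Hypothetical counts for two cages, for illustration only.
print(triage({"cage A": 2, "cage B": 3}))
print(triage({"cage A": 4, "cage B": 0}))
```

The heuristic cannot tell the PSU apart from other shared parts (controller, motherboard), which is why testing one change at a time still matters.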
deionmann Posted August 13, 2015

I can't stress enough how grateful I am for all this help, guys. trurl and itimpi, you guys rock.
itimpi Posted August 13, 2015

deionmann wrote: "Also, I'm pretty sure the data is gone on disk 6 since I was in the middle of a rebuild, correct? (It was about 60% done when disk 5 failed.)"

Were you rebuilding the failed disk back to itself? If so, then there is an excellent chance that the majority of the data is intact, as most of the time you would have been writing out sectors with the same contents that they already have.
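To see why rebuilding a disk onto itself mostly rewrites what is already there, here is a toy Python sketch of single-parity (XOR) reconstruction. The three 8-byte "disks" and their contents are made up, and the sketch assumes parity was in sync with the data before the rebuild started; it is an illustration of the principle, not unRAID's actual implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical toy array: three data "disks" of 8 bytes each.
data_disks = [b"FAMILYPH", b"OTOS2015", b"HOMEVIDS"]
parity = xor_blocks(data_disks)

# Rebuild disk index 1 from parity plus the other data disks.
target = 1
others = [disk for i, disk in enumerate(data_disks) if i != target]
rebuilt = xor_blocks(others + [parity])

# Simulate a rebuild onto the same disk that stops about 60% of the way through:
# the first part holds rebuilt bytes, the rest still holds the old contents.
cutoff = int(len(rebuilt) * 0.6)
after_interrupt = rebuilt[:cutoff] + data_disks[target][cutoff:]

print(rebuilt == data_disks[target], after_interrupt == data_disks[target])
```

If parity was out of sync, or another disk returned bad data during the rebuild, the rewritten portion could differ from the original, which is why a filesystem check afterwards is still worthwhile.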
trurl Posted August 13, 2015

deionmann wrote: "Disks 5 and 6 were both btrfs... how would I go about checking and repairing the filesystem on disk 5?"

Here
deionmann Posted August 13, 2015

deionmann wrote: "Also, I'm pretty sure the data is gone on disk 6 since I was in the middle of a rebuild, correct? (It was about 60% done when disk 5 failed.)"

itimpi wrote: "Were you rebuilding the failed disk back to itself? If so, then there is an excellent chance that the majority of the data is intact, as most of the time you would have been writing out sectors with the same contents that they already have."

You, sir, just made my day. Yes... thank god, yes I was.
deionmann Posted August 13, 2015

deionmann wrote: "Disks 5 and 6 were both btrfs... how would I go about checking and repairing the filesystem on disk 5?"

trurl wrote: "Here"

OK, I'll scrub disk 5 and go from there to correct it. That's awesome, thanks so much! Looks like I've got my work cut out for me tonight. You guys are awesome.
deionmann Posted August 14, 2015

Just cataloged my drives, and it turns out that the failures span both of my hot-swap drive cages.

So my plan is to pick up a new PSU and cable the drives directly to the motherboard, then test from there using preclears while adding the hot-swap bays back one by one.

Just picked up a Corsair HX750i. Once I get it here and installed, I'll run the tests on a spare disk I have, THEN try to save the drives and re-add them.

This is so much fun I can't wait to get started (insert sarcasm here).

Thank you guys so much. I'll keep updating as this progresses.
deionmann Posted August 23, 2015

Update:

1. New PSU is installed.
2. All HDs removed from the hot-swap bays and physically connected to the motherboard.
3. Installed the Unassigned Devices plugin and verified that the data is good on disks 5 and 6.
4. Shrunk the array to 4 known 'good' disks.
5. Rebuilt parity.
6. Precleared a spare 3TB WD Red.
7. Added the new drive to the array and migrated the disk 5 data to it via the Unassigned Devices plugin.
8. Using the old disk 5 as my spare, repeated steps 6 and 7 for disk 6.

Now I'm back in business with the old disk 6 sitting in standby as my spare.

One question: I'd like to test the hot-swap cages for bad power and SATA links. What's the best way to go about this? Would a 2-pass preclear on a 120GB drive in each bay be enough to test these cages? The cages are both Norco SS-500.

@trurl @itimpi thanks so much guys, you helped me save over 5k family pictures and home movies.