2 drives down, shows unmountable



Hey guys, looks like I had two drives go dark. Both show "Unmountable: No file system". Hovering over the "x" shows "Device is disabled. Contents Emulated". There are no mount points for disk1 and disk2 and no browse icon to the right. If they're emulated, should I still be able to browse them like they're available?

 

I tried xfs_repair -n on disk1 and it showed a problem, so I ran it again without -n. It ran for at least half an hour (it never said I needed to use -L), finding all kinds of issues and reporting read/write failures. It finally said "done", but the disk still would not mount. It also doesn't report a temperature.
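For reference, the sequence I ran was roughly this (the device path is an assumption; on Unraid the array filesystem for disk1 is usually the md device, and repairs should be done with the array started in maintenance mode):

```shell
# Dry run first: report problems but change nothing
xfs_repair -n /dev/md1   # /dev/md1 assumed to be disk1's array device

# Actual repair (writes to the device; array in maintenance mode)
xfs_repair /dev/md1

# Only if xfs_repair refuses to run because the log is dirty:
# zeroing the log can lose the most recent metadata changes
xfs_repair -L /dev/md1
```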

 

Not sure what the best next step is. Should I just disconnect the drive, assign a new one, and see if it rebuilds? Or should I first just unassign it, start the array, and see if it shows me the emulated data?

 

I've attached the diagnostics and a couple of screenshots.  Let me know if anything else is needed.

 

Much appreciated!

nabit1.PNG

nabit2.PNG

nabit-diagnostics-20210112-1603.zip

Edited by heisenfig

Umm... apparently I've been pretty negligent. Looks like July 20 was the last check with 0 errors. :/ I thought I had notifications set up to send me emails. I hadn't looked at the dashboard in ages until suddenly the array was offline the other day.

 

I'm guessing the answer is probably going to be to just replace the drives and hope for the best.


nabit3.PNG


Ran into a hiccup, I think. I swapped out Disk 1 with a new pre-cleared drive, but upon starting it up, both Disk 1 and Disk 2 went back to "unmountable". It says it's rebuilding the drive, but going VERY slow, like 1.5 MB/sec. It says it's going to take about 60 days to complete. It's been running 18 hours and is only 1.1% done.
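Sanity-checking those numbers with rough figures (the 8 TB capacity is my assumption; I don't actually know the drive size):

```python
# Rough rebuild-time estimate at the observed ~1.5 MB/s.
# The 8 TB capacity is an assumed figure for illustration.
capacity_bytes = 8e12
rate_bytes_per_sec = 1.5e6
days = capacity_bytes / rate_bytes_per_sec / 86400
print(f"~{days:.0f} days")  # ~62 days, in line with the ~60-day estimate

# Cross-check against progress so far: 1.1% done after 18 hours
days_from_progress = 18 / 0.011 / 24
print(f"~{days_from_progress:.0f} days")  # ~68 days, same ballpark
```

Both numbers agree that something is throttling the rebuild far below a healthy drive's normal sequential speed.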

 

This is making me think there's another problem, like a cable or SATA port issue.

 

Should I just stop the re-sync, swap out the cables on those two drives, repair the filesystems again, then start it up again?

 


Ouch. I started getting tons of errors on my USB flash (boot) drive. I tried to browse it and it showed no files. It reported 18,446,744,073,709,551,616 writes on the flash drive. It also showed up under Unassigned Devices. I tried to stop the array, but it couldn't unmount everything, so I ended up having to just power it off. It did reboot normally, though.
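Side note on that write count: it isn't a plausible amount of I/O; it's exactly 2^64, the full value range of an unsigned 64-bit counter, which points at a wrapped or garbage counter read from a failing device rather than real write activity:

```python
# The reported flash-drive write count is exactly 2**64 --
# the size of an unsigned 64-bit counter's value space --
# i.e. a wrapped/garbage counter, not real writes.
reported_writes = 18_446_744_073_709_551_616
print(reported_writes == 2**64)  # True
```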

 

I replaced the SATA cable on disk1, as it looked to be in bad condition; there was a kink near one plug. I let the server stay off for a bit while I was working on other stuff and just booted it back up, so temps are low right now. I went through all disks and ran xfs_repair -n. All disks look good except the emulated disk1 and disk2, so I'm repairing emulated disk1 again right now with the -L option. So far, I have not seen any more of the fatal I/O and hard reset errors I was seeing before. Could a single bad SATA cable cause errors across multiple SATA adapters? I just ordered a batch of new SATA cables so I can swap them all out.

 

As I'm typing this, the temps are getting back up to normal and so far there are no drive errors in the log. If it starts throwing errors again or doesn't seem to be playing nice, I'll pull another diagnostics file.

 

As always, the help here is appreciated. 


So far so good after replacing that one SATA cable. The two emulated drives are repaired. The array started, and all disk shares are valid and available. I'll have to sort through lost+found later. Disk 1 is rebuilding to the new drive much faster; it says it will finish in about 21 hours instead of 90 days. And not a single error in the log file.

 

I'm attaching a diagnostics file that I ran earlier this morning that shows some of the hard drive soft and hard resets I was seeing. This was before the whole thing went haywire with the flash drive errors I outlined above.

nabit-diagnostics-20210114-0918.zip


Almost solved, but now two more disks are disabled.

 

I rebuilt Disk 1 successfully, then Disk 9 got disabled. I rebuilt Disk 9 and Disk 2 successfully and had everything green again for a few days. I had another drive pre-clearing over USB because I wanted to go ahead and replace my last 4TB drive.

 

So I took that drive and connected it directly to a SATA port. At the same time, I decided to get rid of the IcyDock that five of the drives were in, since I believed it might be part of the trouble I'm having. In the IcyDock, the five drives each had their own pass-thru data port but shared power from three SATA power connections. So now all drives are connected directly, but still divided up on the same power connectors they were on.

 

Upon starting the array, I immediately started seeing read errors again. I think it was one drive too many, causing it to be power-starved, which caused write errors, which led to the drives getting disabled in the configuration. This time, it disabled Disk 9 (the brand new drive from above) and Parity 2 (an old drive).

 

I just read something that made me understand that those drives (and the ones from before) may not be dead, but had just been disabled due to a read error. And if the read errors were because the drives weren't getting good enough power, then those drives may all still be good.

 

So, my theory now is that my power supply is getting weak. It's a RaidMax RX-1000AE. I thought I had read that it was a 4-rail design, but looking at the box, it clearly says "A strong single +12v rail for high-end system heavy load...". I did have some power issues like this before with the IcyDock, and getting an additional modular cable from the PSU to the IcyDock seemed to fix it at the time.

 

Do you think I need to replace the power supply? It says it supplies 83 amps on the +12V rail. I have 13 HDDs (2 parity, 11 data), 2 SSDs (cache), and 6 fans. The graphics adapter is an NVIDIA GTX 960. Is that just too much?
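A rough back-of-the-envelope +12V load estimate (all per-device currents below are assumptions, not measured values):

```python
# Rough +12V load estimate. Per-device figures are assumptions:
# a 3.5" HDD draws ~2 A on 12V at spin-up and ~0.5 A spinning,
# a fan ~0.2 A, and a GTX 960 (~120 W) roughly 10 A, mostly on 12V.
# SSDs run off 5V, so they're left out here.
HDD_SPINUP_A, HDD_RUN_A, FAN_A, GPU_A = 2.0, 0.5, 0.2, 10.0
n_hdd, n_fan = 13, 6

spinup_a = n_hdd * HDD_SPINUP_A + n_fan * FAN_A + GPU_A
steady_a = n_hdd * HDD_RUN_A + n_fan * FAN_A + GPU_A
print(f"spin-up ~{spinup_a:.0f} A, steady ~{steady_a:.0f} A")
```

Even the spin-up worst case comes out well under the 83 A total, so if these assumptions are anywhere near right, a healthy PSU should cope; the symptoms point more toward a failing PSU, cable, or connector than raw capacity.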

 

I've attached a diagnostic.  I'm still seeing some soft resets (extra drive not attached).

 

Edit: Though the box says it's a single +12V rail, the specs online show four outputs rated at 36A each. I'm assuming that's the max per channel, with an overall max of 83 amps. I'm pretty sure I have the drives split out between the four ports well enough. I don't think I have more than four drives on any one port, but I'll double check.

 

Output Current

+3.3V - 24 A
+5V - 30 A
-12V - 0.5 A
+5VSB - 3 A
+12V1 - 36 A
+12V2 - 36 A
+12V3 - 36 A
+12V4 - 36 A

 

Edit 2:  I just checked what's connected to each.

   +12V1: 4 HDDs + 1 HDD fan

   +12V2: 3 HDDs + 2 SSDs

   +12V3: 3 HDDs + case fans (2 front, 2 top, 1 back)

   +12V4: 3 HDDs

 

   I just moved one drive from the first slot to the fourth slot to see if that's any better, but then I saw a read and a write error on Disk 10. Fortunately, it did not get disabled too. So moving power cables definitely makes a difference, but I don't think I'm anywhere near overloading the PSU on any port, or overall, per the specs. So that leaves power cables going bad, or the PSU going bad? Thoughts?


nabit-diagnostics-20210122-1401.zip

