Red X, unmountable file system??? Please help



Not a good weekend for my server...

 

Started up this weekend to find two red X drives.  Replaced with two other drives I had available.

Rebooted and started the array to rebuild, only to see two additional red Xes in different slots.

This was after a long wait for the array to mount and having to refresh the main page.

The first two were old 2 Tb drives, so I didn't think it was odd that they might have failed.

The second pair were two new 8 Tb drives for which I don't have replacements.

Both times, one of them would show as "unmountable file system" but the xfs check tool didn't return any result.

 

Is my only option to rebuild?  Do I set a new configuration and copy all the drives back into a new array?

I would like to replace the old 8 X 2 Tb drives with 2 new 8 Tb drives.

Should I consider a different infrastructure?  Is it a failing power supply (it's older, too...)?

 

Sorry for all the questions, but I don't want to make it worse.

Sincerely appreciate any help, diagnostic.zip attached.

 

Thanks,

John

 

tower-diagnostics-20180824-2045.zip


Did you have to open up the case to replace those two drives?  If the answer is 'yes', then shut down the server and open up the case again.  Check all of the SATA connectors at both ends and make sure that they are firmly seated.  What often happens in these cases is that one or more connectors get disturbed and loosened.

50 minutes ago, Harro said:

, I replaced all cables with locking ends

 

If you do this, watch out for this problem.  You can easily tell whether a locking SATA connector has actually locked: double check all SATA connections by gently pulling back on them.  There should be a slight resistance to your pull.  If a connector comes out without any resistance, replace that cable with an entirely different one.

 

       https://support.wdc.com/knowledgebase/answer.aspx?ID=10477


OK, thanks for the replies.

 

No, I didn't need to open the case to remove the drives.  I'm using Norco 5in3 hotswap modules and all the cables are locking.

I also make sure to shut down before moving any of the drives, since I learned the system doesn't like hot swapping.

The startup described above was just turning it back on from a previously working and active state.

Since the failures were not common to one slot or controller, I would most suspect the power supply.

I'll probably replace it as I'm considering a full rebuild.

 

I guess my main question is: with four "failed" drives, is my only real option to rebuild the array?

I'm not familiar enough with any of the tools to "repair" the drives.

Am I correct in presuming that all the data is still on the drives?

If I were to get a couple of new drives and set up a new configuration, can I just mount each of the current disks as Unassigned, copy the data and re-deploy?

Is there a better way?  Do I have any other options?  Is there any insight to glean from the diagnostics file?

 

Sorry for all the questions and thanks for the help!

 


Post up that Diagnostics file.  Be sure to tell us how many drives are in the system and which ones  (i.e., Disk1, ... ) have a problem.  

 

Remember that the data drives in unRAID are formatted with a standard Linux file system, and all of the files on those disks can be easily read if there is no file system corruption.  If there is corruption, there are recovery tools for those file systems.
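For example, if one of the XFS data disks does turn out to need attention, the check is run with the array started in Maintenance mode.  From the console it would look something like this (md1 is just an example meaning the disk in slot 1; substitute the slot number of the affected disk, and running it against the /dev/mdX device keeps parity in sync):

    xfs_repair -n /dev/md1    # -n means check only: report problems but change nothing
    xfs_repair /dev/md1       # actual repair, run only after reviewing what the check reports

The check button in the GUI does essentially the same read-only check for you.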

 

So don't panic at this point.  Let's just work through this one step at a time.  

2 hours ago, Frank1940 said:

Be sure to tell us how many drives are in the system and which ones  (i.e., Disk1, ... ) have a problem.  

You have so many drives, it would help if you could give us some more details. We could probably figure out most of it, but you could save us some time.


I think that the first two drives to red x were in slots 8 and 9, they were both old 2 Tb veterans.

After replacing them with a pair of less old 2 Tb Samsung drives I had laying around, the second two red Xes appeared.

I believe that they were in slots 1 and 3.  Both were very new 8 Tb drives.

At that point I returned and reassigned the "failed" 2 Tb drives to their original slots and shut down.

It's been down ever since... 


Didn't realize I was running an old version.  Seems like I upgraded recently but that was only after a prompt on the main page.

I have two LSI 9211-8i flashed to IT mode.  The power supply is an older Corsair HX 750W.

Here's a shot of the main page with the problem drives...

[Screenshot of the Main page showing the problem drives]

 

Disk 8 and 9 were the two I tried to replace before I had an issue with the other two.

 

Thanks for all the help and apologies for the difficulty.


The diagnostics are several days old now. Is your server currently turned off?

 

There are several more disks that don't appear in the screenshot but I assume those are indicating green.

 

Disks 8,9 don't even appear in the diagnostics. Parity and disk1 SMART look OK.
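Next time it's powered up it would also be worth pulling SMART reports for those two from the console and attaching them, something along these lines (sdX and sdY are just placeholders; check the Main page for the actual device letters):

    smartctl -a /dev/sdX    # full SMART attributes and self-test log
    smartctl -a /dev/sdY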

 

Are all these disks on the same controller?
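If you're not sure, one quick way to check from the console is to list the disks by path, since each entry shows the PCI address of the controller it hangs off (just a sketch of what to look for; the addresses will differ on your system):

    ls -l /dev/disk/by-path/    # entries sharing the same pci-0000:xx:00.0 prefix are on the same controller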


One thing you should consider is that you have PS issues.  Drives 8 and 9 being missing from the SMART reports in your posted Diagnostics file, and another pair of disks being involved in this screenshot, is a most unusual situation.  I can recall some past similar cases that turned out to be PS issues.  I did a quick search on Google and couldn't find any real specs on this PS beyond the fact that it could be a 12V single-rail PS, with a 40A max rating on any 12V output in the multi-rail configuration.
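As a rough illustration (assuming a fairly typical 3.5 inch drive draws somewhere around 2A at 12V during spin-up), a dozen or more drives all spinning up together can pull 25A or more on the 12V side before the system has even settled, and an aging supply often can't deliver its full rated output any more.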


Yes, the server's been off all week.  I only turned it on last night to take that screenshot. 

I don't think that they are on the same controller; both pairs are in separate 5X3 modules. 

8 and 9 were removed from the configuration when I tried to replace them. 

One similar observation from both instances is that the system didn't mount the drives and start the array as quickly as it usually does. 

Agree that it seems to point to an issue with the PS, which is probably older than the disks.  I've accepted that it should be replaced.

Is there currently a PS model and rating that is favored by the community?

 

Aside from diagnosing the cause, I've also accepted that with four failed drives, I'll have to rebuild the array from a new configuration. 

I suspect the data is intact since I didn't even get to start the array either time and there shouldn't have been any writes.

Do I reset the configuration and start copying the data?  Is Krusader the recommended application?

 

Thanks again.

 

39 minutes ago, johnnya1306 said:

Do I reset the configuration and start copying the data?

Do you really want to copy the data to different disks, or maybe just try to set a New Configuration with the disks as they are and rebuild parity? I didn't notice any SMART issues on the 2 red X disks, but of course couldn't see the other 2. If there is no corruption everything should be there, and if there is corruption we can try to repair it.

 

Anyway, that might be a simpler route to getting your hardware stable than trying to bring new disks into the picture right now. Then after you get things squared away you can maybe replace/upsize disks if that's what you want to do.

 

Don't remember asking this before now. Do you have good backups of any important and irreplaceable data? If not then trying to copy that stuff to another system or external drive would be the first priority.
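If you do end up copying anything off first, one simple way from the console is rsync to a disk mounted with the Unassigned Devices plugin, roughly like this (the share name and mount point are only examples; adjust them to whatever you actually have):

    rsync -avh --progress /mnt/user/important/ /mnt/disks/backup/important/    # -a preserves attributes, --progress shows transfer status

Krusader would work too if you prefer a GUI file manager.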

20 hours ago, johnnya1306 said:

I think that the first two drives to red x were in slots 8 and 9, they were both old 2 Tb veterans.

After replacing them with a pair of less old 2 Tb Samsung drives I had laying around, the second two red Xes appeared.

I believe that they were in slots 1 and 3.  Both were very new 8 Tb drives.

At that point I returned and reassigned the "failed" 2 Tb drives to their original slots and shut down.

It's been down ever since... 

 

I'm confused by your last action in the above. Having replaced disks 8 and 9 (presumably successfully, as you don't say otherwise), why did you reassign the old drives to their original slots? How did you think that would help you fix your new problem with disks 1 and 3?


John_M,

The replacement was not successful.  I got the second two red Xes when the system hung while mounting the disks.

 

trurl,

Yes, I would like to replace the old 2 Tb disks with new 8 Tb disks but getting the system stable is my first priority.

With disks 8 and 9 assigned back to the array, I can set a new configuration, keep only the data disks and rebuild parity, right?

Assuming that I can get the array started after replacing the power supply.

 

Thanks all.

 

