
Tons of issues after power outage and Bad Disk Swap


Unholy


Let me start by saying that, while I'm quite technical and experienced, storage is not and never has been my strong suit, which is what led me to UNRAID in the first place...

 

I have a UPS for my machine, which covers it for about 3 minutes during a power outage. That should give it enough time to power everything down. A few nights ago there was a power outage, and the server went down and did not come back up on its own. I had to manually bring it up in the morning. When it came up, two disks were emulated, and after an attempt at rebuilding onto the same disks I swapped them out. I started a rebuild and figured everything would be fine... it was not.

 

Once the rebuild finished I still had disabled/emulated disks. I rebooted the server, thinking it might help bring things back to a stable state. Instead, I ended up with four unmountable disks, including the two original disks which had issues. However, the disks were all reported as healthy and online, which confused me...

 

Following the advice of some other posts on the forums, I tried running the suggested xfs_repair with options, but that did not seem to help. The 'problem' disks seem to keep jumping from one to another.

 

While I know this may wind up being an issue that results in some disks needing to be completely reconstructed, I'm hoping to avoid that if at all possible. I am running two LSI 2308 HBAs and an expander card (I don't know the model off the top of my head, and I'm trying to avoid pulling anything apart again).

 

I believe part of the problem is the number of devices and the complexity of the configuration...

 

x16 Slot3 : HBA

x16 Slot2 : HBA

x4 M.2 Slot 1 : M.2 to PCIEx4 Slot : Expander

 

So, what I'm wondering now is:

 

Is there a reasonable way to permanently remove a few disks from the array while keeping the data that exists on these disks? The array is using High-water allocation, and from what I can tell only about 10 of the disks are actually in use, though ~21 GB is allocated on every disk and I don't know what the purpose of that is. EDIT: What I mean is to reduce the array to, say, 14-16 disks by removing 2-4 of the ones not in use by any of my shares (which at the moment would be ~14-18).
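A rough way to list which data disks hold little beyond that baseline allocation, sketched in Python. This assumes the array disks are mounted at paths like /mnt/disk1, /mnt/disk2, and so on; the function name, the 25 GB threshold, and the `usage_fn` parameter are made up for illustration. A disk that is currently unmountable will not report usage, so this only helps once the disks mount again, and low usage alone does not prove a disk holds no share data.

```python
import shutil


def find_lightly_used(mounts, threshold_gb=25.0, usage_fn=shutil.disk_usage):
    """Return (mount, used_gb) pairs whose used space is at or below threshold_gb.

    mounts       -- iterable of mount point paths (assumed: /mnt/disk1, ...)
    threshold_gb -- hypothetical cut-off, chosen to sit just above the ~21 GB
                    that appears on every disk
    usage_fn     -- anything returning an object with a .used attribute in
                    bytes; defaults to shutil.disk_usage
    """
    results = []
    for mount in mounts:
        used_gb = usage_fn(mount).used / 1e9  # bytes -> decimal gigabytes
        if used_gb <= threshold_gb:
            results.append((str(mount), round(used_gb, 1)))
    return results
```

On the server itself this could be called with something like `find_lightly_used(sorted(pathlib.Path("/mnt").glob("disk[0-9]*")))`. It would still be worth browsing each candidate disk's top-level folders before removing it from the array.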

 

I'm open to suggestions, and hoping not to have to start from scratch... I migrated most of this data from a Windows Server-based storage system (which was bad in a lot of ways). I still have that server (powered off), which I could use to move data around if I needed to, but I'm hoping not to have to do that.

 

A side note, I did just pick up an ASR72405 which should be here in the next week or so, and from a spec standpoint seems to support a much larger number of devices than the 2308s. 

 

I've attached my Diagnostics, as it seems this is always helpful. I've also included a visual representation of the array layout (each row in the backplane has its own SAS cable). I don't know if this is helpful information, but I figured it won't hurt.

 

Thanks in advance to anyone who has any advice! 

 

 

Disk Layout

image.png.30cfb9b98a45e2f19113b15e6d6b7072.png

unraid-diagnostics-20220227-1144.zip

Edited by Unholy
NOTE ABOUT DEVICE COUNT

There are constant errors with multiple devices, which suggests a power/connection problem. You should also upgrade the LSI firmware, since it's very old.

 

Creating a smaller array is relatively easy: you just need to do a new config, unassign the disks you don't want in the new array, and start it to begin a parity sync. But this should be done only after the above issues are solved.
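To judge whether the errors really are spread across many devices (which would point at shared power or cabling rather than one bad disk), a small Python sketch like the following could tally error lines per device from the syslog. The function name and the sample lines are hypothetical and invented for illustration; they are not taken from the attached diagnostics, and real log lines may need a different pattern.

```python
import re
from collections import Counter

# Matches Linux block device names such as sdb or sdaa (assumed naming).
DEVICE_RE = re.compile(r"\bsd[a-z]{1,2}\b")


def count_errors_by_device(log_lines):
    """Count how many lines mentioning 'error' reference each sdX device."""
    counts = Counter()
    for line in log_lines:
        if "error" not in line.lower():
            continue
        # Use a set so one line counts at most once per device.
        for dev in set(DEVICE_RE.findall(line)):
            counts[dev] += 1
    return counts


# Hypothetical sample lines, made up for this example.
sample = [
    "I/O error, dev sdf, sector 100",
    "I/O error, dev sdk, sector 200",
    "I/O error, dev sdf, sector 300",
    "mounted filesystem on sdb",
]
counts = count_errors_by_device(sample)
```

If the devices with errors all sit on the same backplane row or SAS cable, that row's cable or power feed would be the first thing to check.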

3 hours ago, JorgeB said:

There are constant errors with multiple devices, which suggests a power/connection problem. You should also upgrade the LSI firmware, since it's very old.

 

Creating a smaller array is relatively easy: you just need to do a new config, unassign the disks you don't want in the new array, and start it to begin a parity sync. But this should be done only after the above issues are solved.

 

@JorgeB thanks for the input. I don't know why a firmware update hadn't really crossed my mind. I'm taking a look now. I bought the chassis, drives, and HBA configuration from someone about a year ago, and hadn't given it much thought since it more or less worked as expected. I may need to shut the array down again and take a closer look at the hardware, as I don't know if the expander is a problem and that doesn't seem to be reported in the system info. 

 

 

2 minutes ago, JorgeB said:

Expander is an Intel RES2SV240 with the latest firmware, which is known to work great, unless it's failing.


Clearly you've been doing this a while. Thanks for that. 

 

I'm trying to locate the correct firmware now, and make sure I understand the flashing process before I even consider getting started. Thanks again for your help. It's much appreciated!

 
