Help please - One drive is "X"'d and another is "unmountable"


Jax


Hi,

 

As per the title, I have two drives in my array that are currently unusable.

Array consists of 12 disks + Parity, and disk 4 is error'd out, while disk 8 is showing as "unmountable".

 

This originally started when disk 8 error'd out a couple of days ago. I stopped the array, removed the disk (which felt loose in the hot-swap bay) and put it in my desktop caddy for testing.

It all came back fine, so I followed the procedure to re-introduce it into the array, making sure the drive was firmly seated in the bay.

 

The rebuild appeared to start fine, so I went to bed and when I woke up this morning, I see the array in the state that it's in now.

 

Am I screwed?

 

Diag attached - thanks for any assistance that can be provided.

tc-nas-01-diagnostics-20200206-1259.zip


There were read errors on disk4 early in the rebuild, so the rebuilt disk will be mostly corrupt. This looks like one of the typical SASLP problems, but since disk4 dropped offline there's no SMART report for it. Reboot/power cycle the server to see if disk4 comes back online, then post a SMART report. Avoid starting the array for now.
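
To illustrate why read errors on disk4 matter for the disk8 rebuild: with a single parity disk, a missing disk is reconstructed from parity plus every other data disk, so a bad read from any of them ends up as bad data on the rebuilt disk. This is only a minimal sketch, assuming plain XOR parity over equally sized blocks; the function names and the toy data are made up for illustration and are not Unraid code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(parity, surviving):
    """Rebuild the one missing data block from parity and all surviving blocks."""
    return xor_blocks([parity, *surviving])

# Toy array: three data "disks" holding one 4-byte block each (made-up contents).
disk_a = b"\x01\x02\x03\x04"
disk_b = b"\x10\x20\x30\x40"   # stands in for the disk being rebuilt
disk_c = b"\xaa\xbb\xcc\xdd"   # stands in for the disk with read errors
parity = xor_blocks([disk_a, disk_b, disk_c])

# With every surviving disk readable, the rebuild is exact.
good_rebuild = rebuild_missing(parity, [disk_a, disk_c])

# If one surviving disk returns wrong data, the rebuilt block is wrong too.
bad_read_c = b"\x00\x00\x00\x00"
bad_rebuild = rebuild_missing(parity, [disk_a, bad_read_c])
```

Here `good_rebuild` matches the original `disk_b` block, while `bad_rebuild` does not, which is the sense in which the rebuilt disk ends up corrupt wherever the other disk could not be read.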

 

 


Thanks - I'll just bite the bullet and replace the controllers first.

Looks like the most reasonable options are available overseas so I'll just keep the array shut down for a few weeks till the cards arrive.

 

I will reach out to you once the cards are in and recognized by Unraid.

 

Thanks again for your help.

  • 2 weeks later...

Hello,

 

New LSI controllers have been installed and the system is powered back up and appears to be in the exact state it was left in.

 

What would next steps be to try and recover disks 4 & 8?

On 2/7/2020 at 3:03 AM, johnnie.black said:

Disk looks healthy, so the problem was most likely caused by the controller. SAS2LP controllers have not been recommended for a long time, so you should replace them with LSI controllers; then we can try re-enabling disk4 to rebuild disk8 again. Or, if you want, we can try again with the current controllers.

 


Checking the original diags to refresh my memory on what happened here, I just noticed that disk8 failed to mount even before the read errors occurred during the rebuild:

 

Feb  5 22:47:43 TC-NAS-01 emhttpd: shcmd (184): mount -t xfs -o noatime,nodiratime /dev/md8 /mnt/disk8
Feb  5 22:47:43 TC-NAS-01 kernel: XFS (md8): Mounting V5 Filesystem
Feb  5 22:47:43 TC-NAS-01 kernel: XFS (md8): Log inconsistent (didn't find previous header)
Feb  5 22:47:43 TC-NAS-01 kernel: XFS (md8): failed to find log head
Feb  5 22:47:43 TC-NAS-01 kernel: XFS (md8): log mount/recovery failed: error -5
Feb  5 22:47:43 TC-NAS-01 kernel: XFS (md8): log mount failed
Feb  5 22:47:43 TC-NAS-01 root: mount: /mnt/disk8: can't read superblock on /dev/md8.
Feb  5 22:47:43 TC-NAS-01 emhttpd: shcmd (184): exit status: 32
Feb  5 22:47:43 TC-NAS-01 emhttpd: /mnt/disk8 mount error: No file system
Feb  5 22:47:43 TC-NAS-01 emhttpd: shcmd (185): umount /mnt/disk8
Feb  5 22:47:43 TC-NAS-01 root: umount: /mnt/disk8: not mounted.
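
If you want to spot this kind of failure in a syslog yourself, something like the following could pull out the XFS kernel messages per device. It's a rough sketch only: the regular expression is an assumption based on the lines quoted above, and the function name is hypothetical.

```python
import re

# Matches kernel XFS messages shaped like the ones quoted above, e.g.
#   "Feb  5 22:47:43 TC-NAS-01 kernel: XFS (md8): log mount failed"
XFS_LINE = re.compile(r"kernel: XFS \((?P<dev>md\d+)\): (?P<msg>.+)$")

def xfs_mount_failures(lines):
    """Return {device: [messages]} for devices whose XFS log mount failed."""
    messages = {}
    for line in lines:
        match = XFS_LINE.search(line.rstrip())
        if match:
            messages.setdefault(match["dev"], []).append(match["msg"])
    # Keep only devices that logged an explicit "log mount failed".
    return {
        dev: msgs
        for dev, msgs in messages.items()
        if any("log mount failed" in msg for msg in msgs)
    }

# A few of the lines from the excerpt above.
sample = [
    "Feb  5 22:47:43 TC-NAS-01 kernel: XFS (md8): Mounting V5 Filesystem",
    "Feb  5 22:47:43 TC-NAS-01 kernel: XFS (md8): Log inconsistent (didn't find previous header)",
    "Feb  5 22:47:43 TC-NAS-01 kernel: XFS (md8): failed to find log head",
    "Feb  5 22:47:43 TC-NAS-01 kernel: XFS (md8): log mount failed",
    "Feb  5 22:47:43 TC-NAS-01 emhttpd: /mnt/disk8 mount error: No file system",
]
failures = xfs_mount_failures(sample)
```

With the sample above, `failures` has a single key, `md8`, holding the four kernel messages in order; the `emhttpd` line is ignored because it is not a kernel XFS message.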

 

This suggests there were already filesystem issues. You can still continue, but success depends on how bad that corruption was; it might be easily fixable with xfs_repair. Everything else looks fine for now. The procedure is:

 

-Tools -> New Config -> Retain current configuration: All -> Apply
-Check all assignments and assign any missing disk(s) if needed.
-Important - After checking the assignments leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters):

mdcmd set invalidslot 8 29

-Back in the GUI, and without refreshing the page, just start the array. Do not check the "parity is already valid" box. The GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalid slot command, but they won't be overwritten as long as the procedure was done correctly. Disk8 will start rebuilding. Normally the disk should mount immediately (it likely won't in this case); if it's unmountable, don't format it. Wait for the rebuild to finish and then run a filesystem check.


Update:

 

Disk 8 has finished rebuilding with 0 errors. (according to the GUI notification)

 

image.png.f33fa5d272388dcb715aa4dbaffab96d.png

I haven't refreshed the GUI, but it's still showing that drive 8 has an unmountable file system - I suspect that may change if I refresh the GUI, but I will leave it as is and wait for your next instructions.

 

Thanks.

 

Edited by Jax
4 hours ago, johnnie.black said:

It won't. You'll need to run a filesystem check; remove the -n flag or nothing will be done, and if it asks for -L, use it.
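
For reference, the difference between the check-only run and the real repair comes down to those two flags. A small sketch that only builds the command line and runs nothing; the helper name is hypothetical, and /dev/md8 is taken from the mount line in the earlier log:

```python
def xfs_repair_argv(device, dry_run=True, zero_log=False):
    """Build the xfs_repair argument list without running anything."""
    argv = ["xfs_repair"]
    if dry_run:
        argv.append("-n")   # no-modify mode: only reports problems
    if zero_log:
        argv.append("-L")   # force log zeroing: pending log updates are discarded
    argv.append(device)
    return argv

check_only = xfs_repair_argv("/dev/md8")
real_repair = xfs_repair_argv("/dev/md8", dry_run=False)
repair_zero_log = xfs_repair_argv("/dev/md8", dry_run=False, zero_log=True)
```

The three lists correspond to a check with -n, a repair without it, and a repair with -L for the case where xfs_repair asks for it.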

You're right - and interestingly, I don't have the option to run a check on this drive from the GUI - the menu section to do the check is missing completely:

 

image.thumb.png.0de141c4afd9705f43d417d4f2e59de6.png

 

Here it's showing fine for disk 9:

image.thumb.png.858847572149d88f0c8a83d7740c28c6.png

 

 

Is there a way to execute a check on it now if it's showing "No file system"?

The filesystem on drive 8 was definitely xfs prior to this issue.

 

Latest diag attached.

tc-nas-01-diagnostics-20200224-0714.zip

 

My continued thanks for all of your help.


That's not a good sign, and I was afraid of that. As mentioned in the invalid slot instructions post, the disk was already unmountable before the first rebuild attempt because no superblock was found, which suggests parity wasn't 100% valid. Still, wait for xfs_repair to finish searching the disk for a valid superblock, but I wouldn't get my hopes up.

9 hours ago, johnnie.black said:

That's not a good sign, and I was afraid of that. As mentioned in the invalid slot instructions post, the disk was already unmountable before the first rebuild attempt because no superblock was found, which suggests parity wasn't 100% valid. Still, wait for xfs_repair to finish searching the disk for a valid superblock, but I wouldn't get my hopes up.

Well - I left it running and went into the office... just got home now and it has completed unsuccessfully as you had suggested it would.

 

Here is the output from the status pane in the GUI (minus a gazillion "."'s for readability):

 

Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
unable to verify superblock, continuing...
....found candidate secondary superblock...
unable to verify superblock, continuing...
.................................found candidate secondary superblock...
unable to verify superblock, continuing...
.........Sorry, could not find valid secondary superblock
Exiting now.
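
For anyone wanting to summarize output like this without scrolling through the dots, here is a rough sketch that counts the candidate superblocks and checks whether the search gave up. It assumes the message wording shown above; the function name is made up.

```python
def superblock_search_result(output):
    """Summarize xfs_repair phase 1 output: candidates seen and whether it gave up."""
    # Progress dots are printed in front of some messages, so strip them first.
    lines = [line.strip().lstrip(".") for line in output.splitlines()]
    candidates = sum(
        line.startswith("found candidate secondary superblock") for line in lines
    )
    gave_up = any(
        "could not find valid secondary superblock" in line for line in lines
    )
    return {"candidates": candidates, "gave_up": gave_up}

# The output quoted above, with most of the progress dots trimmed.
sample_output = """Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
unable to verify superblock, continuing...
....found candidate secondary superblock...
unable to verify superblock, continuing...
.....found candidate secondary superblock...
unable to verify superblock, continuing...
.........Sorry, could not find valid secondary superblock
Exiting now.
"""
result = superblock_search_result(sample_output)
```

For the output above, the summary shows three candidates and a search that gave up, which matches the "Exiting now." at the end.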

 

So is all the data on the drive toast?

 

9 hours ago, Jax said:

So is all the data on the drive toast?

Unfortunately that's very likely. Whatever happened, it happened before the first rebuild you attempted; the filesystem was already corrupt at that time. Without prior logs I can't guess why, but most likely parity wasn't 100% valid.

You could try a file recovery utility like UFS Explorer, but it's difficult to guess how successful it would be. They do have a trial if you want to give it a shot.

7 hours ago, johnnie.black said:

Unfortunately that's very likely. Whatever happened, it happened before the first rebuild you attempted; the filesystem was already corrupt at that time. Without prior logs I can't guess why, but most likely parity wasn't 100% valid.

You could try a file recovery utility like UFS Explorer, but it's difficult to guess how successful it would be. They do have a trial if you want to give it a shot.

Gotcha,

 

Well, thanks again for all of your help - I'll check out UFS Explorer.

I'll start the array and see exactly which files are lost. At this point, I think it will be best to just reformat disk 8 after seeing what can be salvaged with a recovery tool. We'll see.


Update:

 

I've tried UFS Explorer and found hundreds of folders and files that were corrupted. I'm thinking of purchasing the software, as it's not very expensive and appears to be quite useful. That said, looking at what remains on the array, nothing critical is missing.

 

While disk 8 was being scanned from my desktop, I assigned a fresh drive to its spot in the array.

Of course it did the rebuild and came back with the same "unmountable" error.

 

Since I am OK with losing the data that was on Drive 8 - would there be any danger in formatting this new drive 8 to be used in the array?

Or is there a better way to make the drive 8 spot usable again?

 

5 minutes ago, johnnie.black said:

No danger, it will create a new empty filesystem that can be used immediately.

 

 

Excellent - thanks.

 

We can consider this "closed" now.

Thanks again for all of your time and help on this - exceptional!

 

 
