Drive failure, now a 2nd disk shows 'Unmountable: not mounted' after reboot



Ok, apologies for posting what seems to have been posted many times before. I have a 7-drive array with 1 parity drive. I noticed a couple of days ago that one of my drives had failed, so I ordered a replacement. The replacement came today, so I shut down Unraid through the GUI, unplugged my old drive (Disk 1) and plugged in the new drive. The new drive is bigger than all the other drives, so I was going to attempt a parity swap (https://wiki.unraid.net/The_parity_swap_procedure). Before getting to that, I noticed that a 2nd disk (i.e. not the one I removed) was now reporting as unmountable.

The first screenshot is from before the reboot, the second is from after:

[Screenshot: before reboot]

[Screenshot: after reboot]

Struggling to figure out what I should do here. 

 

I tried swapping cables to the drive - no difference.

 

I tried restarting the array in maintenance mode, went to the drive and clicked the 'Check' button in the 'Check Filesystem Status' area. The results were:

 

Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1

fatal error -- Input/output error
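For context on that message: xfs_repair is reporting that a raw read of 524288 bytes at offset 0 (the start of allocation group 0, where the primary superblock lives) failed with an I/O error, so it never got as far as examining filesystem structures. Below is a minimal sketch of the same kind of read, assuming a Linux host; the function name and device path are hypothetical, and it is only meant to illustrate that an EIO at this step comes from the disk or its connection rather than from filesystem corruption.

```python
import errno
import os

def read_superblock_region(path, size=524288, offset=0):
    """Issue one read-only raw read like the one xfs_repair reported.

    Returns the bytes read, or None if the device answers with EIO
    (the "Input/output error" in the xfs_repair output).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, size, offset)
    except OSError as exc:
        if exc.errno == errno.EIO:
            return None  # the device itself refused the read
        raise  # anything else (permissions, missing path) is a different problem
    finally:
        os.close(fd)
```

On a real system this would be called as something like `read_superblock_region("/dev/sdX1")` (hypothetical path, needs root). It only reads, but on a disk that is already throwing errors it is still best left to the diagnostics the forum asks for.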

 

I ran a 'SMART short self-test' and it returned 'Completed without error'.

 

[Screenshot]

 

What other diagnostics can I generate to post in order to try to figure out how to proceed at this point?


So, further info: for the heck of it, I tried reconnecting the drive that wasn't working. It is now (seemingly) being picked up fine. However, when I attempt to re-assign it as disk 1, Unraid seems to think it is a new drive. It says "Start will start Parity-Sync and/or Data-Rebuild." and "Stopped. Replacement disk installed" next to the array Start button. Is there any way to say 'no, just use this as it is'?

 

[Screenshot]


It wants to rebuild disk1 because you reassigned it. Unassign disk1, go to Disk Settings and turn off autostart.

 

You're still having connection problems on disk3. Shutdown, check all connections again, SATA and power, including splitters.

 

Reboot, start the array in normal mode with nothing assigned as disk1, then post new diagnostics and a screenshot of Main - Array Devices.


Disks 1 and 3 both have pending sectors. Also, disk3 is WD. You should add attributes 1 (Raw_Read_Error_Rate) and 200 (Multi_Zone_Error_Rate) for monitoring on any WD disk. SMART attribute 1 for that WD might also be reason for concern. Neither disk has had an extended self-test run. It might be worth running an extended test on both to see if one is worse than the other. There is a way to get it to rebuild disk3 instead of disk1 if that seems like a good idea.
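A small sketch of what watching those attributes amounts to: pull the raw values for IDs 1 and 200, plus 197 (Current_Pending_Sector, where pending sectors show up), out of `smartctl -A`-style text and flag anything non-zero. The sample table is made up for illustration and is not from this thread's diagnostics. Treating any non-zero raw value as a warning is a simplification that suits WD disks, where these raw values usually sit at 0; Unraid's own notifications do this once the attributes are added.

```python
# Attribute IDs to watch: 1 and 200 (suggested above for WD disks) and 197.
WATCH = {
    1: "Raw_Read_Error_Rate",
    197: "Current_Pending_Sector",
    200: "Multi_Zone_Error_Rate",
}

def parse_smart_attributes(text):
    """Return {attribute_id: raw_value} from `smartctl -A`-style table text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[int(fields[0])] = int(fields[9])
            except ValueError:
                pass  # skip raw values that are not plain integers
    return attrs

def watched_nonzero(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {i: attrs[i] for i in WATCH if attrs.get(i, 0) > 0}

# Made-up sample in smartctl's attribute-table layout (not real data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       12
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
"""

print(watched_nonzero(parse_smart_attributes(SAMPLE)))  # {1: 12, 197: 3}
```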

 

SMART for other disks looks OK.

 

Let's see whether the disks are mountable again after you recheck connections.

 

 

 


When I restarted the array, disk 3 was still in the same state as before, as best I could tell.

 

[Screenshot]

 

Also, as far as Disk 1 goes, I'm unclear what I should actually try doing with it at the moment. Should I be clicking the 'Mount' button in the 'Unassigned Devices' section?

 

Also - hugely appreciate your assistance.  Have had a sick feeling in my stomach the last several hours here trying to figure this out :).  


Looking at the syslog again, it looks like there is an actual disk problem with disk3. We might want to try to rebuild that one instead of disk1. You might need to replace both, but of course you can only rebuild one at a time with single parity.
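A toy illustration of why single parity can only rebuild one disk at a time, assuming single parity works like byte-wise XOR (even parity) across the data disks; the disk contents are made up. With one disk missing, parity gives it back exactly. With two missing, parity only pins down their XOR, so many different pairs are equally consistent with it.

```python
from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR across equal-length blocks: the single-parity rule.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Toy array: three data "disks" of 2 bytes each (made-up contents).
disk1, disk2, disk3 = b"\x0f\x01", b"\x33\x02", b"\x55\x04"
parity = xor_blocks([disk1, disk2, disk3])

# One missing disk: parity XOR all surviving disks returns it exactly.
assert xor_blocks([parity, disk2, disk3]) == disk1

# Two missing disks (1 and 3): parity XOR disk2 only yields disk1 XOR disk3.
combined = xor_blocks([parity, disk2])
assert combined == xor_blocks([disk1, disk3])

# Any other pair with the same XOR matches parity just as well,
# so single parity alone cannot tell which pair is the real one.
fake1 = b"\x00\x00"
fake3 = xor_blocks([combined, fake1])
assert xor_blocks([fake1, disk2, fake3]) == parity
print(fake3 != disk3)  # True
```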

 

But your diagnostics and screenshot were taken without the array started. We can't tell whether the filesystems are mountable until the array is started.

 

19 minutes ago, trurl said:

start the array in normal mode with nothing assigned as disk1, then post new diagnostics and a screenshot of Main - Array Devices.

 

Since you have 2 drives with SMART warnings, I have to ask. Do you have Notifications setup to alert you immediately by email or other agent as soon as a problem is detected? Don't let one unnoticed problem become multiple problems and possible data loss.


I thought I had it set up to get emails, but I guess I either didn't get them or didn't understand what the pop-ups were telling me.

 

I just re-tested the email notifications in the setup, and it looks like my authorization is failing now, so apparently I need to set that up again. (It looks like my Gmail SMTP settings from a while back no longer work; reading up on how to fix that now.) UPDATE: I generated a Gmail app password and that seems to be working now.

 

So, how can I go about attempting to rebuild disk 3? I mentioned earlier that Unraid wants to treat Disk 1 as a new drive; I was unclear from your response what I can do about that.


We are posting at the same time. Your new screenshot shows what we need.

 

It's probably a good idea not to allow any further writes to any disks in the array for now. Disable Docker and VM Manager in Settings. It doesn't look like you have a cache drive. If not, then we don't have to worry about Mover running.

 

Run an extended SMART self-test on disk3 and the disk formerly assigned as disk1. You can do both at the same time. It will take many hours unless the test fails before then.

 

I will check back in the morning.

2 hours ago, MooTheKow said:

noticed a couple days one of my drives failed

7 minutes ago, MooTheKow said:

How can I go about attempting to rebuild disk 3?

 

Even though disk1 was disabled, it is still emulated, and any writes to it are made by updating parity. If there were any writes to emulated disk1 while you were waiting to replace it, those writes will be lost if you rebuild disk3 instead. And since disk1 is the disk most likely to be written on a new system, and you have no cache, disk1 is probably where all your Docker and VM data is, and where new files were probably written. Also, any writes to emulated disk1 will have updated parity, which means physical disk1 and parity are out of sync, and so parity and physical disk1 together are also out of sync with disk3. Rebuilding a disk requires parity plus all the other disks, so a rebuild of disk3 would be compromised.
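To make the out-of-sync argument concrete, here is a toy sketch assuming single parity behaves like plain byte-wise XOR across the data disks (a simplification; the contents are made up). If emulated disk1 was written, which updates parity, while physical disk1 kept its old data, then a disk3 rebuild that uses the stale physical disk1 comes out wrong.

```python
from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR across equal-length blocks: the single-parity rule.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild(index, disks, parity):
    # Reconstruct disks[index] from parity plus every OTHER data disk.
    others = [d for i, d in enumerate(disks) if i != index]
    return xor_blocks(others + [parity])

# Toy array with three 4-byte data "disks" (made-up contents).
disk1 = b"\x01\x02\x03\x04"
disk2 = b"\x10\x20\x30\x40"
disk3 = b"\xaa\xbb\xcc\xdd"
parity = xor_blocks([disk1, disk2, disk3])

# disk1 is disabled and emulated. A write to emulated disk1 lands only
# in parity; the physical disk1 still holds the old bytes.
emulated_disk1 = b"\xff\xee\xdd\xcc"
parity = xor_blocks([emulated_disk1, disk2, disk3])

# Rebuilding disk3 with the stale physical disk1 gives the wrong data...
stale_rebuild = rebuild(2, [disk1, disk2, None], parity)
print(stale_rebuild == disk3)  # False

# ...while the up-to-date (emulated) disk1 contents give the right data.
good_rebuild = rebuild(2, [emulated_disk1, disk2, None], parity)
print(good_rebuild == disk3)  # True

# Rebuilding disk1 instead returns the emulated (newer) contents.
print(rebuild(0, [None, disk2, disk3], parity) == emulated_disk1)  # True
```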

 

But, since disk3 has failed extended self-test, we may not have any choice but to rebuild it instead of disk1. That is assuming disk1 doesn't also fail extended self-test. We may have to resort to cloning these bad disks before we can recover anything, which might mean you need more spare disks.

 

Do you have backups of anything important and irreplaceable?

 

I will ping @JorgeB to get more help but it is already way past bedtime in his timezone.

 

We will take a look in the morning to see the results of disk1 self-test and decide where to go from there.


So, still no results on Disk 1. I woke up at 4am and found it said the test was interrupted or something:

 

kowunraid-smart-20221222-2231.zip

 

I tried starting it again (which would be roughly two and a half hours ago, maybe three). I just got up and checked, and it still says 'self-test in progress, 10% complete'. I opened a new tab to look at the disk and am seeing this. Is that because the test is in progress?

 

[Screenshot]

 

Also, spare disks were mentioned in an earlier post. Currently I have a couple of 4TB external drives and a new 14TB internal drive. If I need to pick up additional internal drives, I can and will.


You may have to disable spindown on the disk to get the SMART test to complete.

1 hour ago, MooTheKow said:

Opened a new tab attempting to look at the disk and am seeing this -- that because the test is in progress?

Just to make sure everything is consistent, only open one browser to your server and see what it looks like. That screenshot suggests the test is no longer running and the disk can't be communicated with.

 

Attach new diagnostics to your NEXT post in this thread.

 

9 hours ago, trurl said:

We may have to resort to cloning these bad disks before we can recover anything, which might mean you need more spare disks.

 


I think so (backups of irreplaceable stuff). I can still access some files, so I'm going to try to back up a 'pictures' folder (though I think most are already backed up to Amazon Photos) as well as a videos folder. The majority of the data is backups of media (Blu-ray rips, TV show rips, etc.). I have backups of old hard drives (just in case; most of it is things I've not accessed in years and may never need to). So (thankfully) at this point most data loss would just fall into the 'huge inconvenience' category instead of the 'now I need to cry that it is lost forever' category.

I did disable spin-down in the Settings tab (as best I can tell).

 

kowunraid-diagnostics-20221223-0845.zip


Results from the short self-test:

Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)  00%        36925            -
# 2  Extended offline    Interrupted (host reset)  00%        36919            -
# 3  Short offline       Completed without error   00%        17972            -
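A sketch that reads the self-test log rows posted above (the layout `smartctl -l selftest` prints, where entry #1 is the most recent) and checks whether any extended test has actually completed. The regex is an assumption tuned to these rows, not a full parser for every status smartctl can report.

```python
import re

# The self-test log rows from the post above.
LOG = """\
# 1 Extended offline Interrupted (host reset) 00% 36925 -
# 2 Extended offline Interrupted (host reset) 00% 36919 -
# 3 Short offline Completed without error 00% 17972 -
"""

ROW = re.compile(
    r"#\s*(?P<num>\d+)\s+(?P<test>Short|Extended|Conveyance) offline\s+"
    r"(?P<status>.+?)\s+(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)$"
)

def parse_selftest_log(text):
    """Turn self-test log rows into dicts; non-matching lines are skipped."""
    entries = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            entries.append({
                "num": int(m["num"]),
                "test": m["test"],
                "status": m["status"],
                "hours": int(m["hours"]),
            })
    return entries

def last_completed_extended(entries):
    """Return the newest extended test that completed without error, or None."""
    for e in sorted(entries, key=lambda e: e["num"]):  # #1 is the most recent
        if e["test"] == "Extended" and e["status"] == "Completed without error":
            return e
    return None

entries = parse_selftest_log(LOG)
for e in entries:
    print(e["num"], e["test"], e["status"], e["hours"])
print(last_completed_extended(entries))  # None
```

In this log, both extended attempts were interrupted and the only completed test is a short one at 17972 power-on hours, so the extended test still has to be run to completion.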

 

kowunraid-smart-20221223-1150.zip

[Screenshot]

 

Going to attempt an extended self-test now.

 
