
[solved] Unraid 6.8.3 Failing Drive(s)


openam


I'm currently running Unraid 6.8.3.

 

In early December I was running out of space, so I installed a new 8TB drive and started a pre-clear on December 6. I added it to the array a couple of days later. On the 12th I realized that I wasn't running the memory at full speed, so I updated the BIOS settings to increase it. I was also getting the occasional high CPU temp alert from Fix Common Problems. I was just running the stock CPU cooler, so I pulled apart my tower and installed a Noctua NH-U12A on Dec 14, 2020. I started it up and everything looked fine. A few days later some drives went missing. I figured that I hadn't seated everything correctly, since I had completely disassembled the machine to install the cooler. I have rebooted several times to try reseating cables (I tried a different Mini SAS to SATA cable, a different port, etc.).

 

Unfortunately I rebooted my machine a few times before reading this: https://forums.unraid.net/topic/37579-need-help-read-me-first/

I saw the part about overclocking, so I have reset the RAM settings in the BIOS to the original values.

 

Originally it was just disk 4 that was going bad. I have a drive ordered, and it should be here tomorrow, but now it looks like disk 3 is starting to go as well. I'm wondering what the best way to proceed is. I used unbalance to transfer anything important on there to the new 8TB that I had installed. My problem is that I don't know if there is anything on disk 3 that I'm going to miss.

 

I'm hoping that there is a way to get one of the disks back, and that I'll just be able to swap the other bad one with the drive showing up tomorrow.

 

If I can't get one of them back what's the best way to re-build and keep the data that exists on the other drives?

palazzo-diagnostics-20201218-1132 - disk 4 missing, disk 3 appears to be missing.zip

Edited by openam
Solved
Link to comment

SMART attributes for Disk4 look OK but the emulated disk is unmountable. Bad connections are much more common than bad disks. Are any of your disks showing SMART warnings on the Dashboard page?

 

There are some problems with disk4 in syslog besides the unmountable filesystem though. Not sure if it's only a connection problem or actual disk problem. Check all connections, SATA and power, both ends, including any splitters. Then reboot and post new diagnostics.

 

 

Link to comment

I think drive 4 is really going bad. It hung on mounting when starting the array (see screenshot). I'm attaching logs from after reseating the drives and cables, both before and after starting the array. It does appear that I can view items by browsing disk 3 directly again, though. This gives me hope that I can just swap out disk 4 tomorrow when my new drive arrives.

 

I bought an 8TB to replace the 4TB. Is there anything special that I'll have to do before placing it in disk 4's spot?

Screen Shot 2020-12-18 at 2.26.19 PM.png

Screen Shot 2020-12-18 at 2.22.34 PM.png

Screen Shot 2020-12-18 at 2.21.35 PM.png

Screen Shot 2020-12-18 at 2.18.21 PM.png

palazzo-diagnostics-20201218-1422 - After reseating drives and cables after starting array.zip palazzo-diagnostics-20201218-1414 - After reseating drives and cables prior to starting array.zip

Link to comment

Yeah pretty much everything irreplaceable is on other working drives in the system, and/or on a separate NAS.

 

What are you looking at to see the failing disk? I'm pretty sure both 3 and 4 are failing. I couldn't even mount disk 4 on its own when I took it out of the array.

 

I'm thinking my next steps are to start from scratch, but I'd really rather keep my appdata from my docker containers, and as many of the files from the other drives as possible. The remaining stuff is actually backups from this computer I'm using, so once I'm back up and running I can re-backup this machine.

 

I'm pretty sure all 4 of those 4TB drives are pretty old. I'm guessing the other 2 will probably be gone within the next year.

 

 

Link to comment
12 minutes ago, openam said:

What are you looking at to see the failing disk? I'm pretty sure both 3 and 4 are failing. I couldn't even mount disk 4 on its own when I took it out of the array.

It says Disk3 is failing, which is coming from SMART for Disk3. Disk4 was disabled and unmountable but not failing.

 

Disk4 needs to be rebuilt. But reliably rebuilding every bit of a disabled or missing disk requires reliably reading every bit of all other disks. If you had dual parity you could rebuild both but as it is the failing Disk3 is going to compromise the rebuild of Disk4.
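
To illustrate why a failing Disk3 compromises a rebuild of Disk4, here is a minimal sketch of single-parity reconstruction. It assumes parity is the bytewise XOR of the data disks at each position, which is the usual single-parity scheme; the disk contents and the helper names `xor_blocks` and `rebuild` are made up for illustration and are not Unraid code.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy array: three data "disks" (hypothetical contents) plus one parity disk.
disk3 = b"\x10\x20\x30\x40"
disk4 = b"\xaa\xbb\xcc\xdd"
disk5 = b"\x01\x02\x03\x04"
parity = xor_blocks([disk3, disk4, disk5])

def rebuild(parity, surviving_disks):
    """Reconstruct the single missing disk from parity plus every other disk."""
    return xor_blocks([parity, *surviving_disks])

# When every other disk reads back correctly, disk4 is recovered exactly.
rebuilt_ok = rebuild(parity, [disk3, disk5])

# If disk3 returns one wrong byte during the rebuild, the same byte position
# on the rebuilt disk4 comes out wrong as well.
disk3_bad_read = b"\x10\xff\x30\x40"
rebuilt_bad = rebuild(parity, [disk3_bad_read, disk5])

print(rebuilt_ok == disk4)
print(rebuilt_bad == disk4)
```

With one parity disk there is only one equation per byte position, so it can solve for one unknown disk. Any position where a second disk cannot be read reliably leaves that position of the rebuild unrecoverable.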

 

We might be able to get it to rebuild the failing Disk3 instead of Disk4 and then try to fix the filesystem on Disk4. Do you have a replacement for Disk3?

Link to comment

I don't currently have any extra 4TB or larger drives on hand. Tomorrow I have a new 8TB showing up.

 

I believe it originally started to do a rebuild on disk 4. I saw the blue square/circle/dot next to it when I started the array once. Not sure if that means it's messed up. I have pulled it out of the array (unassigned it from the array) and started the array that way. When it was in that state I tried to mount it using unassigned devices, and it wouldn't mount there. I think I tried that after a blue square array start. Granted maybe reseating the cables and drive have fixed that.

Link to comment
5 minutes ago, openam said:

Tomorrow I have a new 8TB showing up

Let's wait on that and we can see about rebuilding the failing disk3. Don't do anything else with that disk4 because we need it just as it is to try to rebuild disk3. We can probably repair the filesystem on disk4 later. Also might be useful to get others to look at this thread so I am going to ping @JorgeB but I'm sure it's past bedtime in his timezone right now.

Link to comment
5 minutes ago, openam said:

Should I stop the array

Wouldn't hurt, but if you really need to copy anything off the array to another computer for backup purposes that should be OK. Just don't write anything to your server.

 

Looks like that thumbs down on parity is just a single CRC error. If you mouse over it, it will let you acknowledge that.

Link to comment

I actually was moving stuff from disk 3 to disk 7 yesterday and early today, when I started getting issues. Most of what's on disk 7 is actually from disk 3. Does that mean my parity is all messed up already? If so I'd be alright with just blowing it away, and starting over, obviously I'd like to keep as much of the data as possible from the healthy drives. I think this time I'd start with double parity drives. I'd almost like to get some larger drives for parity.

Link to comment
2 hours ago, openam said:

Most of what's on disk 7 is actually from disk 3. Does that mean my parity is all messed up already?

No, parity doesn't care about any of that. It doesn't even know about files.
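
As a rough sketch of that point: the example below assumes parity is kept in sync on every write as old parity XOR old data XOR new data, a common way to maintain XOR parity. The two-byte "disks" and the `write` helper are hypothetical and only stand in for whatever the array driver really does.

```python
def update_parity(old_parity: int, old_data: int, new_data: int) -> int:
    """New parity byte after one data byte changes (assumed XOR scheme)."""
    return old_parity ^ old_data ^ new_data

# Two tiny data "disks" and the parity computed across them.
disk3 = [0x11, 0x22]
disk7 = [0x00, 0x00]
parity = [a ^ b for a, b in zip(disk3, disk7)]

def write(disk, index, value):
    """Write one byte to a disk and update the matching parity byte."""
    parity[index] = update_parity(parity[index], disk[index], value)
    disk[index] = value

# "Moving a file" is just ordinary writes as far as parity is concerned:
# copy the bytes onto disk7, then overwrite the old location on disk3.
write(disk7, 0, disk3[0])
write(disk7, 1, disk3[1])
write(disk3, 0, 0x00)
write(disk3, 1, 0x00)

# Parity still matches a full recalculation across both disks.
print(parity == [a ^ b for a, b in zip(disk3, disk7)])
```

Nothing in the update looks at filenames or which disk a file came from, only at the bytes that changed at each position.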

 

Currently it isn't trying to use disk4, since it is disabled; instead it is getting the data for disk4 from the parity calculation by reading all the other disks. It's possible the emulated disk4 is unmountable simply because the bad disk3 is affecting the emulation. It could be that if we make it use disk4 so we can rebuild disk3 to a new disk, disk4 would be mountable after all.

 

Let's see what other ideas may come up while we wait for the new disk to arrive.

Link to comment

The Disk3 SMART error isn't always immediately critical, i.e., the disk might still be OK for now, and I'm not seeing any related errors in the syslog so far. There are errors about disk4, and it dropped offline. The first thing to try is to check the filesystem on the emulated disk4 (and hope for no errors on disk3 for now), and to check the connections on the actual disk4 to get a SMART report.

Link to comment

Thanks for the check disk filesystem link. It appears the filesystem check on disk 4 just finished. I am attaching a zip with the diagnostics taken after the check filesystem run. I'm also including the SMART download for disk 6, since it is now showing as bad on the dashboard. There are also a few screenshots:

 

- when I started the run

- a status hour and a half later

- dashboard showing drive 6 says it has issues now

- the finished output for disk 4

 

Not sure if I need to re-run check without the `-n` option now or not.

 

I just looked at the tracking for my new HDD, and it still doesn't say out for delivery. It appears to have arrived at the local carrier facility this morning though. 

 

Thanks again for your help!

palazzo-diagnostics-20201219-1021 - plus screenshots.zip

Link to comment
3 hours ago, openam said:

dashboard showing drive 6 says it has issues now

Same FAILING NOW as disk3, so I guess it's not critical. At this point, with multiple disks in this state, I think you have to go forward with repairing the emulated disk and see what happens. So

3 hours ago, openam said:

re-run check without the `-n` option

 

 

Then

 

25 minutes ago, openam said:

Should I run preclear on it?

A clear disk isn't required for this scenario, but it would be a good idea to test it with preclear anyway since you already have other problems going on.

26 minutes ago, openam said:

Should I run full smart analysis on disk 3, 4 and/or 6?

Might as well since you can do them all at the same time and they should complete before that preclear anyway.

Link to comment

Should I try starting it before running any SMART checks and/or preclear? It's still showing a red x next to the disk.

 

2 hours ago, trurl said:

A clear disk isn't required for this scenario, but it would be a good idea to test it with preclear anyway since you already have other problems going on.

As for the preclear it appears there are different options available: 

  • clear
  • verify all the disk
  • verify mbr only
  • erase all the disk
  • erase and clear the disk

I guess we can figure out which one to do after verifying I should try starting the array with the red x.

Screen Shot 2020-12-19 at 5.07.18 PM.png

palazzo-smart-20201219-1710 - disk 4.zip

Link to comment
