UNRAID 6.10.0-rc2 Failed Disk and Docker Apps Disappeared


Good Morning,

 

I recently discovered that I had a disk that is listed as Not Installed.  I have experienced a couple of power outages in the last couple of months, and I'm not sure whether they are related.  I am also seeing a ton of smart errors and what appear to be critical read errors on another disk.  I rebooted, placed the array into maintenance mode, and performed S.M.A.R.T. test on the disk with all the errors. I then started a Read Check of all the drives but ended up stopping it after a few hours, as it was only at around 34% and not progressing beyond the disk with all the S.M.A.R.T. and read errors.  I'm no longer seeing the Docker apps I was running either. Before the reboot I was seeing an error that the Docker service was not running; after the reboot the error is gone and the apps are missing.

 

I'm only configured with a single parity drive so I'm obviously concerned about data loss.  I've got 2 new drives arriving tomorrow, but wanted to see if I could get some advice on the plan of attack to hopefully get restored to where I was before.  Should I replace the 2 drives before upgrading the OS or should I get the array stabilized beforehand?  Do I need to upgrade to a specific version from the RC or can I jump straight to 6.10.2?  Also, is there any possibility of recovering the docker apps or, at a minimum, the configuration of said apps?

 

Thank you for your time and advice,

Mike

Screenshot 2022-09-24 105528.png

Screenshot 2022-09-24 105612.png

unraid-diagnostics-20220924-0913.zip


You don't seem to have a pool named cache, but some of your user shares are configured to use that pool. In particular, appdata, domains, system are configured to prefer a pool named cache, and since it doesn't exist, all those shares have been created on the array. Possibly this is what has led up to your docker problems.

 

Anything you can tell us about that?

 

More about your other problems after I have studied diagnostics
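The share/pool mismatch described above could be checked with something like the following minimal sketch. It assumes that share settings live in `/boot/config/shares/<share>.cfg` as `key="value"` lines (including `shareUseCache` and `shareCachePool`) and that each pool has a file in `/boot/config/pools/`. Those paths and key names are assumptions about the flash layout, not something confirmed in this thread, and the helper names are hypothetical.

```python
from pathlib import Path
import re

# Assumed Unraid flash layout (not confirmed in this thread):
#   /boot/config/shares/<share>.cfg  with lines like shareCachePool="cache"
#   /boot/config/pools/<pool>.cfg    one file per configured pool
SHARES_DIR = Path("/boot/config/shares")
POOLS_DIR = Path("/boot/config/pools")

KEY_VALUE = re.compile(r'^(\w+)="(.*)"$')


def parse_cfg(text):
    """Parse key="value" lines into a dict, ignoring anything else."""
    settings = {}
    for line in text.splitlines():
        match = KEY_VALUE.match(line.strip())
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def shares_with_missing_pool(share_cfgs, pool_names):
    """Return {share: pool} for shares that point at a pool that does not exist.

    share_cfgs maps a share name to its parsed settings dict.
    """
    missing = {}
    for share, settings in share_cfgs.items():
        use_cache = settings.get("shareUseCache", "no")
        pool = settings.get("shareCachePool", "")
        if use_cache != "no" and pool and pool not in pool_names:
            missing[share] = pool
    return missing


if __name__ == "__main__" and SHARES_DIR.is_dir():
    share_cfgs = {p.stem: parse_cfg(p.read_text()) for p in SHARES_DIR.glob("*.cfg")}
    pool_names = {p.stem for p in POOLS_DIR.glob("*.cfg")}
    for share, pool in sorted(shares_with_missing_pool(share_cfgs, pool_names).items()):
        print(f"{share}: configured for pool '{pool}', which does not exist")
```

On a server in the state described, this would be expected to list appdata, domains, and system against the missing pool named cache, provided the assumed file layout holds.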


No SMART report for any disk that might have been assigned as disk1, might have been 8GB serial ending FYZV device sde but isn't in SMART reports now.

1 hour ago, evans.family.mike said:

performed S.M.A.R.T. test on the disk with all the errors

You didn't say which disk that was. Which disk was it? Do any of your other disks have SMART warnings on the Dashboard page?

 

1 hour ago, evans.family.mike said:

ton of smart errors and what appear to be critical read errors on another disk.

Which disks? Maybe it could be figured out from diagnostics but why make us work so hard?

 

1 hour ago, evans.family.mike said:

not progressing beyond the disk with all the S.M.A.R.T. and read errors.

??

 

1 hour ago, evans.family.mike said:

only configured with a single parity drive so I'm obviously concerned about data loss

Looks like only disk1 is disabled, but it is also unmountable, so we will have to repair its filesystem before rebuilding it.

 

 

53 minutes ago, trurl said:

No SMART report for any disk that might have been assigned as disk1, might have been 8GB serial ending FYZV device sde but isn't in SMART reports now.

You didn't say which disk that was. Which disk was it? Do any of your other disks have SMART warnings on the Dashboard page?

My apologies, I am referring to the disk ending in 5RSND.  Yes, both the Disk 2 ending in 5RSND and Disk 3 ending in 4MPD6

 

53 minutes ago, trurl said:

Which disks? Maybe it could be figured out from diagnostics but why make us work so hard?

Disk 2 ending in 5RSND.  Its SMART Overall Health is listed as failed.

53 minutes ago, trurl said:

??

I should have worded it as "progress was extremely slow due to all of the S.M.A.R.T. and read errors"

53 minutes ago, trurl said:

 

Looks like only disk1 is disabled, but it is also unmountable, so we will have to repair its filesystem before rebuilding it.

Ok, would that be done from the Check Filesystem Status button while in Maintenance Mode?

1 hour ago, trurl said:

You don't seem to have a pool named cache, but some of your user shares are configured to use that pool. In particular, appdata, domains, system are configured to prefer a pool named cache, and since it doesn't exist, all those shares have been created on the array. Possibly this is what has led up to your docker problems.

 

Anything you can tell us about that?

 

More about your other problems after I have studied diagnostics

See the attached screenshots of the pool devices I have installed.  I guess I did not have them configured properly for use by any of the shares

Screenshot 2022-09-24 134745.png

Screenshot 2022-09-24 134814.png

Screenshot 2022-09-24 134836.png

On 9/24/2022 at 12:45 PM, trurl said:

No SMART report for any disk that might have been assigned as disk1, might have been 8GB serial ending FYZV device sde but isn't in SMART reports now.

On 9/24/2022 at 1:46 PM, evans.family.mike said:

My apologies, I am referring to the disk ending in 5RSND.  Yes, both the Disk 2 ending in 5RSND and Disk 3 ending in 4MPD6

Disk2 is failing as you mention. Disk3 looks OK for now, just 1 Reported Uncorrect, and we are going to need it for getting other disks going again.

 

Still doesn't solve the mystery of missing disk1. Do you still have the disk? Is it still attached?

 

On 9/24/2022 at 1:46 PM, evans.family.mike said:

would that be done from the Check Filesystem Status button while in Maintenance Mode?

Correct. But...

 

Since disk2 is really failing, that might be a reason disk1 emulation isn't working well. And disk2 failure would also make it difficult or impossible to rebuild disk1. I think we are going to have to accept disk1 back into the array if you still have it and it still works, so we can try to rebuild disk2 instead. That is a little more complicated and not really documented in the wiki, though there are quite a few threads where we have guided users through the process.
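To illustrate why a failing disk2 gets in the way of rebuilding disk1: with single parity, the parity disk holds the XOR of all data disks, so a missing disk is reconstructed from parity plus every other data disk. The toy sketch below uses made-up 4-byte "disks" and hypothetical helper names. It shows the arithmetic only, not how Unraid implements it.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(parity, surviving_disks):
    """Reconstruct the one missing data disk from parity plus all other disks."""
    return xor_blocks([parity, *surviving_disks])


# Toy 4-byte "disks"; the contents are made up for illustration.
disk1 = bytes([0x11, 0x22, 0x33, 0x44])
disk2 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
disk3 = bytes([0x0F, 0x0E, 0x0D, 0x0C])
parity = xor_blocks([disk1, disk2, disk3])

# With disk2 and disk3 fully readable, disk1 comes back exactly.
good_rebuild = rebuild_missing(parity, [disk2, disk3])

# If disk2 returns a bad byte during the rebuild, the same byte position
# in the rebuilt disk1 is wrong too.
bad_disk2 = bytes([0xA0, 0xFF, 0xC0, 0xD0])
bad_rebuild = rebuild_missing(parity, [bad_disk2, disk3])

print(good_rebuild == disk1)
print(bad_rebuild == disk1)
```

Every read error on disk2 therefore turns into corruption at the same offset of the emulated or rebuilt disk1, which is consistent with emulation misbehaving while disk2 is failing.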

 

@evans.family.mike Sorry for the delay in replying. Real life, etc. I thought someone else might have taken this up.

 

Let me know if you are still waiting or if not, what you have done instead.

 

 


Nothing can move open files. To get those shares moved, you will have to disable Docker and VM Manager in Settings. But let's wait on trying to move anything until your array is stable again. In fact, go ahead and disable Docker and VM Manager in Settings until your array is stable.

 

Since you have gotten into this state, a couple of things I always ask.

 

Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected? Don't let one unnoticed problem become multiple problems and data loss.

 

Do you have another copy of anything important and irreplaceable? Parity is not a substitute for backups.

1 hour ago, trurl said:

Disk2 is failing as you mention. Disk3 looks OK for now, just 1 Reported Uncorrect, and we are going to need it for getting other disks going again.

 

Still doesn't solve the mystery of missing disk1. Do you still have the disk? Is it still attached?

Correct. But...

 

Since disk2 is really failing, that might be a reason disk1 emulation isn't working well. And disk2 failure would also make it difficult or impossible to rebuild disk1. I think we are going to have to accept disk1 back into the array if you still have it and it still works, so we can try to rebuild disk2 instead. That is a little more complicated and not really documented in the wiki, though there are quite a few threads where we have guided users through the process.

 

@evans.family.mike Sorry for the delay in replying. Real life, etc. I thought someone else might have taken this up.

 

Let me know if you are still waiting or if not, what you have done instead.

 

 

No worries and no need to apologize.  I'm just grateful for the assistance.

 

Disk 1 is still in the case and should be connected.  I have been hesitant to shut the server down to check for fear of Disk 2 permanently going out to lunch.  However, if I need to do that I will.  

 

Docker service has been disabled.  VM Manager wasn't running.

 

I don't have email notifications on as I always keep the GUI open and visible on another screen.  That said, I will get that configured.

 

Backups?   We don't need no stinking backups... (I mean, I would if it were financially feasible for me to back up 80TB of data)

1 hour ago, evans.family.mike said:

I would if it were financially feasible for me to back up 80TB of data

 

You get to decide what is

3 hours ago, trurl said:

important and irreplaceable

I back up only 2TB of my 20+ to external disks which I rotate offsite. Personal files, a large photo collection, and a large music collection all fit in that small space.

 

My video media gets rsynced to my backup server occasionally, and I only have that backup server because I had the hardware leftover after upgrading my main server. It isn't

3 hours ago, trurl said:

important and irreplaceable

 


If disk2 weren't already failing, and all you had was disabled / emulated-but-unmountable disk1, then we would repair filesystem on emulated disk1, rebuild disk1 to a spare, keep original disk1 as is in case of problems.

 

But I'm concerned that disk2 might fail so badly that we would be in danger of losing disk2 data while we tried to recover disk1 data.

 

If disk1 SMART is OK probably its data is mostly OK, so it might be OK to not try to rebuild it but accept it back into the array and instead rebuild disk2 to a new disk.
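The reasoning in the three paragraphs above can be summarized as a small decision sketch. The function name and step wording are hypothetical and only restate what is said in this thread. The case where disk2 is failing and disk1's SMART report is also bad is not covered here, so the sketch returns no plan for it.

```python
def recovery_plan(disk1_smart_ok, disk2_failing):
    """Return the ordered steps discussed in the thread for a given situation.

    Both arguments are booleans. An empty list means the thread does not
    state a plan for that combination.
    """
    if not disk2_failing:
        # Only disk1 is disabled / emulated but unmountable.
        return [
            "repair filesystem on emulated disk1",
            "rebuild disk1 to a spare disk",
            "keep original disk1 untouched in case of problems",
        ]
    if disk1_smart_ok:
        # disk2 may not survive a rebuild of disk1, so trust disk1 instead.
        return [
            "accept original disk1 back into the array",
            "rebuild disk2 to a new disk",
        ]
    # disk2 failing and disk1 SMART not OK: no plan is given in the thread.
    return []


for step in recovery_plan(disk1_smart_ok=True, disk2_failing=True):
    print(step)
```

Which branch applies depends on the SMART report for disk1, which is why that report is the next thing to collect.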

 

I'm going to see if we can get storage expert @JorgeB to give us a 2nd opinion.

1 hour ago, trurl said:

But I'm concerned that disk2 might fail so badly that we would be in danger of losing disk2 data while we tried to recover disk1 data.

 

If disk1 SMART is OK probably its data is mostly OK, so it might be OK to not try to rebuild it but accept it back into the array and instead rebuild disk2 to a new disk.

Agree, let's see a SMART report for disk1.
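When reading a SMART report, the attribute table printed by `smartctl -A` for ATA disks is usually the quickest place to look. The sketch below parses that table from text and flags non-zero raw values for a few attributes. The watched set (5, 187, 197, 198) is an assumption about which counters matter most, the helper names are hypothetical, and the sample text is made up rather than taken from the diagnostics in this thread.

```python
# Attribute IDs to flag when their raw value is non-zero (an assumed subset).
WATCHED_IDS = {5, 187, 197, 198}


def parse_smart_attributes(text):
    """Parse an ATA attribute table into {id: (name, raw_value)}.

    Expects rows with the ten standard columns, ending in RAW_VALUE.
    Raw values that are not plain integers are kept as strings.
    """
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            attributes[int(fields[0])] = (fields[1], int(raw) if raw.isdigit() else raw)
    return attributes


def nonzero_watched(attributes):
    """Return {name: raw} for watched attributes with a non-zero integer raw value."""
    return {
        name: raw
        for attr_id, (name, raw) in attributes.items()
        if attr_id in WATCHED_IDS and isinstance(raw, int) and raw > 0
    }


# Made-up sample in the usual column layout, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   099   099   000    Old_age   Always       -       1
194 Temperature_Celsius     0x0022   045   055   000    Old_age   Always       -       45 (0 20 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

flagged = nonzero_watched(parse_smart_attributes(SAMPLE))
print(flagged)
```

On a real server the text would come from running `smartctl -A` against the device in question; the parsing step is the same.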


OK, I will order a replacement cable for that disk.  I'm already seeing alerts that Disk 1 is running hot.  Hopefully that is an issue with this specific disk and not something I'll see with the replacement disk.  Now to get the array back in functioning order so I can replace these disks.  Is there a special procedure I need to perform due to the current condition of the array?
