
Missing array drive is no longer being emulated



Hi Guys, I have a potentially quite severe problem with my Unraid array.

 

It is a very specific series of events that occurred that led to this issue.

 

To start, before this issue happened, I had an array built from five drives: 1x 16tb parity drive, 1x 16tb storage drive, and 3x 8tb storage drives. At this point the array was completely fine and healthy, with 40TB of usable space that showed correctly as 36.3TB in Windows.

 

A couple of weeks ago I started noticing S.M.A.R.T errors coming from one of the 8tb storage drives; its reallocated sector count was climbing rapidly. As the drive was still fairly new, I decided to send it back to Amazon for a refund.

 

This left my array in a state where it listed: 1x healthy 16tb parity drive, 1x healthy 16tb storage drive, 2x healthy 8tb storage drives, and 1x recently missing 8tb storage drive. This missing drive contained approximately 3tb of data but was being emulated by Unraid perfectly for the times that I needed to access it. The array still showed the correct 40TB of total space in the Unraid control panel and 36.3TB in Windows, and all my data, including the emulated drive's, was still there and accessible.

 

I believe my specific setup is very important to understanding the next part of this post.

The 8tb drives are Seagate desktop drives that connect to the computer over USB, each with its own cable. The two 16tb drives are Seagate Exos drives that are both in a five bay USB enclosure that I will refer to as the 'Yotto' for the rest of this post. This meant that two of the five slots in the Yotto were filled, with three spare slots for new drives as I slowly replaced the 8tb drives.

 

This now brings me to today. I purchased a replacement drive that just arrived, a 16tb Toshiba enterprise drive. The plan was to install it into the Yotto and have Unraid reconstruct the missing drive onto it - the way you would normally upgrade a drive if you have parity - but this is where the issue begins. Once powered up, the new drive did not sound particularly healthy. Once Unraid had finished booting, all of the drives from the Yotto were missing, that being the two existing 16tb drives and the new drive. More importantly, the missing drive from earlier did not show up as it had been for the past two weeks. Previously it had shown in the array as 'Missing' with the old drive's serial number underneath it. Unraid was showing a total of only 16tb of storage, just the two remaining 8tb drives.

 

Upon seeing that the entire Yotto was missing, I shut down Unraid and removed the new Toshiba drive, which was making worse sounds by the second. Booting Unraid back up, the Seagate drives from the Yotto had returned, but for a total capacity of only 32TB; the 8tb emulated drive was completely gone. Instead of the Identification dropdown saying 'Missing' with the old drive's serial underneath it, it now shows as unassigned. Even stranger, Unraid still shows the slot the drive used to be in with a red cross whose tooltip reads: 'DEVICE IS MISSING (DISABLED), CONTENTS EMULATED'. After some poking around, I know for certain that the data on this missing drive is not being emulated, as some files that would have been on it are missing.

 

Side note: the emulated disk has also disappeared from my Historical Devices panel at the bottom of the page.

 

When starting the array in maintenance mode, the offending disk (labelled Disk 4) shows as 'Not installed' with a warning at the end of the line stating 'Unmountable: Wrong or no file system'. There is also a message at the bottom of the page stating that an unmountable disk is present, which gives me the option to format it.

 

This all leads to the question: what should I do? I have contacted Amazon for a replacement of the new drive, which will happen... sometime? They have given me absolutely no timeframe other than 'more than a few days'. I obviously can't use the array in the state that it is currently in. The last thing I want is to lose any of those files, so if there is a chance of data recovery from the parity data I would like to keep that option open. But if there is anything I can do in the meantime to get access to this data again, that would be fairly helpful, as there is a decent bit of important data on that drive.

 

By chance I have a couple of screenshots from before the issue, which I have attached. I've also attached screenshots of anything after the issue that I think could be relevant, but please ask for more if it could be useful.

 

Thank you to anybody who has taken the time to read this; any sensible suggestions would be greatly appreciated.

Disk 4 Missing.png

Historical Devices Empty.png

Offline Array After Issue.png

Online Array After Issue.png

Online Array Before Issue.png

Windows After Issue.png

Windows Before Issue.png

Just now, ElliotJMD said:

Thanks for the response,

 

As I am currently without the drive do I still want to follow that and run a check even with it showing the emulated drive error?

Yes - the check (and repair) will be run against the emulated drive. You want to repair the emulated drive, as all a rebuild does is make a physical drive match the emulated drive (including any file system corruption that might be present).
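For anyone following along, the check is normally run from the webGUI (start the array in maintenance mode, click the disk, and use the filesystem check section), but it can also be sketched from the command line. A minimal sketch, assuming the emulated disk is Disk 4 and is exposed as /dev/md4 (the exact device node varies by Unraid version, e.g. /dev/md4p1 on newer releases):

```shell
# Array must be started in maintenance mode, so /dev/md4 exists but is not mounted.

# Read-only check: -n reports problems without modifying anything.
xfs_repair -n /dev/md4

# Actual repair, only after reviewing the check output:
xfs_repair /dev/md4
```

Running against the md device (rather than the raw /dev/sdX device) is what keeps parity in sync, so the repair applies to the emulated disk.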


Just thought I would mention some things not mentioned in the thread:

 

On 8/7/2022 at 2:52 PM, ElliotJMD said:

This left my array in a state where it listed: 1x healthy 16tb parity drive, 1x healthy 16tb storage drive, 2x healthy 8tb storage drives, and 1x recently missing 8tb storage drive. This missing drive contained approximately 3tb of data but was being emulated by Unraid perfectly for the times that I needed to access it. The array still showed the correct 40TB of total space in the Unraid control panel and 36.3TB in Windows, and all my data, including the emulated drive's, was still there and accessible.

While it is true Unraid was emulating the drive, it is also true that you no longer had any redundancy (parity protection) since you had a missing disk and only single parity. Perhaps you understood that already. How long have you been running without protection?

 

On 8/7/2022 at 2:52 PM, ElliotJMD said:

The 8tb drives are Seagate desktop drives that connect to the computer using usb

USB NOT recommended for array or pool for many reasons.

 

On 8/7/2022 at 2:52 PM, ElliotJMD said:

The two 16tb drives are Seagate Exos drives that are both in a five bay usb enclosure

I assume there is only 1 USB connection for the whole enclosure, which is even worse than using USB for each disk.

 

And now, the emulated disk is no longer mountable. This makes me wonder about something and I'm not sure about the answer so I hope one of the other knowledgeable people on this thread will respond.

 

Since it is not possible to disable any more disks, what happens if a write to any of the actual disks in the array fails? What happens if a write to the emulated disk fails (which I guess would mean a write to parity, since that is the disk emulating writes)?

 

I've never heard any reports of Unraid stopping the whole show in these cases. So does the write just fail, possibly causing filesystem corruption? I assume an error is logged, but does the user know about it?

7 minutes ago, trurl said:

While it is true Unraid was emulating the drive, it is also true that you no longer had any redundancy (parity protection) since you had a missing disk and only single parity. Perhaps you understood that already. How long have you been running without protection?

Yes, I was fully aware I was running without protection. I have been missing the drive for approximately two weeks, but I have only had the array online for a total of slightly less than two hours, when I needed to access some files.

 

9 minutes ago, trurl said:

USB NOT recommended for array or pool for many reasons.

 

9 minutes ago, trurl said:

I assume there is only 1 USB connection for the whole enclosure, even worse than using USB for each disk.

Although you are correct that it is not necessarily a good idea, due to space and hardware constraints it is unfortunately the only possible way to go for me.

 

11 minutes ago, trurl said:

Since it is not possible to disable any more disks, what happens if a write to any of the actual disks in the array fails? What happens if a write to the emulated disk fails (which I guess would mean a write to parity, since that is the disk emulating writes)?

 

I've never heard any reports of Unraid stopping the whole show in these cases. So does the write just fail, possibly causing filesystem corruption? I assume an error is logged, but does the user know about it?

I do not know what will happen if I try to write data to the array; as soon as I noticed the emulated disk missing, I immediately stopped the array to prevent any further damage.

1 hour ago, ElliotJMD said:

as soon as I noticed the emulated disk missing

It isn't missing, it is unmountable. You could rebuild it but the result would be an unmountable disk.

 

57 minutes ago, ElliotJMD said:

That's the diagnostics after starting the array normally

I think the diagnostics were asked for because it was assumed you had repaired the filesystem. Since you didn't actually do the repair it is still unmountable.

 

The reason I didn't tell you to go ahead and repair the disk after seeing the check results is because they didn't look very good, but maybe that is the only way forward.

 

Do you have backups of anything important and irreplaceable?

3 minutes ago, trurl said:

Do you have backups of anything important and irreplaceable?

Most of the data on the missing drive can be obtained again; I don't think there was any substantial data on the drive that was important. I mostly use it as a local repository of all my data. The rest of the array currently does not have any backup, as I kept putting it off after working out that it would take somewhere in excess of two years on my internet connection to back up all of my data. I do have a couple TBs of portable HDDs that I can copy super important info onto in the meantime if you believe it is worth it.

 

5 minutes ago, trurl said:

The reason I didn't tell you to go ahead and repair the disk after seeing the check results is because they didn't look very good, but maybe that is the only way forward.

If there is a chance of getting the data back it would be preferable, even if it leaves me with a bunch of corrupted files. Depending on what is on the drive, I would actually be perfectly content with only retrieving a table of contents.

 

An additional note: after a lot of effort with Amazon, I have a new HDD coming in two days, so if this data is recovered then I won't be emulating the drive for very long.

1 minute ago, ElliotJMD said:

I do have a couple TBs of portable HDDs that I can copy super important info onto in the meantime if you believe it is worth it.

You must always have another copy of anything important and irreplaceable even if everything is working well. Parity is not a substitute for backups.

 


Another thing to consider. The way the disk is emulated, even though unmountable, and the way it would be emulated while you did the filesystem repair, and the way it would be emulated to rebuild it whether or not the filesystem were repaired, is by reading all the other disks. So you are making all your other disks work harder.
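The read-all-disks behaviour follows from how single parity works: each byte of the emulated disk is the XOR of the parity byte with the corresponding byte on every surviving data disk. A toy sketch of the arithmetic with made-up byte values (not Unraid code, just the principle):

```shell
# Parity is the XOR of all data disks' bytes at the same offset.
a=$((0x12)); b=$((0xAB)); m=$((0x0F))   # two surviving disks and the "missing" one
p=$(( a ^ b ^ m ))                      # parity byte, as written before the failure

# Emulating the missing disk: XOR parity with every surviving disk's byte.
r=$(( p ^ a ^ b ))
[ "$r" -eq "$m" ] && echo "rebuilt byte matches missing byte"
```

So every read of the emulated disk requires a read at the same offset on parity and on every other data disk, which is why the other drives work harder for as long as the disk is emulated.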

2 minutes ago, trurl said:

while you did the filesystem repair

I did the check but I haven't done the repair yet. I wasn't exactly sure how to do it even after reading the docs, although I did get an error message that sounds somewhat useful to somebody who understands it:

 

ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
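In plain terms, the XFS journal (log) could not be replayed. The sequence the message suggests can be sketched as shell, assuming the emulated disk is exposed as /dev/md4 with the array in maintenance mode; on an unmountable disk like this one the mount attempt is expected to fail, at which point -L is the remaining option:

```shell
# 1. Attempt a mount so the log is replayed cleanly, then unmount again.
mkdir -p /mnt/test
mount /dev/md4 /mnt/test && umount /mnt/test

# 2. Only if the mount fails: zero the log and repair.
#    -L discards pending metadata updates and may lose recent changes.
xfs_repair -L /dev/md4
```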

 

2 hours ago, ElliotJMD said:

Although you are correct that it is not necessarily a good idea, due to space and hardware constraints it is unfortunately the only possible way to go for me.

I wouldn't trust 24TB plus whatever was on the missing disk to that setup.

 

1 minute ago, ElliotJMD said:

Do you reckon I should run the -L parameter as a kind of last ditch attempt?

Running -L is usually (always?) necessary to actually do the repair. Not sure if you will like the result, but I guess there is nothing else to be done.

