ElliotJMD Posted August 7, 2022

Hi Guys, I have a potentially quite severe problem with my Unraid array. A very specific series of events has led to this issue.

To start, before this issue happened, I had an array that was built from five drives: 1x 16tb parity drive, 1x 16tb storage drive, and 3x 8tb storage drives. The array at this point was completely fine and healthy with 40TB of usable space that showed correctly as 36.3TB in Windows.

A couple of weeks ago I started noticing S.M.A.R.T errors coming from one of the 8tb storage drives; the reallocated sector count was climbing rapidly. As this drive was still fairly new I decided to send the drive back to Amazon for a refund. This left my array in a state where it listed; 1x healthy 16tb parity drive, 1x healthy 16tb storage drive, 2x healthy 8tb storage drive, and the recently 1x missing 8tb storage drive. This missing drive contained approximately 3tb of data but was being emulated by unraid perfectly for the times that I needed to access it. It still showed the correct 40TB of total space in the unraid control panel and 36.3TB in Windows and all my data, including for the emulated drive was still there and accessible.

My specific setup I believe is very important to understanding the next part of this post. The 8tb drives are Seagate desktop drives that connect to the computer using usb. Each of these drives has its own cable connecting it to the computer. The two 16tb drives are Seagate Exos drives that are both in a five bay usb enclosure that I will refer to as the 'Yotto' for the rest of this post. This meant that two of the five slots in the Yotto were filled, with three spare slots for new drives as I slowly replaced the 8tb drives.

This now brings me to today. I purchased a replacement drive that just arrived, a 16tb Toshiba enterprise drive.
The plan was to install it into the Yotto and have unraid reconstruct the missing drive onto it - the way you would normally upgrade a drive if you have parity - but this is where the issue begins. Once powered up, the new drive did not sound particularly healthy. Once unraid had finished booting, all of the drives from the Yotto were missing, namely the two existing 16tb drives and the new drive. But more importantly, the missing drive from earlier did not show up as it had for the past two weeks. Previously it had shown in the array as 'Missing' with the old drive's serial number underneath it. Unraid was showing a total of 16tb of storage, only the two remaining 8tb drives.

Upon seeing that the entire Yotto was missing, I shut down unraid and removed the new Toshiba drive, which was making worse sounds by the second. Booting unraid back up, the Seagate drives from the Yotto had returned but for a total capacity of only 32TB; the 8tb emulated drive was completely gone. Instead of the Identification dropdown saying 'missing' with the old drive serial underneath it, it now shows as unassigned. Even stranger, unraid still shows the slot the drive used to be in with a red cross that, when hovering over it, shows a tooltip reading the following: 'DEVICE IS MISSING (DISABLED), CONTENTS EMULATED'. After some poking around, I do know for certain that the data on this missing drive is not being emulated, as some files are missing that would have been on it. Side note, the emulated disk has also disappeared from my Historical Devices panel at the bottom of the page.

When starting the array in maintenance mode, the offending disk (labelled Disk 4) shows as 'Not installed' with a warning on the end of the line stating 'Unmountable: Wrong or no file system'. There is also a notice at the bottom of the page stating that an unmountable disk is present, which gives me the option to format it.

This all leads to the question: what should I do?
I have contacted Amazon for a replacement of the new drive which will happen... sometime? They have given me absolutely no timeframe on this other than more than a few days. I obviously can't be using the array in the state that it is currently in. The last thing I want to do is lose any of those files so if there is a chance of data recovery from the parity data then I would like to keep that option open, but if there is anything I can do in the meantime to get access to this data again then that would be fairly helpful as there is a decent bit of important data on that drive. By chance I have a couple of screenshots from before the issue which I have attached, I've also attached screenshots of anything after the issue that I think could be relevant but please ask for more if it could be useful. Thank you to anybody that has given the time to even bother reading this, any sensible suggestions would be greatly appreciated.
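For anyone wondering why 40TB shows up as 36.3TB in Windows: the figures in the post are consistent with drive labels using decimal terabytes while Windows reports binary units. A minimal sketch of that arithmetic, assuming that explanation is the right one here (the function name `tb_to_tib` is just an illustrative choice, not anything from Unraid):

```python
# Drive labels use decimal terabytes; Windows reports binary tebibytes
# but still prints the unit as "TB". This is an assumed explanation that
# happens to match the numbers in the post.
TB = 10**12   # bytes in one decimal terabyte
TIB = 2**40   # bytes in one binary tebibyte


def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes to binary tebibytes."""
    return terabytes * TB / TIB


# Data drives from the post: 1x 16tb + 3x 8tb.
# The 16tb parity drive adds no usable space.
data_drives_tb = [16, 8, 8, 8]
usable_tb = sum(data_drives_tb)

print(f"{usable_tb} TB is about {tb_to_tib(usable_tb):.2f} TiB")
```

The same conversion applied to the 32TB shown after the failed drive swap gives roughly 29.1.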
itimpi Posted August 7, 2022

Do NOT format the drive - that will write an empty file system to the drive and update parity to reflect this, resulting in you losing its contents. The correct procedure to follow is documented here in the online documentation, accessible via the ‘Manual’ link at the bottom of the GUI.
ElliotJMD Posted August 7, 2022

Thanks for the response. As I am currently without the drive, do I still want to follow that and run a check even with it showing the emulated drive error?
itimpi Posted August 7, 2022

Just now, ElliotJMD said:
Thanks for the response. As I am currently without the drive, do I still want to follow that and run a check even with it showing the emulated drive error?

Yes - the check (and repair) will be run against the emulated drive. You want to repair the emulated drive as all a rebuild does is make a physical drive match an emulated drive (including any file system corruption that might be present).
ElliotJMD Posted August 7, 2022

Okay, thank you. I will start a check now.
ElliotJMD Posted August 9, 2022

The check just finished this morning and it found no errors, there isn't a repair button on this page that I can find and the emulated drive still isn't showing up as storage in the array. I've attached some screenshots of the page again.
JorgeB Posted August 9, 2022

Post new diags after array start in normal mode.
trurl Posted August 9, 2022

6 hours ago, ElliotJMD said:
The check just finished this morning and it found no errors, there isn't a repair button on this page that I can find

I think you must not have done check filesystem at all since check filesystem isn't on that page.
trurl Posted August 9, 2022

When you do check filesystem, be sure to capture the output so you can post it.
ElliotJMD Posted August 9, 2022

Apologies, I didn't realise there were two different check buttons. I ran the check filesystem one and this was the output of it. It's quite long so I've attached it as a text file.

Disk4 Check.txt
trurl Posted August 9, 2022

Just thought I would mention some things not mentioned in the thread:

On 8/7/2022 at 2:52 PM, ElliotJMD said:
This left my array in a state where it listed; 1x healthy 16tb parity drive, 1x healthy 16tb storage drive, 2x healthy 8tb storage drive, and the recently 1x missing 8tb storage drive. This missing drive contained approximately 3tb of data but was being emulated by unraid perfectly for the times that I needed to access it. It still showed the correct 40TB of total space in the unraid control panel and 36.3TB in Windows and all my data, including for the emulated drive was still there and accessible.

While it is true Unraid was emulating the drive, it is also true that you no longer had any redundancy (parity protection) since you had a missing disk and only single parity. Perhaps you understood that already. How long have you been running without protection?

On 8/7/2022 at 2:52 PM, ElliotJMD said:
The 8tb drives are Seagate desktop drives that connect to the computer using usb

USB is NOT recommended for array or pool for many reasons.

On 8/7/2022 at 2:52 PM, ElliotJMD said:
The two 16tb drives are Seagate Exos drives that are both in a five bay usb enclosure

I assume there is only 1 USB connection for the whole enclosure, which is even worse than using USB for each disk.

And now, the emulated disk is no longer mountable. This makes me wonder about something and I'm not sure about the answer so I hope one of the other knowledgeable people on this thread will respond.

Since it is not possible to disable any more disks, what happens if a write to any of the actual disks in the array fails? What happens if a write to the emulated disk fails (which I guess would mean a write to parity since it is the disk emulating writes)? I've never heard any reports of Unraid stopping the whole show in these cases. So does the write just fail, possibly causing filesystem corruption? I assume an error is logged, but does the user know about it?
ElliotJMD Posted August 9, 2022

7 minutes ago, trurl said:
While it is true Unraid was emulating the drive, it is also true that you no longer had any redundancy (parity protection) since you had a missing disk and only single parity. Perhaps you understood that already. How long have you been running without protection?

Yes, I was fully aware I was running without protection. I have been missing the drive for approximately two weeks but I have only had the array online for a total of slightly less than two hours, when I needed to access some files.

9 minutes ago, trurl said:
USB NOT recommended for array or pool for many reasons.

9 minutes ago, trurl said:
I assume there is only 1 USB connection for the whole enclosure, even worse than using USB for each disk.

Although you are correct with it not being necessarily a good idea, due to space and hardware constraints it is unfortunately the only possible way to go for me.

11 minutes ago, trurl said:
Since it is not possible to disable any more disks, what happens if a write to any of the actual disks in the array fails? What happens if a write to the emulated disk fails (which I guess would mean a write to parity since it is the disk emulating writes)? I've never heard any reports of Unraid stopping the whole show in these cases. So does the write just fail, possibly causing filesystem corruption? I assume an error is logged, but does the user know about it?

I do not know what will happen if I try to write data to the array. As soon as I noticed the emulated disk missing I immediately stopped the array to prevent any further damage.
ElliotJMD Posted August 9, 2022

9 hours ago, JorgeB said:
Post new diags after array start in normal mode.

redmatter-diagnostics-20220809-2104.zip

That's the diagnostics after starting the array normally.
trurl Posted August 9, 2022

1 hour ago, ElliotJMD said:
as soon as I noticed the emulated disk missing

It isn't missing, it is unmountable. You could rebuild it but the result would be an unmountable disk.

57 minutes ago, ElliotJMD said:
That's the diagnostics after starting the array normally

I think the diagnostics were asked for because it was assumed you had repaired the filesystem. Since you didn't actually do the repair it is still unmountable.

The reason I didn't tell you to go ahead and repair the disk after seeing the check results is because they didn't look very good, but maybe that is the only way forward.

Do you have backups of anything important and irreplaceable?
trurl Posted August 9, 2022

1 minute ago, trurl said:
It isn't missing, it is unmountable

Of course the disk itself is missing and has been for a while. It is still being emulated, but the emulated disk is unmountable.
ElliotJMD Posted August 9, 2022

3 minutes ago, trurl said:
Do you have backups of anything important and irreplaceable?

Most of the data on the missing drive can be gotten again, I don't think there was any substantial data on the drive that was important. I mostly use it as a local repository of all my data. The rest of the array currently does not have any backup. I kept putting it off because I worked out that it would take somewhere in excess of two years on my internet connection to back up all of my data. I do have a couple TBs of portable HDDs that I can copy super important info onto in the meantime if you believe it is worth it.

5 minutes ago, trurl said:
The reason I didn't tell you to go ahead and repair the disk after seeing the check results is because they didn't look very good, but maybe that is the only way forward.

If there is a chance of getting the data back it would be preferable even if it leaves me with a bunch of corrupted files. Depending on what is on the drive, I would actually be perfectly content with only retrieving a table of contents.

Additionally, after a lot of effort with Amazon, I have a new HDD coming in two days, so if this data is recovered then I won't be emulating the drive for very long.
trurl Posted August 9, 2022

1 minute ago, ElliotJMD said:
I do have a couple TBs of portable HDDs that I can copy super important info onto in the meantime if you believe it is worth it.

You must always have another copy of anything important and irreplaceable even if everything is working well. Parity is not a substitute for backups.
trurl Posted August 9, 2022

Since you know you have no parity protection, and you can't even read the emulated missing disk, what will you do if another disk begins to have problems?
trurl Posted August 9, 2022

Another thing to consider: the disk is emulated by reading all the other disks. That is true now, even though it is unmountable, it would be true while you did the filesystem repair, and it would be true during a rebuild whether or not the filesystem were repaired. So you are making all your other disks work harder.
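To illustrate why emulating a missing disk means reading every other disk, here is a toy sketch assuming single parity is a plain byte-wise XOR across the data disks. The 4-byte "disks" and the helper `xor_blocks` are made up for illustration; this is not Unraid's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four tiny made-up "data disks"; disk4 plays the role of the missing one.
disk1 = bytes([0x10, 0x22, 0x33, 0x44])
disk2 = bytes([0xA0, 0x0B, 0xC0, 0x0D])
disk3 = bytes([0x01, 0x02, 0x03, 0x04])
disk4 = bytes([0xDE, 0xAD, 0xBE, 0xEF])

# Parity is computed while all data disks are present.
parity = xor_blocks([disk1, disk2, disk3, disk4])

# Emulating disk4 needs parity AND every surviving data disk for each read.
emulated_disk4 = xor_blocks([parity, disk1, disk2, disk3])
print(emulated_disk4 == disk4)

# If a second disk (say disk3) also becomes unreadable, the same
# calculation no longer yields disk4's contents.
print(xor_blocks([parity, disk1, disk2]) == disk4)
```

Under this assumption the emulated contents are whatever the XOR produces, byte for byte, which is also why a rebuild would reproduce any filesystem corruption rather than fix it.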
ElliotJMD Posted August 9, 2022

2 minutes ago, trurl said:
while you did the filesystem repair

I did the check but I haven't done the repair yet. I wasn't exactly sure how to do it even after reading the docs, although I did get an error message that sounds somewhat useful to somebody that understands it:

ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
ElliotJMD Posted August 9, 2022

I'll also create some copies of some of the files that are the most important to me.
trurl Posted August 9, 2022

26 minutes ago, ElliotJMD said:
I did get an error message

The xfs utility doesn't know that Unraid has already tried to mount the disk and failed.
ElliotJMD Posted August 9, 2022

16 minutes ago, trurl said:
The xfs utility doesn't know that Unraid has already tried to mount the disk and failed.

Do you reckon I should run the -L parameter as a kind of last ditch attempt?
trurl Posted August 9, 2022

2 hours ago, ElliotJMD said:
Although you are correct with it not being necessarily a good idea, due to space and hardware constraints it is unfortunately the only possible way to go for me.

I wouldn't trust 24TB plus whatever was on the missing disk to that setup.

1 minute ago, ElliotJMD said:
Do you reckon I should run the -L parameter as a kind of last ditch attempt?

Running -L is usually (always?) necessary to actually do the repair. Not sure if you will like the result, but I guess there is nothing else to be done.
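For reference, the check and repair steps discussed above roughly map onto three `xfs_repair` invocations. The sketch below only builds and prints the command lines, it does not run anything. The device path `/dev/md4` is an assumption for Disk 4 with the array started in maintenance mode; confirm the correct device in the Unraid documentation before running anything by hand. The helper name `xfs_repair_command` is hypothetical.

```python
import shlex

# ASSUMPTION: Disk 4 of the array is exposed as /dev/md4 while the array
# is started in maintenance mode. Verify this before running anything.
DEVICE = "/dev/md4"

# xfs_repair flags discussed in this thread:
#   -n  no-modify mode: scan and report problems, change nothing
#   -L  zero the metadata log before repairing (may lose recent changes)
FLAGS = {
    "check": ["-n"],
    "repair": [],
    "zero-log-repair": ["-L"],
}


def xfs_repair_command(device, mode):
    """Return the argv list for the given mode without executing it."""
    return ["xfs_repair", *FLAGS[mode], device]


for mode in ("check", "repair", "zero-log-repair"):
    print(f"{mode:>16}: {shlex.join(xfs_repair_command(DEVICE, mode))}")
```

Per the error message quoted earlier, the plain repair asks for a mount or for `-L` when the log still has pending changes, which is why `-L` ends up being the last step here.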
trurl Posted August 9, 2022

Just curious, how long does a 16TB parity check take with that setup?