ScottishTower Posted February 24, 2021

Hi All,

I'm new here. I've never had to use the forum because my Unraid setup has been running just fine for almost a year... that is, until now, and I can't figure out what has gone wrong. Hopefully someone will have the patience to point me in the right direction.

So, 48 hours ago my setup was: parity, 4TB SATA spinner; Disk 1, 4TB SATA spinner; Disk 2, 2TB SATA spinner; Disk 3, 2TB SATA spinner; Disk 4, 2TB USB 2.0 spinner (yes, I know, but it has been no problem); 1 cache drive, a 240GB SATA SSD; and of course the thumb drive with the system on it.

Disk 3 was 10 years old, so I got a new 4TB disk to replace it. It wasn't malfunctioning; it was just old and I needed more storage. I replaced the disk according to the standard procedure and the data began to rebuild onto the new disk. All fine at this stage, and it was going to take at least a day, so I left it to do its thing.

When I checked back 6 or so hours later, disk 1 was reporting lots and lots of errors: 109,606,305 at that point. The process was still running. I was obviously concerned, but decided to let it complete anyway, which would still have been well into the next day.

The next morning, to my surprise, the rebuild had already completed, but disk 1 was showing almost 1 billion errors. The rebuild had finished far too early and I had all these errors, so I just restarted the system anyway.

When it booted back up I initially thought everything was fine, because disk 1 was now reporting 0 errors and I could see all my shares on the network. However, I then found a ton of missing files in those shares. When I clicked on the individual disks I could see all my files and folders, but it seemed the disks were not being formed into an array. I also couldn't copy any files into the shares; Windows just reported them as not accessible.
I had a look around and it seems my situation is "Disk failed while rebuilding another". I also found that because a lot of the errors were CRC errors, they might be due to a bad cable, so I replaced the cable for disk 1. Somewhere in between all this, the system disabled disk 1 (red cross), so I ran the SMART extended self-test overnight, and this morning the result is "completed without error". I then shut down, removed disk 1, and reinstalled it as a new device; it is currently rebuilding without any errors being reported.

Disk 1 is now being "emulated", but a lot of my shares have disappeared from the Shares page, and the two that still exist only contain a few files that are on the cache drive. So none of the files on the array disks appear in the shares on the network, but they do exist on the actual disks.

If you have read this far, can you give any advice, or even tell me where to start fixing this? I'm ready for the worst-case scenario: wipe the lot and start again. The machine is only used as file storage and none of the files are irreplaceable, but I did all this by the book (I think) and I'm still screwed.

Many thanks

Edited February 24, 2021 by ScottishTower
Frank1940 Posted February 24, 2021

Post the Diagnostics file (Tools >>> Diagnostics) in a new post in this thread. Until this issue is resolved, be sure to capture a new Diagnostics file each time before you reboot! The gurus will need those files to see what was happening at the time of the event.
trurl Posted February 24, 2021

I really wish you had stopped the rebuild of disk 3 and asked for advice at that point. My guess is that you corrupted the disk 3 rebuild because of a bad connection on disk 1, and then proceeded to rebuild disk 1 using the corrupted disk 3. It is very possible there was nothing wrong with any disk, and you may have lost data by not seeking advice at the beginning. The reboot also means we have lost the information about what happened.

I'll wait for your Diagnostics.
ScottishTower Posted February 24, 2021

Thanks for the responses. This is the first diagnostic I have created, so I guess valuable information has already been lost; my inexperience is to blame. The scenario you mention, trurl, looks very likely. I have attached the diagnostics zip file.

Should I let it continue rebuilding disk 1? There are no errors with the new cable, which points even more to a bad connection having caused all this.

Attachment: ubuntuserver160-diagnostics-20210224-1246.zip
trurl Posted February 24, 2021

Looks like all disks are mounting, so that is good. The syslog does indicate filesystem corruption on the emulated/rebuilding disk 1, though. Maybe we can work through that.

Do you have the original disk 3? It probably won't help much with the rebuilding disk 1, but it might help if there are problems with the rebuilt disk 3.

ScottishTower said: "Should I let it continue rebuilding disk 1?"

Are there any errors for any disk in the Errors column on Main?
ScottishTower Posted February 24, 2021

trurl said: "Do you have the original disk3? Probably won't help much with the rebuilding disk1 but might help if there are problems with the rebuilt disk3."

Yep, I have it.

trurl said: "Are there any errors for any disk in the Errors column on Main?"

There are no errors.
trurl Posted February 24, 2021

SMART for all disks looks OK.

ScottishTower said: "When I checked back in 6 or so hours disk 1 was reporting lots and lots of errors, at this stage 109,606,305 errors. The process was still running but I was obviously concerned but decided to leave it to complete anyway which would still have been well into the next day."

There should be a big warning somewhere that says: if you don't understand parity, ask for advice before doing anything when you need parity to actually work for you.

Parity is a common concept in computers and communications, and it is basically the same wherever it is used. Parity is just an extra bit that allows a missing bit to be calculated from ALL THE OTHER BITS. Parity isn't difficult to understand; the calculation really is as simple as 1+1=2. Here is the wiki on parity: https://wiki.unraid.net/UnRAID_6/Overview#Parity-Protected_Array

But you don't even need to understand that very simple calculation. All you really needed to know to avoid your mistake is that, in order to reliably rebuild every bit of a disk, Unraid must be able to reliably read every bit of parity PLUS every bit of ALL the other disks. Parity by itself obviously doesn't have the capacity to recover the data for any and all disks; it needs all the other disks.
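To make the point about needing ALL the other disks concrete, here is a toy model in Python. It assumes single parity computed as a byte-wise XOR (even parity) across the data disks; the four 4-byte "disks" and their contents are made up for illustration, and a real array works on whole drives, not a Python list. It shows a rebuild being exact when every other disk reads correctly, and a single bad read on disk 1 being copied straight into the rebuilt disk 3, which is the failure suspected earlier in this thread.

```python
from functools import reduce


def xor_bytes(blocks):
    """XOR equal-length byte strings together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_disks):
    """Toy single parity: the XOR of every data disk."""
    return xor_bytes(data_disks)


def rebuild(missing_index, data_disks, parity):
    """Reconstruct one data disk from parity PLUS all the other data disks."""
    others = [d for i, d in enumerate(data_disks) if i != missing_index]
    return xor_bytes(others + [parity])


# Hypothetical contents for four tiny data disks (disk 1 .. disk 4).
disks = [b"\x10\x20\x30\x40", b"\xAA\xBB\xCC\xDD", b"\x01\x02\x03\x04", b"\xF0\x0F\xF0\x0F"]
parity = compute_parity(disks)

# Disk 3 (index 2) is replaced; every other read is good, so the rebuild is exact.
print(rebuild(2, disks, parity) == disks[2])  # True

# Same rebuild, but disk 1 (index 0) returns one bad byte, e.g. over a flaky cable.
flaky = list(disks)
flaky[0] = b"\x10\x20\x31\x40"  # one bit flipped in the third byte
print(rebuild(2, flaky, parity) == disks[2])  # False: the error is written into disk 3
```

Parity has no way to tell which disk supplied the bad byte; it simply trusts every other read, which is why read errors on another disk during a rebuild are a reason to stop and ask.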
trurl Posted February 24, 2021

ScottishTower said: "There are no errors."

You might as well let the rebuild complete, then. Let us know when it finishes, or if there is any problem during the rebuild. Then we can see what to do about the filesystems on disk 1 and disk 3.

Keep the original disk 3 just as it is. Don't even try to look at it in Unassigned Devices or anywhere else.
ScottishTower Posted February 24, 2021

Thank you, trurl. I will report back when the rebuild is finished, which will be in 1 day and 2 hours. I will also educate myself a bit more about parity and arrays in the meantime. Thanks for your help.
ScottishTower Posted February 25, 2021

The rebuild completed with no errors. I have attached a screenshot of how the 4 data drives look. The newly rebuilt drive says 3.96TB used, but when I click to view what's on the drive there is nothing there. There are plenty of my files still on the other drives, but the shares only show what is stored on the cache drive. I have also attached the latest diagnostics. I never rebooted; this was taken straight after the rebuild of disk 1.

Attachment: ubuntuserver160-diagnostics-20210225-1113.zip
JorgeB Posted February 25, 2021

Check the filesystem on disk 1: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

Remove -n (the no-modify option) or nothing will be done.
ScottishTower Posted February 25, 2021

I ran the check on disk 1 and disk 3 and everything is working again. trurl and JorgeB... and Frank1940, thank you very much for your guidance with this. I have lost some data, but I guess that is to be expected in this situation? I wish there was a way I could buy you guys a beer.

One other thing: is there any use for the original disk 3?

Oh, and there is stuff in lost+found. Do I just manually copy that back to where it belongs?

Edited February 25, 2021 by ScottishTower
Frank1940 Posted February 25, 2021

ScottishTower said: "One other thing is there any use for the original disk 3?"

Yes. If you are missing files, mount the original disk 3 with the Unassigned Devices plugin and see whether any of the missing files are on that disk.
ScottishTower Posted February 25, 2021

OK, what about the stuff in lost+found? (Sorry, I just edited that in.)
itimpi Posted February 25, 2021

ScottishTower said: "Ok what about the stuff in lost+found? (sorry I just edited that in)"

As long as you can tell what it is, just move it back to where it belongs. It is often not that simple, though. Items end up in lost+found because a directory entry was missing or corrupt, so the file names are lost and numeric names (typically the inode numbers) get assigned instead, leaving it up to you to inspect the files manually to work out what they are. It is often not worth the effort unless it is important data.
ScottishTower Posted February 25, 2021

Thanks, yes, I can see the problem. I will probably just end up deleting most of it.
itimpi Posted February 25, 2021

ScottishTower said: "Thanks, yes I can see the problem. I will probably just end up deleting most of it."

If you do want to try to work out what the files should be, the Linux 'file' command can at least tell you the file type.
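As a rough illustration of how that kind of type detection works, here is a much cruder Python stand-in for 'file': it reads the first few bytes of each file and matches them against a handful of well-known signatures ("magic numbers"). The signature table is deliberately tiny, and the default path /mnt/disk1/lost+found is an assumption based on Unraid mounting array disks under /mnt/diskN. The real 'file' command recognises far more formats and should be preferred when it is available.

```python
import os
import sys

# A few well-known file signatures. 'file' knows thousands; this is only a sample.
SIGNATURES = [
    (b"\xFF\xD8\xFF", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (also docx/xlsx/odt)"),
    (b"ID3", "MP3 audio (ID3 tag)"),
    (b"\x1A\x45\xDF\xA3", "Matroska/WebM video"),
]


def sniff_type(header: bytes) -> str:
    """Return a label for the first matching signature, or 'unknown'."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def classify_dir(path: str) -> dict:
    """Map every file under path to a guessed type from its first 16 bytes."""
    results = {}
    for root, _dirs, files in os.walk(path):  # yields nothing if path is missing
        for name in files:
            full = os.path.join(root, name)
            try:
                with open(full, "rb") as fh:
                    results[full] = sniff_type(fh.read(16))
            except OSError as exc:
                results[full] = f"unreadable: {exc}"
    return results


if __name__ == "__main__":
    # Hypothetical default location; pass another directory as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/disk1/lost+found"
    for path, kind in sorted(classify_dir(target).items()):
        print(f"{kind:40}  {path}")
```

Sorting the output by path keeps the numeric lost+found names together, so a run of same-typed files (for example, photos from one folder) is easier to spot.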