New disk disaster Unraid OS Basic Version: 6.8.3



Hi All

 

I'm new here, never had to use the forum as my Unraid setup has been running just fine for almost a year... that is, until now, and I can't figure out what has gone wrong.

 

Hopefully someone will have the patience and point me in the right direction.

 

So, 48 hours ago my setup was

1 Parity disk, 4TB SATA spinner

Disk 1, 4TB SATA spinner

Disk 2, 2TB SATA spinner

Disk 3, 2TB SATA spinner

Disk 4, 2TB USB 2.0 spinner... yes, I know, but it has been no problem.

1 cache 240GB SATA SSD

and of course the thumb drive with the system on it.

 

So disk 3 was 10 years old and I got a new 4TB disk to replace it with, just because it was old and I needed more storage, it wasn't malfunctioning.

 

I replaced the disk according to the standard procedure and the data began to rebuild onto the new disk. All fine at this stage and it was going to take at least 1 day so I went off and let it do its thing. 

 

When I checked back after 6 or so hours, disk 1 was reporting lots and lots of errors: 109,606,305 at that stage. The process was still running and I was obviously concerned, but I decided to leave it to complete anyway, which would still have been well into the next day.

 

The next morning, to my surprise, the rebuild had already completed but disk 1 was showing almost 1 billion errors. The rebuild had completed far too early and I had all these errors so I just restarted the system anyway. When it booted back up I initially thought everything was fine because disk 1 was now reporting 0 errors and I could see all my shares on the network. However it was then that I found there to be a ton of missing files in those shares.

 

When I clicked on the individual disks I could see all my files and folders but it seemed that the disks were not being formed into an array and I also couldn't copy any files into shares, Windows just reported the shares as not accessible. 

 

I had a look around and it seems as though my situation is "Disk failed while rebuilding another".

 

I also found that, because a lot of the errors were CRC errors, it might be due to a bad cable, so I replaced the cable for disk 1.

 

Somewhere in between all this the system decided to disable disk 1 with a red cross, so I ran the SMART extended self-test overnight, and this morning the result is "completed without error".

 

I have shut down and removed disk 1 then reinstalled it as a new device and it is currently rebuilding without any errors being reported. Also disk 1 is now being "emulated" but a lot of my shares have disappeared from the shares page and the two that still exist only contain a few files that are present on the cache drive. So none of the files on the disks are present in the shares on the network but they do exist on the actual disks.

 

If you have read this far can you give any advice? Or even where to start fixing this? I'm ready for the worst case scenario, wipe the lot and start again.

 

The machine is only used as file storage. None of the files are irreplaceable, but I did all this by the book (I think) and I'm still screwed.

 

Many thanks

 

 

 

Edited by ScottishTower
Link to comment

Really wish you had stopped the rebuild of disk 3 and asked for advice at that point. My guess is you corrupted the disk3 rebuild due to a bad connection on disk 1, and then you proceeded to try to rebuild disk1 with the corrupted disk 3.

 

Very possible there was nothing wrong with any disk and you may have lost data by not seeking advice at the beginning.

 

And the reboot means we have lost information about what happened.

 

Will wait for your Diagnostics. 

Link to comment

Thanks for the responses. This is the first diagnostic I have created so I guess valuable information has already been lost, my inexperience to blame.

 

The scenario you mention, trurl,  looks very likely.

 

I have attached the diagnostic zip file, thanks. 

 

Should I let it continue rebuilding disk 1? There are no errors with the new cable so that seems to point even more to a bad connection that has caused all this.

ubuntuserver160-diagnostics-20210224-1246.zip

Link to comment

Looks like all disks are mounting, so that is good. Syslog does indicate filesystem corruption on the emulated/rebuilding disk1 though. Maybe we can work through that.

 

Do you have the original disk3? Probably won't help much with the rebuilding disk1 but might help if there are problems with the rebuilt disk3.

 

1 hour ago, ScottishTower said:

Should I let it continue rebuilding disk 1?

 

Are there any errors for any disk in the Errors column on Main?

Link to comment

SMART for all disks looks OK.

 

2 hours ago, ScottishTower said:

When I checked back after 6 or so hours, disk 1 was reporting lots and lots of errors: 109,606,305 at that stage. The process was still running and I was obviously concerned, but I decided to leave it to complete anyway, which would still have been well into the next day.

 

There should be a big warning somewhere that says if you don't understand parity, ask for advice before doing anything when you need parity to actually work for you.

 

Parity is a common concept in computers and communications. It is basically the same wherever it is used. Parity is just an extra bit that allows a missing bit to be calculated from ALL THE OTHER BITS.

 

Parity isn't difficult to understand. The calculation really is as simple as 1+1=2. Here is the wiki on parity:

 

https://wiki.unraid.net/UnRAID_6/Overview#Parity-Protected_Array

 

But you don't really even need to understand that very simple calculation. All you really needed to know to avoid your mistake is that, in order to reliably rebuild every bit of a disk, Unraid must be able to reliably read every bit of parity PLUS every bit of ALL other disks.

 

Parity by itself obviously doesn't have the capacity to recover the data for any and all disks. It needs all the other disks.
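To make that concrete, here is a toy sketch of single parity as a byte-wise XOR, which is the usual way single parity is computed. It is only an illustration: the three four-byte "disks" and the `xor_blocks` helper are made up for this example and are not Unraid code. It shows that a missing disk can be recalculated from parity plus all the other disks, and that a single bad read from one of those other disks quietly corrupts the rebuilt disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data disks" of four bytes each (made-up contents).
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk2 = bytes([0x01, 0x02, 0x03, 0x04])
disk3 = bytes([0xAA, 0xBB, 0xCC, 0xDD])

# Parity is the XOR of every data disk.
parity = xor_blocks([disk1, disk2, disk3])

# Rebuilding disk3 needs parity PLUS every other data disk.
rebuilt_ok = xor_blocks([parity, disk1, disk2])
print("clean rebuild matches original:", rebuilt_ok == disk3)

# Simulate a bad connection: one byte of disk1 is read back wrong.
disk1_bad_read = bytes([0x10, 0xFF, 0x30, 0x40])
rebuilt_bad = xor_blocks([parity, disk1_bad_read, disk2])
print("rebuild with a bad read matches original:", rebuilt_bad == disk3)
```

The second rebuild still "completes", but the byte at the position of the bad read is wrong, and nothing in the calculation itself can tell you that.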

 

 

Link to comment
10 minutes ago, ScottishTower said:

There are no errors.

Might as well let the rebuild complete then. Let us know when it finishes or if there is any problem during the rebuild. Then we can see what to do about the filesystems on disk1 and disk3. Keep the original disk3 just as it is. Don't even try to look at it in Unassigned Devices or anywhere else.

Link to comment

Rebuild completed with no errors. I have attached a screenshot of how the 4 data drives look. The newly rebuilt drive says 3.96TB used, but when I click to view what's on the drive there is nothing there. There are plenty of my files still on the other drives, but the shares only show what is stored on the cache drive. I have also attached the latest diagnostic. I have not rebooted; this is straight after it rebuilt disk 1.

unraid 1.jpg

ubuntuserver160-diagnostics-20210225-1113.zip

Link to comment

I ran the check on Disk 1 and Disk 3 and everything is working again. trurl, JorgeB and Frank1940, thank you very much for your guidance with this. I have lost some data, but I suppose that is to be expected in this situation? I wish there was a way I could buy you guys a beer.

 

One other thing: is there any use for the original disk 3?

 

Oh, and there is stuff in lost+found. Do I just manually copy that back to where it belongs?

 

 

1780936459_unraid2.jpg.5f08a0ff51f596cc8e6c372126eefb52.jpg

Edited by ScottishTower
Link to comment
23 minutes ago, ScottishTower said:

Ok what about the stuff in lost+found? (sorry I just edited that in)

As long as you can tell what it is then just move it back to where it belongs.    
 

It is often not that simple when items end up in lost+found. That typically happens because a directory entry was missing or corrupt, so the file names are lost and random numeric names get assigned, leaving it up to you to manually inspect files to work out what they are. Often not worth the effort unless it is important data.
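If you do want to sort through it, a small script can at least take a first guess at what each numerically named file is by looking at its first few bytes. This is a rough sketch only: the signature list is deliberately short, `guess_type` and `survey` are hypothetical helper names, and the `/mnt/disk1/lost+found` path is an assumption about where the folder sits on your disk. It only reads files and never moves or renames anything.

```python
from pathlib import Path

# A few well-known file signatures ("magic bytes"); extend as needed.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (also docx/xlsx/odt)"),
    (b"ID3", "MP3 audio (ID3 tag)"),
    (b"\x1f\x8b", "gzip archive"),
]


def guess_type(header: bytes) -> str:
    """Return a rough file type based on the leading bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def survey(folder):
    """Map every file under folder to a guessed type (read-only)."""
    root = Path(folder)
    if not root.is_dir():
        return {}
    results = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            with path.open("rb") as fh:
                results[str(path)] = guess_type(fh.read(16))
    return results


if __name__ == "__main__":
    # Assumed location; change it to wherever lost+found actually is.
    for name, kind in survey("/mnt/disk1/lost+found").items():
        print(f"{kind:35} {name}")
```

Anything reported as "unknown" still needs opening by hand, and a matching signature only tells you the type of a file, not its original name or folder.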

Link to comment
