Disk Errors - Need suggestion on what will be my next step


Rendeaust

Hello everyone. I'm new to Unraid and I need help. Saying that I'm not doing great so far would be a gross understatement. This all started this weekend when I decided to move to Unraid. I had some issues here and there, but I overcame them all. Until today.

 

Last night, I set up schedules for the first time (Mover, SSD Trim, etc.), including parity checks. This morning the scheduled parity check started as configured. It was the third parity check overall (I ran two when I first set up the array 3 days ago), and it found hundreds of thousands of errors.

 

[Attached screenshot: ScreenShot 2019-03-05 at 2.18.28 PM]

 

At first, I thought this was normal because I had just moved all of my files onto it and maybe it needed to recalculate or something. I did some research and found I was wrong, big time. I let the parity check finish, stopped the array and shut down the server.

Unsure and panicking at this point, I replaced the SATA cables and made sure everything was connected properly. I started the server, unassigned the parity drive from the array, started a preclear on it and left the machine (yes, all of my drives were precleared before).

 

4 hours later, I logged in and found that, an hour earlier, one of the other drives in the system had started erroring out. That drive now reports a total of 0 bytes in data partitions. I was forced to end the preclear prematurely, because I can't leave the server running with the logs spamming me about that drive every single second. I restarted the system again, opened the BIOS, and the drive wasn't being detected anymore. I don't know what happened: the machine is 100% stationary, the drive in question has a fan right in front of it, and the drive was idle. The array wasn't started while I was doing the preclear.

 

I swapped out the cable, and it is being detected again. Please note that this never happened, not even once, when this drive was connected to a Windows PC. 90% of my data is on this drive. I can't afford to lose this drive whilst my parity isn't 100% reliable at this very moment.

I'm actually not sure what to do at this point. I'm running a parity check again, just to make sure there aren't any errors anymore, and maybe I'll run preclear on the parity drive one more time.

All of the decisions I've made on my own have come back to bite me so far and I'm starting to lose hope. Do you have any suggestions? Thanks to everyone who is willing to help.

redstone-nas-diagnostics-20190306-0123.zip

 

Link to comment
1 hour ago, Rendeaust said:

90% of my data is on this drive. I can't afford to lose this drive whilst my parity isn't 100% reliable at this very moment.

Some things are a little unclear about your description, but maybe I can fill in some of the details from the diagnostics. Is this disk you are referring to here actually the Unassigned Device in your diagnostics? Not a disk already in the array?

 

You absolutely must have another copy of anything important and irreplaceable, even if everything is working to perfection. Parity is not a substitute for backups.

 

Obviously your parity is (or was) invalid, but some of the things in your post make me wonder if you understand a key feature of Unraid parity. It is realtime. Anytime any write operation (write, copy, move, delete, format are all write operations) happens in the array, parity is updated at the same time. So once parity is valid, it isn't necessary to schedule very frequent parity checks. Most of us just do a monthly check to test that all is working well.
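To illustrate the realtime update described above, here is a minimal sketch in Python. It assumes a single parity disk computed as the XOR of all data disks, with tiny byte strings standing in for disks. The function names (`write_block`, `rebuild`) are hypothetical and chosen for illustration only; this is not Unraid's actual implementation.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(disks):
    """Parity is the XOR of the same position on every data disk."""
    return reduce(xor_bytes, disks)


def write_block(disks, parity, disk_index, new_data):
    """Write new_data to one disk and return the updated parity.

    Assumption for this sketch: parity is updated as part of the same
    write, using old parity XOR old data XOR new data.
    """
    old_data = disks[disk_index]
    new_parity = xor_bytes(xor_bytes(parity, old_data), new_data)
    disks[disk_index] = new_data
    return new_parity


def rebuild(disks, parity, missing_index):
    """Reconstruct one missing disk from parity plus all other disks."""
    others = [d for i, d in enumerate(disks) if i != missing_index]
    return reduce(xor_bytes, others, parity)


disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = compute_parity(disks)

# A write to disk 1 updates parity at the same time.
parity = write_block(disks, parity, 1, b"\xaa\xbb")

# Parity stays valid without any separate check or resync.
assert parity == compute_parity(disks)
assert rebuild(disks, parity, 1) == b"\xaa\xbb"
```

Because every write keeps parity in step with the data, a scheduled check only verifies that nothing has drifted; it is not needed to bring parity up to date after copying files.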

 

1 hour ago, Rendeaust said:

I'm running a parity check again, just to make sure there aren't any errors anymore, and maybe I'll run preclear on the parity drive one more time.

The only purpose of running preclears is to test a disk. All of the disk SMART reports in your diagnostics look fine. Since these are new disks, some people run multiple preclears to try to get them past "infant mortality", but I have always just done one full cycle of preclear. You can't preclear a disk that is already assigned to the array, and if you removed parity from the array to preclear it, then obviously parity would be invalid again. So there is either no point in completing the parity check, or no point in preclearing the parity disk again.

 

Slow down, don't panic. The most important thing is making sure you have backups of anything important and irreplaceable. Getting Unraid working might be a way towards that end if those important files are already on your Unassigned Device. So please clarify that bit for us.

Link to comment

Also I notice you are booting from a USB3 flash drive. Unraid accesses flash very little. The OS is unpacked fresh from the archives on flash into RAM at bootup, and the OS runs completely in RAM. Settings from the webUI are saved to flash anytime you change them.

 

So, having the speed of a USB3 flash drive has no real advantage. However, using a USB3 port for the boot device has been known to cause many problems. Put it in a USB2 port.

 

Link to comment
3 minutes ago, trurl said:

Some things are a little unclear about your description, but maybe I can fill in some of the details from the diagnostics. Is this disk you are referring to here actually the Unassigned Device in your diagnostics? Not a disk already in the array?

All of the disks I'm referring to in this post are disks that are assigned to the array. No unassigned devices are involved.

 

4 minutes ago, trurl said:

You absolutely must have another copy of anything important and irreplaceable, even if everything is working to perfection. Parity is not a substitute for backups.

7 minutes ago, trurl said:

Slow down, don't panic. The most important thing is making sure you have backups of anything important and irreplaceable.

In case it's not clear, I'm in a situation where I'm unable to do this. I want to save all my files with the resources I currently have, which is this server. I understand your concern, and I know it sounds stupid for someone on the internet to ask for help without having a backup, but I have to face the reality that my only option is to save this server. I hope you can understand.

 

6 minutes ago, trurl said:

Obviously your parity is invalid, but some of the things in your post make me wonder if you understand a key feature of Unraid parity. It is realtime. Anytime any write operation (write, copy, move, delete, format are all write operations) happens in the array, parity is updated at the same time. So once parity is valid, it isn't necessary to schedule very frequent parity checks. Most of us just do a monthly check to test that all is working well.

I only understood this once I saw the errors in the parity check, as I said in the post.

 

5 minutes ago, trurl said:

However, using a USB3 port for the boot device has been known to cause many problems. Put it in a USB2 port.

I actually did not know this. I will move the flash drive after the ongoing parity check.

 

10 minutes ago, trurl said:

Getting Unraid working might be a way towards that end if those important files are already on your Unassigned Device. So please clarify that bit for us.

I hope this is enough clarification. Thanks for assisting me on this.

Link to comment
25 minutes ago, trurl said:

What are the Unassigned Devices I see mounted in your Diagnostics then?

That's an old SSD that holds my Windows 10 boot drive. I'm planning to back it up to the array after I sort everything out, and then use it in RAID 1 with my cache drive.

Link to comment

As far as I was able to tell from your diagnostics, the data on your server should be just fine. In case you don't know it, one of the great features of Unraid is that each disk is an independent filesystem. This means that each disk can be read independently from all the others on any Linux OS. So even if you don't currently have good parity, your data is only a little at risk. Until you have good parity, you won't be able to reconstruct a disk if it truly fails, but even if you have a bad connection and Unraid disables a disk because it can't write to it temporarily, its data will probably be mostly OK.

 

Mar  6 00:53:17 Redstone-NAS kernel: mdcmd (60): check correct
Mar  6 00:53:17 Redstone-NAS kernel: md: recovery thread: check P ...

Just go ahead and complete that correcting parity check. After it corrects all the sync errors run a noncorrecting parity check just to make sure they have all been fixed. If not, then maybe you should consider a memtest.
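The two-pass procedure above (a correcting check, then a noncorrecting check to confirm) can be sketched as follows. This is a simplified illustration in Python, assuming single XOR parity and counting mismatches per byte; the function `parity_check` and its `correct` flag are hypothetical names, not Unraid's real interface.

```python
def parity_check(disks, parity, correct):
    """Compare stored parity with parity recomputed from the data disks.

    Returns (sync_errors, parity). When correct is True, mismatched
    parity is rewritten; when False, errors are only counted.
    """
    expected = bytearray(len(parity))
    for disk in disks:
        for i, byte in enumerate(disk):
            expected[i] ^= byte

    sync_errors = sum(1 for e, p in zip(expected, parity) if e != p)
    if correct and sync_errors:
        parity = bytes(expected)
    return sync_errors, parity


disks = [b"\x01\x02\x03", b"\x10\x20\x30"]
stale_parity = b"\x11\x00\x33"  # middle byte is wrong on purpose

# Pass 1: correcting check finds the mismatch and rewrites parity.
errors, parity = parity_check(disks, stale_parity, correct=True)
assert errors == 1

# Pass 2: noncorrecting check should now report zero sync errors.
errors_after, _ = parity_check(disks, parity, correct=False)
assert errors_after == 0
```

If the second pass still reports errors even though nothing was written in between, the data being read is not stable from one pass to the next, which is why a memory test is the suggested follow-up.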

 

Probably there is nothing to be overly concerned with. Maybe at some point you did something incorrectly while changing or adding disks that invalidated parity or didn't get it synced in the first place.

 

 

Link to comment
1 minute ago, trurl said:

Probably there is nothing to be overly concerned with. Maybe at some point you did something incorrectly while changing or adding disks that invalidated parity or didn't get it synced in the first place.

This has reassured me a little bit. I'm starting to lean toward this line of thinking as well. Maybe I'm just unlucky today. I'm just frustrated that I can't get this system working properly.

 

2 minutes ago, trurl said:

If not, then maybe you should consider a memtest.

I have read this somewhere as well. I will consider this if the parity check returns errors again. It's at 47.1% as we speak, so far so good.

Link to comment
15 hours ago, trurl said:

In case you don't know it, one of the great features of Unraid is that each disk is an independent filesystem. This means that each disk can be read independently from all the others on any Linux OS.

Just curious, I'm planning to encrypt my drives. Is an encrypted drive also readable from a different OS, assuming it's Linux, as long as I have the key or passphrase?

Link to comment

Archived

This topic is now archived and is closed to further replies.
