
parity check read error


aspdend

I have been using UnRAID for a while (but still consider myself a noob with minimal Linux knowledge). I am running v6.3.3 and have just had my first failed parity check, with a read error on one of my drives. The parity check stopped very early on and the drive in question was redballed.

 

As the drive is reasonably new and has been running in the array fine for a few months, and I have just upgraded my motherboard, I am pretty certain that the issue is connectivity rather than anything else. The drive is connected to a Supermicro controller, so I have opened the box and unplugged and reseated all the connections from the card. I have rebooted and am currently running an extended SMART test on the disk; it passed the short SMART test with flying colours.

 

My question is, how do I re-assign the disk to the array without losing data?

 

If I stop the array, the disk shows as available in the relevant slot (along with the 3 unassigned drives I am currently looking to pre-clear before adding and expanding my array/cache), but if I assign it and start the array it still shows as unassigned and I get the orange triangle on the page showing that a disk is unassigned. Parity still shows as valid from the last parity check. I don't want to make the wrong move and lose data, but what step am I missing in getting the drive reassigned and then doing another parity check to verify the data?

 

 

Link to comment

OK, so I stopped the array, unassigned Disk 2 (the failed one), started the array, stopped the array, re-assigned Disk 2 and started the array. I then proceeded to rebuild the data onto Disk 2. As Disk 2 is one of 4 that are currently fed from a breakout cable off one of my two Supermicro AOC-SAS2LP-MV8 cards, I unplugged all the connections and re-seated the SAS end and all 4 of the SATA ends, at the same time rotating the cables so they fed different disks. Unfortunately, the data rebuild stopped, again with a read error on Disk 2!

 

This is a huge concern as I don't have a spare disk of the right size (Disk 2 is 8 TB, purchased a couple of months ago), so if I RMA it I will have to wait for a replacement to arrive before I can rebuild the data.

 

Is it possible to use the Unbalance plugin for example to move emulated data onto the rest of the array so that the failed disk is empty and I can then unassign it so the array can carry on until a replacement arrives?

 

I have attached the original diagnostics zip file for perusal.

tower-diagnostics-20170501-0705.zip

Link to comment
3 minutes ago, aspdend said:

Is it possible to use the Unbalance plugin for example to move emulated data onto the rest of the array so that the failed disk is empty and I can then unassign it so the array can carry on until a replacement arrives?

 

Yes but you'd need to do a new config and resync parity to get the array to forget that disk after all data is moved.

 

Disk2 dropped offline, so reboot and post new diags so we can see a SMART report for it.

Link to comment
11 minutes ago, aspdend said:

Is it possible to use the Unbalance plugin for example to move emulated data onto the rest of the array so that the failed disk is empty and I can then unassign it so the array can carry on until a replacement arrives?

While that would be possible, I don't recommend it. Emulation reads all the other disks plus parity to calculate the missing disk's data; writing the emulated data to another disk in the array then writes that disk and writes parity. So, as you can see, there is a lot of disk activity going on to make this happen, and you would be doing this without any protection since you already have a failure.
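To make the emulation point concrete, here is a minimal sketch in Python. It assumes single parity computed as a bytewise XOR across the data disks, and it assumes parity is updated on write by XOR-ing out the old data and XOR-ing in the new data. The three tiny "disks" and the helper `xor_blocks` are hypothetical and only for illustration; this is not Unraid's actual code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, each holding one 4-byte block.
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk2 = bytes([0xAA, 0xBB, 0xCC, 0xDD])  # the disk that has dropped out
disk3 = bytes([0x01, 0x02, 0x03, 0x04])

# Parity as it stood while all disks were healthy (assumed XOR parity).
parity = xor_blocks([disk1, disk2, disk3])

# Emulating disk2 means reading EVERY surviving disk plus parity.
emulated_disk2 = xor_blocks([disk1, disk3, parity])

# Copying the emulated data onto disk3 inside the array writes disk3
# and also rewrites parity (assumed read-modify-write update).
new_disk3 = emulated_disk2
new_parity = xor_blocks([parity, disk3, new_disk3])
```

Under these assumptions, every emulated read touches all the remaining disks, and every write back into the array touches a data disk and the parity disk as well, which is why the load is high while the array has no redundancy left.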

 

My preference if you must get the data from an emulated disk would be to write it to another system, or to a disk not in the parity array.

 

You really should have backups of anything irreplaceable anyway. Do you?

 

Best way to proceed is get us those Diagnostics with disk2 as johnnie said.

Link to comment
21 minutes ago, johnnie.black said:

 

Yes but you'd need to do a new config and resync parity to get the array to forget that disk after all data is moved.

 

Disk2 dropped offline, so reboot and post new diags so we can see a SMART report for it.

 

8 minutes ago, trurl said:

While that would be possible, I don't recommend it. Emulation reads all the other disks plus parity to calculate the missing disk's data; writing the emulated data to another disk in the array then writes that disk and writes parity. So, as you can see, there is a lot of disk activity going on to make this happen, and you would be doing this without any protection since you already have a failure.

 

My preference if you must get the data from an emulated disk would be to write it to another system, or to a disk not in the parity array.

 

You really should have backups of anything irreplaceable anyway. Do you?

 

Best way to proceed is get us those Diagnostics with disk2 as johnnie said.

Thanks for the quick responses...

 

I do have my data backed up via CrashPlan, and I believe the majority of that disk is recorded TV and films that I can just re-rip if needed; it's just the time it will take to find them all and rip them.

 

I can use unassigned disks to copy the data off to. Disk 2 is currently about 4 TB full, so I have a couple of disks that I can use to pull the data onto. My main worry is that we are off on holiday on Saturday for a week and what will happen whilst I am away...

 

I will certainly reboot and post new diagnostics. Should I reboot with Disk 2 still showing assigned but redballed and post the diagnostics after the reboot?

Link to comment
2 minutes ago, aspdend said:

My main worry is that we are off on holiday on Saturday for a week and what will happen whilst I am away...

You could just shut down until you get back.

 

3 minutes ago, aspdend said:

should I reboot with Disk 2 still showing assigned but redballed and post the diagnostics after the reboot?

Doesn't matter whether disk is assigned or not, we just need diagnostics that has disk2 connected instead of dropped.

Link to comment
1 minute ago, trurl said:

You could just shut down until you get back.

 

Doesn't matter whether disk is assigned or not, we just need diagnostics that has disk2 connected instead of dropped.

Might be the best idea...

 

OK - will do that when I get home in a few hours and upload.

Link to comment

SMART looks OK, but if the same disk failed again on a different cable... On the other hand, it's on a SAS2LP, and these are known to drop disks once in a while without an apparent reason. I would swap places with a disk connected to the onboard controller (power cable too, just to rule it out) and try a final rebuild. If it fails again there, replace it.

Link to comment
On 2017-5-4 at 9:24 PM, johnnie.black said:

SMART looks OK, but if the same disk failed again on a different cable... On the other hand, it's on a SAS2LP, and these are known to drop disks once in a while without an apparent reason. I would swap places with a disk connected to the onboard controller (power cable too, just to rule it out) and try a final rebuild. If it fails again there, replace it.

Well, I did as suggested, changed the SAS ports over on the card, rebooted and successfully restored the data to the drive. SMART status is all looking good! I am now running the parity check, as that's what threw the error up in the first place, and it's at 20% so far... will update when it clears.

Link to comment

There's a strong possibility the problem was caused by the SAS2LP, but there's a chance it's the disk. Leave it on the onboard controller for now: if the same disk fails again in the near future, it's probably the disk; if another disk on the SAS2LP controller drops, it's probably the controller.

Link to comment

Well, parity check completed with no errors, all is looking good on the array now. I will keep an eye out for that disk exhibiting any future issues! Thanks again for all the help chaps! 

On 2017-5-6 at 10:01 AM, johnnie.black said:

There's a strong possibility the problem was caused by the SAS2LP, but there's a chance it's the disk. Leave it on the onboard controller for now: if the same disk fails again in the near future, it's probably the disk; if another disk on the SAS2LP controller drops, it's probably the controller.

 

Link to comment

Archived

This topic is now archived and is closed to further replies.
