Parity check caused disk to be disabled


droopie


I had to rebuild my drive array in order to shrink it.

Then I added a second parity drive.

I ran a parity check and got no errors.

I added one new drive and ran a partial parity check to see whether any of the other drives had issues, since a parity check is where I can see them. I saw no issues in roughly the first 250GB, so I stopped the parity check.

I then added another new drive, ran a parity check again, and saw no issues.

I then ran another parity check, and in the last 2 hours, at about 8am, disk 1 got disabled with errors. The parity check reported 0 errors and says it completed at 10am. What happened to disk 1? I have not moved or disconnected the cables on parity 1, parity 2, disk 1, or disk 2 for over a year. The only reason I added parity 2 earlier was the array rebuild during the shrink; from a week ago until I added the new drive, parity 2 was enabled and working.

 

Can you look at my server and see what could have caused this, and how I should proceed to fix disk 1? Thanks in advance!

 

Edit: for over a year, all the 8TB drives (parity 1-2 and disks 1-2) have been on a single 4-SATA-to-1-SAS cable. I'm just trying to work out what could cause this, since that disk is right in the middle. I can always remove the newly added disks (3 and/or 5) if I must to make my server stable again.

image.thumb.png.9e01ba9b6f93c90f421a9cbeaf518630.png

image.thumb.png.f657b5c6b530118a4b64d5de6a4d6c75.png

server-diagnostics-20210122-1241.zip

Edited by droopie

This doesn't look like a disk problem, but the disabled disk has 16903 UDMA CRC errors, while the other 8TB disks of the same type have none. Please swap the cable or port, and enable SMART monitoring on attribute "199" until it stops increasing.

 

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  --S---   128   128   054    -    116
  3 Spin_Up_Time            POS---   144   144   024    -    457 (Average 453)
  4 Start_Stop_Count        -O--C-   100   100   000    -    2555
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         -O-R--   100   100   067    -    0
  8 Seek_Time_Performance   --S---   128   128   020    -    18
  9 Power_On_Hours          -O--C-   100   100   000    -    4094
 10 Spin_Retry_Count        -O--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    808
 22 Helium_Level            PO---K   100   100   025    -    100
192 Power-Off_Retract_Count -O--CK   096   096   000    -    5647
193 Load_Cycle_Count        -O--C-   096   096   000    -    5647
194 Temperature_Celsius     -O----   147   147   000    -    44 (Min/Max 25/72)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   198   198   000    -    16903
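
For anyone who wants to track attribute 199 outside the GUI, here is a minimal Python sketch, assuming the attribute table has been saved as plain text in the same layout as above. The names `parse_smart_attributes` and `crc_error_delta` are made up for illustration and are not part of smartctl or Unraid. The raw value of this attribute is generally a cumulative counter, so what matters after a cable swap is whether it keeps growing, not its absolute value.

```python
import re

# Three rows copied from the table above, used as sample input.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  3 Spin_Up_Time            POS---   144   144   024    -    457 (Average 453)
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
199 UDMA_CRC_Error_Count    -O-R--   198   198   000    -    16903
"""

UDMA_CRC_ID = 199


def parse_smart_attributes(text):
    """Map attribute ID -> (name, raw value) for each table row in text."""
    attrs = {}
    for line in text.splitlines():
        # The table has 8 columns; the last one (RAW_VALUE) may contain spaces.
        parts = line.split(None, 7)
        if len(parts) < 8 or not parts[0].isdigit():
            continue  # header row or unrelated line
        raw = re.match(r"\d+", parts[7])
        if raw is None:
            continue  # raw value does not start with a number
        attrs[int(parts[0])] = (parts[1], int(raw.group()))
    return attrs


def crc_error_delta(before_text, after_text):
    """Return how much the raw UDMA CRC count grew between two saved reports."""
    before = parse_smart_attributes(before_text)[UDMA_CRC_ID][1]
    after = parse_smart_attributes(after_text)[UDMA_CRC_ID][1]
    return after - before


if __name__ == "__main__":
    name, raw = parse_smart_attributes(SAMPLE)[UDMA_CRC_ID]
    print(name, raw)
```

A delta of 0 between a report taken before and one taken after a full parity check would suggest the connection problem is gone; any growth points back at the cable, port, or power.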

 

Edited by Vr2Io

So I bring down the array, disconnect the SATA cable, start the array, stop the array, replace the SATA cable, and start the array again? Will it rebuild, or will the new cable just be detected? I've never had to do this before, so I don't know the process for re-enabling the drive other than a drive rebuild.

Edited by droopie

Stop the server, then check or change the DATA and POWER connections on both sides, including splitters.

Power back up and post new diagnostics.

Edited by ChatNoir

So I have two LSI 9211-8i P20 IT Mode Dell H310 cards in my system. Parity 1-2 and disks 1-2 are using SAS cable 1 on port A of PCI card 1. Since it was advised to replace the cable, I just disconnected it and am using an unused SATA connector from port A of the other card, PCI card 2. I started the array back up and got the log. I also checked all the power connections down the line from the disk all the way to the PSU, unplugging each one and plugging it back in.

server-diagnostics-20210122-1400.zip

Edited by droopie
2 hours ago, JorgeB said:

Looks more like a connection problem, replace/swap cables on that disk.

Would you suspect the newly added disks 3 and 5, even though they are only 1TB each? I also have a UPS which, according to its Unraid page, never goes over 200 watts of my 600 watt power supply. I mention this because I used to get a lot of errors on drives when powering them from the PSU's hard drive connectors, so I recently switched almost all my disks except the first 6 to Molex power.


I have the 2 parity drives on a 2-to-1 splitter on the PSU. Likewise, disks 1-3 are on a 3-to-1 splitter, but on the other PSU cable. Disks 1-2 are 8TB, and when the disk got disabled, all the other drives had finished their part of the check and were actually spun down, with only the two parity drives and disks 1-2 still spun up. The newly added drives are only 1TB, so since they had finished the parity check, and only the 8TB drives were still spun up when I checked the system, I thought I was in the clear for power. I forgot to mention that disk 6 is also on a 2-to-1 splitter, on the same PSU line as parity 1-2, but that drive is only 1TB. The rest of the disks are on Molex power.

 

PCI 1: SAS A is blue, B is green.

PCI 2: SAS A is red, B is orange.

 

M is for Molex power at the bottom of the pic. The 2 leads from the PSU power the SATA drives. Each line has 3 hard drive power connectors, so I tried to show how I am powering them.

 

Just waiting to see how I should proceed to try to enable disk 1 again.

image.thumb.png.b07b7cb8dc24c6bbda7c4a2b7598995c.png

7 hours ago, JorgeB said:

If the emulated disk is mounting correctly and data looks OK just rebuild on top.

And to do that, I stop the array, unassign the disk, start the array, then stop it and reassign the disk to do a reconstruction? I just want to clarify so I don't accidentally make Unraid think it's a new drive and wipe the data. Thanks in advance!


FYI:  This is covered here in the online documentation that can be accessed via the ‘Manual’ link at the bottom of the Unraid GUI.


Thanks, I did see that, but it said there are two ways to try to re-enable a drive: reconstruction and trusting parity. I didn't know which would better fit my case.

 

Also, I have a question: if this drive fails to reconstruct, will data be lost? Or will it be possible to reconstruct onto another drive and try again? So far I don't see any errors in Unraid and the 199 count hasn't gone up, but I'm just worried because I have really bad luck.
