about parity errors



Last week both of my parity drives went red. The first one died, so I replaced it and did a rebuild; then the second died and I did another rebuild. Three days later the scheduled parity sync ran and reported over 100K errors fixed, so I stopped the server and ran a quick memtest (~8 hr, 1 pass), which found no issues. I am currently running a third parity sync and am at 50% with 68K errors corrected. I have ordered some new 8087-to-4-SATA cables just to test the cabling, but they won't arrive until Wed.

What should I be looking for in the logs?

 

After the 1st re-sync and before the 2nd HDD crash: iceberg-diagnostics-20190727-0936.zip. This is where the drive died:

Jul 27 17:13:28 iceberg kernel: md: sync done. time=87060sec
Jul 27 17:13:28 iceberg kernel: md: recovery thread: exit status: 0
Jul 27 19:29:47 iceberg kernel: sd 7:0:0:0: attempting task abort! scmd(000000001dc9c3cc)
Jul 27 19:29:47 iceberg kernel: sd 7:0:0:0: [sdd] tag#0 CDB: opcode=0x85 85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00
Jul 27 19:29:47 iceberg kernel: scsi target7:0:0: handle(0x000a), sas_address(0x5001e67467de7fec), phy(12)
Jul 27 19:29:47 iceberg kernel: scsi target7:0:0: enclosure logical id(0x5001e67467de7fff), slot(12) 
Jul 27 19:29:47 iceberg kernel: sd 7:0:0:0: device_block, handle(0x000a)
Jul 27 19:29:49 iceberg kernel: sd 7:0:0:0: device_unblock and setting to running, handle(0x000a)
Jul 27 19:29:49 iceberg kernel: sd 7:0:0:0: [sdd] Synchronizing SCSI cache
Jul 27 19:29:49 iceberg rc.diskinfo[7620]: SIGHUP received, forcing refresh of disks info.
Jul 27 19:29:49 iceberg rc.diskinfo[7620]: SIGHUP ignored - already refreshing disk info.
Jul 27 19:29:51 iceberg kernel: sd 7:0:0:0: task abort: SUCCESS scmd(000000001dc9c3cc)
Jul 27 19:29:51 iceberg kernel: md: disk29 read error, sector=4916493968
Jul 27 19:29:51 iceberg kernel: md: disk29 read error, sector=4916493976
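
As a rough way to answer "what should I look for", the excerpt above can be tallied by message type. This is only a sketch: the three regular expressions are copied from the lines quoted in this post, the names `summarize` and `SAMPLE` are made up for illustration, and other controllers or kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns copied from the syslog lines quoted in this post (assumption:
# other hardware or kernel versions may phrase these messages differently).
READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")
TASK_ABORT = re.compile(r"(sd [\d:]+): attempting task abort!")
SYNC_DONE = re.compile(r"md: sync done\. time=(\d+)sec")


def summarize(lines):
    """Tally md read errors per disk, task aborts per SCSI device, and sync durations."""
    read_errors = Counter()
    task_aborts = Counter()
    sync_seconds = []
    for line in lines:
        if (m := READ_ERROR.search(line)):
            read_errors[m.group(1)] += 1
        elif (m := TASK_ABORT.search(line)):
            task_aborts[m.group(1)] += 1
        elif (m := SYNC_DONE.search(line)):
            sync_seconds.append(int(m.group(1)))
    return read_errors, task_aborts, sync_seconds


# A few of the lines from the excerpt above, used as sample input.
SAMPLE = [
    "Jul 27 17:13:28 iceberg kernel: md: sync done. time=87060sec",
    "Jul 27 19:29:47 iceberg kernel: sd 7:0:0:0: attempting task abort! scmd(000000001dc9c3cc)",
    "Jul 27 19:29:51 iceberg kernel: md: disk29 read error, sector=4916493968",
    "Jul 27 19:29:51 iceberg kernel: md: disk29 read error, sector=4916493976",
]

if __name__ == "__main__":
    errors, aborts, syncs = summarize(SAMPLE)
    print("read errors per disk:", dict(errors))
    print("task aborts per device:", dict(aborts))
    print("sync durations (sec):", syncs)
```

To run it against a full syslog, pass an open file instead of `SAMPLE`. Read errors that line up in time with task aborts on the same device would point toward the drive, cable, or controller rather than the data.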

After the scheduled parity check: iceberg-diagnostics-20190801-2003.zip

 

 


Aug 5 11:31:54 iceberg kernel: md: sync done. time=109225sec

Aug 5 11:31:54 iceberg kernel: md: recovery thread: exit status: 0

14TB drives suck: 30 hrs for the scrub, but they were on sale for a good price at the time.
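
For reference, the `time=` value in the `md: sync done` lines is in seconds, so the figures quoted in this thread can be converted as below. A small sketch only; `format_duration` is a made-up helper name.

```python
def format_duration(seconds):
    """Convert a duration in seconds to an 'XhYmZs' style string."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    # Values taken from the "md: sync done. time=...sec" lines in this thread.
    for sync_time in (87060, 109225):
        print(sync_time, "sec =", format_duration(sync_time))
```

That puts the Aug 5 run at a little over 30 hours and the Jul 27 run at a little over 24 hours.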

 

So I finished the correcting check yesterday and the non-correcting check today. The first pass fixed over 101K errors and the non-correcting check found 0, so why would I have had so many errors to fix? Could it be because parity disk 2 was dying after I rebuilt disk 1?

 

Also, should a monthly scrub be correcting or non-correcting?

2 hours ago, fr05ty said:

so why would i have had so many errors to fix? could it be because parity disk 2 was dying after i rebuilt disk 1?

Possibly, but it's impossible to know without diags showing those issues.

 

2 hours ago, fr05ty said:

also should a monthly scrub be as a correcting or non-correcting?

A parity check should always be non-correcting unless errors are expected, like after an unclean shutdown.

 
