fr05ty Posted August 2, 2019

Last week both of my parity drives went red. The first one died, so I replaced it and did a rebuild; then the second one died, so I did another rebuild. Three days later the scheduled parity sync ran and I got a message saying over 100K errors were fixed. I stopped the server and did a quick memtest (~8 hr, 1 pass), which showed no issues. I am currently doing a third parity sync; I'm at 50% with 68K errors corrected. I have ordered some new SFF-8087 to 4x SATA cables just to test the cables, but they won't arrive until Wednesday. What should I be looking for in the logs?

After the 1st re-sync and before the 2nd HDD crash: iceberg-diagnostics-20190727-0936.zip

This is where the drive died:

Jul 27 17:13:28 iceberg kernel: md: sync done. time=87060sec
Jul 27 17:13:28 iceberg kernel: md: recovery thread: exit status: 0
Jul 27 19:29:47 iceberg kernel: sd 7:0:0:0: attempting task abort! scmd(000000001dc9c3cc)
Jul 27 19:29:47 iceberg kernel: sd 7:0:0:0: [sdd] tag#0 CDB: opcode=0x85 85 06 20 00 d8 00 00 00 00 00 4f 00 c2 00 b0 00
Jul 27 19:29:47 iceberg kernel: scsi target7:0:0: handle(0x000a), sas_address(0x5001e67467de7fec), phy(12)
Jul 27 19:29:47 iceberg kernel: scsi target7:0:0: enclosure logical id(0x5001e67467de7fff), slot(12)
Jul 27 19:29:47 iceberg kernel: sd 7:0:0:0: device_block, handle(0x000a)
Jul 27 19:29:49 iceberg kernel: sd 7:0:0:0: device_unblock and setting to running, handle(0x000a)
Jul 27 19:29:49 iceberg kernel: sd 7:0:0:0: [sdd] Synchronizing SCSI cache
Jul 27 19:29:49 iceberg rc.diskinfo[7620]: SIGHUP received, forcing refresh of disks info.
Jul 27 19:29:49 iceberg rc.diskinfo[7620]: SIGHUP ignored - already refreshing disk info.
Jul 27 19:29:51 iceberg kernel: sd 7:0:0:0: task abort: SUCCESS scmd(000000001dc9c3cc)
Jul 27 19:29:51 iceberg kernel: md: disk29 read error, sector=4916493968
Jul 27 19:29:51 iceberg kernel: md: disk29 read error, sector=4916493976

After the scheduled parity check: iceberg-diagnostics-20190801-2003.zip
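As a rough aid for the "what should I look for in the logs" question, here is a minimal Python sketch that pulls the kind of kernel lines quoted above (task aborts, md read errors, sync completions) out of a saved syslog. The pattern names and the helpers `scan_syslog` and `read_error_sectors` are hypothetical; the regexes only cover the messages shown in this post, so a quiet result does not prove the disks or cables are healthy.

```python
import re

# Patterns based only on the kernel messages quoted in this thread (assumption:
# other failure messages exist that these will not catch).
PATTERNS = {
    "md_read_error": re.compile(r"md: (disk\d+) read error, sector=(\d+)"),
    "task_abort": re.compile(r"attempting task abort!"),
    "abort_result": re.compile(r"task abort: (\w+)"),
    "sync_done": re.compile(r"md: sync done\. time=(\d+)sec"),
}


def scan_syslog(lines):
    """Return (line_number, pattern_name, line) for every line that matches a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name, line.rstrip("\n")))
    return hits


def read_error_sectors(lines):
    """Group md read-error sector numbers by disk name, e.g. {'disk29': [...]}."""
    sectors = {}
    for line in lines:
        match = PATTERNS["md_read_error"].search(line)
        if match:
            sectors.setdefault(match.group(1), []).append(int(match.group(2)))
    return sectors
```

Feed it the lines of the syslog from a diagnostics zip, e.g. `scan_syslog(open(path, errors="replace").readlines())`. A task abort followed immediately by md read errors on the same disk, as in the excerpt above, is the pattern worth flagging; on its own it does not say whether the drive or the cable/controller is at fault.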
JorgeB Posted August 3, 2019

The last check was non-correcting. You need to run a correcting check first, then a non-correcting one to confirm there are no more errors.
fr05ty Posted August 5, 2019 Author

Aug 5 11:31:54 iceberg kernel: md: sync done. time=109225sec
Aug 5 11:31:54 iceberg kernel: md: recovery thread: exit status: 0

14TB drives suck: 30 hours for the scrub, but they were on sale for a good price at the time. I just finished the correcting check yesterday and the non-correcting check today. The correcting pass fixed over 101K errors, and the non-correcting check found 0. So why would I have had so many errors to fix? Could it be because parity disk 2 was dying after I rebuilt disk 1? Also, should a monthly scrub be correcting or non-correcting?
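To put the "30 hours" in numbers, here is a small Python sketch that converts the `time=109225sec` value from the log above into hours and an average read rate. The helper `sync_summary` is hypothetical, and it assumes the check read the full 14 TB of the largest (parity) disk, using decimal units (1 TB = 10^12 bytes) as drive vendors do.

```python
def sync_summary(seconds, capacity_tb):
    """Convert an md 'sync done. time=...sec' value into hours and average MB/s.

    Assumes the whole capacity_tb (decimal terabytes) was read during the check.
    """
    hours = seconds / 3600
    mb_per_s = capacity_tb * 10**12 / seconds / 10**6
    return hours, mb_per_s


hours, rate = sync_summary(109225, 14)
print(f"{hours:.1f} h, ~{rate:.0f} MB/s average")
```

Under those assumptions this prints `30.3 h, ~128 MB/s average`, which is simply what reading 14 TB end to end costs at typical spinning-disk speeds.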
JorgeB Posted August 5, 2019

2 hours ago, fr05ty said: so why would i have had so many errors to fix? could it be because parity disk 2 was dying after i rebuilt disk 1?

Possibly, but it's impossible to know without diagnostics showing those issues.

2 hours ago, fr05ty said: also should a monthly scrub be as a correcting or non-correcting?

Parity checks should always be non-correcting, unless errors are expected, like after an unclean shutdown.