10meghalfduplex Posted January 27, 2018

Hi there,

I had a drive drop off a couple of days ago; it rebuilt successfully and all seemed OK. Now I have had two more drives drop off. Thankfully I have dual parity, so I am okay-ish, but it's got me stumped. Both drives pass their SMART tests and seem available, but they don't get added to the array. The syslog has a bunch of errors I do not understand, and I'm hoping you good people can help me.

I have changed SATA cables, and even swapped in cables from a spare drive that was not in the array. It seems really weird. I have started a rebuild on the array with the spare good drive, but I would bet good money that the two red-ball drives are fine. I have also swapped out the SATA controller with a spare.

Any advice, or can you decode my syslog, please?

solar-diagnostics-20180128-0041.zip
JorgeB Posted January 27, 2018

Please post the complete diagnostics: Tools -> Diagnostics
JorgeB Posted January 27, 2018

Full diags might help more, but just by looking at the syslog, you had two disks drop offline at practically the same time:

Quote
Jan 28 00:04:59 Solar kernel: ata9: hard resetting link
Jan 28 00:04:59 Solar kernel: ata9: SATA link down (SStatus 0 SControl 310)
Jan 28 00:04:59 Solar kernel: ata9.00: disabled
Jan 28 00:05:00 Solar kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Jan 28 00:05:00 Solar kernel: ata3.00: model number mismatch 'WDC WD60EFRX-68L0BN1' != 'WDC WD30EFRX-68EUZN0'
Jan 28 00:05:00 Solar kernel: ata3.00: revalidation failed (errno=-19)
Jan 28 00:05:00 Solar kernel: ata3.00: disabled

Since they are on different controllers, this most likely suggests a power problem: the PSU itself or cable issues. You also have thousands of errors on the cache pool. Again, both devices are on different controllers, so this is likely a power/cable issue as well:

Quote
Jan 28 00:09:35 Solar kernel: BTRFS info (device sdc1): bdev /dev/sdc1 errs: wr 1380178, rd 1467214, flush 69, corrupt 0, gen 0
Jan 28 00:09:35 Solar kernel: BTRFS info (device sdc1): bdev /dev/sdi1 errs: wr 9677454, rd 8358759, flush 19937, corrupt 0, gen 0

You need to run a correcting scrub once these are fixed, as there are CRC errors on the pool.
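For anyone who wants to pull the same two signals out of their own syslog, here is a rough Python sketch of the reading described above. It is only an illustration: the function names `disabled_ports` and `btrfs_errors` are made up for this example, and the regular expressions are assumptions that were written against just the lines quoted in this thread, so other kernels or log formats may need adjustments.

```python
import re

# Sample lines copied from the syslog excerpts quoted in this thread.
SYSLOG = """\
Jan 28 00:04:59 Solar kernel: ata9: hard resetting link
Jan 28 00:04:59 Solar kernel: ata9: SATA link down (SStatus 0 SControl 310)
Jan 28 00:04:59 Solar kernel: ata9.00: disabled
Jan 28 00:05:00 Solar kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Jan 28 00:05:00 Solar kernel: ata3.00: model number mismatch 'WDC WD60EFRX-68L0BN1' != 'WDC WD30EFRX-68EUZN0'
Jan 28 00:05:00 Solar kernel: ata3.00: revalidation failed (errno=-19)
Jan 28 00:05:00 Solar kernel: ata3.00: disabled
Jan 28 00:09:35 Solar kernel: BTRFS info (device sdc1): bdev /dev/sdc1 errs: wr 1380178, rd 1467214, flush 69, corrupt 0, gen 0
Jan 28 00:09:35 Solar kernel: BTRFS info (device sdc1): bdev /dev/sdi1 errs: wr 9677454, rd 8358759, flush 19937, corrupt 0, gen 0
"""

# Assumed pattern: "<date> HH:MM:SS <host> kernel: ataN.MM: disabled".
DISABLED_RE = re.compile(
    r"^\w{3}\s+\d+ (\d\d):(\d\d):(\d\d) \S+ kernel: (ata\d+)\.\d+: disabled$",
    re.MULTILINE,
)

# Assumed pattern for the per-device BTRFS error counter summary.
BTRFS_RE = re.compile(
    r"bdev (\S+) errs: wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)
COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def disabled_ports(text):
    """Map each disabled ATA port to the second of the day it was disabled.

    The date is ignored, so events on either side of midnight are not
    compared correctly by this sketch.
    """
    ports = {}
    for hours, minutes, seconds, port in DISABLED_RE.findall(text):
        ports[port] = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return ports


def btrfs_errors(text):
    """Map each pool device to its error counters."""
    return {
        match.group(1): dict(zip(COUNTERS, map(int, match.groups()[1:])))
        for match in BTRFS_RE.finditer(text)
    }


if __name__ == "__main__":
    ports = disabled_ports(SYSLOG)
    spread = max(ports.values()) - min(ports.values())
    print(f"{len(ports)} ports disabled within {spread} s: {sorted(ports)}")
    for device, errors in btrfs_errors(SYSLOG).items():
        print(device, errors)
```

Several ports on different controllers being disabled within a second or two of each other, together with large `wr`/`rd` counters on more than one pool device, is the pattern that points toward something shared such as power or cabling rather than a single bad disk.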
10meghalfduplex Posted January 27, 2018

Argh, sorry, the diagnostics are now attached to the original post.
10meghalfduplex Posted February 9, 2018

I ended up blasting the config and rebuilding the array, and all went OK. However, I find that Unraid is super sensitive to read errors. I pulled a hot-swap drive out (the one that had previously red-balled), and the resulting 47 read errors made it disable the disk (parity 2). The only way I could get it back in was to stop the array, remove parity 2, start the array, stop the array, add parity 2, start the array, and rebuild parity.
JorgeB Posted February 9, 2018

1 hour ago, 10meghalfduplex said:
I pulled a hot swap drive out (the one that had previously red balled - and the resulting 47 read errors made it disable the disk (parity 2)

Do you mean you pulled parity2 and it was disabled? Or you pulled another disk and parity2 was disabled? If the former, that's the expected result.