Lots of red balls, syslog is a mess...



Hi There, 

 

I had a drive drop off a couple of days ago and successfully rebuilt it. All seemed OK.

 

Now I have had two more drives drop off. Thankfully I have dual parity, so I am okay-ish, but it's got me stumped. Both drives pass their SMART tests and seem available, but they don't get added to the array. The syslog has a bunch of errors I do not understand, so I am hoping you good people can help me.

 

I have changed the SATA cables, and even swapped in cables from a spare drive that was not in the array. It seems really weird.

 

I have started a rebuild on the array with the spare good drive, but I would bet good money that the two red-ball drives are fine.

 

I have also swapped out the SATA controller for a spare.

 

Any advice? Can you decode my syslog, please?

 

 

 

solar-diagnostics-20180128-0041.zip

Edited by 10meghalfduplex

Full diags might help more, but just by looking at the syslog, you had two disks drop offline at practically the same time:

 

Quote

Jan 28 00:04:59 Solar kernel: ata9: hard resetting link
Jan 28 00:04:59 Solar kernel: ata9: SATA link down (SStatus 0 SControl 310)
Jan 28 00:04:59 Solar kernel: ata9.00: disabled

 

Jan 28 00:05:00 Solar kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Jan 28 00:05:00 Solar kernel: ata3.00: model number mismatch 'WDC WD60EFRX-68L0BN1' != 'WDC WD30EFRX-68EUZN0'
Jan 28 00:05:00 Solar kernel: ata3.00: revalidation failed (errno=-19)
Jan 28 00:05:00 Solar kernel: ata3.00: disabled

 

 

Since they are on different controllers, this most likely points to a power problem: the PSU itself or cable issues.
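If you want to check for this kind of pattern yourself, here is a rough Python sketch that groups the kernel's ATA link messages by port, so you can see at a glance whether several ports dropped within the same second. The regex and the keyword list are my own assumptions based only on the lines quoted above, and `find_ata_drops` is a made-up helper name, not anything that ships with Unraid.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Jan 28 00:04:59 Solar kernel: ata9.00: disabled
# The layout is assumed from the syslog excerpt quoted in this thread.
ATA_EVENT = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<port>ata\d+)(?:\.\d+)?:\s+(?P<msg>.*)$"
)

# Messages treated as signs of a dropped link (an assumption, not a complete list).
KEYWORDS = ("hard resetting link", "link down", "revalidation failed", "disabled")


def find_ata_drops(lines):
    """Return {port: [(timestamp, message), ...]} for suspicious ATA events."""
    events = defaultdict(list)
    for line in lines:
        m = ATA_EVENT.match(line.strip())
        if m and any(k in m.group("msg") for k in KEYWORDS):
            events[m.group("port")].append((m.group("ts"), m.group("msg")))
    return dict(events)


# The lines quoted above; in practice you would pass an open syslog file instead.
ATA_SAMPLE = [
    "Jan 28 00:04:59 Solar kernel: ata9: hard resetting link",
    "Jan 28 00:04:59 Solar kernel: ata9: SATA link down (SStatus 0 SControl 310)",
    "Jan 28 00:04:59 Solar kernel: ata9.00: disabled",
    "Jan 28 00:05:00 Solar kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)",
    "Jan 28 00:05:00 Solar kernel: ata3.00: model number mismatch "
    "'WDC WD60EFRX-68L0BN1' != 'WDC WD30EFRX-68EUZN0'",
    "Jan 28 00:05:00 Solar kernel: ata3.00: revalidation failed (errno=-19)",
    "Jan 28 00:05:00 Solar kernel: ata3.00: disabled",
]

for port, hits in sorted(find_ata_drops(ATA_SAMPLE).items()):
    for ts, msg in hits:
        print(port, ts, msg)
```

Two or more ports showing drops a second apart, as here, is what makes a shared cause such as power more plausible than two independently failing disks.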

 

You also have thousands of errors on the cache pool. Again, both devices are on different controllers, so this is likely a power/cable issue as well:

 

Quote

Jan 28 00:09:35 Solar kernel: BTRFS info (device sdc1): bdev /dev/sdc1 errs: wr 1380178, rd 1467214, flush 69, corrupt 0, gen 0
Jan 28 00:09:35 Solar kernel: BTRFS info (device sdc1): bdev /dev/sdi1 errs: wr 9677454, rd 8358759, flush 19937, corrupt 0, gen 0

 

You need to run a correcting scrub once these are fixed, as there are CRC errors on the pool.
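To keep an eye on whether those counters are still climbing after the hardware fix and the scrub, something like the sketch below could pull them out of the syslog. It only assumes the `BTRFS info ... errs:` line format quoted above; `parse_btrfs_errs` is a hypothetical helper name of mine.

```python
import re

# Matches the per-device error summary the kernel logs for a btrfs pool, e.g.:
#   ... bdev /dev/sdc1 errs: wr 1380178, rd 1467214, flush 69, corrupt 0, gen 0
BTRFS_ERRS = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return (device, {counter: value}) or None if the line does not match."""
    m = BTRFS_ERRS.search(line)
    if m is None:
        return None
    counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    return m.group("dev"), counters


# The two lines quoted above.
BTRFS_SAMPLE = [
    "Jan 28 00:09:35 Solar kernel: BTRFS info (device sdc1): bdev /dev/sdc1 "
    "errs: wr 1380178, rd 1467214, flush 69, corrupt 0, gen 0",
    "Jan 28 00:09:35 Solar kernel: BTRFS info (device sdc1): bdev /dev/sdi1 "
    "errs: wr 9677454, rd 8358759, flush 19937, corrupt 0, gen 0",
]

for line in BTRFS_SAMPLE:
    parsed = parse_btrfs_errs(line)
    if parsed:
        dev, counters = parsed
        print(dev, counters)
```

Comparing the numbers from two log snapshots taken some time apart shows whether new write or read errors are still being logged; the counters are cumulative, so a large value alone does not mean the problem is ongoing.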

  • 2 weeks later...

I ended up blasting the config and rebuilding the array, and all went OK. However, I find that Unraid is super sensitive to read errors. I pulled a hot-swap drive out (the one that had previously red-balled), and the resulting 47 read errors made it disable the disk (parity 2). The only way I could get it back in was to down the array, remove parity 2, up the array, down the array, add parity 2, up the array and rebuild parity.

 

1 hour ago, 10meghalfduplex said:

I pulled a hot-swap drive out (the one that had previously red-balled), and the resulting 47 read errors made it disable the disk (parity 2)

Do you mean you pulled parity2 and it was disabled? Or did you pull another disk and parity2 was disabled? If the former, that's the expected result.
