[SOLVED] Preclear generating some odd system errors



I just installed two 3TB WD Reds in my existing unRAID system running 5.0.  unRAID is up and running normally, so no issues there.  As part of this addition I needed to add a new controller card, an IOCrest PCI x1, 4-Port SATA6G (SI-PEX40064).  I listed the drives available to pre-clear, and everything looks good:

root@laffy:/boot# ./preclear_disk.sh -l
====================================1.13
Disks not assigned to the unRAID array
  (potential candidates for clearing)
========================================
    /dev/sdi = ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N1954014
    /dev/sdh = ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N1975277

I executed a pre-clear in two different shells to get them both pre-cleared at the same time.  In each window I used the following commands:

./preclear_disk.sh -A -c 3 /dev/sdh (window 1)
./preclear_disk.sh -A -c 3 /dev/sdi (window 2)

Both looked fine and were busy processing at about 125MB/s each.  However, I happened to have a running tail on my syslog and started seeing this (full syslog since boot attached)...

Feb 22 12:38:03 laffy kernel: ata8.00: NCQ disabled due to excessive errors
Feb 22 12:38:03 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 22 12:38:03 laffy kernel: ata8.00: failed command: IDENTIFY DEVICE
Feb 22 12:38:03 laffy kernel: ata8.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Feb 22 12:38:03 laffy kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 22 12:38:03 laffy kernel: ata8.00: status: { DRDY }
Feb 22 12:38:03 laffy kernel: ata8: hard resetting link
Feb 22 12:38:03 laffy kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 22 12:38:03 laffy kernel: ata8.00: configured for UDMA/133
Feb 22 12:38:03 laffy kernel: ata8: EH complete

I figured there was some contention going on with having two pre-clears running, so I stopped the second one.  Those errors appear to have subsided except for this...

Feb 22 12:47:13 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 22 12:47:13 laffy kernel: ata8.00: failed command: SMART
Feb 22 12:47:13 laffy kernel: ata8.00: cmd b0/d1:01:01:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Feb 22 12:47:13 laffy kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 22 12:47:13 laffy kernel: ata8.00: status: { DRDY }
Feb 22 12:47:13 laffy kernel: ata8: hard resetting link
Feb 22 12:47:13 laffy kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 22 12:47:13 laffy kernel: ata8.00: configured for UDMA/133
Feb 22 12:47:13 laffy kernel: ata8: EH complete

But, it's been quiet for the last 16 minutes, so I'm assuming it has stabilized and pre-clear continues to chug along.

 

Anything to worry about here?  Bad idea to run two pre-clears in parallel?

syslog.20140222-1256.txt
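
To tell which drive sits behind a port such as ata8, one option is to look at the resolved sysfs path of each disk. This is only a sketch: it assumes the kernel exposes an ataN component in those paths (common with libata, but not verified on this unRAID 5.0 box), and the function names are made up for illustration.

```shell
# Extract the "ataN" component from a resolved sysfs path.
# Prints nothing if the path has no such component.
ata_port_from_path() {
    printf '%s\n' "$1" | sed -n 's|.*/\(ata[0-9][0-9]*\)/.*|\1|p'
}

# Print "sdX ataN" for every sd* block device on this machine.
list_ata_ports() {
    for dev in /sys/block/sd*; do
        [ -e "$dev" ] || continue   # glob did not match anything
        echo "${dev##*/} $(ata_port_from_path "$(readlink -f "$dev")")"
    done
}
```

Running `list_ata_ports` and looking for the ata7 and ata8 rows should show whether they correspond to /dev/sdh and /dev/sdi.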

Link to comment

BIOS was updated about 6 months ago, so it's about that far out of date.  Maybe the new controller I added?  I'll see if there are any updates needed there.  Could be there are some configuration settings that need to be tweaked on that new controller too.

Regarding NCQ, are you saying that you can see from the logs that NCQ is off?  It sort of looks like it's turned on.

Link to comment


Right. Did you turn it on? Look under Settings->Disk Settings

Link to comment

Right now "Force NCQ disabled" is set to "Yes".  I've never adjusted the disk settings.
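
The setting can also be cross-checked from the shell. With libata, a drive's `queue_depth` in sysfs is 1 when command queuing is off and larger (often 31) when NCQ is in use. The sketch below assumes that is how the "Force NCQ disabled" option takes effect, which I have not confirmed; the function names are hypothetical.

```shell
# Interpret a queue depth value: 1 means no command queuing.
ncq_state_from_depth() {
    if [ "$1" -le 1 ]; then
        echo "NCQ off (queue_depth=$1)"
    else
        echo "NCQ on (queue_depth=$1)"
    fi
}

# Report the state for one disk, e.g. "ncq_state sdh".
ncq_state() {
    ncq_state_from_depth "$(cat "/sys/block/$1/device/queue_depth")"
}
```

For the two new drives that would be `ncq_state sdh` and `ncq_state sdi`.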

 

Pre-clear is now on the last stage of the first disk and there have been no errors since yesterday evening.  I'll see how the other passes go and then move on to the next disk.

 

Memory is completely fine, so no memory exhaustion here (biggest addon is CrashPlan at 300MB and very little beyond that).

Link to comment

Still getting errors trickling in...

root@laffy:/var/log# grep exception syslog
Feb 22 12:35:09 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 22 12:35:38 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 22 12:36:20 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 22 12:37:13 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 22 12:38:03 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 22 12:38:26 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 22 12:39:33 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 22 12:41:03 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 22 12:43:06 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 22 12:47:13 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 22 13:47:12 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 22 16:03:04 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x1cf SErr 0x0 action 0x0
Feb 22 16:03:13 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x7f SErr 0x0 action 0x0
Feb 22 16:47:13 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 22 19:47:13 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 23 08:31:32 laffy kernel: ata7.00: exception Emask 0x0 SAct 0xff SErr 0x0 action 0x0
Feb 23 08:31:35 laffy kernel: ata7.00: exception Emask 0x0 SAct 0xff SErr 0x0 action 0x0
Feb 23 08:32:18 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
Feb 23 08:32:22 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x7fff0ff SErr 0x0 action 0x0
Feb 23 08:32:26 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x3fffff SErr 0x0 action 0x0
Feb 23 08:32:30 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x1fffff SErr 0x0 action 0x0
Feb 23 08:32:36 laffy kernel: ata7.00: exception Emask 0x0 SAct 0xfffff SErr 0x0 action 0x0
Feb 23 08:32:49 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x7ffff SErr 0x0 action 0x0
Feb 23 08:32:52 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x3ffff SErr 0x0 action 0x0
Feb 23 08:33:00 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x1ffff SErr 0x0 action 0x0
Feb 23 08:33:03 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x1ffff SErr 0x0 action 0x0
Feb 23 08:33:08 laffy kernel: ata7.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x0
Feb 23 08:33:12 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 0x0
Feb 23 08:33:16 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x3fff SErr 0x0 action 0x0
Feb 23 08:33:20 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x1fff SErr 0x0 action 0x0
Feb 23 08:33:25 laffy kernel: ata7.00: exception Emask 0x0 SAct 0xfff SErr 0x0 action 0x0
Feb 23 08:33:28 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x7ff SErr 0x0 action 0x0
Feb 23 08:33:34 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x0
Feb 23 08:33:39 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x0
Feb 23 08:33:45 laffy kernel: ata7.00: exception Emask 0x0 SAct 0xff SErr 0x0 action 0x0
Feb 23 08:33:53 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x7f SErr 0x0 action 0x0
Feb 23 08:34:00 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x7f SErr 0x0 action 0x0
Feb 23 08:34:40 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x7f SErr 0x0 action 0x0
Feb 23 08:35:01 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x3f SErr 0x0 action 0x0
Feb 23 08:35:55 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x3f SErr 0x0 action 0x0
Feb 23 08:36:02 laffy kernel: ata7.00: exception Emask 0x0 SAct 0x1f SErr 0x0 action 0x0
Feb 23 13:47:12 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 23 13:47:23 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 24 05:47:12 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 24 10:47:12 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 24 10:47:23 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 24 14:47:13 laffy kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen

This is very odd.  ata7 is the one I'm still pre-clearing.  ata8 has had no work for it since 22 Feb, so I'm baffled how I'm still getting errors there.  What do ata7 and ata8 have in common?  They're both on that SATA controller I just added.  No other drives have ever reported this problem.
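
On the NCQ question, the SAct field in these lines is worth a look. In libata messages it is the SActive register, a bitmask with one bit per outstanding queued command, so a value like 0x7fffffff on ata7 suggests queued commands were in flight at that moment, while every ata8 line shows 0x0. A rough sketch for counting the set bits follows; the function name is hypothetical.

```shell
# Count the bits set in a SAct mask, i.e. the number of NCQ tags
# that were outstanding when the exception was logged.
sact_tag_count() {
    local mask=$(( $1 ))   # accepts hex such as 0x1cf
    local count=0
    while [ "$mask" -ne 0 ]; do
        count=$(( count + (mask & 1) ))
        mask=$(( mask >> 1 ))
    done
    echo "$count"
}
```

For example, `sact_tag_count 0x7fffffff` prints 31 and `sact_tag_count 0x0` prints 0.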

Link to comment

I reseated the cables and reorganized some of the wiring in case there were some interference issues going on.

 

My preclear finished for both drives, and the errors did pop up at random a few times during each run for the two drives.  Preclear results were clean and passed (3 passes).  I decided to press on with the drive replacements by stopping the array, changing the drive assignment for one drive, and restarting to let it rebuild the first new disk from parity.  That completed successfully last night, and I then ran another parity check.  There were no errors during the parity rebuild or the following check.

 

So, I'm calling all clear at this point.  I have one more drive to replace and will check the logs for any more errors after that.  I'm guessing the preclear just drives the hard drive in a unique way that pushes the IOCrest card to the limits.  If any more errors do show up, then I'll simply remove it and find another alternative.  At this point, I think I'm good.

 

Judging from the error, the cause could be any of a wide variety of things: loose cable, bad cable, cable crosstalk, controller issues, controller/motherboard issue, power issue... the list goes on.  As long as the error is not coming up in normal operations, I'm pretty sure I'm good here.
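
For the follow-up log checks, a per-port tally is easier to compare between runs than the raw grep output. A minimal sketch, assuming the syslog lines keep the "ataN.NN: exception" format shown earlier in the thread; the function name is made up.

```shell
# Tally libata exception lines per port.
# Reads the file(s) given as arguments, or stdin if none are given.
count_ata_exceptions() {
    awk 'match($0, /ata[0-9]+\.[0-9]+: exception/) {
             port = substr($0, RSTART, RLENGTH)
             sub(/: exception$/, "", port)   # keep just "ataN.NN"
             n[port]++
         }
         END { for (p in n) print p, n[p] }' "$@" | sort
}
```

It would be run as `count_ata_exceptions /var/log/syslog`, printing one line per port with its exception count.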
