grandprix Posted August 31, 2014 Share Posted August 31, 2014 Ok. So, I'm beginning to think that -perhaps- I used a nuclear bomb to remove an ant hill. My problems at this thread, is what prompted me to do the "upgrade": http://lime-technology.com/forum/index.php?topic=33669.msg310737#msg310737 I figured I had a failing controller on the mobo. Now, perhaps it was just the parity drive, because after the hardware "upgrade" (in quotes because with the exception of the case and drives, everything else has been replaced, except for the PSU, I'll get to that). After the upgrade to the X10SLM-F-O, ECC RAM and Xeon, this happened when running a no-correct immediately after "upgrade": Aug 30 18:51:52 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 0} (t=6000 jiffies g=1355 c=1354 q=108) Aug 30 18:51:52 Tower kernel: Pid: 2921, comm: unraidd Not tainted 3.9.11p-unRAID #5 Aug 30 18:51:52 Tower kernel: Call Trace: Aug 30 18:51:52 Tower kernel: [<c1062c2a>] print_cpu_stall+0xbc/0x107 Aug 30 18:51:52 Tower kernel: [<c1062eba>] __rcu_pending+0x4f/0x12a Aug 30 18:51:52 Tower kernel: [<c1063008>] rcu_check_callbacks+0x73/0x9b Aug 30 18:51:52 Tower kernel: [<c1032ed9>] update_process_times+0x2d/0x53 Aug 30 18:51:52 Tower kernel: [<c105520b>] tick_sched_timer+0x77/0xa1 Aug 30 18:51:52 Tower kernel: [<c1040e02>] ? __remove_hrtimer+0x25/0x7a Aug 30 18:51:52 Tower kernel: [<c1040f45>] __run_hrtimer+0x45/0xaf Aug 30 18:51:52 Tower kernel: [<c10412ad>] hrtimer_interrupt+0xf1/0x1e7 Aug 30 18:51:52 Tower kernel: [<c101c43a>] smp_apic_timer_interrupt+0x6d/0x7f Aug 30 18:51:52 Tower kernel: [<c1401411>] apic_timer_interrupt+0x2d/0x34 Aug 30 18:51:52 Tower kernel: [<c1400e0b>] ? _raw_spin_lock+0xd/0x1f Aug 30 18:51:52 Tower kernel: [<f944f21d>] handle_stripe+0x4b/0xceb [md_mod] Aug 30 18:51:52 Tower kernel: [<c1044f5f>] ? __wake_up+0x3b/0x42 Aug 30 18:51:52 Tower kernel: [<f944e8d9>] ? 
_release_stripe+0xd0/0xfa [md_mod] Aug 30 18:51:52 Tower kernel: [<f944ff2e>] unraidd+0x71/0xb5 [md_mod] Aug 30 18:51:52 Tower kernel: [<f944ccb2>] md_thread+0xd3/0xea [md_mod] Aug 30 18:51:52 Tower kernel: [<c103f031>] ? wake_up_bit+0x5b/0x5b Aug 30 18:51:52 Tower kernel: [<c103ebf1>] kthread+0x90/0x95 Aug 30 18:51:52 Tower kernel: [<f944cbdf>] ? import_device+0x166/0x166 [md_mod] Aug 30 18:51:52 Tower kernel: [<c1401837>] ret_from_kernel_thread+0x1b/0x28 Aug 30 18:51:52 Tower kernel: [<c103eb61>] ? kthread_freezable_should_stop+0x4a/0x4a Aug 30 19:02:53 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 0} (t=6001 jiffies g=1434 c=1433 q=108) Aug 30 19:02:53 Tower kernel: Pid: 2849, comm: mdrecoveryd Not tainted 3.9.11p-unRAID #5 Aug 30 19:02:53 Tower kernel: Call Trace: Aug 30 19:02:53 Tower kernel: [<c1062c2a>] print_cpu_stall+0xbc/0x107 Aug 30 19:02:53 Tower kernel: [<c1062eba>] __rcu_pending+0x4f/0x12a Aug 30 19:02:53 Tower kernel: [<c1063008>] rcu_check_callbacks+0x73/0x9b Aug 30 19:02:53 Tower kernel: [<c1032ed9>] update_process_times+0x2d/0x53 Aug 30 19:02:53 Tower kernel: [<c105520b>] tick_sched_timer+0x77/0xa1 Aug 30 19:02:53 Tower kernel: [<c1040e02>] ? __remove_hrtimer+0x25/0x7a Aug 30 19:02:53 Tower kernel: [<c1040f45>] __run_hrtimer+0x45/0xaf Aug 30 19:02:53 Tower kernel: [<c10412ad>] hrtimer_interrupt+0xf1/0x1e7 Aug 30 19:02:53 Tower kernel: [<c101c43a>] smp_apic_timer_interrupt+0x6d/0x7f Aug 30 19:02:53 Tower kernel: [<c1401411>] apic_timer_interrupt+0x2d/0x34 Aug 30 19:02:53 Tower kernel: [<c124007b>] ? des3_ede_decrypt+0x232/0x4d4 Aug 30 19:02:53 Tower kernel: [<f944fd65>] ? handle_stripe+0xb93/0xceb [md_mod] Aug 30 19:02:53 Tower kernel: [<f944ffb5>] unraid_sync+0x43/0x52 [md_mod] Aug 30 19:02:53 Tower kernel: [<f944bdc9>] md_do_sync+0x13c/0x3b3 [md_mod] Aug 30 19:02:53 Tower kernel: [<c103f031>] ? 
wake_up_bit+0x5b/0x5b Aug 30 19:02:53 Tower kernel: [<f944c57b>] md_do_recovery+0x117/0x19c [md_mod] Aug 30 19:02:53 Tower kernel: [<f944ccb2>] md_thread+0xd3/0xea [md_mod] Aug 30 19:02:53 Tower kernel: [<c103f031>] ? wake_up_bit+0x5b/0x5b Aug 30 19:02:53 Tower kernel: [<c103ebf1>] kthread+0x90/0x95 Aug 30 19:02:53 Tower kernel: [<f944cbdf>] ? import_device+0x166/0x166 [md_mod] Aug 30 19:02:53 Tower kernel: [<c1401837>] ret_from_kernel_thread+0x1b/0x28 Aug 30 19:02:53 Tower kernel: [<c103eb61>] ? kthread_freezable_should_stop+0x4a/0x4a Aug 30 22:03:53 Tower kernel: md: parity incorrect, sector=2549700264 Aug 30 22:03:53 Tower kernel: md: parity incorrect, sector=2549700272 Aug 30 22:35:38 Tower login[2439]: ROOT LOGIN on '/dev/tty1' Aug 30 22:50:27 Tower login[4006]: ROOT LOGIN on '/dev/tty1'

I searched and found three possible answers (I suppose):
1. "Nothing of concern, would be fixed in later 5.0rc" -- but I am running 5.0.5 stable.
2. "LOWMEM" is low; when I logged into the terminal and ran free -l, it showed 635k free for Low Memory.
3. I forget exactly what the third was; I -believe- it was a possible bad drive. That seemed plausible, since I still got the same 2 sync errors at seemingly random sectors that I was getting with the old hardware (again, apart from the case, drives, and PSU, which I promise to get to, I'm now on all new hardware).

So, figuring maybe it was #3, I stopped the no-correct check, stopped unRAID, shut down, swapped out the parity drive for one that had been pre-cleared twice, and it is now rebuilding as I write this. So far, so good, I suppose.
Aug 31 00:44:00 Tower kernel: mdcmd (52): start UPGRADE_DISK Aug 31 00:44:00 Tower kernel: unraid: allocating 77688K for 1536 stripes (12 disks) Aug 31 00:44:00 Tower kernel: md1: running, size: 1953514552 blocks Aug 31 00:44:00 Tower kernel: md2: running, size: 1953514552 blocks Aug 31 00:44:00 Tower kernel: md3: running, size: 2930266532 blocks Aug 31 00:44:00 Tower kernel: md4: running, size: 1953514552 blocks Aug 31 00:44:00 Tower kernel: md5: running, size: 1953514552 blocks Aug 31 00:44:00 Tower kernel: md6: running, size: 1953514552 blocks Aug 31 00:44:00 Tower kernel: md7: running, size: 2930266532 blocks Aug 31 00:44:00 Tower kernel: md8: running, size: 1953514552 blocks Aug 31 00:44:00 Tower kernel: md9: running, size: 1953514552 blocks Aug 31 00:44:00 Tower kernel: md10: running, size: 2930266532 blocks Aug 31 00:44:00 Tower kernel: md11: running, size: 2930266532 blocks Aug 31 00:44:00 Tower emhttp: shcmd (38): udevadm settle Aug 31 00:44:00 Tower emhttp: shcmd (39): /usr/local/sbin/emhttp_event array_started Aug 31 00:44:00 Tower emhttp_event: array_started Aug 31 00:44:00 Tower emhttp: Mounting disks... 
Aug 31 00:44:00 Tower emhttp: shcmd (40): mkdir /mnt/disk1 Aug 31 00:44:00 Tower emhttp: shcmd (41): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md1 /mnt/disk1 |& logger Aug 31 00:44:00 Tower kernel: REISERFS (device md1): found reiserfs format "3.6" with standard journal Aug 31 00:44:00 Tower kernel: REISERFS (device md1): using ordered data mode Aug 31 00:44:00 Tower kernel: reiserfs: using flush barriers Aug 31 00:44:00 Tower kernel: REISERFS (device md1): journal params: device md1, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 Aug 31 00:44:00 Tower kernel: REISERFS (device md1): checking transaction log (md1) Aug 31 00:44:00 Tower kernel: REISERFS (device md1): Using r5 hash to sort names Aug 31 00:44:00 Tower emhttp: shcmd (42): mkdir /mnt/disk2 Aug 31 00:44:00 Tower emhttp: shcmd (43): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md2 /mnt/disk2 |& logger Aug 31 00:44:00 Tower kernel: REISERFS (device md2): found reiserfs format "3.6" with standard journal Aug 31 00:44:00 Tower kernel: REISERFS (device md2): using ordered data mode Aug 31 00:44:00 Tower kernel: reiserfs: using flush barriers Aug 31 00:44:00 Tower kernel: REISERFS (device md2): journal params: device md2, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 Aug 31 00:44:00 Tower kernel: REISERFS (device md2): checking transaction log (md2) Aug 31 00:44:00 Tower kernel: REISERFS (device md2): Using r5 hash to sort names Aug 31 00:44:00 Tower emhttp: shcmd (44): mkdir /mnt/disk3 Aug 31 00:44:00 Tower emhttp: shcmd (45): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md3 /mnt/disk3 |& logger Aug 31 00:44:00 Tower kernel: REISERFS (device md3): found reiserfs format "3.6" with standard journal Aug 31 00:44:00 Tower kernel: REISERFS (device md3): using ordered data mode Aug 31 00:44:00 Tower 
kernel: reiserfs: using flush barriers Aug 31 00:44:00 Tower kernel: REISERFS (device md3): journal params: device md3, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 Aug 31 00:44:00 Tower kernel: REISERFS (device md3): checking transaction log (md3) Aug 31 00:44:00 Tower kernel: REISERFS (device md3): Using r5 hash to sort names Aug 31 00:44:01 Tower emhttp: shcmd (46): mkdir /mnt/disk4 Aug 31 00:44:01 Tower emhttp: shcmd (47): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md4 /mnt/disk4 |& logger Aug 31 00:44:01 Tower kernel: REISERFS (device md4): found reiserfs format "3.6" with standard journal Aug 31 00:44:01 Tower kernel: REISERFS (device md4): using ordered data mode Aug 31 00:44:01 Tower kernel: reiserfs: using flush barriers Aug 31 00:44:01 Tower kernel: REISERFS (device md4): journal params: device md4, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 Aug 31 00:44:01 Tower kernel: REISERFS (device md4): checking transaction log (md4) Aug 31 00:44:01 Tower kernel: REISERFS (device md4): Using r5 hash to sort names Aug 31 00:44:01 Tower emhttp: shcmd (48): mkdir /mnt/disk5 Aug 31 00:44:01 Tower emhttp: shcmd (49): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md5 /mnt/disk5 |& logger Aug 31 00:44:01 Tower kernel: REISERFS (device md5): found reiserfs format "3.6" with standard journal Aug 31 00:44:01 Tower kernel: REISERFS (device md5): using ordered data mode Aug 31 00:44:01 Tower kernel: reiserfs: using flush barriers Aug 31 00:44:01 Tower kernel: REISERFS (device md5): journal params: device md5, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 Aug 31 00:44:01 Tower kernel: REISERFS (device md5): checking transaction log (md5) Aug 31 00:44:01 Tower kernel: REISERFS (device md5): Using r5 hash to sort names Aug 31 
00:44:01 Tower emhttp: shcmd (50): mkdir /mnt/disk6 Aug 31 00:44:01 Tower emhttp: shcmd (51): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md6 /mnt/disk6 |& logger Aug 31 00:44:01 Tower kernel: REISERFS (device md6): found reiserfs format "3.6" with standard journal Aug 31 00:44:01 Tower kernel: REISERFS (device md6): using ordered data mode Aug 31 00:44:01 Tower kernel: reiserfs: using flush barriers Aug 31 00:44:01 Tower kernel: REISERFS (device md6): journal params: device md6, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 Aug 31 00:44:01 Tower kernel: REISERFS (device md6): checking transaction log (md6) Aug 31 00:44:01 Tower kernel: REISERFS (device md6): Using r5 hash to sort names Aug 31 00:44:01 Tower emhttp: shcmd (52): mkdir /mnt/disk7 Aug 31 00:44:01 Tower emhttp: shcmd (53): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md7 /mnt/disk7 |& logger Aug 31 00:44:01 Tower kernel: REISERFS (device md7): found reiserfs format "3.6" with standard journal Aug 31 00:44:01 Tower kernel: REISERFS (device md7): using ordered data mode Aug 31 00:44:01 Tower kernel: reiserfs: using flush barriers Aug 31 00:44:01 Tower kernel: REISERFS (device md7): journal params: device md7, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 Aug 31 00:44:01 Tower kernel: REISERFS (device md7): checking transaction log (md7) Aug 31 00:44:01 Tower kernel: REISERFS (device md7): Using r5 hash to sort names Aug 31 00:44:01 Tower emhttp: shcmd (54): mkdir /mnt/disk8 Aug 31 00:44:01 Tower emhttp: shcmd (55): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md8 /mnt/disk8 |& logger Aug 31 00:44:01 Tower kernel: REISERFS (device md8): found reiserfs format "3.6" with standard journal Aug 31 00:44:01 Tower kernel: REISERFS (device md8): using ordered data mode Aug 31 00:44:01 Tower 
kernel: reiserfs: using flush barriers Aug 31 00:44:01 Tower kernel: REISERFS (device md8): journal params: device md8, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 Aug 31 00:44:01 Tower kernel: REISERFS (device md8): checking transaction log (md8) Aug 31 00:44:01 Tower kernel: REISERFS (device md8): Using r5 hash to sort names Aug 31 00:44:01 Tower emhttp: shcmd (56): mkdir /mnt/disk9 Aug 31 00:44:01 Tower emhttp: shcmd (57): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md9 /mnt/disk9 |& logger Aug 31 00:44:01 Tower kernel: REISERFS (device md9): found reiserfs format "3.6" with standard journal Aug 31 00:44:01 Tower kernel: REISERFS (device md9): using ordered data mode Aug 31 00:44:01 Tower kernel: reiserfs: using flush barriers Aug 31 00:44:01 Tower kernel: REISERFS (device md9): journal params: device md9, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 Aug 31 00:44:01 Tower kernel: REISERFS (device md9): checking transaction log (md9) Aug 31 00:44:02 Tower kernel: REISERFS (device md9): Using r5 hash to sort names Aug 31 00:44:02 Tower emhttp: shcmd (58): mkdir /mnt/disk10 Aug 31 00:44:02 Tower emhttp: shcmd (59): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md10 /mnt/disk10 |& logger Aug 31 00:44:02 Tower kernel: REISERFS (device md10): found reiserfs format "3.6" with standard journal Aug 31 00:44:02 Tower kernel: REISERFS (device md10): using ordered data mode Aug 31 00:44:02 Tower kernel: reiserfs: using flush barriers Aug 31 00:44:02 Tower kernel: REISERFS (device md10): journal params: device md10, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 Aug 31 00:44:02 Tower kernel: REISERFS (device md10): checking transaction log (md10) Aug 31 00:44:02 Tower kernel: REISERFS (device md10): Using r5 hash to sort names Aug 
31 00:44:02 Tower emhttp: shcmd (60): mkdir /mnt/disk11 Aug 31 00:44:02 Tower emhttp: shcmd (61): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md11 /mnt/disk11 |& logger Aug 31 00:44:02 Tower kernel: REISERFS (device md11): found reiserfs format "3.6" with standard journal Aug 31 00:44:02 Tower kernel: REISERFS (device md11): using ordered data mode Aug 31 00:44:02 Tower kernel: reiserfs: using flush barriers Aug 31 00:44:02 Tower kernel: REISERFS (device md11): journal params: device md11, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 Aug 31 00:44:02 Tower kernel: REISERFS (device md11): checking transaction log (md11) Aug 31 00:44:02 Tower kernel: REISERFS (device md11): Using r5 hash to sort names Aug 31 00:44:02 Tower emhttp: shcmd (62): mkdir /mnt/user Aug 31 00:44:02 Tower emhttp: shcmd (63): /usr/local/sbin/shfs /mnt/user -disks 16777214 -o noatime,big_writes,allow_other -o remember=0 |& logger Aug 31 00:44:02 Tower emhttp: shcmd (64): crontab -c /etc/cron.d -d &> /dev/null Aug 31 00:44:02 Tower emhttp: shcmd (65): /usr/local/sbin/emhttp_event disks_mounted Aug 31 00:44:02 Tower emhttp_event: disks_mounted Aug 31 00:44:02 Tower kernel: mdcmd (53): check CORRECT Aug 31 00:44:02 Tower kernel: md: recovery thread woken up ... Aug 31 00:44:02 Tower kernel: md: recovery thread syncing parity disk ... Aug 31 00:44:02 Tower kernel: md: using 2048k window, over a total of 2930266532 blocks. Aug 31 00:44:03 Tower emhttp: shcmd (66): :>/etc/samba/smb-shares.conf Aug 31 00:44:03 Tower avahi-daemon[2494]: Files changed, reloading. Aug 31 00:44:04 Tower emhttp: Restart SMB... Aug 31 00:44:04 Tower emhttp: shcmd (67): killall -HUP smbd Aug 31 00:44:04 Tower emhttp: shcmd (68): cp /etc/avahi/services/smb.service- /etc/avahi/services/smb.service Aug 31 00:44:04 Tower avahi-daemon[2494]: Files changed, reloading. 
Aug 31 00:44:04 Tower avahi-daemon[2494]: Service group file /services/smb.service changed, reloading. Aug 31 00:44:04 Tower emhttp: shcmd (69): ps axc | grep -q rpc.mountd Aug 31 00:44:04 Tower emhttp: _shcmd: shcmd (69): exit status: 1 Aug 31 00:44:04 Tower emhttp: shcmd (70): /usr/local/sbin/emhttp_event svcs_restarted Aug 31 00:44:04 Tower emhttp_event: svcs_restarted Aug 31 00:44:04 Tower emhttp: shcmd (71): /usr/local/sbin/emhttp_event started Aug 31 00:44:04 Tower emhttp_event: started Aug 31 00:44:05 Tower avahi-daemon[2494]: Service "Tower" (/services/smb.service) successfully established.

But I still see those "configured for UDMA/133" messages and other link oddities (odd to me, anyway). The attached syslog shows those as well as the CPU stalls, etc. Any and all help is greatly appreciated. Please? syslog-2014-08-30.txt
grandprix Posted August 31, 2014

Ok, the parity rebuild seems to have completed without issue. But now I'm getting read errors on the parity drive while doing a no-correct check (to see whether parity is good after the rebuild). I can't seem to win with "ata3.00", no matter the controller, cabling or hard drive. The syslog attached to this reply runs from unRAID startup, through the parity rebuild, to the errors received during the no-correct check (which is not finished yet; I went to bed while the rebuild was still running and only just kicked off the check).

Aug 31 11:40:19 Tower kernel: md: nocheck_array: check not active Aug 31 11:40:32 Tower kernel: md: unRAID driver removed Aug 31 11:40:32 Tower kernel: md: unRAID driver 2.2.0 installed Aug 31 11:40:40 Tower emhttp: Start array... Aug 31 11:40:40 Tower kernel: mdcmd (52): start STOPPED Aug 31 11:40:41 Tower avahi-daemon[3315]: Service "Tower" (/services/smb.service) successfully established. Aug 31 11:40:58 Tower kernel: mdcmd (53): check NOCORRECT Aug 31 11:40:58 Tower kernel: md: recovery thread woken up ... Aug 31 11:40:58 Tower kernel: md: recovery thread checking parity... Aug 31 11:40:58 Tower kernel: md: using 2048k window, over a total of 2930266532 blocks.
Aug 31 11:42:32 Tower kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen Aug 31 11:42:32 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error Aug 31 11:42:32 Tower kernel: ata3: SError: { UnrecovData 10B8B BadCRC } Aug 31 11:42:32 Tower kernel: ata3.00: failed command: READ DMA Aug 31 11:42:32 Tower kernel: ata3.00: cmd c8/00:10:f0:f2:05/00:00:00:00:00/e1 tag 0 dma 8192 in Aug 31 11:42:32 Tower kernel: res 50/00:00:ef:f2:05/00:00:01:00:00/e1 Emask 0x10 (ATA bus error) Aug 31 11:42:32 Tower kernel: ata3.00: status: { DRDY } Aug 31 11:42:32 Tower kernel: ata3: hard resetting link Aug 31 11:42:32 Tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Aug 31 11:42:32 Tower kernel: ata3.00: configured for UDMA/133 Aug 31 11:42:32 Tower kernel: ata3: EH complete Aug 31 11:48:38 Tower kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen Aug 31 11:48:38 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error Aug 31 11:48:38 Tower kernel: ata3: SError: { UnrecovData 10B8B BadCRC } Aug 31 11:48:38 Tower kernel: ata3.00: failed command: READ DMA Aug 31 11:48:38 Tower kernel: ata3.00: cmd c8/00:10:70:39:f4/00:00:00:00:00/e4 tag 0 dma 8192 in Aug 31 11:48:38 Tower kernel: res 50/00:00:6f:39:f4/00:00:04:00:00/e4 Emask 0x10 (ATA bus error) Aug 31 11:48:38 Tower kernel: ata3.00: status: { DRDY } Aug 31 11:48:38 Tower kernel: ata3: hard resetting link Aug 31 11:48:38 Tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Aug 31 11:48:38 Tower kernel: ata3.00: configured for UDMA/133 Aug 31 11:48:38 Tower kernel: ata3: EH complete Aug 31 11:51:23 Tower kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen Aug 31 11:51:23 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error Aug 31 11:51:23 Tower kernel: ata3: SError: { UnrecovData 10B8B BadCRC } Aug 31 11:51:23 Tower kernel: ata3.00: failed command: READ DMA Aug 31 11:51:23 Tower kernel: 
ata3.00: cmd c8/00:18:b0:d4:aa/00:00:00:00:00/e6 tag 0 dma 12288 in Aug 31 11:51:23 Tower kernel: res 50/00:00:af:d4:aa/00:00:06:00:00/e6 Emask 0x10 (ATA bus error) Aug 31 11:51:23 Tower kernel: ata3.00: status: { DRDY } Aug 31 11:51:23 Tower kernel: ata3: hard resetting link Aug 31 11:51:23 Tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Aug 31 11:51:23 Tower kernel: ata3.00: configured for UDMA/133 Aug 31 11:51:23 Tower kernel: ata3: EH complete Aug 31 11:52:14 Tower kernel: ata3: limiting SATA link speed to 3.0 Gbps Aug 31 11:52:14 Tower kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen Aug 31 11:52:14 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error Aug 31 11:52:14 Tower kernel: ata3: SError: { UnrecovData 10B8B BadCRC } Aug 31 11:52:14 Tower kernel: ata3.00: failed command: READ DMA Aug 31 11:52:14 Tower kernel: ata3.00: cmd c8/00:10:f0:f5:32/00:00:00:00:00/e7 tag 0 dma 8192 in Aug 31 11:52:14 Tower kernel: res 50/00:00:ef:f5:32/00:00:07:00:00/e7 Emask 0x10 (ATA bus error) Aug 31 11:52:14 Tower kernel: ata3.00: status: { DRDY } Aug 31 11:52:14 Tower kernel: ata3: hard resetting link Aug 31 11:52:15 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320) Aug 31 11:52:15 Tower kernel: ata3.00: configured for UDMA/133 Aug 31 11:52:15 Tower kernel: ata3: EH complete http://pastebin.com/SpdwtR6D Link to comment
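For anyone reading the link messages above: the SStatus and SControl values that the kernel prints are hexadecimal SATA registers, and the drop from 6.0 to 3.0 Gbps shows up in them. Here is a minimal Python sketch of the decoding, assuming the standard SATA field layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). The function name `decode_sata_reg` and the `SPEEDS` table are made up for illustration and are not part of any tool.

```python
# Speed codes used in the SPD field of SStatus (negotiated) and SControl (limit).
SPEEDS = {0: "no limit / none", 1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}

def decode_sata_reg(hex_text):
    """Split an SStatus/SControl value (hex string from syslog) into its fields."""
    value = int(hex_text, 16)
    return {
        "det": value & 0xF,         # device detection
        "spd": (value >> 4) & 0xF,  # negotiated speed (SStatus) or speed limit (SControl)
        "ipm": (value >> 8) & 0xF,  # interface power management
    }

# Values taken from the log lines in this thread.
for name, text in [("SStatus", "133"), ("SStatus", "123"), ("SControl", "320")]:
    fields = decode_sata_reg(text)
    print(name, text, "-> SPD", fields["spd"], SPEEDS.get(fields["spd"], "unknown"))
```

Read this way, SStatus 133 is a 6.0 Gbps link, SStatus 123 is the same link after the kernel fell back to 3.0 Gbps, and SControl 320 carries the matching speed limit.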
grandprix Posted August 31, 2014 Author Share Posted August 31, 2014 Here is the SMART report for that drive (the new parity drive): smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build) Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Device Model: HGST HDN724030ALE640 Serial Number: PK1234P8JE9ABX LU WWN Device Id: 5 000cca 22ce23adf Firmware Version: MJ8OA5E0 User Capacity: 3,000,592,982,016 bytes [3.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 7200 rpm Device is: Not in smartctl database [for details use: -P showall] ATA Version is: ATA8-ACS T13/1699-D revision 4 SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s) Local Time is: Sun Aug 31 12:11:01 2014 EDT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 24) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 455) minutes. SCT capabilities: (0x003d) SCT Status supported. 
SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0 2 Throughput_Performance 0x0005 136 136 054 Pre-fail Offline - 82 3 Spin_Up_Time 0x0007 124 124 024 Pre-fail Always - 502 (Average 502) 4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 17 5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0 7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0 8 Seek_Time_Performance 0x0005 121 121 020 Pre-fail Offline - 34 9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 3370 10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 16 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 17 193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 17 194 Temperature_Celsius 0x0002 187 187 000 Old_age Always - 32 (Min/Max 24/37) 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0 197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 2 SMART Error Log Version: 1 ATA Error Count: 2 CR = Command Register [HEX] FR = Features Register [HEX] SC = Sector Count Register [HEX] SN = Sector Number Register [HEX] CL = Cylinder Low Register [HEX] CH = Cylinder High Register [HEX] DH = Device/Head Register [HEX] DC = Device Command Register [HEX] ER = Error register [HEX] ST = Status register [HEX] Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. 
Error 2 occurred at disk power-on lifetime: 3370 hours (140 days + 10 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 84 51 01 ff f5 32 07 Error: ICRC, ABRT 1 sectors at LBA = 0x0732f5ff = 120780287 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 00 ff f5 32 e7 ff 11:35:32.891 READ DMA c8 00 10 f0 f5 32 e7 00 11:35:32.874 READ DMA c8 00 10 e0 f5 32 e7 00 11:35:32.874 READ DMA c8 00 08 d8 f5 32 e7 00 11:35:32.874 READ DMA c8 00 10 c8 f5 32 e7 00 11:35:32.874 READ DMA Error 1 occurred at disk power-on lifetime: 3370 hours (140 days + 10 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 84 51 01 ff f2 05 01 Error: ICRC, ABRT 1 sectors at LBA = 0x0105f2ff = 17167103 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 10 f0 f2 05 e1 00 11:25:52.941 READ DMA c8 00 10 e0 f2 05 e1 00 11:25:52.941 READ DMA c8 00 18 c8 f2 05 e1 00 11:25:52.941 READ DMA c8 00 10 b8 f2 05 e1 00 11:25:52.941 READ DMA c8 00 08 b0 f2 05 e1 00 11:25:52.941 READ DMA SMART Self-test log structure revision number 1 No self-tests have been logged. [To run self-tests, use: smartctl -t] SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. Link to comment
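As a cross-check on the two SMART error entries above, the LBA that smartctl reports can be rebuilt by hand from the register dump. The sketch below assumes 28-bit LBA addressing, which is what the READ DMA (c8) command uses: SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low nibble of DH bits 24-27. The helper name `lba28` is made up for illustration.

```python
def lba28(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

# Register values copied from the "After command completion" rows above.
error2 = lba28(0xFF, 0xF5, 0x32, 0x07)
error1 = lba28(0xFF, 0xF2, 0x05, 0x01)

print(hex(error2), error2)  # 0x732f5ff 120780287
print(hex(error1), error1)  # 0x105f2ff 17167103
```

Both values match the "LBA =" figures in the report, and they also line up with the failed READ DMA commands in the syslog (for example `cmd c8/00:10:f0:f5:32/.../e7`, a 16-sector read starting at 0x0732f5f0 whose last sector is 0x0732f5ff).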
grandprix Posted August 31, 2014

The parity drive itself is on a 6Gbps port on an X10SLM-F-O motherboard. The drive is one of these: http://www.newegg.com/Product/Product.aspx?Item=N82E16822145911

I'm using this to connect the mobo SATA port to the drive: http://www.newegg.com/Product/Product.aspx?Item=N82E16816133033
grandprix Posted August 31, 2014 Author Share Posted August 31, 2014 And here we go with the CPU stalls again (again, during a no-correct): Aug 31 12:30:04 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 0} (t=6000 jiffies g=8426 c=8425 q=163) Aug 31 12:30:04 Tower kernel: Pid: 3399, comm: unraidd Not tainted 3.9.11p-unRAID #5 Aug 31 12:30:04 Tower kernel: Call Trace: Aug 31 12:30:04 Tower kernel: [] print_cpu_stall+0xbc/0x107 Aug 31 12:30:04 Tower kernel: [] __rcu_pending+0x4f/0x12a Aug 31 12:30:04 Tower kernel: [] rcu_check_callbacks+0x73/0x9b Aug 31 12:30:04 Tower kernel: [] update_process_times+0x2d/0x53 Aug 31 12:30:04 Tower kernel: [] tick_sched_timer+0x77/0xa1 Aug 31 12:30:04 Tower kernel: [] ? __remove_hrtimer+0x25/0x7a Aug 31 12:30:04 Tower kernel: [] __run_hrtimer+0x45/0xaf Aug 31 12:30:04 Tower kernel: [] hrtimer_interrupt+0xf1/0x1e7 Aug 31 12:30:04 Tower kernel: [] ? _scsih_build_scatter_gather+0x238/0x25b [mpt2sas] Aug 31 12:30:04 Tower kernel: [] smp_apic_timer_interrupt+0x6d/0x7f Aug 31 12:30:04 Tower kernel: [] apic_timer_interrupt+0x2d/0x34 Aug 31 12:30:04 Tower kernel: [] ? __slab_free+0x101/0x2a6 Aug 31 12:30:04 Tower kernel: [] kmem_cache_free+0xaf/0xb7 Aug 31 12:30:04 Tower kernel: [] ? scsi_pool_free_command+0x25/0x32 Aug 31 12:30:04 Tower kernel: [] ? scsi_pool_free_command+0x25/0x32 Aug 31 12:30:04 Tower kernel: [] scsi_pool_free_command+0x25/0x32 Aug 31 12:30:04 Tower kernel: [] __scsi_put_command+0x4c/0x5a Aug 31 12:30:04 Tower kernel: [] scsi_put_command+0x4b/0x50 Aug 31 12:30:04 Tower kernel: [] scsi_next_command+0x21/0x34 Aug 31 12:30:04 Tower kernel: [] scsi_end_request+0x66/0x70 Aug 31 12:30:04 Tower kernel: [] scsi_io_completion+0x1b0/0x421 Aug 31 12:30:04 Tower kernel: [] ? 
scsi_device_unbusy+0x7c/0x82 Aug 31 12:30:04 Tower kernel: [] scsi_finish_command+0x91/0x97 Aug 31 12:30:04 Tower kernel: [] scsi_softirq_done+0xc5/0xcd Aug 31 12:30:04 Tower kernel: [] blk_done_softirq+0x4a/0x57 Aug 31 12:30:04 Tower kernel: [] __do_softirq+0x94/0x151 Aug 31 12:30:04 Tower kernel: [] ? ttwu_do_wakeup+0xf/0xaa Aug 31 12:30:04 Tower kernel: [] irq_exit+0x33/0x6c Aug 31 12:30:04 Tower kernel: [] do_IRQ+0x87/0x9b Aug 31 12:30:04 Tower kernel: [] ? xor_blocks+0x5b/0x7c Aug 31 12:30:04 Tower kernel: [] common_interrupt+0x2c/0x31 Aug 31 12:30:04 Tower kernel: [] ? handle_stripe+0xb37/0xceb [md_mod] Aug 31 12:30:04 Tower kernel: [] ? __wake_up+0x3b/0x42 Aug 31 12:30:04 Tower kernel: [] unraidd+0x71/0xb5 [md_mod] Aug 31 12:30:04 Tower kernel: [] md_thread+0xd3/0xea [md_mod] Aug 31 12:30:04 Tower kernel: [] ? wake_up_bit+0x5b/0x5b Aug 31 12:30:04 Tower kernel: [] kthread+0x90/0x95 Aug 31 12:30:04 Tower kernel: [] ? import_device+0x166/0x166 [md_mod] Aug 31 12:30:04 Tower kernel: [] ret_from_kernel_thread+0x1b/0x28 Aug 31 12:30:04 Tower kernel: [] ? kthread_freezable_should_stop+0x4a/0x4a Aug 31 12:36:23 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 0} (t=6000 jiffies g=8460 c=8459 q=119) Aug 31 12:36:23 Tower kernel: Pid: 3399, comm: unraidd Not tainted 3.9.11p-unRAID #5 Aug 31 12:36:23 Tower kernel: Call Trace: Aug 31 12:36:23 Tower kernel: [] print_cpu_stall+0xbc/0x107 Aug 31 12:36:23 Tower kernel: [] __rcu_pending+0x4f/0x12a Aug 31 12:36:23 Tower kernel: [] rcu_check_callbacks+0x73/0x9b Aug 31 12:36:23 Tower kernel: [] update_process_times+0x2d/0x53 Aug 31 12:36:23 Tower kernel: [] tick_sched_timer+0x77/0xa1 Aug 31 12:36:23 Tower kernel: [] ? 
__remove_hrtimer+0x25/0x7a Aug 31 12:36:23 Tower kernel: [] __run_hrtimer+0x45/0xaf Aug 31 12:36:23 Tower kernel: [] hrtimer_interrupt+0xf1/0x1e7 Aug 31 12:36:23 Tower kernel: [] smp_apic_timer_interrupt+0x6d/0x7f Aug 31 12:36:23 Tower kernel: [] apic_timer_interrupt+0x2d/0x34 Aug 31 12:36:23 Tower kernel: [] ? xor_avx_5+0x148/0x34c Aug 31 12:36:23 Tower kernel: [] xor_blocks+0x74/0x7c Aug 31 12:36:23 Tower kernel: [] check_parity+0x96/0xcc [md_mod] Aug 31 12:36:23 Tower kernel: [] handle_stripe+0xa29/0xceb [md_mod] Aug 31 12:36:23 Tower kernel: [] ? __wake_up+0x3b/0x42 Aug 31 12:36:23 Tower kernel: [] unraidd+0x71/0xb5 [md_mod] Aug 31 12:36:23 Tower kernel: [] md_thread+0xd3/0xea [md_mod] Aug 31 12:36:23 Tower kernel: [] ? wake_up_bit+0x5b/0x5b Aug 31 12:36:23 Tower kernel: [] kthread+0x90/0x95 Aug 31 12:36:23 Tower kernel: [] ? import_device+0x166/0x166 [md_mod] Aug 31 12:36:23 Tower kernel: [] ret_from_kernel_thread+0x1b/0x28 Aug 31 12:36:23 Tower kernel: [] ? kthread_freezable_should_stop+0x4a/0x4a Link to comment
grandprix Posted August 31, 2014

Because two stalls just weren't enough:

Aug 31 12:47:45 Tower kernel: INFO: rcu_sched self-detected stall on CPU { 0} (t=6000 jiffies g=8566 c=8565 q=160)
Aug 31 12:47:45 Tower kernel: Pid: 3399, comm: unraidd Not tainted 3.9.11p-unRAID #5
Aug 31 12:47:45 Tower kernel: Call Trace:
Aug 31 12:47:45 Tower kernel: [] print_cpu_stall+0xbc/0x107
Aug 31 12:47:45 Tower kernel: [] __rcu_pending+0x4f/0x12a
Aug 31 12:47:45 Tower kernel: [] rcu_check_callbacks+0x73/0x9b
Aug 31 12:47:45 Tower kernel: [] update_process_times+0x2d/0x53
Aug 31 12:47:45 Tower kernel: [] tick_sched_timer+0x77/0xa1
Aug 31 12:47:45 Tower kernel: [] ? __remove_hrtimer+0x25/0x7a
Aug 31 12:47:45 Tower kernel: [] __run_hrtimer+0x45/0xaf
Aug 31 12:47:45 Tower kernel: [] hrtimer_interrupt+0xf1/0x1e7
Aug 31 12:47:45 Tower kernel: [] smp_apic_timer_interrupt+0x6d/0x7f
Aug 31 12:47:45 Tower kernel: [] ? xor_blocks+0x5b/0x7c
Aug 31 12:47:45 Tower kernel: [] apic_timer_interrupt+0x2d/0x34
Aug 31 12:47:45 Tower kernel: [] ? memcmp+0x17/0x25
Aug 31 12:47:45 Tower kernel: [] handle_stripe+0xa4d/0xceb [md_mod]
Aug 31 12:47:45 Tower kernel: [] ? __wake_up+0x3b/0x42
Aug 31 12:47:45 Tower kernel: [] unraidd+0x71/0xb5 [md_mod]
Aug 31 12:47:45 Tower kernel: [] md_thread+0xd3/0xea [md_mod]
Aug 31 12:47:45 Tower kernel: [] ? wake_up_bit+0x5b/0x5b
Aug 31 12:47:45 Tower kernel: [] kthread+0x90/0x95
Aug 31 12:47:45 Tower kernel: [] ? import_device+0x166/0x166 [md_mod]
Aug 31 12:47:45 Tower kernel: [] ret_from_kernel_thread+0x1b/0x28
Aug 31 12:47:45 Tower kernel: [] ? kthread_freezable_should_stop+0x4a/0x4a
RobJ Posted September 1, 2014

With all the posts in this thread, I expected to see various others helping you, but it just looks like you've been talking to yourself! I'm not an authority on that aspect, but I do believe the stalls are harmless. There is nothing wrong with "configured for UDMA/133" and the other link oddities; they are quite normal. The one disk problem I see is the exception handler stoppages, with the "UnrecovData 10B8B BadCRC" SATA error flags. The key one there is the BadCRC flag. That almost always means a bad SATA cable (a quick and cheap fix!), and replacing it with a good SATA cable almost always fixes the problem. The SMART report has the corresponding UDMA_CRC_Error_Count to match. There is no problem with the drive itself, and none of the drive-related issues have anything to do with the stalls.
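To see how often the BadCRC flag turns up, and on which port, a syslog can be scanned along the lines below. This is only a minimal sketch: the function name `count_badcrc` and the regular expression are assumptions made for illustration, based on the SError lines quoted in this thread, and are not part of unRAID or libata.

```python
import re
from collections import Counter

# Matches libata SError lines such as:
#   "Aug 31 11:42:32 Tower kernel: ata3: SError: { UnrecovData 10B8B BadCRC }"
# The pattern is an assumption derived from the log lines in this thread.
SERROR_RE = re.compile(r"kernel: (ata\d+): SError: \{([^}]*)\}")


def count_badcrc(lines):
    """Return a Counter mapping ATA port name -> number of SError lines flagged BadCRC."""
    counts = Counter()
    for line in lines:
        match = SERROR_RE.search(line)
        if match and "BadCRC" in match.group(2).split():
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Aug 31 11:42:32 Tower kernel: ata3: SError: { UnrecovData 10B8B BadCRC }",
        "Aug 31 11:42:32 Tower kernel: ata3: hard resetting link",
        "Aug 31 11:48:38 Tower kernel: ata3: SError: { UnrecovData 10B8B BadCRC }",
    ]
    for port, hits in count_badcrc(sample).items():
        print(f"{port}: {hits} BadCRC event(s)")
```

If every hit lands on a single port, as with ata3 here, that points at one cable or connector rather than at the controller as a whole.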
grandprix Posted September 1, 2014

Rob, thank you for the reply. As for being the only one to have posted in this thread, it's OK; I just pretended I was talking to my wife, she rarely listens either. On a serious note, and back to topic: the no-correct finished just a few minutes ago, with no sync errors. I've got plenty of breakout cables to choose from; the ones in the machine now are brand new (not to say they can't be bad). I'll keep an eye on it. Though I have a theory (perhaps not a theory, since that denotes some form of knowledge of the subject; a "guess", then?) as to why I may have gotten that error:
Aug 31 11:42:32 Tower kernel: ata3: SError: { UnrecovData 10B8B BadCRC }
Aug 31 11:42:32 Tower kernel: ata3.00: failed command: READ DMA
Aug 31 11:42:32 Tower kernel: ata3.00: cmd c8/00:10:f0:f2:05/00:00:00:00:00/e1 tag 0 dma 8192 in
Aug 31 11:42:32 Tower kernel: res 50/00:00:ef:f2:05/00:00:01:00:00/e1 Emask 0x10 (ATA bus error)
Aug 31 11:42:32 Tower kernel: ata3.00: status: { DRDY }
Aug 31 11:42:32 Tower kernel: ata3: hard resetting link
Aug 31 11:42:32 Tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 31 11:42:32 Tower kernel: ata3.00: configured for UDMA/133
Aug 31 11:42:32 Tower kernel: ata3: EH complete
Aug 31 11:48:38 Tower kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
Aug 31 11:48:38 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error
Aug 31 11:48:38 Tower kernel: ata3: SError: { UnrecovData 10B8B BadCRC }
Aug 31 11:48:38 Tower kernel: ata3.00: failed command: READ DMA
Aug 31 11:48:38 Tower kernel: ata3.00: cmd c8/00:10:70:39:f4/00:00:00:00:00/e4 tag 0 dma 8192 in
Aug 31 11:48:38 Tower kernel: res 50/00:00:6f:39:f4/00:00:04:00:00/e4 Emask 0x10 (ATA bus error)
Aug 31 11:48:38 Tower kernel: ata3.00: status: { DRDY }
Aug 31 11:48:38 Tower kernel: ata3: hard resetting link
Aug 31 11:48:38 Tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 31 11:48:38 Tower kernel: ata3.00: configured for UDMA/133
Aug 31 11:48:38 Tower kernel: ata3: EH complete
Aug 31 11:51:23 Tower kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
Aug 31 11:51:23 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error
Aug 31 11:51:23 Tower kernel: ata3: SError: { UnrecovData 10B8B BadCRC }
Aug 31 11:51:23 Tower kernel: ata3.00: failed command: READ DMA
Aug 31 11:51:23 Tower kernel: ata3.00: cmd c8/00:18:b0:d4:aa/00:00:00:00:00/e6 tag 0 dma 12288 in
Aug 31 11:51:23 Tower kernel: res 50/00:00:af:d4:aa/00:00:06:00:00/e6 Emask 0x10 (ATA bus error)
Aug 31 11:51:23 Tower kernel: ata3.00: status: { DRDY }
Aug 31 11:51:23 Tower kernel: ata3: hard resetting link
Aug 31 11:51:23 Tower kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 31 11:51:23 Tower kernel: ata3.00: configured for UDMA/133
Aug 31 11:51:23 Tower kernel: ata3: EH complete
Aug 31 11:52:14 Tower kernel: ata3: limiting SATA link speed to 3.0 Gbps
Aug 31 11:52:14 Tower kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x280100 action 0x6 frozen
Aug 31 11:52:14 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error
Aug 31 11:52:14 Tower kernel: ata3: SError: { UnrecovData 10B8B BadCRC }
Aug 31 11:52:14 Tower kernel: ata3.00: failed command: READ DMA
Aug 31 11:52:14 Tower kernel: ata3.00: cmd c8/00:10:f0:f5:32/00:00:00:00:00/e7 tag 0 dma 8192 in
Aug 31 11:52:14 Tower kernel: res 50/00:00:ef:f5:32/00:00:07:00:00/e7 Emask 0x10 (ATA bus error)
Aug 31 11:52:14 Tower kernel: ata3.00: status: { DRDY }
Aug 31 11:52:14 Tower kernel: ata3: hard resetting link
Aug 31 11:52:15 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Aug 31 11:52:15 Tower kernel: ata3.00: configured for UDMA/133

After the no-correct finished, I looked to see if I had done what I suspected. In short, I had plugged the breakout-cable lead for the parity drive (P3) into a 3Gbps SATA port on the mobo, forgetting that Norco reverses the numbering of their ports (or what I would consider reversed: port 1 is on the right of the plane when looking at it from the front, port 4 is on the left). The SATA 3.0 drive is now on a SATA 3.0 port. Not sure it will make a bit of difference, but do you think (since it was shortly after the drive error) that the CPU stall was caused by the drive error as well? If not, what do you believe caused the CPU stall? I Googled and was conquered. I haven't a clue what I'm reading; most of the posts I found simply refer to the "module"(?) generating the stall report, which in and of itself is Greek to me.
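For reading the register values in the log above, here is a rough decoding sketch. It assumes the standard SATA register layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11 for SStatus and SControl) and that libata prints these values in hex; the three SError bit names are the ones libata prints in the log. The helper names `split_fields`, `link_speed` and `decode_serr` are made up for this example.

```python
# SPD field values as link generations (assumed from the SATA register layout).
GENERATION_SPEED = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}

# SError bits behind the three flags seen in this thread's log.
SERR_FLAGS = {8: "UnrecovData", 19: "10B8B", 21: "BadCRC"}


def split_fields(hex_text):
    """Split an SStatus/SControl value (printed in hex) into (det, spd, ipm)."""
    value = int(hex_text, 16)
    return value & 0xF, (value >> 4) & 0xF, (value >> 8) & 0xF


def link_speed(hex_text):
    """Describe the SPD field; in SControl a value of 0 means no speed limit is set."""
    _, spd, _ = split_fields(hex_text)
    return GENERATION_SPEED.get(spd, "none/unrestricted")


def decode_serr(value):
    """List the names of the bits set in an SErr value, lowest bit first."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(SERR_FLAGS.get(bit, f"bit{bit}"))
    return names


if __name__ == "__main__":
    print("SStatus 133 ->", link_speed("133"))   # negotiated speed before the downshift
    print("SStatus 123 ->", link_speed("123"))   # negotiated speed after the downshift
    print("SControl 320 ->", link_speed("320"))  # speed limit imposed by the kernel
    print("SErr 0x280100 ->", decode_serr(0x280100))
```

Read this way, the last two log lines show the kernel capping the link at 3.0 Gbps after the repeated errors (SControl going from 300 to 320), which matches the "limiting SATA link speed to 3.0 Gbps" message.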
RobJ Posted September 1, 2014

grandprix said: "I've got plenty of breakout cables to choose from, the ones in the machine now are brand new (not to say they can't be bad)."

I'd try replacing one cable at a time, until the BadCRC error no longer occurs. In this case, an older but tested one is better. A CRC error is relatively minor, as the transfer is retried until the packet arrives intact, with only a minor delay. I don't believe speed or port choice matters at all, unless one is faulty, and I don't think that is the issue here.

grandprix said: "do you think (since it was shortly after the drive error) that the CPU Stall was caused by the drive error as well? Or what do you believe caused the CPU Stall?"

I really don't see a connection between the disk error and the CPU stall. I don't know anything about CPU stalls. I vaguely remember trying to research them and not finding anything useful from a troubleshooting standpoint. Perhaps someone else can help, but until then, I would ignore them.
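One way to judge each cable swap is to compare the drive's UDMA_CRC_Error_Count before and after. The counter is cumulative, so the question is whether it keeps climbing, not whether it returns to zero. The sketch below parses attribute-table text as printed by `smartctl -A`; the sample rows are hypothetical and not taken from the drive in this thread, and the helper names are assumptions for illustration.

```python
def crc_error_count(smartctl_output):
    """Return the raw UDMA_CRC_Error_Count from `smartctl -A` text, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


def cable_still_suspect(before_text, after_text):
    """True if the counter rose between two readings, None if either reading is missing."""
    before = crc_error_count(before_text)
    after = crc_error_count(after_text)
    if before is None or after is None:
        return None
    return after > before


# Hypothetical readings taken before and after a cable swap plus a parity check.
BEFORE = "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       4"
AFTER = "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       4"

if __name__ == "__main__":
    print("counter before:", crc_error_count(BEFORE))
    print("still climbing:", cable_still_suspect(BEFORE, AFTER))
```

In practice the two texts would come from running `smartctl -A` against the drive's device node before the swap and again after a full parity check.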