September 5, 201411 yr Several of us have reported 'random' or in my instance rsync copy associated hard crashes. Today I learned how to capture an active syslog with tail -f syslog, so I did that this evening. 6b8 crashed during the day while idle. I turned it on tonight, with parity check running, and set up an rsync from a reiserfs array volume to a xfs array volume. Then I took the honey out to dinner. Less than three hours later, unRaid 6b8 reliably crashed. Here is the syslog captured by ssh: root@Tower:~# tail -f /var/log/syslog Sep 4 18:15:01 Tower cache_dirs: max_seconds=10, min_seconds=1 Sep 4 18:15:01 Tower cache_dirs: max_depth=6 Sep 4 18:15:01 Tower cache_dirs: command=find -noleaf Sep 4 18:15:01 Tower cache_dirs: version=1.6.9 Sep 4 18:15:01 Tower cache_dirs: ---------- caching directories --------------- Sep 4 18:15:01 Tower cache_dirs: Sep 4 18:15:01 Tower cache_dirs: ---------------------------------------------- Sep 4 18:15:01 Tower cache_dirs: ERROR: included directory "/mnt" does not exist. Sep 4 18:15:01 Tower cache_dirs: cache_dirs process ID 4297 started, To terminate it, type: cache_dirs -q Sep 4 18:15:34 Tower sshd[4332]: Accepted publickey for root from 192.168.1.1 port 64641 ssh2: RSA 36:1e:36:55:3c:a6:e9:c4:03:0c:7a:78:7f:a9:3f:57 Sep 4 18:15:53 Tower kernel: REISERFS (device md2): replayed 25 transactions in 98 seconds Sep 4 18:15:53 Tower kernel: REISERFS (device md2): Using r5 hash to sort names Sep 4 18:15:53 Tower emhttp: shcmd (32): set -o pipefail ; mount -t reiserfs -o remount,user_xattr,acl,noatime,nodiratime /dev/md2 /mnt/disk2 |& logger Sep 4 18:15:53 Tower emhttp: shcmd (33): mkdir -p /mnt/disk3 Sep 4 18:15:53 Tower emhttp: shcmd (34): set -o pipefail ; mount -t reiserfs -o noatime,nodiratime /dev/md3 /mnt/disk3 |& logger Sep 4 18:15:53 Tower kernel: REISERFS (device md3): found reiserfs format "3.6" with standard journal Sep 4 18:15:53 Tower kernel: REISERFS (device md3): using ordered data mode Sep 4 18:15:53 Tower kernel: reiserfs: using flush barriers Sep 4 18:15:53 Tower kernel: REISERFS (device md3): journal params: device md3, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 Sep 4 18:15:53 Tower kernel: REISERFS (device md3): checking transaction log (md3) Sep 4 18:15:53 Tower kernel: REISERFS (device md3): replayed 2 transactions in 0 seconds Sep 4 18:15:53 Tower kernel: REISERFS (device md3): Using r5 hash to sort names Sep 4 18:15:53 Tower emhttp: shcmd (35): set -o pipefail ; mount -t reiserfs -o remount,user_xattr,acl,noatime,nodiratime /dev/md3 /mnt/disk3 |& logger Sep 4 18:15:53 Tower emhttp: shcmd (36): mkdir -p /mnt/disk4 Sep 4 18:15:53 Tower emhttp: shcmd (37): set -o pipefail ; mount -t xfs -o noatime,nodiratime /dev/md4 /mnt/disk4 |& logger Sep 4 18:15:53 Tower kernel: XFS (md4): Mounting V4 Filesystem Sep 4 18:15:54 Tower kernel: XFS (md4): Starting recovery (logdev: internal) Sep 4 18:15:54 Tower kernel: XFS (md4): Ending recovery (logdev: internal) Sep 4 18:15:54 Tower emhttp: shcmd (38): mkdir -p /mnt/disk5 Sep 4 18:15:54 Tower emhttp: shcmd (39): set -o pipefail ; mount -t reiserfs -o noatime,nodiratime /dev/md5 /mnt/disk5 |& logger Sep 4 18:15:54 Tower kernel: REISERFS (device md5): found reiserfs format "3.6" with standard journal Sep 4 18:15:54 Tower kernel: REISERFS (device md5): using ordered data mode Sep 4 18:15:54 Tower kernel: reiserfs: using flush barriers Sep 4 18:15:54 Tower kernel: REISERFS (device md5): journal params: device md5, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 Sep 4 18:15:54 Tower kernel: REISERFS (device md5): checking transaction log (md5) Sep 4 18:15:54 Tower kernel: REISERFS (device md5): replayed 2 transactions in 0 seconds Sep 4 18:15:54 Tower kernel: REISERFS (device md5): Using r5 hash to sort names Sep 4 18:15:54 Tower emhttp: shcmd (40): set -o pipefail ; mount -t reiserfs -o remount,user_xattr,acl,noatime,nodiratime /dev/md5 /mnt/disk5 |& logger Sep 4 18:15:54 Tower emhttp: shcmd (41): mkdir -p /mnt/disk6 Sep 4 18:15:54 Tower emhttp: shcmd (42): set -o pipefail ; mount -t xfs -o noatime,nodiratime /dev/md6 /mnt/disk6 |& logger Sep 4 18:15:54 Tower kernel: XFS (md6): Mounting V4 Filesystem Sep 4 18:15:55 Tower kernel: XFS (md6): Starting recovery (logdev: internal) Sep 4 18:15:55 Tower emhttp: shcmd (43): mkdir -p /mnt/cache Sep 4 18:15:55 Tower kernel: XFS (md6): Ending recovery (logdev: internal) Sep 4 18:15:57 Tower emhttp: shcmd (44): set -o pipefail ; mount -t btrfs -o noatime,nodiratime /dev/sdi1 /mnt/cache |& logger Sep 4 18:15:57 Tower kernel: BTRFS info (device sdi1): disk space caching is enabled Sep 4 18:15:57 Tower kernel: BTRFS: detected SSD devices, enabling SSD mode Sep 4 18:15:58 Tower emhttp: shcmd (45): mkdir /mnt/user0 Sep 4 18:15:58 Tower emhttp: shcmd (46): /usr/local/sbin/shfs /mnt/user0 -disks 16777214 -o noatime,big_writes,allow_other |& logger Sep 4 18:15:58 Tower emhttp: shcmd (47): mkdir /mnt/user Sep 4 18:15:58 Tower emhttp: shcmd (48): /usr/local/sbin/shfs /mnt/user -disks 16777215 1024 -o noatime,big_writes,allow_other -o remember=330 |& logger Sep 4 18:15:58 Tower emhttp: shcmd (49): crontab -c /etc/cron.d - <<< "# Generated mover schedule: 30 2 * * 5 /usr/local/sbin/mover |& logger" Sep 4 18:15:58 Tower emhttp: shcmd (50): /usr/local/sbin/emhttp_event disks_mounted Sep 4 18:15:58 Tower emhttp_event: disks_mounted Sep 4 18:15:58 Tower logger: Starting docker.io Sep 4 18:15:58 Tower logger: Maybe expand image file Sep 4 18:15:58 Tower kernel: BTRFS: device fsid 0ab81931-36e8-49c4-924b-6ca691a81a8c devid 1 transid 1661 /dev/loop8 Sep 4 18:15:58 Tower kernel: BTRFS info (device loop8): disk space caching is enabled Sep 4 18:15:58 Tower logger: Resize '/var/lib/docker' of 'max' Sep 4 18:15:58 Tower logger: starting docker ... Sep 4 18:15:58 Tower kernel: BTRFS: new size for /dev/loop8 is 8589934592 Sep 4 18:15:58 Tower kernel: ip_tables: (C) 2000-2006 Netfilter Core Team Sep 4 18:15:58 Tower avahi-daemon[3955]: Joining mDNS multicast group on interface docker0.IPv4 with address 172.17.42.1. Sep 4 18:15:58 Tower avahi-daemon[3955]: New relevant interface docker0.IPv4 for mDNS. Sep 4 18:15:58 Tower avahi-daemon[3955]: Registering new address record for 172.17.42.1 on docker0.IPv4. Sep 4 18:16:00 Tower ntpd[1357]: Listen normally on 3 docker0 172.17.42.1 UDP 123 Sep 4 18:16:00 Tower ntpd[1357]: peers refreshed Sep 4 18:16:00 Tower ntpd[1357]: new interface(s) found: waking up resolver Sep 4 18:16:00 Tower kernel: device veth5b16 entered promiscuous mode Sep 4 18:16:00 Tower avahi-daemon[3955]: Withdrawing workstation service for veth75fb. Sep 4 18:16:00 Tower kernel: docker0: port 1(veth5b16) entered forwarding state Sep 4 18:16:00 Tower kernel: docker0: port 1(veth5b16) entered forwarding state Sep 4 18:16:00 Tower logger: CouchPotato: started succesfully! Sep 4 18:16:00 Tower logger: Dropbox: started succesfully! Sep 4 18:16:00 Tower kernel: device veth22d7 entered promiscuous mode Sep 4 18:16:00 Tower kernel: docker0: port 2(veth22d7) entered forwarding state Sep 4 18:16:00 Tower kernel: docker0: port 2(veth22d7) entered forwarding state Sep 4 18:16:00 Tower avahi-daemon[3955]: Withdrawing workstation service for veth7a1e. Sep 4 18:16:01 Tower logger: SABnzbd: started succesfully! Sep 4 18:16:01 Tower kernel: device vethdefa entered promiscuous mode Sep 4 18:16:01 Tower kernel: docker0: port 3(vethdefa) entered forwarding state Sep 4 18:16:01 Tower kernel: docker0: port 3(vethdefa) entered forwarding state Sep 4 18:16:01 Tower avahi-daemon[3955]: Withdrawing workstation service for veth1072. Sep 4 18:16:01 Tower logger: SickBeard: started succesfully! Sep 4 18:16:01 Tower emhttp: unclean shutdown detected Sep 4 18:16:01 Tower kernel: mdcmd (43): check CORRECT Sep 4 18:16:01 Tower kernel: md: recovery thread woken up ... Sep 4 18:16:01 Tower kernel: md: recovery thread checking parity... Sep 4 18:16:01 Tower kernel: md: using 1536k window, over a total of 3907018532 blocks. Sep 4 18:16:03 Tower emhttp: shcmd (51): :>/etc/samba/smb-shares.conf Sep 4 18:16:03 Tower emhttp: shcmd (52): cp /etc/exports- /etc/exports Sep 4 18:16:03 Tower avahi-daemon[3955]: Files changed, reloading. Sep 4 18:16:03 Tower emhttp: shcmd (53): cp /etc/netatalk/AppleVolumes.default- /etc/netatalk/AppleVolumes.default Sep 4 18:16:03 Tower avahi-daemon[3955]: Files changed, reloading. Sep 4 18:16:03 Tower last message repeated 2 times Sep 4 18:16:03 Tower emhttp: shcmd (54): echo '"/mnt/user/Media" -async,no_subtree_check,fsid=101 *(sec=sys,ro,insecure,anongid=100,anonuid=99,all_squash)' >>/etc/exports Sep 4 18:16:03 Tower avahi-daemon[3955]: Files changed, reloading. Sep 4 18:16:03 Tower emhttp: Restart SMB... Sep 4 18:16:03 Tower emhttp: shcmd (55): killall -HUP smbd Sep 4 18:16:03 Tower emhttp: shcmd (56): cp /etc/avahi/services/smb.service- /etc/avahi/services/smb.service Sep 4 18:16:03 Tower avahi-daemon[3955]: Files changed, reloading. Sep 4 18:16:03 Tower avahi-daemon[3955]: Service group file /etc/avahi/services/smb.service changed, reloading. Sep 4 18:16:03 Tower emhttp: shcmd (57): ps axc | grep -q rpc.mountd Sep 4 18:16:03 Tower emhttp: _shcmd: shcmd (57): exit status: 1 Sep 4 18:16:03 Tower emhttp: Start NFS... Sep 4 18:16:03 Tower emhttp: shcmd (58): /etc/rc.d/rc.nfsd start |& logger Sep 4 18:16:04 Tower logger: Starting NFS server daemons: Sep 4 18:16:04 Tower logger: /usr/sbin/exportfs -r Sep 4 18:16:04 Tower logger: /usr/sbin/rpc.nfsd 8 Sep 4 18:16:04 Tower logger: /usr/sbin/rpc.mountd Sep 4 18:16:04 Tower rpc.mountd[4960]: Version 1.2.8 starting Sep 4 18:16:04 Tower emhttp: Restart AFP... Sep 4 18:16:04 Tower emhttp: shcmd (59): killall -HUP afpd Sep 4 18:16:04 Tower emhttp: shcmd (60): cp /etc/avahi/services/afp.service- /etc/avahi/services/afp.service Sep 4 18:16:04 Tower avahi-daemon[3955]: Files changed, reloading. Sep 4 18:16:04 Tower avahi-daemon[3955]: Service group file /etc/avahi/services/afp.service changed, reloading. Sep 4 18:16:04 Tower emhttp: shcmd (61): /usr/local/sbin/emhttp_event svcs_restarted Sep 4 18:16:04 Tower emhttp_event: svcs_restarted Sep 4 18:16:04 Tower emhttp: shcmd (62): /usr/local/sbin/emhttp_event started Sep 4 18:16:04 Tower emhttp_event: started Sep 4 18:16:04 Tower rc.unRAID[4970][4971]: Processing /etc/rc.d/rc.unRAID.d/ start scripts. Sep 4 18:16:04 Tower avahi-daemon[3955]: Service "Tower" (/etc/avahi/services/smb.service) successfully established. Sep 4 18:16:05 Tower avahi-daemon[3955]: Service "Tower-AFP" (/etc/avahi/services/afp.service) successfully established. Sep 4 18:16:15 Tower kernel: docker0: port 1(veth5b16) entered forwarding state Sep 4 18:16:15 Tower kernel: docker0: port 2(veth22d7) entered forwarding state Sep 4 18:16:16 Tower kernel: docker0: port 3(vethdefa) entered forwarding state Sep 4 18:17:35 Tower login[4205]: invalid password for 'root' on '/dev/tty1' Sep 4 18:17:45 Tower login[4205]: ROOT LOGIN on '/dev/tty1' Sep 4 18:18:12 Tower kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Sep 4 18:18:12 Tower kernel: ata7.00: failed command: IDENTIFY DEVICE Sep 4 18:18:12 Tower kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 10 pio 512 in Sep 4 18:18:12 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 18:18:12 Tower kernel: ata7.00: status: { DRDY } Sep 4 18:18:12 Tower kernel: ata7: hard resetting link Sep 4 18:18:13 Tower kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Sep 4 18:18:13 Tower kernel: ata7.00: configured for UDMA/133 Sep 4 18:18:13 Tower kernel: ata7: EH complete Sep 4 18:20:28 Tower kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Sep 4 18:20:28 Tower kernel: ata7.00: failed command: SMART Sep 4 18:20:28 Tower kernel: ata7.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 3 pio 512 in Sep 4 18:20:28 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 18:20:28 Tower kernel: ata7.00: status: { DRDY } Sep 4 18:20:28 Tower kernel: ata7: hard resetting link Sep 4 18:20:29 Tower kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Sep 4 18:20:29 Tower kernel: ata7.00: configured for UDMA/133 Sep 4 18:20:29 Tower kernel: ata7: EH complete Sep 4 18:20:51 Tower kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Sep 4 18:20:51 Tower kernel: ata7.00: failed command: IDENTIFY DEVICE Sep 4 18:20:51 Tower kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 10 pio 512 in Sep 4 18:20:51 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 18:20:51 Tower kernel: ata7.00: status: { DRDY } Sep 4 18:20:51 Tower kernel: ata7: hard resetting link Sep 4 18:20:52 Tower kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Sep 4 18:20:52 Tower kernel: ata7.00: configured for UDMA/133 Sep 4 18:20:52 Tower kernel: ata7: EH complete Read from remote host <ip address>: Operation timed out Sure looking forward to a server that stays up more than three hours. D
September 5, 201411 yr Your log looks almost identical to mine. I have been doing some testing on mine with shutting down samba and it hasn't crashed yet. I noticed while capturing top that smbd was the top process when mine hung. I shutdown smbd via "/etc/rc.d/rc.samba stop" and ran a parity check over night, and have been running an rsync for about 1 hours, and it hasn't crashed yet. I wonder if you would get the same result.
September 5, 201411 yr Sep 4 18:18:12 Tower kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Sep 4 18:18:12 Tower kernel: ata7.00: failed command: IDENTIFY DEVICE Sep 4 18:18:12 Tower kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 10 pio 512 in Sep 4 18:18:12 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 18:18:12 Tower kernel: ata7.00: status: { DRDY } Sep 4 18:18:12 Tower kernel: ata7: hard resetting link Sep 4 18:18:13 Tower kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Sep 4 18:18:13 Tower kernel: ata7.00: configured for UDMA/133 These are problems with your hard drives, the cables to the hard drives, the controller and/or possibly a driver. rsync will work your drive and eat up allot of memory if you have allot of files. This is a known, you may need to drop the cache before doing a very large rsync. But that's if you get OOM killer messages. In addition, turn off the cache_dirs if you plan to do a large rsync. Sep 4 18:15:01 Tower cache_dirs: max_seconds=10, min_seconds=1 Sep 4 18:15:01 Tower cache_dirs: max_depth=6 Sep 4 18:15:01 Tower cache_dirs: command=find -noleaf Sep 4 18:15:01 Tower cache_dirs: version=1.6.9 Sep 4 18:15:01 Tower cache_dirs: ---------- caching directories --------------- Sep 4 18:15:01 Tower cache_dirs: Sep 4 18:15:01 Tower cache_dirs: ---------------------------------------------- What's happening is you are making your drive scramble if the buffer cache is getting flushed from the large rsync. On top of that, there is some other hardware (or driver) issue. In reality, the software and hardware should deal with the scrambling and work load. Albeit slow. What's going here is something is just on the edge of acceptable behavior. So you may need to do a SMART long test on the drive in question just to have peace of mind. Check the logs look for pending and unrecoverable/reallocated sectors. Double check your cables. Make sure you are using quality cables and they are not near something which could cause interference. It's not Samba that is making the hard drive disconnect and reset the SATA. It's the workload/scrambling and something being marginal in performance. (Could be a Sata driver timing issue too). But without a SMART test and report it's hard to tell.
September 5, 201411 yr Author Attached is a syslog of a similar crash during rsync. Note at around 00:57 Sep 5 00:57:04 Tower kernel: kernel BUG at drivers/md/unraid.c:461! Sep 5 00:57:04 Tower kernel: invalid opcode: 0000 [#1] SMP There are similar ata issues to the previous crash that weebo pointed out, though different drives (ata10 is what i see, while it was ata7 before). How do the drives identified in the syslog as 'ata7' correlate with the devices sda, sdb, etc? The cables are from the server hardware from LimeTech, but i have a PCI 4 channel sata card that has one bad channel. I checked them all last week, but.. I'll get smart reports soon, and try without cach_dirs on Thanks D edit: Since this crash I can't keep the server up for more than a couple of minutes with rsync + parity check running on either reiser >XFS or reiser>reiserfs copy, with or without cache_dirs. gotta go to work... Syslog_9_5_14.txt
September 5, 201411 yr Well I stand corrected here in that it could be an unRAID driver issue too. however there are some hardware errors that keep cropping up. ata10.00, ata7.00, ata9.00. That's three drives, which could possibly be on the same controller. Without seeing the whole syslog I can't determine what controller and/or hardware. However the kernel bug and the hardware issue may or may not be unrelated to the drives. The time is much later then the drive errors being reported. Perhaps an smp issue with the unraid driver. Which version of unRAID are you using? (I'm assuming unRAID 6 Beta-8 from prior post). Also can you describe your hardware? Sep 4 21:07:03 Tower sshd[6115]: Received disconnect from 24.125.219.59: 11: Sep 4 21:18:51 Tower kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Sep 4 21:18:51 Tower kernel: ata10.00: failed command: IDENTIFY DEVICE Sep 4 21:18:51 Tower kernel: ata10.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 29 pio 512 in Sep 4 21:18:51 Tower kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 21:18:51 Tower kernel: ata10.00: status: { DRDY } Sep 4 21:18:51 Tower kernel: ata10: hard resetting link Sep 4 21:18:52 Tower kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Sep 4 21:18:52 Tower kernel: ata10.00: configured for UDMA/133 Sep 4 21:18:52 Tower kernel: ata10: EH complete Sep 4 21:21:01 Tower kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Sep 4 21:21:01 Tower kernel: ata7.00: failed command: IDENTIFY DEVICE Sep 4 21:21:01 Tower kernel: ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 27 pio 512 in Sep 4 21:21:01 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 21:21:01 Tower kernel: ata7.00: status: { DRDY } Sep 4 21:21:01 Tower kernel: ata7: hard resetting link Sep 4 21:21:01 Tower kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Sep 4 21:21:01 Tower kernel: ata7.00: configured for UDMA/133 Sep 4 21:21:01 Tower kernel: ata7: EH complete Sep 4 21:26:02 Tower sshd[21258]: Failed password for root from 122.225.109.101 port 39804 ssh2 ... ep 4 21:31:01 Tower php: /usr/bin/docker start mbserver Sep 4 21:31:02 Tower php: mbserver Sep 4 21:31:02 Tower php: Sep 4 21:31:22 Tower kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Sep 4 21:31:22 Tower kernel: ata9.00: failed command: IDENTIFY DEVICE Sep 4 21:31:22 Tower kernel: ata9.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 18 pio 512 in Sep 4 21:31:22 Tower kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 21:31:22 Tower kernel: ata9.00: status: { DRDY } Sep 4 21:31:22 Tower kernel: ata9: hard resetting link Sep 4 21:31:23 Tower kernel: ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Sep 4 21:31:23 Tower kernel: ata9.00: configured for UDMA/133 Sep 4 21:31:23 Tower kernel: ata9: EH complete Sep 4 21:32:07 Tower kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Sep 4 21:32:07 Tower kernel: ata10.00: failed command: IDENTIFY DEVICE Sep 4 21:32:07 Tower kernel: ata10.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 14 pio 512 in Sep 4 21:32:07 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 21:32:07 Tower kernel: ata10.00: status: { DRDY } Sep 4 21:32:07 Tower kernel: ata10: hard resetting link Sep 4 21:32:07 Tower kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Sep 4 21:32:07 Tower kernel: ata9.00: failed command: IDENTIFY DEVICE Sep 4 21:32:07 Tower kernel: ata9.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in Sep 4 21:32:07 Tower kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 21:32:07 Tower kernel: ata9.00: status: { DRDY } Sep 4 21:32:07 Tower kernel: ata9: hard resetting link Sep 4 21:32:08 Tower kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Sep 4 21:32:08 Tower kernel: ata10.00: configured for UDMA/133 Sep 4 21:32:08 Tower kernel: ata10: EH complete Sep 4 21:32:08 Tower kernel: ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Sep 4 21:32:08 Tower kernel: ata9.00: configured for UDMA/133 Sep 4 21:32:08 Tower kernel: ata9: EH complete Sep 4 21:40:25 Tower kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Sep 4 21:40:25 Tower kernel: ata10.00: failed command: IDENTIFY DEVICE Sep 4 21:40:25 Tower kernel: ata10.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 17 pio 512 in Sep 4 21:40:25 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) Sep 4 21:40:25 Tower kernel: ata10.00: status: { DRDY } Sep 4 21:40:25 Tower kernel: ata10: hard resetting link Sep 4 21:40:26 Tower kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Sep 4 21:40:26 Tower kernel: ata10.00: configured for UDMA/133 Sep 4 21:40:26 Tower kernel: ata10: EH complete Sep 4 21:41:06 Tower kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Sep 4 21:41:06 Tower kernel: ata10.00: failed command: SMART Sep 4 21:41:06 Tower kernel: ata10.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 4 pio 512 in Sep 4 21:41:06 Tower kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 21:41:06 Tower kernel: ata10.00: status: { DRDY } Sep 4 21:41:06 Tower kernel: ata10: hard resetting link Sep 4 21:41:07 Tower kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Sep 4 21:41:07 Tower kernel: ata10.00: configured for UDMA/133 Sep 4 21:41:07 Tower kernel: ata10: EH complete Sep 4 21:45:35 Tower sshd[29303]: Accepted password for leech from 24.125.219.59 port 48086 ssh2 Sep 4 21:46:35 Tower sshd[29307]: Received disconnect from 24.125.219.59: 11: Sep 4 22:00:49 Tower kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Sep 4 22:00:49 Tower kernel: ata10.00: failed command: IDENTIFY DEVICE Sep 4 22:00:49 Tower kernel: ata10.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 1 pio 512 in Sep 4 22:00:49 Tower kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 22:00:49 Tower kernel: ata10.00: status: { DRDY } Sep 4 22:00:49 Tower kernel: ata10: hard resetting link Sep 4 22:00:49 Tower kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Sep 4 22:00:49 Tower kernel: ata9.00: failed command: IDENTIFY DEVICE Sep 4 22:00:49 Tower kernel: ata9.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 1 pio 512 in Sep 4 22:00:49 Tower kernel: res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 22:00:49 Tower kernel: ata9.00: status: { DRDY } Sep 4 22:00:49 Tower kernel: ata9: hard resetting link Sep 4 22:00:50 Tower kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Sep 4 22:00:50 Tower kernel: ata10.00: configured for UDMA/133 Sep 4 22:00:50 Tower kernel: ata10: EH complete Sep 4 22:00:50 Tower kernel: ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Sep 4 22:00:50 Tower kernel: ata9.00: configured for UDMA/133 Sep 4 22:00:50 Tower kernel: ata9: EH complete Sep 4 22:01:59 Tower kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Sep 4 22:01:59 Tower kernel: ata10.00: failed command: IDENTIFY DEVICE Sep 4 22:01:59 Tower kernel: ata10.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 6 pio 512 in Sep 4 22:01:59 Tower kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Sep 4 22:01:59 Tower kernel: ata10.00: status: { DRDY } Sep 4 22:01:59 Tower kernel: ata10: hard resetting link Sep 4 22:02:00 Tower kernel: ata10: SATA link up 6.0 Gbps (SStatus 133 SControl 300) Sep 4 22:02:00 Tower kernel: ata10.00: configured for UDMA/133 Sep 4 22:02:00 Tower kernel: ata10: EH complete Sep 4 22:34:28 Tower afpd[28622]: acl = acl_get_file(path, ACL_TYPE_ACCESS) failed: No such file or directory Sep 4 22:34:28 Tower afpd[28622]: posix_acls_to_uaperms(path, st, ma) failed: No such file or directory ... Sep 4 22:36:42 Tower afpd[28622]: posix_acls_to_uaperms(path, st, ma) failed: No such file or directory Sep 4 23:00:43 Tower login[4241]: ROOT LOGIN on '/dev/tty1' <snip login failures> Sep 5 00:57:04 Tower kernel: ------------[ cut here ]------------ Sep 5 00:57:04 Tower kernel: kernel BUG at drivers/md/unraid.c:461! Sep 5 00:57:04 Tower kernel: invalid opcode: 0000 [#1] SMP Sep 5 00:57:04 Tower kernel: Modules linked in: veth xt_nat ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 nf_nat iptable_filter ip_tables md_mod btrfs xor lzo_compress raid6_pq vhost_net vhost tun i2c_i801 e1000e ptp pps_core ahci libahci Sep 5 00:57:04 Tower kernel: CPU: 1 PID: 32027 Comm: kworker/1:4 Not tainted 3.16.0-unRAID #6 Sep 5 00:57:04 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./AD2550R/U3S3, BIOS P1.20 07/22/2013 Sep 5 00:57:04 Tower kernel: Workqueue: md submit_flush_data [md_mod] Sep 5 00:57:04 Tower kernel: task: ffff88006670bf00 ti: ffff88010c950000 task.ti: ffff88010c950000 Sep 5 00:57:04 Tower kernel: RIP: 0010:[<ffffffffa015ed23>] [<ffffffffa015ed23>] unraid_make_request+0x34a/0x51e [md_mod]
September 5, 201411 yr Author Well I stand corrected here in that it could be an unRAID driver issue too. however there are some hardware errors that keep cropping up. ata10.00, ata7.00, ata9.00. That's three drives, which could possibly be on the same controller. Without seeing the whole syslog I can't determine what controller and/or hardware. I finally put my system info in my .sig (plus got an avatar) I think you're right that ata7,9, and 10 are a suspect controller; I have already disconnected one of 4 sockets on that card (i.e. #, and I suspect that the 7 onboard sata ports are ata 0-6. Tonight I'll try reorganizing all of my drives on the MB sockets, remove the controller and repeat. My setup crashed today while sitting silent within a couple of hours. If it's system SW I wouldn't expect it to be getting worse. Dennis
September 6, 201411 yr Author From the last crash it appeared that I was having failures on several drive ports from my controller card. Last night i rebuilt my array (with one fewer drive) using only the ports on my MB. Then I set up an rsync copy, during a parity check. The copy did seem to go faster, but the server crashed in about 7 hours. I captured the syslog via ssh but it doesn't have a lot of provocative info to my eye. At least there are no drive errors reported. I'll set up the test again now. Media-Mac:~ media$ ssh [email protected] Last login: Fri Sep 5 18:46:36 2014 Linux 3.16.0-unRAID. root@Tower:~# tail -f /var/log/syslog Sep 5 18:45:39 Tower sshd[7123]: error: Could not get shadow information for NOUSER Sep 5 18:45:39 Tower sshd[7123]: Failed password for invalid user admin from 116.10.191.214 port 11842 ssh2 Sep 5 18:45:39 Tower sshd[7123]: Failed password for invalid user admin from 116.10.191.214 port 11842 ssh2 Sep 5 18:45:39 Tower sshd[7123]: Disconnecting: Too many authentication failures for admin [preauth] Sep 5 18:45:40 Tower sshd[7162]: fatal: Read from socket failed: Connection reset by peer [preauth] Sep 5 18:45:41 Tower sshd[7102]: fatal: Read from socket failed: Connection reset by peer [preauth] Sep 5 18:45:41 Tower sshd[7171]: fatal: Read from socket failed: Connection reset by peer [preauth] Sep 5 18:45:41 Tower sshd[7153]: fatal: Read from socket failed: Connection reset by peer [preauth] Sep 5 18:46:36 Tower login[4199]: ROOT LOGIN on '/dev/tty2' Sep 5 18:47:13 Tower sshd[7642]: Accepted publickey for root from 192.168.1.1 port xxx ssh2: RSA 36:1e:36:55:3c:a6:e9:c4:03:0c:7a:78:7f:a9:3f:57 Sep 5 18:48:24 Tower sshd[8002]: Accepted publickey for root from 192.168.1.1 port xxx ssh2: RSA 36:1e:36:55:3c:a6:e9:c4:03:0c:7a:78:7f:a9:3f:57 Sep 5 18:48:57 Tower php: /usr/bin/docker stop Dropbox Sep 5 18:48:59 Tower php: Dropbox Sep 5 18:48:59 Tower php: Sep 5 19:37:43 Tower sshd[8743]: Accepted password for leech from <ip addr> port xxx ssh2 Sep 5 19:38:00 Tower sshd[8747]: Received disconnect from <ip addr>: 11: Sep 5 21:48:37 Tower sshd[11314]: Failed password for root from 122.225.109.101 port 30543 ssh2 Sep 5 21:48:37 Tower sshd[11314]: Failed password for root from 122.225.109.101 port 30543 ssh2 Sep 5 21:48:37 Tower sshd[11314]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:48:39 Tower sshd[11319]: Failed password for root from 122.225.109.101 port 31183 ssh2 Sep 5 21:48:39 Tower sshd[11319]: Failed password for root from 122.225.109.101 port 31183 ssh2 Sep 5 21:48:39 Tower sshd[11319]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:48:41 Tower sshd[11326]: Failed password for root from 122.225.109.101 port 31669 ssh2 Sep 5 21:48:42 Tower sshd[11326]: Failed password for root from 122.225.109.101 port 31669 ssh2 Sep 5 21:48:42 Tower sshd[11326]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:48:44 Tower sshd[11333]: Failed password for root from 122.225.109.101 port 32136 ssh2 Sep 5 21:48:44 Tower sshd[11333]: Failed password for root from 122.225.109.101 port 32136 ssh2 Sep 5 21:48:44 Tower sshd[11333]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:48:46 Tower sshd[11337]: Failed password for root from 122.225.109.101 port 33091 ssh2 Sep 5 21:48:47 Tower sshd[11337]: Failed password for root from 122.225.109.101 port 33091 ssh2 Sep 5 21:48:47 Tower sshd[11337]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:48:49 Tower sshd[11341]: Failed password for root from 122.225.109.101 port 33789 ssh2 Sep 5 21:48:49 Tower sshd[11341]: Failed password for root from 122.225.109.101 port 33789 ssh2 Sep 5 21:48:49 Tower sshd[11341]: Disconnecting: Too many authentication failures for root [preauth] <snip failed logins> Sep 5 21:54:00 Tower sshd[13751]: input_userauth_request: invalid user admin [preauth] Sep 5 21:54:01 Tower sshd[13743]: Failed password for root from 116.10.191.214 port 58210 ssh2 Sep 5 21:54:01 Tower sshd[13747]: Failed password for root from 116.10.191.214 port 58209 ssh2 Sep 5 21:54:01 Tower sshd[13751]: error: Could not get shadow information for NOUSER Sep 5 21:54:01 Tower sshd[13751]: Failed password for invalid user admin from 116.10.191.214 port 58402 ssh2 Sep 5 21:54:01 Tower sshd[13743]: Failed password for root from 116.10.191.214 port 58210 ssh2 Sep 5 21:54:01 Tower sshd[13743]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:54:01 Tower sshd[13747]: Failed password for root from 116.10.191.214 port 58209 ssh2 Sep 5 21:54:01 Tower sshd[13747]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:54:01 Tower sshd[13751]: Failed password for invalid user admin from 116.10.191.214 port 58402 ssh2 Sep 5 21:54:01 Tower sshd[13751]: Disconnecting: Too many authentication failures for admin [preauth] Sep 5 21:54:02 Tower sshd[13755]: Failed password for root from 116.10.191.214 port 1281 ssh2 Sep 5 21:54:03 Tower sshd[13755]: Failed password for root from 116.10.191.214 port 1281 ssh2 Sep 5 21:54:03 Tower sshd[13755]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:54:05 Tower sshd[13785]: Invalid user admin from 116.10.191.214 Sep 5 21:54:05 Tower sshd[13785]: input_userauth_request: invalid user admin [preauth] Sep 5 21:54:06 Tower sshd[13785]: error: Could not get shadow information for NOUSER Sep 5 21:54:06 Tower sshd[13785]: Failed password for invalid user admin from 116.10.191.214 port 3993 ssh2 Sep 5 21:54:06 Tower sshd[13785]: fatal: Read from socket failed: Connection reset by peer [preauth] Sep 5 21:54:06 Tower sshd[13777]: fatal: Write failed: Connection reset by peer [preauth] Sep 5 21:54:06 Tower sshd[13781]: fatal: Write failed: Connection reset by peer [preauth] Sep 5 21:54:07 Tower sshd[13789]: fatal: Write failed: Connection reset by peer [preauth] Read from remote host amor.us.to: Operation timed out Connection to <ip> closed. It is curious that the failed logins don't come in steadily through the night, the large majority came in in a six minute period preceding the crash. Here's TOP during the crash: Last login: Fri Sep 5 18:46:36 2014 Linux 3.16.0-unRAID. root@Tower:~# tail -f /var/log/syslog Sep 5 18:45:39 Tower sshd[7123]: error: Could not get shadow information for NOUSER Sep 5 18:45:39 Tower sshd[7123]: Failed password for invalid user admin from 116.10.191.214 port 11842 ssh2 Sep 5 18:45:39 Tower sshd[7123]: Failed password for invalid user admin from 116.10.191.214 port 11842 ssh2 Sep 5 18:45:39 Tower sshd[7123]: Disconnecting: Too many authentication failures for admin [preauth] Sep 5 18:45:40 Tower sshd[7162]: fatal: Read from socket failed: Connection reset by peer [preauth] Sep 5 18:45:41 Tower sshd[7102]: fatal: Read from socket failed: Connection reset by peer [preauth] Sep 5 18:45:41 Tower sshd[7171]: fatal: Read from socket failed: Connection reset by peer [preauth] Sep 5 18:45:41 Tower sshd[7153]: fatal: Read from socket failed: Connection reset by peer [preauth] Sep 5 18:46:36 Tower login[4199]: ROOT LOGIN on '/dev/tty2' Sep 5 18:47:13 Tower sshd[7642]: Accepted publickey for root from 192.168.1.1 port xxx ssh2: RSA 36:1e:36:55:3c:a6:e9:c4:03:0c:7a:78:7f:a9:3f:57 Sep 5 18:48:24 Tower sshd[8002]: Accepted publickey for root from 192.168.1.1 port xxx ssh2: RSA 36:1e:36:55:3c:a6:e9:c4:03:0c:7a:78:7f:a9:3f:57 Sep 5 18:48:57 Tower php: /usr/bin/docker stop Dropbox Sep 5 18:48:59 Tower php: Dropbox Sep 5 18:48:59 Tower php: Sep 5 19:37:43 Tower sshd[8743]: Accepted password for leech from 24.125.219.59 port 39801 ssh2 Sep 5 19:38:00 Tower sshd[8747]: Received disconnect from 24.125.219.59: 11: Sep 5 21:48:37 Tower sshd[11314]: Failed password for root from 122.225.109.101 port 30543 ssh2 Sep 5 21:48:37 Tower sshd[11314]: Failed password for root from 122.225.109.101 port 30543 ssh2 Sep 5 21:48:37 Tower sshd[11314]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:48:39 Tower sshd[11319]: Failed password for root from 122.225.109.101 port 31183 ssh2 Sep 5 21:48:39 Tower sshd[11319]: Failed password for root from 122.225.109.101 port 31183 ssh2 Sep 5 21:48:39 Tower sshd[11319]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:48:41 Tower sshd[11326]: Failed password for root from 122.225.109.101 port 31669 ssh2 Sep 5 21:48:42 Tower sshd[11326]: Failed password for root from 122.225.109.101 port 31669 ssh2 Sep 5 21:48:42 Tower sshd[11326]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:48:44 Tower sshd[11333]: Failed password for root from 122.225.109.101 port 32136 ssh2 Sep 5 21:48:44 Tower sshd[11333]: Failed password for root from 122.225.109.101 port 32136 ssh2 Sep 5 21:48:44 Tower sshd[11333]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:48:46 Tower sshd[11337]: Failed password for root from 122.225.109.101 port 33091 ssh2 Sep 5 21:48:47 Tower sshd[11337]: Failed password for root from 122.225.109.101 port 33091 ssh2 Sep 5 21:48:47 Tower sshd[11337]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:48:49 Tower sshd[11341]: Failed password for root from 122.225.109.101 port 33789 ssh2 Sep 5 21:48:49 Tower sshd[11341]: Failed password for root from 122.225.109.101 port 33789 ssh2 Sep 5 21:48:49 Tower sshd[11341]: Disconnecting: Too many authentication failures for root [preauth] <snip failed logins> Sep 5 21:54:00 Tower sshd[13751]: input_userauth_request: invalid user admin [preauth] Sep 5 21:54:01 Tower sshd[13743]: Failed password for root from 116.10.191.214 port 58210 ssh2 Sep 5 21:54:01 Tower sshd[13747]: Failed password for root from 116.10.191.214 port 58209 ssh2 Sep 5 21:54:01 Tower sshd[13751]: error: Could not get shadow information for NOUSER Sep 5 21:54:01 Tower sshd[13751]: Failed password for invalid user admin from 116.10.191.214 port 58402 ssh2 Sep 5 21:54:01 Tower sshd[13743]: Failed password for root from 116.10.191.214 port 58210 ssh2 Sep 5 21:54:01 Tower sshd[13743]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:54:01 Tower sshd[13747]: Failed password for root from 116.10.191.214 port 58209 ssh2 Sep 5 21:54:01 Tower sshd[13747]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:54:01 Tower sshd[13751]: Failed password for invalid user admin from 116.10.191.214 port 58402 ssh2 Sep 5 21:54:01 Tower sshd[13751]: Disconnecting: Too many authentication failures for admin [preauth] Sep 5 21:54:02 Tower sshd[13755]: Failed password for root from 116.10.191.214 port 1281 ssh2 Sep 5 21:54:03 Tower sshd[13755]: Failed password for root from 116.10.191.214 port 1281 ssh2 Sep 5 21:54:03 Tower sshd[13755]: Disconnecting: Too many authentication failures for root [preauth] Sep 5 21:54:05 Tower sshd[13785]: Invalid user admin from 116.10.191.214 Sep 5 21:54:05 Tower sshd[13785]: input_userauth_request: invalid user admin [preauth] Sep 5 21:54:06 Tower sshd[13785]: error: Could not get shadow information for NOUSER Sep 5 21:54:06 Tower sshd[13785]: Failed password for invalid user admin from 116.10.191.214 port 3993 ssh2 Sep 5 21:54:06 Tower sshd[13785]: fatal: Read from socket failed: Connection reset by peer [preauth] Sep 5 21:54:06 Tower sshd[13777]: fatal: Write failed: Connection reset by peer [preauth] Sep 5 21:54:06 Tower sshd[13781]: fatal: Write failed: Connection reset by peer [preauth] Sep 5 21:54:07 Tower sshd[13789]: fatal: Write failed: Connection reset by peer [preauth] Read from remote host <1p>: Operation timed out Connection to <ip> closed.
September 6, 201411 yr Author OK, this looks like an informative crash. unRaid 6b8, hardware below except the controller unplugged. plugins apcupsd, cache_dirs running. Parity check running (server hasn't been able long enough to complete one!) SMB/NFS and AFP running but not in use. Started rsync (-avh with copy of large files from a reiserfs array volume to an xfs volume) started remote terminal with tail -f capture and another with top. Hard crash happened in 90 minutes. Here's the log: root@Tower:~# tail -f /var/log/syslog Sep 6 06:55:28 Tower emhttp: shcmd (58): cp /etc/avahi/services/afp.service- /etc/avahi/services/afp.service Sep 6 06:55:28 Tower avahi-daemon[3880]: Files changed, reloading. Sep 6 06:55:28 Tower avahi-daemon[3880]: Service group file /etc/avahi/services/afp.service changed, reloading. Sep 6 06:55:28 Tower emhttp: shcmd (59): /usr/local/sbin/emhttp_event svcs_restarted Sep 6 06:55:28 Tower emhttp_event: svcs_restarted Sep 6 06:55:28 Tower emhttp: shcmd (60): /usr/local/sbin/emhttp_event started Sep 6 06:55:28 Tower emhttp_event: started Sep 6 06:55:28 Tower rc.unRAID[4832][4833]: Processing /etc/rc.d/rc.unRAID.d/ start scripts. Sep 6 06:55:28 Tower avahi-daemon[3880]: Service "Tower" (/etc/avahi/services/smb.service) successfully established. Sep 6 06:55:29 Tower avahi-daemon[3880]: Service "Tower-AFP" (/etc/avahi/services/afp.service) successfully established. Sep 6 06:55:39 Tower kernel: docker0: port 1(veth9241) entered forwarding state Sep 6 06:55:39 Tower kernel: docker0: port 2(veth59d2) entered forwarding state Sep 6 06:55:40 Tower kernel: docker0: port 3(veth38c6) entered forwarding state Sep 6 06:56:00 Tower cache_dirs: ============================================== Sep 6 06:56:00 Tower cache_dirs: command-args=-w -m 1 -M 10 -d 6 -p 10 -U 50000 -i /mnt -B -a -noleaf Sep 6 06:56:00 Tower cache_dirs: vfs_cache_pressure=10 Sep 6 06:56:00 Tower cache_dirs: max_seconds=10, min_seconds=1 Sep 6 06:56:00 Tower cache_dirs: max_depth=6 Sep 6 06:56:00 Tower cache_dirs: command=find -noleaf Sep 6 06:56:00 Tower cache_dirs: version=1.6.9 Sep 6 06:56:00 Tower cache_dirs: ---------- caching directories --------------- Sep 6 06:56:00 Tower cache_dirs: Sep 6 06:56:00 Tower cache_dirs: ---------------------------------------------- Sep 6 06:56:00 Tower cache_dirs: ERROR: included directory "/mnt" does not exist. Sep 6 06:56:00 Tower cache_dirs: cache_dirs process ID 5267 started, To terminate it, type: cache_dirs -q Sep 6 06:56:16 Tower sshd[5363]: Accepted publickey for root from 192.168.1.1 port xxxx ssh2: RSA 36:1e:36:55:3c:a6:e9:c4:03:0c:7a:78:7f:a9:3f:57 Sep 6 07:14:35 Tower login[4239]: ROOT LOGIN on '/dev/tty1' Sep 6 07:30:12 Tower sshd[17439]: Accepted publickey for root from 192.168.1.144 port xxxx ssh2: RSA 02:cb:cf:9d:44:aa:04:49:7d:2c:9f:a9:00:da:ef:92 Sep 6 08:11:31 Tower kernel: ------------[ cut here ]------------ Sep 6 08:11:31 Tower kernel: kernel BUG at drivers/md/unraid.c:462! Sep 6 08:11:31 Tower kernel: invalid opcode: 0000 [#1] SMP Sep 6 08:11:31 Tower kernel: Modules linked in: veth xt_nat ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 nf_nat iptable_filter ip_tables md_mod btrfs xor lzo_compress raid6_pq vhost_net vhost tun i2c_i801 e1000e ptp pps_core ahci libahci Sep 6 08:11:31 Tower kernel: CPU: 0 PID: 13420 Comm: kworker/0:6 Not tainted 3.16.0-unRAID #6 Sep 6 08:11:31 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./AD2550R/U3S3, BIOS P1.20 07/22/2013 Sep 6 08:11:31 Tower kernel: Workqueue: md submit_flush_data [md_mod] Sep 6 08:11:31 Tower kernel: task: ffff88006fd246e0 ti: ffff88010b19c000 task.ti: ffff88010b19c000 Sep 6 08:11:31 Tower kernel: RIP: 0010:[<ffffffffa018bd2c>] [<ffffffffa018bd2c>] unraid_make_request+0x353/0x51e [md_mod] Sep 6 08:11:31 Tower kernel: RSP: 0018:ffff88010b19fd90 EFLAGS: 00010286 Sep 6 08:11:31 Tower kernel: RAX: ffff880119240120 RBX: ffff8800b1c51d00 RCX: 000000000000508e Sep 6 08:11:31 Tower kernel: RDX: 000000000000508e RSI: 000000000000508e RDI: ffff880119240044 Sep 6 08:11:31 Tower kernel: RBP: ffff88010b19fde0 R08: 0000000000000046 R09: ffff88006fd246e0 Sep 6 08:11:31 Tower kernel: R10: 000000058107dae0 R11: 000000000000b6f6 R12: 0000000000000001 Sep 6 08:11:31 Tower kernel: R13: ffff8800da846a80 R14: ffff880119240000 R15: ffff8800da846a80 Sep 6 08:11:31 Tower kernel: FS: 0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000 Sep 6 08:11:31 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Sep 6 08:11:31 Tower kernel: CR2: 00007fabe9d511de CR3: 00000000515d5000 CR4: 00000000000007f0 Sep 6 08:11:31 Tower kernel: Stack: Sep 6 08:11:31 Tower kernel: 0000000000000001 ffff880119240120 ffff880119240044 000000007486ab13 Sep 6 08:11:31 Tower kernel: 000000007486aad0 ffff8800b1c51a20 ffff8800b1c51d00 ffff8800da846a80 Sep 6 08:11:31 Tower kernel: ffff8800b1c51a00 0000000000000000 ffff88010b19fe10 ffffffffa018bf74 Sep 6 08:11:31 Tower kernel: Call Trace: Sep 6 08:11:31 Tower kernel: [<ffffffffa018bf74>] submit_flush_data+0x7d/0x93 [md_mod] Sep 6 08:11:31 Tower kernel: [<ffffffff81052ea9>] process_one_work+0x20d/0x332 Sep 6 08:11:31 Tower kernel: [<ffffffff81053b5d>] worker_thread+0x392/0x507 Sep 6 08:11:31 Tower kernel: [<ffffffff810537cb>] ? cancel_delayed_work_sync+0x10/0x10 Sep 6 08:11:31 Tower kernel: [<ffffffff810588de>] kthread+0xd6/0xde Sep 6 08:11:31 Tower kernel: [<ffffffff81058808>] ? kthread_create_on_node+0x168/0x168 Sep 6 08:11:31 Tower kernel: [<ffffffff815582fc>] ret_from_fork+0x7c/0xb0 Sep 6 08:11:31 Tower kernel: [<ffffffff81058808>] ? kthread_create_on_node+0x168/0x168 Sep 6 08:11:31 Tower kernel: Code: 46 44 48 89 c7 48 89 45 c0 e8 cb be 3c e1 48 8b 45 b8 48 83 78 18 00 74 02 0f 0b 48 83 78 20 00 74 02 0f 0b 48 83 78 28 00 74 02 <0f> 0b 48 8b 50 08 80 e6 df 48 89 50 08 f6 43 18 01 75 06 48 89 Sep 6 08:11:31 Tower kernel: RIP [<ffffffffa018bd2c>] unraid_make_request+0x353/0x51e [md_mod] Sep 6 08:11:31 Tower kernel: RSP <ffff88010b19fd90> Sep 6 08:11:31 Tower kernel: ---[ end trace 0a6b223d28c9bb17 ]--- Sep 6 08:11:31 Tower kernel: BUG: unable to handle kernel paging request at ffffffffffffffd8 Sep 6 08:11:31 Tower kernel: IP: [<ffffffff81058e81>] kthread_data+0xc/0x11 Sep 6 08:11:31 Tower kernel: PGD 1742067 PUD 1744067 PMD 0 Sep 6 08:11:31 Tower kernel: Oops: 0000 [#2] SMP the console pic is attached. Top shows top - 08:11:38 up 1:17, 4 users, load average: 9.89, 9.42, 9.39 Tasks: 293 total, 2 running, 291 sleeping, 0 stopped, 0 zombie Cpu(s): 0.3%us, 0.1%sy, 0.0%ni, 66.5%id, 33.1%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 4038012k total, 3762660k used, 275352k free, 126176k buffers Swap: 0k total, 0k used, 0k free, 2089068k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 5401 root 20 0 13412 2408 1972 R 1 0.1 0:40.27 top 4752 nobody 20 0 1484m 36m 0 S 0 0.9 3:09.42 sabnzbdplus 1 root 20 0 4364 1608 1508 S 0 0.0 0:26.63 init 2 root 20 0 0 0 0 S 0 0.0 0:00.01 kthreadd 3 root 20 0 0 0 0 S 0 0.0 0:14.86 ksoftirqd/0 4 root 20 0 0 0 0 S 0 0.0 0:02.51 kworker/0:0 5 root 0 -20 0 0 0 S 0 0.0 0:00.00 kworker/0:0H 7 root 20 0 0 0 0 S 0 0.0 0:19.80 rcu_sched 8 root 20 0 0 0 0 S 0 0.0 0:00.00 rcu_bh 9 root RT 0 0 0 0 S 0 0.0 0:00.33 migration/0 10 root RT 0 0 0 0 S 0 0.0 0:00.75 migration/1 11 root 20 0 0 0 0 S 0 0.0 0:13.82 ksoftirqd/1 13 root 0 -20 0 0 0 S 0 0.0 0:00.00 kworker/1:0H 14 root RT 0 0 0 0 S 0 0.0 0:00.15 migration/2 15 root 20 0 0 0 0 S 0 0.0 0:14.03 ksoftirqd/2 16 root 20 0 0 0 0 S 0 0.0 0:02.32 kworker/2:0 17 root 0 -20 0 0 0 S 0 0.0 0:00.00 kworker/2:0H -=-=-=-= I note the bug notice: Sep 6 08:11:31 Tower kernel: kernel BUG at drivers/md/unraid.c:462! Sep 6 08:11:31 Tower kernel: invalid opcode: 0000 [#1] SMP Sep 6 08:11:31 Tower kernel: Modules linked in: veth xt_nat ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 nf_nat iptable_filter ip_tables md_mod btrfs xor lzo_compress raid6_pq vhost_net vhost tun i2c_i801 e1000e ptp pps_core ahci libahci This is similar to the bug in a previous crash except it was 461 then http://lime-technology.com/forum/index.php?topic=35043.msg326019#msg326019 I think this is pretty good evidence that there is an issue with the referenced unRAID driver. I also note that the previous drive/controller issues are gone. I've ordered a new controller... Thanks Dennis
September 6, 201411 yr Author And another informative crash, with same unraid.c driver error. This time the rsync was from a reiserfs to another resierfs formatted volume; before I was copying to an xfs system. The crash was 16 minutes into a copy. Linux 3.16.0-unRAID. root@Tower:~# tail -f /var/log/syslog Sep 6 09:41:49 Tower emhttp: shcmd (36): set -o pipefail ; mount -t reiserfs -o remount,user_xattr,acl,noatime,nodiratime /dev/md4 /mnt/disk4 |& logger Sep 6 09:41:49 Tower emhttp: shcmd (37): mkdir -p /mnt/disk5 Sep 6 09:41:49 Tower emhttp: shcmd (38): set -o pipefail ; mount -t xfs -o noatime,nodiratime /dev/md5 /mnt/disk5 |& logger Sep 6 09:41:49 Tower kernel: XFS (md5): Mounting V4 Filesystem Sep 6 09:41:50 Tower kernel: XFS (md5): Starting recovery (logdev: internal) Sep 6 09:41:50 Tower emhttp: shcmd (39): mkdir -p /mnt/cache Sep 6 09:41:50 Tower kernel: XFS (md5): Ending recovery (logdev: internal) Sep 6 09:41:50 Tower emhttp: shcmd (40): set -o pipefail ; mount -t btrfs -o noatime,nodiratime /dev/sdh1 /mnt/cache |& logger Sep 6 09:41:50 Tower kernel: BTRFS info (device sdh1): disk space caching is enabled Sep 6 09:41:50 Tower kernel: BTRFS: detected SSD devices, enabling SSD mode Sep 6 09:41:51 Tower emhttp: shcmd (41): mkdir /mnt/user0 Sep 6 09:41:51 Tower emhttp: shcmd (42): /usr/local/sbin/shfs /mnt/user0 -disks 16777214 -o noatime,big_writes,allow_other |& logger Sep 6 09:41:51 Tower emhttp: shcmd (43): mkdir /mnt/user Sep 6 09:41:51 Tower emhttp: shcmd (44): /usr/local/sbin/shfs /mnt/user -disks 16777215 1024 -o noatime,big_writes,allow_other -o remember=330 |& logger Sep 6 09:41:51 Tower emhttp: shcmd (45): crontab -c /etc/cron.d - <<< "# Generated mover schedule: 30 2 * * 5 /usr/local/sbin/mover |& logger" Sep 6 09:41:51 Tower emhttp: shcmd (46): /usr/local/sbin/emhttp_event disks_mounted Sep 6 09:41:51 Tower emhttp_event: disks_mounted Sep 6 09:41:51 Tower logger: Starting docker.io Sep 6 09:41:51 Tower logger: Maybe expand image file Sep 6 09:41:51 Tower kernel: BTRFS: device fsid 0ab81931-36e8-49c4-924b-6ca691a81a8c devid 1 transid 2827 /dev/loop8 Sep 6 09:41:51 Tower kernel: BTRFS info (device loop8): disk space caching is enabled Sep 6 09:41:51 Tower logger: Resize '/var/lib/docker' of 'max' Sep 6 09:41:51 Tower logger: starting docker ... Sep 6 09:41:51 Tower kernel: BTRFS: new size for /dev/loop8 is 8589934592 Sep 6 09:41:51 Tower kernel: ip_tables: (C) 2000-2006 Netfilter Core Team Sep 6 09:41:52 Tower avahi-daemon[3913]: Joining mDNS multicast group on interface docker0.IPv4 with address 172.17.42.1. Sep 6 09:41:52 Tower avahi-daemon[3913]: New relevant interface docker0.IPv4 for mDNS. Sep 6 09:41:52 Tower avahi-daemon[3913]: Registering new address record for 172.17.42.1 on docker0.IPv4. Sep 6 09:41:53 Tower kernel: device veth0595 entered promiscuous mode Sep 6 09:41:53 Tower avahi-daemon[3913]: Withdrawing workstation service for veth2b02. Sep 6 09:41:53 Tower kernel: docker0: port 1(veth0595) entered forwarding state Sep 6 09:41:53 Tower kernel: docker0: port 1(veth0595) entered forwarding state Sep 6 09:41:53 Tower logger: CouchPotato: started succesfully! Sep 6 09:41:54 Tower logger: Dropbox: started succesfully! Sep 6 09:41:54 Tower kernel: device veth6eca entered promiscuous mode Sep 6 09:41:54 Tower kernel: docker0: port 2(veth6eca) entered forwarding state Sep 6 09:41:54 Tower kernel: docker0: port 2(veth6eca) entered forwarding state Sep 6 09:41:54 Tower avahi-daemon[3913]: Withdrawing workstation service for veth9afa. Sep 6 09:41:54 Tower kernel: docker0: port 2(veth6eca) entered disabled state Sep 6 09:41:54 Tower kernel: docker0: port 2(veth6eca) entered forwarding state Sep 6 09:41:54 Tower kernel: docker0: port 2(veth6eca) entered forwarding state Sep 6 09:41:54 Tower logger: SABnzbd: started succesfully! Sep 6 09:41:55 Tower kernel: device vethaa01 entered promiscuous mode Sep 6 09:41:55 Tower kernel: docker0: port 3(vethaa01) entered forwarding state Sep 6 09:41:55 Tower kernel: docker0: port 3(vethaa01) entered forwarding state Sep 6 09:41:55 Tower avahi-daemon[3913]: Withdrawing workstation service for veth473e. Sep 6 09:41:55 Tower logger: SickBeard: started succesfully! Sep 6 09:41:55 Tower kernel: mdcmd (41): check CORRECT Sep 6 09:41:55 Tower kernel: md: recovery thread woken up ... Sep 6 09:41:55 Tower kernel: md: recovery thread syncing parity disk ... Sep 6 09:41:55 Tower kernel: md: using 1536k window, over a total of 3907018532 blocks. Sep 6 09:41:57 Tower ntpd[1379]: Listen normally on 3 docker0 172.17.42.1 UDP 123 Sep 6 09:41:57 Tower ntpd[1379]: peers refreshed Sep 6 09:41:57 Tower ntpd[1379]: new interface(s) found: waking up resolver Sep 6 09:41:57 Tower emhttp: shcmd (47): :>/etc/samba/smb-shares.conf Sep 6 09:41:57 Tower emhttp: shcmd (48): cp /etc/exports- /etc/exports Sep 6 09:41:57 Tower avahi-daemon[3913]: Files changed, reloading. Sep 6 09:41:57 Tower emhttp: shcmd (49): cp /etc/netatalk/AppleVolumes.default- /etc/netatalk/AppleVolumes.default Sep 6 09:41:57 Tower avahi-daemon[3913]: Files changed, reloading. Sep 6 09:41:57 Tower avahi-daemon[3913]: Files changed, reloading. Sep 6 09:41:57 Tower emhttp: shcmd (50): echo '"/mnt/user/Media" -async,no_subtree_check,fsid=101 *(sec=sys,ro,insecure,anongid=100,anonuid=99,all_squash)' >>/etc/exports Sep 6 09:41:57 Tower emhttp: shcmd (51): chmod 777 '/mnt/user/junk' Sep 6 09:41:57 Tower emhttp: shcmd (52): chown 'nobody':'users' '/mnt/user/junk' Sep 6 09:41:57 Tower avahi-daemon[3913]: Files changed, reloading. Sep 6 09:41:57 Tower emhttp: Restart SMB... Sep 6 09:41:57 Tower emhttp: shcmd (53): killall -HUP smbd Sep 6 09:41:57 Tower emhttp: shcmd (54): cp /etc/avahi/services/smb.service- /etc/avahi/services/smb.service Sep 6 09:41:57 Tower avahi-daemon[3913]: Files changed, reloading. Sep 6 09:41:57 Tower avahi-daemon[3913]: Service group file /etc/avahi/services/smb.service changed, reloading. Sep 6 09:41:57 Tower emhttp: shcmd (55): ps axc | grep -q rpc.mountd Sep 6 09:41:57 Tower emhttp: _shcmd: shcmd (55): exit status: 1 Sep 6 09:41:57 Tower emhttp: Start NFS... Sep 6 09:41:57 Tower emhttp: shcmd (56): /etc/rc.d/rc.nfsd start |& logger Sep 6 09:41:57 Tower logger: Starting NFS server daemons: Sep 6 09:41:57 Tower logger: /usr/sbin/exportfs -r Sep 6 09:41:57 Tower logger: /usr/sbin/rpc.nfsd 8 Sep 6 09:41:57 Tower logger: /usr/sbin/rpc.mountd Sep 6 09:41:57 Tower rpc.mountd[4826]: Version 1.2.8 starting Sep 6 09:41:57 Tower emhttp: Restart AFP... Sep 6 09:41:57 Tower emhttp: shcmd (57): killall -HUP afpd Sep 6 09:41:58 Tower emhttp: shcmd (58): cp /etc/avahi/services/afp.service- /etc/avahi/services/afp.service Sep 6 09:41:58 Tower avahi-daemon[3913]: Files changed, reloading. Sep 6 09:41:58 Tower avahi-daemon[3913]: Service group file /etc/avahi/services/afp.service changed, reloading. Sep 6 09:41:58 Tower emhttp: shcmd (59): /usr/local/sbin/emhttp_event svcs_restarted Sep 6 09:41:58 Tower emhttp_event: svcs_restarted Sep 6 09:41:58 Tower emhttp: shcmd (60): /usr/local/sbin/emhttp_event started Sep 6 09:41:58 Tower emhttp_event: started Sep 6 09:41:58 Tower rc.unRAID[4836][4837]: Processing /etc/rc.d/rc.unRAID.d/ start scripts. Sep 6 09:41:58 Tower avahi-daemon[3913]: Service "Tower" (/etc/avahi/services/smb.service) successfully established. Sep 6 09:41:58 Tower avahi-daemon[3913]: Service "Tower-AFP" (/etc/avahi/services/afp.service) successfully established. Sep 6 09:42:00 Tower cache_dirs: ============================================== Sep 6 09:42:00 Tower cache_dirs: command-args=-w -m 1 -M 10 -d 6 -p 10 -U 50000 -i /mnt/ -B -a -noleaf Sep 6 09:42:00 Tower cache_dirs: vfs_cache_pressure=10 Sep 6 09:42:00 Tower cache_dirs: max_seconds=10, min_seconds=1 Sep 6 09:42:00 Tower cache_dirs: max_depth=6 Sep 6 09:42:00 Tower cache_dirs: command=find -noleaf Sep 6 09:42:00 Tower cache_dirs: version=1.6.9 Sep 6 09:42:00 Tower cache_dirs: ---------- caching directories --------------- Sep 6 09:42:00 Tower cache_dirs: Sep 6 09:42:00 Tower cache_dirs: ---------------------------------------------- Sep 6 09:42:00 Tower cache_dirs: ERROR: included directory "/mnt/" does not exist. Sep 6 09:42:00 Tower cache_dirs: cache_dirs process ID 4907 started, To terminate it, type: cache_dirs -q Sep 6 09:42:08 Tower kernel: docker0: port 1(veth0595) entered forwarding state Sep 6 09:42:09 Tower kernel: docker0: port 2(veth6eca) entered forwarding state Sep 6 09:42:10 Tower kernel: docker0: port 3(vethaa01) entered forwarding state Sep 6 09:42:46 Tower sshd[5373]: Accepted publickey for root from 192.168.1.1 port 49860 ssh2: RSA 36:1e:36:55:3c:a6:e9:c4:03:0c:7a:78:7f:a9:3f:57 Sep 6 09:43:18 Tower login[4237]: ROOT LOGIN on '/dev/tty1' Sep 6 09:58:18 Tower kernel: ------------[ cut here ]------------ Sep 6 09:58:18 Tower kernel: kernel BUG at drivers/md/unraid.c:461! Sep 6 09:58:18 Tower kernel: invalid opcode: 0000 [#1] SMP Sep 6 09:58:18 Tower kernel: Modules linked in: veth xt_nat ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 nf_nat iptable_filter ip_tables md_mod btrfs xor lzo_compress raid6_pq vhost_net vhost tun i2c_i801 e1000e ptp pps_core ahci libahci Sep 6 09:58:18 Tower kernel: CPU: 0 PID: 6980 Comm: kworker/0:5 Not tainted 3.16.0-unRAID #6 Sep 6 09:58:18 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./AD2550R/U3S3, BIOS P1.20 07/22/2013 Sep 6 09:58:18 Tower kernel: Workqueue: md submit_flush_data [md_mod] Sep 6 09:58:18 Tower kernel: task: ffff8800d80e87e0 ti: ffff88010ad10000 task.ti: ffff88010ad10000 Sep 6 09:58:18 Tower kernel: RIP: 0010:[<ffffffffa018bd23>] [<ffffffffa018bd23>] unraid_make_request+0x34a/0x51e [md_mod] Sep 6 09:58:18 Tower kernel: RSP: 0018:ffff88010ad13d90 EFLAGS: 00010286 Sep 6 09:58:18 Tower kernel: RAX: ffff8800d8d4bc90 RBX: ffff88008b671500 RCX: 0000000000000004 Sep 6 09:58:18 Tower kernel: RDX: 000000000000bbe4 RSI: 0000000000000001 RDI: ffff8800d8d4bbb4 Sep 6 09:58:18 Tower kernel: RBP: ffff88010ad13de0 R08: 0000000000000004 R09: ffff880119ae1808 Sep 6 09:58:18 Tower kernel: R10: ffff8800d8d4bb80 R11: 0000000000000003 R12: 0000000000000001 Sep 6 09:58:18 Tower kernel: R13: ffff8800d7c4a680 R14: ffff8800d8d4bb70 R15: ffff8800d7c4a680 Sep 6 09:58:18 Tower kernel: FS: 0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000 Sep 6 09:58:18 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Sep 6 09:58:18 Tower kernel: CR2: 00002acb30fb8000 CR3: 000000006e7dc000 CR4: 00000000000007f0 Sep 6 09:58:18 Tower kernel: Stack: Sep 6 09:58:18 Tower kernel: 0000000000000001 ffff8800d8d4bc90 ffff8800d8d4bbb4 00000000748b4623 Sep 6 09:58:18 Tower kernel: 00000000748b45e0 ffff88008fe07420 ffff88008b671500 ffff8800d7c4a680 Sep 6 09:58:18 Tower kernel: ffff88008fe07400 0000000000000000 ffff88010ad13e10 ffffffffa018bf74 Sep 6 09:58:18 Tower kernel: Call Trace: Sep 6 09:58:18 Tower kernel: [<ffffffffa018bf74>] submit_flush_data+0x7d/0x93 [md_mod] Sep 6 09:58:18 Tower kernel: [<ffffffff81052ea9>] process_one_work+0x20d/0x332 Sep 6 09:58:18 Tower kernel: [<ffffffff81053b5d>] worker_thread+0x392/0x507 Sep 6 09:58:18 Tower kernel: [<ffffffff810537cb>] ? cancel_delayed_work_sync+0x10/0x10 Sep 6 09:58:18 Tower kernel: [<ffffffff810588de>] kthread+0xd6/0xde Sep 6 09:58:18 Tower kernel: [<ffffffff81058808>] ? kthread_create_on_node+0x168/0x168 Sep 6 09:58:18 Tower kernel: [<ffffffff815582fc>] ret_from_fork+0x7c/0xb0 Sep 6 09:58:18 Tower kernel: [<ffffffff81058808>] ? kthread_create_on_node+0x168/0x168 Sep 6 09:58:18 Tower kernel: Code: 02 0f 0b 48 89 45 b8 49 8d 46 44 48 89 c7 48 89 45 c0 e8 cb be 3c e1 48 8b 45 b8 48 83 78 18 00 74 02 0f 0b 48 83 78 20 00 74 02 <0f> 0b 48 83 78 28 00 74 02 0f 0b 48 8b 50 08 80 e6 df 48 89 50 Sep 6 09:58:18 Tower kernel: RIP [<ffffffffa018bd23>] unraid_make_request+0x34a/0x51e [md_mod] Sep 6 09:58:18 Tower kernel: RSP <ffff88010ad13d90> Sep 6 09:58:18 Tower kernel: ---[ end trace 33fcc705af08aae6 ]--- Sep 6 09:58:18 Tower kernel: BUG: unable to handle kernel paging request at ffffffffffffffd8 Sep 6 09:58:48 Tower kernel: IP: [<ffffffff81058e81>] kthread_data+0xc/0x11 Sep 6 09:58:48 Tower kernel: PGD 1742067 PUD 1744067 PMD 0 Sep 6 09:58:48 Tower kernel: Oops: 0000 [#2] SMP Sep 6 09:58:48 Tower kernel: Modules linked in: veth xt_nat ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 nf_nat iptable_filter ip_tables md_mod btrfs xor lzo_compress raid6_pq vhost_net vhost tun i2c_i801 e1000e ptp pps_core ahci libahci Sep 6 09:58:48 Tower kernel: CPU: 0 PID: 6980 Comm: kworker/0:5 Tainted: G D 3.16.0-unRAID #6 Sep 6 09:58:48 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./AD2550R/U3S3, BIOS P1.20 07/22/2013 Sep 6 09:58:48 Tower kernel: task: ffff8800d80e87e0 ti: ffff88010ad10000 task.ti: ffff88010ad10000 Sep 6 09:58:48 Tower kernel: RIP: 0010:[<ffffffff81058e81>] [<ffffffff81058e81>] kthread_data+0xc/0x11 Sep 6 09:58:48 Tower kernel: RSP: 0018:ffff88010ad139d8 EFLAGS: 00010002 Sep 6 09:58:48 Tower kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff818f1000 Sep 6 09:58:48 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800d80e87e0 Sep 6 09:58:48 Tower kernel: RBP: ffff88010ad139e8 R08: 00000000001016ff R09: 000000000000b88e Sep 6 09:58:48 Tower kernel: R10: 0000000000002a17 R11: 000000000000ba23 R12: 0000000000000000 Sep 6 09:58:48 Tower kernel: R13: ffff8800d80e8c08 R14: ffff88010ad13b40 R15: ffff8800d80e87e0 Sep 6 09:58:48 Tower kernel: FS: 0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000 Sep 6 09:58:48 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Sep 6 09:58:48 Tower kernel: CR2: 0000000000000028 CR3: 00000000d9e8c000 CR4: 00000000000007f0 Sep 6 09:58:48 Tower kernel: Stack: Sep 6 09:58:48 Tower kernel: ffffffff81053dcb ffff88011fc12c40 ffff88010ad13ae8 ffffffff81554a3c Sep 6 09:58:48 Tower kernel: ffff88010ad13fd8 ffff8800d80e87e0 0000000000012c40 ffff88011ac04240 Sep 6 09:58:48 Tower kernel: ffff88010ad13a50 0000000000000046 0000000000000001 ffff8800893f9900 <truncated> There is a bunch of noise after the start of the crash that I had to truncate, but the complete syslog is attached. Note the driver fault cited for unraid.c: Sep 6 09:58:18 Tower kernel: kernel BUG at drivers/md/unraid.c:461! Sep 6 09:58:18 Tower kernel: invalid opcode: 0000 [#1] SMP this is on 6b8, but the crashes started for me in 6b7 while migrating drives to newly xfs formatted drives. Never had a crash in 6b6 or earlier. syslog_9_6.txt
September 6, 201411 yr Thank you for the very detailed report! Yes I found a race condition in the unraid driver. This only happens when xfs is the target (but bug is not with xfs, it's with unraid driver). Fix coming...
September 14, 201411 yr Author I haven't heard from L-T that this bug is fixed, but after an upgrade to 6b9 I have been able to transfer 1.2 TB of data across various drives, during a parity check. I'm not sure if this issue should be marked 'solved' but it is a good sign. pducharme and jphipps both had similar issues that might be worth comparing. And by the way, the SYBA 4x SATA III card that was acting up was replaced by an identical unit and seems to be working well; I have no complaints about this inexpensive card. Dennis
September 14, 201411 yr I haven't heard from L-T that this bug is fixed, but after an upgrade to 6b9 I have been able to transfer 1.2 TB of data across various drives, during a parity check. It's this line in the -beta9 change log: - unraid: fix overlapped i/o detection race condition Marking this solved.
Archived
This topic is now archived and is closed to further replies.