hmatos Posted May 30, 2012
I have been running great under all versions of 5.0 (currently at RC3) until a recent hardware swap. I'm hoping someone can point me to what's causing the issue.

I recently moved a functioning array into the configuration below. I've only been able to successfully generate parity once. Running preclear_disk.sh on two drives for which smartctl reported pending errors would cause the system to lock up (except for the unix prompt). I've since given up on those two drives and removed them from the array (plugged in, but not part of the array).

The error below has been a constant throughout the build-out process. The result is always the same: UnRAID (emhttp) locks up and the only thing I can do is manually power down the server.

May 29 22:20:27 Media-Server kernel: sas: command 0xec464e40, task 0xec4a72c0, timed out: BLK_EH_NOT_HANDLED
May 29 22:20:27 Media-Server kernel: sas: Enter sas_scsi_recover_host
May 29 22:20:27 Media-Server kernel: sas: trying to find task 0xec4a72c0
May 29 22:20:27 Media-Server kernel: sas: sas_scsi_find_task: aborting task 0xec4a72c0
May 29 22:20:27 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1818:<7>mv_abort_task() mvi=f7680000 task=ec4a72c0 slot=f76916a8 slot_idx=x4
May 29 22:20:27 Media-Server kernel: sas: sas_scsi_find_task: querying task 0xec4a72c0
May 29 22:20:27 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1747:mvs_query_task:rc= 5
May 29 22:20:27 Media-Server kernel: sas: sas_scsi_find_task: task 0xec4a72c0 failed to abort
May 29 22:20:27 Media-Server kernel: sas: task 0xec4a72c0 is not at LU: I_T recover
May 29 22:20:27 Media-Server kernel: sas: I_T nexus reset for dev 0000000000000000
May 29 22:20:27 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x89800.
May 29 22:20:27 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x1001
May 29 22:20:27 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy0 Unplug Notice
May 29 22:20:27 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x199800.
May 29 22:20:27 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x1081
May 29 22:20:28 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x199800.
May 29 22:20:28 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x10000
May 29 22:20:28 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[0]
May 29 22:20:28 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1338:port 0 attach dev info is 0
May 29 22:20:28 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1340:port 0 attach sas addr is 0
May 29 22:20:28 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 379:phy 0 byte dmaded.
May 29 22:20:28 Media-Server kernel: sas: sas_form_port: phy0 belongs to port0 already(1)!
May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[0]:rc= 0
May 29 22:20:30 Media-Server kernel: sas: I_T 0000000000000000 recovered
May 29 22:20:30 Media-Server kernel: sas: sas_ata_task_done: SAS error 8d
May 29 22:20:30 Media-Server kernel: ata15: sas eh calling libata port error handler
May 29 22:20:30 Media-Server kernel: ata15.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 t0
May 29 22:20:30 Media-Server kernel: ata15.00: failed command: CHECK POWER MODE
May 29 22:20:30 Media-Server kernel: ata15.00: cmd e5/00:00:00:00:00/00:00:00:00:00/40 tag 0
May 29 22:20:30 Media-Server kernel: res 01/04:ff:00:00:00/00:00:00:00:00/40 Emask 0x3 (HSM violation)
May 29 22:20:30 Media-Server kernel: ata15.00: status: { ERR }
May 29 22:20:30 Media-Server kernel: ata15.00: error: { ABRT }
May 29 22:20:30 Media-Server kernel: ata15: hard resetting link
May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x89800.
May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x1001
May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy0 Unplug Notice
May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x199800.
May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x1081
May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x199800.
May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x10000
May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[0]
May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1338:port 0 attach dev info is 0
May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1340:port 0 attach sas addr is 0
May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 379:phy 0 byte dmaded.
May 29 22:20:30 Media-Server kernel: sas: sas_form_port: phy0 belongs to port0 already(1)!
May 29 22:20:32 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[0]:rc= 0
May 29 22:20:32 Media-Server kernel: sas: sas_ata_hard_reset: Found ATA device.
May 29 22:20:32 Media-Server kernel: ata15.00: configured for UDMA/133
May 29 22:20:32 Media-Server kernel: ata15: EH complete
May 29 22:20:32 Media-Server kernel: ata16: sas eh calling libata port error handler
May 29 22:20:32 Media-Server kernel: ata17: sas eh calling libata port error handler
May 29 22:20:32 Media-Server kernel: ata18: sas eh calling libata port error handler
May 29 22:20:32 Media-Server kernel: ata19: sas eh calling libata port error handler
May 29 22:20:32 Media-Server kernel: ata20: sas eh calling libata port error handler
May 29 22:20:32 Media-Server kernel: ata21: sas eh calling libata port error handler
May 29 22:20:32 Media-Server kernel: ata22: sas eh calling libata port error handler
May 29 22:20:32 Media-Server kernel: sas: --- Exit sas_scsi_recover_host
May 29 22:20:52 Media-Server kernel: sas: sas_ata_task_done: SAS error 2

Hardware:
NORCO RPC-4224 4U Case
SUPERMICRO MBD-X8SIA-F-O w/ Pentium G6950
4x Crucial 4GB 240-Pin DDR3 SDRAM ECC Unbuffered DDR3 1333 (16GB)
2x AOC-SASLP-MV8
No Plug-ins
13 Disks (varying size), Parity Drive is 3TB (disabled), all connected to the SASLP cards

/dev/md10 932G 496G 437G 54% /mnt/disk10
/dev/md9 932G 886G 46G 96% /mnt/disk9
/dev/md6 1.9T 378G 1.5T 21% /mnt/disk6
/dev/md11 1.4T 1.4T 39G 98% /mnt/disk11
/dev/md7 1.9T 1.8T 73G 97% /mnt/disk7
/dev/md8 1.9T 1.8T 114G 94% /mnt/disk8
/dev/md1 1.9T 1.8T 56G 98% /mnt/disk1
/dev/md5 1.9T 366G 1.5T 20% /mnt/disk5
/dev/md3 1.9T 1.8T 55G 98% /mnt/disk3
/dev/md12 1.4T 1.3T 91G 94% /mnt/disk12
/dev/md4 1.9T 1.8T 67G 97% /mnt/disk4
/dev/md2 1.9T 1.8T 102G 95% /mnt/disk2
/dev/md13 932G 614G 319G 66% /mnt/disk13

Any help would be greatly appreciated.

BTW, can I just go back to 4.7 and just use the 3TB as a 2TB drive?

- hm

unRAID-Logs.zip
RobJ Posted May 31, 2012
It looks to me like a bug in the SAS module. The disk issue occurs immediately after some task confusion and an unhandled error, so may be a consequence of the confused state of the module. This of course is not very good news. If it really is a bug, there is not much Tom can do immediately, and you will have to wait for a future mvsas upgrade. It's possible though, that there is some incompatibility between your 3TB drive and the current mvsas module. Look for a firmware upgrade for that card. You mentioned that it locks up, perhaps locks up emhttp, but this syslog seems to show that the underlying UnRAID engine is still operating OK, with a duplicate reported more than 6 hours later. You might want to assume that UnRAID may not be accessible through the Web, but still may be safely shut down through Ctl-Alt-Del or the power button.
cyrnel Posted May 31, 2012
This looks too familiar. Can you give us a quick view of processes? (top -n 1) If you see zombies, what are they? (ps -el | grep Z)
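A note on the zombie check: `ps -el | grep Z` also matches any line that merely contains a capital Z, including the header's SZ column, so it can be noisy. The sketch below filters on the process state column instead. It assumes a procps-style `ps` that accepts `-eo stat,pid,ppid,comm`; the helper name `list_zombies` is made up for this example and is not part of unRAID.

```shell
#!/bin/sh
# Hypothetical helper (not an unRAID tool): print "PID PPID COMMAND" for
# every zombie process. Reads `ps -eo stat,pid,ppid,comm` output on stdin,
# header line included. Only the first word of the command name is printed.
list_zombies() {
    awk 'NR > 1 && $1 ~ /^Z/ { print $2, $3, $4 }'
}

# Typical use on a live system (assumes a procps-style ps):
#   ps -eo stat,pid,ppid,comm | list_zombies
```

The parent PID is printed because a zombie only goes away once its parent reaps it, so the parent is usually the process worth looking at.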
hmatos Posted June 1, 2012
It seems to be the known issue where you have to go back to 5.0 Beta 11. Going to this version finally fixed the problem. I guess the new kernel doesn't like the SAS cards? BTW, I can log in to the system and execute just about any command. It will NOT let me do anything like umount or reboot or shutdown; it just hangs. The clients also die while trying to access any of the shares.
limetech Posted June 1, 2012
How confident are you that -beta11 works, and -beta12 and everything after does not recover correctly from these driver reported errors? In other words, are you sure -beta12 introduced a regression (i.e., something changed where this problem appeared when it didn't before)?
moose Posted June 2, 2012
I know RobJ mentioned the possibility of a firmware update, but just "for the record" what firmware revision are you running on the AOC-SASLP-MV8 cards?
hmatos Posted June 2, 2012
I'm not too confident - I'm pretty sure I already had Beta 12 installed and it did the same thing. I do know that RC2 and RC3 definitely caused the same issue.
hmatos Posted June 2, 2012
I believe it's .21 -- I can't bring my array down right now, but the moment I do I'll respond with the exact version.
momoz Posted June 4, 2012
Since RC3 I have the same thing (see below). It causes unraid to be totally unresponsive from the web-gui and SHARES but I can telnet in and see the syslog.

I have the same card: AOC-SASLP-MV8

Jun 4 00:59:38 unraid kernel: mdcmd (119): check CORRECT
Jun 4 00:59:38 unraid kernel: md: recovery thread woken up ...
Jun 4 00:59:38 unraid kernel: md: recovery thread checking parity...
Jun 4 00:59:39 unraid kernel: md: using 1536k window, over a total of 2930266532 blocks.
Jun 4 02:16:29 unraid kernel: sas: command 0xeebeb3c0, task 0xf1ab0140, timed out: BLK_EH_NOT_HANDLED
Jun 4 02:16:29 unraid kernel: sas: Enter sas_scsi_recover_host
Jun 4 02:16:29 unraid kernel: sas: trying to find task 0xf1ab0140
Jun 4 02:16:29 unraid kernel: sas: sas_scsi_find_task: aborting task 0xf1ab0140
Jun 4 02:16:29 unraid kernel: drivers/scsi/mvsas/mv_sas.c 1818:<7>mv_abort_task() mvi=f7580000 task=f1ab0140 slot=f759160c slot_idx=x1
Jun 4 02:16:29 unraid kernel: sas: sas_scsi_find_task: querying task 0xf1ab0140
Jun 4 02:16:29 unraid kernel: drivers/scsi/mvsas/mv_sas.c 1747:mvs_query_task:rc= 5
Jun 4 02:16:29 unraid kernel: sas: sas_scsi_find_task: task 0xf1ab0140 failed to abort
Jun 4 02:16:29 unraid kernel: sas: task 0xf1ab0140 is not at LU: I_T recover
Jun 4 02:16:29 unraid kernel: sas: I_T nexus reset for dev 0600000000000000
Jun 4 02:16:29 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 6 ctrl sts=0x89800.
Jun 4 02:16:29 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 6 irq sts = 0x1001
Jun 4 02:16:29 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy6 Unplug Notice
Jun 4 02:16:29 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 6 ctrl sts=0x199800.
Jun 4 02:16:29 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 6 irq sts = 0x1081
Jun 4 02:16:30 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 6 ctrl sts=0x199800.
Jun 4 02:16:30 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 6 irq sts = 0x10000
Jun 4 02:16:30 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[6]
Jun 4 02:16:30 unraid kernel: drivers/scsi/mvsas/mv_sas.c 1338:port 6 attach dev info is 0
Jun 4 02:16:30 unraid kernel: drivers/scsi/mvsas/mv_sas.c 1340:port 6 attach sas addr is 6
Jun 4 02:16:30 unraid kernel: drivers/scsi/mvsas/mv_sas.c 379:phy 6 byte dmaded.
Jun 4 02:16:30 unraid kernel: sas: sas_form_port: phy6 belongs to port5 already(1)!
Jun 4 02:16:32 unraid kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[5]:rc= 0
Jun 4 02:16:32 unraid kernel: sas: I_T 0600000000000000 recovered
Jun 4 02:16:32 unraid kernel: sas: sas_ata_task_done: SAS error 8d
Jun 4 02:16:32 unraid kernel: ata9: sas eh calling libata port error handler
Jun 4 02:16:32 unraid kernel: ata10: sas eh calling libata port error handler
Jun 4 02:16:32 unraid kernel: ata11: sas eh calling libata port error handler
Jun 4 02:16:32 unraid kernel: ata12: sas eh calling libata port error handler
Jun 4 02:16:32 unraid kernel: ata13: sas eh calling libata port error handler
Jun 4 02:16:32 unraid kernel: ata14: sas eh calling libata port error handler
Jun 4 02:16:32 unraid kernel: ata14.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 t0
Jun 4 02:16:32 unraid kernel: ata14.00: failed command: READ FPDMA QUEUED
Jun 4 02:16:32 unraid kernel: ata14.00: cmd 60/f8:00:e7:bd:d7/01:00:2d:00:00/40 tag 0 ncq 258048 in
Jun 4 02:16:32 unraid kernel: res 41/40:00:3a:be:d7/00:00:2d:00:00/40 Emask 0x409 (media error) <F>
Jun 4 02:16:32 unraid kernel: ata14.00: status: { DRDY ERR }
Jun 4 02:16:32 unraid kernel: ata14.00: error: { UNC }
Jun 4 02:16:32 unraid kernel: ata14.00: configured for UDMA/133
Jun 4 02:16:32 unraid kernel: ata14: EH complete
Jun 4 02:16:32 unraid kernel: ata15: sas eh calling libata port error handler
Jun 4 02:16:32 unraid kernel: sas: --- Exit sas_scsi_recover_host
Jun 4 02:16:32 unraid kernel: sas: sas_ata_task_done: SAS error 2
chickensoup Posted June 6, 2012
I'm not sure if it will be required but to save the wait, could you please post the full syslog?

cp /var/log/syslog /boot/syslog.txt
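Since several people in this thread end up hard-powering the server, it can help to save a copy of the syslog under a timestamped name each time, so an earlier capture is not overwritten. This is only a sketch: it assumes, as on a stock unRAID install, that /var/log is held in RAM and /boot is the flash drive. The helper name `save_syslog` and the file name pattern are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: copy a log file into a destination directory under
# a timestamped name and print the path of the copy.
# Defaults mirror the command quoted in the thread (/var/log/syslog -> /boot).
save_syslog() {
    src=${1:-/var/log/syslog}
    dest_dir=${2:-/boot}
    dest="$dest_dir/syslog-$(date +%Y%m%d-%H%M%S).txt"
    cp "$src" "$dest" && echo "$dest"
}

# Typical use from a telnet session while the box is still reachable:
#   save_syslog
```

Running it right after the first BLK_EH_NOT_HANDLED message appears, before touching any array drive, gives the best chance of keeping the evidence.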
limetech Posted June 6, 2012
I will be posting a special release called 5.0-rc4-scst-1, which uses a different mvsas driver. Watch for that announcement and then let's see how it behaves in your system.
MyKroFt Posted June 7, 2012
I'm waiting also. I mainly get this error in the first 10% of my parity check, about 50% of the time. The last time, 2 nights ago, it was trying to spin down disk6, got the error, and red-balled my drive. I had to remove it, re-add it, and let parity (which had finished correctly the night before) rebuild the drive. This has happened 3 times in the last 2-4 months.

Myk
poopiepants Posted June 16, 2012
Same problem here when I was copying files.
whiteatom Posted June 16, 2012
Same issue. How do we check the firmware version on the MV8 card?
cyrnel Posted June 16, 2012
The card will display its BIOS version at boot, just before it scans for and spins up drives. The most recent AOC-SASLP-MV8 fw is 3.1.0.21 which I'm using but I still have the BLK_EH_NOT_HANDLED problem sometimes. It happened less after I increased my sleep timers and it hasn't happened at all since I disabled drive sleeping. Would be interesting to see if others see the same effect from changing their drive sleep timers. I'd love to test the alternate rc4 for us but I'm very close to just swapping in an LSI card. I have several lying around and the problems with those seem to have been resolved.
MyKroFt Posted June 16, 2012
I have 3.1.0.21 as well on 3 cards, and I really don't want to spend the $$ to go to something else. Still waiting on an RC with updated mvsas drivers to test.

Myk
whiteatom Posted June 18, 2012
Yeah, 3.1.0.21 as well. When you say disable drive sleeping, do you mean not spinning them down? I'd certainly be willing to try that. I don't remember this problem when I was on b14. Now I'm seeing the same issue in my syslog:

Jun 18 13:36:20 knox kernel: sas: command 0xf2ce0300, task 0xf272da40, timed out: BLK_EH_NOT_HANDLED
cyrnel Posted June 18, 2012
Yes, though it defeats one of unraid's main benefits. In my case the failures seemed to follow this sequence:

1. system fine, I leave
2. BLK_EH_NOT_HANDLED
3. normal timer-based drive spin down
4. I'm back. The first SMB access times out. hdparm will have gone zombie, presumably when waking drives. Most drive access blocked. Telnet still functional.

My rc4 system has been up since the 6th. In that time syslog shows two BLK_EH_NOT_HANDLED errors but the system has remained up. I don't know if it's luck, if these errors happened to be recoverable, or if it's because drives aren't being spun down/up. And I don't have enough failures saved to be certain of the sequence. When the drives went away previously I didn't make the effort to copy syslog to the telnet session.
whiteatom Posted June 19, 2012
Exactly the same issue. After the error, the system works perfectly until I ask for a drive to be spun up; then the shares drop off the network and it all goes south. As long as I don't touch an array drive in telnet, it continues to work, although I have to use the IP to get in as the host name seems to stop working. I also forgot to copy a log, but there was NOTHING in it, I swear: just the BLK_EH_NOT_HANDLED and the spin down.

I have gone back to b14 just now. I didn't see this issue in 75 days of running b14, but I'm getting it every 2-3 days now. I can't prove it appeared since then, but b14 has been the most stable for me so far.

whiteatom

UPDATE: 48 hours of uptime - no errors yet.
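To compare releases on something firmer than memory ("every 2-3 days now"), the timeouts can be counted per day straight from a saved syslog. A minimal sketch, assuming the standard syslog layout seen in the excerpts in this thread (month and day in the first two fields); the function name `timeouts_per_day` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: count "timed out: BLK_EH_NOT_HANDLED" events per
# calendar day in the syslog file given as $1.
# Prints one "Mon DD count" line per day that has at least one event.
timeouts_per_day() {
    awk '/timed out: BLK_EH_NOT_HANDLED/ { count[$1 " " $2]++ }
         END { for (day in count) print day, count[day] }' "$1" | sort
}

# Typical use:
#   timeouts_per_day /boot/syslog.txt
```

The output is sorted as plain text, so days are not guaranteed to appear in calendar order across months or between single and double digit days.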
TheWombat Posted June 21, 2012
Any update on when the special release (5.0-rc4-scst-1) is being posted? Or did I miss the notification?

thanks
Alex
chickensoup Posted June 21, 2012
I think Tom was working on this release when he started having site/server issues. I'm sure he will bump this thread once it is available. I have been keeping an eye on the announcement board and the RC4 thread so I will let you know if I see it pop up.
whiteatom Posted June 22, 2012
Few more days, and no errors at all. It sure seems like this issue is restricted to one of the builds after b14.
cyrnel Posted June 23, 2012
Likewise, no problems here with rc4 with all my drives always spinning. I have a BLK_EH_NOT_HANDLED but nothing bad happened. The next activity was mover several hours later. System's been up since the 6th.

sas_scsi_recover_host took 4 seconds to complete. Can't help but wonder what's happening with interrupts during that time, and what happens if hdparm or anything drive-related fires.

Jun 18 01:40:27 Tower1 kernel: sas: command 0xf2654540, task 0xf0237680, timed out: BLK_EH_NOT_HANDLED (Drive related)
Jun 18 01:40:27 Tower1 kernel: sas: Enter sas_scsi_recover_host (Drive related)
Jun 18 01:40:27 Tower1 kernel: sas: trying to find task 0xf0237680 (Drive related)
Jun 18 01:40:27 Tower1 kernel: sas: sas_scsi_find_task: aborting task 0xf0237680 (Drive related)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1818:<7>mv_abort_task() mvi=f74a0000 task=f0237680 slot=f74b1640 slot_idx=x2 (System)
Jun 18 01:40:27 Tower1 kernel: sas: sas_scsi_find_task: querying task 0xf0237680 (Drive related)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1747:mvs_query_task:rc= 5 (System)
Jun 18 01:40:27 Tower1 kernel: sas: sas_scsi_find_task: task 0xf0237680 failed to abort (Minor Issues)
Jun 18 01:40:27 Tower1 kernel: sas: task 0xf0237680 is not at LU: I_T recover (Drive related)
Jun 18 01:40:27 Tower1 kernel: sas: I_T nexus reset for dev 0300000000000000 (Drive related)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x89800. (System)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1001001 (System)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy3 Unplug Notice (System)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1011081 (System)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[3] (System)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2278:plugin interrupt but phy3 is gone (System)
Jun 18 01:40:29 Tower1 kernel: mvsas 0000:01:00.0: Phy3 : No sig fis (Drive related)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2139:phy3 Attached Device (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x89800. (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1001 (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy3 Unplug Notice (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x81 (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x10000 (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[3] (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1338:port 3 attach dev info is 400 (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1340:port 3 attach sas addr is 3 (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 379:phy 3 byte dmaded. (System)
Jun 18 01:40:29 Tower1 kernel: sas: sas_form_port: phy3 belongs to port0 already(1)! (Drive related)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[0]:rc= 0 (System)
Jun 18 01:40:29 Tower1 kernel: sas: I_T 0300000000000000 recovered (Drive related)
Jun 18 01:40:29 Tower1 kernel: sas: sas_ata_task_done: SAS error 8d (Errors)
Jun 18 01:40:29 Tower1 kernel: ata7: sas eh calling libata port error handler (Errors)
Jun 18 01:40:29 Tower1 kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 t0 (Errors)
Jun 18 01:40:29 Tower1 kernel: ata7.00: failed command: CHECK POWER MODE (Minor Issues)
Jun 18 01:40:29 Tower1 kernel: ata7.00: cmd e5/00:00:00:00:00/00:00:00:00:00/40 tag 0 (Drive related)
Jun 18 01:40:29 Tower1 kernel: res 01/04:04:b8:07:a7/00:00:16:00:00/40 Emask 0x3 (HSM violation) (Errors)
Jun 18 01:40:29 Tower1 kernel: ata7.00: status: { ERR } (Drive related)
Jun 18 01:40:29 Tower1 kernel: ata7.00: error: { ABRT } (Errors)
Jun 18 01:40:29 Tower1 kernel: ata7: hard resetting link (Minor Issues)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x89800. (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1001 (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy3 Unplug Notice (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x11081 (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[3] (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2278:plugin interrupt but phy3 is gone (System)
Jun 18 01:40:31 Tower1 kernel: mvsas 0000:01:00.0: Phy3 : No sig fis (Drive related)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2139:phy3 Attached Device (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x89800. (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1001 (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy3 Unplug Notice (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x81 (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x10000 (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[3] (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1338:port 3 attach dev info is 400 (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1340:port 3 attach sas addr is 3 (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 379:phy 3 byte dmaded. (System)
Jun 18 01:40:31 Tower1 kernel: sas: sas_form_port: phy3 belongs to port0 already(1)! (Drive related)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[0]:rc= 0 (System)
Jun 18 01:40:31 Tower1 kernel: sas: sas_ata_hard_reset: Found ATA device. (Drive related)
Jun 18 01:40:31 Tower1 kernel: ata7.00: configured for UDMA/133 (Drive related)
Jun 18 01:40:31 Tower1 kernel: ata7: EH complete (Drive related)
Jun 18 01:40:31 Tower1 kernel: ata8: sas eh calling libata port error handler (Errors)
Jun 18 01:40:31 Tower1 kernel: ata9: sas eh calling libata port error handler (Errors)
Jun 18 01:40:31 Tower1 kernel: ata10: sas eh calling libata port error handler (Errors)
Jun 18 01:40:31 Tower1 kernel: ata11: sas eh calling libata port error handler (Errors)
Jun 18 01:40:31 Tower1 kernel: sas: --- Exit sas_scsi_recover_host (Drive related)
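The "took 4 seconds" figure can be read off the timestamps of the Enter and Exit sas_scsi_recover_host lines (01:40:27 to 01:40:31 in the excerpt above). The sketch below does the same subtraction for every recovery pass in a saved syslog. It assumes one log entry per line with the time in the third field, and that a pass does not cross midnight; the function name `recovery_durations` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: for each Enter/Exit sas_scsi_recover_host pair in
# the syslog file given as $1, print the exit timestamp and the elapsed
# seconds. Assumes a pass does not span midnight.
recovery_durations() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        /sas: Enter sas_scsi_recover_host/ { start = secs($3); in_eh = 1 }
        /sas: --- Exit sas_scsi_recover_host/ && in_eh {
            print $1, $2, $3, (secs($3) - start) "s"
            in_eh = 0
        }
    ' "$1"
}

# Typical use:
#   recovery_durations /boot/syslog.txt
```

Syslog timestamps have one-second resolution, so the result is only accurate to about a second either way.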
mtruffa Posted June 25, 2012
I was having the same problem. I tried every release back to b11. I have been on that now for 3 days without a crash or the error. So it is something since then that is causing the problem. I am going to stay here (b11) for a while until the problem can be narrowed down.

Mike
SuperW2 Posted June 25, 2012
I think I'm seeing the same or a similar thing. I've had to hard power off twice in the last 3 days: the shares become inaccessible and I cannot access the GUI. I can telnet in, so I know the network is good, but I cannot shut down, power off, etc.

Full syslog attached, but these were the last few lines before I had to restart.

Jun 24 22:29:51 media kernel: sas: command 0xed68ccc0, task 0xf2c4a000, timed out: BLK_EH_NOT_HANDLED
Jun 24 22:29:51 media kernel: sas: Enter sas_scsi_recover_host
Jun 24 22:29:51 media kernel: sas: trying to find task 0xf2c4a000
Jun 24 22:29:51 media kernel: sas: sas_scsi_find_task: aborting task 0xf2c4a000
Jun 24 22:29:51 media kernel: sas: sas_scsi_find_task: querying task 0xf2c4a000
Jun 24 22:29:51 media kernel: drivers/scsi/mvsas/mv_sas.c 1747:mvs_query_task:rc= 5
Jun 24 22:29:51 media kernel: sas: sas_scsi_find_task: task 0xf2c4a000 failed to abort
Jun 24 22:29:51 media kernel: sas: task 0xf2c4a000 is not at LU: I_T recover
Jun 24 22:29:51 media kernel: sas: I_T nexus reset for dev 0200000000000000
Jun 24 22:29:51 media kernel: sas: sas_form_port: phy2 belongs to port2 already(1)!
Jun 24 22:29:53 media kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[2]:rc= 0
Jun 24 22:29:53 media kernel: sas: I_T 0200000000000000 recovered
Jun 24 22:29:53 media kernel: sas: sas_ata_task_done: SAS error 8d
Jun 24 22:29:53 media kernel: ata13: sas eh calling libata port error handler
Jun 24 22:29:53 media kernel: ata14: sas eh calling libata port error handler
Jun 24 22:29:53 media kernel: ata15: sas eh calling libata port error handler
Jun 24 22:29:53 media kernel: ata15.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 t0
Jun 24 22:29:53 media kernel: ata15.00: failed command: CHECK POWER MODE
Jun 24 22:29:53 media kernel: ata15.00: cmd e5/00:00:00:00:00/00:00:00:00:00/40 tag 0
Jun 24 22:29:53 media kernel: res 01/04:ff:00:00:00/00:00:00:00:00/40 Emask 0x3 (HSM violation)
Jun 24 22:29:53 media kernel: ata15.00: status: { ERR }
Jun 24 22:29:53 media kernel: ata15.00: error: { ABRT }
Jun 24 22:29:53 media kernel: ata15: hard resetting link
Jun 24 22:29:53 media kernel: sas: sas_form_port: phy2 belongs to port2 already(1)!
Jun 24 22:29:56 media kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[2]:rc= 0
Jun 24 22:29:56 media kernel: sas: sas_ata_hard_reset: Found ATA device.
Jun 24 22:29:56 media kernel: ata15.00: configured for UDMA/133
Jun 24 22:29:56 media kernel: ata15: EH complete
Jun 24 22:29:56 media kernel: ata16: sas eh calling libata port error handler
Jun 24 22:29:56 media kernel: ata17: sas eh calling libata port error handler
Jun 24 22:29:56 media kernel: ata18: sas eh calling libata port error handler
Jun 24 22:29:56 media kernel: ata19: sas eh calling libata port error handler
Jun 24 22:29:56 media kernel: ata20: sas eh calling libata port error handler
Jun 24 22:29:56 media kernel: sas: --- Exit sas_scsi_recover_host
Jun 24 22:30:56 media kernel: sas: sas_ata_task_done: SAS error 2

syslog.txt