5.0RC3 - BLK_EH_NOT_HANDLED causes freeze

hmatos · May 30, 2012

I have been running fine under every 5.0 version (currently RC3) until a recent hardware swap, and I'm hoping someone can point me to the cause. I recently moved a functioning array into the configuration below. Since then I have only been able to generate parity successfully once. Running preclear_disk.sh on two drives for which smartctl reported pending errors locked up the system (except for the Unix prompt). I have since given up on those two drives and removed them from the array (they are still plugged in, but not part of the array). The error below has been constant throughout the build-out; the result is always the same: UnRAID (emhttp) locks up and the only thing I can do is manually power down the server.

May 29 22:20:27 Media-Server kernel: sas: command 0xec464e40, task 0xec4a72c0, timed out: BLK_EH_NOT_HANDLED

May 29 22:20:27 Media-Server kernel: sas: Enter sas_scsi_recover_host

May 29 22:20:27 Media-Server kernel: sas: trying to find task 0xec4a72c0

May 29 22:20:27 Media-Server kernel: sas: sas_scsi_find_task: aborting task 0xec4a72c0

May 29 22:20:27 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1818:<7>mv_abort_task() mvi=f7680000 task=ec4a72c0 slot=f76916a8 slot_idx=x4

May 29 22:20:27 Media-Server kernel: sas: sas_scsi_find_task: querying task 0xec4a72c0

May 29 22:20:27 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1747:mvs_query_task:rc= 5

May 29 22:20:27 Media-Server kernel: sas: sas_scsi_find_task: task 0xec4a72c0 failed to abort

May 29 22:20:27 Media-Server kernel: sas: task 0xec4a72c0 is not at LU: I_T recover

May 29 22:20:27 Media-Server kernel: sas: I_T nexus reset for dev 0000000000000000

May 29 22:20:27 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x89800.

May 29 22:20:27 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x1001

May 29 22:20:27 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy0 Unplug Notice

May 29 22:20:27 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x199800.

May 29 22:20:27 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x1081

May 29 22:20:28 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x199800.

May 29 22:20:28 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x10000

May 29 22:20:28 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[0]

May 29 22:20:28 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1338:port 0 attach dev info is 0

May 29 22:20:28 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1340:port 0 attach sas addr is 0

May 29 22:20:28 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 379:phy 0 byte dmaded.

May 29 22:20:28 Media-Server kernel: sas: sas_form_port: phy0 belongs to port0 already(1)!

May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[0]:rc= 0

May 29 22:20:30 Media-Server kernel: sas: I_T 0000000000000000 recovered

May 29 22:20:30 Media-Server kernel: sas: sas_ata_task_done: SAS error 8d

May 29 22:20:30 Media-Server kernel: ata15: sas eh calling libata port error handler

May 29 22:20:30 Media-Server kernel: ata15.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 t0

May 29 22:20:30 Media-Server kernel: ata15.00: failed command: CHECK POWER MODE

May 29 22:20:30 Media-Server kernel: ata15.00: cmd e5/00:00:00:00:00/00:00:00:00:00/40 tag 0

May 29 22:20:30 Media-Server kernel: res 01/04:ff:00:00:00/00:00:00:00:00/40 Emask 0x3 (HSM violation)

May 29 22:20:30 Media-Server kernel: ata15.00: status: { ERR }

May 29 22:20:30 Media-Server kernel: ata15.00: error: { ABRT }

May 29 22:20:30 Media-Server kernel: ata15: hard resetting link

May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x89800.

May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x1001

May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy0 Unplug Notice

May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x199800.

May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x1081

May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 0 ctrl sts=0x199800.

May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 0 irq sts = 0x10000

May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[0]

May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1338:port 0 attach dev info is 0

May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1340:port 0 attach sas addr is 0

May 29 22:20:30 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 379:phy 0 byte dmaded.

May 29 22:20:30 Media-Server kernel: sas: sas_form_port: phy0 belongs to port0 already(1)!

May 29 22:20:32 Media-Server kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[0]:rc= 0

May 29 22:20:32 Media-Server kernel: sas: sas_ata_hard_reset: Found ATA device.

May 29 22:20:32 Media-Server kernel: ata15.00: configured for UDMA/133

May 29 22:20:32 Media-Server kernel: ata15: EH complete

May 29 22:20:32 Media-Server kernel: ata16: sas eh calling libata port error handler

May 29 22:20:32 Media-Server kernel: ata17: sas eh calling libata port error handler

May 29 22:20:32 Media-Server kernel: ata18: sas eh calling libata port error handler

May 29 22:20:32 Media-Server kernel: ata19: sas eh calling libata port error handler

May 29 22:20:32 Media-Server kernel: ata20: sas eh calling libata port error handler

May 29 22:20:32 Media-Server kernel: ata21: sas eh calling libata port error handler

May 29 22:20:32 Media-Server kernel: ata22: sas eh calling libata port error handler

May 29 22:20:32 Media-Server kernel: sas: --- Exit sas_scsi_recover_host

May 29 22:20:52 Media-Server kernel: sas: sas_ata_task_done: SAS error 2

Hardware:

NORCO RPC-4224 4U Case

SUPERMICRO MBD-X8SIA-F-O w/ Pentium G6950

4x Crucial 4GB 240-Pin DDR3 SDRAM ECC Unbuffered DDR3 1333 (16GB)

2x AOC-SASLP-MV8

No Plug-ins

13 Disks (varying size), Parity Drive is 3TB (disabled), all connected to the SASLP cards

Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/md10   932G  496G  437G   54%   /mnt/disk10
/dev/md9    932G  886G  46G    96%   /mnt/disk9
/dev/md6    1.9T  378G  1.5T   21%   /mnt/disk6
/dev/md11   1.4T  1.4T  39G    98%   /mnt/disk11
/dev/md7    1.9T  1.8T  73G    97%   /mnt/disk7
/dev/md8    1.9T  1.8T  114G   94%   /mnt/disk8
/dev/md1    1.9T  1.8T  56G    98%   /mnt/disk1
/dev/md5    1.9T  366G  1.5T   20%   /mnt/disk5
/dev/md3    1.9T  1.8T  55G    98%   /mnt/disk3
/dev/md12   1.4T  1.3T  91G    94%   /mnt/disk12
/dev/md4    1.9T  1.8T  67G    97%   /mnt/disk4
/dev/md2    1.9T  1.8T  102G   95%   /mnt/disk2
/dev/md13   932G  614G  319G   66%   /mnt/disk13

Any help would be greatly appreciated. BTW, can I just go back to 4.7 and just use the 3TB as a 2TB drive?

- hm

unRAID-Logs.zip

RobJ · May 31, 2012

It looks to me like a bug in the SAS module. The disk issue occurs immediately after some task confusion and an unhandled error, so may be a consequence of the confused state of the module. This of course is not very good news. If it really is a bug, there is not much Tom can do immediately, and you will have to wait for a future mvsas upgrade. It's possible though, that there is some incompatibility between your 3TB drive and the current mvsas module. Look for a firmware upgrade for that card.

You mentioned that it locks up, perhaps locks up emhttp, but this syslog seems to show that the underlying UnRAID engine is still operating OK, with a duplicate reported more than 6 hours later. You might want to assume that UnRAID may not be accessible through the Web, but may still be safely shut down through Ctrl-Alt-Del or the power button.
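
A rough way to look for repeats like the one noted above is to scan a saved copy of the syslog for the timeout lines. The Python sketch below is only an illustration: the helper name find_timeouts and the regular expression are assumptions modelled on the log lines posted in this thread, not part of unRAID, and it is meant to be run on any machine with Python 3 against a copy of the log.

```python
import re

# Pattern modelled on the timeout lines posted in this thread (an assumption,
# not an official format), for example:
# "May 29 22:20:27 Media-Server kernel: sas: command 0xec464e40, task 0xec4a72c0, timed out: BLK_EH_NOT_HANDLED"
TIMEOUT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"sas: command (?P<command>0x[0-9a-f]+), task (?P<task>0x[0-9a-f]+), "
    r"timed out: BLK_EH_NOT_HANDLED"
)


def find_timeouts(syslog_text):
    """Return one dict per BLK_EH_NOT_HANDLED timeout line, in log order."""
    events = []
    for line in syslog_text.splitlines():
        match = TIMEOUT_RE.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


# Two lines copied from the first post, used here as sample input.
sample_log = (
    "May 29 22:20:27 Media-Server kernel: sas: command 0xec464e40, "
    "task 0xec4a72c0, timed out: BLK_EH_NOT_HANDLED\n"
    "May 29 22:20:27 Media-Server kernel: sas: Enter sas_scsi_recover_host\n"
)

for event in find_timeouts(sample_log):
    print(event["stamp"], event["host"], "task", event["task"])
```

To use it on a real log, read the file contents into a string and pass that to find_timeouts; more than one returned event means the timeout recurred within that log.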

cyrnel · May 31, 2012

This looks too familiar.

Can you give us a quick view of processes? (top -n 1) If you see zombies, what are they? (ps -el | grep Z)
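
If the process list is long, the zombie check can also be run on a saved copy of the ps output. The sketch below assumes the usual Linux `ps -el` layout, where a header row names the columns and the S column holds the process state. find_zombies is a hypothetical helper, and the sample text is made up for illustration; it is not output from anyone's system in this thread.

```python
def find_zombies(ps_output):
    """Return pid, ppid and command for every row of `ps -el` output in state Z."""
    lines = ps_output.strip().splitlines()
    if not lines:
        return []
    # Locate columns by header name instead of hard-coding positions.
    columns = lines[0].split()
    state_idx = columns.index("S")
    pid_idx = columns.index("PID")
    ppid_idx = columns.index("PPID")
    zombies = []
    for line in lines[1:]:
        # The last column (CMD) may contain spaces, so cap the number of splits.
        fields = line.split(None, len(columns) - 1)
        if len(fields) < len(columns):
            continue  # skip rows that do not match the header layout
        if fields[state_idx] == "Z":
            zombies.append(
                {
                    "pid": int(fields[pid_idx]),
                    "ppid": int(fields[ppid_idx]),
                    "cmd": fields[-1],
                }
            )
    return zombies


# Made-up sample in the `ps -el` layout (illustrative values only).
sample_ps = """\
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
4 S     0     1     0  0  80   0 -   207 -      ?        00:00:04 init
0 Z     0  4321  1200  0  80   0 -     0 exit   ?        00:00:00 hdparm <defunct>
"""

for zombie in find_zombies(sample_ps):
    print(zombie["pid"], zombie["ppid"], zombie["cmd"])
```

The parent PID is included because a zombie stays in the process table until its parent collects it, so the parent is usually the more informative process to report.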

hmatos · June 1, 2012

It seems to be the known issue where you have to go back to 5.0 Beta 11. Going to this version finally fixed the problem. I guess the new kernel doesn't like the SAS cards?

BTW, I can log in to the system and execute just about any command, but it will NOT let me do anything like umount, reboot, or shutdown; those just hang. The clients also die while trying to access any of the shares.

limetech · June 1, 2012

How confident are you that -beta11 works, and -beta12 and everything after does not recover correctly from these driver reported errors? In other words, are you sure -beta12 introduced a regression (i.e., something changed where this problem appeared when it didn't before)?

moose · June 2, 2012

I know RobJ mentioned the possibility of a firmware update, but just "for the record" what firmware revision are you running on the AOC-SASLP-MV8 cards?

hmatos · June 2, 2012

I'm not too confident - I'm pretty sure I already had Beta 12 installed and it did the same thing. I do know that RC2 and RC3 definitely caused the same issue.

hmatos · June 2, 2012

I believe it's .21 -- I can't bring my array down right now, but the moment I do I'll respond with the exact version.

momoz · June 4, 2012

Since RC3 I have the same thing (see below). It causes unraid to be totally unresponsive from the web-gui and SHARES but I can telnet in and see the syslog.

I have the same card: AOC-SASLP-MV8

Jun 4 00:59:38 unraid kernel: mdcmd (119): check CORRECT

Jun 4 00:59:38 unraid kernel: md: recovery thread woken up ...

Jun 4 00:59:38 unraid kernel: md: recovery thread checking parity...

Jun 4 00:59:39 unraid kernel: md: using 1536k window, over a total of 2930266532 blocks.

Jun 4 02:16:29 unraid kernel: sas: command 0xeebeb3c0, task 0xf1ab0140, timed out: BLK_EH_NOT_HANDLED

Jun 4 02:16:29 unraid kernel: sas: Enter sas_scsi_recover_host

Jun 4 02:16:29 unraid kernel: sas: trying to find task 0xf1ab0140

Jun 4 02:16:29 unraid kernel: sas: sas_scsi_find_task: aborting task 0xf1ab0140

Jun 4 02:16:29 unraid kernel: drivers/scsi/mvsas/mv_sas.c 1818:<7>mv_abort_task() mvi=f7580000 task=f1ab0140 slot=f759160c slot_idx=x1

Jun 4 02:16:29 unraid kernel: sas: sas_scsi_find_task: querying task 0xf1ab0140

Jun 4 02:16:29 unraid kernel: drivers/scsi/mvsas/mv_sas.c 1747:mvs_query_task:rc= 5

Jun 4 02:16:29 unraid kernel: sas: sas_scsi_find_task: task 0xf1ab0140 failed to abort

Jun 4 02:16:29 unraid kernel: sas: task 0xf1ab0140 is not at LU: I_T recover

Jun 4 02:16:29 unraid kernel: sas: I_T nexus reset for dev 0600000000000000

Jun 4 02:16:29 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 6 ctrl sts=0x89800.

Jun 4 02:16:29 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 6 irq sts = 0x1001

Jun 4 02:16:29 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy6 Unplug Notice

Jun 4 02:16:29 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 6 ctrl sts=0x199800.

Jun 4 02:16:29 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 6 irq sts = 0x1081

Jun 4 02:16:30 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 6 ctrl sts=0x199800.

Jun 4 02:16:30 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 6 irq sts = 0x10000

Jun 4 02:16:30 unraid kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[6]

Jun 4 02:16:30 unraid kernel: drivers/scsi/mvsas/mv_sas.c 1338:port 6 attach dev info is 0

Jun 4 02:16:30 unraid kernel: drivers/scsi/mvsas/mv_sas.c 1340:port 6 attach sas addr is 6

Jun 4 02:16:30 unraid kernel: drivers/scsi/mvsas/mv_sas.c 379:phy 6 byte dmaded.

Jun 4 02:16:30 unraid kernel: sas: sas_form_port: phy6 belongs to port5 already(1)!

Jun 4 02:16:32 unraid kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[5]:rc= 0

Jun 4 02:16:32 unraid kernel: sas: I_T 0600000000000000 recovered

Jun 4 02:16:32 unraid kernel: sas: sas_ata_task_done: SAS error 8d

Jun 4 02:16:32 unraid kernel: ata9: sas eh calling libata port error handler

Jun 4 02:16:32 unraid kernel: ata10: sas eh calling libata port error handler

Jun 4 02:16:32 unraid kernel: ata11: sas eh calling libata port error handler

Jun 4 02:16:32 unraid kernel: ata12: sas eh calling libata port error handler

Jun 4 02:16:32 unraid kernel: ata13: sas eh calling libata port error handler

Jun 4 02:16:32 unraid kernel: ata14: sas eh calling libata port error handler

Jun 4 02:16:32 unraid kernel: ata14.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0 t0

Jun 4 02:16:32 unraid kernel: ata14.00: failed command: READ FPDMA QUEUED

Jun 4 02:16:32 unraid kernel: ata14.00: cmd 60/f8:00:e7:bd:d7/01:00:2d:00:00/40 tag 0 ncq 258048 in

Jun 4 02:16:32 unraid kernel: res 41/40:00:3a:be:d7/00:00:2d:00:00/40 Emask 0x409 (media error) <F>

Jun 4 02:16:32 unraid kernel: ata14.00: status: { DRDY ERR }

Jun 4 02:16:32 unraid kernel: ata14.00: error: { UNC }

Jun 4 02:16:32 unraid kernel: ata14.00: configured for UDMA/133

Jun 4 02:16:32 unraid kernel: ata14: EH complete

Jun 4 02:16:32 unraid kernel: ata15: sas eh calling libata port error handler

Jun 4 02:16:32 unraid kernel: sas: --- Exit sas_scsi_recover_host

Jun 4 02:16:32 unraid kernel: sas: sas_ata_task_done: SAS error 2

chickensoup · June 6, 2012

I'm not sure if it will be required, but to save the wait, could you please post the full syslog? (cp /var/log/syslog /boot/syslog.txt)

limetech · June 6, 2012

I will be posting a special release called 5.0-rc4-scst-1, which uses a different mvsas driver. Watch for that announcement and then let's see how it behaves in your system.

MyKroFt · June 7, 2012

I am waiting also. I mainly get this error in the first 10% of my parity check, about 50% of the time. The last time, two nights ago, it was trying to spin down disk6, got the error, and red-balled my drive. I had to remove it, re-add it, and let parity (which had finished correctly the night before) rebuild the drive. This has happened 3 times in the last 2-4 months.

Myk

poopiepants · June 16, 2012

Same problem here when I was copying files.

whiteatom · June 16, 2012

Same issue. How do we check the firmware version on the MV8 card?

cyrnel · June 16, 2012

The card will display its BIOS version at boot, just before it scans for and spins up drives. The most recent AOC-SASLP-MV8 fw is 3.1.0.21 which I'm using but I still have the BLK_EH_NOT_HANDLED problem sometimes. It happened less after I increased my sleep timers and it hasn't happened at all since I disabled drive sleeping. Would be interesting to see if others see the same effect from changing their drive sleep timers.

I'd love to test the alternate rc4 for us but I'm very close to just swapping in an LSI card. I have several lying around and the problems with those seem to have been resolved.

MyKroFt · June 16, 2012

I have 3.1.0.21 as well on 3 cards, and I really don't want to spend the $$ to go to something else. Still waiting on an RC with updated mvsas drivers to test.

Myk

whiteatom · June 18, 2012

Yeah, 3.1.0.21 as well. When you say disable drive sleeping, do you mean not spinning them down? I'd certainly be willing to try that. I don't remember this problem when I was on b14. Now I'm seeing the same issue in my syslog:

Jun 18 13:36:20 knox kernel: sas: command 0xf2ce0300, task 0xf272da40, timed out: BLK_EH_NOT_HANDLED

cyrnel · June 18, 2012

Yes, though it defeats one of unraid's main benefits. In my case the failures seemed to follow this sequence:

1. system fine, I leave

2. BLK_EH_NOT_HANDLED

3. normal timer-based drive spin down

4. I'm back. The first SMB access times out. hdparm will have gone zombie, presumably when waking drives. Most drive access blocked. Telnet still functional.

My rc4 system has been up since the 6th. In that time syslog shows two BLK_EH_NOT_HANDLED errors but the system has remained up. I don't know if it's luck, these errors have been recoverable, or because drives aren't being spun down/up. And I don't have enough failures saved to be certain of the sequence. When the drives went away previously I didn't make the effort to copy syslog to the telnet session.
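
One way to see whether the timeouts line up with spin-state polling is to tally which ATA command libata reports as failed after each one. CHECK POWER MODE is the ATA command used to ask a drive whether it is active or in standby, and it appears in several of the logs posted in this thread. The sketch below is an illustration only: count_failed_commands is a hypothetical helper, and the pattern is an assumption based on the "failed command:" lines quoted here.

```python
import re
from collections import Counter

# Assumed pattern: libata prints the command name in capitals after
# "failed command:". Anything that follows the name, such as a tag added by a
# syslog viewer, is ignored.
FAILED_RE = re.compile(r"failed command: ([A-Z][A-Z0-9 ]*[A-Z0-9])")


def count_failed_commands(syslog_text):
    """Count how often each ATA command name appears in 'failed command' lines."""
    counts = Counter()
    for line in syslog_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from logs posted in this thread.
sample_log = (
    "May 29 22:20:30 Media-Server kernel: ata15.00: failed command: CHECK POWER MODE\n"
    "Jun 4 02:16:32 unraid kernel: ata14.00: failed command: READ FPDMA QUEUED\n"
)

for name, count in count_failed_commands(sample_log).items():
    print(name, count)
```

A log dominated by CHECK POWER MODE failures would be consistent with the spin-down theory, while READ or WRITE failures with media errors point more toward an actual disk problem. Neither result is conclusive on its own.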

whiteatom · June 19, 2012

Exactly the same issue. After the error, the system works perfectly until I ask for a drive to be spun up; then the shares drop off the network and it all goes south. As long as I don't touch an array drive in telnet, it continues to work, although I have to use the IP to get in, as the host name seems to stop working. I also forgot to copy a log, but I swear there was NOTHING in it, just the BLK_EH_NOT_HANDLED and the spin down.

I have gone back to b14 just now. I didn't see this issue in 75 days of running b14, but I'm getting it every 2-3 days now. I can't prove it appeared since then, but b14 has been the most stable for me so far.

whiteatom

UPDATE: 48 hours of uptime - no errors yet.

TheWombat · June 21, 2012

Any update on when the special 5.0-rc4-scst-1 release is being posted? Or did I miss the notification?

thanks

Alex

chickensoup · June 21, 2012

I think Tom was working on this release when he started having site/server issues; I'm sure he will bump this thread once it is available. I have been keeping an eye on the announcement board and the RC4 thread, so I will let you know if I see it pop up.

whiteatom · June 22, 2012

Few more days, and no errors at all. It sure seems like this issue is restricted to one of the builds after b14.

cyrnel · June 23, 2012

Likewise, no problems here with rc4 with all my drives always spinning. I had a BLK_EH_NOT_HANDLED but nothing bad happened. The next activity was the mover several hours later. The system has been up since the 6th.

sas_scsi_recover_host took 4 seconds to complete.

Can't help but wonder what's happening with interrupts during that time, and what happens if hdparm or anything drive-related fires.

Jun 18 01:40:27 Tower1 kernel: sas: command 0xf2654540, task 0xf0237680, timed out: BLK_EH_NOT_HANDLED (Drive related)
Jun 18 01:40:27 Tower1 kernel: sas: Enter sas_scsi_recover_host (Drive related)
Jun 18 01:40:27 Tower1 kernel: sas: trying to find task 0xf0237680 (Drive related)
Jun 18 01:40:27 Tower1 kernel: sas: sas_scsi_find_task: aborting task 0xf0237680 (Drive related)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1818:<7>mv_abort_task() mvi=f74a0000 task=f0237680 slot=f74b1640 slot_idx=x2 (System)
Jun 18 01:40:27 Tower1 kernel: sas: sas_scsi_find_task: querying task 0xf0237680 (Drive related)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1747:mvs_query_task:rc= 5 (System)
Jun 18 01:40:27 Tower1 kernel: sas: sas_scsi_find_task: task 0xf0237680 failed to abort (Minor Issues)
Jun 18 01:40:27 Tower1 kernel: sas: task 0xf0237680 is not at LU: I_T recover (Drive related)
Jun 18 01:40:27 Tower1 kernel: sas: I_T nexus reset for dev 0300000000000000 (Drive related)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x89800. (System)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1001001 (System)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy3 Unplug Notice (System)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1011081 (System)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[3] (System)
Jun 18 01:40:27 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2278:plugin interrupt but phy3 is gone (System)
Jun 18 01:40:29 Tower1 kernel: mvsas 0000:01:00.0: Phy3 : No sig fis (Drive related)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2139:phy3 Attached Device (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x89800. (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1001 (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy3 Unplug Notice (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x81 (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x10000 (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[3] (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1338:port 3 attach dev info is 400 (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1340:port 3 attach sas addr is 3 (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 379:phy 3 byte dmaded. (System)
Jun 18 01:40:29 Tower1 kernel: sas: sas_form_port: phy3 belongs to port0 already(1)! (Drive related)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[0]:rc= 0 (System)
Jun 18 01:40:29 Tower1 kernel: sas: I_T 0300000000000000 recovered (Drive related)
Jun 18 01:40:29 Tower1 kernel: sas: sas_ata_task_done: SAS error 8d (Errors)
Jun 18 01:40:29 Tower1 kernel: ata7: sas eh calling libata port error handler (Errors)
Jun 18 01:40:29 Tower1 kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 t0 (Errors)
Jun 18 01:40:29 Tower1 kernel: ata7.00: failed command: CHECK POWER MODE (Minor Issues)
Jun 18 01:40:29 Tower1 kernel: ata7.00: cmd e5/00:00:00:00:00/00:00:00:00:00/40 tag 0 (Drive related)
Jun 18 01:40:29 Tower1 kernel:          res 01/04:04:b8:07:a7/00:00:16:00:00/40 Emask 0x3 (HSM violation) (Errors)
Jun 18 01:40:29 Tower1 kernel: ata7.00: status: { ERR } (Drive related)
Jun 18 01:40:29 Tower1 kernel: ata7.00: error: { ABRT } (Errors)
Jun 18 01:40:29 Tower1 kernel: ata7: hard resetting link (Minor Issues)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x89800. (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1001 (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy3 Unplug Notice (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x11081 (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[3] (System)
Jun 18 01:40:29 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2278:plugin interrupt but phy3 is gone (System)
Jun 18 01:40:31 Tower1 kernel: mvsas 0000:01:00.0: Phy3 : No sig fis (Drive related)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2139:phy3 Attached Device (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x89800. (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x1001 (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2226:phy3 Unplug Notice (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x81 (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2198:port 3 ctrl sts=0x199800. (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2200:Port 3 irq sts = 0x10000 (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 2253:notify plug in on phy[3] (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1338:port 3 attach dev info is 400 (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1340:port 3 attach sas addr is 3 (System)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 379:phy 3 byte dmaded. (System)
Jun 18 01:40:31 Tower1 kernel: sas: sas_form_port: phy3 belongs to port0 already(1)! (Drive related)
Jun 18 01:40:31 Tower1 kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[0]:rc= 0 (System)
Jun 18 01:40:31 Tower1 kernel: sas: sas_ata_hard_reset: Found ATA device. (Drive related)
Jun 18 01:40:31 Tower1 kernel: ata7.00: configured for UDMA/133 (Drive related)
Jun 18 01:40:31 Tower1 kernel: ata7: EH complete (Drive related)
Jun 18 01:40:31 Tower1 kernel: ata8: sas eh calling libata port error handler (Errors)
Jun 18 01:40:31 Tower1 kernel: ata9: sas eh calling libata port error handler (Errors)
Jun 18 01:40:31 Tower1 kernel: ata10: sas eh calling libata port error handler (Errors)
Jun 18 01:40:31 Tower1 kernel: ata11: sas eh calling libata port error handler (Errors)
Jun 18 01:40:31 Tower1 kernel: sas: --- Exit sas_scsi_recover_host (Drive related)
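
The 4-second figure can be checked from the timestamps on the Enter and Exit lines. The sketch below is a minimal illustration, not a definitive tool: recovery_durations is a hypothetical helper, the year argument is an assumption because syslog timestamps carry no year, and a recovery that spans a year boundary is not handled.

```python
import re
from datetime import datetime

# Leading syslog timestamp, e.g. "Jun 18 01:40:27" (month names as in an
# English locale, which is what %b expects here).
STAMP_RE = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s")


def recovery_durations(syslog_text, year=2012):
    """Return seconds between each Enter/Exit sas_scsi_recover_host pair."""
    durations = []
    entered = None
    for raw_line in syslog_text.splitlines():
        line = raw_line.strip()
        match = STAMP_RE.match(line)
        if not match:
            continue
        # Syslog lines carry no year, so the caller supplies one (assumption).
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        if "Enter sas_scsi_recover_host" in line:
            entered = stamp
        elif "Exit sas_scsi_recover_host" in line and entered is not None:
            durations.append((stamp - entered).total_seconds())
            entered = None
    return durations


# First and last recovery lines from the log above.
sample_log = (
    "Jun 18 01:40:27 Tower1 kernel: sas: Enter sas_scsi_recover_host (Drive related)\n"
    "Jun 18 01:40:31 Tower1 kernel: sas: --- Exit sas_scsi_recover_host (Drive related)\n"
)

print(recovery_durations(sample_log))
```

Because syslog timestamps have one-second resolution, each result is only accurate to about a second.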

mtruffa · June 25, 2012

I was having the same problem. I tried every release back to b11, and I have been on that now for 3 days without a crash or the error. So it is something introduced after b11 that is causing the problem. I am going to stay here (b11) for a while until the problem can be narrowed down.

Mike

SuperW2 · June 25, 2012

I think I'm seeing the same or a similar thing. I've had to hard power off twice in the last 3 days: the shares become inaccessible and I cannot access the GUI. I can telnet in, so I know the network is good, but I cannot shut down, power off, etc. Full syslog attached, but these were the last few lines before I had to restart.

Jun 24 22:29:51 media kernel: sas: command 0xed68ccc0, task 0xf2c4a000, timed out: BLK_EH_NOT_HANDLED

Jun 24 22:29:51 media kernel: sas: Enter sas_scsi_recover_host

Jun 24 22:29:51 media kernel: sas: trying to find task 0xf2c4a000

Jun 24 22:29:51 media kernel: sas: sas_scsi_find_task: aborting task 0xf2c4a000

Jun 24 22:29:51 media kernel: sas: sas_scsi_find_task: querying task 0xf2c4a000

Jun 24 22:29:51 media kernel: drivers/scsi/mvsas/mv_sas.c 1747:mvs_query_task:rc= 5

Jun 24 22:29:51 media kernel: sas: sas_scsi_find_task: task 0xf2c4a000 failed to abort

Jun 24 22:29:51 media kernel: sas: task 0xf2c4a000 is not at LU: I_T recover

Jun 24 22:29:51 media kernel: sas: I_T nexus reset for dev 0200000000000000

Jun 24 22:29:51 media kernel: sas: sas_form_port: phy2 belongs to port2 already(1)!

Jun 24 22:29:53 media kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[2]:rc= 0

Jun 24 22:29:53 media kernel: sas: I_T 0200000000000000 recovered

Jun 24 22:29:53 media kernel: sas: sas_ata_task_done: SAS error 8d

Jun 24 22:29:53 media kernel: ata13: sas eh calling libata port error handler

Jun 24 22:29:53 media kernel: ata14: sas eh calling libata port error handler

Jun 24 22:29:53 media kernel: ata15: sas eh calling libata port error handler

Jun 24 22:29:53 media kernel: ata15.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 t0

Jun 24 22:29:53 media kernel: ata15.00: failed command: CHECK POWER MODE

Jun 24 22:29:53 media kernel: ata15.00: cmd e5/00:00:00:00:00/00:00:00:00:00/40 tag 0

Jun 24 22:29:53 media kernel: res 01/04:ff:00:00:00/00:00:00:00:00/40 Emask 0x3 (HSM violation)

Jun 24 22:29:53 media kernel: ata15.00: status: { ERR }

Jun 24 22:29:53 media kernel: ata15.00: error: { ABRT }

Jun 24 22:29:53 media kernel: ata15: hard resetting link

Jun 24 22:29:53 media kernel: sas: sas_form_port: phy2 belongs to port2 already(1)!

Jun 24 22:29:56 media kernel: drivers/scsi/mvsas/mv_sas.c 1701:mvs_I_T_nexus_reset for device[2]:rc= 0

Jun 24 22:29:56 media kernel: sas: sas_ata_hard_reset: Found ATA device.

Jun 24 22:29:56 media kernel: ata15.00: configured for UDMA/133

Jun 24 22:29:56 media kernel: ata15: EH complete

Jun 24 22:29:56 media kernel: ata16: sas eh calling libata port error handler

Jun 24 22:29:56 media kernel: ata17: sas eh calling libata port error handler

Jun 24 22:29:56 media kernel: ata18: sas eh calling libata port error handler

Jun 24 22:29:56 media kernel: ata19: sas eh calling libata port error handler

Jun 24 22:29:56 media kernel: ata20: sas eh calling libata port error handler

Jun 24 22:29:56 media kernel: sas: --- Exit sas_scsi_recover_host

Jun 24 22:30:56 media kernel: sas: sas_ata_task_done: SAS error 2

syslog.txt
