Is this a bug?

January 13, 2016

My server is acting up and is basically unusable: things hang and then can't be killed (even with kill -9). I've had this happen with innocuous commands like chmod and lsof. I can't shut down the system remotely, since "shutdown -r now" hangs after sending the broadcast that the system is going down, and nothing else happens.
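A hang that survives kill -9 usually means the process is in uninterruptible sleep (state D), waiting on I/O that never completes. As a rough sketch, the snippet below lists such processes; list_d_state is a hypothetical helper name, and the ps options assume a procps-style ps.

```shell
#!/bin/sh
# Filter "PID STAT COMMAND" lines on stdin, keeping only processes whose
# state starts with D (uninterruptible sleep, usually blocked on disk I/O).
# list_d_state is a hypothetical helper name, not part of any tool.
list_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# Separate -o options with empty headers suppress the header line.
ps -e -o pid= -o stat= -o comm= | list_d_state
```

If chmod, lsof or shutdown appear in this list, they are waiting on the kernel and no signal will remove them until the I/O returns or the machine is reset.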

The web server is unresponsive.

When I telnet in, I can see the share data but get the 'hang' errors above when trying to do anything with some of it. In particular, anything that tries to access data written recently hangs (and takes the entire telnet session with it).

So, I found this in my syslog:

root@Tower:/var/log# grep "1\:0\:6" syslog*
syslog:Jan 12 23:04:48 Tower kernel: sd 1:0:6:0: [sdk] Synchronizing SCSI cache
syslog:Jan 12 23:04:49 Tower kernel: scsi target1:0:6: attempting target reset! scmd(ffff880009fa8600)
syslog:Jan 12 23:04:49 Tower kernel: sd 1:0:6:0: [sdk] CDB: opcode=0x88 88 00 00 00 00 00 00 5c 57 88 00 00 00 80 00 00
syslog:Jan 12 23:04:49 Tower kernel: scsi target1:0:6: target reset: SUCCESS scmd(ffff880009fa8600)
syslog:Jan 12 23:04:49 Tower kernel: sd 1:0:6:0: [sdk] CDB: opcode=0x88 88 00 00 00 00 00 00 5c 57 88 00 00 00 80 00 00
root@Tower:/var/log#

This is good (?)...well, in the sense that it explains some of my problem -- files older than this are ok but newer ones aren't.

It seems that one of my drives is dead. fdisk -l /dev/sdk returns nothing, so I think the drive has simply become non-responsive. I believe that what I'm seeing in "/mnt/disk4" is being reconstructed from parity information, but I don't know how to confirm that.
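One way to check whether the kernel still considers the disk usable, without touching the disk itself, is to read the SCSI device state from sysfs. This is only a sketch: disk_state is a hypothetical helper, and its second argument (an alternate sysfs root) is added here purely so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Print the kernel's view of a SCSI disk: typically "running" or "offline",
# or "absent" if the device node has disappeared from sysfs.
# $1: device name such as sdk
# $2: sysfs root, default /sys (hypothetical parameter, for testing only)
disk_state() {
    dev=$1
    root=${2:-/sys}
    state_file="$root/block/$dev/device/state"
    if [ -r "$state_file" ]; then
        cat "$state_file"
    else
        echo "absent"
    fi
}

disk_state sdk
```

Reading this attribute should not issue I/O to the drive, so it is much less likely to hang the session than fdisk or lsof. It says nothing about whether unRAID is emulating the disk from parity.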

Anyway, all I want to do now is shut down and reboot. I can't do it remotely and probably just need to hold down the power button. Assuming the problem is that a drive 'disappeared' due to failure or bad cable, shouldn't unraid deal more appropriately with such a problem? Is there a way other than "shutdown -r now" to try to remotely reboot? I'm running v6 (6.1-r3 I think).

PS: server uptime is 114 days, and nothing changed in that period.

PPS: Looking at the forums, this seems to be a fairly common problem, but potentially misdiagnosed or undiagnosed, since unRAID doesn't deal with this kind of hardware failure particularly well. Here's one: http://lime-technology.com/forum/index.php?topic=44305

January 20, 2016

Author

This happened again. I am not physically near the server, and can't reboot it. Is there a way to unmount the unresponsive drive and try again?

January 20, 2016

It looks like the controller has lost communication with the disk. There isn't much you can do remotely to fix that. It looks likely that a site visit is needed.

Regarding the link you gave, there is just no information given in that posting for anyone to be able to offer any suggestions.

January 20, 2016

Author

I did an emergency reboot (remotely). I posted details in a separate thread here: http://lime-technology.com/forum/index.php?topic=45782.0
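For readers who land here with the same hang: the exact method used is in the linked thread, not in this one. A common way to force a reboot when "shutdown -r now" hangs is the kernel's magic SysRq interface, and the sketch below assumes that approach. emergency_reboot and its --force flag are hypothetical names; without --force the function only prints what it would do.

```shell
#!/bin/sh
# Force a reboot through the Linux magic SysRq interface.
# Assumption: this is a generic technique, not necessarily what the linked
# thread describes. Pass --force to actually reboot; otherwise it is a dry run.
emergency_reboot() {
    if [ "${1:-}" = "--force" ]; then
        echo 1 > /proc/sys/kernel/sysrq   # enable all SysRq functions
        echo s > /proc/sysrq-trigger      # request an emergency sync
        echo u > /proc/sysrq-trigger      # request read-only remount
        echo b > /proc/sysrq-trigger      # reboot immediately, no clean shutdown
    else
        echo "dry run: would write 1 to /proc/sys/kernel/sysrq, then s, u, b to /proc/sysrq-trigger"
    fi
}

emergency_reboot
```

This bypasses the normal shutdown sequence entirely, so the array will not be stopped cleanly and a parity check on the next start is likely. It needs root and a shell that still responds.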

January 20, 2016

Did it fix the problem?

February 10, 2016

Author

Yes. The remote emergency reboot 'fixed' the problem. I was able to get the server back up and running -- disk read/write and remote access all worked as expected, but I do have a flaky drive. It stayed up for about 2-3 weeks, then the same error which I documented here happened again. I need to replace the drive, but getting access (even to a failing drive) was a lifesaver.
