
v6.1 rc5\6 Complete lockup around half an hour after power up


billington.mark


Hi Guys,

 

I'm having a strange lockup after about 12 hours of uptime with 6.1 rc6. It's as if the entire box just freezes: network connectivity drops and the console doesn't accept keyboard input.

I had the same issue with rc5, but I put that down to a disk that was reporting SMART issues. That disk has since been removed, but it has not made any difference.

 

With the syslog getting overwritten on reboot, is there a way to view the last syslog entries from the previous session?

I currently have the syslog being tailed to the console, so if it locks up again, I'll report back with the last few log messages.

 

I've plugged the USB stick into a Windows box and checked it for errors; none were found. Memtest reports no memory faults either.

 

I've attached the diagnostics, but as they were taken after a reboot, I'm not sure the logs will be much help.

 

unraid-diagnostics-20150829-0855.zip

Link to comment

Failure has just happened.

 

Last few lines on the tail from the syslog:

 

09:17:40 UnRAID Kernel: CE: hpet increased min_delta_ns to 20115 nsec

09:17:50 UnRAID Kernel: CE: hpet increased min_delta_ns to 30172 nsec

09:18:00 UnRAID Kernel: CE: hpet increased min_delta_ns to 45258 nsec

09:18:04 UnRAID Kernel: CE: hpet increased min_delta_ns to 67887 nsec

09:18:07 UnRAID Kernel: CE: hpet increased min_delta_ns to 101830 nsec

09:21:04 UnRAID Kernel: CE: hpet increased min_delta_ns to 152745 nsec

 

The console has frozen and won't accept keyboard input; however, the cursor is still flashing on the screen (not sure if that's relevant).

 

EDIT: This occurred during a parity check, after the system was rebooted from the previous failure. I'll reboot again, cancel the parity check, and report back with the syslog messages on failure to see if they are any different.
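
For anyone reading those hpet lines later: each reported min_delta_ns value is the previous one plus half of it (rounded down). The short Python sketch below just checks that arithmetic against the six values quoted above. Reading this as the kernel repeatedly backing off the timer's minimum interval is my assumption, not something confirmed in this thread, and the helper names are made up for illustration.

```python
import re

# The six lines quoted in the post above (timestamps and values as posted).
LOG_LINES = [
    "09:17:40 UnRAID Kernel: CE: hpet increased min_delta_ns to 20115 nsec",
    "09:17:50 UnRAID Kernel: CE: hpet increased min_delta_ns to 30172 nsec",
    "09:18:00 UnRAID Kernel: CE: hpet increased min_delta_ns to 45258 nsec",
    "09:18:04 UnRAID Kernel: CE: hpet increased min_delta_ns to 67887 nsec",
    "09:18:07 UnRAID Kernel: CE: hpet increased min_delta_ns to 101830 nsec",
    "09:21:04 UnRAID Kernel: CE: hpet increased min_delta_ns to 152745 nsec",
]


def min_delta_values(lines):
    """Pull the integer after 'min_delta_ns to' out of each matching line."""
    pattern = re.compile(r"min_delta_ns to (\d+) nsec")
    values = []
    for line in lines:
        match = pattern.search(line)
        if match:
            values.append(int(match.group(1)))
    return values


def grows_by_half(values):
    """True if every value equals the previous one plus half of it (integer math)."""
    return all(b == a + (a >> 1) for a, b in zip(values, values[1:]))


values = min_delta_values(LOG_LINES)
print(values)
print(grows_by_half(values))
```

The pattern only keys on the "min_delta_ns to N nsec" part, so it does not care how the rest of the line is spelled or capitalised.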

Link to comment

Next failure (no parity check running)...

 

12:41:00 unRAID kernel: mptscsih: ioc0: attempting task abort! (sc=ffff8802ead5b480)

12:41:00 unRAID kernel: sd 7:0:0:0: tag#0 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00

12:41:00 unRAID kernel: mptscsih: ioc0: WARNING - TaskMgmt type=1: IOC Not operational (0xffffffff)!

12:41:00 unRAID kernel: mptscsih: ioc0: WARNING - Issuing HardReset from mptscsih_IssueTaskMgmt!!

12:41:00 unRAID kernel: mptbase: ioc: initiating Recovery

12:41:00 unRAID kernel: mptbase: ioc0: WARNING - Unexpected doorbell active!

12:41:00 unRAID kernel: mptbase: ioc0: ERROR - Failed to come READY after reset! IocState=f0000000

12:41:00 unRAID kernel: mptbase: ioc0: WARNING - ResetHistory bit failed to clear!

12:41:00 unRAID kernel: mptbase: ioc0: ERROR - Diagnostic Reset Failed! (ffffffffh)

12:41:00 unRAID kernel: mptbase: ioc0: WARNING - NOT READY WARNING!

12:41:00 unRAID kernel: mptbase: WARNING - (-1) cannot recover ioc0, doorbell=0xffffffff

12:41:00 unRAID kernel: mptscsih: ioc0: WARNING - TaskMgmt HardReset FAILED!!

12:41:00 unRAID kernel: mptscsih: ioc0: task abort: SUCCESS (rv=2002) (sc=ffff8802ead5b480)

 

Do I have a bad disk that's taking out the array? :/

Link to comment

Drives, no matter how bad, do not take out a system.  I can't think of any way a really bad drive could cause more than pauses and delays; it should not cause freezes.  I haven't looked at your diagnostics yet, but your errors look like a bad controller or motherboard, or a bad interaction between devices, drivers and BIOS.  It's possible that a BIOS or firmware update could help.  But this definitely has nothing to do with any drive.

Link to comment

Looking back in the syslog, 'ioc0' does seem to be my SAS card (unRAID kernel: ioc0: LSISAS1068E B3: Capabilities={Initiator}).

 

I've reseated the card and all connections between the card and the HDDs. So far it's stayed up a lot longer than usual.

 

Out of interest, is there a way to get at a previous boot's syslog? In situations like this, it's quite annoying having to tail the syslog on the console to watch for when it bombs!

A PuTTY session disconnects well before the last entries of the syslog in my case.

Link to comment

Syslog is only in RAM, so it's only saved if you save it.  Your console tail is the only thing with the very last messages.
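
One way to "save it" continuously is to mirror the syslog onto persistent storage as it grows. Here is a minimal Python sketch of that idea, with several assumptions: a Python 3 interpreter is available on the server (it may not be on a stock install), the live log is at /var/log/syslog, and the flash drive is mounted at /boot. The file name syslog-mirror.txt and both function names are made up for illustration. It will also still miss anything logged between the last copy and the freeze.

```python
import os
import time


def copy_new_bytes(src_path, dst_path, offset):
    """Append everything in src_path past `offset` to dst_path.

    Returns the new offset. If the source shrank (rotated or truncated),
    copying restarts from the beginning of the source file.
    """
    if os.path.getsize(src_path) < offset:
        offset = 0
    with open(src_path, "rb") as src:
        src.seek(offset)
        chunk = src.read()
    if chunk:
        with open(dst_path, "ab") as dst:
            dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())  # push data to the device before a possible freeze
    return offset + len(chunk)


def mirror_forever(src_path="/var/log/syslog",
                   dst_path="/boot/syslog-mirror.txt",  # assumed location on the flash drive
                   interval=5.0):
    """Poll the source every `interval` seconds and copy whatever is new."""
    offset = 0
    while True:
        offset = copy_new_bytes(src_path, dst_path, offset)
        time.sleep(interval)
```

Calling mirror_forever() from a console or screen session would keep the copy going until the box locks up. Frequent writes add wear to a USB flash drive, so this is something to run only while chasing a fault like this one.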

 

Your BIOS is version 2.0 from 2010; see if you can update that.  The rest of the syslog looks fine.  No drive issues either.

 

All indications are that it is a problem with your SAS controller.  Perhaps it is not flashed correctly?  Or needs newer firmware?  You could try taking a load off it, moving some of the drives to the motherboard ports, which are as fast or faster.

 

Online searching indicates that disabling hpet *might* help, but it's a workaround.  It would be better if a BIOS or firmware update fixes the problem.  But if not then adding "hpet=disable" (without the quotes) to your syslinux.cfg append line is worth testing.
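
To show what that edit amounts to, here is a small Python sketch that adds the parameter to the append line(s) of a syslinux.cfg held as text. The sample config contents are only an illustration of the usual layout, not copied from the poster's system, and add_kernel_param is a made-up helper name. Editing the file by hand in a text editor achieves exactly the same thing.

```python
def add_kernel_param(cfg_text, param="hpet=disable"):
    """Return cfg_text with `param` added to every 'append' line that lacks it."""
    out = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        is_append = stripped.lower().startswith("append ")
        if is_append and param not in stripped.split():
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative config only; a real syslinux.cfg will have more entries.
SAMPLE_CFG = """\
default /syslinux/menu.c32
label unRAID OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
"""

print(add_kernel_param(SAMPLE_CFG))
```

The sketch touches every append line it finds, so if the file has several boot entries and only the default one should change, edit that one line by hand instead. Keep a copy of the original file so the change is easy to undo.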

Link to comment

