
Server completely hangs. Could use some assistance troubleshooting.



I'm hoping someone can help me troubleshoot my server. Yesterday I tried to load a movie from it, and it failed. When I tried to access the server from the web interface, it wouldn't load. I have a keyboard and monitor connected directly to the server, and even there I couldn't type any commands; it was dead. Even a telnet session I had open from earlier in the day wouldn't respond. As much as I hated doing it, I pulled the plug. It rebooted and all my drives were detected; it just required a parity check. I started the parity check, and a few minutes later it was locked up again. To keep an eye on it, I ran a tail on the log. If I'm reading the log right and referencing the drive issues page correctly, it looks like I might be having trouble with one of the backplanes in my Norco case. I'm hoping someone can look over the log and verify that, or tell me if there is a way to see which drive failed, so I can track down exactly which backplane is giving me trouble.

 

My hardware

 

ASRock C226 WS

Intel G3220

Supermicro AOC-SASLP-MV8 x1, plus an older version of the card (I forget the model).

Norco 4224 case.

 

Running unRAID 5.0, no add-ons.

 

The log I was able to save is attached.

 

Thanks in advance!

syslog.txt

Link to comment

May 25 10:13:10 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error
May 25 10:13:10 Tower kernel: ata3: SError: { UnrecovData 10B8B BadCRC }
May 25 10:13:10 Tower kernel: ata3.00: failed command: READ DMA
May 25 10:13:10 Tower kernel: ata3.00: cmd c8/00:40:38:a0:7a/00:00:00:00:00/e9 tag 0 dma 32768 in
May 25 10:13:10 Tower kernel:          res 50/00:00:37:a0:7a/00:00:09:00:00/e9 Emask 0x10 (ATA bus error)
May 25 10:13:10 Tower kernel: ata3.00: status: { DRDY }
May 25 10:13:10 Tower kernel: ata3: hard resetting link

 

10B8B indicates an error in the hardware path, e.g., a bad or dirty SATA port, a bad or loose SATA cable, or a bad SATA backplane. The entire log is needed to determine which drive is affected.

 

Start a tail of the log in a telnet window (for example, tail -f /var/log/syslog) and copy the entire log. Let it run until the server crashes, and then post both the beginning of the log and the tail.
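Once you have the full log saved, something like the following rough Python sketch could help narrow down which port is throwing the errors. It counts error lines per ataN port and, if the boot portion of the log is present, picks up the drive model the kernel reported for that port. The function name summarize, the list of error phrases, and the SAMPLE excerpt are my own choices for illustration, and the pattern for the boot-time identification line ("ataN.00: ATA-8: model, firmware, ...") is an assumption about how your kernel prints it, so check it against your own log.

```python
import re
from collections import Counter

# Kernel lines such as "ata3.00: failed command: READ DMA" or "ata3: SError: { ... }".
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")
# Assumed shape of the boot-time identification message: "ATA-8: <model>, <firmware>, ..."
IDENT = re.compile(r"^ATA-\d+: (.+?),")
# Phrases treated as link/bus trouble (an illustrative list, not exhaustive).
ERROR_HINTS = ("SError", "failed command", "interface fatal error", "hard resetting link")

# The excerpt quoted earlier in this thread, used as demo input.
SAMPLE = [
    "May 25 10:13:10 Tower kernel: ata3.00: irq_stat 0x08000000, interface fatal error",
    "May 25 10:13:10 Tower kernel: ata3: SError: { UnrecovData 10B8B BadCRC }",
    "May 25 10:13:10 Tower kernel: ata3.00: failed command: READ DMA",
    "May 25 10:13:10 Tower kernel: ata3.00: cmd c8/00:40:38:a0:7a/00:00:00:00:00/e9 tag 0 dma 32768 in",
    "May 25 10:13:10 Tower kernel:          res 50/00:00:37:a0:7a/00:00:09:00:00/e9 Emask 0x10 (ATA bus error)",
    "May 25 10:13:10 Tower kernel: ata3.00: status: { DRDY }",
    "May 25 10:13:10 Tower kernel: ata3: hard resetting link",
]


def summarize(lines):
    """Return (error count per port, drive model per port) for the given log lines."""
    errors = Counter()
    models = {}
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue  # not an ataN message (e.g. the indented "res ..." continuation line)
        port, message = match.groups()
        ident = IDENT.match(message)
        if ident:
            models[port] = ident.group(1).strip()
        elif any(hint in message for hint in ERROR_HINTS):
            errors[port] += 1
    return errors, models


if __name__ == "__main__":
    errors, models = summarize(SAMPLE)
    for port, count in errors.most_common():
        print(port, count, models.get(port, "model not found in these lines"))
```

To run it on a saved log instead of the excerpt, pass the file's lines in, e.g. summarize(open("syslog.txt")). The port with the most hits is the one to trace back to a cable or backplane slot.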

Link to comment

Thanks for pointing out what the error was and what it meant. I did what you said and got a full syslog, and from that I was able to tell which drive was causing the problem. I had a spare cable that I swapped in, but that cable turned out to be completely bad, so I decided to try my original cable again, since everything would now be reseated. So far it looks like that did the trick, as the parity check is running without issue. I'll keep an eye on my syslog to see whether the error comes back, which would point to the backplane.

 

I appreciate your help!
