Is something wrong with my disks?


eroz

I've been having issues with my server. A couple of days ago I found that disk2 had redballed.  I checked the SMART reports and nothing was out of the ordinary. I had not added anything to the server or moved it around.  I checked that all cables were still attached correctly, then unassigned disk2 and rebooted the server.  After the restart I reassigned disk2 and the rebuild started.  Everything looked to be working correctly. Then today I was trying to watch some movies on my pch and the movie would freeze.  It was acting like it could not maintain a connection with the server.

 

So the issue could actually be with the pch and not the server.  But could someone look over my latest syslog and see if something does not look right?

 

Thanks.

 

I'm unable to attach the syslog so I uploaded it to pastebin.

Your SMART reports look normal, so your drives should be healthy.  Your syslog is filled with a few different types of errors.  For example:

 

Sep 30 15:37:11 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2071:Port 0 irq sts = 0x1000000
Sep 30 15:37:11 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port 1 ctrl sts=0x199800.
Sep 30 15:37:11 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2071:Port 1 irq sts = 0x1usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port usr/srusr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port usr/srusr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port 0 ctrl sts=0x199800.
Sep 30 15:37:11 Tower kernel: /usr/src/sas/trunkusr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port 0 ctrl sts=0x199800.

 

...and then later...

 

Sep 30 15:37:11 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port 2 ctrl sts=0x199800.
Sep 30 15:37:11 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2071:Port 2 irq sts = 0x1000000
Sep 30 15:37:11 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port 3 ctrl sts=0x199800.
Sep 30 15:37:11 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2071:Port 3 irq sts = 0x1000000
Sep 30 15:37:11 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port 0 ctrl sts=0x199800.
Sep 30 15:37:11 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2071:Port 0 irq sts = 0x1000000
Sep 30 15:37:11 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 2069:port 1 ctrl sts=0x199800.

 

...and later still...

 

Sep 30 15:37:17 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 928:port 1 does not attach device.
Sep 30 15:37:17 Tower kernel: sas: lldd_execute_task returned: 138
Sep 30 15:37:17 Tower kernel: ata2: no sense translation for status: 0x50
Sep 30 15:37:17 Tower kernel: ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Sep 30 15:37:17 Tower kernel: ata2: status=0x50 { DriveReady SeekComplete }
Sep 30 15:37:17 Tower kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 928:port 1 does not attach device.
Sep 30 15:37:17 Tower kernel: sas: lldd_execute_task returned: 138
Sep 30 15:37:17 Tower kernel: ata2: no sense translation for status: 0x50
Sep 30 15:37:17 Tower kernel: ata2: translated ATA stat/err 0x50/00 to SCSI SK/ASC/ASCQ 0xb/00/00
Sep 30 15:37:17 Tower kernel: ata2: status=0x50 { DriveReady SeekComplete }

 

Note that all of this started on Sept. 30th.  Any power surges or anything out of the ordinary happen that day?  Some of the errors mention the mv_sas driver, so it could be that your Supermicro AOC-SASLP-MV8 card is dying (I assume you have one).  It looks like there might also be an IRQ conflict with that device.
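If you want to confirm when these messages first show up and how many of each kind there are, a rough sketch like the one below may help. It only looks at the three message families quoted above (`mv_sas.c`, `sas:` and `ataN:` lines); the function name `summarize_syslog` and the line pattern are my own assumptions based on the excerpts, not anything unRAID provides.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpts above:
#   Sep 30 15:37:11 Tower kernel: <message>
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (.*)$")


def summarize_syslog(lines):
    """Return (first timestamp seen, Counter of message kinds).

    Only kernel lines that mention mv_sas.c, start with "sas:", or
    start with "ataN:" are counted; everything else is ignored.
    """
    first_seen = None
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue
        stamp, msg = match.groups()
        if "mv_sas.c" in msg:
            kind = "mv_sas"
        elif msg.startswith("sas:"):
            kind = "sas"
        elif re.match(r"ata\d+:", msg):
            kind = "ata"
        else:
            continue
        if first_seen is None:
            first_seen = stamp
        counts[kind] += 1
    return first_seen, counts
```

You would feed it an open file, for example `summarize_syslog(open("syslog.txt"))`, where `syslog.txt` is a hypothetical saved copy of your log. Garbled lines like some of the ones above may be miscounted or skipped, so treat the numbers as approximate.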

 

You said you already checked your cables, which is good.  You might want to reseat the connection between your SASLP card and your breakout cables.

 

The first and easiest troubleshooting step is to boot into BIOS and disable any motherboard features you don't need, such as serial ports, parallel ports, etc.  Also look for anything called 'PEG mode' or similar and disable it.  Then boot into unRAID and run your server normally for a few days and see if the problem recurs.  If it doesn't, great - IRQ issues are generally easy to fix.  If it does, then I would suggest that your SASLP card (or the PCIe slot it is plugged into) might be defective.  If you have a second PCIe x4 or faster slot, swap the card to that slot and see if the errors disappear.  Also, if you have any spare controller cards, swap them in and see if the server will run correctly without the SASLP card.
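To see whether anything is actually sharing an interrupt line before and after the BIOS changes, you could look at `/proc/interrupts` on the server. The sketch below is a minimal parser, assuming the usual layout where devices sharing an IRQ are listed comma-separated at the end of the row; the function name `shared_irqs` is hypothetical, and I am not certain which name the SASLP driver registers there, so check the output by eye.

```python
def shared_irqs(text):
    """Map IRQ number -> device names for rows listing more than one device.

    Assumes the common /proc/interrupts layout: "<irq>: <counts> <chip> <dev>, <dev>".
    Device names containing spaces will be truncated to their last word.
    """
    shared = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        # Skip the CPU header row and named rows such as NMI or LOC.
        if not sep or not head.strip().isdigit():
            continue
        chunks = [chunk.strip() for chunk in rest.split(",")]
        if len(chunks) < 2 or not chunks[0]:
            continue  # only one device (or none) on this IRQ
        # The first chunk still carries the counters and chip name;
        # the device is its last whitespace-separated token.
        first_device = chunks[0].split()[-1]
        shared[int(head)] = [first_device] + chunks[1:]
    return shared
```

On the server itself you could call it as `shared_irqs(open("/proc/interrupts").read())`. A shared line is not automatically a problem, but if the controller shows up next to a device you can disable in BIOS, that is a reasonable thing to try first.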

On Sept. 30th, around 3:30pm, a storm blew through the area.  I normally lose power whenever there is a big storm, but this time no power appeared to be lost.  My computer and server, which normally do lose power, were still up and running.  I didn't think much of that storm, but apparently it did affect the server.

 

I'll give that a try, Rajahal.  I do have a SASLP card.  I also have a different set of breakout cables that I will try out.

Thank you!
