
[SOLVED] Multiple Drive Failures out of the blue (11 drives with errors now)



I put "drive failures" in quotes because I think my drives are actually fine. I just wanted to know whether anyone else has run across this.

 

Yesterday my server emailed me about array errors: three disks had read errors (Parity, Parity 2, and Disk 3). It disabled the parity drive as well as Disk 3, but everything kept running. Three drives developing errors all at once, out of nowhere, seemed unlikely, so I doubted the drives were actually bad.

 

Update: it turns out the issue described below is not the problem. The errors just came back, and now 11 drives show errors. I'm at a loss, and I have diagnostics if someone smarter than me can make sense of them.

 

Getting to the point, I'm wondering whether this is somehow the problem:

 

I use the ShinySDR Docker container for an SDR dongle. I unplugged the dongle from my server a day earlier because I was getting a longer antenna cable for it. The container had the USB device set to /dev/bus/usb/003/002 (which was correct before I unplugged it), and it was set to start automatically. At some point I rebooted the server, and I think this is where the issues started. I rebooted numerous times, shut down all Docker containers, shut down the one VM I run, and removed every plugin I felt I didn't need, trying to find the cause. I also had to force the server off a couple of times because it was unresponsive. Yesterday afternoon the server emailed me saying 9 disks had read errors.

So I opened the terminal and ran lsusb to see what was connected, and /dev/bus/usb/003/002 was now "Bus 003 Device 002: ID 058f:6387 Alcor Micro Corp. Flash Drive", which is my Unraid USB flash drive. I'm wondering whether this could have been the cause: could the container be accessing the flash drive in a way that produces all of these read errors and disables my drives?
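A path like /dev/bus/usb/003/002 encodes the bus number and the device number the kernel assigned when the device was enumerated, so after an unplug or a reboot the same path can point at a different device, which is what the lsusb output above shows. Below is a minimal Python sketch, assuming the standard one-line-per-device `lsusb` format quoted above, that parses that output so you can check what a hard-coded path currently points at, or look up a device's current path by its vendor:product ID. The function names are hypothetical, and the second sample line (an RTL-SDR dongle with ID 0bda:2838) is an illustrative assumption, not output from this server.

```python
import re
from typing import Optional

# One lsusb line looks like:
# "Bus 003 Device 002: ID 058f:6387 Alcor Micro Corp. Flash Drive"
LSUSB_LINE = re.compile(
    r"^Bus (\d{3}) Device (\d{3}): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)$"
)


def parse_lsusb(output: str) -> dict[str, tuple[str, str]]:
    """Map /dev/bus/usb/BBB/DDD paths to (vendor:product ID, description)."""
    devices = {}
    for line in output.splitlines():
        m = LSUSB_LINE.match(line.strip())
        if m:
            bus, dev, vid, pid, desc = m.groups()
            devices[f"/dev/bus/usb/{bus}/{dev}"] = (f"{vid.lower()}:{pid.lower()}", desc)
    return devices


def path_for_id(output: str, usb_id: str) -> Optional[str]:
    """Return the current device path for a vendor:product ID, or None if absent."""
    for path, (found_id, _) in parse_lsusb(output).items():
        if found_id == usb_id.lower():
            return path
    return None


# Sample output: the first line is from the post; the second (RTL-SDR dongle) is hypothetical.
SAMPLE = """\
Bus 003 Device 002: ID 058f:6387 Alcor Micro Corp. Flash Drive
Bus 001 Device 004: ID 0bda:2838 Realtek Semiconductor Corp. RTL2838 DVB-T
"""

configured = "/dev/bus/usb/003/002"  # path hard-coded in the container settings
print(parse_lsusb(SAMPLE).get(configured))  # what that path points at right now
print(path_for_id(SAMPLE, "0bda:2838"))     # where the (hypothetical) dongle actually is
```

Keying on the vendor:product ID rather than the bus/device number is the point of the design: the ID is fixed by the device itself, while the device number can change on every reconnect. A start-up script could, for example, compare the configured path with `path_for_id(...)` for the dongle and skip starting the container if they differ or the dongle is missing.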

 

I did run Tools -> Diagnostics several times, but I now know that every reboot can lose something important from the logs. These files, along with the syslog, did show errors, but I'm hesitant to believe them, since the parity drive has since been rebuilt and Disk 3 is currently 66% through its rebuild. The syslog currently shows only the errors for the disabled drives from before I removed them and added them back.

 

Thoughts?

 

Thanks,

 

John

 

 

[Screenshot: disk errors]

Edited by jcamer
  • jcamer changed the title to Multiple Drive Failures out of the blue (11 drives with errors now)

I've been searching, and I found a thread that seems to describe exactly what I'm experiencing. In my logs I also see the exact same error across multiple drives on the exact same sector. Like that poster, I also have a Supermicro chassis with 12 drive bays. I'm going to try what they tried (disabling spin down) and see how that works. I wouldn't expect a power issue, since the server has dual 1000 W power supplies. I'll also try updating the firmware on my card (Supermicro card, LSI3008-IT, running firmware 6.00.00.00-IT). The card and cables haven't been moved, so nothing should have come unseated.

 

My question now is: how do I get my two disabled drives back without having to stop the array, remove them, start the array, and add them back? Is there an easy way to tell Unraid to trust that they're good and re-enable them?

 

Edit: I stopped the array and removed both drives, restarted the array, added them back, etc. Rebuilding now.

 

Thanks again,

 

John

Edited by jcamer
add content
18 hours ago, JorgeB said:

First thing you should do is update the LSI firmware; if there are still issues after that, see if disabling spin down helps.

 

I updated the firmware to the latest version on the Supermicro site. I'm nervous about letting the drives spin down again just yet, so I might give it a day or so; there have been no issues since I told them not to spin down. I'll see how the newer firmware does and then re-enable spin down. Thanks again, I appreciate it.

 

updated:

[Screenshot: updated firmware]

 

original:

[Screenshot: original firmware (sas3flash output)]

Edited by jcamer
  • JorgeB changed the title to [SOLVED] Multiple Drive Failures out of the blue (11 drives with errors now)
