Disk errors on one disk and another disk is disabled after server crash


noties

Came home from vacation to find my unRAID server in a tizzy.  Not sure if it overheated or just had a brain f-rt.  It was locked up - no network, no drives accessible, no web mgmt accessible - and spewing the following errors on the console screen (couldn't capture the text):

 

[Two attached screenshots of the console errors, dated 2018-02-18]


Upon restarting the server, I have one disk disabled and one disk showing errors.  Array is started and is accessible, but I'm afraid to write data until I get this figured out.

[Attached screenshot]


Grabbed a diagnostic and attached here.  Wondering if anyone sees something I don't.  Just starting the troubleshooting process and wondering if I should focus on Disk 6 or 7 first.  

 

teraserver-diagnostics-20180219-0923.zip

 

Any advice would be greatly appreciated.  One disk failure, I feel comfortable recovering from, but TWO... UGH!  I'm a little worried.

Link to comment
1 hour ago, johnnie.black said:

Problems on two disks at the same time would point to a cable/power problem. Since they share the same miniSAS cable, that would be the first thing to replace if you have a spare. Then, if it looks stable, you need to rebuild the disabled disk, either onto the old disk or, to play it safer, onto a new spare.

 

Don't have a spare mini-SAS cable, but just ordered one.  I shut down, re-seated all cables and power to the drives, and booted up again.  Disk 6 is no longer seeing errors, but I'm guessing it only resets the counter to 0 on reboot.  Should I run a parity check while Disk 7 is disabled to see if Disk 6 is still having issues?

 

Link to comment
Just now, noties said:

Disk 6 is no longer seeing errors, but I'm guessing it only resets the counter to 0 on reboot. 

Correct, even an array stop/restart will reset them.

 

1 minute ago, noties said:

Should I run a parity check while Disk 7 is disabled to see if Disk 6 is still having issues?

You can. If there are more errors it won't damage anything, and if there aren't, you should also be able to rebuild the disabled disk. Check that the contents of the emulated disk look correct before rebuilding on top of the old disk (or use a new disk).
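
For the "check the emulated disk" step, here is a minimal Python sketch that fully reads the newest files under a mount point and prints a SHA-256 for each. It assumes the emulated disk is mounted at `/mnt/disk7`; that path and the helper names (`newest_files`, `sha256_of`, `check_sample`) are illustrative, not something from this thread. A clean read only shows the file is readable. To confirm the contents are accurate you would still compare the digest against a known-good copy.

```python
import hashlib
import os

def newest_files(root, count=10):
    """Return up to `count` most recently modified files under `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                found.append((os.path.getmtime(path), path))
            except OSError:
                continue  # entry could not be stat'ed; skip it
    found.sort(reverse=True)
    return [path for _mtime, path in found[:count]]

def sha256_of(path, chunk_size=1 << 20):
    """Read the whole file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_sample(root, count=10):
    """Return (path, digest) pairs; digest is None if the file failed to read."""
    results = []
    for path in newest_files(root, count):
        try:
            results.append((path, sha256_of(path)))
        except OSError:
            results.append((path, None))
    return results

if __name__ == "__main__":
    # "/mnt/disk7" is an assumed mount point for the emulated disk.
    for path, digest in check_sample("/mnt/disk7"):
        print(digest or "READ ERROR", path)
```

Sampling the newest files first matches the advice in this thread to include recently copied files, since those are the most likely to be affected if parity was stale at the time of the crash.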

Link to comment

I'd suggest taking the time to check out the simulated disks in unRaid. (When disks are kicked from the array, they are simulated using parities and the other data drives in the array). Check to make sure the simulated disks contain files, and don't appear unformatted. And I'd check a small sampling of files for accuracy, including some of the newer files copied to the disks.

 

If they look perfect, you could restore data to the kicked disks. But I don't recommend it. The kicked disks are likely perfect (or near perfect), and if anything goes wrong with the rebuild, they are your plan B. You could rebuild one to a new disk, and if it works, rebuild the second to the disk that is already successfully rebuilt. That would be my suggestion.
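
To illustrate how a kicked disk can be simulated from parity and the other data drives, here is a toy Python sketch of single-parity (XOR) reconstruction. The three 4-byte "disks" and the function name `xor_blocks` are made up for the example. It only shows the arithmetic, not how unRAID implements it internally.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three toy "data disks" holding one 4-byte block each (made-up contents).
disk1 = b"\x01\x02\x03\x04"
disk2 = b"\x10\x20\x30\x40"
disk3 = b"\xaa\xbb\xcc\xdd"

# Single parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# Pretend disk2 was kicked: rebuild its contents from parity plus the survivors.
emulated_disk2 = xor_blocks([parity, disk1, disk3])
print(emulated_disk2 == disk2)  # prints True
```

This is also why every remaining disk has to read cleanly during a rebuild: a read error on any surviving disk corrupts the reconstructed data at that position.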

Link to comment
9 minutes ago, SSD said:

I'd suggest taking the time to check out the simulated disks in unRaid. (When disks are kicked from the array, they are simulated using parities and the other data drives in the array). Check to make sure the simulated disks contain files, and don't appear unformatted. And I'd check a small sampling of files for accuracy, including some of the newer files copied to the disks.

 

If they look perfect, you could restore data to the kicked disks. But I don't recommend it. The kicked disks are likely perfect (or near perfect), and if anything goes wrong with the rebuild, they are your plan B. You could rebuild one to a new disk, and if it works, rebuild the second to the disk that is already successfully rebuilt. That would be my suggestion.

 

Thank you for the suggestions.  I'm in the process of doing a parity check again to verify Disk 6, which had tons of errors.  Also taking your suggestion and looking at files on the simulated disk.  So far all the newer files are fine and the parity check is 50% done with no errors.

 

I suspect this was either a cable, controller, or (less likely) heat issue that caused the crash and kicked the disk.  I'll report back with my continued results.

Link to comment
14 minutes ago, trurl said:

@johnnie.black

I'm a little unclear about this part. Since there is only one parity, and a disk is disabled, what does it actually do? Isn't this the "read check" scenario? What exactly happens in this situation if you ask for a parity check?

 

I guess my question centered around running Disk 6 through a full read which I assumed happened during a parity check.  

Link to comment
35 minutes ago, trurl said:

@johnnie.black

I'm a little unclear about this part. Since there is only one parity, and a disk is disabled, what does it actually do? Isn't this the "read check" scenario? What exactly happens in this situation if you ask for a parity check?

Yes. unRAID still calls it a parity check when there's a parity disk, but with a disk disabled it just acts as a read check, which is why I mentioned it won't damage anything even if there are more errors.

Link to comment

So just following up on this thread.  Powered down the server and re-seated all the power and SAS cables.  I ran SMART short and long tests on both Disk 6 and 7 and they passed.  Decided to re-add Disk 7 to the array.  It's now rebuilding and so far no errors or issues.  

 

I'll be replacing the SAS cable this week when the replacement cable shows up just to be sure.
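
On the SMART short and long tests mentioned above: the post doesn't say how they were run, but from a command line they might look roughly like the `smartctl` invocations this Python sketch prints. The device names `/dev/sdX` and `/dev/sdY` are placeholders, `smart_commands` is a made-up helper, and nothing is executed here. Drives behind some controllers may need extra `smartctl` options that are not shown.

```python
import shlex

def smart_commands(device):
    """Build smartctl command lines for one drive (nothing is executed here)."""
    return [
        ["smartctl", "-t", "short", device],     # start the short self-test
        ["smartctl", "-t", "long", device],      # start the extended self-test
        ["smartctl", "-l", "selftest", device],  # show the self-test log afterwards
        ["smartctl", "-H", device],              # overall health assessment
    ]

# Placeholder device names; substitute the real devices for Disk 6 and Disk 7.
for device in ("/dev/sdX", "/dev/sdY"):
    for command in smart_commands(device):
        print(shlex.join(command))
```

The self-tests run in the background on the drive itself, so the log command only shows a result once the test has finished.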

Link to comment
