Getting a FAILED health report while rebuilding array



I've been running an array of 12 disks, with a cache drive and a hot spare, for several years now with no complaints and great ease.

 

The box I use is a Norco 4224 with IBM SAS cards and native SATA ports on an ASRock Z77 Extreme board, with an Intel Celeron G1610 and 4GB of memory. It has been running since about 2013 with virtually no incidents.

 

Tonight I wanted to replace an old 2TB drive with a new 8TB drive. I had already upgraded my parity drive to 8TB successfully some six months ago.

 

I took the array offline, removed the old disk, installed the new larger disk in the same physical slot the old one had been in (just for a sense of safety), and restarted the array. Everything was going according to plan until I got a notice that the rebuilding disk was 'warm' at 45 degrees C. Several minutes later I got the same notice, but it now read 46 degrees.

 

I took the top off the case and discovered that the fan closest to the drive being rebuilt had stopped. I removed the lid completely, placed a desk fan pointing at the problem disk, and waited for the temperature to go down. It did about 20 minutes later, and the disk is now running at a comfortable 41 degrees.

 

But after about another 3 hours I got a popup fail message, which read "Notice [TOWER] = array health report [FAIL]. Array has 14 disks (including parity and cache)."

 

I looked in the log but couldn't see anything untoward. The array was still rebuilding, so I let it continue to see what would happen. Now, about another 1-1.5 hours later, it seems to be humming along. My plan, such as it is, is to let it finish and then run a parity check (assuming nothing further happens). I also purposely didn't repurpose the old disk that I was replacing, and since I didn't add any new content to the Unraid system during this process, I hope that if something is wrong I can simply put the old disk back in its original slot temporarily until I know what's wrong. I've attached the syslog from right after I noticed the problem (though my eye sees nothing wrong with it) and the diagnostics file requested by the how-to post. I also captured an image of the toast notification that warned me of the fail.

 

What I would like to know is:

1. Is there any point in letting this process complete? If not, should I set the new hard drive aside and run a new preclear on it before trying to use it again? (I ran two cycles before starting this process.)

 

2. If not, what should I do?

 

3. What do more trained eyes than mine glean from the log and diagnostics report?

 

I'm anxious, of course, to get my array back up and running, but I'm hoping that being deliberate will keep me from doing something rash. Hence I'm allowing the rebuild to complete and keeping the old disk intact and untouched so it could be put back in the array.

 

Any help or guidance would be much appreciated.

 

 

hwilker

tower-diagnostics-20180810-0057.zip

tower-syslog-20180810-0033.zip

healthreportfail.JPG
