Intermittently all drives and shares go missing



About once a week I am unable to load any shares, and when I check the web admin, all my drives and shares appear to be completely missing. Rebooting from the Web UI does not seem to work either, so I have to run powerdown via SSH as root. I've noticed this since I set up my unRAID server a few months ago. I started my server on the latest RC builds of 6.3.x and saw this issue; I've since moved to the pre-release 6.4 builds and am still seeing it.

 

I ran the plugin check for common problems and it found two things that may be causing it.


 

Quote

 

"Machine Check Events detected on your server    Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged"

 

 

 

 

Quote

 

"Marvel Hard Drive Controller Installed    It appears that your server has a Marvel based hard drive controller installed within it. Some users with Marvel based controllers exhibit random drives dropping offline, recurring parity errors during checks etc. This tends to be exacberated if VT-D / IOMMU is enabled in the BIOS. Generally, LSI based controllers would be preferred over Marvel based controllers because of these issues. Note that these issues are out of Limetech's hands. Depending upon the exact combination of hardware present in your server, you may not have any problems whatsoever. If you have no problems, then this warning can be safely ignored, but future versions of unRaid (and later Kernel versions) may (or may not) present you with the previously mentioned issues."

 

 

 

 

 

Attaching the diagnostics I collected via SSH while the server was in this state and a set collected via the web admin after rebooting. Also attaching the log output from running diagnostics via SSH, in case that's useful.

 

 

tower-diagnostics-20170813-1046.zip

moya-diagnostics-20170813-1144.zip

diagnostic-log.txt

Edited by littlebluebro
adding more logs

Machine check event:

 

mcelog only logged this:

Aug 13 11:43:06 Moya root: mcelog: Family 6 Model 92 CPU: only decoding architectural errors

 

First time I've ever seen anything that terse. But it may be because mcelog wasn't installed when the errors happened. Reboot, and the next time it pops up, we should have something better to go on.

 

Shares: Is the moya diagnostics from after a reboot? Was everything still running at the time? The "Tower" diagnostics are completely empty for some weird reason, but that is probably related to all the errors in the diagnostics.zip file, which seem to imply that the flash drive dropped offline.

 

Move the flash drive to a different controller (USB2 <--> USB3)


Thanks Squid! Yes, the moya diagnostics were run from the Web UI after a reboot, and the tower diagnostics were collected via a shell connection while the server was in the bad state. The diagnostic-log.txt file is the terminal output from running diagnostics via the SSH session as root; not sure if that matters.

 

I'll try connecting the flash drive to a different USB controller in the meantime, and if it occurs again I'll grab more logs now that I have mcelog installed.
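
For picking the relevant lines out of a syslog the next time this happens, something like the following might help. It is only a rough sketch: the two regular expressions are my guesses at how machine check and USB disconnect messages are worded (kernel and mcelog wording varies), and the function names scan_syslog and report are made up for illustration. Python is not part of a stock unRAID install, so this is probably easiest to run on another machine against a syslog copied off the server.

```python
import re

# Assumed patterns, not an official list: adjust to match what your syslog contains.
PATTERNS = {
    "machine_check": re.compile(r"mcelog|machine check|\bmce\b", re.IGNORECASE),
    "usb_disconnect": re.compile(r"usb .*disconnect", re.IGNORECASE),
}


def scan_syslog(lines):
    """Group log lines by the (assumed) pattern they match."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


def report(path):
    """Print matching lines from the syslog file at `path`."""
    with open(path, errors="replace") as fh:
        for name, matched in scan_syslog(fh).items():
            print(f"{name}: {len(matched)} line(s)")
            for line in matched:
                print("  " + line)
```

Calling report() with the path to a copied syslog prints the matching lines per category. If the USB disconnect lines show up shortly before the shares vanish, that would fit the theory that the flash drive is dropping offline.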
