littlebluebro, posted August 13, 2017 (edited)

About once a week I'm unable to load any shares, and when I check the web admin, all my drives and shares appear to be completely missing. Rebooting from the web UI doesn't work either, so I have to run powerdown as root over SSH. I've seen this ever since I set up my unRAID server a few months ago. I started on the latest 6.3.x RC builds and saw the issue there, then moved to the 6.4 pre-release builds and am still seeing it.

I ran the plugin check for common problems, and it found two things that may be causing it:

"Machine Check Events detected on your server. Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged."

"Marvell Hard Drive Controller Installed. It appears that your server has a Marvell based hard drive controller installed within it. Some users with Marvell based controllers exhibit random drives dropping offline, recurring parity errors during checks etc. This tends to be exacerbated if VT-D / IOMMU is enabled in the BIOS. Generally, LSI based controllers would be preferred over Marvell based controllers because of these issues. Note that these issues are out of Limetech's hands. Depending upon the exact combination of hardware present in your server, you may not have any problems whatsoever. If you have no problems, then this warning can be safely ignored, but future versions of unRaid (and later Kernel versions) may (or may not) present you with the previously mentioned issues."

I've attached diagnostics I collected via SSH while the server was in this state, plus a set collected via the web admin after rebooting. I'm also attaching the log output from running diagnostics via SSH, in case that's useful.
Attachments: tower-diagnostics-20170813-1046.zip, moya-diagnostics-20170813-1144.zip, diagnostic-log.txt

(Edited August 13, 2017 by littlebluebro: adding more logs)
Squid, posted August 13, 2017

Machine check event: mcelog only logged this:

Aug 13 11:43:06 Moya root: mcelog: Family 6 Model 92 CPU: only decoding architectural errors

That's the first time I've ever seen anything that terse, but it may be because mcelog was only installed after the errors happened. Reboot, and the next time it pops up we should have something better to go on.

Shares: Are the moya diagnostics from after a reboot? Was everything still running at the time? The "Tower" diagnostics are completely empty for some weird reason, probably related to all the errors in the diagnostics.zip file, which seem to imply that the flash drive dropped offline. Move the flash drive to a different controller (USB2 <--> USB3).
littlebluebro (author), posted August 14, 2017

Thanks, Squid! Yes, the moya diagnostics were run after a reboot from the web UI, and the tower diagnostics were collected over a shell connection while the server was in the bad state. The diagnostic-log.txt file is the terminal output from running diagnostics as root over SSH, not sure if that matters. In the meantime I'll connect the flash drive to a different USB controller, and if the problem happens again I'll grab more logs now that mcelog is installed.