May 6, 2016 Hi, I have had a horrible time trying to get my Unraid setup to stay stable. Last Thursday I set out to replace my Windows Server 2008 machine, which runs my home domain and various apps like Plex, with Unraid running various Docker apps. Windows Server would run under a VM and be left to do what it does best, serving as a domain controller.

I started with four 3TB WD Reds, two of which were new for the project. The array started off just fine with one as the parity drive and one as a data drive. The problems started after I had transferred the data off the other two drives and attempted to preclear them. Unraid crashed at random times throughout the preclear process, forcing me to hard restart the box and start the process again. The console shows the tail end of what looks like an error stack trace, and the console itself is usually halted and doesn't accept input.

Eventually I managed to get the drives cleared and added to the array, but I am still experiencing occasional crashes, especially since I started to preclear my next drive, the 1.5TB Seagate. I have also had a couple of occasions where the Unraid web UI became completely unresponsive and would not load. Attempting a reboot using the powerdown script usually fails, with the system not responding.

For reference my hardware is:
Motherboard: Gigabyte GA-Z77X-UD5H
CPU: Intel 3570k
Ram: Corsair DDR3 XMS3 16GB Set/Corsair DDR3 XMS3 8GB Set (for 24GB total)
PSU: Corsair TX65M 650W
Drives in array: 3TB Western Digital Red (x4)
Other Drives (Not yet processed/precleared):
Seagate Barracuda 7200 1.5TB
Seagate Barracuda 7200.10 500GB
Western Digital 1TB Caviar Green 32MB Cache
Samsung 840 Basic SSD 120GB

I have attached what logs and diag dumps I have available, but the syslogs from before the crashes seem to be lost. If someone could help diagnose my problem I would be very grateful. If there is anything I can do to collect more information please let me know.
optimus-diagnostics-20160505-0154.zip optimus-diagnostics-20160506-2143.zip
May 6, 2016 Author Some more syslogs that I couldn't attach to the previous post: syslog1.zip syslog3.zip
May 13, 2016 Author Hi, still getting random hangs. I managed to get all my drives cleared and into the array apart from the 500GB one, which I have now physically removed from the machine. The latest problem was tonight, when the web UI, the Dockers and a VM I was running all died while I was performing an rsync between 2 shares. The usual sequence followed: I tried to restart via the powerdown script, but it hung while collecting the system diagnostics. I pressed Ctrl-C to skip that after a while, and then it hung on the shutdown -r command, forcing me to pull the plug. I have the system log, which is totally filled with errors, but they don't mean much to me. Please let me know if they point to anything obvious or whether there are any known issues with my hardware setup. syslog.zip
May 15, 2016 Author Help! I'm in a bad situation now. I was using the unbalance plugin to move some shares onto another disk when things went weird: unbalance reported that all drives were zero sized and failed to do anything. I proceeded to shut down all Dockers and the VM and went to restart the machine via powerdown. It made the diag log fine and sent the shutdown command, and then just hung. Since I've now encountered this situation so many times, I hit the restart button and waited for it to come back up. To my horror I was presented with this error on the console:

XFS (md4): Failed to recover EFIs

Looking back in the logs of the diag taken before shutdown (attached), something happened to disk4 and it started spitting out these errors:

May 15 17:26:17 Optimus kernel: Buffer I/O error on dev md4, logical block 33387828, lost async page write
May 15 17:26:28 Optimus kernel: XFS (md4): xfs_log_force: error -5 returned.
May 15 17:42:47 Optimus emhttp: get_filesystem_status: statfs: /mnt/user/Backup Input/output error

Now it seems that Unraid has booted but cannot mount disk4 (syslog attached), and the web UI is not available. Can anyone help and let me know what the procedure is to try and fix this? Is the drive/array a lost cause? I'm pretty desperate to solve this whole instability situation; I'm one step away from just buying some new hardware. optimus-diagnostics-20160515-1950.zip syslog15.zip
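For anyone trying to make sense of a syslog that is "totally filled with errors", a small script can at least show which device the kernel is complaining about. The following is only a rough sketch: the two patterns are assumptions derived from the three log lines quoted in the post above, the function name `count_errors_by_device` is made up for illustration, and a real syslog will contain other error formats this does not catch.

```python
import re
from collections import Counter

# Sample lines copied from the post above. A real run would instead read
# the syslog extracted from a diagnostics zip.
SAMPLE = """\
May 15 17:26:17 Optimus kernel: Buffer I/O error on dev md4, logical block 33387828, lost async page write
May 15 17:26:28 Optimus kernel: XFS (md4): xfs_log_force: error -5 returned.
May 15 17:42:47 Optimus emhttp: get_filesystem_status: statfs: /mnt/user/Backup Input/output error
"""

# Illustrative patterns only, based on the lines above (assumption: other
# kernel messages may name the device differently and will be missed).
DEVICE_PATTERNS = [
    re.compile(r"Buffer I/O error on dev (\w+)"),
    re.compile(r"XFS \((\w+)\):"),
]


def count_errors_by_device(lines):
    """Count log lines that match a known error pattern, keyed by device."""
    counts = Counter()
    for line in lines:
        for pattern in DEVICE_PATTERNS:
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
                break  # count each line at most once
    return counts


if __name__ == "__main__":
    counts = count_errors_by_device(SAMPLE.splitlines())
    for device, n in counts.most_common():
        print(f"{device}: {n} matching line(s)")
```

The third sample line names a share path rather than a device, so it is deliberately left uncounted. To scan a real file, pass `open("syslog.txt", errors="replace")` (the filename here is hypothetical) in place of `SAMPLE.splitlines()`.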
May 15, 2016 Author I've been reading http://lime-technology.com/wiki/index.php/Check_Disk_Filesystems but it says to put the array in maintenance mode, and since the web UI does not come up I cannot do this. Is it safe to run "xfs_repair -v /dev/md4" without maintenance mode?
May 15, 2016 Doubt it... Try changing config/disk.cfg (startArray="yes" to be startArray="no") then reboot (powerdown -r). Hopefully the GUI will come back and you'll be able to start in maintenance mode.
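The edit suggested above can be made by hand in any text editor. For those who prefer to script it, here is a minimal sketch. It assumes the setting appears on its own line exactly as `startArray="yes"`, as quoted in the reply. The function names and the `.bak` backup are illustrative choices, and the full path (commonly `/boot/config/disk.cfg` on a running server) is an assumption that should be checked on your own system.

```python
import re
from pathlib import Path


def disable_array_autostart(cfg_text):
    """Return cfg_text with startArray="yes" changed to startArray="no".

    Any other line is left untouched, and text that already says "no"
    is returned unchanged.
    """
    return re.sub(
        r'^(\s*startArray\s*=\s*)"yes"',
        r'\1"no"',
        cfg_text,
        flags=re.MULTILINE,
    )


def patch_file(path):
    """Rewrite the file at path in place, keeping a .bak copy of the original.

    Returns True if the file was changed, False if no change was needed.
    """
    cfg_path = Path(path)
    original = cfg_path.read_text()
    updated = disable_array_autostart(original)
    if updated == original:
        return False
    # Save the untouched original next to the file before overwriting it.
    cfg_path.with_name(cfg_path.name + ".bak").write_text(original)
    cfg_path.write_text(updated)
    return True
```

Something like `patch_file("/boot/config/disk.cfg")` followed by the reboot described above would apply it. Keeping the backup means the change is easy to revert once the array is healthy again.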
May 15, 2016 Author Excellent, that worked, thanks! It's back up and I've started maintenance mode. It has automatically started a parity check. Should I wait to see what happens with the check before attempting any xfs_repair commands?
May 15, 2016 For future reference, your syslog is always one of the things already included in your diagnostics zip so there is no need to also attach it separately.
May 15, 2016 Author The attached diag zip was pre restart, the syslog was post restart. Sorry for not making that clearer.
May 16, 2016 Author The xfs_repair ran fine and the array is now back up and running. With that crisis averted I still need to attack the stability issue. Does no one have any ideas I could try? I have confirmed it's not a RAM issue: I ran a memtest again over the weekend, left it for 3 full passes, and all was fine. I can only narrow it down to motherboard/SATA controller/SATA cable issues.
May 23, 2016 Author Just thought I'd give a conclusion to this thread to help anyone who may have a similar issue: I ended up buying new hardware as per this thread (http://lime-technology.com/forum/index.php?topic=49118.0). When I attempted to boot it up for the first time, the new motherboard had a "power LED" light on it that would flash on and off every second, but the board would not power up. The Asus manual did not say anything about the states of this LED, but after a quick Google search I found people suggesting it was down to the power supply. I ended up buying another power supply, and it powered on fine after that. I am now starting to wonder whether my stability issues were down to the power supply all along (or maybe a combination of things). Perhaps it was not supplying reliable power to the motherboard, and the new Asus server grade motherboard was less tolerant of this and refused to power on. For reference the old power supply was a 5 year old Corsair HX 650W, and the new one is a Super Flower Leadex 550W Platinum. I can happily report that everything has been running stable since Saturday with the build in my sig. It ended up over budget but I think it will be worth it in the long run.