fcaico Posted April 22, 2016

I've recently put together this unRAID system, which is intended to be a media and application server along with providing a virtual HTPC to my media room. It all runs swimmingly for a little while, but after 1-3 days I will typically find my SMB shares, the unRAID web interface, or one of the docker web interfaces (or a combination) in a non-responsive state. Some files are being held open (usually by SMBD or SHFS), and while I can unmount some of the drives, there will typically be some which are held busy by one of those rogue processes, and I cannot kill the offending process. This prevents me from shutting down cleanly, even using the powerdown plugin, but so far I haven't lost any data (or even been forced to do a parity check on startup). Powerdown -r says it's halting the system, but after that nothing happens. I can still log in to the system using SSH. My only recourse is to hold down the power button to restart.

The only thing I can see in the log is a couple of mce [Hardware Error] entries at boot time. But the problem usually occurs much later and doesn't crash the system at all. I've run memtest on the system for over 24 hours, and I'm running everything at normal clock rates (not overclocked).

I have 1 Windows 10 VM (with a video card and a USB bus passed through).
I have 3 dockers running: MariaDB, SABnzbd and SickBeard.
I have 2 plugins loaded: Community Applications, Powerdown.

The system is pretty beefy too:
cpu: i7 5820K
ram: 32 GB
mobo: ASRock X99 Extreme 6
video: GTX 970
cache: 2x 240GB SSDs
parity: 1x 6TB
data: 2x 6TB, 2x 4TB, 1x 2TB

Attachment: unraid-diagnostics-20160422-0749.zip
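A process that cannot be killed while a disk stays busy is often stuck in uninterruptible sleep (state `D`), waiting on I/O; in that state it does not act on signals, including SIGKILL, until the I/O returns. Whether that is what is happening here is an assumption, since nothing posted so far confirms it. A minimal Python sketch for listing such processes over SSH, assuming a Python 3 interpreter is available on the server (the function names are made up for illustration):

```python
import os


def parse_stat_line(text):
    """Split one /proc/<pid>/stat line into (pid, command, state)."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so use the first '(' and the last ')' as delimiters.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx].strip())
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """Return (pid, command) for every process currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            # The process exited while being read, or the line was malformed.
            continue
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Calling `find_uninterruptible()` while the shares are hung, and comparing the result with the processes holding the disks, would show whether SMBD or SHFS is blocked on I/O rather than simply ignoring the kill.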
fcaico (Author) Posted April 24, 2016

I'm using 6.2 beta 20. Do you think downgrading to the 6.1 release would fix this? Or even upgrading to 6.2 beta 21?
Bjonness406 Posted April 24, 2016

You should be on the newest beta, since the previous beta is not supported anymore. When you are using a beta, always use the latest one!
Adam64 Posted April 24, 2016

I had similar issues when I was using my onboard Intel i218V ethernet. Which one are you using?
callmeedin Posted April 29, 2016

Adam64, how did you discover that it was an Intel i218V ethernet issue, and was there a fix (other than adding a new network card)? I too have a v6 system that becomes unusable every few days (I opened a new post on it this morning), and I have an ASRock motherboard in it (like the original poster here), but I'm not sure what onboard ethernet it has. Thanks.
testdasi Posted April 29, 2016

I think it's in the X99 chipset. Good to know that the i218V has problems. I'll switch to my Realtek port then.
callmeedin Posted April 29, 2016

Just checked and I have the Z77 Extreme 4 MB. Does that use the same chipset?

03:00.0 Ethernet controller [0200]: Broadcom Corporation NetLink BCM57781 Gigabit Ethernet PCIe [14e4:16b1] (rev 10)
    Subsystem: ASRock Incorporation Z77 Extreme4 motherboard [1849:96b1]
    Kernel driver in use: tg3
    Kernel modules: tg3
fcaico (Author) Posted April 29, 2016

Quote (Adam64): I had similar issues when I was using my onboard Intel i218V ethernet. Which one are you using?

So I just checked and saw that in fact I was using the top RJ45 jack, which is the Atheros one, so I don't think this is my issue. It also seemed pretty unlikely at any rate, since the system was still responding just fine over SSH. It's just that some of the disk devices were being kept busy by something and the web interface stopped responding (which also made SMB stop responding).

I still haven't resolved this issue, and I'm wondering if this is a v6 issue, as I don't seem to be the only one. I'm not sure exactly what I would lose by downgrading to v5, but I'm considering it as I'd like to resolve these issues.
fcaico (Author) Posted April 29, 2016

Quote (callmeedin): Just checked and I have the Z77 Extreme 4 MB. Does that use the same chipset?

03:00.0 Ethernet controller [0200]: Broadcom Corporation NetLink BCM57781 Gigabit Ethernet PCIe [14e4:16b1] (rev 10)
    Subsystem: ASRock Incorporation Z77 Extreme4 motherboard [1849:96b1]
    Kernel driver in use: tg3
    Kernel modules: tg3

No, your motherboard uses the Z77 chipset. What CPU are you running?
callmeedin Posted April 30, 2016

ASRock Z77 Extreme4
AMD Sempron 145
2GB DDR3-1145, single module

I listed the wrong MB: it is an ASRock 990FX Extreme 3.
fcaico (Author) Posted April 30, 2016

Yeah, your hardware is far below the spec of my unRAID server, which may or may not be the problem in your case. I'm not sure.
gdeyoung Posted May 3, 2016

I have the same problem on a beta 21 PC with an AMD config:

M/B: Gigabyte Technology Co., Ltd. - 990FXA-UD3
CPU: AMD FX-8350 Eight-Core @ 4000
HVM: Enabled
IOMMU: Enabled
Cache: 384 kB, 8192 kB, 8192 kB
Memory: 24576 MB (max. installable capacity 32 GB)
Network: bond0: fault-tolerance (active-backup), mtu 1500
         eth0: 1000Mb/s, Full Duplex, mtu 1500
Kernel: Linux 4.4.6-unRAID x86_64
OpenSSL: 1.0.2g

Exact same symptoms. I end up having to hard reset the box, since it will not soft reset from the command line. My NIC is a Realtek RTL8111E chip (10/100/1000 Mbit).
fcaico (Author) Posted May 3, 2016

You should upload your diagnostics next time it happens.
drdobsg Posted May 6, 2016

I had a similar issue as well. The web GUI locked up, but I could still SSH in, so I did get my diagnostics. There were several processes locked up that I couldn't kill. I ended up trying to reboot, but that didn't work, so I did a hard reset. Parity and array drives were fine after reboot.

It may have been coincidence, but the system froze shortly after spinning down hard drives. I can post my diagnostics when I get home if you want, or start a new thread. But it sounds related, as I was also downloading a decent amount via dockers: I was running couchpotato, sonarr, deluge, and plex. On top of that I was running the file integrity plugin.

I have since turned off the cache drive for my downloads folder, disabled file integrity auto hashing, and set the spin down delay to never. I will monitor to see if that had anything to do with it.

My M/B uses Intel 82579LM and 82574L LAN.

Config: 6.2.0-beta21
M/B: Supermicro - X9SCL/X9SCM
CPU: Intel® Xeon® CPU E31230 @ 3.20GHz
HVM: Enabled
IOMMU: Enabled
Cache: 256 kB
Memory: 16384 MB (max. installable capacity 32 GB)
Network: bond0: fault-tolerance (active-backup), mtu 1500
         eth0: 1000Mb/s, Full Duplex, mtu 1500
         eth1: 1000Mb/s, Full Duplex, mtu 1500
Kernel: Linux 4.4.6-unRAID x86_64
OpenSSL: 1.0.2g
grither Posted May 6, 2016

I had this problem as well. I was in another thread that suggested adding a cron job which restarts all dockers once a day. I've been running for about 3 months with no more unusable periods.
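The cron job itself was not posted, so the following is only a sketch of what such a daily restart could look like, not grither's actual script. It assumes a Python 3.7+ interpreter and the `docker` command-line client are available on the server; the function names are invented for illustration. It relies on two standard Docker CLI calls: `docker ps -q` (print the IDs of running containers) and `docker restart` (restart the given containers).

```python
import subprocess


def running_container_ids():
    """Return the IDs of running containers, as printed by `docker ps -q`."""
    result = subprocess.run(
        ["docker", "ps", "-q"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def build_restart_command(container_ids):
    """Build the `docker restart` argument list, or None if nothing is running."""
    if not container_ids:
        return None
    return ["docker", "restart", *container_ids]


def restart_all():
    """Restart every container that is currently running."""
    command = build_restart_command(running_container_ids())
    if command is not None:
        subprocess.run(command, check=True)
```

A script that calls `restart_all()` could then be scheduled once a day from cron. Note that this only works around the symptom; it does not explain why the containers end up holding the array busy.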
itimpi Posted May 6, 2016

Quote (drdobsg): [...] It may have been coincidence, but the system froze shortly after spinning down hard drives. [...]

What type of drives do you have? I had stability issues until I set my WD 6TB Red drives to never spin down. This setting does not seem to be necessary for the other drives in my system.
fcaico (Author) Posted May 6, 2016

Quote (itimpi): What type of drives do you have? I had stability issues until I set my WD 6TB Red drives to never spin down. This setting does not seem to be necessary for the other drives in my system.

Hmmm, that's pretty interesting. I have a mix of drives, and 3 of them are WD 6TB Red drives. I've noticed that sometimes it's not the Red drives which are held 'busy' by some process, but I suppose since my parity drive IS a Red drive this *might* be my issue...

UPDATE: I just checked, and on all my drives Spin Down Delay is set to "Use Default", which is set to Never. So that does not appear to be it.
drdobsg Posted May 6, 2016

Quote (itimpi): What type of drives do you have? I had stability issues until I set my WD 6TB Red drives to never spin down. This setting does not seem to be necessary for the other drives in my system.

I have an older 60GB OCZ SSD for a cache drive, 3x 1TB WD Blacks, and a couple of 500GB Seagates for the array. I know 3TB usable is not much, but I am still testing unRAID while I wait for some sales on newer drives. I have 5x 3TB Seagate drives, but 3 of them died, so I am looking to replace them with something better.

I was surprised to see the default spin down delay was set to never. I changed it to 2 hours and then the server froze, but I was also going heavy on the downloading, trying to restore my collection. I don't even know why the drives were spun down if there were still processes accessing them.
fcaico (Author) Posted May 12, 2016

So I'm still having this issue. If I'm just reading files off the file system (or doing light writes), everything seems fine for days (I had a previous uptime of 7 days). However, it seems that large writes cause the problem (if I queue up some big downloads in SABnzbd, this happens). Again, one or more disks are held busy by a process (usually SMBD or SHFS) and I can't kill it.

Yesterday I restarted my server after such an occurrence and, from a Windows machine, moved a large directory (about 15GB) from one user share to another. Halfway through the process it hung. SMB stopped responding, and although I was able to SSH into the machine, I was unable to stop the move. I ended up having to restart.

Could this be the problem? Moving large files around? SABnzbd does this when it completes a download, and so does Couchpotato.

Anyone? I'm pretty desperate to resolve this.
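To see which processes are holding a given disk busy at the moment of the hang, one option is to walk `/proc/<pid>/fd` and look for open files under that disk's mount point. The sketch below is an illustration under assumptions: it needs a Python 3 interpreter on the server, it should run as root to see every process, and `/mnt/disk1` in the usage note is only an example mount point. The function names are invented for illustration.

```python
import os


def is_under(path, mount_point):
    """True if path is the mount point itself or lies beneath it."""
    mount = mount_point.rstrip("/")
    return path == mount or path.startswith(mount + "/")


def open_files_under(mount_point, proc_root="/proc"):
    """Map pid -> list of open file paths located under mount_point.

    Only open file descriptors are checked; a process whose working
    directory or memory mapping is on the disk is not reported.
    """
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while scanning
            if is_under(target, mount_point):
                holders.setdefault(int(entry), []).append(target)
    return holders
```

For example, `open_files_under("/mnt/disk1")` run right after a stalled move would list the PIDs and paths involved, which could be attached alongside the diagnostics.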
testdasi Posted May 13, 2016

Quote (fcaico): So I'm still having this issue. [...] However it seems that large writes cause the problem [...]

Hmm... that sounds a bit similar to what is happening with me on 6.2.0 beta 21. I managed to pinpoint that it's due to something hanging in the array, but couldn't figure out what. dAigo seems to have a similar issue too. You might want to try 6.1.9 to see if it works.