April 22, 2016
I've recently put together this unRAID system, which is intended to be a media and application server as well as providing a virtual HTPC to my media room. It all runs swimmingly for a little while, but after 1-3 days I will typically find my SMB shares, the unRAID web UI, or one of the Docker web UIs (or a combination) in a non-responsive state. There are some files being held open (usually by SMBD or SHFS), and while I can unmount some of the drives, there will typically be some that are held busy by one of those rogue processes, and I cannot kill the offending process. This prevents me from shutting down cleanly, even using the Powerdown plugin, but so far I haven't lost any data (or even been forced to do a parity check on startup). powerdown -r says it's halting the system, but after that nothing happens. I can still log in to the system using SSH. My only recourse is to hold down the power button to restart.
The only thing I can see in the log is a couple of mce [Hardware Error] entries at boot time. But the problem usually occurs much later and doesn't crash the system at all. I've run memtest on the system for over 24 hours, and I'm running everything at normal clock rates (not overclocked).
I have 1 Windows 10 VM (with a video card and a USB bus passed through).
I have 3 Dockers running: MariaDB, SABnzbd and SickBeard.
I have 2 plugins loaded: Community Applications, Powerdown.
The system is pretty beefy too:
CPU: i7 5820K
RAM: 32 GB
Mobo: ASRock X99 Extreme 6
Video: GTX 970
Cache: 2x 240GB SSDs
Parity: 1x 6TB
Data: 2x 6TB, 2x 4TB, 1x 2TB
unraid-diagnostics-20160422-0749.zip
April 24, 2016 Author
I'm using 6.2 beta 20. Do you think downgrading to the 6.1 release would fix this? Or even upgrading to 6.2 beta 21?
April 24, 2016
You should be on the newest beta, since the previous beta is no longer supported. When you are running a beta, always use the latest one!
April 24, 2016
I had similar issues when I was using my onboard Intel i218V ethernet. Which one are you using?
April 29, 2016
Adam64, how did you discover that it was an Intel i218V ethernet issue, and was there a fix (other than adding a new network card)? I too have a v6 system that becomes unusable every few days (I opened a new post on it this morning), and I have an ASRock motherboard in it (like the original poster here), but I'm not sure what onboard ethernet it has. Thanks.
April 29, 2016
I think it's in the X99 chipset. Good to know that the i218V has problems. I'll switch to my Realtek port then.
April 29, 2016
Just checked and I have the Z77 Extreme 4 MB. Does that use the same chipset?
03:00.0 Ethernet controller [0200]: Broadcom Corporation NetLink BCM57781 Gigabit Ethernet PCIe [14e4:16b1] (rev 10)
    Subsystem: ASRock Incorporation Z77 Extreme4 motherboard [1849:96b1]
    Kernel driver in use: tg3
    Kernel modules: tg3
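For anyone else trying to answer the "which onboard NIC and driver am I actually using?" question, here is a minimal Python sketch that pulls the Ethernet controllers and their kernel drivers out of lspci text like the output pasted above. The function name, the regular expression, and the SAMPLE constant are illustrative assumptions, not part of unRAID or of anything posted in this thread, and the sketch assumes the usual lspci layout where device lines start in the first column and detail lines are indented.

```python
import re

# Device lines start in column 0: "<slot> <class>: <device description>".
# Detail lines (Subsystem, Kernel driver in use, ...) are indented.
HEADER = re.compile(r"^(\S+) ([^:]+): (.*)$")

# Sample text copied from the post above (hypothetical constant name).
SAMPLE = (
    "03:00.0 Ethernet controller [0200]: Broadcom Corporation NetLink "
    "BCM57781 Gigabit Ethernet PCIe [14e4:16b1] (rev 10)\n"
    "\tSubsystem: ASRock Incorporation Z77 Extreme4 motherboard [1849:96b1]\n"
    "\tKernel driver in use: tg3\n"
    "\tKernel modules: tg3\n"
)


def ethernet_controllers(lspci_text):
    """Return one dict per Ethernet controller found in lspci output."""
    found = []
    current = None
    for line in lspci_text.splitlines():
        header = HEADER.match(line)
        if header:
            # A new device block starts; only track it if it is a NIC.
            current = None
            if header.group(2).startswith("Ethernet controller"):
                current = {
                    "slot": header.group(1),
                    "device": header.group(3),
                    "driver": None,
                }
                found.append(current)
        elif current is not None and line.strip().startswith("Kernel driver in use:"):
            current["driver"] = line.split(":", 1)[1].strip()
    return found


if __name__ == "__main__":
    for nic in ethernet_controllers(SAMPLE):
        print(nic["slot"], nic["driver"], nic["device"])
```

On a live system the text would come from something like lspci -nnk; feeding that output to ethernet_controllers() is left to the reader.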
April 29, 2016 Author
"I had similar issues when I was using my onboard Intel i218V ethernet. Which one are you using?"
So I just checked and saw that I was in fact using the top RJ45 jack, which is the Atheros one, so I don't think this is my issue. It also seemed pretty unlikely at any rate, since the system was still responding just fine over SSH. It's just that some of the disk devices were being kept busy by something and the web interface stopped responding (which also made SMB stop responding).
I still haven't resolved this issue, and I'm wondering if this is a v6 issue, as I don't seem to be the only one. I'm not sure exactly what I would lose by downgrading to v5, but I'm considering it as I'd like to resolve these issues.
April 29, 2016 Author
"Just checked and I have the Z77 Extreme 4 MB. Does that use the same chipset? [...]"
No - your motherboard uses the Z77 chipset. What CPU are you running?
April 30, 2016
ASRock Z77 Extreme4
AMD Sempron 145
2GB DDR3-1145 Single Module
I listed the wrong MB: it is an ASRock 990FX Extreme 3.
April 30, 2016 Author
Yeah, your hardware is far below the spec of my unRAID server, which may or may not be the problem in your case. I'm not sure.
May 3, 2016
I have the same problem on a beta 21 PC with an AMD config:
Gigabyte Technology Co., Ltd. - 990FXA-UD3
CPU: AMD FX-8350 Eight-Core @ 4000
HVM: Enabled
IOMMU: Enabled
Cache: 384 kB, 8192 kB, 8192 kB
Memory: 24576 MB (max. installable capacity 32 GB)
Network: bond0: fault-tolerance (active-backup), mtu 1500
         eth0: 1000Mb/s, Full Duplex, mtu 1500
Kernel: Linux 4.4.6-unRAID x86_64
OpenSSL: 1.0.2g
Exact same symptoms. I end up having to hard reset the box, since it will not soft reset from the command line.
My NIC is a Realtek RTL8111E chip (10/100/1000 Mbit).
May 6, 2016
I had a similar issue as well. The web GUI locked up, but I could still SSH in, so I did get my diagnostics. There were several processes locked up that I couldn't kill. I ended up trying to reboot, but that didn't work, so I did a hard reset. Parity and array drives were fine after the reboot.
It may have been coincidence, but the system froze shortly after spinning down hard drives. I can post my diagnostics when I get home if you want, or start a new thread. But it sounds related, as I was also downloading a decent amount via Dockers: I was running CouchPotato, Sonarr, Deluge, and Plex. On top of that I was running the File Integrity plugin.
I have since turned off the cache drive for my downloads folder, disabled File Integrity auto hashing, and set the spin-down delay to never. I will monitor to see if that had anything to do with it.
My M/B uses Intel 82579LM and 82574L LAN.
Config: 6.2.0-beta21
M/B: Supermicro - X9SCL/X9SCM
CPU: Intel® Xeon® CPU E31230 @ 3.20GHz
HVM: Enabled
IOMMU: Enabled
Cache: 256 kB
Memory: 16384 MB (max. installable capacity 32 GB)
Network: bond0: fault-tolerance (active-backup), mtu 1500
         eth0: 1000Mb/s, Full Duplex, mtu 1500
         eth1: 1000Mb/s, Full Duplex, mtu 1500
Kernel: Linux 4.4.6-unRAID x86_64
OpenSSL: 1.0.2g
May 6, 2016 Community Expert
I had this problem as well. I was in another thread that suggested adding a cron job which restarts all Dockers once a day. I've been running for about 3 months with no more unusable periods.
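The actual cron job from that other thread was not posted here, so the following is only a rough Python sketch of what "restart all Dockers once a day" could look like. It relies on the standard docker ps -q and docker restart commands; the function names are made up for illustration, and it assumes a Python 3 interpreter is available on the server, which a stock unRAID install may not provide.

```python
import subprocess


def running_container_ids(ps_output):
    """Turn the text printed by `docker ps -q` into a list of container IDs."""
    return [line.strip() for line in ps_output.splitlines() if line.strip()]


def restart_command(container_ids):
    """Build the argument list for restarting the given containers."""
    return ["docker", "restart", *container_ids]


def restart_all():
    """Restart every currently running container and return their IDs."""
    ps = subprocess.run(
        ["docker", "ps", "-q"], capture_output=True, text=True, check=True
    )
    ids = running_container_ids(ps.stdout)
    if ids:  # `docker restart` with no arguments would just print an error
        subprocess.run(restart_command(ids), check=True)
    return ids


if __name__ == "__main__":
    restart_all()
```

Saved under a hypothetical path such as /boot/custom/restart_dockers.py, it could be scheduled with a crontab entry like 0 4 * * * python3 /boot/custom/restart_dockers.py to run at 04:00 every day. Note that this only works around the hang; it does not explain it.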
May 6, 2016 Community Expert
"I had a similar issue as well. [...]"
What type of drives do you have? I had stability issues until I set my WD 6TB Red drives to never spin down. This setting does not seem to be necessary for the other drives in my system.
May 6, 2016 Author
"What type of drives do you have? I had stability issues until I set my WD 6TB Red drives to never spin down. This setting does not seem to be necessary for the other drives in my system."
Hmmm, that's pretty interesting. I have a mix of drives, and 3 of them are WD 6TB Red drives. I've noticed that sometimes it's not the Red drives which are held 'busy' by some process, but I suppose since my parity drive IS a Red drive this *might* be my issue...
UPDATE: I just checked, and on all my drives Spin Down Delay is set to "Use Default", which is set to Never. So that does not appear to be it.
May 6, 2016
"What type of drives do you have? I had stability issues until I set my WD 6TB Red drives to never spin down. This setting does not seem to be necessary for the other drives in my system."
I have an older 60GB OCZ SSD for a cache drive, 3x 1TB WD Blacks, and a couple of 500GB Seagates for the array. I know 3TB usable is not much, but I am still testing unRAID while I wait for some sales on newer drives. I have 5x 3TB Seagate drives, but 3 of them died, so I am looking to replace them with something better.
I was surprised to see the default spin-down delay was set to never. I changed it to 2 hours and then the server froze, but I was also going heavy on the downloading, trying to restore my collection. I don't even know why the drives were spun down if there were still processes accessing them.
May 12, 2016 Author
So I'm still having this issue. If I'm just reading files off the file system (or doing light writes), everything seems fine for days (I had a previous uptime of 7 days). However, it seems that large writes cause the problem (if I queue up some big downloads in SABnzbd, this happens). Again, one or more disks are held busy by a process (usually SMBD or SHFS) and I can't kill them.
Yesterday I restarted my server after such an occurrence and, from a Windows machine, moved a large directory (about 15GB) from one user share to another. Halfway through the process it hung. SMB stopped responding, and although I was able to SSH into the machine, I was unable to stop the move. I ended up having to restart.
Could this be the problem? Moving large files around? SABnzbd does this when it completes a download, and so does CouchPotato.
Anyone? I'm pretty desperate to resolve this.
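A process that ignores kill, even kill -9, is very often stuck in uninterruptible sleep (state "D" in ps and /proc), usually while waiting on disk or filesystem I/O. Whether that is what is happening to SMBD and SHFS here is an assumption, not something the diagnostics in this thread confirm. A minimal Python sketch for listing such processes over SSH while the web UI is hung (the function names are illustrative, and it reads the standard Linux /proc/<pid>/stat files):

```python
import os


def parse_stat(stat_text):
    """Split one /proc/<pid>/stat record into (pid, command, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first "(" and the last ")".
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    command = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, command, state


def uninterruptible_processes(proc_root="/proc"):
    """Return a sorted list of (pid, command) for processes in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or the record was unreadable
        if state == "D":
            stuck.append((pid, command))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, command in uninterruptible_processes():
        print(pid, command)
```

If smbd or shfs shows up in that list during a hang, it would at least be consistent with the "large writes hang the array" theory, and the output would be worth attaching alongside the diagnostics.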
May 13, 2016
"So I'm still having this issue. [...]"
Hmm... that sounds a bit similar to what is happening with me on 6.2.0 beta 21. I managed to pinpoint that it's due to something hanging in the array, but couldn't figure out what. dAigo seems to have a similar issue too. You might want to try 6.1.9 to see if it works.
This topic is now archived and is closed to further replies.