Help! Disks kept busy and SMB stops working

April 22, 2016

I've recently put together this unRAID system which is intended to be a media and application server along with providing a virtual HTPC to my media room.

It all runs swimmingly for a little while, but after 1-3 days I will typically find my SMB shares, the unRAID web GUI (or one of the Docker web UIs, or a combination) in a non-responsive state.

There are some files being held onto (usually by SMBD or SHFS), and while I can unmount some of the drives, there will typically be some which are held busy by one of those rogue processes, and I cannot kill the offending process. This prevents me from shutting down nicely, even using the Powerdown plugin, but so far I haven't lost any data (or even been forced to do a parity check on startup). Powerdown -r says it's halting the system, but after that nothing happens. I can still log in to the system using SSH. My only recourse is to hold down the power button to restart.
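
For anyone hitting the same symptom: a process that ignores kill is often sitting in uninterruptible sleep (state D), usually waiting on I/O. Below is a minimal sketch, assuming a Linux /proc filesystem, that lists such processes over SSH. The function names are made up for illustration, and this only shows which processes are stuck, not why.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx].strip())
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling uninterruptible_processes() a few times, some seconds apart, helps tell a process that is briefly waiting on a disk from one that is stuck for good.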

The only thing I can see in the log is a couple of mce [Hardware Error] entries at boot time. But the problem usually occurs much later and doesn't crash the system at all. I've run memtest on the system for over 24 hours and I'm running everything at normal clock rates (not overclocked).

I have 1 Windows 10 VM (with a video card and a USB bus passed through)

I have 3 Dockers running: MariaDB, SABnzbd and SickBeard

I have 2 plugins loaded: Community Applications, Powerdown

The system is pretty beefy too:

cpu: i7 5820K

ram: 32 GB

mobo: ASRock X99 Extreme 6

video: GTX 970

cache: 2x 240GB SSDs

parity: 1x 6TB

data: 2x 6TB, 2x 4TB, 1x 2TB

unraid-diagnostics-20160422-0749.zip

April 24, 2016

Author

I'm using 6.2 beta 20.... Do you think downgrading to 6.1 release would fix this? or even upgrading to 6.2 beta 21?

April 24, 2016

You should be on the newest beta, since the previous beta is no longer supported.

When you are using a beta, always use the latest one!

April 24, 2016

I had similar issues when I was using my onboard Intel i218V ethernet. Which one are you using?

April 29, 2016

Adam64 -- how did you discover that it was an Intel i218V ethernet issue, and was there a fix (other than adding a new network card)? I too have a v6 system that becomes unusable every few days (opened a new post on it this morning), and I have an ASRock motherboard in it (like the original poster here), but I'm not sure what onboard ethernet it has.

Thanks.

April 29, 2016

I think it's in the X99 chipset.

Good to know that the i218V has problems. I'll switch to my RealTek port then.

April 29, 2016

Just checked and I have the Z77 Extreme 4 MB. Does that use the same chipset?

03:00.0 Ethernet controller [0200]: Broadcom Corporation NetLink BCM57781 Gigabit Ethernet PCIe [14e4:16b1] (rev 10)

Subsystem: ASRock Incorporation Z77 Extreme4 motherboard [1849:96b1]

Kernel driver in use: tg3

Kernel modules: tg3

April 29, 2016

Author

I had similar issues when I was using my onboard Intel i218V ethernet. Which one are you using?

So I just checked and saw that in fact I was using the top RJ45 jack - which is the Atheros one -- so I don't think this is my issue.

Also, it seemed pretty unlikely at any rate, since the system was still responding just fine over SSH. It's just that some of the disk devices were being kept busy by something and the web interface stopped responding (which also made SMB stop responding).

I still haven't resolved this issue and I'm wondering if this is a v6 issue, as I don't seem to be the only one. I'm not sure exactly what I would lose by downgrading to v5, but I'm considering it as I'd like to resolve these issues.

April 29, 2016

Author

Just checked and I have the Z77 Extreme 4 MB. Does that use the same chipset?

No - your motherboard uses the Z77 chipset. What CPU are you running?

April 30, 2016

ASRock Z77 Extreme4

AMD Sempron 145

2GB DDR3-1145 Single Module

I listed the wrong MB: it is an ASRock 990FX Extreme 3.

April 30, 2016

Author

Yeah, your hardware is far below the spec of my unRAID server, which may or may not be the problem in your case. I'm not sure.

May 3, 2016

I have the same problem on a beta 21 PC with an AMD config:

Gigabyte Technology Co., Ltd. - 990FXA-UD3

CPU: AMD FX-8350 Eight-Core @ 4000

HVM: Enabled

IOMMU: Enabled

Cache: 384 kB, 8192 kB, 8192 kB

Memory: 24576 MB (max. installable capacity 32 GB)

Network: bond0: fault-tolerance (active-backup), mtu 1500

eth0: 1000Mb/s, Full Duplex, mtu 1500

Kernel: Linux 4.4.6-unRAID x86_64

OpenSSL: 1.0.2g

Exact same symptoms. I end up having to hard reset the box since it will not soft reset from the command line. My NIC is a Realtek RTL8111E chip (10/100/1000 Mbit).

May 3, 2016

Author

You should upload your diagnostics next time it happens.

May 6, 2016

I had a similar issue as well. The web GUI locked up, but I could still SSH in, so I did get my diagnostics. There were several processes locked up that I couldn't kill. I ended up trying to reboot, but that didn't work, so I did a hard reset. Parity and array drives were fine after the reboot. It may have been a coincidence, but the system froze shortly after spinning down the hard drives. I can post my diagnostics when I get home if you want, or start a new thread. But it sounds related, as I was also downloading a decent amount via Dockers: I was running CouchPotato, Sonarr, Deluge, and Plex. On top of that I was running the File Integrity plugin. I have since turned off the cache drive for my downloads folder, disabled File Integrity auto hashing, and set the spin-down delay to never. I will monitor to see if that had anything to do with it.

My M/B uses Intel 82579LM and 82574L LAN.

Config

6.2.0-beta21

M/B: Supermicro - X9SCL/X9SCM

CPU: Intel® Xeon® CPU E31230 @ 3.20GHz

HVM: Enabled

IOMMU: Enabled

Cache: 256 kB

Memory: 16384 MB (max. installable capacity 32 GB)

Network: bond0: fault-tolerance (active-backup), mtu 1500

eth0: 1000Mb/s, Full Duplex, mtu 1500

eth1: 1000Mb/s, Full Duplex, mtu 1500

Kernel: Linux 4.4.6-unRAID x86_64

OpenSSL: 1.0.2g

May 6, 2016

Community Expert

I had this problem as well... I was in another thread that suggested adding a cron job which restarts all Dockers once a day. I've been running about 3 months with no more unusable periods.
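
A rough sketch of what such a daily restart job could look like, assuming Python and the docker CLI are both available on the server (neither is confirmed in this thread). The function names are hypothetical. It only uses `docker ps -q` to list running container IDs and `docker restart` to restart them, and it is a workaround rather than a fix for the underlying hang.

```python
import subprocess


def restart_command(container_ids):
    """Build the argv for restarting the given containers in one call."""
    return ["docker", "restart", *container_ids]


def running_container_ids():
    """Return the IDs of all running containers, as printed by `docker ps -q`."""
    result = subprocess.run(
        ["docker", "ps", "-q"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def restart_all():
    """Restart every running container; do nothing if none are running."""
    ids = running_container_ids()
    if ids:
        subprocess.run(restart_command(ids), check=True)
    return ids
```

Saved as a script that calls restart_all(), this could be scheduled from cron; a schedule field of `0 4 * * *` would run it every day at 04:00.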

May 6, 2016

Community Expert

What type of drives do you have? I had stability issues until I set my WD 6TB Red drives to never spin down. This setting does not seem to be necessary for the other drives in my system.

May 6, 2016

Author

What type of drives do you have? I had stability issues until I set my WD 6TB Red drives to never spin down. This setting does not seem to be necessary for the other drives in my system.

Hmmm, that's pretty interesting. I have a mix of drives, and 3 of them are WD 6TB Red drives. I've noticed that sometimes it's not the Red drives which are held 'busy' by some process, but I suppose since my parity drive IS a Red drive this *might* be my issue...

UPDATE: I just checked, and the Spin Down Delay on all my drives is set to "Use Default", which is set to Never. So that does not appear to be it.

May 6, 2016

What type of drives do you have? I had stability issues until I set my WD 6TB Red drives to never spin down. This setting does not seem to be necessary for the other drives in my system.

I have an older 60GB OCZ SSD for a cache drive, 3x 1TB WD Blacks, and a couple of 500GB Seagates for the array. I know 3TB usable is not much, but I am still testing unRAID while I wait for some sales on some newer drives. I have 5x 3TB Seagate drives, but 3 of them died, so I am looking to replace them with something better.

I was surprised to see the default spindown delay was set to never. I changed it to 2 hours and then the server froze, but I was also going heavy on the downloading trying to restore my collection. I don't even know why the drives were spun down if there were still processes accessing them.

May 12, 2016

Author

So I'm still having this issue. If I'm just reading files off the file system (or doing light writes), everything seems fine for days (I had a previous uptime of 7 days).

However, it seems that large writes cause the problem (if I queue up some big downloads in SABnzbd, this happens).

Again, one or more disks are held busy by a process (usually SMBD or SHFS) and I can't kill them.
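
To see which files keep a disk busy, one option is to walk /proc/<pid>/fd and collect the open paths that live under the disk's mount point. The sketch below assumes a Linux /proc filesystem and root access; the function name is invented for illustration. It only reports open file descriptors, so a process holding the mount through its working directory or a memory-mapped file will not show up.

```python
import os


def holders_of(mount_point, proc_root="/proc"):
    """Map pid -> list of open paths under mount_point, read from /proc/<pid>/fd."""
    prefix = mount_point.rstrip("/") + "/"
    holders = {}
    if not os.path.isdir(proc_root):
        return holders
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # no permission, or the process has exited
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while scanning
            if target.startswith(prefix):
                holders.setdefault(int(entry), []).append(target)
    return holders
```

For example, holders_of("/mnt/disk1") would return the PIDs with files open on that disk, where /mnt/disk1 is only an example mount point.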

Yesterday I restarted my server after such an occurrence and (from a Windows machine) moved a large directory (about 15GB) from one user share to another. Halfway through the process it hung. SMB stopped responding, and although I was able to SSH into the machine, I was unable to stop the move. I ended up having to restart.

Could this be the problem? Moving large files around? SABnzbd does this when it completes a download... so does CouchPotato...

Anyone? I'm pretty desperate to resolve this.

May 13, 2016

Hmm... that sounds a bit similar to what is happening with me on 6.2.0 beta 21. I managed to pinpoint that it's due to something hanging in the array, but couldn't figure out what. dAigo seems to have a similar issue too.

You might want to try 6.1.9 to see if it works.
