
Intermittent unresponsive/locked SMB shares since upgrade to unRAID 6


callmeedin


I have been running 5.0.8 without any problems ever since it was released. It has been rock solid. When v6 first came out, I upgraded, and other than having to add the parameter pci=realloc=off to my syslinux.cfg file (so the system would see the HBA cards), no other changes were needed.
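For anyone searching later, the change is just one extra kernel parameter on the append line of the boot stanza in syslinux.cfg on the flash drive. The exact contents vary by version, but the relevant part ends up looking roughly like this:

    label unRAID OS
      menu default
      kernel /bzimage
      append pci=realloc=off initrd=/bzroot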

Within a few days of running v6, the system would lock up: the two Samba shares that I have would become unresponsive, and the only way to bring the system back online was to hard-reset the computer. I gave up on v6 rather quickly and reverted back to 5.0.8 -- rock solid again.

Figured with v6.1.9 I'd give it another go, but the same thing is happening -- within a few days the Samba shares become unresponsive.

I searched the forums, but no clear problem/solution has been identified. I did see some references to checking & repairing the ReiserFS, and I have done that: only one drive had some problems, and I repaired them. But even after that, the lockups happen again. It can take 1 day or even 4 days before it locks up.
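(For reference, the check/repair was done from the console with the array started in Maintenance mode, roughly as below -- the disk number will vary, and --rebuild-tree should only be run if reiserfsck explicitly asks for it:)

    # check disk1 (array must be started in Maintenance mode so /dev/md1 exists)
    reiserfsck --check /dev/md1

    # if the check reports fixable corruption:
    reiserfsck --fix-fixable /dev/md1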

It does seem to happen during a transfer of files to the samba share (automated Sonarr transfers, for example).

Since I know that v5.0.8 is stable, I revert back to it rather quickly so I don't have to deal with the lockups.

I finally decided to capture diagnostics and I am attaching them to this post.

Any ideas/help would be appreciated.

 

Thanks.

sarajevo-diagnostics-20160428-2211.zip

Link to comment

Some old hardware doesn't like the newer Linux distros/drivers. You may need to upgrade your CPU/RAM/mobo to more modern hardware. How much RAM do you have? You'll want ~4GB; you can get by with less if you're running bare bones.

 

Have you tried backing up the entire flash drive, taking screenshots of your disk layout/share settings, and doing a fresh barebones install of v6?  You can check the "parity is valid" box after reassigning all your drives to the same slots.
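If you'd rather grab the flash backup from the console than over the network, something like this works (the destination path is just an example -- any data disk with some free space will do):

    # archive the entire flash drive to a dated tarball on disk1
    tar -czf /mnt/disk1/flash-backup-$(date +%Y%m%d).tgz -C /boot .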

Link to comment

Just rebuilt from scratch as suggested. Other than recreating the array layout, shares & users, no other changes have been made to the default system. Let's see what happens.

 

As far as hardware goes, it's not the most powerful, but I wouldn't say it is OLD-old. I run a barebones unRAID with no addons.

Hardware specs:

 

ASRock Z77 Extreme4

AMD Sempron 145

2GB DDR3-1145 single module

Link to comment

The disks are pretty full ... all 2TB disks with free space between 125GB and 250GB.

So what was your solution to the problem? Switching them all to XFS!?

I am just afraid to do that, since it is a point of no return and I couldn't go back to my stable v5 if the filesystem turns out not to be my problem to begin with.

 

Good catch on the MB. In fact it is an ASRock 990FX Extreme 3 motherboard. Sorry about that. :-)

 

Link to comment


You could use a bit more RAM. I would suggest that you check in your BIOS and see if you can cut back on the amount being allocated to the GPU. unRAID only runs in text mode, so not much RAM is actually needed for the console display. (BTW, the Z77 and the Sempron 145 are not exactly a match from my quick search....)
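A quick way to see how much of the 2GB actually survives the BIOS/GPU carve-out is to check from the console (or over telnet):

    free -m                      # total/used/free RAM in MB
    grep MemTotal /proc/meminfo  # same figure straight from the kernel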

 

Link to comment

Well, I'm having a very similar problem, and I assure you hardware age is not the problem (in my case). I have an i7-5820K and 32GB of RAM (12GB is assigned to a VM).

 

And this is from a clean install of v6 -- not an upgrade. All my drives were defaulted to XFS (except the cache, which is BTRFS).

Link to comment

I listed the wrong MB. It is an ASRock 990FX Extreme 3. I will order and add more RAM -- that is easy.

 

Look carefully at prices when you are purchasing RAM and find the 'sweet spot'. unRAID v6 will use all of the RAM that you throw at it. Any excess will be used for caching writes (at ~110MB/s), which is considerably faster than writing directly to HDs (~40MB/s if you are lucky).
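If you want to see the difference for yourself, a rough sequential-write test from the console illustrates it -- the first command lets the RAM cache absorb the write, the second forces it straight to the disk (the test file path is just an example; delete it afterwards):

    # write 2GB through the page cache (what a network write sees when there is plenty of RAM)
    dd if=/dev/zero of=/mnt/disk1/ddtest.bin bs=1M count=2048

    # same write, bypassing the cache, to see the raw array write speed
    dd if=/dev/zero of=/mnt/disk1/ddtest.bin bs=1M count=2048 oflag=direct

    rm /mnt/disk1/ddtest.bin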

Link to comment

Was going to get 8GB. Should I get more? Is there a recommended value for unRAID?

 

Not really. You are often limited by what the BIOS will support (look at the MB manual for that). 8GB will be more than adequate unless you decide to install a very large number of Dockers. With a Sempron, I can't see you running a VM! Most likely you have four memory slots on that MB, so if you get 2 x 4GB modules now you can probably add another pair later. (I seem to recall that a lot of MBs have issues with double-sided memory modules, so it is probably wise to avoid them. Apparently, the address line drivers can't supply enough power to maintain a proper waveform as the number of chips on each line increases.)
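Before ordering, you can confirm how many slots you have and what is currently in them from the console, assuming dmidecode is present on your build:

    dmidecode --type memory | grep -E "Size|Locator|Speed"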

Link to comment

Thanks for the info on memory.

The fresh, from-scratch v6 install just got hung up again -- or more exactly, the SMB shares got hung up: the web interface, telnet connection, etc. are all still working.
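(Since telnet still works while the shares are hung, next time I may try confirming whether it is Samba itself that is wedged before hard-resetting -- something like the standard Slackware-style commands below, assuming they are present on this build:)

    smbstatus                     # list current SMB connections/locks, if it responds at all
    ps aux | grep [s]mbd          # see whether the smbd processes are stuck in D state
    /etc/rc.d/rc.samba restart    # bounce Samba without rebooting the whole box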

Either way -- going back to 5.0.8. There is really no pressing need for me to be on v6 other than the desire for the "latest and greatest", but I am not ready to convert 19 disks to XFS just yet.

Link to comment


I was in the exact same boat, I think. When it locked up for me and I checked the load via SSH, it was skyrocketing (well into triple digits). Nothing would get it to calm down except a reboot.
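(For anyone reading later, "checking the load" is nothing fancy -- from an SSH session:)

    uptime                    # the three numbers at the end are the 1/5/15-minute load averages
    top -b -n 1 | head -n 20  # snapshot of the busiest processes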

 

Additionally, I ran a test where I wrote only to a 500GB XFS Cache drive for a month (disabled mover, all shares use cache) and I saw NO issues during that time, so I felt very confident the RFS -> XFS conversion of all my drives would work.

 

You might check those two things to see if it is indeed the same issue as mine and several others. Even if you go back to v5 eventually, it would be nice to confirm for the future I think...

 

I went back to v5 at least twice myself while troubleshooting. I wanted stability over features since I don't use any dockers or anything. Eventually though, I wanted things like built-in UPS support, the much nicer GUI and to make sure I was running a close enough version that if I needed support, I wouldn't be the only one running it.

 

Took me 2-3 weeks to do all my disks, but like I said... worked perfectly after.  As you say though, it is the point of no return :). So everyone needs to make this call on their own.

Link to comment

BrianAZ -- that is very valuable info.

Can you give me a few more details about how to perform the same test -- to just use one drive with XFS and basically write all new stuff to it?

It seems that writing to a ReiserFS disk can cause the problem, but not reading from a ReiserFS disk, right!?

So with your suggested "proof of concept" setup, I could still use all the files on the other disks, but would only write to the XFS disk!?

This might be the route I take -- I would know within 7-10 days if writing to my ReiserFS disks is indeed my problem. I have never gone more than 5 days with v6 before a lockup happens.

Link to comment


Right... in my experience, reads were never an issue, only writes to the RFS disks. With that in mind, I have a 500GB XFS Cache drive.

 

[screenshot: cache drive settings -- 500GB cache device formatted XFS]

 

I made sure that EVERY user share was configured to write to the Cache... which is critical in making sure nothing is writing to the array (RFS) disks:

[screenshot: user share settings with "Use cache disk" enabled]
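If you'd rather not click through every share in the GUI, the per-share settings also live as flat files on the flash, so (at least on my version) you can confirm them all at once from the console -- the key name below is how it appears on my system:

    # show the cache setting for every user share
    grep -H shareUseCache /boot/config/shares/*.cfg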

 

 

Finally, to make sure the mover itself never wrote to the RFS disks during the test, I changed the schedule to monthly and picked yesterday's date:

[screenshot: mover schedule set to monthly]

 

 

Of course there is a slight risk since stuff you write during the test will not be parity protected as it's sitting on the Cache disk... but that was a small risk imo.

 

Good luck! Let me know if any questions.

Link to comment

Quick update -- so far, so good with just running on the cache drive (formatted XFS): still running without SMB lockups.

 

Questions -- while I don't think the CPU & RAM are the cause, I plan on upgrading them. I already ordered 8GB of RAM and was wondering if I would gain anything by upgrading the CPU (currently an AMD Sempron 145)!? Thinking that at least a faster CPU would help with parity checks, or is something else the bottleneck during parity checks!?

If a CPU upgrade makes sense, any recommendations as to what would be a worthy upgrade? Clearly this MB will only support CPUs that are only available used at this point, so I'm looking at either an AMD Phenom II X2 (or X4) version.

Link to comment


Before upgrading your CPU thinking it will make parity checks faster, start a non-correcting parity check and see what the CPU usage is on the Dashboard of the GUI. (Watch it for a couple of minutes!!!) This is the worst case, as you will also be powering the GUI (not normally an issue if you are running the scheduled parity check overnight), which is not a trivial task for a Sempron! There are a lot of reasons to upgrade a processor, but thinking you will improve parity check times is normally not one of them.
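If watching the Dashboard is awkward on that machine, the same thing can be seen from a console/telnet session while the check runs:

    top -d 5          # overall CPU load, refreshed every 5 seconds
    vmstat 5          # the "id" column shows how much idle CPU headroom is left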

Link to comment

Update & observation:

 

Still no lockups using the XFS cache drive as the only drive to "write" to. So I am starting to feel confident that switching all the drives to XFS will fix my lockup issues.

 

Now for the observation:

In preparation for freeing up one of my drives to reformat it as XFS, I am using robocopy (on a virtual machine that lives outside of the unRAID server) to copy 1.7TB of data from one of the unRAID drives to a separate Windows server. I noticed in the robocopy logs that, in the span of 10+ hours, there were 5-6 network connection problems that robocopy encountered and "resolved" by waiting 30 seconds and retrying. Whatever the cause of the network connection problems (my network, the unRAID MB NIC, weak unRAID hardware, a glitch in the matrix, etc.), I am wondering if ReiserFS is just not as good at dealing with network glitches during write operations, which causes a lockup of the SMB shares, while XFS can handle them. Sorry if I am stating something obvious here -- just wanted to share my observations.

Link to comment

How's it going?

 

Like you, I think I had a lockup every 3-7 days depending on what I was writing and how frequently. I wanted to be REALLY sure, so I think I ended up leaving it for ~3 weeks. That was probably overkill though.

Link to comment

Still going strong. After everything I have hit the unRAID box with in the last 7 days, I am confident that RFS is the issue. I have:

 

1. Moved 1.7TB from an older 2TB RFS drive to the XFS cache drive

2. Removed that older 2TB RFS drive to make space for a 4TB drive, so a rebuild of parity was in order

3. Manually moved the 1.7TB of data from the cache drive to the newly XFS-formatted 4TB drive (a sketch of the kind of command is below)

4. Started moving data and reformatting drives to XFS ... working on the 3rd drive right now.
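(For the manual move in step 3, a console copy with rsync is one way to do it; the share and disk names below are just placeholders:)

    # copy from the cache drive to the new XFS data disk, preserving attributes
    rsync -avh --progress /mnt/cache/Media/ /mnt/disk5/Media/

    # once the copy is verified, remove the source from the cache
    rm -r /mnt/cache/Media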

 

Almost all of this on my 2GB of RAM and the Sempron 145. Yesterday I upgraded to 10GB of RAM and an AMD Phenom II X2, and I can tell the box particularly likes the faster processor ... the dashboard doesn't show the CPU pegged at 100% anymore during any of these tasks. I don't see much usage of the added RAM. I still have all shares set up to use the cache and the mover set to not run until the 1st of the month, so the mover doesn't decide to write to the still-existing RFS drives.

Again, thanks for all your help.

Link to comment

Great to hear!

 

Once you complete your RFS -> XFS conversion, come back to confirm the issue is gone. I had a lot of trouble finding good info on my issue and had to piece it together from multiple threads. This one seems to summarize the issue & fix well.

Link to comment

Archived

This topic is now archived and is closed to further replies.
