ixnu

Samba and httpd unresponsive with shfs


Unraid 6.3-rc5

 

No dockers or unsupported plugins - just a vanilla file server. Fix common problems completed.

 

Long story, but for about two weeks I have not been able to shut down my box properly once it has been up for more than 24 hours. On 6.2.4 I tried both the powerdown script and the native shutdown script, to no avail: neither completes.

 

Usually, the box stays up fine after a hard reboot, but eventually samba and http no longer respond, and shfs (zombie) pegs the CPU with dozens of smbd processes that can't be stopped.

 

I can continue to login via ssh.

 

I upgraded to the newest rc in hopes that the improved shutdown would help, but shutdown never completes.

 

During this two-week period, I had a red ball and rebuilt successfully.

 

I had read on the forums that Reiser could cause this issue if disks were nearing capacity, so I reduced them all below 90%. This did not work, so I started the process of migrating to XFS. I have also read that dockers could cause this, but I have never enabled them. If this should be combined with another topic, please forgive my search skillz.
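For anyone who wants to check the "below 90%" condition without reading it off the web UI, here is a minimal sketch. It assumes Python 3 is available on the box (stock Unraid may not ship it) and that the array disks are mounted at /mnt/disk1, /mnt/disk2 and so on; the function names are made up for illustration. The percentage is simply used/total, so it may differ slightly from what df reports.

```python
import glob
import shutil


def percent_used(path):
    """Return how full the file system holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total if usage.total else 0.0


def disks_over(threshold, pattern="/mnt/disk*"):
    """List (mount point, percent used) for mounts at or above `threshold`."""
    results = []
    for mount in sorted(glob.glob(pattern)):
        pct = percent_used(mount)
        if pct >= threshold:
            results.append((mount, round(pct, 1)))
    return results


if __name__ == "__main__":
    # Assumed mount pattern; adjust if the disks are mounted elsewhere.
    for mount, pct in disks_over(90.0):
        print(f"{mount}: {pct}% used")
```

An empty result means no mount matching the pattern is at or above the threshold.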

 

Unfortunately, I have filled my SATA slots and needed to consolidate in order to move to XFS. I shrank the array and rebuilt parity successfully last night, but the issue has returned after about 24 hours.

 

I can't create the hardware diag from the tools, so I have just included the syslog. I'm currently traveling and can't hard restart the box.

 

I am in the process of building a new server, but I really would like this one to stay up as a backup and eventually replace my existing backup.

 

Anybody have any ideas?

 

log_05170043.log.zip


More info:

 

I finally got home to take a look.

 

The shfs process was zombie and there were dozens of smbd processes opened by user account.

 

I could not kill or HUP these, and the box would never shut down or restart with either shutdown or the powerdown script.
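A note on why kill and HUP may do nothing here: a zombie has already exited and only stays in the process table until its parent collects its exit status, and a process in uninterruptible sleep (state D) is waiting inside the kernel, usually on I/O, so signals are generally not acted on until that wait ends. A minimal sketch for listing processes in either state by reading /proc follows. It assumes Python 3 is available on the box; the function names are made up for illustration.

```python
import os


def parse_stat(stat_line):
    """Return (pid, command name, state) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the first "(" and last ")".
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    name = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, name, state


def stuck_processes(proc_root="/proc", states=("D", "Z")):
    """List (pid, name, state) for processes whose state is in `states`."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, name, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # the process exited while we were reading it
        if state in states:
            found.append((pid, name, state))
    return sorted(found)


if __name__ == "__main__":
    for pid, name, state in stuck_processes():
        print(pid, name, state)
```

If the smbd processes show up as D rather than Z, that would point at something they are blocked on (most likely file system I/O) rather than at samba itself, though that is only an inference.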

 

Any ideas?


If you could get your diagnostics we could see the SMART for all your drives. Do the SMART attributes for all of your drives look OK?

 

Have you done a memtest recently?

 

Tell us about your hardware.


Smarts look OK to me.

 

To answer your question, this box has been running for about 5 years and has gone from 4.7 to 6.0 to 6.2.x to 6.3-rc5. This is the first non-stable release that has been on it.

 

Here is a full diag that I got after a hard reboot when I got back home this morning.

 

I certainly appreciate you taking a look at it.

 

The only option I could think of would be fsck fun.

 

I have not run a memtest recently and it's not ECC :(

tower-diagnostics-20161208-0822.zip


SMARTs look OK.

 

Can't hurt to try a memtest.

 

Tell us about your hardware.


Reiserfsck on all disks = 0 corruptions

 

xfs_repair = 0 corruptions

 

 

Surprising to me, since I've had to hard reboot about a dozen times in the last two weeks.

 

 


Reiserfsck on all disks = 0 corruptions

Surprising to me, since I've had to hard reboot about a dozen times in the last two weeks.

ReiserFS is very forgiving about hard shutdowns. Problem is, it seems to me that any version of unraid after 5 doesn't like ReiserFS, and seems to take a notion to lock down under intense use. Since ReiserFS is dying, nobody really wants to troubleshoot the issue.

 

Just be glad you don't have a BTRFS cache pool as well as ReiserFS array disks. That combo caused me no end of grief, the ReiserFS issues causing runaway CPU usage, necessitating a hard reboot, causing BTRFS breakage. I initially thought it was BTRFS causing the issue, but found out it was the Reiser disks. Converted all ReiserFS to XFS, no more issues.

 

IMHO, of the choices given in unraid,

Reiser = extremely robust, very repairable, has issues with large disks running close to full, issues with newer kernels.

BTRFS = extremely fragile, recovery dicey, works great as long as you unmount it cleanly every time.

XFS = fairly robust, metadata log seems fragile, data recovery ok. Fewest overall issues.


 

XFS = fairly robust, metadata log seems fragile, data recovery ok. Fewest overall issues.

 

Anybody got the "reiser_to_xfs" slackware package?

 

Can't seem to find it ;)


Anybody got the "reiser_to_xfs" slackware package?

 

Can't seem to find it ;)

:) I believe you are being funny, but for others that may find this and assume such a thing exists...

https://lime-technology.com/forum/index.php?topic=37490.0

Long thread discussing options for moving data from ReiserFS disks to XFS.

 

tl;dr - back up and move the data to newly formatted disks. No in-place conversion is possible.


 

I believe you are being funny, but for others that may find this and assume such a thing exists...

https://lime-technology.com/forum/index.php?topic=37490.0

 

 

Ha! That's how nasty rumors get started.

 

Any theories on how Reiser causes this (zombie shfs and runaway smbd processes)?

Only uneducated guesses. I suspect the file system takes too long to complete an operation and causes a timeout that isn't caught. My reasoning is that smaller disks and fresh formats of ReiserFS don't seem to cause the issue.


I have removed the cache drive to test this speed theory. My cache is an SSD.

 

I should also mention that fuser -mv /dev/md* would not complete when I was having issues.


Memtest ran 4 passes with no errors. Updated to rc6.

 

No errors on: Memtest; file systems; and SMART.

 

Nothing in the syslog.

 

What next? Willing to try anything.

 

Voodoo dolls?

 

Blood sacrifice?

 

 


You never did

 

My apology. Thought diag might be enough. Again, thanks for taking a look!

 

If this is not enough info, just let me know what you need.

 

This box is about 5 years old and everything was bought new. I have had 2 red balls with Seagate drives that were rebuilt properly.

 

All of the controller slots are occupied and one on the MB is empty. I shrunk the array so that I could use this slot to move everything to XFS.

 

MB: AsRock 880GM-LE - AMD SB710

CPU: AMD Sempron 140

RAM: Gskill 2 x 4GB DDR3/1066

Controller: Supermicro AOC-SASLP-MV8

PSU: CORSAIR CX series CX750 750W

UPS: CyberPower CP1350AVRLCD (1350 VA)

 

The drives below sit in 3 x Supermicro 5-in-3 drive containers.

 

HDDs (Most are ~80% full and highest is 88% full)

 

Parity  HGST_HMS5C4040ALE640_PL2331LAHE3V0J (sdc) - 4 TB
Disk 1  Hitachi_HDS5C3020ALA632_ML0221F30420HD - reiserfs - 2 TB
Disk 2  SAMSUNG_HD204UI_S2H7J1SZ911253 - reiserfs - 2 TB
Disk 3  SAMSUNG_HD204UI_S2H7J1CZ918725 - reiserfs - 2 TB
Disk 4  WDC_WD40EZRZ-00WN9B0_WD-WCC4E0ST3K77 - reiserfs - 4 TB
- Empty slot -
Disk 6  WDC_WD20EZRX-00DC0B0_WD-WCC300793356 - reiserfs - 2 TB
Disk 7  WDC_WD20EZRX-22D8PB0_WD-WCC4M5ET1KUD - xfs - 2 TB
Disk 8  WDC_WD20EVDS-63T3B0_WD-WCAVY5276605 - reiserfs - 2 TB
Disk 9  WDC_WD20EVDS-63T3B0_WD-WCAVY5328007 - reiserfs - 2 TB
Disk 10  ST2000DM001-1CH164_Z2F0TGTB - reiserfs - 2 TB
Disk 11  WDC_WD20EARX-008FB0_WD-WCAZAD975980 - reiserfs - 2 TB
Disk 12  WDC_WD2002FAEX-007BA0_WD-WMAY05158670 - reiserfs - 2 TB
Cache  SAMSUNG_SSD_830_Series_S0WENYAC301281 - xfs - 128 GB

 

Boot Flash JD_FireFly - 2 GB (sda)

 

 


And we are back to shfs at 100% cpu.

 

However this time it's not zombie.

 

Also no smbd processes.

 

Can I try something during this state to help troubleshooting?

 

Syslog is clean.


Still no joy. Web and smb not responsive.

 

CPU is pegged with shfs (Htop screen included.)

 

/mnt/user0  &  /mnt/user  will not complete a simple "ls"
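One thing that might help narrow this down is finding out whether a listing of each individual disk mount also hangs, or only the user shares. Below is a minimal sketch that tries each path in a child process with a timeout, so a stalled mount does not take the shell down with it. It assumes Python 3 is available on the box and that the array disks are mounted under /mnt/disk*; the function name and the 10 second timeout are arbitrary choices for illustration.

```python
import glob
import subprocess
import sys


def probe(path, timeout=10.0):
    """Try to list `path` in a child process; return 'ok', 'error' or 'hung'."""
    # The listing runs in a child so that a stalled file system blocks
    # the child rather than this script.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.listdir(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return "ok" if child.wait(timeout=timeout) == 0 else "error"
    except subprocess.TimeoutExpired:
        # Ask the child to die but do not wait for it: a process stuck in
        # uninterruptible sleep would block us here as well.
        child.kill()
        return "hung"


if __name__ == "__main__":
    # Assumed layout: individual disks first, then the user shares.
    paths = sorted(glob.glob("/mnt/disk*")) + ["/mnt/user0", "/mnt/user"]
    for path in paths:
        print(path, probe(path))
```

If every /mnt/disk* path answers and only the user shares hang, that would suggest the stall is in shfs itself; if one disk hangs too, that disk is the first place to look. Either reading is a guess, not a diagnosis.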

 

Neither "shutdown -r now" nor "powerdown -r now" will complete.

 

 

htop1.jpg


Still at a loss on how to troubleshoot.

 

Would greatly appreciate any Unraid™ Inc. input.

 

I have nothing to go on, but it seems to happen at idle.

 

I have created a simple cron job to write a file to the array every 15 minutes and set the mover to run every hour. I'm hopeful this will give me something.
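For reference, a heartbeat along these lines could look like the sketch below. This is not the actual cron job, just one way to do it, assuming Python 3 is available on the box. It logs a "start" line before the write and a "done" line with the elapsed time after it, so a "start" with no matching "done" marks roughly when writes stopped completing. The paths in the example are hypothetical; the log should live somewhere that does not go through the array.

```python
import os
import time


def log_line(log_path, message):
    """Append a timestamped line to the log file."""
    with open(log_path, "a") as log:
        log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {message}\n")


def heartbeat(directory, log_path, name="heartbeat.txt"):
    """Write a small file into `directory`; return the seconds it took."""
    log_line(log_path, f"start {directory}")
    started = time.monotonic()
    with open(os.path.join(directory, name), "w") as handle:
        handle.write(time.strftime("%Y-%m-%d %H:%M:%S\n"))
        handle.flush()
        os.fsync(handle.fileno())  # force the write out to the disk
    elapsed = time.monotonic() - started
    log_line(log_path, f"done {directory} {elapsed:.3f}s")
    return elapsed


if __name__ == "__main__":
    # Hypothetical paths: a share on the array and a log on the boot flash.
    heartbeat("/mnt/user/heartbeat", "/boot/heartbeat.log")
```

Run from cron every 15 minutes, the elapsed times would also show whether writes get slower in the run-up to a hang.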

 

 


The general solution from the forum users is to ditch reiserfs and switch all drives to xfs. 

 

But, if you want LT to have a look at this, then best to email support@lime-technology.com as they don't generally randomly surf the forum.


Thanks Squid!

 

Has a root cause been identified, or is this just cargo cult troubleshooting?
