ixnu Posted December 5, 2016

Unraid 6.3-rc5. No dockers or unsupported plugins - just a vanilla file server. Fix Common Problems completed.

Long story, but for about two weeks I have not been able to properly shut down my box after it's been up for more than 24 hours. I have attempted to use the powerdown and native shutdown scripts to no avail on 6.2.4; neither script completes. Usually, the box stays up fine after a hard reboot, but eventually samba and http no longer respond and shfs (zombie) pegs the CPU, with dozens of smbd processes that can't be stopped. I can continue to log in via ssh. I upgraded to the newest rc in hopes that the improved shutdown would help, but shutdown never completes. During this two-week period, I had a red ball and rebuilt successfully.

I had read on the forums that Reiser could cause this issue if disks were nearing capacity, so I reduced them all below 90%. This did not work, so I started the process of migrating to XFS. I have also read that dockers could cause this, but I have never enabled it. If this should be combined with another topic, please forgive my search skillz.

Unfortunately, I have filled my SATA slots and needed to consolidate in order to move to XFS. I shrunk the array and rebuilt parity successfully last night, but the issue has returned after about 24 hours. I can't create the hardware diag from the tools, so I have just included the syslog. I'm currently on travel and can't hard-restart the box. I am in the process of building a new server, but I really would like this one to stay up as a backup and eventually replace my existing backup. Anybody have any ideas?

log_05170043.log.zip
ixnu Posted December 8, 2016

More info: I finally got home to take a look. The shfs process was a zombie and there were dozens of smbd processes opened by a user account. I could not kill or HUP these, and the box would never shut down / restart with either shutdown or the powerdown script. Any ideas?
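A note for anyone hitting the same wall: processes that ignore kill -9 are usually not zombies at all but tasks stuck in uninterruptible sleep (D state) inside the kernel, typically blocked on the hung filesystem. A quick generic way to list both kinds (a sketch, not Unraid-specific; run as root to see everything):

```shell
# List zombies (Z: already exited, waiting for the parent to reap them -
# signals do nothing) and D-state tasks (uninterruptible sleep: these
# ignore SIGKILL until the kernel I/O they are blocked on completes).
# wchan shows the kernel function each task is sleeping in.
ps -eo pid,ppid,stat,wchan:32,comm | awk 'NR==1 || $3 ~ /^[ZD]/'
```

If shfs shows Z, killing it is pointless; the parent has to reap it. If the smbd processes show D, they will stay unkillable for as long as the filesystem operation they are waiting on never returns.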
ixnu Posted December 8, 2016

Currently running reiserfsck on all the disks. Not a fan of reiser...
trurl Posted December 8, 2016

If you could get your diagnostics we could see the SMART for all your drives. Do the SMART attributes for all of your drives look OK? Have you done a memtest recently? Tell us about your hardware.
ixnu Posted December 8, 2016

SMARTs look OK to me. To answer your question, this is a box that has been running for about 5 years and has gone from 4.7 to 6.0 to 6.2.x to 6.3-rc5. This is the first non-stable release that has been on it. Here is a full diag that I got after a hard reboot when I got back home this morning. I certainly appreciate you taking a look at it. The only option I could think of would be fsck fun. I have not run a memtest recently, and it's not ECC.

tower-diagnostics-20161208-0822.zip
trurl Posted December 8, 2016

SMARTs look OK. Can't hurt to try a memtest. Tell us about your hardware.
ixnu Posted December 8, 2016

True dat. Will run memtest when completed with fsck.
ixnu Posted December 8, 2016

reiserfsck on all disks = 0 corruptions
xfs_repair = 0 corruptions

Surprising to me, since I've had to hard reboot about a dozen times in the last two weeks.
JonathanM Posted December 8, 2016

ReiserFS is very forgiving about hard shutdowns. The problem is, it seems to me that any version of unraid after 5 doesn't like ReiserFS, and seems to take a notion to lock down under intense use. Since ReiserFS is dying, nobody really wants to troubleshoot the issue. Just be glad you don't have a BTRFS cache pool as well as ReiserFS array disks. That combo caused me no end of grief: the ReiserFS issues caused runaway CPU usage, necessitating a hard reboot, which in turn caused BTRFS breakage. I initially thought it was BTRFS causing the issue, but found out it was the Reiser disks. Converted all ReiserFS to XFS, no more issues.

IMHO, of the choices given in unraid:
Reiser = extremely robust, very repairable, has issues with large disks running close to full and with newer kernels.
BTRFS = extremely fragile, recovery dicey, works great as long as you unmount it cleanly every time.
XFS = fairly robust, metadata log seems fragile, data recovery OK. Fewest overall issues.
ixnu Posted December 8, 2016

Anybody got the "reiser_to_xfs" slackware package? Can't seem to find it.
JonathanM Posted December 8, 2016

I believe you are being funny, but for others that may find this and assume such a thing exists: https://lime-technology.com/forum/index.php?topic=37490.0 is a long thread discussing options for moving data from ReiserFS disks to XFS. tl;dr: back up and move the data to the newly formatted disks; no in-place conversion is possible.
ixnu Posted December 8, 2016

Ha! That's how nasty rumors get started. Any theories on how Reiser causes this (zombie shfs and runaway smbd processes)?
JonathanM Posted December 8, 2016

Only uneducated guesses. I suspect the file system takes too long to complete an operation and causes a timeout that isn't caught. My reasoning is that smaller disks and fresh formats of ReiserFS don't seem to cause the issue.
ixnu Posted December 8, 2016

I have removed the cache drive to test this speed theory. My cache is an SSD. I should also mention that fuser -mv /dev/md* would not complete when I was having issues.
ixnu Posted December 9, 2016

Memtest: 4 passes and no errors. Updated to rc6. No errors on memtest, file systems, or SMART. Nothing in the syslog. What next? Willing to try anything. Voodoo dolls? Blood sacrifice?
trurl Posted December 9, 2016

You never did tell us about your hardware.
ixnu Posted December 9, 2016

My apologies, I thought the diag might be enough. Again, thanks for taking a look! If this is not enough info, just let me know what you need.

This box is about 5 years old and everything was bought new. I have had 2 red balls with Seagate drives that were rebuilt properly. All of the controller slots are occupied and one on the MB is empty. I shrunk the array so that I could use this slot to move everything to XFS.

MB: ASRock 880GM-LE - AMD SB710
CPU: AMD Sempron 140
RAM: G.Skill 2 x 4GB DDR3/1066
Controller: Supermicro AOC-SASLP-MV8
PSU: Corsair CX series CX750 750W
UPS: CyberPower CP1350AVRLCD (1350 VA)

The drives below sit in 3 x Supermicro 5-in-3 drive containers. Most are ~80% full; the highest is 88% full.

Parity: HGST_HMS5C4040ALE640_PL2331LAHE3V0J (sdc) - 4 TB
Disk 1: Hitachi_HDS5C3020ALA632_ML0221F30420HD - reiserfs 2 TB
Disk 2: SAMSUNG_HD204UI_S2H7J1SZ911253 - reiserfs 2 TB
Disk 3: SAMSUNG_HD204UI_S2H7J1CZ918725 - reiserfs 2 TB
Disk 4: WDC_WD40EZRZ-00WN9B0_WD-WCC4E0ST3K77 - reiserfs 4 TB
Disk 5: (empty slot)
Disk 6: WDC_WD20EZRX-00DC0B0_WD-WCC300793356 - reiserfs 2 TB
Disk 7: WDC_WD20EZRX-22D8PB0_WD-WCC4M5ET1KUD - xfs 2 TB
Disk 8: WDC_WD20EVDS-63T3B0_WD-WCAVY5276605 - reiserfs 2 TB
Disk 9: WDC_WD20EVDS-63T3B0_WD-WCAVY5328007 - reiserfs 2 TB
Disk 10: ST2000DM001-1CH164_Z2F0TGTB - reiserfs 2 TB
Disk 11: WDC_WD20EARX-008FB0_WD-WCAZAD975980 - reiserfs 2 TB
Disk 12: WDC_WD2002FAEX-007BA0_WD-WMAY05158670 - reiserfs 2 TB
Cache: SAMSUNG_SSD_830_Series_S0WENYAC301281 - xfs 128 GB
Boot Flash: JD_FireFly (sda) - 2 GB
ixnu Posted December 10, 2016

And we are back to shfs at 100% CPU. However, this time it's not a zombie. Also no smbd processes. Can I try something during this state to help troubleshooting? Syslog is clean.
ixnu Posted December 10, 2016

Still no joy. Web and SMB not responsive. CPU is pegged with shfs (htop screenshot included). /mnt/user0 and /mnt/user will not complete a simple "ls". Neither "shutdown -r now" nor "powerdown -r now" will complete.
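One thing worth capturing while the box is wedged: ls and fuser hang because they go through the dead FUSE mount, but /proc lets you ask where a task is blocked without touching the filesystem at all. A generic Linux sketch (assumes root and a kernel that exposes /proc/PID/stack; the demo uses $$, this shell's own PID, so it runs anywhere - substitute the shfs PID from htop on the stuck box):

```shell
# On the stuck server, set PID to the shfs PID from htop instead of $$.
PID=$$

# Kernel function the task is sleeping in ("0" means it is runnable):
cat /proc/$PID/wchan; echo

# Full kernel stack of the blocked task (root only; may be restricted):
cat /proc/$PID/stack 2>/dev/null || echo "(stack not readable)"

# Last resort when shutdown/powerdown themselves hang: magic SysRq forces
# an emergency sync, read-only remount, then reboot. DESTRUCTIVE - left
# commented out here; uncomment only on the stuck box.
#   echo s > /proc/sysrq-trigger
#   echo u > /proc/sysrq-trigger
#   echo b > /proc/sysrq-trigger
```

The SysRq route is still a forced reboot, but the sync and read-only remount first give the filesystems a better chance than holding the power button.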
ixnu Posted December 11, 2016

Still at a loss on how to troubleshoot. Would greatly appreciate any Unraid™ Inc. input. I have nothing to go on; it seems to happen at idle. I have created a simple cron job to write a file to the array every 15 mins, and set the mover to every hour. I'm hopeful this will give me something.
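In case it helps anyone replicate: the heartbeat can be a one-line crontab entry (a sketch; the path and interval are my choices, and the demo writes the entry to a temp file rather than installing it into the live crontab):

```shell
# Append a timestamp to an array disk every 15 minutes, so the last line
# in heartbeat.txt brackets when the hang started. /mnt/disk1 is an
# example path - any array disk works.
ENTRY='*/15 * * * * date >> /mnt/disk1/heartbeat.txt'

# Demo: stage the entry in a temp file instead of the live crontab.
CRONFILE=$(mktemp)
echo "$ENTRY" > "$CRONFILE"
cat "$CRONFILE"
```

On the real box you would add the line with `crontab -e` (or drop a file in the cron directory your Unraid version uses) rather than staging it in /tmp.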
Squid Posted December 11, 2016

The general solution from the forum users is to ditch reiserfs and switch all drives to xfs. But, if you want LT to have a look at this, then best to email [email protected] as they don't generally randomly surf the forum.
ixnu Posted December 11, 2016

Thanks Squid! Has a root cause been identified, or is it just cargo cult troubleshooting?
Squid Posted December 11, 2016

(image: http://www.karenkuehn.com/media/original/CARGO%20CULT%20Cover-copy2.jpg)

But it does seem to be the solution. Reiser is, in the grand scheme of things, ultimately an unsupported FS on Linux. No development happens on it anymore, the creator is in jail, and the last time someone attempted a fix on it, it caused file corruption.
ixnu Posted December 11, 2016

Yeah, maybe this is a more appropriate analogy for running ReiserFS on UnRaid 6.x.