Andyman Posted February 9, 2020 (edited)

Hello,

Longtime lurker, first time asking for assistance. My system is constantly crashing hard and I am running out of steps to troubleshoot by myself.

Gigabyte UD3 / 4970 / 32GB
500W psu
LSI 9200-8i
2x 4TB + 2x 2TB drives in xfs on array (off mobo)
8x 250GB ssd in RAID10 btrfs on cache (off hba)
1x 500GB ssd (off mobo)
*4TB parity disk (currently removed from array)
*Radeon 7870 (currently physically removed)

It started with errors on disk1 (xfs corruption), which was a 1-year-old 4TB drive. Attempts to rebuild resulted in the system locking up, and I was becoming unsure of the integrity of the parity disk as well. I removed the parity disk from the array and ran an xfs repair on disk1, which came up ok.

Various errors made me doubt the integrity of the sata cables, so these were changed. I also moved all the cache ssds to the hba instead of the mobo (was a 6/2 mix prior).

Other changes/tests to try and isolate:
- Removed graphics card
- Moved HBA to a different slot (was on pcie2.0x4 slot, moved to pcie3.0x8)
- Ran memtest - full test, 1 pass - all ok
- Diskcheck in maintenance mode - all ok
- VMs and Docker turned off
- Leaving idle in maintenance mode
- Ran extended smart test on a few drives - all ok

While idling in maintenance mode it might run for 4-8 hours before locking up. I've been observing this all week from work. I am currently trying to preclear the parity disk to use again, however running the preclear seems to bring on an error much sooner.

I managed to pull a diagnostic very close to the most recent crash.

Any help greatly appreciated.

Cheers,
Andy

Attachments: ap-ur01-diagnostics-20200208-1413.zip, syslog.txt
Andyman Posted February 9, 2020

Attached is a photo of the console monitor attached to the server during one crash last week. When it happens, it's unresponsive to a directly attached keyboard and requires a hard reset.
JorgeB Posted February 9, 2020

Try running in safe mode for a few hours with dockers/VMs stopped. If it still crashes like that, it's likely a hardware problem; if it doesn't, start enabling the plugins, dockers and VMs a few at a time.
Andyman Posted February 9, 2020

44 minutes ago, johnnie.black said:
"Try running in safe mode for a few hours with dockers/VMs stopped, if it crashes like that it's likely a hardware problem, if it doesn't start enabling the plugins, dockers and VMs a few at a time."

Cool, thank you, will try. I have had the docker and vm services turned off for this whole week while watching this.

Are there any particular plugins that are common culprits? I would like to try turning on the ones most likely to error out first.

ca.backup2.plg - 2019.10.27
ca.cfg.editor.plg - 2019.07.07
ca.mover.tuning.plg - 2019.08.23
ca.turbo.plg - 2020.01.26
ca.update.applications.plg - 2019.10.13
community.applications.plg - 2020.02.01
dynamix.active.streams.plg - 2019.01.03
dynamix.cache.dirs.plg - 2018.11.18
dynamix.file.integrity.plg - 2018.04.20
dynamix.local.master.plg - 2016.09.13a
dynamix.s3.sleep.plg - 2018.02.04
dynamix.ssd.trim.plg - 2017.04.23a
dynamix.system.stats.plg - 2019.01.31c
dynamix.wireguard.plg - 2020.01.27
file.activity.plg - 2019.02.10a
fix.common.problems.plg - 2019.12.29
NerdPack.plg - 2019.12.31
preclear.disk.plg - 2020.01.17b
tips.and.tweaks.plg - 2019.10.17
unassigned.devices.plg - 2020.02.02
unbalance.plg - v2019.10.26
unRAIDServer.plg - 6.8.1
user.scripts.plg - 2019.08.17
wakeonlan.plg - 2019.12.30
JorgeB Posted February 9, 2020

Running in safe mode will disable them all.
Andyman Posted February 9, 2020

6 minutes ago, johnnie.black said:
"Running in safe mode will disable them all."

Yes, I am aware of that. I meant for when I turn them back on.
itimpi Posted February 9, 2020

13 minutes ago, Andyman said:
"Yes I am aware of that. I meant for when turning them back on"

I would think that the one most likely to cause a problem is NerdPack.
JorgeB Posted February 9, 2020

Don't worry about that for now. First see if running in safe mode helps; if it doesn't, it's not a plugin.
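
If safe mode does turn out to be stable, the "a few at a time" re-enabling suggested earlier in the thread could be planned roughly as below. This is only a sketch: it prints a plan and does not touch the server. The plugin names come from the list posted in the thread; the function names, the batch size of 3, and putting NerdPack first (per itimpi's guess) are assumptions made for illustration. unRAIDServer.plg is left out on the assumption that it is the OS itself rather than an add-on.

```python
from typing import Iterator, List

# Plugin names copied from the list posted in the thread.
# unRAIDServer.plg is omitted (assumed to be the OS, not an add-on).
PLUGINS: List[str] = [
    "ca.backup2.plg", "ca.cfg.editor.plg", "ca.mover.tuning.plg",
    "ca.turbo.plg", "ca.update.applications.plg",
    "community.applications.plg", "dynamix.active.streams.plg",
    "dynamix.cache.dirs.plg", "dynamix.file.integrity.plg",
    "dynamix.local.master.plg", "dynamix.s3.sleep.plg",
    "dynamix.ssd.trim.plg", "dynamix.system.stats.plg",
    "dynamix.wireguard.plg", "file.activity.plg",
    "fix.common.problems.plg", "NerdPack.plg", "preclear.disk.plg",
    "tips.and.tweaks.plg", "unassigned.devices.plg", "unbalance.plg",
    "user.scripts.plg", "wakeonlan.plg",
]


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def reenable_plan(plugins: List[str], suspects: List[str],
                  size: int = 3) -> List[List[str]]:
    """Order suspected plugins first, then split everything into rounds."""
    first = [p for p in plugins if p in suspects]
    rest = [p for p in plugins if p not in suspects]
    return list(batches(first + rest, size))


if __name__ == "__main__":
    plan = reenable_plan(PLUGINS, suspects=["NerdPack.plg"])
    for round_no, group in enumerate(plan, start=1):
        print(f"Round {round_no}: enable {', '.join(group)}")
```

If a lockup shows up during a given round, the plugins enabled in that round are the first ones to look at; a smaller batch size narrows it down faster at the cost of more rounds.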