Alex Ledesma, posted January 26:
Hi everyone, my Unraid server has been freezing regularly and I'm not sure why. I turn it off, wait five minutes, and it comes back online. I'm just afraid that one day it will not come back online. Once it comes up, it starts a parity check all over again, even though it finished one last week without any errors. I went through the logs and I don't see anything that stands out to me except for some errors on the same day as the freezing, but I don't know what they mean. Attached are the syslog and the diagnostics. I changed the macvlan setting that came up. Any help is most appreciated. Thank you all!
Attachments: homeserver-diagnostics-20240125-1924.zip, syslog.txt
JorgeB, posted January 26:
Jan 25 15:08:49 HomeServer kernel: XFS (md1p1): Internal error xfs_efi_item_recover at line 614 of file fs/xfs/xfs_extfree_item.c. Caller xlog_recover_process_intents+0x9c/0x25e [xfs]
This may not be the cause of the freezes, but it is an issue. Check the filesystem on disk1, and run the check without -n.
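For anyone who wants to look for similar lines in their own syslog, here is a minimal Python sketch. It assumes the log uses the same `kernel: XFS (device): Internal error function` wording as the line quoted above. The function name `find_xfs_errors` and the second sample line are hypothetical, made up for illustration only.

```python
import re

# Matches kernel lines shaped like the one quoted in the thread:
#   ... kernel: XFS (md1p1): Internal error xfs_efi_item_recover at line 614 ...
# Assumption: other XFS internal errors follow the same wording.
XFS_ERROR = re.compile(
    r"kernel: XFS \((?P<device>[^)]+)\): Internal error (?P<function>\S+)"
)


def find_xfs_errors(lines):
    """Return (device, function) pairs for every XFS internal-error line."""
    hits = []
    for line in lines:
        match = XFS_ERROR.search(line)
        if match:
            hits.append((match.group("device"), match.group("function")))
    return hits


sample = [
    # Real line from the thread.
    "Jan 25 15:08:49 HomeServer kernel: XFS (md1p1): Internal error "
    "xfs_efi_item_recover at line 614 of file fs/xfs/xfs_extfree_item.c. "
    "Caller xlog_recover_process_intents+0x9c/0x25e [xfs]",
    # Hypothetical unrelated line, included only to show it is ignored.
    "Jan 25 15:09:00 HomeServer kernel: some unrelated message",
]

print(find_xfs_errors(sample))
```

The device name in parentheses (md1p1 here) is what points at disk1 in this thread. To scan a saved log, the same function can be fed the lines of the file, for example `find_xfs_errors(open("syslog.txt"))`.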
Alex Ledesma (author), posted January 27:
On 1/26/2024 at 4:21 AM, JorgeB said: "Jan 25 15:08:49 HomeServer kernel: XFS (md1p1): Internal error xfs_efi_item_recover at line 614 of file fs/xfs/xfs_extfree_item.c. Caller xlog_recover_process_intents+0x9c/0x25e [xfs] May not be the issue, but it's an issue, check filesystem on disk1, run it without -n."
Thank you. I am waiting for the parity check to finish, and then I will attempt the filesystem check.
jrsphoto, posted February 8:
@Alex Ledesma did you have any luck getting your system to stop hanging? I'm having the same problem and just looking for solutions. I'm running the above filesystem check, so I'll see how that goes. Parity checks take about 17 hours for me, so I'd really like to figure this out. I've seen mention of some plugins causing this, as well as networking issues with Docker. I'm only running the Appdata Backup, Fix Common Problems, and Community Applications plugins, which are all well supported. I've also stopped all my Docker containers, and it's still happening to me.
trurl, posted February 8:
1 minute ago, jrsphoto said: "running the above filesystem check"
Why? Do you have any indication it might be needed? Since the OP hasn't visited the forum in over a week, I guess we can assume this thread is abandoned, so you can attach Diagnostics to your NEXT post in this thread.
jrsphoto, posted February 8:
Hi @trurl, thanks for the feedback. No real indication it was needed, just trying anything that seems reasonable and minimally invasive at this point. I've attached the diagnostics and the remote syslog file. The freezes seem to happen within roughly 24 hours of uptime, and both the console (attached HDMI monitor and keyboard) and the web UI are frozen.
Attachments: syslog_unraid.zip, homeserver-diagnostics-20240125-1924.zip
trurl, posted February 8:
26 minutes ago, jrsphoto said: "No real indication it was needed"
Yes there is. Disk1 is unmountable, which is exactly the scenario where you would check the filesystem. Check filesystem on disk1. Be sure to use the webUI and not the command line. Capture the output and post it.
jrsphoto, posted February 8:
Disk1 is unmountable? From the shell, /mnt/disk1 appears to be mounted, and "ls" in that directory shows my files. I'll put the array in maintenance mode and run Check filesystem.
trurl, posted February 8:
16 minutes ago, jrsphoto said: "Disk1 is unmountable? From the shell /mnt/disk1 appears to be mounted:"
Not according to your Diagnostics. Did you take them before you did check filesystem?
jrsphoto, posted February 8:
Output from the Check filesystem:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
jrsphoto, posted February 8:
1 hour ago, trurl said: "Not according to your Diagnostics. Did you take them before you did check filesystem?"
Um, not sure. I can rerun diagnostics and post them.
trurl, posted February 8:
Check filesystem again, without -n. If it asks for it, use -L. Capture the output and post it.
jrsphoto, posted February 8:
OK, here are the diags again. The array is up and running.
Attachment: tower-diagnostics-20240208-1613.zip
jrsphoto, posted February 8:
5 minutes ago, trurl said: "Check filesystem again, without -n. If it asks for it, use -L. Capture the output and post it."
Here you go. It didn't ask for the -L option:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 3
        - agno = 2
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
trurl, posted February 8:
Just now, jrsphoto said: "moving disconnected inodes to lost+found"
Post new diagnostics.
jrsphoto, posted February 8:
The output of the two runs looks the same to me?
trurl, posted February 8:
Post new diagnostics with the array started in normal (not maintenance) mode.
trurl, posted February 8:
1 minute ago, jrsphoto said: "output between the two looks the same to me?"
The first was with -n (nomodify), so nothing was actually done, it was just a check.
12 minutes ago, jrsphoto said: "No modify flag set, skipping filesystem flush and exiting."
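The difference between the two runs is easy to miss by eye, so here is a small Python sketch that tells them apart from pasted output. It only reads text and does not run any repair. It relies on two things visible in the outputs posted in this thread: the check-only run prints "No modify flag set", and it skips Phase 5. The function names `was_dry_run` and `phases_run`, and the abridged samples, are illustrative assumptions.

```python
import re


def was_dry_run(output: str) -> bool:
    """True if the output came from a check-only (-n) run."""
    return "No modify flag set" in output


def phases_run(output: str) -> list:
    """Return the phase numbers that actually started, in order."""
    # Only lines that begin with "Phase N -" count; the text
    # "skipping phase 5" does not match because it is lowercase
    # and not at the start of a line.
    return [int(n) for n in re.findall(r"^Phase (\d+) -", output, flags=re.MULTILINE)]


# Abridged from the first (check-only) output posted in the thread.
check_only = """Phase 1 - find and verify superblock...
Phase 4 - check for duplicate blocks...
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting."""

# Abridged from the second (repair) output posted in the thread.
repair = """Phase 1 - find and verify superblock...
Phase 4 - check for duplicate blocks...
Phase 5 - rebuild AG headers and trees...
Phase 6 - check inode connectivity...
Phase 7 - verify and correct link counts...
done"""

for name, text in (("check_only", check_only), ("repair", repair)):
    print(name, was_dry_run(text), phases_run(text))
```

A run that reports a dry run, or that never reaches Phase 5, changed nothing on disk.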
jrsphoto, posted February 8:
New diags: tower-diagnostics-20240208-1623.zip
jrsphoto, posted February 8 (edited February 9):
20 hours ago, trurl said: "Disk1 is mounted in those."
Thanks for this. My gut was telling me it was likely something with AMD, so this helps me figure out what to look for in the BIOS. My CPU is a 3rd-gen Ryzen 9 3900X, and I believe I'm running 4 single-rank non-ECC DDR4 DIMMs (I'll double-check that). I'll check the C-states as well.
jrsphoto, posted February 8 (edited February 8):
OK, based on that FAQ link you sent, I've set the following:
1) In BIOS, Power Supply Idle Control set to "typical current idle"
My RAM is 4x DDR4-2133 Corsair single-rank, clocked at 2133, well under DDR4-2933, so that should be OK. I'll let this run overnight and see how it goes. If it still locks up, as per the FAQ, I'll completely disable C-states and try that for another 24 hours.
jrsphoto, posted February 11 (edited February 12):
Just wanted to reply that the FAQ thread that @trurl posted seems to have solved the lock-up problem on my AMD Ryzen 9 3900X Unraid server. Specifically, I believe it was setting "Power Supply Idle Control" to "typical current idle" in the BIOS that solved it. I've had no lock-ups in a little over 48 hours, so I think I'm good. Thank you @trurl