djgizmo Posted August 7, 2023 (edited)
I’ve been using Unraid at home since 2018. Started with Basic, then moved to Plus, and then upgraded to Pro, about 6 months apart for each upgrade. Started with a Dell R710 and had to reboot monthly, which was fine for stability. After I built a custom box with a Rosewill case, a SuperMicro motherboard, Intel 4770, 16 GB of RAM, and an SSD for cache / VMs / containers, things seemed a bit better: I’d only have to reboot every 2-3 months.
For the past 6 months, stability has been garbage. VMs and containers randomly crash or fail to start. I suspected a bad SSD and swapped it. Same issue. I started rebooting weekly and it was fine for a bit. Now I’m getting out-of-memory errors and, frankly, I’m unhappy. Then today, after a reboot, I see an “unrecognizable file system” error on my SSD, and all of my VMs and containers are gone, of course. (Luckily I’ve made backups of my container data on my array.) I can’t use my Unraid box for more than a basic NAS at this point.
I’ve memtested my RAM for 6 hours; all passes, no errors. I’m not sure what to do now. I need to fix this stability issue. I don’t have a spare motherboard or CPU to rule those out as a possible issue. The PSU is a Corsair 500 watt unit, so in theory power is stable.
Attachments: ironman-diagnostics-20230807-0804.zip, Logs from RAM.txt
Edited August 7, 2023 by djgizmo: attaching diagnostics
JorgeB Posted August 7, 2023
Enable the syslog server, and after the next crash post the saved syslog together with the diagnostics.
rkotara Posted August 7, 2023 (edited)
9 hours ago, djgizmo said: "After I made a custom box with a Rosewill Case, a SuperMicro motherboard, Intel 4770, 16Gb of ram, SSD for cache / VMs/containers.. things seemed a bit better, I’d only have to reboot every 2-3 months. ... I’ve memtested my ram for 6 hours, all passes and no errors."
My single-threaded memtest will not complete a full cycle in 6 hours, but it is very accurate. It might be best to let it run a full cycle, and to make sure you're using the single-threaded (older) memtest86. The newer multi-threaded version will miss some RAM issues, in my experience.
Edited August 7, 2023 by rkotara
djgizmo (Author) Posted August 7, 2023
17 minutes ago, rkotara said: "My single-threaded memtest will not complete a full cycle in 6 hours, but it is very accurate. It might be best to let it run a full cycle, and to make sure you're using the single-threaded (older) memtest86. The newer multi-threaded version will miss some RAM issues, in my experience."
Which version should I run?
djgizmo (Author) Posted August 7, 2023
3 hours ago, JorgeB said: "Enable the syslog server and post that after a crash and the diagnostics."
Attached diagnostics and the logs that were stored in RAM. I've started the local syslog server. Should I be mirroring this to the flash drive?
JorgeB Posted August 7, 2023
20 minutes ago, djgizmo said: "Should I be mirroring this to flash drive?"
As you prefer; mirroring to the flash drive is the easiest option to configure.
djgizmo (Author) Posted August 7, 2023
19 minutes ago, JorgeB said: "As you prefer, mirror to flash drive is the easiest option to configure."
OK, I have set this to a share on my array. I don't see any files or folders created for syslog on this share.
JorgeB Posted August 7, 2023
It's possibly not configured correctly; post a screenshot of the settings.
djgizmo (Author) Posted August 7, 2023
1 hour ago, JorgeB said: "Possibly not correctly configured, post a screenshot of the settings."
JorgeB Posted August 7, 2023
You need to set the remote syslog server: use the server's IP and the default port, 514.
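One way to check that the syslog server is actually receiving messages is to send it a test line from another machine on the network. The snippet below is a minimal sketch using Python's standard `logging.handlers.SysLogHandler`; it assumes the server is listening on UDP (the thread only gives port 514, not the protocol), and the function name `send_test_message` and the example IP are hypothetical.

```python
import logging
import logging.handlers
import socket


def send_test_message(host: str, port: int = 514,
                      text: str = "syslog test message") -> None:
    """Send one INFO-level line to a syslog server over UDP (assumed protocol)."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port), socktype=socket.SOCK_DGRAM
    )
    logger = logging.getLogger("syslog-test")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the test line out of the root logger
    logger.addHandler(handler)
    try:
        logger.info(text)
    finally:
        # Detach and close so repeated calls do not stack handlers.
        logger.removeHandler(handler)
        handler.close()


# Example call; replace the placeholder address with the server's real IP:
# send_test_message("192.168.1.10")
```

If the line then shows up in the syslog file on the share, the remote-server setting is working. UDP gives no delivery confirmation, so a missing line does not say whether the address, the port, or the protocol is wrong.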
djgizmo (Author) Posted August 7, 2023
Done, and now the syslog file is created on disk. Thank you.
Ran xfs_repair -n on the SSD.
root@IRONMAN:~# xfs_repair -n /dev/sdi1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
        - scan filesystem freespace and inode maps...
invalid start block 1627196033 in record 208 of cnt btree block 2/704825
invalid start block 1627196033 in record 234 of cnt btree block 2/704825
invalid start block 1627196033 in record 236 of cnt btree block 2/704825
agf_freeblks 2545590, counted 2545587 in ag 2
agi unlinked bucket 26 is 125929626 in ag 2 (inode=662800538)
sb_icount 841152, counted 841408
sb_ifree 4855, counted 4673
sb_fdblocks 20284605, counted 20803245
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
free space (2,14441748-14441748) only seen by one free space btree
free space (2,14447755-14447755) only seen by one free space btree
free space (2,14447890-14447890) only seen by one free space btree
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 0
        - agno = 1
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 662800538, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 662800538 nlinks from 0 to 1
No modify flag set, skipping filesystem flush and exiting.
Do you recommend that I try to mount the SSD manually via the command line?
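The counter mismatches in the dry-run output above can be read off more easily as differences. The sketch below is illustrative only: it parses the "NAME recorded, counted N" lines copied from that output and prints counted minus recorded. The helper name `counter_mismatches` and the regular expression are assumptions made for this example, not part of xfs_repair.

```python
import re
from typing import Dict

# Matches lines such as "sb_icount 841152, counted 841408".
COUNTER_RE = re.compile(r"^(sb_\w+|agf_\w+)\s+(\d+), counted (\d+)")


def counter_mismatches(output: str) -> Dict[str, int]:
    """Map each mismatched counter name to (counted - recorded)."""
    diffs: Dict[str, int] = {}
    for line in output.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            name = match.group(1)
            recorded = int(match.group(2))
            counted = int(match.group(3))
            diffs[name] = counted - recorded
    return diffs


# Lines copied from the xfs_repair -n output in the post above.
SAMPLE = """\
agf_freeblks 2545590, counted 2545587 in ag 2
sb_icount 841152, counted 841408
sb_ifree 4855, counted 4673
sb_fdblocks 20284605, counted 20803245
"""

for name, diff in counter_mismatches(SAMPLE).items():
    print(f"{name}: {diff:+d}")
```

Because the run used -n and the log was not replayed, the ALERT in the output warns that some of these inconsistencies may be spurious, so the differences are a rough indication only.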
JorgeB Posted August 7, 2023
You need to run it again without -n or nothing will be done.
djgizmo (Author) Posted August 7, 2023
Did so; it said it couldn't read the log. I followed the Spaceinvader One video on XFS repair and used the -L option, and the file system has been repaired.
Now that I have my base data back, I need to know why this happened and how I can prevent it from happening again.
JorgeB Posted August 7, 2023
Post the syslog when it happens again; hopefully it catches something.
rkotara Posted August 8, 2023
22 hours ago, djgizmo said: "Which version should I run?"
I believe any of the 4.x versions, which can also run on non-UEFI systems as a bonus. Here is a link to the USB installer version: https://www.memtest86.com/downloads/memtest86-4.3.7-usb.img.zip