gbcayce Posted January 6, 2023

Good morning, new to posting, so apologies in advance for any missing info. I have searched high and low and don't know where to start with my issues.

My server has been functionally going offline, seemingly at random. It started with the server staying on and the web GUI still accessible, but all the shares disappear (see attached; the error in the corner says "array undefined"). To my knowledge no heavy reads or writes happen when this occurs. When I go to reboot, it hangs and I have to hard restart. Sometimes only my Docker containers go offline and the array stays online. Now I just can't get half my Docker containers to start.

Any help or guidance is greatly appreciated! (Server specs below.)

Here are the things I have done so far:
- updated everything
- run long-form memtest
- multiple parity checks
- installed "Fix Common Problems" (no real issues)
- reseated all hardware (drives, RAM, CPU, power cables)
- looked for obvious errors in logs (may have missed something, I'm not a great log whisperer)

Server specs:
- i7-4790
- 16GB RAM
- Gigabyte Z87X-UD5H-CF
- 2x 3TB HDD
- 4TB parity drive
- 250GB SSD cache
- 1TB unassigned drive (not used)

Services

Docker:
- heimdall
- krusader
- noip
- pihole
- plex
- speedtest-tracker
- unifi-controller
- uptimekuma
- watchtower

VM:
- home assistant (2 cores, 4GB RAM)
JorgeB Posted January 6, 2023

See if you can get the diagnostics, or the syslog at least.
gbcayce Posted January 6, 2023 Author

That would be helpful, wouldn't it... Please see attached.

tower-diagnostics-20230106-1039.zip
JorgeB Posted January 6, 2023

Dec 30 11:30:27 Tower kernel: BTRFS error (device sdf1): block=476397568 write time tree block corruption detected

This usually indicates a RAM problem, so start by running memtest.
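For anyone who wants to check their own syslog for the kind of line quoted above, here is a minimal sketch in Python. It assumes the kernel messages follow the same shape as the single line in this thread ("BTRFS error (device X): message"); the function name `find_btrfs_errors` and the file path in the usage note are hypothetical, and other BTRFS message formats may well exist that this pattern does not catch.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: the pattern is modelled only on the one log line quoted in this
# thread. It captures the device name and the message that follows it.
BTRFS_ERROR = re.compile(r"BTRFS error \(device (?P<device>[^)]+)\): (?P<message>.*)")


def find_btrfs_errors(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (device, message) pairs for every BTRFS error line found."""
    hits = []
    for line in lines:
        match = BTRFS_ERROR.search(line)
        if match:
            hits.append((match.group("device"), match.group("message").strip()))
    return hits
```

To use it, something like `find_btrfs_errors(open("syslog.txt", errors="replace"))` should work, where `syslog.txt` stands in for wherever the syslog was saved. A hit reading "write time tree block corruption detected" is the message that pointed towards RAM here; the script only finds the line and does not diagnose the cause.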
gbcayce Posted January 6, 2023 Author

I ran it for about 48 hours with no fails. I can't remember how many cycles it goes through. I'm using the baked-in memtest; is there another, more thorough version?
Solution JorgeB Posted January 6, 2023

Memtest doesn't always catch the errors. Try running with just two sticks; if the problem persists, try the other two. The pool might need fixing first.
itimpi Posted January 6, 2023

14 minutes ago, gbcayce said: "I'm using the baked-in memtest; is there another, more thorough version?"

You can get a more recent version from the memtest86.com site. For licencing reasons this cannot be included with Unraid, but it is free for personal use. Not sure if it does more thorough testing, but it would not hurt to try, and it can test ECC RAM properly, which the Unraid version does not, as I understand it.
gbcayce Posted January 6, 2023 Author

I just logged in to shut down the server and now I have this new weirdness... The GUI is still responsive, but Docker is frozen (nothing loads on that tab), the containers are frozen (pihole will not load), and the CPU is pegged on a number of cores. Attached are logs and a screen grab. Before I shut this down, is there anything I should look for? Uptime is at 7 days 14.5 hrs.

tower-diagnostics-20230106-1450.zip
gbcayce Posted January 6, 2023 Author

Also can't access the shares from Windows Explorer.
JorgeB Posted January 7, 2023

The Unraid driver crashed, so you need to reboot (force it if needed). That also points to hardware problems.
gbcayce Posted January 8, 2023 Author

Sorry if this is a novice question, but where in the logs do I look for that? I'm trying to understand which component may be causing these outages.
JorgeB Posted January 9, 2023

Hardware issues usually are not logged directly.
gbcayce Posted January 18, 2023 Author

Not to jinx it, but after replacing all the RAM with new sticks it appears to be stable and solved. Thanks for the help. Clearly I had more trust in memtest than it deserved.