January 6, 2023
Good morning. I'm new to posting, so apologies in advance for any missing info. I have searched high and low and don't know where to start with my issues.

My server has been functionally going offline, seemingly at random. It started with the server staying on and the web GUI still accessible, but all the shares disappearing (see attached; the error in the corner says "array undefined"). To my knowledge, no heavy reads or writes are happening when this occurs. When I go to reboot, it hangs and I have to do a hard restart. Sometimes only my Docker containers go offline and the array stays online. Now I just can't get half my Docker containers to start.

Any help or guidance is greatly appreciated! (Server specs below.)

Here are the things I have done so far:
- updated everything
- ran a long-form memtest
- ran multiple parity checks
- installed "Fix Common Problems" (no real issues)
- reseated all hardware (drives, RAM, CPU, power cables)
- looked for obvious errors in the logs (I may have missed something; I'm not a great log whisperer)

Server specs:
- i7-4790
- 16GB RAM
- Gigabyte Z87X-UD5H-CF
- 2x 3TB HDD
- 4TB parity drive
- 250GB SSD cache
- 1TB unassigned drive (not used)

Services

Docker:
- heimdall
- krusader
- noip
- pihole
- plex
- speedtest-tracker
- unifi-controller
- uptimekuma
- watchtower

VM:
- Home Assistant (2 cores, 4GB RAM)
January 6, 2023 · Author
That would be helpful, wouldn't it... Please see attached.
tower-diagnostics-20230106-1039.zip
January 6, 2023 · Community Expert
Dec 30 11:30:27 Tower kernel: BTRFS error (device sdf1): block=476397568 write time tree block corruption detected
This usually indicates a RAM problem, so start by running memtest.
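For anyone wanting to check their own syslog for the same kind of entry, here is a minimal Python sketch that pulls the device and message out of BTRFS error lines and counts them per device. The regular expression is an assumption built from the single line quoted above, not a complete description of the kernel log format, and the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the one sample line in this thread (an assumption,
# not a full syslog grammar): "... kernel: BTRFS error (device X): message"
BTRFS_ERROR = re.compile(
    r"kernel: BTRFS error \(device (?P<device>[^)]+)\): (?P<message>.*)"
)


def btrfs_errors(lines):
    """Yield (device, message) for every line that looks like a BTRFS error."""
    for line in lines:
        match = BTRFS_ERROR.search(line)
        if match:
            yield match.group("device"), match.group("message").strip()


def errors_per_device(lines):
    """Count BTRFS error lines per device name."""
    return Counter(device for device, _ in btrfs_errors(lines))


# The line quoted in the post above, used as sample input.
sample = [
    "Dec 30 11:30:27 Tower kernel: BTRFS error (device sdf1): "
    "block=476397568 write time tree block corruption detected",
]

for device, message in btrfs_errors(sample):
    print(device, "->", message)
```

A count concentrated on one device only shows where the errors surfaced. As noted above, this particular message usually points at RAM rather than at the drive itself.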
January 6, 2023 · Author
I ran it for about 48 hours with no failures. I can't remember how many cycles it goes through. I'm using the baked-in memtest; is there another, more thorough version?
January 6, 2023 · Community Expert · Solution
Memtest doesn't always catch the errors. Try running with just two sticks; if the problem is the same, try the other two. The pool might need fixing first.
January 6, 2023 · Community Expert
14 minutes ago, gbcayce said: "I'm using the baked-in memtest; is there another, more thorough version?"
You can get a more recent version from the memtest86.com site. For licensing reasons it cannot be included with Unraid, but it is free for personal use. I'm not sure whether it does more thorough testing, but it would not hurt to try, and it can test ECC RAM properly, which the Unraid version does not, as I understand it.
January 6, 2023 · Author
I just logged in to shut down the server and now I have this new weirdness: the GUI is still responsive, but Docker is frozen (nothing loads on that tab), the containers are frozen (pihole will not load), and a number of CPU cores are pegged. Logs and a screen grab are attached. Before I shut this down, is there anything I should look for? Uptime is at 7 days, 14.5 hours.
tower-diagnostics-20230106-1450.zip
January 7, 2023 · Community Expert
The Unraid driver crashed, so you need to reboot; force it if needed. That also points to hardware problems.
January 8, 2023 · Author
Sorry if this is a novice question, but where in the logs do I look for that? I'm trying to understand which component may be causing these outages.
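One way to approach that question is to scan the syslog for strings that commonly accompany kernel crashes and filesystem errors. The Python sketch below does this under loud assumptions: the marker list is only a starting point chosen for illustration (it is not the list the expert used), and `find_suspect_lines` is a hypothetical helper name.

```python
# Strings that commonly appear in Linux kernel logs around crashes or
# corruption. This list is an illustrative assumption, not exhaustive.
MARKERS = (
    "Call Trace",
    "kernel BUG",
    "general protection fault",
    "Hardware Error",
    "BTRFS error",
    "BTRFS critical",
)


def find_suspect_lines(lines, markers=MARKERS):
    """Return (line_number, marker, text) for each line containing a marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for marker in markers:
            if marker in line:
                hits.append((number, marker, line.rstrip()))
                break  # report each line once, under the first marker found
    return hits


# Small made-up sample; only the BTRFS line comes from this thread.
sample_log = [
    "Jan  6 14:40:01 Tower kernel: example informational line",
    "Jan  6 14:45:12 Tower kernel: Call Trace:",
    "Dec 30 11:30:27 Tower kernel: BTRFS error (device sdf1): "
    "block=476397568 write time tree block corruption detected",
]

for number, marker, text in find_suspect_lines(sample_log):
    print(f"line {number} [{marker}]: {text}")
```

To run it against a real log, pass the lines of the syslog file from the extracted diagnostics zip instead of `sample_log`. The lines just before the first hit are often the most useful context when asking for help.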
January 18, 2023 · Author
Not to jinx it, but after replacing all the RAM with new sticks the server appears to be stable and the problem solved. Thanks for the help. Clearly I had more trust in memtest than it deserved.