denzo, posted August 29, 2022 (edited)

I have been experiencing random lockups/freezes for several months. I am always trying out new Dockers, so I just assumed the lockups were related to those. But for a couple of months I have not installed (or had running) anything other than the Dockers I "need", and I still get random lockups. Tonight I was able to access the GUI (even though some Dockers and other things had stopped working) and download diagnostics. I have also included a syslog taken after rebooting. I don't know what I am looking at in these logs, but I see lots of errors. I am hoping a knowledgeable member here can take a look and give me a quick roadmap of what I should look at first, second, and third to get things stable. Any help will be much appreciated.

nas-diagnostics-20220828-2210.zip
nas-syslog-20220829-0410.zip

Edited August 29, 2022 by denzo (added syslog)
JorgeB, posted August 29, 2022 (marked as the solution)

BTRFS error (device nvme0n1p1): block=2530556395520 write time tree block corruption detected

This is usually a sign of bad RAM. Some data corruption was also found earlier, so start by running memtest.
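For anyone who, like the original poster, isn't sure what to look for in the logs, here is a minimal Python sketch for pulling BTRFS error lines like the one quoted above out of a syslog file. The line format is assumed from that single quoted message (`BTRFS error (device <name>): <message>`). Kernels also log at other levels, such as `BTRFS critical` or `BTRFS warning`, and this pattern does not match those. The script name and file path in the usage comment are hypothetical.

```python
import re
import sys
from typing import Iterable

# Assumed format, based on the single line quoted in JorgeB's reply:
#   "... BTRFS error (device nvme0n1p1): <message>"
# Other severities (e.g. "BTRFS critical", "BTRFS warning") are NOT matched.
BTRFS_ERROR = re.compile(r"BTRFS error \(device (?P<device>[^)]+)\): (?P<message>.*)")


def find_btrfs_errors(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Return (device, message) pairs for every BTRFS error line found."""
    hits = []
    for line in lines:
        match = BTRFS_ERROR.search(line)
        if match:
            # strip() drops the trailing newline left over from file iteration
            hits.append((match.group("device"), match.group("message").strip()))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_btrfs_errors.py syslog.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for device, message in find_btrfs_errors(log):
            print(f"{device}: {message}")
```

Grouping hits by device makes it easy to see whether the errors are confined to one pool (here the NVMe cache device). As JorgeB notes, though, corruption detected at write time usually points back at RAM rather than at the drive itself.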
denzo, posted August 29, 2022 (author)

> 11 hours ago, JorgeB said:
> BTRFS error (device nvme0n1p1): block=2530556395520 write time tree block corruption detected
> This is usually a sign of bad RAM, there was also some data corruption found before, so start by running memtest.

Thanks for the reply. I am running Memtest and getting plenty of errors (see attached images). Pardon my ignorance, but let me ask: are these types of errors solely due to the RAM itself, or could they be motherboard related? Should I just replace this memory? It looks to me like more than one stick has errors. Does memory "go bad" like this in multiple sticks at the same time? What might cause multiple (or single) failures like this? In other words, what should I do to fix this and hopefully keep it from happening again in another 2 years?
trurl, posted August 29, 2022

Test each stick by itself, and test each slot by itself.
denzo, posted August 29, 2022 (author)

Thanks for the reply. How long should I run memtest for each stick/slot? (I have 4 sticks and 4 slots.)
itimpi, posted August 30, 2022

The RAM, the CPU, and the motherboard can all go wrong, so a failure does not by itself pinpoint the failing item. My guess is that it happens more frequently with RAM, but that guess is not based on any hard evidence.

It can sometimes be worth simply reseating the RAM in its motherboard slots in case it has worked slightly loose.

Each RAM stick can test out fine individually, yet you can still get failures with multiple sticks plugged in because the memory controller is overloaded. Carefully check your motherboard manual for the maximum RAM speed your motherboard+CPU combination can support and remain stable. It is often lower than the rated speed of the RAM sticks, and it can vary with the number of sticks you have plugged in.

Anything other than 0 errors means the system will be unreliable.

As for how long to run the test, the general answer is at least one complete pass, and ideally longer (e.g. overnight) as long as you are getting 0 errors. There is no point in continuing a test once it starts reporting errors, other than perhaps to see whether the errors point to a particular RAM stick or slot.
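trurl's and itimpi's advice can be sketched as a two-phase test plan. First, test each stick alone in one slot. Second, move a known-good stick through every slot. Treat any non-zero error count, or a failure to POST, as a fail. This is an illustrative Python sketch under those assumptions, not a memtest feature. `run_pass`, the stick labels, and the slot numbers are hypothetical stand-ins for running a full memtest pass by hand and noting the result.

```python
from typing import Callable, Optional

# Hypothetical callback standing in for a manual memtest run: given a stick label
# and a slot number, return the error count from one full pass, or None if the
# system would not POST with that configuration.
RunPass = Callable[[str, int], Optional[int]]


def isolate_faulty_ram(sticks: list[str], slots: list[int], run_pass: RunPass) -> dict[str, list]:
    """Phase 1: test each stick alone in the first slot.
    Phase 2: move one stick that passed phase 1 through every slot."""
    first_slot = slots[0]
    # Anything other than exactly 0 errors (including no POST) counts as a failure.
    stick_errors = {stick: run_pass(stick, first_slot) for stick in sticks}
    good_sticks = [s for s in sticks if stick_errors[s] == 0]
    bad_sticks = [s for s in sticks if stick_errors[s] != 0]

    bad_slots: list[int] = []
    if good_sticks:
        reference = good_sticks[0]
        for slot in slots:
            if run_pass(reference, slot) != 0:
                bad_slots.append(slot)
    # If every stick fails phase 1, suspect the first slot (or the board/CPU)
    # rather than all the sticks; phase 2 cannot run without a known-good stick.
    # Even when both phases pass, still do a final run with all good sticks
    # installed together, because sticks can pass alone yet fail together.
    return {"good_sticks": good_sticks, "bad_sticks": bad_sticks, "bad_slots": bad_slots}
```

The sketch only records pass/fail per configuration. Stopping a run at the first error, as itimpi suggests, is left to the person watching memtest.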
denzo, posted August 31, 2022 (author, edited)

UPDATE 1: I started testing my memory one stick at a time (in slot 1). Two sticks passed (zero errors after more than an hour), and the other two sticks (tested separately and repeatedly) would not let my system POST. So I am going to run my system with the two "good" sticks and see how stable everything is. I'll update this thread either way. Thanks for the help so far!

UPDATE 2: It's been over a week and everything has been stable. Once again, thanks for the help figuring this out!

Edited September 8, 2022 by denzo