Nummy, posted June 18: Can someone please look over my diagnostics and tell me what I need to fix? Thanks :) numserver-diagnostics-20240618-1619.zip
JorgeB, posted June 18: btrfs is detecting a lot of corruption on the pool. Start by running memtest.
Nummy, posted June 18: Running memtest now. Is a 6-hour soak enough?
JorgeB, posted June 18: Run it for 2 or 3 passes, but note that memtest is only definitive if it finds errors. If you have multiple sticks, try using the server with just one; if the problem is the same, try with a different one. That will basically rule out bad RAM.
JorgeB, posted June 19: Run a correcting scrub on the pool and post the results.
Nummy, posted June 19: I'm going to take out my HDDs, check the power and SATA connectors, and run them through Crystal Disk to see if there are any major issues. When I start my array, the parity check keeps failing at 0.1% with a read error, then it starts a read check on the drives, and when I cancel that, the server freezes. So I'll do that, then run a correcting scrub and let you know. Thanks for the help so far. :)
Nummy, posted June 19: Quoting JorgeB: "Run a correcting scrub for the pool and post the results."
UUID: 27b890d1-1335-421c-afc6-7131a6ff851f
Scrub started: Wed Jun 19 19:38:58 2024
Status: finished
Duration: 0:12:23
Total to scrub: 2.61TiB
Rate: 3.59GiB/s
Error summary: csum=372
Corrected: 0
Uncorrectable: 372
Unverified: 0
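For readers following along, here is a minimal Python sketch that pulls the error counters out of a scrub report like the one above, so the "Uncorrectable" count can be checked at a glance. It assumes the field names shown in this post (`csum=`, `Corrected:`, `Uncorrectable:`, `Unverified:`); other btrfs-progs versions may print extra fields, and the function name `parse_scrub_summary` is purely illustrative.

```python
import re


def parse_scrub_summary(text: str) -> dict:
    """Extract error counters from `btrfs scrub status` output.

    Assumption: the report uses the field names seen in this thread
    ("csum=", "Corrected:", "Uncorrectable:", "Unverified:").
    """
    counters = {}
    # Key=value pairs on the "Error summary:" line, e.g. "csum=372".
    for key, value in re.findall(r"\b(\w+)=(\d+)", text):
        counters[key] = int(value)
    # "Name: number" counters, e.g. "Uncorrectable: 372".
    for name in ("Corrected", "Uncorrectable", "Unverified"):
        match = re.search(rf"\b{name}:\s*(\d+)", text)
        if match:
            counters[name.lower()] = int(match.group(1))
    return counters


report = """Error summary:    csum=372
  Corrected:      0
  Uncorrectable:  372
  Unverified:     0"""

summary = parse_scrub_summary(report)
print(summary)  # {'csum': 372, 'corrected': 0, 'uncorrectable': 372, 'unverified': 0}
if summary.get("uncorrectable", 0) > 0:
    # btrfs found bad checksums it had no good copy to repair from.
    print("Affected files need to be deleted or restored from a backup.")
```

A non-zero uncorrectable count is what drives the next step in the thread: finding which files are affected.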
Nummy, posted June 19: It just failed parity again. All the drives came back fine when I took them out and checked them all in Crystal Disk. These are the two assholes that are fighting me. I've added new diags just in case there's a change you might notice. numserver-diagnostics-20240619-2217.zip
JorgeB, posted June 20: Quoting Nummy: "Uncorrectable: 372" Look in the syslog for the list of corrupt files, then delete them or restore them from a backup, and run another scrub to confirm there are no more errors. Quoting Nummy: "These are the two assholes that are fighting me" That looks like a controller problem to me, or a power issue, but they are connected to the same 2-port ASMedia controller, and I've seen issues with those 4 x ASMedia all-in-one controllers before, especially with IOMMU enabled. I recommend replacing it.
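One way to pull the corrupt file list out of the syslog is a small Python filter like the sketch below. It assumes the kernel logs scrub checksum failures as `BTRFS warning (device ...): checksum error at ... (path: ...)` lines; the device names, offsets and file paths in the sample are hypothetical, and `corrupt_files` is an illustrative helper, not an Unraid tool.

```python
import re

# Assumed message shape, modelled on the kernel's btrfs scrub warnings:
# "BTRFS warning (device sdb1): checksum error at logical ... (path: some/file)"
PATH_RE = re.compile(r"BTRFS warning \(device [^)]+\): checksum error at .*\(path: (.+)\)\s*$")


def corrupt_files(lines):
    """Return the unique file paths named in btrfs checksum warnings, in log order."""
    found = {}
    for line in lines:
        match = PATH_RE.search(line)
        if match:
            found[match.group(1)] = None  # dict keys keep first-seen order
    return list(found)


# Hypothetical syslog lines for illustration only.
sample_log = [
    "Jun 19 19:40:01 numserver kernel: BTRFS warning (device sdb1): checksum error at logical 1048576 on dev /dev/sdb1, physical 2097152, root 5, inode 257, offset 0, length 4096, links 1 (path: appdata/plex/db.sqlite)",
    "Jun 19 19:40:01 numserver kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0",
    "Jun 19 19:40:02 numserver kernel: BTRFS warning (device sdb1): checksum error at logical 1052672 on dev /dev/sdb1, physical 2101248, root 5, inode 257, offset 4096, length 4096, links 1 (path: appdata/plex/db.sqlite)",
]

for path in corrupt_files(sample_log):
    print(path)
```

On a real server the lines would come from the syslog file instead of a list (for example `/var/log/syslog`, which is an assumption about where the log lives). The printed paths may be relative to the pool's root, so the mount point might need to be prefixed before deleting or restoring them.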
Nummy, posted June 28: I've replaced the SAS card and those drive issues have gone away now, so thanks for that. numserver-diagnostics-20240628-2325.zip I am still getting VM crashes though, so I updated to the latest Unraid, since it is meant to have VM improvements, but I still can't have the VM running at the same time as Plex and the dockers doing their late-evening tasks.
JorgeB, posted June 29: Nothing jumps out at me in the log. Do you remember the time of the VM crash?
Nummy, posted July 2: It crashed about 35 minutes ago. I had stopped the Windows VM, then clicked on another tab, and it crashed on me. numserver-diagnostics-20240702-0822.zip
JorgeB, posted July 2: The syslog in the diags starts over after every boot. Enable the syslog server and post that after a crash.
itimpi, posted July 2: The syslog in the diagnostics is the RAM version that starts afresh every time the system is booted. You should enable the syslog server (probably with the Mirror to Flash option set) to get a syslog that survives a reboot, so we can see what leads up to a crash. The Mirror to Flash option is the easiest to set up (and, if used, the file is then automatically included in any diagnostics), but if you are worried about excessive wear on the flash drive, you can put your server's address into the remote server field.
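Once a persistent syslog exists, the interesting part is usually the last few lines before each reboot. The Python sketch below slices those out, assuming each boot is marked by the kernel's `Linux version` banner being written to the log; the marker string, the sample lines and the `lines_before_reboots` helper are all assumptions for illustration, so adjust `BOOT_MARKER` if your log looks different.

```python
# Assumption: each boot writes the kernel's "Linux version" banner to the
# persistent syslog; change BOOT_MARKER if your log uses something else.
BOOT_MARKER = "kernel: Linux version"


def lines_before_reboots(lines, context=20):
    """For each boot marker, return up to `context` lines from the session before it."""
    chunks = []
    session_start = 0  # index of the first line of the current boot session
    for i, line in enumerate(lines):
        if BOOT_MARKER in line:
            if i > session_start:
                # Never reach back past the start of the previous session.
                chunks.append(lines[max(session_start, i - context):i])
            session_start = i + 1
    return chunks


# Hypothetical log excerpt spanning one crash and reboot.
sample_log = [
    "Jul  3 19:00:00 numserver kernel: Linux version X.Y.Z (hypothetical banner)",
    "Jul  3 19:05:00 numserver emhttpd: Starting services...",
    "Jul  3 19:58:12 numserver kernel: example last message before the hang",
    "Jul  3 20:02:00 numserver kernel: Linux version X.Y.Z (hypothetical banner)",
    "Jul  3 20:03:00 numserver emhttpd: Starting services...",
]

for chunk in lines_before_reboots(sample_log):
    print("\n".join(chunk))
```

A hard lockup may leave nothing useful before the next boot banner, which is why knowing the time of the crash still matters when reading the log.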
Nummy, posted July 2: Here you go. All I did was start my Windows VM and Unraid crashed again. Unconfirmed 674128.crdownload Unconfirmed 405002.crdownload
JorgeB, posted July 3: Please make sure the diags downloads have completed in Chrome before posting; these are not opening even after renaming them to .zip.
Nummy, posted July 3: OK, here you go. syslog.rar numserver-diagnostics-20240703-2004.zip
JorgeB, posted July 4: I'm not seeing anything relevant logged. What was the time of the last crash?
Nummy, posted July 9: Yeah, it was. I found out that the issue was down to the RAM and the XMP profile; I've now tested with XMP off and memtest is fine. So I'm now just letting the system run to see whether it randomly crashes again. Thanks again for your help so far!
Nummy, posted August 16 (marked as the solution): I've run it since then and it seems fairly stable without XMP. Thanks for the help; please close this ticket.