datagnome Posted March 16 (edited)
Hey all,
Shortly after installing the "Fix Common Problems" plugin, my new server started posting the above-mentioned error (FCP is not the problem; I've got it on another server without issue). I reviewed the post about this error and decided to post here with my diagnostics. I'm fairly confident the RAM is at fault, but I'm having trouble determining which slot it could be. The motherboard and CPU are brand new, though the RAM was cannibalized from a different server that was running without issues the last time I had it turned on.
Server details:
Motherboard: SuperMicro X11SPH-nCTF
CPU: Intel Xeon Bronze 3204
RAM: 8x16GB DDR4 ECC, Hynix HMA42GR7MFR4N-TF
Unraid v6.12.8
Three of the drives have SMART-reported errors, though only one is of concern (ST8000NM0105_ZA11TE8Y, Disk 5), and it hasn't incremented the attribute in a while.
Let me know if you need any more details!
dgstore-diagnostics-20240316-0926.zip
datagnome Posted March 16 (Author)
I totally forgot memtest86 is a thing. I'll kick that off now.
datagnome Posted March 16 (Author)
Well, memtest86 passed. I decided to pull the Supermicro AOC-SLG3-2M2 I installed earlier yesterday and kicked off the copy I was doing again. So far no issues, so I'm wondering if the machine check errors were coming from the card/NVMe drive even though I wasn't actually using it (the drive was detected but not mounted anywhere). I'll see how this looks later this evening.
datagnome Posted March 16 (Author)
Hm, I'm still seeing "mce: [Hardware Error]: Machine check events logged" in the log, though it's only this line now; I'm not seeing the memory correction messages I saw before. Maybe it could be due to the way I'm copying the data?
Info:
Array -> Disks are all formatted ZFS
Synology NAS -> Share mounted over SMB using the Unassigned Devices plugin
Using rsync with the -avzhp flags, so it's: rsync -avzhp /mnt/remotes/location1/ /mnt/user/Destination/
Anyone have any recommendations on what else I can test this with?
dgstore-syslog-20240316-1848.zip
dgstore-diagnostics-20240316-1448.zip
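One way to check whether the bare "Machine check events logged" lines line up with the copy is to bucket them by hour from the syslog. This is only a rough Python sketch: it assumes the classic "Mon DD HH:MM:SS" syslog timestamp prefix, and the function name `count_mce_by_hour` is made up for illustration. Pass the path to an extracted syslog file as the first argument.

```python
import re
import sys
from collections import Counter

# The summary line quoted in the post above.
MCE_LINE = re.compile(r"mce: \[Hardware Error\]: Machine check events logged")
# Assumption: classic syslog prefix such as "Mar 16 18:41:02 host kernel: ..."
TIMESTAMP = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):\d{2}:\d{2}")


def count_mce_by_hour(lines):
    """Count MCE summary lines per hour; lines without a timestamp go to 'unknown'."""
    counts = Counter()
    for line in lines:
        if not MCE_LINE.search(line):
            continue
        stamp = TIMESTAMP.match(line)
        if stamp:
            month, day, hour = stamp.groups()
            bucket = f"{month} {day} {hour}:00"
        else:
            bucket = "unknown"
        counts[bucket] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python mce_hours.py <path-to-syslog>
    with open(sys.argv[1], errors="replace") as handle:
        for bucket, total in sorted(count_mce_by_hour(handle).items()):
            print(bucket, total)
```

If the counts keep climbing while no copy is running, the transfer method is probably not the trigger.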
datagnome Posted March 16 (Author)
For those curious, here is the output of /var/log/mcelog. Now to figure out how this relates to the physical DIMMs.
dgstore_mcelog.txt
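As a possible complement to mcelog: when a Linux EDAC driver is loaded, the kernel exposes per-DIMM corrected and uncorrected error counters under /sys/devices/system/edac/mc. Whether that tree is populated on this Unraid box is an assumption, and `dimm_label` is often empty unless labels have been configured, so `dimm_location` usually still has to be matched against the slot names in the board manual. A rough Python sketch (the function name `read_dimm_error_counts` is made up for illustration):

```python
from pathlib import Path


def read_dimm_error_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return one dict per DIMM directory found under the EDAC sysfs tree."""

    def read_text(path):
        # Missing files are treated as empty rather than as an error.
        return path.read_text().strip() if path.is_file() else ""

    def read_count(path):
        text = read_text(path)
        return int(text) if text.isdigit() else 0

    results = []
    for dimm_dir in sorted(Path(edac_root).glob("mc*/dimm*")):
        if not dimm_dir.is_dir():
            continue
        results.append({
            "controller": dimm_dir.parent.name,
            "dimm": dimm_dir.name,
            "label": read_text(dimm_dir / "dimm_label"),
            "location": read_text(dimm_dir / "dimm_location"),
            "corrected": read_count(dimm_dir / "dimm_ce_count"),
            "uncorrected": read_count(dimm_dir / "dimm_ue_count"),
        })
    return results


if __name__ == "__main__":
    # Prints nothing if no EDAC driver is loaded.
    for entry in read_dimm_error_counts():
        print(entry)
```

A DIMM whose corrected count keeps rising between runs is the first candidate to pull or swap.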
datagnome Posted March 16 (Author)
I didn't realize the version of memtest86 that's included with Unraid is older, so I downloaded the latest and greatest. I'm 25 minutes into it and getting some ECC errors, so hopefully by the time it is done I'll be able to determine which modules are affected.
Solution
datagnome Posted March 17 (Author)
In the end it was indeed a failed module, but I ended up running the updated memtest86 against all of the modules just in case.