sfaruque Posted August 13, 2021
Good evening,
I've been getting the errors below since upgrading from 6.8.3 to 6.9.2 recently. The server appears to be running fine, but I'm worried because I did not see these errors before the upgrade. I have an AMD Radeon R7 passed through to a Windows 10 VM. I recently added more Kingston ECC RAM (total 16GB x 4).
MB: Asrock X570 Steel Legend
CPU: AMD Ryzen 7 3700X
Recent diagnostics attached. Any help is much appreciated. Thanks.
dhaka-diagnostics-20210813-2049.zip
jonp Posted August 16, 2021
Are there any symptoms you are noticing, or just errors in the logs?
Squid Posted August 16, 2021
Aug 13 06:05:24 Dhaka kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
Aug 13 06:05:24 Dhaka kernel: EDAC MC0: 1 CE on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x5b7962 offset:0x880 grain:64 syndrome:0x1000)
Looks like a bad stick.
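For anyone who wants to see which slot the kernel is pointing at, the EDAC line quoted above can be picked apart with a small script. This is only a minimal sketch: the regular expression and the `parse_edac_line` helper are hypothetical names written for this thread, and the pattern is fitted to the exact line format shown here, which may differ on other kernels or memory controller drivers. "CE" means a corrected error; the `csrow` and `channel` fields are the hints for locating the DIMM.

```python
import re
from typing import Optional

# Pattern fitted to the EDAC line quoted in this thread (an assumption:
# other kernel versions or drivers may print the fields differently).
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) .*?"
    r"\(csrow:(?P<csrow>\d+) channel:(?P<channel>\d+) "
    r"page:(?P<page>0x[0-9a-fA-F]+) offset:(?P<offset>0x[0-9a-fA-F]+) "
    r"grain:(?P<grain>\d+)"
    r"(?: syndrome:(?P<syndrome>0x[0-9a-fA-F]+))?\)"
)


def parse_edac_line(line: str) -> Optional[dict]:
    """Return the fields of an EDAC error line, or None if it does not match."""
    match = EDAC_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "mc": int(fields["mc"]),            # memory controller index
        "count": int(fields["count"]),      # number of errors reported
        "kind": fields["kind"],             # "CE" corrected, "UE" uncorrected
        "csrow": int(fields["csrow"]),      # chip-select row
        "channel": int(fields["channel"]),  # memory channel
        "page": int(fields["page"], 16),
        "offset": int(fields["offset"], 16),
        "grain": int(fields["grain"]),
        "syndrome": int(fields["syndrome"], 16) if fields["syndrome"] else None,
    }


SAMPLE = (
    "Aug 13 06:05:24 Dhaka kernel: EDAC MC0: 1 CE on mc#0csrow#3channel#0 "
    "(csrow:3 channel:0 page:0x5b7962 offset:0x880 grain:64 syndrome:0x1000)"
)

if __name__ == "__main__":
    print(parse_edac_line(SAMPLE))
```

Running this over the whole syslog and counting matches per (csrow, channel) pair is one way to check whether the errors always come from the same location, which would support the "bad stick" reading.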
sfaruque Posted August 16, 2021
4 hours ago, jonp said: Are there any symptoms you are noticing or just errors in logs?
I haven't noticed any symptoms, just errors in the logs.
sfaruque Posted August 16, 2021
4 hours ago, Squid said:
Aug 13 06:05:24 Dhaka kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
Aug 13 06:05:24 Dhaka kernel: EDAC MC0: 1 CE on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x5b7962 offset:0x880 grain:64 syndrome:0x1000)
Looks like a bad stick
I'll remove the new sticks and see. Could there be any long-term consequences if I leave them there, do you think? The server appears to be running fine.
sfaruque Posted August 22, 2021
Looks like it was the new DRAM sticks that were bad. I've removed them and the errors have disappeared. Thanks for your help.
Squid Posted August 22, 2021
On 8/16/2021 at 6:54 PM, sfaruque said: Could there be any long term consequences if I leave them there do you think? As the server appears to be running fine.
My opinion: You purchased ECC memory and are using it so that if/when a stick starts going bad you can replace it. Sure, the errors are currently being corrected, but that is simply the stick telling you "replace me". If you have ECC memory and choose not to replace it when errors begin to happen, then I'd question why you even bought ECC memory and a compatible CPU / motherboard in the first place.
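To keep an eye on whether corrected errors keep accumulating, the kernel's EDAC counters can be read directly instead of searching the syslog. The sketch below assumes the EDAC driver is loaded and exposes `ce_count` and `ue_count` files for each memory controller under `/sys/devices/system/edac/mc/`; the `read_edac_counts` helper is a hypothetical name for this example, and the layout on a given system may differ.

```python
from pathlib import Path
from typing import Dict, Tuple

# Assumed location of the EDAC memory controller counters in sysfs.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_ROOT) -> Dict[str, Tuple[int, int]]:
    """Map each memory controller name to (corrected, uncorrected) counts."""
    counts: Dict[str, Tuple[int, int]] = {}
    if not root.is_dir():
        # EDAC driver not loaded or sysfs path differs on this system.
        return counts
    for mc_dir in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc_dir / "ce_count").read_text().strip())
            ue = int((mc_dir / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc_dir.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A corrected count that keeps climbing between checks is the "replace me" signal described above; any non-zero uncorrected count is more serious, because those errors were not repaired.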
sfaruque Posted August 22, 2021
8 hours ago, Squid said: My opinion: You purchased ECC memory and are using it so that if/when one starts going bad you can replace it. Sure the errors are currently being corrected, but that is simply the stick telling you "replace me". If you have ECC memory and chose to not replace the memory when errors begin to happen then I'd question why you even bought ECC memory and a compatible CPU / motherboard in the first place.
Very valid point. Noted. Thanks.