Your server has detected hardware errors. HELP!



I have been using Unraid for a couple of years now, but this is the first time I have had any issues. Please forgive any breach of forum protocol.

 

Fix Common Problems started giving me this message. 

"Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged"

 

Hopefully this is enough info for someone to help me. This part is a little over my head. 

 

Basics:

Unraid 6.9.2

AMD Ryzen 7 1700X Eight-Core @ 3400 MHz

ROG STRIX B350-F GAMING

Kingston ValueRAM 16GB (2 x 8GB) DDR4 2400 RAM (Server Memory) ECC DIMM (288-Pin)

AMD Radeon R7 250 2GB GDDR3

2 - I/O Crest 4 Port SATA III PCI-e 2.0 x1 Controller Card Marvell 9215 Non-Raid with Low Profile Bracket SI-PEX40064

2 - Seagate Exos X10 10TB Parity

1 - Seagate Exos X10 10TB Storage

4 - Seagate Ironwolf 6 TB Storage 

2 - Samsung SSD 860 EVO Cache Pool 

 

 

 

syslog.txt syslog oasis-diagnostics-20210627-1322.zip

Edited by Screaming Wookie

Looks like a corrected RAM error:

Jul  3 20:10:55 OASIS kernel: [Hardware Error]: Corrected error, no action required.
Jul  3 20:10:55 OASIS kernel: [Hardware Error]: CPU:0 (17:1:1) MC15_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000000011b
Jul  3 20:10:55 OASIS kernel: [Hardware Error]: Error Addr: 0x0000000034f18740
Jul  3 20:10:55 OASIS kernel: [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x00008b100a400200
Jul  3 20:10:55 OASIS kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
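For anyone who wants to check the flags in the MC15_STATUS line by hand, here is a minimal sketch in Python. The bit positions are an assumption taken from the MCi_STATUS layout used by the Linux kernel's x86 machine-check code, and `decode_mca_status` is a hypothetical helper name, not part of mcelog or any Unraid tool.

```python
# Assumed bit positions of the 64-bit MCA status register, following the
# layout used by the Linux kernel's x86 MCE code. Verify against your
# kernel and CPU documentation before relying on it.
STATUS_BITS = {
    63: "Val",       # register holds a valid error
    62: "Over",      # overflow: an earlier error was overwritten
    61: "UC",        # uncorrected error
    60: "En",        # error reporting enabled
    59: "MiscV",     # MISC register valid
    58: "AddrV",     # address register valid
    57: "PCC",       # processor context corrupt
    55: "TCC",       # task context corrupt
    53: "SyndV",     # syndrome register valid
    46: "CECC",      # corrected ECC error
    45: "UECC",      # uncorrected ECC error
    44: "Deferred",  # deferred error
    43: "Poison",    # poisoned data consumed
}


def decode_mca_status(status):
    """Return the names of the flags set in an MCA status value, high bit first."""
    return [
        name
        for bit, name in sorted(STATUS_BITS.items(), reverse=True)
        if (status >> bit) & 1
    ]


if __name__ == "__main__":
    # Status value copied from the syslog excerpt in this thread.
    print(decode_mca_status(0xDC2040000000011B))
```

For the value in this log, the UC and UECC bits are clear and CECC is set, which agrees with the kernel's "Corrected error" line. The Over flag is also set, which under this reading would mean at least one earlier error was overwritten before it was read.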

 

Since it's not a server board, the BIOS won't give more info on which DIMM it was. I suggest you test with one DIMM at a time, though it could have been a one-time event or an infrequent issue.
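To judge whether the error is a one-off or recurring while you test one DIMM at a time, you could count the "DRAM ECC error" lines per day in the syslog. The following is a rough sketch, assuming the syslog line format shown above; `count_ecc_errors_per_day` is a hypothetical helper, and the sample line is copied from this thread.

```python
import re
from collections import Counter

# Matches syslog lines such as:
# "Jul  3 20:10:55 OASIS kernel: [Hardware Error]: ... DRAM ECC error."
# The line format is assumed from the excerpt posted in this thread.
ECC_LINE = re.compile(
    r"^(\w{3}\s+\d+) \d\d:\d\d:\d\d .*\[Hardware Error\]: .*DRAM ECC error"
)


def count_ecc_errors_per_day(lines):
    """Count corrected DRAM ECC error lines, grouped by syslog date."""
    counts = Counter()
    for line in lines:
        match = ECC_LINE.match(line)
        if match:
            # Normalize the padded day, e.g. "Jul  3" becomes "Jul 3".
            day = " ".join(match.group(1).split())
            counts[day] += 1
    return counts


# Sample input taken from the log excerpt above.
SAMPLE = [
    "Jul  3 20:10:55 OASIS kernel: [Hardware Error]: Corrected error, no action required.",
    "Jul  3 20:10:55 OASIS kernel: [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.",
]

if __name__ == "__main__":
    print(dict(count_ecc_errors_per_day(SAMPLE)))
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `count_ecc_errors_per_day(open(path))`, where `path` points at the syslog from your diagnostics zip. A count that keeps growing with one particular DIMM installed would point at that DIMM.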

Thank you JorgeB! 

 

Any suggestions on the best way to clear the alert? If it happens again, I will replace the RAM.
