December 8, 2015 Yes, memory errors are much less likely to happen than a hard drive failure. However, they can happen, and I'm wondering if unRaid has anything (other than a syslog entry?) that would report an ECC error. When a disk goes wonky, I get an email stating the fact, and it alerts me, if nothing else, to pay attention to the drive. I've already had my first drive failure and recovery (thank you, Parity Disk!). I'm looking at my next system build, and the motherboard can support ECC memory (ASUS M5A97). I've seen others using this board with ECC memory and like the idea. My question is: if I put it in and configure the BIOS to use it, how do I get notified when unRaid recognizes a memory failure (I'm assuming it does)? If a message is written to dmesg/the syslog, that is already fine, but I'd love to know what to look for. Installing additional daemons (like smartmontools for hard drives) is acceptable. I suppose Nagios/Icinga monitoring would be another way to go in the Linux world (would that work with unRaid?), since not all machines to be monitored have IPMI. Any thoughts or perspectives on doing this?
December 8, 2015 I don't know if there's any reporting done when single-bit errors are corrected, or if that's simply a transparent function with this board. But I'd definitely use ECC modules, as the vast majority of memory errors are random single-bit errors that generally either go unnoticed or cause "anomalies" that are often thought of as glitches in the OS but are in fact caused by these random errors. A typical memory module will experience a few of these per year. That's not a big deal, but with ECC they're virtually eliminated.
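For what it's worth, on a stock Linux kernel the EDAC subsystem exposes corrected and uncorrected error counters under `/sys/devices/system/edac/mc/`, one `mcN` directory per memory controller, each with a `ce_count` and `ue_count` file. Whether unRaid loads an EDAC driver for this particular board, and whether Python is available on the box, are assumptions on my part, so treat the following as a rough sketch of what a check could look like rather than a tested unRaid recipe. The function names are made up for illustration.

```python
from pathlib import Path

# Standard EDAC sysfs location on Linux. Assumption: an EDAC driver
# for the chipset is loaded; otherwise this directory will not exist.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: {"ce": int, "ue": int}} for each mcN directory."""
    counts = {}
    root = Path(root)
    if not root.is_dir():
        return counts  # no EDAC support visible
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = {"ce": ce, "ue": ue}
    return counts


def has_errors(counts):
    """True if any controller reports a corrected or uncorrected error."""
    return any(c["ce"] > 0 or c["ue"] > 0 for c in counts.values())


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found (driver not loaded or ECC inactive?)")
    for name, c in counts.items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

Run from cron, a script like this could hand off to whatever already sends the disk alerts whenever `has_errors()` returns true. If the directory is missing, that by itself suggests the kernel is not reporting ECC events on that machine.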