To ECC or not to?


Abula

Hi,

 

I'm looking to build a 16x8TB Unraid server. It will be mostly for storing movies, but I'm wondering whether I would benefit from building it around ECC memory or not. I'm going to add 2x IBM M1015 / LSI 9211-8i cards flashed to IT mode to handle the HDDs.

 

I have some parts from a current PC that I can grab:

 

ASRock Motherboard Micro ATX DDR4 B150M PRO4

Crucial 8GB Kit (4GBx2) DDR4 2133 MT/s (PC4-17000) SR x8 Unbuffered DIMM 288-Pin Memory - CT2K4G4DFS8213

Intel Pentium G4400

 

Or I could build around a motherboard that I've wanted to test: ASRock C236 WS + ECC RAM + Celeron/Pentium/Xeon LV.

 

Thanks for your opinions and suggestions,

The use of BTRFS and ECC is Ashman70's personal preference, but neither is required. You have the flexibility to choose XFS (which is much more mature than BTRFS and used by the vast majority of unRaid users). With no UPS, XFS would definitely be my recommendation; I use XFS on all my data drives. And you can choose ECC or non-ECC. The setup in unRaid is virtually identical and not dependent on either of these choices.

 

As far as memory goes, ECC is a type of memory that has the ability to detect memory errors caused by faulty memory chips. It is a nice-to-have feature for a server, as a bad memory chip can corrupt data and cause instability. But bad memory is also rare. If you don't go with ECC, you'd want to run an extensive memory test during burn-in. Normally 24 hours is recommended, but I'd say 24 hours minimum per 16GB of RAM. Personally I'd recommend running it for a week to give the highest confidence that the memory is rock solid. I might also run another memory test after 3-4 months of 24x7 server use, to confirm the memory has not developed errors. Is ECC better? Yes. But I (personally) would consider a Ryzen server that does not support ECC, with the precautions I listed. In fact, if buying a server, I'd definitely be looking at Ryzen and Threadripper. It seems the last performance issue has been resolved in the latest beta, and these are quite a bargain for an unRaid server.

 

Do your own research on XFS vs BTRFS. Either is a good choice.


Given your choices above, I'd go with the C236 over the B150M.  While the B150M supports two x16 cards, one of them looks like it will be operating at x4.  I'm not sure if you can get them to operate at x8/x8.  The C236 does appear to support x8/x8, though.  Since you're likely to be using PCIe 2.0 SATA cards, you could hit bandwidth limitations with 8 HDs at x4.  Yeah, you'd need really fast hard drives or some SSDs to hit the cap, and it would only happen during parity checks (and the C236 costs a lot more), but I'd just rather put x8 cards into x8 slots.
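For a rough sense of the numbers (the per-lane and per-drive figures below are my own ballpark assumptions, not measurements):

```shell
# Back-of-envelope: PCIe 2.0 x4 slot bandwidth vs. 8 HDDs on one HBA.
# Assumes ~400 MB/s usable per PCIe 2.0 lane after encoding overhead,
# and ~200 MB/s sequential per modern large HDD.
lanes=4
per_lane_mb=400
drives=8
per_drive_mb=200
echo "Slot: $((lanes * per_lane_mb)) MB/s, drives: $((drives * per_drive_mb)) MB/s"
```

Both sides come out around 1600 MB/s, i.e. just about saturated, which is why it would really only show up during a parity check when all the drives stream at once.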

 

Oh, and yes - the C236 supports ECC which I think is a great idea in an always-on fault tolerant server.


Whether or not you've tested your memory extensively, with non-ECC memory running in a server with high uptime, you're probably going to flip a bit. It could be a couple over a month's time and you'd never know it, until later when a JPG is corrupted. With ECC you'll at least know, and any single-bit errors will be corrected.

2 hours ago, dmacias said:

Whether or not you've tested your memory extensively, with non-ECC memory running in a server with high uptime, you're probably going to flip a bit. It could be a couple over a month's time and you'd never know it, until later when a JPG is corrupted. With ECC you'll at least know, and any single-bit errors will be corrected.

 

How would you know?

 

I have 32GB of ECC memory and have never had any indication of a memory error being corrected. In fact, I have never seen anyone report that ECC memory corrections are occurring. I'm assuming we'd see them in the syslog or server log?

 

I've had two memory sticks fail in the last 15 years or so. One I did not test, and Windows became unstable; I tested the RAM and sure enough it had issues. Since then I have always tested memory - at least 24 hours. I've had only one test fail at rated speed. But when I did overclocking, I was testing the memory repeatedly at different speeds to get it stable, and I never saw one give trouble after testing stable for more than 24 hours. I have had several runs that took more than half that time to report a problem, though, so lengthy runs are definitely important. I think a week would give a couple of extra sigmas in reliability.

 

Maybe I'm lucky or naive. I agree ECC is better, but with an extensive round of testing, and a second round after a good period of 24x7 use, I think the risk is low based on my experience. Since Ryzen is arguably the best deal going right now, I'm not sure lack of ECC memory would hold me back. But I definitely respect your opinion and knowledge, and I'm interested in hearing more about how we can observe whether ECC is actually doing corrections.


I agree - I've only found a couple of sticks of bad memory over the years. Most memory problems I've seen are just compatibility problems. Testing memory will show hard errors from component failure, i.e. an actual bad stick. But there are also soft errors that can be caused by ambient factors: power surges or fluctuations, other components, cosmic rays, radiation, whatever.

The errors are reported in the IPMI event log on both my Supermicro and ASRock boards. One time my ECC memory just freaked out and spammed a bunch of errors. I restarted the computer and ran a memory check overnight; it tested fine. I haven't seen anything like that again. But non-ECC wouldn't have warned me.
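Besides the IPMI event log (which `ipmitool sel list` can read from the command line), the Linux kernel's EDAC subsystem exposes per-controller error counters under sysfs. A quick sketch to check them; whether these files exist at all depends on your chipset and whether an EDAC driver is loaded:

```shell
# Print corrected (ce_count) and uncorrected (ue_count) ECC error totals
# from the EDAC sysfs interface, if the platform exposes it.
if ls /sys/devices/system/edac/mc/mc*/ce_count >/dev/null 2>&1; then
  for f in /sys/devices/system/edac/mc/mc*/ce_count \
           /sys/devices/system/edac/mc/mc*/ue_count; do
    printf '%s: %s\n' "$f" "$(cat "$f")"
  done
else
  echo "No EDAC counters found (driver not loaded or non-ECC platform)"
fi
```

A nonzero ce_count means ECC has actually been correcting bits, which answers the "how would you know?" question on platforms without a BMC.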


I have a fairly irrelevant comment to add.

 

Last year for Black Friday I got a great deal on a new server motherboard. I had to buy new RAM to go with it and found a reasonable price on an Amazon Warehouse (open box) ECC stick of RAM. I popped the ECC RAM in the server and haven't thought about it since!

 

Just over the weekend I was tinkering with unRaid and noticed on my motherboard info page that it explicitly said "NON-ECC RAM" under my RAM slot. I looked up the model number and sure enough, the RAM was not ECC RAM. I looked at my Amazon order and saw that the model number I purchased on Amazon was not the same as the model number in my server.
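Incidentally, on a Linux box you can double-check this without opening the case: `dmidecode` (run as root) reports both the error-correction type and each DIMM's part number from the SMBIOS tables, so a swapped stick shows up right away. A sketch; the exact field wording varies a little by BIOS:

```shell
# Show whether the installed memory does ECC, plus each DIMM's part number.
# Needs root; field names come from the SMBIOS tables the BIOS provides.
dmidecode --type memory | grep -Ei 'error correction|part number'
```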

 

So what happened? My guess is someone bought the more expensive ECC RAM from Amazon and returned it with a less expensive non-ECC RAM stick instead.

 

Either way, server is humming along just fine.


IMHO, ECC is a nice feature, but in cases where it raises the cost of the server substantially, non-ECC memory is a manageable risk. Purchasing ECC even if it is not operating in ECC mode might be a good strategy if your plan is to upgrade to an ECC server in a reasonable timeframe. This would avoid repurchasing memory at a later date.

IMHO, ECC is a nice feature, but in cases where it raises the cost of the server substantially, non-ECC memory is a manageable risk. Purchasing ECC even if it is not operating in ECC mode might be a good strategy if your plan is to upgrade to an ECC server in a reasonable timeframe. This would avoid repurchasing memory at a later date.

That may be the case for unRaid, but ECC is only a small increase in cost for a large gain in confidence.

In any application where you're relying on the outcome of a process in RAM to be correct, ECC is a must. For FreeNAS, for example, ECC really is a must.

I've personally seen row-hammer attacks, which compromise the entire PC/server (and which would have been mitigated by ECC), do severe damage to the system and the components attached.



It's not the cost of the memory; it's the fact that the Ryzen CPUs can't take advantage of it. Ryzen + MB is much cheaper than TR + MB. (And most of the TR motherboards apparently don't support it, although the CPU does.)

I guess it's not a good time to jump on the AMD bandwagon, even with the NPT issue resolved.

 


But the OP was about two Intel boards, ECC, and parts he already had. If he wants cheap, my backup server in my sig cost less than $200 excluding drives, all from eBay: $109 for a Supermicro X10SLL-F, $35 for a Pentium G3220, and $38 for Hynix 2x4GB unbuffered ECC. I would, and have, thought about making this my main server by adding a couple of IBM M1015s and a Xeon v3.


Thanks for all the replies and feedback. 

 

I'm going to go with ECC memory, and thus a new mobo. I've already chosen memory that is compatible with both boards I'm considering: Kingston ValueRAM KVR21E15D8/16 DDR4-2133 16GB/2Gx72 ECC CL15 Server Memory.

 

I have done builds with Supermicro in the past, and while I don't have much against them, I do want to try something else. I personally don't like Supermicro's BIOS fan control; it had issues with certain fans not reading the PWM correctly, giving a kind of breathing effect on the fans.

 

I was a very Asus-oriented person for a long time, but my last two desktop builds have been MSI and ASRock, and both have better BIOS fan control than Asus, and certainly better than Supermicro. I was set on ASRock until I recently saw a cheaper MSI board that caught my attention:

 

MSI C236A Workstation

ASRock Rack C236 WS

 

Do you see any reason to go with the ASRock over the MSI board?


Archived

This topic is now archived and is closed to further replies.
