To ECC or to not ECC, that is the...


Afternoon Neowinians!

I am looking at upgrading my home server with some new bits and bobs - basically new mobo, cpu, and RAM.

I was going to go with an Intel i3 2120T/2120 (because I have a quiet case and it gets quite hot, so cooler is nice) and an ASRock mobo with your run-of-the-mill RAM. I also only wanted to spend a few hundred; this server is really simple (basically a file server and a Minecraft server).

That's where the problems started. I was always under the impression ECC RAM was only supported on the higher-end hardware... however, apparently the AMD FX and AM3+ products support it! While I know Intel offers better power/performance/cost ratios, I don't mind sacrificing a little performance and gaining a little heat if ECC is brilliant.

Firstly, is ECC worth it? I mean, I rarely see my home computer go for more than a few months without resetting; however, our server at work goes for a good year before I reset it out of pity.

If ECC is worth it, can someone suggest an affordable CPU/mobo/RAM combo (I'm in Australia, west coast)?

I am planning on running WHS 2011 (as it is cheaper than Win7 Pro! >.<)

Thanks for any help you can give! =)

Regards,

UL

Unless you're running mission-critical software, no, not really.

RAM errors and malfunctions are pretty rare these days.

ECC actually reduces performance a little: it increases the CAS latency (CL), but it improves reliability because the system can withstand some memory errors.

Is it worth the price premium, when just "0.22% of DIMMs suffer an ECC-correctable error every year" : source

ECC really is only needed in the most important of applications - is what you need THAT important?

Haha thanks guys! That is exactly what I was after! Hard numbers, with a source, is brilliant.

For the sake of a 0.22% chance per YEAR... I don't think I'll bother with ECC =P If it was per month... maybe, but not per year!

(I'm impressed you managed to dig up that paper so quickly, bio... I was searching for a good few hours =P)

Thanks for the help! =)

Is it worth the price premium, when just "0.22% of DIMMs suffer an ECC-correctable error every year" : source

ECC really is only needed in the most important of applications - is what you need THAT important?

I don't think you read it properly:

About a third of machines and over 8% of DIMMs in our fleet saw at least one correctable error per year. Our per-DIMM rates of correctable errors translate to an average of 25,000-75,000 FIT (failures in time per billion hours of operation) per Mbit and a median FIT range of 778-25,000 per Mbit (median for DIMMs with errors), while previous studies report 200-5,000 FIT per Mbit. The number of correctable errors per DIMM is highly variable, with some DIMMs experiencing a huge number of errors, compared to others. The annual incidence of uncorrectable errors was 1.3% per machine and 0.22% per DIMM.

The conclusion we draw is that error correcting codes are crucial for reducing the large number of memory errors to a manageable number of uncorrectable errors. In fact, we found that platforms with more powerful error codes (chipkill versus SECDED) were able to reduce uncorrectable error rates by a factor of 4-10 over the less powerful codes. Nonetheless, the remaining incidence of 0.22% per DIMM per year makes a crash-tolerant application layer indispensable for large-scale server farms.

My understanding is that the 0.22% figure is the annual rate, per DIMM, of errors that ECC was unable to correct. The rate of errors that ECC did catch and correct is the "over 8% of DIMMs" figure.
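To put the quoted figures in home-server terms, here is a rough Python sketch. The DIMM count, time span and DIMM size are hypothetical, and treating DIMMs and years as independent is my own simplification, not something the paper claims (it says errors are highly variable between DIMMs). The rates also come from a large server fleet, so they may not carry over to a home box.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours


def p_at_least_one(annual_rate: float, dimms: int, years: int) -> float:
    """Chance that at least one DIMM is affected within the period.

    Assumes every DIMM-year is an independent trial with the same
    probability `annual_rate` (a simplifying assumption).
    """
    return 1.0 - (1.0 - annual_rate) ** (dimms * years)


def errors_per_year(fit_per_mbit: float, dimm_mbit: int) -> float:
    """Convert a FIT-per-Mbit rate into expected errors per DIMM per year.

    FIT is failures per billion (1e9) hours of operation, so scale by
    the DIMM size in Mbit and the number of hours in a year.
    """
    return fit_per_mbit * dimm_mbit * HOURS_PER_YEAR / 1e9


if __name__ == "__main__":
    dimms, years = 4, 3  # hypothetical home server: 4 DIMMs kept for 3 years

    # "over 8% of DIMMs ... saw at least one correctable error per year"
    print(f"correctable:   {p_at_least_one(0.08, dimms, years):.1%}")

    # "annual incidence of uncorrectable errors was ... 0.22% per DIMM"
    print(f"uncorrectable: {p_at_least_one(0.0022, dimms, years):.1%}")

    # Low end of the quoted average, 25,000 FIT per Mbit, on a 1 GB DIMM
    print(f"errors/year:   {errors_per_year(25_000, 8 * 1024):.0f}")
```

Under those assumptions, the chance of at least one correctable error over the three years comes out at roughly 63%, against roughly 2.6% for an error ECC could not fix. The FIT conversion gives an average in the thousands of errors per DIMM per year, but going by the quote that average is dominated by a small number of very bad DIMMs.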

  • 2 weeks later...
  • 3 months later...

For a real business server? Not using ECC is silly.

For your home server? Using ECC is probably a waste of money.

For an enterprise (read: multiple-Xeon/multiple-Opteron rack server), ECC makes sense.

For a server smaller than that (even a single-Xeon/single-Opteron rack), ECC is pretty darn pointless, because the worst-case cost of memory failures (even with standard non-ECC desktop RAM) won't exceed the extra expense of ECC, which is priced more than three times higher (currently closer to five) than desktop RAM of the same speed. (It's still ROI FTW, even when it comes to servers.)
