Linus Torvalds isn’t happy with the way Intel has handled support for Error Correcting Code (ECC) memory, and he blames the silicon giant for effectively killing the technology outside of servers. ECC memory is used to catch and correct single-bit errors in memory. It can’t correct multi-bit errors, but fixing single-bit errors alone can make a substantial difference to system stability.
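Why single-bit but not multi-bit? Standard ECC DIMMs typically add 8 check bits to every 64-bit word and use a SECDED (single-error-correct, double-error-detect) code. The Python sketch below shows the same idea at toy scale with an extended Hamming (8,4) code. The function names are hypothetical, and this is an illustration of the principle, not how any particular memory controller implements ECC.

```python
# Toy SECDED demo: an extended Hamming (8,4) code. Real ECC DIMMs apply the
# same principle to 64 data bits + 8 check bits; names here are illustrative.
DATA_POSITIONS = (3, 5, 6, 7)  # positions 1, 2, 4 hold parity; 0 holds overall parity


def encode(nibble):
    """Encode a 4-bit value (0-15) as a list of 8 bits."""
    bits = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> i) & 1
    # In a valid codeword the XOR of the indices of all set bits is 0,
    # so choose the parity bits at 1, 2 and 4 to cancel the data bits' XOR.
    syndrome = 0
    for pos in DATA_POSITIONS:
        if bits[pos]:
            syndrome ^= pos
    for p in (1, 2, 4):
        bits[p] = 1 if syndrome & p else 0
    bits[0] = sum(bits[1:]) % 2  # overall parity makes the whole word even
    return bits


def decode(bits):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos
    parity = sum(bits) % 2
    if syndrome == 0 and parity == 0:
        status = "ok"
    elif parity == 1:
        # Odd number of flips: assume exactly one, at index `syndrome`
        # (index 0 means the overall parity bit itself flipped).
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: detected, not fixable.
        return None, "uncorrectable"
    nibble = 0
    for i, pos in enumerate(DATA_POSITIONS):
        nibble |= bits[pos] << i
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    one_flip = word.copy()
    one_flip[5] ^= 1
    print(decode(one_flip))   # (11, 'corrected')
    two_flips = one_flip.copy()
    two_flips[6] ^= 1
    print(decode(two_flips))  # (None, 'uncorrectable')
```

A single flipped bit is located and repaired; a second flip in the same word is only flagged, which is exactly the single-bit/multi-bit limit described above.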
There was a time when you could get ECC support on mainstream chipsets, but Intel phased that capability out of non-Xeon platforms a number of years ago. The 975X may have been the last consumer Intel platform to support it, and that family launched 15 years ago. The Xeon 3450 chipset was cross-compatible with certain high-end CPUs in the Nehalem family, but it is still a Xeon chipset, not a mainstream part.
As a result, support for ECC in consumer products, and the availability of ECC RAM for consumer products, both fell off a cliff. Linus lays out his case in a fairly lengthy post, citing the ongoing persistence of Rowhammer and the fact that single-bit errors have never gone away as grounds to declare Intel’s ECC policies “bad and misguided.” He also takes on the entire DRAM industry, writing:
The memory manufacturers claim it’s because of economics and lower power. And they are lying bastards – let me once again point to row-hammer about how those problems have existed for several generations already, but these f*ckers happily sold broken hardware to consumers and claimed it was an “attack”, when it always was “we’re cutting corners”.
Torvalds also points to multiple kernel “oopses” that he believes may be better explained by hardware errors. While objective data on this kind of issue is hard to come by, a 2009 Google paper on memory errors offers some evidence that he’s right, though of course a 2009 paper may have limited applicability to DDR4 RAM in 2020.
Google’s conclusion in 2009 was simple: “We found the incidence of memory errors and the range of error rates across different DIMMs (dual in-line memory modules) to be much higher than previously reported… Memory errors are not rare events.” The team measured error rates that it describes as “orders of magnitude higher than previously reported.”
The authors conclude: “error correcting codes are crucial for reducing the large number of memory errors to a manageable number of uncorrectable errors.”
AMD’s Current Support Is of Limited Value
On paper, AMD’s Ryzen family supports ECC unofficially (Threadripper has official ECC support). As Ian Cutress points out later in the thread, however, just because a motherboard claims ECC support doesn’t mean that support is actually enabled. We don’t run into this situation very often, but CPUs and motherboards report their various feature sets through registers, which applications like CPUID then query to determine and report which features a chip supports. An application that claims to check whether a given feature (SSE, AVX, ECC, etc.) is supported can only report what the CPU or motherboard says about its own capabilities through register flags. It can’t verify that the support actually exists unless the application includes a functional test, such as a small benchmark that literally can’t run unless AVX support is working.
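To make the “reported versus tested” distinction concrete, here is a minimal Python sketch, assuming a Linux system, that reads the feature flags the kernel lists in /proc/cpuinfo (largely derived from what the CPU reports through the CPUID instruction). The helper names are hypothetical. Finding `avx` in that list only tells you the processor advertises AVX; it does not prove an AVX instruction will execute correctly. ECC status isn’t in this list at all; it is usually reported through firmware tables such as SMBIOS, which carry the same limitation.

```python
# Sketch: read the feature flags Linux exposes in /proc/cpuinfo.
# Helper names are hypothetical; a listed flag is a claim, not a test.

def reported_flags(cpuinfo_text):
    """Return the set of feature flags listed on 'flags' lines."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def claims(feature, cpuinfo_text):
    """True if the CPU *advertises* `feature`; says nothing about whether it works."""
    return feature in reported_flags(cpuinfo_text)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:  # Linux-specific path
        text = f.read()
    print("AVX advertised:", claims("avx", text))
```

Actually proving AVX works would mean executing an AVX instruction and checking the result, which is the kind of functional test described above.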
Because AMD’s support is unofficial, no one is standing over OEMs with a whip to make sure they implement the feature properly, and they aren’t necessarily testing to make sure it actually works. Because it’s possible to set the “Supports ECC” bit in a motherboard register without implementing functional ECC, there are motherboards out there that claim to support the standard, and appear to do so if you scan them with a utility, but don’t actually implement ECC at all. The only way to guarantee that ECC works on an AMD Ryzen motherboard is to run a utility that forces an ECC error.
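When you do force an error, you still need somewhere to see whether it was caught. On Linux, one such place is the kernel’s EDAC subsystem, which exposes per-memory-controller corrected (`ce_count`) and uncorrected (`ue_count`) error counters under /sys/devices/system/edac/mc/. The sketch below, assuming that sysfs layout, only reads those counters; it does not inject errors, and the function name is hypothetical. A missing or empty directory is a hint, not proof: it may mean ECC is inactive, or simply that no EDAC driver is loaded.

```python
from pathlib import Path

# Default Linux EDAC sysfs location for memory-controller counters.
EDAC_MC_ROOT = "/sys/devices/system/edac/mc"


def edac_error_counts(root=EDAC_MC_ROOT):
    """Map each memory controller (mc0, mc1, ...) to (corrected, uncorrected) counts.

    An empty result means no EDAC memory-controller driver is registered:
    ECC may be inactive, or the driver may simply not be loaded.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        corrected = int((mc / "ce_count").read_text().strip())
        uncorrected = int((mc / "ue_count").read_text().strip())
        counts[mc.name] = (corrected, uncorrected)
    return counts


if __name__ == "__main__":
    counts = edac_error_counts()
    if not counts:
        print("No EDAC memory controllers found")
    for name, (ce, ue) in counts.items():
        print(f"{name}: {ce} corrected, {ue} uncorrected")
```

If ECC is genuinely working, an injected single-bit error should show up as an increase in a controller’s corrected count.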
As for whether we’ll see the feature return to Intel desktops or officially debut on Ryzen, that’s unclear. It would require buy-in from memory manufacturers, and it isn’t clear that very many people in the PC market would spring for it. Most people buy on price, and since you never know about the PC crashes you don’t have, it’s hard to sell people on the benefit. Then again, x86 CPU manufacturers are going to face much stiffer competition from ARM over the next 2-5 years than we have ever seen before. It wouldn’t be surprising to see Intel and/or AMD “rediscover” some features, especially if those features let them claim better stability compared with previous products.
Feature image shows registered DDR4-2133 DIMMs. Registered DIMMs usually also support ECC, but it’s possible to find unbuffered ECC RAM as well.