We have an Asus RS920-E7/RS8 or RS926-E7/RS8 at my work (purchased through a third-party vendor). Yesterday, after a scheduled reboot, the machine suddenly stopped POSTing. On investigation, it appears that 4 RAM modules (out of 16 modules at 8 GB each) have gone bad at once. Moreover, the sockets that held the failed modules form an interesting pattern: there is one per NUMA node (processor), and each appears to be the same socket position on its node, assuming there are 8 sockets per node.
I find it hard to believe that 4 previously working modules failed independently at the same time. Given the circumstances, I suspect that the board itself is faulty and somehow damaged the modules. Based on the layout, my current guess is that these particular modules share some kind of voltage source on the board.
Does this sound plausible, or does anyone have other ideas?
The modules failed in sockets L1, N1, F1, and D1. Here is the manual: