Always Test the RAM on a New Server!
Don't expect your provider to do it for you! On a new dedicated server, I ran into paging errors on bootup with the monolithic 2.6.38 paravirtualized Xen kernel I published here, a kernel I have been using without issue for over three months. The new server has a dom0 software configuration similar to the one the kernel was compiled for. The tell-tale sign that I was seeing a physical (re-mapped virtual) memory error rather than a flaw in the kernel was that it only happened when the virtual machine was allocated more than a certain amount of RAM, and never with less. Unfortunately, this server is an i5 and lacks both ECC RAM and IPMI for hardware logging, so I was forced to take a gamble on support fees and get the provider's techs to attach an IP KVM so I could run memtest86.
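The "over a certain amount of RAM, but never under" symptom can be narrowed down by bisecting the guest's memory allocation. Here is a minimal sketch of that idea, not what I actually ran: `find_failure_threshold` and the `boots_cleanly` callback are hypothetical names, and the callback stands in for whatever you use to boot the guest at a given size and check for paging errors. It assumes the failure is reproducible and that every size above the threshold fails, which flaky RAM does not guarantee.

```python
from typing import Callable


def find_failure_threshold(
    boots_cleanly: Callable[[int], bool],
    low_mb: int,
    high_mb: int,
    step_mb: int = 64,
) -> int:
    """Bisect guest memory sizes to find where boot failures begin.

    Assumes boots_cleanly(low_mb) is True and boots_cleanly(high_mb) is
    False. Returns a failing size that is at most step_mb above a size
    known to boot cleanly.
    """
    while high_mb - low_mb > step_mb:
        mid_mb = (low_mb + high_mb) // 2
        if boots_cleanly(mid_mb):
            low_mb = mid_mb   # fault lies above mid_mb
        else:
            high_mb = mid_mb  # fault is reachable at mid_mb or below
    return high_mb


if __name__ == "__main__":
    # Stand-in for a real boot test: pretend anything from 3000 MB up fails.
    def fake_boot(mb: int) -> bool:
        return mb < 3000

    print(find_failure_threshold(fake_boot, 512, 8192))
```

A threshold that stays put across repeated runs points at a fixed region of physical memory rather than at the kernel. It is only a hint, though, and no substitute for a proper memtest86 run.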
As you can see, after at least 16 clean passes, memtest shit the bed. The golden rule has always been to test the RAM on a new server for at least 24 hours straight. Of course, the other golden rule is to only get servers with ECC. Not that ECC makes having bad RAM OK; it just makes the errors easy to detect and recover from. I'm certainly going to start practising what I preach: were it not for some fortunate last-minute scheduling changes, this could have been a rather costly inconvenience!