Power issue.

StrikedOut    173

Hi All.

 

One of my sites is having a very strange power issue that I can't figure out. We have 2 servers: an older Dell PowerEdge 2900 that is due for retirement, and an HP DL385 Gen8. Recently, the HP became unresponsive, so I asked a user to take a look and tell me what they could see. The screen showed no input signal, so after confirming the monitor was connected to the server correctly, we did a hard reset. This brought the server back up and everything became available again. I then logged in to the Dell server and it, along with the HP server, showed the unexpected power off message, which indicated to me that both had lost power at the same time. I looked at the logs and found nothing to indicate any issues. I suspected the batteries in the UPS, an APC Smart-UPS 2200, were faulty (they hadn't been replaced for some time), so I ordered replacements and went on site the next day to install them.

 

Before I had a chance to replace them, I was working on the server when, for a split second, all the servers lost power and behaved the same way as the day before. I checked whether this was isolated within the building, and it was just the servers that lost power, both at the same time. The server room lights didn't flicker. I can't tell you if the switches restarted, but I am fairly sure the router didn't (though I may be wrong). So I replaced the batteries and simulated a power cut by pulling the plug on the UPS: all fine. I also moved to a different power socket, as the original one was in a floor box.

 

A couple of days later, I got the call that it had happened again. I logged onto the server and again saw the unexpected shutdown message, so now I suspected the UPS was faulty. I ordered a replacement for next-day delivery and was on site in the morning to install it.

This is what I then did and tested:

 

  • Installed a new APC UPS and a new APC PDU, and plugged one PSU from each server into the PDU.
  • Plugged the other PSU on each server into a power lead going straight into a socket, so each server has 2 feeds from different sources. During the changeover, both power supplies in both servers were tested and all 4 are fine.
  • Simulated a power cut from both sockets: all fine.
  • Ran a full virus scan with Panda Adaptive Defence on both servers: all clear.
  • Checked the logs; it seems that because it is a power failure, no log entry is created to indicate any issue.

 

So, my thoughts are that it isn't the UPS, as the problem is the same on a new unit, plus I now have an independent feed to the second PSU on both servers. The leads should be fine, as surely a bad lead would not cause both servers to restart? The mains supply shouldn't be the issue, as the UPS should take care of that, and the simulated power cut passed with no issues. An application would not cause a power cut on both servers, unless it is a virus, but a straight power cut? If it were an application, I would expect a graceful restart/shutdown.
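
To spell that reasoning out, here is a rough sketch in Python. It is a deliberately simplified model, not a description of how the hardware really behaves: it assumes either PSU can carry its server on its own (which the pull tests suggested), and it only looks at input power. The names `server_stays_up`, `pdu_feed_live` and `socket_feed_live` are made up for illustration.

```python
from itertools import product

def server_stays_up(pdu_feed_live, socket_feed_live):
    # Simplifying assumption: one working feed is enough to keep the server running.
    return pdu_feed_live or socket_feed_live

# Walk through every combination of the two feeds being live or dead.
for pdu_live, socket_live in product([True, False], repeat=2):
    state = "up" if server_stays_up(pdu_live, socket_live) else "DOWN"
    print(f"PDU feed live={pdu_live!s:5}  socket feed live={socket_live!s:5}  -> server {state}")
```

In this model the only input-power state that takes a server down is both feeds being dead at the same instant, which is why a fault on a single feed shouldn't be able to drop both servers together.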

 

At this point I am out of ideas, so I hope you have some for me.

+BudMan    3,537

What exact error are you seeing in the log? Just an event ID 6008 saying the previous shutdown was unexpected?
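
If that is all there is, it may still be worth lining up the times of those entries on both servers to confirm they really drop together every time. A minimal sketch in Python, assuming you copy the shutdown times out of each server's log by hand and that the two clocks are reasonably in sync; the function name `coinciding_shutdowns`, the 60-second tolerance, and the timestamps are all made up for illustration:

```python
from datetime import datetime, timedelta

def coinciding_shutdowns(times_a, times_b, tolerance=timedelta(seconds=60)):
    """Return (a, b) pairs of shutdown times that fall within `tolerance` of each other."""
    pairs = []
    for a in times_a:
        for b in times_b:
            if abs(a - b) <= tolerance:
                pairs.append((a, b))
    return pairs

# Hypothetical timestamps, typed in by hand from each server's unexpected-shutdown entries.
hp_times = [datetime(2019, 8, 20, 10, 15, 2), datetime(2019, 8, 21, 11, 40, 30)]
dell_times = [datetime(2019, 8, 20, 10, 15, 5), datetime(2019, 8, 23, 9, 0, 0)]

for hp, dell in coinciding_shutdowns(hp_times, dell_times):
    print(f"HP {hp} / Dell {dell}")
```

Any shutdown that shows up on one server but has no partner on the other would point away from a shared cause for that particular event.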

StrikedOut    173

Correct, just 6008.

 

Annotation 2019-08-27 141944.jpg

spikey_richie    220

You got minidumps turned on? If so, crack one open with WinDbg

StrikedOut    173

Minidumps are on, but none are being generated. I suspect that's due to the manner in which it is losing power.

spikey_richie    220

Eep, that's a pretty catastrophic failure then. You got another PSU you can try?

StrikedOut    173

Both servers have 2 PSUs, and both seemed to pick up the load when I tested by pulling the other one's plug. Plus, this is affecting 2 servers, and the one time I was in the room, both lost power at the same time. Never seen anything like it, and damned if I can figure out the common component that may be failing.
