So this Sunday when I woke up and tried to check my work email I found out I could not. In fact, I couldn’t hit work’s websites, VPN, SSH machines, or anything. When I finally arrived at work (it took me some time to get there; I was not at home), I saw that my boss John was already there and had managed to get the internet up and running. Half the machines were still off, and John told me, “it looks like the room lost power”.
So I started looking at things. Many of the machines shat themselves when coming back up because they depend on other servers that hadn’t started yet. The other half were rather confused… they are all set to “Last State” in the BIOS for what to do on power restore, and it seems many of them couldn’t remember what state they had been in. I should change those to just “Power on”.
So John investigated the UPS while I was getting things up and running again. The logs showed a brief switch to battery, followed shortly after by a dead battery. The next log item was “8:15am: Power off by front panel”. Wait, what?? Yeah. Someone pushed the shiny power button on the UPS and confirmed they wanted to shut it off. They then turned it back on and didn’t tell anyone what they had done (probably for fear of their job). Still in shock that someone would do that, I blurted out, “Maybe having security silence alarms in the server room isn’t the best idea”.
A 2-hour support call for the phone system, 8 hours of my time, 4 hours of John’s time, a dead hard drive, and many upset researchers later, it all comes down to this: at some point, someone along the line decided it was a good idea to have unqualified people pushing buttons in the server room rather than just having them report the issue and let it beep. Thanks a lot! Have you learned your lesson yet?