Server Outage: Warren the Hero

Dirty Motherboard

I doubt many people actually noticed, but the server that runs this site (and about a dozen more) and my email... died sometime between 4:30 and 6:40 this morning. I texted Warren (using Google Voice, since I do NOT own/want a cell phone) at about 6:38, since the server is in his house, and he responded at 7:10 saying the server was dead. It would power on for about a second and then immediately turn off. It was full of dust bunnies, so Warren cleaned it out and re-seated everything that might have had a loose connection... but it still failed to power on. Warren then opened up the power supply and cleaned that out too. Inspecting both the motherboard and the power supply turned up no obviously bad components... no bad caps. Still no results.

Without a quick fix, we decided to try to get a temporary system going... perhaps using a desktop computer with 4 SATA ports. We need 4 SATA ports because the dead server has 4 hard drives in a Linux software RAID 5 configuration. Luckily Warren had access to a spare Dell OptiPlex 960 Core 2 Duo system with 8GB of RAM and 4 SATA ports. Many tower systems that have 2 hard drive bays come with 4 SATA ports so they can support 2 HDs and 2 optical drives. Routing the data and power cables to the loose drives worked out, and for the time being the desktop power supply seems to be beefy enough to run all of the drives. Seeing as we run CentOS 6.x for OpenVZ, I was concerned that the network chipset would be too new for it... and for a little while that seemed to be the case... but Warren booted from a CentOS 6.7 LiveDVD and the network worked fine, so we knew the problem was configuration left over from the previous hardware. It turns out the NIC was detected as eth2 rather than eth0. While that could probably have been resolved by nuking a udev rule file somewhere, Warren just moved and edited the eth0 config file so it applied to eth2, and we were up and running again.
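For anyone who hits the same eth0-turned-eth2 surprise, here is a rough sketch of the kind of edit involved. Warren did his by hand; this is only an illustration, not what he actually ran. It assumes the usual CentOS 6 layout, where each interface has a file like /etc/sysconfig/network-scripts/ifcfg-eth0 made up of KEY=value lines, and the function name `retarget_ifcfg` is made up for this post. It also assumes that dropping a stale HWADDR line is acceptable when you don't supply the new card's MAC address, since a MAC left over from the old motherboard would no longer match anything.

```python
def retarget_ifcfg(config_text, new_device, new_hwaddr=None):
    """Return ifcfg-style text pointed at a different interface name.

    Hypothetical helper: rewrites the DEVICE line and either replaces
    the HWADDR line or drops it when no new MAC address is supplied.
    All other lines (including comments) are passed through untouched.
    """
    out = []
    for line in config_text.splitlines():
        # The key is whatever sits before the first "="; commented-out
        # lines such as "#DEVICE=eth0" will not match and are kept as-is.
        key = line.split("=", 1)[0].strip().upper()
        if key == "DEVICE":
            line = f"DEVICE={new_device}"
        elif key == "HWADDR":
            if new_hwaddr is None:
                # Assumption: a MAC from the old hardware is stale, so drop it.
                continue
            line = f"HWADDR={new_hwaddr}"
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up example config; the values are placeholders, not our real settings.
old_config = (
    "DEVICE=eth0\n"
    "HWADDR=00:11:22:33:44:55\n"
    "ONBOOT=yes\n"
    "BOOTPROTO=static\n"
)
new_config = retarget_ifcfg(old_config, "eth2")
print(new_config)
```

On a real box you would read the old ifcfg-eth0 file, write the result out as ifcfg-eth2, and restart networking. The other route I mentioned, removing the udev rule file, usually means /etc/udev/rules.d/70-persistent-net.rules on CentOS 6, which gets regenerated on the next boot so the new NIC can claim eth0 again.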

Dirty Power Supply

While I do have backups (a couple of days old) and have previously researched a few cloud services just in case (6sync and fastmail are on my radar), it is great to be back up after a few hours rather than having to transfer a few hundred gigabytes of data before we are back up.

For the mid to long term we haven't thought about where to go... for now we are just happy to be up and running again. Warren had it back up at about 2 PM, so he put a good 7 hours in today. It turns out Warren starts the Montanan Marathon early tomorrow morning, which he has been training for all year, so he wanted to get some rest today. I hope he does get some rest after saving our hobby server. Thanks, Warren! I hope you do well in the run tomorrow, buddy.
