Why the Website Was Gone All Day Long

The short version is that the hosting company had a problem that caused the webserver part to barf. The long version follows.

I had gotten a message on Friday that the hardware on which my virtual server resides had to be given an emergency reboot. I checked the websites I host and all was well. This morning I got another one, but this time nothing would load from my end. I immediately logged in to try and figure out what was going on. Oddly enough, Apache would not shut down gracefully, forcing me to use kill -9 to wipe its processes. Starting Apache again did not fix anything.

After some digging, I found that Apache was throwing a segfault when it started up. I groaned because I knew this meant a lot of fixing beyond my ability. I dropped a note to the hosting company to see if they could be of any help and started trying to figure out what I could do about it. Unfortunately, everything I read pointed towards rebuilding Apache. Worse yet, I was still on the 2.0 branch and had long put off upgrading to the 2.2 branch.

I had been putting it off because the config files from 2.0 do NOT work with 2.2 right out of the box and would require a lot of tinkering from my end. I'd spent several hours just getting a reliable WebDAV server on 2.0 and didn't want to know what it would take on 2.2. Nevertheless, I synced up Portage, backed up my /etc/apache2 folder and crossed my fingers as I ran "emerge apache". After about 20 minutes of compiling and 10 minutes of trying to interactively merge the config files, I threw my hands up in frustration and, around 11AM, asked the hosting company to go ahead and move the server to another piece of hardware.
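For anyone facing the same upgrade, the backup step is worth scripting so it actually happens every time. Here is a minimal sketch of that step, not the exact commands I ran: backup_config is a made-up helper name, and /root/backups in the usage notes is just an assumed place to keep the archives.

```shell
#!/bin/sh
# Sketch: snapshot a config directory into a timestamped tarball before
# an upgrade. backup_config is a hypothetical helper, not part of
# Portage or Apache.
backup_config() {
    src="$1"          # directory to back up, e.g. /etc/apache2
    dest_dir="$2"     # existing directory where the archive should land
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/$(basename "$src")-$stamp.tar.gz"
    # -C makes the paths inside the archive relative to the parent
    # directory, so it unpacks as apache2/... rather than /etc/apache2/...
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Print the archive path so the caller can record or verify it.
    echo "$archive"
}

# Usage on the real server (as root; the backup path is an assumption):
#   backup_config /etc/apache2 /root/backups
#   emerge --sync
#   emerge apache
```

With the old configs in a tarball, a failed config merge is at least recoverable: the 2.0 files can be unpacked somewhere and compared against the new ones side by side.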

And so I waited. And waited. And waited some more. It wasn't done around 1PM and I had to leave to take Shauna for an ultrasound. When I got back around 3PM, it still wasn't up. We left for her brother's house around 5PM and it still wasn't up. We got home around 9:30PM and it was still down. I was pretty frustrated at this point and sent them an e-mail to figure out an ETA. After a few messages back-and-forth, they had it back up so I could try to get Apache to work again.

I had to spend about another 90 minutes getting Apache to a basic level of functionality. I still can't use my WebDAV repository and I can't use the web-based mail admin program. Heck, I can't even get the mail admin program to install cleanly anymore. It's kind of messy, so expect the website to go down for a few seconds at a time over the next few days as I sort everything out and try to get that functionality back.

So what do I say about the hosting company, Slicehost? I'm certainly not happy that a simple reboot blew up Apache, though had I been more patient it might have started working on the other server. I'm also not happy that about 8 hours of my waiting was due to them forgetting to flip a virtual switch and send me a notification e-mail. That said, they immediately comped me for three months of hosting when I asked, which was far above and beyond what I would expect them to do. This is also the first major issue I've had where there wasn't an immediate fix.

I'm very happy that they take customer service so seriously and have, this incident aside, provided great technical service. I still strongly recommend their service and have learned a few lessons on keeping packages up to date sooner rather than later.


4 Responses to Why the Website Was Gone All Day Long

  1. Pingback: The Recent Extended Downtime » Free UTOPIA!

  2. Chris says:

    I’d dump Gentoo and go for something stable like Debian. Gentoo has major quality assurance issues.

  3. Jesse says:

    I’ve been getting a little miffed at the way Gentoo handles package updates, especially since config files often have a nasty habit of being changed between ebuilds. I remember the first such incident many moons ago when they totally changed the format of the vpopmail DB config files without any notification in Portage. (This was only a jump from 5.2 to 5.4, so it made it even more unexpected.) That said, I don’t think I want to invest the time to re-build from the ground up just yet.

  4. Mike says:

    I’m sorry your interweb blew up, Jesse.
