I’m starting to feel like I’m bitching and complaining a lot on here instead of documenting steps towards resolving issues. Today’s subject: Media Temple’s Grid-Service hosting. I personally don’t have a Media Temple account (for reference, I am currently hosting on a dedicated box with ServerBeach), but I have a few friends and a few clients that use them. I’ve noticed their slowness and have heard a few backup horror stories, but never before have I worked through a Grid-Server crash. Before I continue, let me quote their web site:
“(gs) is a cluster-based, modern hosting service powered by hundreds of servers working in tandem to power your websites, applications and email with unrivaled power, burstability and reliability.”
Sounds great, right? I thought so. In theory this is perfect: multiple servers, virtual machines, redundancy, zero chance of a server crash. I thought the point of having cloud hosting like this is that if anything went wrong, they could just bring up a copy of the VM that took a shit and you’d be good to go. Not so much, to say the least.
A week or so ago, one of my clients’ sites went down, hard. Upon further investigation, their (gs) had taken a shit and was in the process of being brought back up. When I say brought back up, I don’t mean a new image was brought up, I mean they had to restore backups and it took literally days. Okay, not bad, server’s back up and all is well in the neighborhood, right? Wrong. Now their shopping cart is throwing errors. “Did it work before the crash?” I asked. “Sho’nuff” was the reply (paraphrasing of course). So I was later asked to troubleshoot and resolve the issue, which I did. So what was the problem? The shopping cart code had hardcoded paths (that’s an issue in itself, but I wasn’t there to update the code to use config files), and they were all pointing to paths that no longer existed. I logged into the (mt) admin, got the system paths from there, updated the code and “voilà”, all was well again.
That struck me as very strange. If we’re running on a cluster and on virtual machines, why wouldn’t the server come back up exactly the same as before the crash? My conspiracy theory is that they didn’t bring up a copy of that server and restore an incremental backup; they built a new VM differently than the original was built, and that led to the paths being different. Now I could be wrong about all this, but the whole thing does smell fishy. Either way, I think it’s great that Media Temple was willing and able to provide a credit for the downtime and for the issue that was encountered after the “restore”.
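For what it’s worth, here is a minimal sketch of the “use config instead of hardcoded paths” idea mentioned above. It’s in Python purely for illustration, not whatever the cart was actually written in, and the `CART_BASE_DIR` variable and the directory names are made up; none of it comes from the client’s code or from Media Temple.

```python
import os
from pathlib import Path


def resolve_base_dir(env=None, fallback=None):
    """Return the site's base directory without hardcoding an absolute path.

    CART_BASE_DIR is a hypothetical environment variable used for this sketch.
    """
    env = os.environ if env is None else env
    configured = env.get("CART_BASE_DIR")
    if configured:
        return Path(configured)
    # No explicit setting: use the given fallback, else the current directory.
    return Path(fallback) if fallback is not None else Path.cwd()


def cart_paths(base_dir):
    """Build every path the cart needs relative to a single base directory."""
    base = Path(base_dir)
    return {
        "templates": base / "templates",  # illustrative directory names
        "uploads": base / "uploads",
        "logs": base / "logs",
    }
```

With something like this, a host rebuilding the box under a different system path means changing one setting rather than hunting through the code for every absolute path.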