The bright and dark sides of the datacenter

by scalingexperts

The Dark Sides

If, like me, you’ve spent years hanging in and around datacenters, you’ll most likely know the “dark sides” of said places. For those who don’t know, here’s a short list:

  • cabling messes: many datacenters have grown organically without proper planning, so cable management has become incredibly difficult and messy.
  • uneven cooling: unplanned cooling requirements sometimes force datacenters to install “temporary” portable cooling units and run tubing to equipment generating unusually high amounts of heat.
  • poor power management: got 2 x Cisco 6500s? Do you have ANY idea how much power they draw? Unmeasured amperage can overload a circuit and trip its breaker, taking down 3 or 4 racks at the same time.
  • datacenter operators from hell: these are the kids, 18 years old with minimal computer skills, who don’t care about anything but their paycheck. They are the ones who “accidentally” plug your main DB server into another client’s switch.
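
To make the power point concrete, here is a minimal sketch of a per-circuit power budget check. Everything in it is an assumption for illustration: the device names, the amperage figures, the 30 A breaker, and the 80% derating factor (a common rule of thumb for continuous loads; check what your facility actually allows).

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    amps: float  # measured draw in amps (hypothetical figures below)


def circuit_headroom(devices, breaker_amps, derate=0.8):
    """Return the amps left before the derated limit is reached.

    A negative result means the circuit is over budget.
    The 0.8 derating factor is an assumed rule of thumb, not a facility spec.
    """
    budget = breaker_amps * derate
    load = sum(d.amps for d in devices)
    return budget - load


# Hypothetical rack: two large switches and one database server on a 30 A circuit.
devices = [
    Device("switch-a", 9.5),
    Device("switch-b", 9.5),
    Device("db-server", 3.0),
]
headroom = circuit_headroom(devices, breaker_amps=30)

if headroom < 0:
    print(f"Over budget by {-headroom:.1f} A, move something to another circuit")
else:
    print(f"{headroom:.1f} A of headroom left")
```

The point is simply that the sum has to be written down somewhere and checked before anyone plugs in the next box. The measured values would come from a metered PDU or a clamp meter rather than from a spec sheet.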

Now this might seem extreme, but I can guarantee that even the biggest and most famous cloud/hosting/colocation companies have at least one of these problems. You’ll know once you’ve worked for and with them. Nothing is perfect, and that’s why you never get 100% uptime 😉

The Bright Sides

If you step in a puddle of water every morning when leaving your house, eventually you will learn to jump over it and keep your shoes and socks dry. Well, the same thing happened with datacenters. The people who design and build them are constantly learning from their mistakes. Nowadays, the big companies have mostly got their act together and rarely experience problems caused by poor planning.

One lingering problem, and perhaps a solution

On the other hand, the human factor still exists, so I must go back to that “datacenter operator from hell”. I’m not referring to sysadmins with long hair who keep to themselves and work odd hours; we’re cool and we know what we’re doing.

I’m talking about the youngins. I’ve discussed this issue with former coworkers and other people in the industry, and it’s the one constant that no one seems to have figured out how to change.

In my opinion, a few things are needed to keep these kids from continually causing downtime for our services:

  1. Automate as much as possible to minimize manual intervention.
  2. Companies need to stop acting like datacenter operators are replaceable drones/monkeys.
  3. Companies need to hire people who care, and who are qualified to play with such expensive equipment.
  4. Companies need to give these people different responsibilities throughout the day, because connecting cables all day long is not fun for anyone.
  5. Companies need to pay these people MORE money.
  6. Datacenter operators should work in pairs when handling physical equipment. This ensures a second person is always there to confirm what is being done.

Planning for failure

When you’re at the mercy of a hosting company with “datacenter operators from hell”, your only option is to plan for failure. Serious failure. If you put all your eggs in the same basket, you will eventually face an unplanned outage caused by someone “tripping” over a cable or pulling a healthy disk instead of the failed one from a RAID array.

If you plan ahead, and are aware that things can and will go wrong, then you’ll be able to endure these problems without suffering any serious consequences (e.g., losing your customers’ trust).
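
A rough way to see why one basket is a bad idea is to compare the availability of a single copy of a service with that of several copies in separate locations. The sketch below is back-of-the-envelope arithmetic, not a model of any real provider: the 99% figure is made up, and the formula assumes the copies fail independently, which is optimistic if they share a switch, a circuit, or an operator.

```python
def combined_availability(single, copies):
    """Probability that at least one of `copies` independent replicas is up.

    `single` is the availability of one replica as a fraction (0.99 = 99%).
    Assumes failures are independent, which real deployments only approximate.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 - (1 - single) ** copies


# Hypothetical figure: each site is up 99% of the time.
for n in (1, 2, 3):
    print(f"{n} site(s): {combined_availability(0.99, n):.6f}")
```

Going from one site to two moves the hypothetical 99% to roughly 99.99%, which is the whole argument for not keeping every egg in the same rack.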