Leveraging an API for DNS updates

DNS Updates

Back in the day, human intervention was required for most DNS manipulations. Nowadays, many providers such as Linode offer an API for accessing and managing your domains. This means failover can now be scripted: an agent detects network changes and calls the API to update DNS zone entries.

How do you do it?

I wrote a simple Ruby script which handles DNS zone updates. It is meant to run on a *NIX-based operating system such as Debian or OS X. The script is more of a proof of concept than anything else.

The idea is to create an initial A record in a zone, let the script run in the background, and have it update the zone when an IP address change is detected.

The mechanisms for detecting or triggering a change can be tweaked and scripted for even more automation, but the premise remains the same: you no longer need to log in to a website to perform any sort of DNS manipulation.

Get the script here:

https://github.com/alexwilliamsca/linode-dynamic-dns
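If you just want the gist, here is a minimal sketch of the same loop (not the linked script itself): poll your public IP and push any change to a DNS record through an HTTP API. The endpoint shape and the `target` field below follow Linode's v4 API as I remember it, and the IP-check service is an arbitrary choice, so treat both as assumptions to verify against your provider's documentation.

```ruby
#!/usr/bin/env ruby
# Minimal sketch: poll the public IP and, when it changes, push the new
# address into a DNS A record via an HTTP API.
# NOTE: this is NOT the linked script; the endpoint follows the Linode
# API v4 shape (PUT /domains/:id/records/:id) -- verify before relying on it.
require "net/http"
require "json"
require "uri"

TOKEN     = ENV.fetch("DNS_API_TOKEN")   # API token with DNS scope
DOMAIN_ID = ENV.fetch("DOMAIN_ID")       # numeric zone id
RECORD_ID = ENV.fetch("RECORD_ID")       # numeric A-record id
CHECK_URL = "https://api.ipify.org"      # any "what is my IP" service

def public_ip
  Net::HTTP.get(URI(CHECK_URL)).strip
end

def update_record(ip)
  uri = URI("https://api.linode.com/v4/domains/#{DOMAIN_ID}/records/#{RECORD_ID}")
  req = Net::HTTP::Put.new(uri)
  req["Authorization"] = "Bearer #{TOKEN}"
  req["Content-Type"]  = "application/json"
  req.body = JSON.generate(target: ip)
  Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
end

last_ip = nil
loop do
  ip = public_ip
  if ip != last_ip
    puts "IP changed to #{ip}, updating zone..."
    update_record(ip)
    last_ip = ip
  end
  sleep 60   # poll once a minute
end
```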

The path to bliss: Autoscaling

Autoscaling

I’ve been playing with the idea of autoscaling for quite some time (years, for real!). Only recently has this become more and more feasible, and I think I finally figured out a way to do it on my own.

Simply complex

I think it’s quite simple, but my best friend, who happens to be a kick-ass sysadmin (@patrixl), gave me the longest blank stare ever when I tried to explain this to him. For that reason I’ve added a lot more text to this article than I really wanted to. Hopefully it’ll make my explanation easier to grasp.

The idea

For starters, be aware this applies only to virtualized server instances, and you’ll understand why shortly. The idea stemmed from two big problems you face when running a business:

  1. Physical servers with exact specifications take way too long to provision (whether you’re ordering the equipment yourself, or relying on a web hosting company to do it for you).
  2. Capacity planning is extremely difficult, and usually very expensive (planning time and money lost from unused resources).

At my previous employer (a web hosting company), customers would sometimes send urgent requests for more servers with exact specifications, only to be told it would take one week due to back-ordered parts or mismatched RAM modules (accidents happen).

Most people just have no idea how much capacity they need until a blog post puts them on “The Map” and their website suddenly requires some serious additional resources.

What to do?

Well you can throw your arms in the air and quit your job, or you can laugh in amusement at how awesome it feels to automate your previous “job”, and go back to enjoying that wonderful mojito on the beach in DR.

OK shut up, tell me how!?!

Try this: create a virtual server for your web server, but limit its resources with absolute and relative values (a rough sketch of one way to apply these caps follows the list):

  • ionice the disk IO with a low priority (say… 7)
  • limit the CPU to just 25% (1/4 of CPU availability)
  • limit the memory to just 1GB (of 4GB)
  • limit the network to just 200Mbps (approx 1/4 of actual network throughput ~800Mbps)
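How you enforce these caps depends entirely on your hypervisor; libvirt, Xen, OpenVZ and friends all have their own knobs. As one illustration only, here is a rough Ruby wrapper that applies comparable limits to a guest’s process on a Linux host using ionice, cgroup v2 and tc. The cgroup path, the interface name and the assumption that cgroup v2 is mounted with the cpu/memory controllers enabled are all mine, not the article’s.

```ruby
#!/usr/bin/env ruby
# Rough illustration of the caps above, applied on a Linux host with
# cgroup v2, ionice and tc. The exact mechanism depends on your hypervisor;
# paths and device names here are assumptions.
require "fileutils"

PID    = ARGV.fetch(0)          # PID of the guest/workload to confine
CGROUP = "/sys/fs/cgroup/webvm" # assumes cgroup v2, controllers enabled
NIC    = "eth0"                 # host-side interface to shape

FileUtils.mkdir_p(CGROUP)

# Disk IO: lowest best-effort priority (class 2, level 7)
system("ionice", "-c", "2", "-n", "7", "-p", PID)

# CPU: 25% of one CPU (25ms of runtime per 100ms period)
File.write("#{CGROUP}/cpu.max", "25000 100000")

# Memory: hard cap at 1 GiB
File.write("#{CGROUP}/memory.max", (1024 * 1024 * 1024).to_s)

# Move the process into the cgroup so the limits apply
File.write("#{CGROUP}/cgroup.procs", PID)

# Network: shape the interface to ~200 Mbit/s with a token bucket filter
system("tc", "qdisc", "add", "dev", NIC, "root", "tbf",
       "rate", "200mbit", "burst", "32kb", "latency", "400ms")
```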

Next, run a benchmark on that virtual server to see exactly how many website visitors that instance can handle.
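Any load-testing tool will do. As an example, a tiny Ruby wrapper around ApacheBench (assuming `ab` is installed and the URL points at your capped instance) records the one number that matters:

```ruby
# Tiny wrapper around ApacheBench to record what one capped instance
# can handle. Assumes the `ab` utility is installed; the URL is a placeholder.
requests    = 10_000
concurrency = 100
url         = "http://test-instance.example.com/"

output = `ab -n #{requests} -c #{concurrency} #{url}`
rps    = output[/Requests per second:\s+([\d.]+)/, 1]
puts "One capped instance sustains ~#{rps} requests/sec"
```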

Finally, duplicate that virtual server on the same physical machine. Technically, you should now be able to serve exactly twice as many concurrent visitors, assuming there isn’t a bottleneck elsewhere (disk/network/external database).

Now what?

Well there you have it! You just created your own local “EC2-ish” instance for handling X visitors. Next time you need to handle Y more visitors, just launch the appropriate number of virtual servers to support them. The best part: you don’t need to provision physical machines with exact specifications anymore. Since your virtual servers only require a small amount of resources, you can simply request one or more physical machines with “at least X CPU, at least Y RAM”, and so on.

In our example above, if you provision a server with 8GB RAM and 2x 1-Gigabit NICs, you should technically be able to deploy 8 virtual server instances. If your hosting provider can only give you 4 servers with 2GB of RAM, no problem! You can still deploy 8 virtual server instances, just spread across 4 machines instead of 1. Hah!

Beats the hell out of “NEEDING” 48GB RAM and 4 Quad Xeons by tomorrow.
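For anyone who wants the arithmetic spelled out, here is the back-of-the-envelope version of the example above in a few lines of Ruby (the host figures are just the example numbers, not a recommendation):

```ruby
# Back-of-the-envelope check: how many capped instances fit on one host?
# Per-instance numbers match the caps used earlier; host specs are the
# example figures (8 GB RAM, 2x GigE at ~800 Mbps usable each).
per_instance = { ram_gb: 1, net_mbps: 200 }
host         = { ram_gb: 8, net_mbps: 2 * 800 }

fit = [host[:ram_gb] / per_instance[:ram_gb],
       host[:net_mbps] / per_instance[:net_mbps]].min
puts "This host can hold #{fit} instances"   # => 8
```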

Autoscaling it

To get to the point where you can auto-scale, you need the following:

  • The ability to detect your website’s usage and automatically trigger an alert when a limit has been reached.

This alert should NOT email/page/call/SMS/ping/tweet/like/+1 you; duh, you’re busy sleeping! The alert should call a script or command to launch the deployment process of a new virtual server. Any alerts you receive should be due to something critical, such as: the script didn’t work and all hell broke loose!
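As a sketch of what “the alert calls a script” might look like, here is a bare-bones Ruby check. The connection threshold, the `ss`-based metric and the deploy script path are placeholders for whatever your monitoring system already measures.

```ruby
#!/usr/bin/env ruby
# Sketch of an alert that acts instead of paging you: when concurrent
# connections cross a threshold, kick off the deployment script.
# Threshold, metric and deploy command are all placeholders.
THRESHOLD     = 500
DEPLOY_SCRIPT = "/usr/local/bin/deploy_web_vm.sh"   # hypothetical

loop do
  established = `ss -tan state established`.lines.count - 1   # minus header
  if established > THRESHOLD
    puts "#{established} connections > #{THRESHOLD}, launching a new instance"
    ok = system(DEPLOY_SCRIPT)
    # Only bother a human if the automation itself failed
    abort "deployment failed -- page someone!" unless ok
  end
  sleep 30
end
```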

  • The ability to find a physical server in your network with available resources for your virtual server instance.

If your physical servers are all at capacity, you need to get on that ASAP. In the meantime, your deployment/automation scripts should be able to deploy your virtual server in an alternate location, such as a Cloud Hosting provider or a VPS provider somewhere. Obviously you’ll have a lot of work for the initial setup, but make sure you do it. You’ll sleep better at night.
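A sketch of that placement logic, with made-up host names, a hypothetical deploy script and an equally hypothetical cloud fallback:

```ruby
# Sketch of "find a physical server with room, or fall back elsewhere".
# Host names, the SSH check and both scripts are placeholders.
HOSTS     = %w[kvm01 kvm02 kvm03]
NEEDED_MB = 1024   # one capped instance needs ~1 GB

def free_mb(host)
  # MemAvailable is reported in kB in /proc/meminfo
  kb = `ssh #{host} "grep MemAvailable /proc/meminfo"`[/\d+/].to_i
  kb / 1024
end

target = HOSTS.find { |h| free_mb(h) > NEEDED_MB }

if target
  puts "Deploying on #{target}"
  system("ssh", target, "/usr/local/bin/deploy_web_vm.sh")   # hypothetical
else
  puts "All hosts full -- falling back to the cloud provider"
  system("/usr/local/bin/deploy_cloud_instance.sh")          # hypothetical
end
```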

  • The ability to deploy a perfect/fully configured virtual server instance with only 1 command.

You can do this by running your own PXE server with pre-configured network boot based on a MAC address prefix that your virtual server instances will use (beware: network booting doesn’t scale well past 10,000 servers).

You can also do this with scripted commands, or with the help of a sweet Puppet configuration to automate your virtual server installation/configuration.
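For instance, a one-command deploy could be as small as a Ruby wrapper around virt-install that PXE-boots the guest. The bridge name, disk size and MAC prefix (which your PXE server would key its boot profile on) are assumptions for illustration.

```ruby
#!/usr/bin/env ruby
# "One command" deploy sketch: wrap virt-install and PXE-boot the guest.
# Bridge, disk size, os-variant and MAC prefix are assumptions.
name = ARGV.fetch(0, "web-#{Time.now.to_i}")
mac  = "52:54:00:aa:%02x:%02x" % [rand(256), rand(256)]  # prefix your PXE config matches

system("virt-install",
       "--name", name,
       "--ram", "1024",
       "--vcpus", "1",
       "--disk", "size=10",
       "--network", "bridge=br0,mac=#{mac}",
       "--os-variant", "generic",
       "--pxe",
       "--noautoconsole") or abort "virt-install failed"
```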

  • A load-balancer pre-configured with like, hundreds of IPs for your future servers.

You don’t HAVE to pre-configure IPs, but you need some place to reserve the IPs that will be assigned/configured on your future virtual servers. One technique is to run a script which modifies your load-balancer once a new virtual server is deployed; another is to simply configure 100 servers and hope you don’t need to scale from 1 to 100 while you’re sleeping on the beach.
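Here is what the “script which modifies your load-balancer” variant might look like with nginx as the balancer; the config path and upstream name are assumptions.

```ruby
#!/usr/bin/env ruby
# Sketch: regenerate an nginx upstream block from the current server list
# and reload. File path and upstream name are assumptions.
SERVERS = ARGV    # e.g. ruby update_lb.rb 10.0.0.11 10.0.0.12
CONF    = "/etc/nginx/conf.d/web_upstream.conf"

File.write(CONF, <<~NGINX)
  upstream web_pool {
  #{SERVERS.map { |ip| "    server #{ip}:80;" }.join("\n")}
  }
NGINX

system("nginx", "-t") or abort "config test failed, not reloading"
system("nginx", "-s", "reload")
```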

  • The ability to remove a virtual server if it’s not needed.

Once again, you should automate the destruction of a virtual server that is costing you money once you no longer need it (e.g. an actual EC2 instance deployed in an emergency).
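A teardown sketch for a local libvirt guest follows; for a cloud instance you would call your provider’s terminate API instead. The domain name comes from the command line and the grace period is arbitrary.

```ruby
# Sketch of automated teardown for an instance that is no longer needed.
domain = ARGV.fetch(0)

system("virsh", "shutdown", domain)
sleep 30                                   # give it a moment to stop cleanly
system("virsh", "destroy", domain)         # force off if it's still running
system("virsh", "undefine", domain, "--remove-all-storage")
```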

Feel free to post your ideas, techniques and thoughts.

The correct way to use virtualization

Why virtualize?

This question has been asked and answered a million times. Most IT consultants will tell you exactly how to virtualize your environment, depending on your needs. That’s fine, but the truth is, there are actual valid reasons to do it, and they have nothing to do with your needs.

  • You’ll save money, and since saving money is important for EVERYONE, that need is not specific to you.
  • You’ll save time, once again EVERYONE needs to save time.
  • You’ll simplify your life, now seriously if you don’t get my point, just stop reading.

That leaves us with the question, why doesn’t EVERYONE virtualize?

Well to answer that, you have to be aware that virtualization only solves a specific set of problems. Technology hasn’t reached the point where it can be applied to absolutely every situation, so for the moment we have no choice but to pick and choose.

What to virtualize?

That’s a fun question to answer, because most people get it wrong, and then complain that their servers are slow, have strange problems, or are unbearably difficult to manage (what?).

In my opinion, absolutely everything CAN be virtualized, but SHOULDN’T for the sake of not waking up at 3am trying to figure out why your server didn’t actually WRITE the data to disk when it said it would.

Here’s my current list of things to virtualize:

  • Web servers
  • Caching servers
  • Load-balancers, firewalls, routers and proxies
  • Non-busy mail servers
  • DNS servers
  • All other application servers with little to no disk IO activity

Here’s my current list of things to NOT virtualize:

  • File servers
  • Database servers

You’ll notice the trend: essentially, it’s all about disk IO. With virtualization, you’re sharing devices and adding another layer between your disk and your data. When data needs to be written to disk, it’s usually because it’s extremely important information. For file and database servers, you just can’t afford to have layers of (sometimes buggy) software between your data and the disk.

Some might say: “Don’t virtualize load-balancers or firewalls because you lose network performance”, which is a correct statement, but the loss amounts to peanuts. In fact, if you have network performance issues, you can easily spawn the same virtual servers on multiple physical machines, therefore potentially increasing your total network throughput beyond what one small Gigabit network adapter can do. There’s a word for that: Scalability.

Are we doomed?

No, we’re not doomed! In fact, you can go ahead and virtualize your file/database servers too, but make sure you do one thing:

  • PCI PASSTHROUGH

That one thing will be like magic. You see, with PCI passthrough, you’re essentially giving one of your virtual machines full control of a PCI device. If you have only one virtualized file server accessing a PCIe RAID adapter through PCI passthrough, you’ll not only get a huge performance boost (native performance), but you’ll also have direct access to the disk as if you weren’t using virtualization at all.
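For the curious, with libvirt/KVM this boils down to attaching a hostdev definition to the guest. The sketch below wraps that in Ruby; the PCI address and the domain name "fileserver01" are examples (find your adapter’s address with lspci, and make sure the IOMMU is enabled in the BIOS and kernel).

```ruby
# Illustration of handing a PCI RAID adapter to one guest via libvirt's
# <hostdev> passthrough. The PCI address and domain name are examples.
require "tempfile"

hostdev_xml = <<~XML
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </source>
  </hostdev>
XML

Tempfile.create(["hostdev", ".xml"]) do |f|
  f.write(hostdev_xml)
  f.flush
  system("virsh", "attach-device", "fileserver01", f.path, "--config") or
    abort "attach-device failed"
end
```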

The advantage now is that your environment becomes much more homogenized (which is good when you have fifty thousand servers to manage), and you can easily use the other wonderful features of virtualization, with some minor caveats (usually regarding live migrations).

The bright and dark sides of the datacenter

The Dark Sides

If, like me, you’ve spent years hanging in and around datacenters, you’ll most likely know the “dark sides” of said places. For those who don’t know, here’s a short list:

  • cabling messes: many datacenters have grown organically without proper planning, therefore cable management has become incredibly difficult and messy.
  • uneven cooling: unplanned cooling requirements sometimes force datacenters to install “temporary” portable cooling units and tubing to equipment generating unusually high amounts of heat.
  • poor power management: got 2x Cisco 6500s? But do you have ANY idea how much power they use? Unmeasured amperage can blow a circuit, taking down 3 or 4 racks at the same time.
  • datacenter operators from hell: these kids, 18 years old with minimal computer skills, who don’t care about anything but their paycheck. They are the ones who “accidentally” plug your main DB server into another client’s switch.

Now this might seem extreme, but I can guarantee even the biggest and most famous cloud/hosting/colocation companies have at least one of these problems. You’ll know when you work for and with them: nothing is perfect, and that’s why you never get 100% uptime 😉

The Bright Sides

If you step in a puddle of water every morning when leaving your house, eventually you will learn to jump over it and keep your shoes and socks dry. Well, the same thing happened with datacenters. The people who design and build them are constantly learning from their mistakes. Nowadays, the big companies have mostly got their act together and rarely experience problems caused by poor planning.

One lingering problem, and perhaps a solution

On the other hand, the human factor still exists, so I must go back to that “datacenter operator from hell”. I’m not referring to sysadmins with long hair who keep to themselves and work odd hours; we’re cool and we know what we’re doing.

I’m talking about the youngins. I’ve discussed this issue with previous coworkers and other people in the industry, and it’s the one constant that no one seems to have figured out how to change.

In my opinion, there are a few important factors to prevent these kids from continually causing downtime for our services:

  1. Automate as much as possible to minimize manual intervention.
  2. Companies need to stop acting like datacenter operators are replaceable drones/monkeys.
  3. Companies need to hire people who care, and who are qualified to play with such expensive equipment.
  4. Companies need to give these people different responsibilities throughout the day, because connecting cables all day long is not fun for anyone.
  5. Companies need to pay these people MORE money.
  6. Datacenter operators should work in pairs when handling physical equipment. This ensures at least one other person is there to confirm what is being done.

Planning for failure

When you’re at the mercy of a hosting company with “datacenter operators from hell”, your only option is to plan for failure. Serious failure. If you put all your eggs in the same basket, you will eventually face an unplanned outage caused by someone “tripping” over a cable or replacing the wrong failed RAID disk.

If you plan ahead, and are aware that things can and will go wrong, then you’ll be able to endure these problems without suffering any serious consequences (ex: losing your customers’ trust).