Web Server Failover with Round-Robin DNS


Overview

If you serve a particularly popular site, you will eventually hit a wall: the point at which your server can't handle any more requests, or simply goes down for a while.

In the web server world, this is called the Slashdot effect, and it isn't a pretty site (er, sight). While adding RAM, upgrading processors, and using faster drives and buses will help in the short term, you may eventually find that a single machine can't possibly perform the work that needs to be done.

One way to overcome the limits of the monolithic server is to distribute the load across many machines. By adding a second (or third) server to the available pool of machines, you can not only increase performance but also add to the stability of the network. If you have a hot spare (or three) running all of the time, then if one develops trouble, the others can take over for it without any downtime.

Round-Robin DNS

Perhaps the easiest way to distribute the load of public traffic is to have the public do the work of distributing the load for you. Through the magic of round-robin DNS, inbound requests to a single host name can be directed to any number of IP addresses.

In BIND 9, this is as easy as adding multiple A records for a single host. For example, suppose we use this in the zone file for akadia.com:

www 60 IN A 217.193.130.251
www 60 IN A 193.247.121.197

Now, when a host looks up www.akadia.com in DNS, about half of the time it will see:

host www.akadia.com
www.akadia.com has address 217.193.130.251
www.akadia.com has address 193.247.121.197

And the rest of the time, it gets:

host www.akadia.com
www.akadia.com has address 193.247.121.197
www.akadia.com has address 217.193.130.251

Because most applications use only the first address returned by DNS, this works rather nicely. Approximately half of the requests will go to each address, so the load on each of the two servers should be roughly half of what a single server would carry. We set the TTL low (60 seconds) to prevent any intervening caching DNS servers from hanging onto one sort order for too long, which should help keep the number of requests to each host more or less equal.
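
To make the "about half of the requests" claim concrete, here is a minimal Python sketch that simulates the rotation. It assumes an idealized server that rotates the answer order by one position on every lookup, and clients that always take the first address. A real name server's ordering depends on its configuration and on intervening caches, so treat the even split as the best case. The function names are made up for this illustration.

```python
from collections import Counter

# The two A records from the zone file above.
ADDRESSES = ["217.193.130.251", "193.247.121.197"]


def rotate(addresses, n):
    """Return the address list rotated left by n positions."""
    n %= len(addresses)
    return addresses[n:] + addresses[:n]


def simulate_lookups(addresses, lookups):
    """Count how often each address comes first in the answer.

    Assumption: the server rotates the record order by one on every
    lookup, and each client connects to the first address it receives.
    """
    first_hits = Counter()
    for i in range(lookups):
        answer = rotate(addresses, i)
        first_hits[answer[0]] += 1
    return first_hits


counts = simulate_lookups(ADDRESSES, 1000)
for address in ADDRESSES:
    print(address, counts[address])
```

With two records and 1000 simulated lookups, each address comes first 500 times. Adding a third address to the list spreads the same lookups across three servers.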

It is only useful to spread out the load if all of the servers are in agreement about what they're serving. If your data gets out of sync, then browsers might get one version of a web page on the first hit and another when they hit reload.
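
One simple way to notice that the servers have drifted apart is to compare a checksum of the same page as served by each machine. The sketch below is only an illustration under stated assumptions: fetching the page from each server is left out, the sample page bodies are invented, and the helper names are hypothetical.

```python
import hashlib


def content_digest(body):
    """Return the SHA-256 hex digest of a page body (bytes)."""
    return hashlib.sha256(body).hexdigest()


def group_by_digest(bodies):
    """Group server addresses by the digest of the content they served.

    bodies maps a server address to the bytes that server returned
    for the same URL.
    """
    groups = {}
    for server, body in bodies.items():
        groups.setdefault(content_digest(body), []).append(server)
    return groups


def in_sync(bodies):
    """True if every server returned byte-identical content."""
    return len(group_by_digest(bodies)) <= 1


# Invented sample data: the second server still has an older page.
sample = {
    "217.193.130.251": b"<html>version 2</html>",
    "193.247.121.197": b"<html>version 1</html>",
}

if in_sync(sample):
    print("all servers agree")
else:
    for digest, servers in group_by_digest(sample).items():
        print(digest[:12], ", ".join(servers))
```

Pages with per-request content (timestamps, session tokens) will never compare equal this way, so a check like this is best pointed at a static file.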