Posts Tagged ‘load balance’

Quick and Dirty (and Free!) Host Monitoring for DNS Failover and Round-Robin

Round-Robin DNS gets trash-talked a lot because although it is a cheap and easy way to distribute loads it is counter-redundant: the more A records (servers) there are behind a domain the more points of failure there are and the lower your mean time to failure is going to be. The good news is that if one in five web servers/reverse proxies are down then only about one fifth of your audience is unable to connect at any given time.

The answer to this problem is host monitoring. If we can update our DNS records to remove the IPs of downed servers then add them back when the hosts recover no direct intervention on our part is required. Unfortunately, DNS is a heavily cached system so we will have to work with reasonably short timeouts. DNS Made Easy recommends a TTL of no less than 180 seconds as some ISPs are configured to ignore the TTLs of records which they deem are too short and default to a much higher value. The drawback to short TTLs is that you will end up receiving more DNS queries, which is a problem if you use a commercial billed-by-million-queries DNS provider like Amazon’s Route 53 or EasyDNS’s enterprise service.

If your objective is to have web server failover that happens instantly this is simply not the solution for you – you need a load balancer and/or anycast address space. Amazon’s Route53 and DNS Made Easy can be configured to check as often as every minute and it doesn’t make a lot of sense to run a ping/tcp test more often than that. At worst this means that the failover system doesn’t even know there is a problem for up to 60 seconds. Once the failover system updates the records there may be a short delay while the slave name servers synchronize. Then we have to wait for the record to expire at any-given-user’s ISP’s recursive name servers, which could take up to the TTL of your record or longer if their ISP is manipulative. Then you may have to wait for the record to expire in the caching DNS daemon on their home or office router. Then you may have to wait for the record to expire in their OS or browser’s DNS cache. This could take up to 15 minutes even if you use a very low TTL like 180.

So the question is: you already have DNS infrastructure. Why pay these large DNS outfits for host monitoring and DNS failover when it’s not really that great anyway and you can do it just as well as they can?

Just because BIND doesn’t have built-in support? Pshaw!

You could just as easily do the host monitoring with nagios/icinga or use the mysql-bind backend or even some other database-backed name daemon but in this article I’ll show you how to drop in a simple shell script that will work with your existing BIND installation because it demonstrates how mind-numbingly simple this is and why it shouldn’t be charged for as a premium service.

Observe a typical zone file with round-robin:

$TTL 6400       ; max TTL
@       IN      SOA     ns1.somedomain.com. admin.somedomain.com. (
                                201305140       ; Serial
                                28800           ; Refresh
                                7200            ; Retry
                                60480           ; Expire
                                600 )           ; TTL Minimum
@               IN      A       10.0.0.10
@               IN      A       10.0.0.11
@               IN      A       10.0.0.12
@               IN      A       10.0.0.13
@               IN      A       10.0.0.14
*               IN      A       10.0.0.10
*               IN      A       10.0.0.11
*               IN      A       10.0.0.12
*               IN      A       10.0.0.13
*               IN      A       10.0.0.14
ns1             IN      A       10.0.1.10
ns2             IN      A       10.0.1.11
@               IN      NS      ns1.somedomain.com.
@               IN      NS      ns2.somedomain.com.
www             IN      CNAME   somedomain.com.

Our SOA contains the serial which will have to be updated by the script if our changes are to propagate properly. In the zone file on the master server(s) replace the SOA and block of round-robin A records with $INCLUDE statements like this:

$INCLUDE "/var/bind/soa.include"
$INCLUDE "/var/bind/ips.include"
ns1             IN      A       10.0.1.10
ns2             IN      A       10.0.1.11
@               IN      NS      ns1.somedomain.com.
@               IN      NS      ns2.somedomain.com.
www             IN      CNAME   somedomain.com.

Do this for every zone file which is to use this pool of A records. Now we have a centralized place to put the IPs and serial number that come from the shell script.

Create the script on the master name server and chmod +x it, don’t forget to update the paths to reflect your DNS situation. Also note that I’m adding a wildcard subdomain to the pool:

#!/bin/bash
HOSTS="10.0.0.10 10.0.0.11 10.0.0.12 10.0.0.13 10.0.0.14"
COUNT=4
echo "; Generated by monitor.sh $(date)" > /chroot/dns/var/bind/ips.include
for myHost in $HOSTS
do
  count=$(ping -c $COUNT $myHost | grep 'received' | awk -F',' '{ print $2 }' | awk '{ print $1 }')
  if [ $count -eq 0 ]; then
    # 100% failed 
    echo "$(date) $myHost is down" >> /var/log/monitor.log
  else
    echo "@               IN      A       $myHost" >> /chroot/dns/var/bind/ips.include
    echo "*               IN      A       $myHost" >> /chroot/dns/var/bind/ips.include
  fi

done

echo "; Generated by monitor.sh $(date)
\$TTL 300       ; max TTL
@       IN      SOA     ns1.somedomain.com. admin.somedomain.com. (
                                $(date +%s)      ; Serial
                                300             ; Refresh
                                60              ; Retry
                                86400           ; Expire
                                300 )           ; TTL Minimum" > /chroot/dns/var/bind/soa.include

rndc reload

This script will ping each host in the HOSTS array four times. If at least one ping is received the host is written to a new version of ips.include (note the single angle bracket when inserting the date). If all four pings fail a message will be recorded in /var/log/monitor.log. You may want to adjust the number of pings and failure tolerance, or replace the logging line with an e-mail notification instead. Once the ping tests are done a new soa.include is written with an epoch serial number and the zones are reloaded.

At the end of execution you should see something like this in ips.include:

; Generated by monitor.sh Tue May 14 16:15:26 EDT 2013
@               IN      A       10.0.0.10
*               IN      A       10.0.0.10
@               IN      A       10.0.0.11
*               IN      A       10.0.0.11
@               IN      A       10.0.0.12
*               IN      A       10.0.0.12
@               IN      A       10.0.0.13
*               IN      A       10.0.0.13
@               IN      A       10.0.0.14
*               IN      A       10.0.0.14

And in soa.include:

; Generated by monitor.sh Tue May 14 16:15:26 EDT 2013
$TTL 300       ; max TTL
@       IN      SOA     ns1.somedomain.com. admin.somedomain.com. (
                                1368562526      ; Serial
                                300             ; Refresh
                                60              ; Retry
                                86400           ; Expire
                                300 )           ; TTL Minimum

Note that you may need to chown named: the .include files after they are created the first time, depending on your environment.

I switched from using the widely popular YYYYMMDDID format to epoch since the 5 minute interval requires hours, minutes and seconds to be effective and YYYMMDDHHMMSS is too large a value for BIND. This resulted in a lower serial value – you may have to go around to your slaves and manually delete then reload their zone files.

This approach ends up generating a lot of NOTIFY traffic since every 5 minutes (or whatever interval you cron the shell script at) a new serial is loaded and all of your slaves have to be contacted. A more graceful improvement would be to save the state that each host is in inside of a temporary file and only update the serial when there has actually been a change in the status of your pool.

Another neat thing I thought of trying was using something like heartbeat for real-time monitoring and dnsupdate to dynamically update the zone files. This should narrow the propagation latency on your side of the equation down to the barest minimum possible.

Mass Virtual Hosting Part Eight: MySQL-Proxy for Easy Network Topology Changes and Localhost vs. Sockets

Once your hosting clients are all settled in you may find one day that you need to change their MySQL server address or other configuration parameters. Naturally it’s not going to look good on you or be a very good use of your time to contact every webmaster and have them update their settings. Worse, juggling two active database servers would be a nightmare. Fortunately Oracle came up with mysql-proxy, a lightweight app that sits between your clients and MySQL server(s) which acts as a drop-in replacement for mysqld. Users connect to the proxy like they would the actual server and it transports data to and fro. You can do all sorts of neat things to the data while it’s in motion with lua scripts but that goes beyond the focus of this article.

By default mysql-proxy listens on port tcp/4040 and mysqld listens on tcp/3306. In my experience most users who come from other ISPs are already wired to use localhost as their default SQL host and I don’t want to make them have to remember the port number, which is typically defaulted in most webapps. If you’re running mysqld on the same host you’re serving web from you’ll need to change the port it listens on in /etc/mysql/my.cnf.

MySQL has a contentious age-old convention of changing “localhost” to “use the local socket” rather than resolving localhost to 127.0.0.1 and connecting via TCP. That’s because at the time the decision was made local sockets were far more efficient than using TCP, now it’s not so much an issue. We need to configure mysql-proxy to listen to a socket too or our users will have to use the numeric address, lest they encounter this error:

ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

This can be specified – not all too intuitively – by replacing the proxy-address value in /etc/mysql/mysql-proxy.cnf with the default path to the local socket, thus:

proxy-address = /var/run/mysqld/mysqld.sock

Configure the proxy-backend-address variable to reflect the actual server’s location and port number. Restart mysql-proxy and you now have a working, default-looking configuration that can be redirected anywhere. Thanks to the lua capability of the proxy you can even implement fast and easy load balancing and failover, but that will be the topic of another article!

Return top
foxpa.ws
Online Marketing Toplist
Internet
Technology Blogs - Blog Rankings

Internet Blogs - BlogCatalog Blog Directory

Technology blogs
Bad Karma Networks

Please Donate!


Made in Canada  •  There's a fox in the Gibson!  •  2010-12