Posts Tagged ‘redundant’

Quick and Dirty (and Free!) Host Monitoring for DNS Failover and Round-Robin

Round-Robin DNS gets trash-talked a lot because, although it is a cheap and easy way to distribute load, it is counter-redundant: the more A records (servers) there are behind a domain, the more points of failure there are and the lower your mean time to failure is going to be. The good news is that if one in five web servers/reverse proxies is down, only about one fifth of your audience is unable to connect at any given time.
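
To put a rough number on the "more points of failure" part, here is a back-of-the-envelope sketch. It assumes servers fail independently of each other and that each one is up 99% of the time; both are assumptions I picked for illustration, not measurements, and the function name is made up.

```shell
#!/bin/sh
# Chance that at least one of N servers is down at a given moment,
# assuming independent failures (an assumption, not a measurement).
# usage: any_down_probability <servers> <per-server availability>
any_down_probability() {
  awk -v n="$1" -v a="$2" 'BEGIN { printf "%.4f\n", 1 - a ^ n }'
}

any_down_probability 1 0.99   # a single box
any_down_probability 5 0.99   # a five-record round-robin pool
```

Under those assumptions a single box is down about 1% of the time, while a five-record pool has at least one dead member roughly 5% of the time.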

The answer to this problem is host monitoring. If we can update our DNS records to remove the IPs of downed servers, then add them back when the hosts recover, no direct intervention on our part is required. Unfortunately, DNS is a heavily cached system so we will have to work with reasonably short timeouts. DNS Made Easy recommends a TTL of no less than 180 seconds, as some ISPs are configured to ignore the TTLs of records which they deem too short and default to a much higher value. The drawback to short TTLs is that you will end up receiving more DNS queries, which is a problem if you use a commercial billed-by-million-queries DNS provider like Amazon’s Route 53 or EasyDNS’s enterprise service.

If your objective is to have web server failover that happens instantly, this is simply not the solution for you – you need a load balancer and/or anycast address space. Amazon’s Route 53 and DNS Made Easy can be configured to check as often as every minute and it doesn’t make a lot of sense to run a ping/TCP test more often than that. At worst this means that the failover system doesn’t even know there is a problem for up to 60 seconds. Once the failover system updates the records there may be a short delay while the slave name servers synchronize. Then we have to wait for the record to expire at any given user’s ISP’s recursive name servers, which could take up to the TTL of your record, or longer if their ISP is manipulative. Then you may have to wait for the record to expire in the caching DNS daemon on their home or office router. Then you may have to wait for the record to expire in their OS or browser’s DNS cache. All told, this could take up to 15 minutes even if you use a very low TTL like 180.
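
If you want to see how long a particular resolver is going to hang on to your record, dig prints the remaining TTL in the second column of its answer section. The following is only a sketch: the helper name is hypothetical and the resolver address in the usage comment is just an example.

```shell
#!/bin/sh
# Print the largest remaining TTL among the A records in dig's answer
# section (read from stdin). Column 2 is the TTL, column 4 the type.
# Prints 0 when the input contains no A records.
max_answer_ttl() {
  awk '$4 == "A" && $2 > max { max = $2 } END { print max + 0 }'
}

# Typical use, asking one specific resolver:
#   dig +noall +answer somedomain.com A @8.8.8.8 | max_answer_ttl
```

Run it against a caching resolver a few times in a row and you should see the number count down, then jump back up when the resolver refetches the record.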

So the question is: you already have DNS infrastructure. Why pay these large DNS outfits for host monitoring and DNS failover when it’s not really that great anyway and you can do it just as well as they can?

Just because BIND doesn’t have built-in support? Pshaw!

You could just as easily do the host monitoring with Nagios/Icinga, or use the mysql-bind backend or even some other database-backed name daemon, but in this article I’ll show you how to drop in a simple shell script that will work with your existing BIND installation. It demonstrates how mind-numbingly simple this is and why it shouldn’t be charged for as a premium service.

Observe a typical zone file with round-robin:

$TTL 6400       ; max TTL
@       IN      SOA     ns1.somedomain.com. admin.somedomain.com. (
                                2013051400      ; Serial
                                28800           ; Refresh
                                7200            ; Retry
                                60480           ; Expire
                                600 )           ; TTL Minimum
@               IN      A       10.0.0.10
@               IN      A       10.0.0.11
@               IN      A       10.0.0.12
@               IN      A       10.0.0.13
@               IN      A       10.0.0.14
*               IN      A       10.0.0.10
*               IN      A       10.0.0.11
*               IN      A       10.0.0.12
*               IN      A       10.0.0.13
*               IN      A       10.0.0.14
ns1             IN      A       10.0.1.10
ns2             IN      A       10.0.1.11
@               IN      NS      ns1.somedomain.com.
@               IN      NS      ns2.somedomain.com.
www             IN      CNAME   somedomain.com.

Our SOA contains the serial which will have to be updated by the script if our changes are to propagate properly. In the zone file on the master server(s) replace the SOA and block of round-robin A records with $INCLUDE statements like this:

$INCLUDE "/var/bind/soa.include"
$INCLUDE "/var/bind/ips.include"
ns1             IN      A       10.0.1.10
ns2             IN      A       10.0.1.11
@               IN      NS      ns1.somedomain.com.
@               IN      NS      ns2.somedomain.com.
www             IN      CNAME   somedomain.com.

Do this for every zone file which is to use this pool of A records. Now we have a centralized place to put the IPs and serial number that come from the shell script.
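
Since generated files are now being spliced into live zones, it may be worth checking that a zone still parses before reloading it. A minimal sketch, assuming named-checkzone and rndc are on your PATH; the function name and the zone file path in the usage comment are hypothetical, and in a chrooted setup you may need named-checkzone's -t option so the $INCLUDE paths resolve.

```shell
#!/bin/sh
# Reload only if the zone (including its $INCLUDE files) still parses.
# usage: reload_if_valid <zone name> <zone file>
reload_if_valid() {
  if named-checkzone "$1" "$2" >/dev/null; then
    rndc reload
  else
    echo "$(date) zone $1 failed named-checkzone, not reloading" >&2
    return 1
  fi
}

# Example (the path is a placeholder for wherever your zone file lives):
#   reload_if_valid somedomain.com /path/to/somedomain.com.zone
```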

Create the script on the master name server and chmod +x it. Don’t forget to update the paths to reflect your DNS situation. Also note that I’m adding a wildcard subdomain to the pool:

#!/bin/bash
HOSTS="10.0.0.10 10.0.0.11 10.0.0.12 10.0.0.13 10.0.0.14"
COUNT=4                                   # pings sent to each host
IPS=/chroot/dns/var/bind/ips.include
SOA=/chroot/dns/var/bind/soa.include

# Single > truncates the file, so every run starts from a clean slate
echo "; Generated by monitor.sh $(date)" > "$IPS"
for myHost in $HOSTS
do
  # Pull the "N received" figure out of ping's summary line
  received=$(ping -c "$COUNT" "$myHost" | grep 'received' | awk -F',' '{ print $2 }' | awk '{ print $1 }')
  # No summary line at all (ping bailed out early) counts as down too
  if [ "${received:-0}" -eq 0 ]; then
    # 100% failed
    echo "$(date) $myHost is down" >> /var/log/monitor.log
  else
    echo "@               IN      A       $myHost" >> "$IPS"
    echo "*               IN      A       $myHost" >> "$IPS"
  fi
done

# Epoch seconds give a serial that increases on every run
echo "; Generated by monitor.sh $(date)
\$TTL 300       ; max TTL
@       IN      SOA     ns1.somedomain.com. admin.somedomain.com. (
                                $(date +%s)      ; Serial
                                300             ; Refresh
                                60              ; Retry
                                86400           ; Expire
                                300 )           ; TTL Minimum" > "$SOA"

rndc reload

This script will ping each host in the HOSTS list four times. If at least one ping is received the host is written to a new version of ips.include (note the single angle bracket when inserting the date: it truncates the old file, while the double brackets that follow append to it). If all four pings fail a message will be recorded in /var/log/monitor.log. You may want to adjust the number of pings and failure tolerance, or replace the logging line with an e-mail notification instead. Once the ping tests are done a new soa.include is written with an epoch serial number and the zones are reloaded. Be aware that if every host fails its test, ips.include ends up with no A records at all.
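
If you do want to play with the failure tolerance, one way to sketch it is to split the ping parsing from the up/down decision. Both function names here are my own invention and the threshold is whatever you pass in; treat this as a starting point rather than a drop-in replacement.

```shell
#!/bin/sh
# Pull the "received" count out of ping's summary line (stdin).
# Prints 0 when there is no summary at all, e.g. an unresolvable host.
received_count() {
  awk -F',' '/received/ { split($2, f, " "); n = f[1] } END { print n + 0 }'
}

# Succeeds only if at least <minimum replies> pings came back.
# usage: host_is_up <host> <pings to send> <minimum replies>
host_is_up() {
  [ "$(ping -c "$2" "$1" 2>/dev/null | received_count)" -ge "$3" ]
}

# Example: require 3 of 4 replies before keeping a host in the pool
#   host_is_up 10.0.0.10 4 3 && echo "10.0.0.10 stays in"
```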

At the end of execution you should see something like this in ips.include:

; Generated by monitor.sh Tue May 14 16:15:26 EDT 2013
@               IN      A       10.0.0.10
*               IN      A       10.0.0.10
@               IN      A       10.0.0.11
*               IN      A       10.0.0.11
@               IN      A       10.0.0.12
*               IN      A       10.0.0.12
@               IN      A       10.0.0.13
*               IN      A       10.0.0.13
@               IN      A       10.0.0.14
*               IN      A       10.0.0.14

And in soa.include:

; Generated by monitor.sh Tue May 14 16:15:26 EDT 2013
$TTL 300       ; max TTL
@       IN      SOA     ns1.somedomain.com. admin.somedomain.com. (
                                1368562526      ; Serial
                                300             ; Refresh
                                60              ; Retry
                                86400           ; Expire
                                300 )           ; TTL Minimum

Note that you may need to chown named: the .include files after they are created the first time, depending on your environment.

I switched from using the widely popular YYYYMMDDID format to epoch since the 5 minute interval requires hours, minutes and seconds to be effective and YYYYMMDDHHMMSS is too large a value for BIND (zone serials are unsigned 32-bit numbers). This resulted in a lower serial value – you may have to go around to your slaves and manually delete then reload their zone files.
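
The size problem is easy to check for yourself. DNS serials are unsigned 32-bit integers, so anything above 4294967295 is out. A quick sketch (the function name is hypothetical):

```shell
#!/bin/sh
# Succeeds if the argument fits in an unsigned 32-bit zone serial.
# The length guard keeps oversized strings away from the numeric test.
serial_fits() {
  [ "${#1}" -le 10 ] && [ "$1" -le 4294967295 ]
}

serial_fits 2013051400     && echo "YYYYMMDDID fits"
serial_fits "$(date +%s)"  && echo "epoch fits"
serial_fits 20130514161526 || echo "YYYYMMDDHHMMSS does not fit"
```

Epoch seconds will keep fitting until 2106, which I am comfortable calling someone else's problem.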

This approach ends up generating a lot of NOTIFY traffic since every 5 minutes (or whatever interval you cron the shell script at) a new serial is loaded and all of your slaves have to be contacted. A more graceful improvement would be to save the state that each host is in inside of a temporary file and only update the serial when there has actually been a change in the status of your pool.
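
Here is one way that improvement could look, as a rough sketch. Instead of a separate state file it compares a freshly generated candidate against the live ips.include, ignoring the timestamp comment. The function names and the .new file suffix are assumptions of mine.

```shell
#!/bin/sh
# Print only the record lines of a zone fragment, sorted, so that the
# "; Generated by ..." timestamp never counts as a change.
records_only() {
  grep -v '^;' "$1" 2>/dev/null | sort
}

# Replace <live> with <candidate> only when the records differ.
# Returns 0 if the live file was replaced, 1 if nothing changed.
install_if_changed() {
  candidate=$1 live=$2
  if [ "$(records_only "$candidate")" = "$(records_only "$live")" ]; then
    rm -f "$candidate"
    return 1
  fi
  mv "$candidate" "$live"
}

# In the monitor script: write the ping results to ips.include.new, then
#   if install_if_changed ips.include.new ips.include; then
#     ...write a new soa.include and run rndc reload...
#   fi
```

With this in place the serial only moves when a host actually enters or leaves the pool, so the slaves are only bothered when there is something new to fetch.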

Another neat thing I thought of trying was using something like heartbeat for real-time monitoring and nsupdate to dynamically update the zones. This should narrow the propagation latency on your side of the equation down to the barest minimum possible.
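
I haven't built this, but for the curious, BIND's nsupdate tool reads a small batch of commands on stdin, so the DNS half could be sketched like this. It assumes the zone is configured to accept dynamic updates signed with a TSIG key; the function name and key path are placeholders. Keep in mind that once a zone is dynamic, BIND expects to own the zone file, so this would replace the $INCLUDE approach rather than sit alongside it.

```shell
#!/bin/sh
# Emit an nsupdate batch that adds one address to, or removes one from,
# the pool. The ttl argument is only used for "add".
# usage: pool_update <add|delete> <name> <ttl> <ip>
pool_update() {
  case "$1" in
    add)    echo "update add $2 $3 A $4" ;;
    delete) echo "update delete $2 A $4" ;;
    *)      echo "usage: pool_update <add|delete> <name> <ttl> <ip>" >&2
            return 1 ;;
  esac
  echo "send"
}

# Pipe the batch into nsupdate (the key path is a placeholder):
#   pool_update delete somedomain.com. 300 10.0.0.12 | nsupdate -k /path/to/tsig.key
```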

Nine Web and Server Hosting Providers I Hate

I’ve had a lot more crummy experiences with web hosts, collocation and dedicated server providers than I have glowy, happy ones – as seems to be the case for most everyone in this industry subject to a budget. For your benefit (and my therapy) here are some of the crappier ones I have had the pleasure of leaving, and why:

1&1
One and one is probably best described as the Wal-Mart of Internet Service Providers. If you don’t expect a lot and you’re happy with cheap crap it could be right for you. Tickets generally go unresolved for one or two full days and their shared hosting servers have hidden limitations even where unlimited resources are advertised. In particular if your site starts receiving “too much” traffic they will start tossing 500 errors and their techs may take days and days to apologetically tell you they have no idea why.

Netfirms
One and one of the North – Netfirms’ service is based in Toronto, Ontario. They are so oversold that if you do not notice the hit to your page load time the minute you move your site there, it only speaks to what kind of host you were coming from. Netfirms shared hosting also suffers from hidden limitations (CPU and RAM utilization), which are actually much lower on the higher-priced business package than 1&1’s low-end package.

Hostway
Hostway is yet another crappy discount hosting provider that was at one time a decent place to grab certain domain names on the cheap despite their chronically buggy administration interface. Hostway seems to outsource most of their technical support (as with 1&1), switching from some joint in India to some joint in Russia or another ghastly eastern European country some time during my tenure. The price increase on domains alone would have been enough to drive me away if it weren’t for the fact that I have had multiple tickets with them open for literally weeks at a time. I’ve learned that you can often judge a service provider by the quality of their VoIP system – if the hold music sounds fuzzy or cuts out a lot, run. Run and never look back.

Hosting Check
NoMonthlyFees.Com was doing fine until it was bought out by Hosting Check. Their service is now so oversold that even the lowest-traffic sites are having serious availability issues, never mind speed issues. Hosting Check charges $20/yr for dot coms – at that price I expect them to be grown organically and harvested on a fair-trade farm.

PaylessDomains.ca
I’ve never used Payless for anything but .ca domain name registrations so they wouldn’t have made it on the list if their (outgoing) domain transfer function hadn’t been suspiciously broken ever since I started using them over three years ago. Somehow, continuing to renew domains at a registrar whose prices I had matched a long time ago for the simple fact that I was too lazy to contact their technical support every time the front end told me an error occurred trying to unlock the domain made me feel like I was being taken advantage of a little. My fears were confirmed when I e-mailed in and was asked to provide a reason for wanting to transfer my domain.

I told them it was because they are crooked. My authorization code came in a template e-mail, further supporting the theory that the web-based function to unlock and obtain the authorization code was never intended to actually work. Dirty dirty dirty. Mass migration ensued.

3z Canada
3z is a discount collocation provider operating out of 151 Front St. Toronto, Ontario. Their business almost entirely consists of Chinese clients who need servers with a Canadian presence. I don’t have a lot of nasty things to say about them because I had to virtually fight with them to take my money for a month before I decided to stop wasting my time and get colo elsewhere. Interestingly, in the same breath the Moxie Communications/Secure Access Colo owner told me 151 residents are a back-stabbing rumour-driven lot, he told me he had once rented rack space to them and found them “taking liberties” with some of his and other tenants’ free space that had not been properly negotiated for. Of course, how much truth there is in that rumour I can’t say because the Moxie owner is himself a greaseball of epic proportions (see below).

Carat Networks
Carat Networks, formerly “Clearance Rack”, is based in Hamilton, Ontario. It is a discount colo and dedicated provider that once operated out of a 10×10′, chronically overheating telecommunications “bunker” out in the middle of some guy’s farm. They now occupy Hamilton’s Mountain Cable facility where the customer service and competence are still as low as the prices.

Secure Access Colo
Another member of the had-to-change-its-name club, Secure Access Colo was formerly known as Moxie Communications. Staffed by imbeciles and at the top of my personal shit list Secure Access Colo actually extorted yours truly out of almost 2 grand. Before I get into that let me take the piss out of them a little so you don’t think this is entirely personal:

  • Redundant physical loop: no
  • Generator: no
  • VoIP (support) not affected by DC going down: no
  • Secure access: Lock on the rack, RFID at the front door – and they will swear up and down the $0.15 RFID card is actually worth $100; no one else plays that guff as it’s generally recognised that you pay a $100 deposit for access cards because that’s just the industry standard. You bend over or you don’t.
  • Rooftop security sensors: True, but not unique and shouldn’t be the crown jewel of a physical security marketing strategy.

I’m not sure what makes Secure Access Colo think it deserves to put Secure Access right in its name. Mediocre at best, except for their one room at 151 Front – where physical security isn’t up to them anyway. Despite the rooftop sensors there’s both a regular door and loading door in the back and if you really want to get in the easiest way is probably through the huge conference room windows or the RFID controlled front doors (one after the other), also almost entirely glass.

Anyway, after they sent out a client-base-wide notification that we would have to switch IPs after a certain date, their upstream provider (at the time Carrier Connex) cut their pool “early”, leaving us totally disconnected before we had even been assigned our new IPs. As it turns out, the date Moxie had given us had only been arranged as an extension on the actual cutoff date with Carrier Connex – unofficially, verbally and with a low level representative. Secure Access/Moxie was not actually paying for IP service up to that date and they were operating on little more than assurances. They offered emergency remote KVM so their clients could log in and assign new IP addresses but I was so incredibly pissed off by this I demanded the termination of my service.

Apparently this was a big mistake: they cited a contract which I had neither a record nor a recollection of signing and which they consistently failed to produce. They demanded payment for the remaining 6 months of a one year term or they would not release my servers because my decision to terminate was “no fault of theirs” despite the fact that they were clearly negligent in maintaining or extending their contract with Carrier Connex until at least the date set forward in their notification. They held my servers hostage and disconnected for two weeks while I conferred with my lawyer, who eventually made it clear that it would be cheaper to pay the bastards than pay his bill – the only way to get your servers back online in a situation like this is to file an injunction first and sue later, which is just not in the budget of a young entrepreneur.

Nonetheless, if I didn’t have a greater obligation to my clients I would have been thrilled to have my day in court – these scum truly deserve everything horrible that happens to them.

You can tell we are dealing with some brilliant people by how well their front page photo turned out. You would have to be an asshat to give them your money (in my defence their old site was slightly more convincing).

iWeb
iWeb is based in Montreal, Canada and is rated one of Canada’s top 100 employers by Ernst & Young – I believe it! I was happy to swallow 24 hour+ turnarounds for support tickets in exchange for their very reasonable prices when I was only dealing with them for my personal interests, but when it came time to move a business client in I felt five days with no communication after proving there was a bad stick of RAM in my freshly provisioned server was more than unacceptable. Expect two to three days minimum to get your new servers provisioned. iWeb is notorious for their slow (and patently lazy) support and this is why I am sure they are one of the best companies in the nation to work for. I envision Google-esque “crash zones,” fully stocked beer fridges and plush couches where the technical support staff spend their whole days dispensing witty anecdotes about irate anglophone clients amongst one another.

I was unfortunate enough to be a customer of iWeb’s during a particularly long (two weeks?) DDoS against an isolated portion (commercial DNS services) of their network. In their defence a 30 Gbit/s attack is a significant event, but knowing how networks like theirs are constructed I am disappointed in their apparent inability to protect clients on unrelated segments from collateral damage.

Consider iWeb if you want a sweet server and loads of bandwidth for great prices and can live without reasonably paced technical support (i.e. a mirror, game server, redundant DNS).

Forget about iWeb if you owe any level of care to your clients. You will be made to look a fool and possibly lose your contracts as a result of their slow support – worse you will know the whole time it is happening that things are entirely out of your hands.

On an interesting note, it turns out iWeb (unintentionally) hosts a number of Syrian government sites by way of a third party reselling their services. Their official response:

Several media outlets have recently published articles based on a report from the Citizen Lab organization, which can be found here: http://citizenlab.org/wp-content/uploads/2011/11/canadian_connection.pdf

The report, entitled “The Canadian Connection: An investigation of Syrian government and Hezbullah web hosting in Canada” highlights a very important issue, providing a summary of what has become a complex problem for the web hosting industry.

The Canadian government has enacted regulations that restrict Canadian firms such as iWeb from doing business with certain foreign individuals and entities. iWeb is committed to strict compliance with these laws and continues to monitor its compliance.

In 2008, iWeb inadvertently hosted two websites affiliated with Hezbollah. When iWeb learned of the websites’ affiliation, it cancelled its web hosting services.

Canada has enacted targeted sanctions against certain Syrian government entities and individuals. Canada has not enacted a broad embargo against doing business with Syria.

The Citizen lab report identified a number of Syrian government entities for which the internet address resolves directly or indirectly to iWeb. With one exception, none of the listed entities are subject to Canadian sanctions. The exception is Addunia T.V. which was listed as a sanctioned entity on October 3, 2011. iWeb has not provided any services directly to Addunia T.V. and is investigating whether its facilities have been used by one of its customers for the benefit of Addunia T.V. without its knowledge. iWeb will be taking all appropriate steps in light of its findings.

If you feel you must go with iWeb I at least strongly encourage you to pay entirely by credit card. This way a chargeback can be made against them in case their billing department decides to dick you around.

Please learn from my mistakes and avoid these companies like the plague. In the wonderful world of hosting, dedicated servers and collocation you don’t always get what you pay for.

Just most of the time.

foxpa.ws
Made in Canada  •  There's a fox in the Gibson!  •  2010-12