Transparent Proxy for Hot Spot/Public Network Web-Based Authentication on ClearOS

Despite the title, neither making a hot spot nor a "public" network is my intent in this article, but what it covers can be applied directly to those situations.

One of my clients is a small ISP and colocation datacentre whose network is configured 100% statically on both sides. This sounds tedious at first, but consider that we can use a DHCP server to direct un-configured clients (a new server or virtual machine, a new router, a wiped configuration, a new computer, etc.) to a gateway running a transparent web proxy that forcefully tells them to call in and have their router/host reconfigured remotely or by phone.

This ensures that clients call in quickly (because they can't do anything but see the instructions telling them to) rather than assume there is a long-but-temporary problem with the connection, leading to unjustified dissatisfaction with the service. We also want to try to outrun any rogue DHCP servers that might be out there because some asshat plugged his D-Link in backwards.

I created a VM a couple of years ago that did just this, but it has succumbed to file system corruption and they would like a new one. In the last week I have been drafting an article on using transparent proxy techniques to provide blanket anonymity to a private network, quelle coincidence! Since it's Christmas Eve (and I'm a little drunk) I thought I might get fancy and slap a "technician's login" on this one so the field techs can do quicker and easier testing by using DHCP.

My client has multiple peers and subnets; if clients who have dropped their configuration were simply allowed to use DHCP as a "fallback" network providing the illusion of service continuity, we would end up with several problems:

  • Clients with public IP addresses who depend on them for incoming connections would probably not notice until "the phones start ringing" because from their side of the NAT they are still able to browse the web etc.
  • Degradation of service where bandwidth controls and Quality of Service have been implemented; defending against accusations of not meeting Service Level Agreements
  • Running the risk of overloading a single pipe with fallback users if they go unattended
    • Load balancing can help address this but has its own caveats
  • IP-based accounting and service monitoring will be negatively impacted

Before we begin you will need a working deployment of ClearOS, either installed to a dedicated physical server or running in a virtual machine. You can use my pre-installed virtual machines to get running quickly.

Note: Virtual machines should be given at least 300MB RAM to work with when using Squid or you may encounter counter-intuitive out-of-memory issues.
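If you need to squeeze below that, Squid's memory appetite is mostly governed by two directives. A sketch with illustrative values (the directive names are standard Squid; the numbers are only a starting point, not ClearOS defaults I have verified):

```
# /etc/squid/squid.conf -- illustrative values for a RAM-starved VM
cache_mem 8 MB                               # in-memory hot-object cache
cache_dir ufs /var/spool/squid 100 16 256    # 100MB on-disk cache, 16x256 subdirectories
```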

We will need to install a few modules from ClearSDN. If you are using the above-listed virtual machine images you must load the following address into your browser before you can register with ClearSDN: https://lan-ip:81/admin/register.php?Reset=yes . In webconfig, go to Software Modules under the ClearCenter menu and ensure the following modules are installed:

  • Content filtering module (DansGuardian)
  • Web proxy server module (Squid)
  • Caching nameserver module (provides DHCP)
  • Firewall - custom rules

First you'll need to set up a DHCP pool on the private network. Click on DHCP Server in the Network menu. Delete the default pool, then click the Add button next to your private network's entry. I used a /24 for the private network, and since the whole thing will be dedicated to DHCP clients it's safe to set the pool from .2 to .254. Once you have saved the pool settings, click the Start button at the top of the page to activate dnsmasq's DHCP functionality.
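For the curious, webconfig translates that pool into dnsmasq configuration along these lines (a sketch only; the subnet and interface name are placeholders, and ClearOS's exact file layout may differ):

```
# dnsmasq DHCP sketch -- substitute your own private subnet and LAN interface
interface=eth1
dhcp-range=192.168.1.2,192.168.1.254,12h        # the .2 to .254 pool, 12-hour leases
dhcp-option=option:router,192.168.1.1           # hand out the gateway itself
```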

Click on Web Proxy under the Gateway menu. Click the To Auto button then the Start button at the top of the page. Enable Transparent Mode and Content Filter then click the Update button.

Click on Content Filter under the Gateway menu. Click the To Auto button then the Start button at the top of the page. Check the Blanket Block checkbox and click the Update button.

Set up a host to use DHCP and acquire a lease. Try to load any website. If you were successful you should be presented with a nice ClearOS page telling you the content filter has blocked the request. We can replace this page with our own custom sign-in page if we drop it in /var/webconfig/htdocs/public/filtered.inc.php:

        WebHeader("CONFIGURATION ERROR", "splash");
        print("<div style=\"margin-top: 100px;\"></div>");
        WebDialogWarning("Your router or computer is not configured correctly. Please call tech support immediately at (XXX) XXX-XXXX.<br>
You may be asked to provide the make of your router and this IP address: <strong>{$_SERVER['REMOTE_ADDR']}</strong>.");
        print("<form style=\"margin: 0px; padding: 0px;\" action=\"https://{$_SERVER['SERVER_ADDR']}:81/admin/tech.php\" method=\"post\">");
        WebTableOpen("Technician's Login", "600");
        echo "
            <tr>
                <td class='mytablesubheader' nowrap width='200'>Username</td>
                <td><input type=\"text\" name=\"username\"></td>
            </tr>
            <tr>
                <td width='200' class='mytablesubheader' nowrap>Password</td>
                <td><input type=\"password\" name=\"password\"></td>
            </tr>
            <tr>
                <td class='mytablesubheader' nowrap>&nbsp;</td>
                <td style=\"text-align: center;\"><input type=\"submit\" value=\"Login\"></td>
            </tr>";
        WebTableClose();
        print("</form>");
The layout broke a little when I changed the logo image, so I added the 100px div at the top to space things out properly; you may not need it. As you can see, we will be processing the request via the webconfig vhost on port 81. This lets us take advantage of its user permissions and function libraries. It does add a kink in fluidity, however, as the first time someone logs in they will have to accept a self-signed certificate.

UPDATE: After deciding against using the ClearOS API to manipulate netfilter, I realized that using webconfig on port 81 for the authentication script is not necessary; in fact, it probably never was, since :82 is also a webconfig vhost.

Before we make our login processor we need to prep the firewall. Only port 80 is under our control at this point, but knowledgeable clients can still tunnel their connections through SSH, VPNs and so forth. If you have installed the custom firewall rules module, edit /etc/rc.d/rc.firewall.custom; otherwise use /etc/rc.firewall.local. The difference is that the custom firewall module provides an interface through webconfig. Add:

iptables -t nat -A PREROUTING -d ! $LAN_ADDRESS -j DROP # Disable NAT

where $LAN_ADDRESS is the IP of your LAN interface. In our authentication script we'll be using this line to enable access and bypass proxying for individual IPs:

iptables -t nat -I PREROUTING -s $IP_ADDRESS -j ACCEPT
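To make the mechanics concrete, here is a shell sketch of the exact commands our authentication script will issue for a validated client (the helper function names are mine, not part of ClearOS):

```shell
# Build the grant/revoke commands run on behalf of a validated client IP.
# -I inserts the ACCEPT at the top of nat PREROUTING, above both the Squid
# REDIRECT and our catch-all DROP, so the IP bypasses the proxy entirely;
# -D later deletes that same rule.
grant_cmd()  { echo "sudo /sbin/iptables-bin -t nat -I PREROUTING -s $1 -j ACCEPT"; }
revoke_cmd() { echo "sudo /sbin/iptables-bin -t nat -D PREROUTING -s $1 -j ACCEPT"; }

grant_cmd 192.0.2.7    # prints the rule-insertion command for that IP
revoke_cmd 192.0.2.7   # prints the matching deletion command
```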

We also need to install the at daemon so we can put a time limit on the IP's connectivity:

# yum install at
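Once atd is running, each technician login will leave a pending job you can inspect; roughly like this (the job number and time shown are illustrative):

```
# After a login, the delayed rule-removal sits in the at queue:
#   $ atq
#   3   2010-12-24 18:03 a webconfig
#   $ at -c 3     # dumps the job; it ends with our iptables -D line
```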

It will automatically be set to start on boot if you install it via yum. Add webconfig to /etc/at.allow:

        webconfig

We also need to give webconfig (or httpd if you will be using the web server module) permission to manipulate the NAT table. Run visudo and add this line to the bottom:

webconfig ALL=(root) NOPASSWD: /sbin/iptables-bin

In my example authentication script below I'm going to use a simple array to store authentication credentials. It will be up to you to work in your own authentication system; I suggest either tying it into ClearOS's built-in LDAP implementation or using MySQL so that it can be managed through a web interface. Create tech.php (or whatever suits you) in /var/webconfig/htdocs/admin/:


<?php

$credentials['tech'] = 'tech1234';
$credentials['emergency'] = 'emerg1234';

$validated = false;

foreach ($credentials as $user => $pass)
        if ($_POST['username'] == $user and $_POST['password'] == $pass)
                $validated = true;

if ($validated) {
        // Open the firewall for this client and bypass the transparent proxy
        exec("sudo /sbin/iptables-bin -t nat -I PREROUTING -s {$_SERVER['REMOTE_ADDR']} -j ACCEPT");
        // Schedule removal of the rule in one hour via atd
        exec("echo \"sudo /sbin/iptables-bin -t nat -D PREROUTING -s {$_SERVER['REMOTE_ADDR']} -j ACCEPT\" | at now + 1 hour");
        header("Location: http://www.google.ca/");
} else {
        die("You have not been validated. Please go back and try again.");
}


You may choose to make the experience more fluid by passing the $url variable available from filtered.php as a hidden form element in filtered.inc.php then supplying it as the value for our Location HTTP header.

See Also:

Cleaning up Snort's Droppings on ClearOS

In the last couple of weeks a wave of attacks has seen the snort packet logs on a client's firewall fill the disc to capacity, causing all sorts of wonderful problems. Packet logging is optional and usually only worth the trouble if you are actively trying to diagnose an attack or a false positive, in which case it can be enabled at that time. For most folks it is simply a drag on performance and, if your storage is not well diversified, a hazard, as we have seen with this router.

Disable packet logging by editing /etc/init.d/snort to start the daemon with the -N flag:

  start)
        echo -n $"Starting $prog: "
        if test "x`/sbin/pidof snort`" != x; then
                echo ""
        fi
        # Add support for multiwan
        if [ -n "$EXTIF" ]; then
                for INTERFACE in $EXTIF; do
                        daemon snort -N -i $INTERFACE -D -c /etc/snort.conf
                done
        else
                daemon snort -N -D -c /etc/snort.conf
        fi
        [ $RETVAL -eq 0 ] && touch /var/lock/snort

Restart snort via its init script:

# /etc/init.d/snort restart

If you take a look at the logrotate configuration file for snort at /etc/logrotate.d/snort you'll see:

# A bit of a kludge here - the logrotate file is empty and
# created by /etc/rc.d/init.d/snort.
/var/log/snort/logrotate {
    postrotate
        tar -czf /var/log/snort.tar.gz /var/log/snort 2> /dev/null
        rm -rf /var/log/snort/[0-9]* /var/log/snort/snort.log.[0-9]* 2> /dev/null
        killall -HUP snort 2> /dev/null || true
    endscript
}
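The tar and rm steps are easy to exercise against a scratch directory if you want to see exactly what gets archived and what gets purged (safe to run anywhere; nothing here touches the real /var/log/snort):

```shell
# Rehearse the archive-then-purge logic from the logrotate script in /tmp
demo=$(mktemp -d)
mkdir "$demo/snort"
touch "$demo/snort/snort.log.1293148800" "$demo/snort/alert"
tar -czf "$demo/snort.tar.gz" -C "$demo" snort     # archive the whole directory
rm -rf "$demo"/snort/snort.log.[0-9]*              # then purge only the packet logs
ls "$demo/snort"                                   # 'alert' survives
```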

I'm not sure why the ClearOS people are using a "kludge" here; at best guess, the point is to put the snort.tar.gz archive directly under /var/log rather than in its own directory. Maybe it has to do with accommodating snort's built-in log rotation. I don't know. I don't really care.

If you're concerned about aesthetics, keep the init script from creating the blank file by commenting these lines out:

        # Creates a dummy file for /etc/logrotate.d/snort script
#       if [ -d /var/log/snort ]; then
#               echo "Used for logrotate... do not delete" > /var/log/snort/logrotate
#       fi

If I read that right it's saying "Used for logrotate... please delete."

# yes | rm -r /var/log/snort/*

Search Engines for Fun and Profit Interlude: The Bungle

In my last article I talked about using NFS to separate resources for indexing and querying. I mentioned my preference for using a third, dedicated file server for both the indexing and query servers. It didn't take long before the graphs told me that disabling the file attribute cache (the noac mount option), essential for the stable release's implementation of Lucene to work distributed across NFS, had decreased crawling efficiency about tenfold.

The outbound spike early on is my moving the /data/ directory to the NFS server. The large mutual inbound/outbound block is the traffic between the spider and NFS servers. In that run it indexed about 7000 pages. Where you see it drop off is when I re-mounted the share with the noac option. It indexed 300 pages.

I theorize that if you use the spider server as the NFS server for the query server you should be fine. You should still be able to use a third, dedicated NFS server if, like mine, your network has an unequal distribution of storage capacity versus processing capacity. It's simply a matter of daisy-chaining: mount /data/ from the NFS server onto the spider with attribute caching, then serve it from the spider to the query server without it. The spider benefits from the cache and the read-only query server always sees fresh data.
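Sketched as fstab entries, the daisy-chain looks like this (addresses bracketed as placeholders; paths as used elsewhere in this series):

```
# spider's /etc/fstab -- mounts from the file server WITH attribute caching:
[FILE_SERVER_ADDRESS]:/opt/open-search-server/data  /opt/open-search-server/data  nfs  defaults,rw  0 0

# query server's /etc/fstab -- mounts from the spider WITHOUT it:
[SPIDER_ADDRESS]:/opt/open-search-server/data  /opt/open-search-server/data  nfs  defaults,ro,noac  0 0
```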

The good news is there is a much less ass-backward way of doing this available in the latest developer and beta releases of OSS: index replication (and authentication!). I'm having trouble getting it to work flawlessly but the benefits are so tremendous I think it's going to be worth the wait. The short story is it will be easier to set up a dedicated spider in your home or office, anywhere there is a consumer grade connection and space and power are abundant so your server(s) can be cheap and huge. It is then merely a matter of setting up a VPN with your collocated/hosted servers to update the read-only index, all at once or periodically. That's zesty.

Search Engines for Fun and Profit Part Four: Separating Resources for Queries and Indexing

NOTE Before you go and do any of this please read Interlude: The Bungle.

If you're indexing a wide array of URLs you will quickly find indexing operations are load-intensive and this can affect the speed of delivery in your front-end. To overcome this problem we can put the index/ices on an NFS share and run two instances of OSS, one dedicated to crawling and one that will only serve queries. Virtual machines lend themselves especially well to this setup since you will be making an exact copy of the original server, plus some minor configuration changes.

You can either set up the original OSS server as the NFS server or use a third server. I prefer using a dedicated file server, though your resources may not permit it. If you are using a dedicated file server, first shut down OSS, then move the contents of the /data/ directory down a level. Mount the NFS share onto the /data/ directory, move the indices onto the share, then set the ownership of the mount point:

# chown oss: data/ -R

If you're going to use the indexing server as the NFS server your /etc/exports should look something like this:

/opt/open-search-server/data [QUERY_SERVER_ADDRESS](sync,no_subtree_check,ro,root_squash)

Note that if you are hosting the indices on a dedicated NFS server you should be using 'rw' instead of 'ro' in your exports file and in the fstab of the indexing server. Add this to the /etc/fstab of the query server:

[NFS_SERVER_ADDRESS]:/opt/open-search-server/data    /opt/open-search-server/data    nfs     defaults,ro,noac    0 0

It is important that you mount the share with the noac option or you may end up with java.io.IOException: Stale NFS file handle errors resulting from the file attribute cache lagging behind changes made by the indexing server. Now restart your nfs and netmount init scripts (where available) or mount the share manually:

# mount /opt/open-search-server/data

It's now safe to start OSS on the query server.

# cd /opt/open-search-server/
# sudo -u oss ./start.sh

Search Engines for Fun and Profit Part Three: Indexing your Sites

Once you have a working installation of OSS you will need to index some content so there is something to work with when implementing the front-end. Start by going to the OSS configuration interface at http://[server-address]:8080 and creating a new index with the web crawler template using the form on the front page. You can create multiple indices to offer up results for different sets of sites or fields. This makes OSS an ideal solution for search-as-a-service, as all of your clients can be consolidated on a single server and managed through a single interface.

Once you've created your index, select it and a tab menu will show up across the top of the page. Click the Crawler tab and, if it is not already selected, the Web sub-tab. Click on the Pattern List tab and add some sites to be indexed, following the instructions regarding wildcards:

Enter http://www.open-search-server.com if you only want to crawl the home page
Enter http://www.open-search-server.com/* if you want to crawl all the content
Enter http://www.open-search-server.com/*wiki* if you only wish to crawl URLs containing the word "wiki" within the open-search-server.com domain.
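The wildcard semantics are essentially shell globbing, which you can demo in a terminal (an analogy only; OSS implements its own matcher internally):

```shell
# Approximate the pattern rules with shell glob matching
matches() { case "$2" in $1) echo yes ;; *) echo no ;; esac; }

matches "http://www.open-search-server.com"        "http://www.open-search-server.com/docs"      # no: exact match only
matches "http://www.open-search-server.com/*"      "http://www.open-search-server.com/docs"      # yes
matches "http://www.open-search-server.com/*wiki*" "http://www.open-search-server.com/wiki/Main" # yes
```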

Click the Add button, then the Crawl process tab. Change the UserAgent to something relevant, then tune the timing settings to be as timid or aggressive as your situation requires. Start indexing by de-selecting the Dry run check-box (if selected), selecting the Optimize check-box and clicking the Not running - Click to start button. Your statistics and threads panes should begin to populate.

See the Quick Start Guide to Crawl the Web for screencaps of this process.

Errors I have encountered while crawling include:

Error (org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: Lock@/opt/open-search-server/data/furfinder/index/20101207171650/write.lock)

This is caused by the write.lock lockfile being left over from an unclean shutdown. Simply delete the file and start crawling again.

Error (background merge hit exception: _1k8:C27497 _1kj:c1116 _1kk:c27 _1kl:c4 _1km:c13 into _1kn [optimize])

Lucene, the "guts" behind OSS, is having trouble optimizing the index after the crawl. Reading the catalina.out file in Tomcat's logs directory indicated that there was not enough free storage to work with, so the /data/ directory was moved off of the VM and onto a file server.