=^.^=

Gentoo Xen 4 Migration

I was stoked to find out Xen 4 had finally made it into portage a couple of weeks ago. The improvements are so sweet I had been checking gentoo-portage.com every few days or so. Let's take a look at a few of the advances Xen has made since 3x:

  • Better performance and scalability: 128 vcpus per guest, 1 TB of RAM per host, 128 physical CPUs per host (as a default, can be compile-time increased to lots more).
  • Blktap2 for VHD image support, including high-performance snapshots and cloning.
  • Improved IOMMU PCI passthru using hardware accelerated IO virtualization techniques (Intel VT-d and AMD IOMMU).
  • VGA primary graphics card passthru support to an HVM guest for high performance graphics using direct access to the graphics card GPU from the guest OS.
  • TMEM allows improved utilization of unused (for example page cache) PV guest memory. More information: http://oss.oracle.com/projects/tmem/
  • Memory page sharing and page-to-disc for HVM guests: Copy-on-Write sharing of identical memory pages between VMs. This is an initial implementation and will be improved in upcoming releases.
  • New Linux pvops dom0 kernel 2.6.31.x as a default, 2.6.32.x also available. You can also use linux-2.6.18-xen dom0 kernel with Xen 4.0 hypervisor if you want.
  • Netchannel2 for improved networking acceleration features and performance, smart NICs, multi-queue support and SR-IOV functionality.
  • Online resize of guest disks without reboot/shutdown.
  • Remus Fault Tolerance: Live transactional synchronization of VM state between physical servers. Run guests synchronized on multiple hosts simultaneously to prevent downtime from hardware failures.
  • RAS features: physical cpu/memory hotplug.
  • Libxenlight (libxl): a new C library providing higher-level control of Xen that can be shared between various Xen management toolstacks.
  • PV-USB: Paravirtual high-performance USB passthru to both PV and HVM guests, supporting USB 2.0 devices.
  • gdbsx: debugger to debug ELF guests
  • Support for Citrix WHQL-certified Windows PV drivers, included in XCP (Xen Cloud Platform).
  • Pygrub improvements: Support for PV guests using GRUB2, Support for guest /boot on ext4 filesystem, Support for bzip2- and lzma-compressed bzImage kernels

What tickles me the most is the Remus Fault Tolerance. It basically lets you run a standby instance of a VM on a different physical server and constantly updates that VM with the master's state, I/O etc. If the master VM dies, the standby kicks in so fast there may be no perceivable downtime. I've been dying for something that provides solid HA that's well supported and works out of the box - not to mention does its job transparently for years. Now that it's a core feature of Xen, competing technologies will be compelled to introduce their own easy-to-use HA solutions, which I hope could usher in a golden age of reliability.

The original intent was to migrate 9 physical 32-bit servers the week it came out, most of them running kernel 2.6.21 on Xen 3.2.1. This was not to be. I was determined to make the new 2.6.32 kernel work (I had heard that .32 was going to be the new .18 in terms of adoption/support) and the thing just won't work with megaraid. I haven't tried it with cciss yet and I don't intend to; for the time being I have downgraded all of the dom0 kernels to 2.6.18. Things seem to be very stable now and much faster.

PAE, or Physical Address Extension, allows 32-bit processors to address up to 64GB of memory. When I first started working with Xen I had no idea that I had omitted PAE from my hypervisor build (it is not a default USE flag) nor that every shrinkwrapped Xen kernel out there required it. To make matters worse, the 2.6.21 dom0 kernel I was using on all of the servers for the sake of consistency lacked the ability to enable PAE at all - even by manually editing the .config, something I still haven't figured out. That kernel was eventually hard masked, then removed from portage. This situation cost me a lot of time because every new image I wanted to import required special preparation to work with my "foreign" domU kernel and without pleasantries like initramfs and pygrub.

I'm going to start off by showing you the make.conf that will be used in this article:

CFLAGS="-O2 -march=pentium4 -pipe -fomit-frame-pointer -mno-tls-direct-seg-refs -fstack-protector-all"
#CFLAGS="-O2 -march=pentium4 -pipe -fomit-frame-pointer -mno-tls-direct-seg-refs"
CXXFLAGS="${CFLAGS}"
CHOST="i686-pc-linux-gnu"
MAKEOPTS="-j4"
GENTOO_MIRRORS="http://gentoo.osuosl.org/ http://distro.ibiblio.org/pub/linux/distributions/gentoo/ http://www.gtlib.gatech.edu/pub/gentoo "
SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
USE="-alsa cracklib curlwrappers -gnome -gtk -kde -X -qt sse png snmp cgi usb cli berkdb bzip2 crypt curl ftp ncurses snmp xml zip zlib sse2 offensive geoip nptl nptlonly acm flask xsm pae pygrub xen"
FEATURES="parallel-fetch -collision-protect"
LINGUAS="en_CA en"

If you're upgrading from an existing Xen installation you may want to disable collision protection; past experience with 3x upgrades has been that sometimes portage will see every Xen-related file in /boot as a potential conflict. Note the second CFLAGS line that's commented out; some packages don't compile well (or at all) with Stack Smashing Protection (particularly glibc), so I update them individually with the second CFLAGS enabled before any sort of emerge world or deep update. SSP is enabled by default if you are using the hardened profile. I choose not to use the hardened profile because it can be needlessly problematic/inflexible and in this application the term "hardened" is misleading. If your dom0 isn't going to be exposed to the Internet (and it certainly should not be) it might be safe to omit -fstack-protector-all altogether, but there is no such thing as paranoia. Also bear in mind that an attacker gaining access to a dom0 can be more devastating than an attacker gaining access to all of the VMs running on it individually.

Depending on the circumstances at the time you read this, one may or may not have to add the following to /etc/portage/package.keywords:

app-emulation/xen
app-emulation/xen-tools
sys-kernel/xen-sources

Sync portage and run emerge --update --deep --newuse world --ask. If you see xen 4 in the package list you're on the right track. Compile away. You might be interested in this article on global updates with gentoo.

Once Xen has been upgraded it's time to build the new kernels. Follow the usual routine, making sure to enable Xen backend drivers in the dom0 and frontend drivers in the domU. I like to make a monolithic domU kernel so there's no mess with installing or updating modules to the VMs. Make sure you have IP KVM/console redirection if you're going to be booting this machine remotely and a non-Xen fallback kernel configured in /boot/grub/grub.conf in case the hypervisor fails. Xen and some x86 BIOSes can be configured to use a serial console; a null modem to another server in the rack is often all you need.

I got all sorts of shit from the 2.6.32/34 kernels. For instance, the kernel won't build properly if you enable Export Xen attributes in sysfs (on by default in 2.6.32-xen-r1). I got this message at the end of make and was not particularly successful at tracking down solutions:

WARNING: vmlinux.o (__xen_guest): unexpected non-allocatable section. Did you forget to use "ax"/"aw" in a .S file? Note that for example <linux/init.h> contains section definitions for use in .S files.

I don't know what - if anything - needs the Xen /sys interface to work, so it's probably no big deal.

When trying to compile at least versions 2.6.32 and 2.6.34, if the Xen compatibility code is set to 3.0.2 you can expect this error at the end of the kernel build:

  MODPOST vmlinux.o
WARNING: vmlinux.o (__xen_guest): unexpected non-allocatable section.
Did you forget to use "ax"/"aw" in a .S file?
Note that for example <linux/init.h> contains
section definitions for use in .S files.

  GEN     .version
  CHK     include/generated/compile.h
  UPD     include/generated/compile.h
  CC      init/version.o
  LD      init/built-in.o
  LD      .tmp_vmlinux1
ld: kernel image bigger than KERNEL_IMAGE_SIZE
ld: kernel image bigger than KERNEL_IMAGE_SIZE
make: *** [.tmp_vmlinux1] Error 1

This seems to be fixable by upping the lowest version to at least 3.0.4.

In all cases 2.6.29, 2.6.32 and the yet-un-portaged 2.6.34 kernels panicked on bootup if the megaraid driver was compiled in or made available in an initrd. After four days of dusk-until-dawn tinkering I got tired of fucking with it and decided to go with 2.6.18, which compiled and booted without a hitch.

Previously I had been doing all sorts of contorted things to the networking configuration but since I was dealing with a clean slate anyway I decided to set things up the Gentoo way. The Gentoo way of Xen networking is to abandon Xen networking. Suddenly life's great. In the set of four servers I migrated this week all of them have one physical interface on an external-facing VLAN and another interface on an internal VLAN as depicted in the diagram (left). I wanted to make it so I could take a router VM and move it from physical server to physical server as quickly as possible, and this is how I did it (thanks xming on the Gentoo forums):

  1. Edit /etc/conf.d/rc and change RC_PLUG_SERVICES to look like this: RC_PLUG_SERVICES="!net.*". This will prevent Gentoo's hotplug script from automatically starting your interfaces on bootup.
  2. Remove existing interfaces from default runlevel, i.e. rc-update del net.eth0 default
  3. Configure one bridge that connects to the external VLAN and one bridge that connects to the internal VLAN
    config_extbr0=("null")
    bridge_extbr0="eth0"

    config_xenbr0=("x.x.x.x/24")
    bridge_xenbr0="eth1"
    routes_xenbr0=("default via x.x.x.y")

  4. Create init scripts for the new bridges, e.g. cd /etc/init.d; ln -s net.lo net.extbr0; ln -s net.lo net.xenbr0
  5. Add the bridges to the default runlevel: rc-update add net.extbr0 default; rc-update add net.xenbr0 default
  6. Edit /etc/xen/xend-config.sxp and comment out (network-script network-bridge) and add (vif-script vif-bridge bridge=xenbr0)
  7. Edit VM configuration files, edit VIFs to connect to the appropriate bridge, i.e: vif = ['mac=00:16:3e:XX:XX:XX,bridge=xenbr0' ]
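
As a rough sketch of step 7 for a router VM that needs a leg on each VLAN: xm-style domU config files are read as Python, so the fragment below is nothing more than Python assignments. The name and memory values are placeholders invented for illustration, and the MAC addresses are left elided exactly as in the step above.

```python
# Hypothetical xm domU config fragment for a router VM.
# "name" and "memory" are illustrative placeholders, not real values.
name = "router-vm"
memory = 256  # MB, assumed for the example

# One virtual interface per bridge: external VLAN first, internal VLAN second.
# Replace the XX bytes with real MAC addresses; they are elided on purpose.
vif = [
    "mac=00:16:3e:XX:XX:XX,bridge=extbr0",
    "mac=00:16:3e:XX:XX:XX,bridge=xenbr0",
]
```

Because both bridges are created the same way on every physical server, a config like this should be able to move between hosts without touching the VIF lines.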

I found that my Gentoo VMs needed this line added to their config in order to get any connection to xenconsole at all:

extra="xencons=tty console=tty1"

My ClearOS VMs, for the first time running the kernel that ships with them, needed a more dramatic approach. I added this line to their config files:

extra="xencons=ttyS"

and then in the VM's /etc/inittab I added this line to make it talk on what would be its serial port:

s0:12345:respawn:/sbin/mingetty ttyS0

I had some minor complaints from init about the dom0 kernel being too old for some udev feature so I added this line to /etc/portage/package.mask and rebuilt it:

>sys-fs/udev-124-r2

Bad Snort Rules

Protecting a busy web platform or ISP with snort is challenging due to the large number of false positives it generates in these environments. This post will serve as a compilation of snort rules I think should be disabled or tuned out of the box; it will be updated as I find new ones. The only way to tell if snort is interfering with legitimate traffic is to monitor its logs over a long period and look up the rules which appear most often. Make sure these rules don't do anything that could interfere with your regular operations - for example a rule to block the slammer worm that hits a dozen times a day probably has nothing to do with your e-mail troubles but a rule for an outdated MTA might explain why Blackberry's servers are being blocked every time a client tries to check their mail.

On ClearOS systems snort rules are located in /var/lib/suva/services/intrusion-protection/rules/ (formerly /var/lib/suva/services/snort/rules/). To find a rule, change to that directory and run:

# grep -n sid:number *

and the file name and line number will be printed next to the rule's line. Disabling a rule is as simple as prefixing it with a hash mark (#) and restarting snort:

# /etc/init.d/snort restart
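
If you would rather script the hash mark than edit by hand, the idea can be sketched in a few lines of Python. This is only an illustration: disable_sid is a made-up helper name, the two sample rules are invented for the demonstration (they are not real snort rules), and it works on text in memory rather than touching the rule files. Note that it matches the SID exactly, whereas a plain grep for sid:1091 would also hit sid:10910.

```python
import re

def disable_sid(rules_text, sid):
    """Return rules_text with every active rule carrying this exact SID commented out."""
    # Match "sid:<number>;" exactly so that 1091 does not also match 10910.
    pattern = re.compile(r"\bsid:\s*%d\s*;" % sid)
    out = []
    for line in rules_text.splitlines(keepends=True):
        if pattern.search(line) and not line.lstrip().startswith("#"):
            line = "#" + line  # same effect as adding the hash mark by hand
        out.append(line)
    return "".join(out)

# Invented sample rules, for demonstration only.
sample = (
    'alert tcp any any -> any 80 (msg:"example one"; sid:1091; rev:1;)\n'
    'alert tcp any any -> any 80 (msg:"example two"; sid:10910; rev:1;)\n'
)
print(disable_sid(sample, 1091))
```

Running it twice changes nothing the second time, since already-commented lines are skipped. Snort still needs the restart shown above before the change takes effect.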

I've started a thread in the Clear forums so everyone can share and discuss their false positives: http://www.clearfoundation.com/component/option,com_kunena/Itemid,232/catid,7/func,view/id,12205/

SID 1091:
False positive on any request containing ?????????? in the URL

SID 1233:
"I noticed, what I believe to be a false positive in rule 1233 (Outlook EML access).: a user was accessing hotmail and the GET was for /i.p.emlips.gif. Since the rule is looking for uricontent:".eml", an alert was triggered. "
http://www.networksecurityarchive.org/html/Snort-Signatures/2005-06/msg00008.html

SID 1390:
False positive on payloads containing CCCCCCCCCCCCCCCCCCCCCCCC, brought to my attention on the ClearOS forums (thanks Kevin and Tim)
http://www.clearfoundation.com/component/option,com_kunena/Itemid,232/catid,8/func,view/id,9967

SID 1394:
False positive on payloads containing 'aaaaaaaaaaaaaaaaaaaaa'

SID 1807:
False positive on any request where the string "chunked" precedes the Transfer-Encoding header. A fix is available at:
http://osdir.com/ml/security.ids.snort.sigs/2004-07/msg00018.html

SID 2109:
False positive on any attempt to retrieve e-mail from BlackBerry servers. An interesting note about this: I had dozens of users with BlackBerries on the network running just fine for years, then all at once a few months ago - without even touching the software - snort started blocking the BlackBerry pop3 IP pool and the complaints came flooding in. When I researched it at the time it seemed as though this false positive was reported on at least one mailing list between 2004-6; perhaps RIM switched "back" to the would-be offending platform.

SID 2229:
This rule gives a false positive for ANY request to viewtopic.php, taking out any visitors to a phpBB installation. There is a fix for this rule outlined in this e-mail from 2004:
http://www.webservertalk.com/archive253-2004-11-452217.html

SID 2000540:
False positive on SSL connections with Google Gmail web servers
It appears to be generating false positives for fragmented ACK packets intended to match a form of Nmap scan
http://www.clearfoundation.com/component/option,com_kunena/Itemid,232/catid,8/func,view/id,19544/#19551

SIDs 1000000211, 1000000212, 1000000213, 1000000214:
"These are known to cause issues with Gallery2 (part of the web PHP rulesets)"
Tim Burgess supplied these on Clear's forum, I think the 1000000000-range rules may be unique to ClearOS.
http://www.clearfoundation.com/component/option,com_kunena/Itemid,232/catid,7/func,view/id,12205/#12227

SID 966:
This rule has caused me a lot of problems when uploading large binary files, though I'm not sure why since only the URI should be scanned and not entire POST variables. May be a bad one for me only.

Foolproof Gentoo World Update Build Order

Users new to Gentoo often end up breaking their system after their first global package update - particularly if they stopped reading the manual as soon as they got their system booting (I'm usually one of those). Over the years I have run into every problem that portage can imaginably throw at a person and from that I have formulated a fool-proof update process which, sometimes with a little intervention, always works and - assuming you take proper care of configuration - leaves you with a working system.

The first thing you need to do is think of any software you have installed where a version change might break something, for example: some fglrx driver versions will work with old cards and Xinerama, and some will not - in no particular order. I had the unfortunate experience of blindly upgrading my fglrx implementation to one that didn't work only to find the version I had been using was no longer available in portage. Use quickpkg to make a quick standalone package of the existing files for a given atom:

# quickpkg cate-gory/package-name

The package will be available for later use in /usr/portage/packages.

Now let's update portage:

# emerge --sync

If you are given a message instructing you to update the portage package it is important that you do so before continuing.

# emerge portage

Before we begin compiling anything I need to point out that if you use certain CFLAGS some packages may fail to compile. I always include -fstack-protector which adds buffer overflow protection to software at compile-time. Glibc in particular tends to have a problem with this feature. Updating world with this flag in would break the long update process when emerge gets to glibc and we want the newest GCC suite available to compile the rest of our system anyway - so let's update glibc and gcc separately from world, and if we encounter any problems we'll temporarily strip down our CFLAGS:

# emerge --update gcc glibc

Now we want to make sure your profile is set to use the latest version of GCC available on your system, it might look something like this:

# gcc-config -l
[1] x86_64-pc-linux-gnu-4.1.2 *
[2] x86_64-pc-linux-gnu-4.3.4

The asterisk indicates which version of GCC you are using. You can set the version by running gcc-config number where number is in the leftmost position of the output above, thus:

# gcc-config 2

You will be instructed to update your profile environment variables:

# source /etc/profile

It should be noted here that some software may not compile correctly with a given GCC version, requiring you to move up or down. At the time of writing QEMU requires GCC 3x and I have had a few isolated incidents of Xen being picky.

Before we get started let's launch a screen instance (emerge screen if you have not already done so) so we can move from terminal to terminal without interrupting the build process:

# emerge screen
# screen -S update

You can detach from the session with the key combination control+a, d. Returning to the session is as simple as logging in from anywhere as the user who spawned the session and typing:

# screen -x update

Multiple clients may attach to the same screen session at once. This is particularly useful if you want to be able to orchestrate some or all of the upgrade over secure shell.

Now we're ready for the magic line:

# emerge --update --newuse --deep world --ask

Recent versions of portage have excellent automatic blocked package resolution, but it doesn't work all the time. If you can't continue due to blocked packages try to emerge either package one at a time. This often isn't enough - particularly where one package has taken over "duties" for the same files an existing package used to provide. In these cases you will have to unmerge one or both of the conflicting packages - but sometimes unmerging the wrong one(s) may leave your system unusable. Google the package names and version numbers; without fail I have always found someone on the Gentoo bug tracker or elsewhere who has had the same problem, and someone has usually posted the safest order of removal. Of course, nothing can't be fixed by booting with the livecd.

If you get strange errors about file collision protection but you're sure it's safe to overwrite the listed files, you may add this line to your make.conf or precede your emerge command similarly:

FEATURES="-collision-protect"

If you will be unable to attend the update process (going home for the weekend etc) you may find it useful to add --keep-going to the emerge flags; this will skip over packages that fail to build and continue emerging. If you can't tell which packages failed from the error messages repeated at the end of the build, simply re-run the emerge command, making sure to use either --ask or --pretend, and the missing packages will be neatly listed.

Often enough these packages fail to compile due to a missing reverse-dependency. The next stop on the line is revdep-rebuild, run simply:

# revdep-rebuild

This tool will scour your system for missing reverse dependencies, build a list then start compiling them if it finds any. This process can fail if it attempts to rebuild a package for which there is no longer a matching version in portage, in which case you may have to do some emerging/unmerging/updating of the given package's dependants. If the builds themselves fail it is safe to move on to the next step then return to revdep-rebuild later.

Python is an integral part of the gentoo system so it is very, very important that when python is updated the python-updater tool is run:

# python-updater

If you just upgraded Perl, be sure to run:

# perl-cleaner all

Which will take care of removing old header files, renaming files and rebuilding perl modules. It usually ends up rebuilding net-snmp on my systems and not much else in the way of ebuilds.

Lafilefixer fixes libtool archives and may not be needed on your system (does not come installed by default, emerge lafilefixer if you have not already) but it won't hurt to run it:

# lafilefixer --justfixit

Even if your first revdep-rebuild worked it is a good idea to run it again, in case any of the (possibly) newly installed packages have broken anything else. If some packages failed to build in the first world update you may now run it again.

It is now safe to clear out your old dependencies - should any exist. Most of the packages you will see listed are simply old versions of updated packages as a number of them occupy slots (such as GCC) for the sake of backwards compatibility. If you would like to retain a specific package, add it to your /var/lib/portage/world file, in full if you would like to target a specific version: =cate-gory/package-name-version.number

# emerge --depclean --deep --ask

Do not forget to update your configuration files. Be warned that simply overwriting all of the old configuration files may not be necessary and may render your system inaccessible:

# etc-update

That concludes my foolproof build order. If you followed it correctly there is a high probability your system is completely up-to-date and working, from a portage point of view. Of course, proper configuration is important and compile-stopping bugs are common and may require some mild footwork, but this usually amounts to copying and pasting the first error line into google or blocking/unblocking specific package versions with /etc/portage/package.mask, package.keywords and package.unmask. More information on package masking is available in the emerge man page.

Happy building!

IP Subnets and Netmasks Demystified

I had to explain the purpose of netmasks to my field tech yesterday and recalled the great difficulty I had with the concept when I started taking an interest in networking. In this article I will attempt to put the matter in terms I wish I had heard first.

The Internet is a network composed of networks which may be composed of smaller networks ad infinitum. These networks are called sub-networks or "subnets." Hosts on the same subnet can generally communicate directly with one another. Hosts on different subnets communicate with one another by sending the traffic via one or more gateways or "routers" which have a presence on each subnet. Some say proper protocol is to only call a router a gateway when it connects two or more different layer 2 technologies, however in OS network configuration common parlance is usually just gateway.

Hosts on a network need to know what traffic they can send directly over the layer 2 (or otherwise transparently local) network to other hosts and what packets need to be passed to the gateway. The value of the configured netmask is laid over the host's own IP address in such a way that is difficult to visualize in dotted-decimal notation, so let's take the IP address 192.168.0.1 and netmask 255.255.255.0 and look at them in binary:

11000000101010000000000000000001
11111111111111111111111100000000

A netmask of 255.255.255.0 when applied to 192.168.0.1 means "anything that doesn't match the first 24 bits of your IP is NOT local traffic and must be sent to the gateway." The bits of an address which are in the same position as the 1 bits of the netmask determine if that address is local or not, so it could be said that the "0 bits" are masked out. The "0 bits" are called the rest field and by ignoring them the host can quickly tell 192.168.0.22 is on the local subnet but 192.168.1.22 is not, and neither is 222.222.222.22:

11000000101010000000000000000001 192.168.0.1 local host
11111111111111111111111100000000 255.255.255.0 netmask
11000000101010000000000000010110 192.168.0.22
11000000101010000000000100010110 192.168.1.22
11011110110111101101111000010110 222.222.222.22

There are 8 rest bits in this example so we can tell that the address space of this subnet is 256 addresses deep and they range from 192.168.0.0 to 192.168.0.255 (00000000 - 11111111).
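
The comparison a host makes can be sketched in a few lines of Python. This is only an illustration of the bitwise AND described above, not how any particular network stack is written; to_int and is_local are names made up for the example.

```python
def to_int(dotted):
    """Convert dotted-decimal IPv4 text to a 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def is_local(host, netmask, other):
    """True if 'other' has the same network bits as 'host' under this netmask."""
    mask = to_int(netmask)
    # AND-ing with the mask zeroes the rest field, leaving only the network bits.
    return (to_int(host) & mask) == (to_int(other) & mask)

for candidate in ("192.168.0.22", "192.168.1.22", "222.222.222.22"):
    local = is_local("192.168.0.1", "255.255.255.0", candidate)
    print(format(to_int(candidate), "032b"), candidate, "local" if local else "via gateway")
```

The binary column it prints should line up with the hand-written one above, and only 192.168.0.22 comes out as local.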

Before Classless Inter-Domain Routing or CIDR, the subnets of the Internet were divided into fixed-width blocks called classes. Class names were given to subnets of various sizes and specializations, starting with large Class A subnets at the beginning of Internet address space and advancing to smaller and even experimental subnets as it descends. CIDR did away with the concept of classes and opened the netmask to single-bit-level control; subnets could now be created at sizes of any power of two, at any position of the global address space that is aligned to the subnet's size. CIDR also brought with it a new notation for routing prefixes: a slash after an IP followed by the number of "1 bits" in the netmask. In this way the IP 192.168.0.1 netmask 255.255.255.0 can be written 192.168.0.1/24. While CIDR notation can be used to determine the netmask of a subnet and vice versa, it is typically used only to describe whole subnets or convey that a given IP is part of a certain sized subnet.
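
Going back and forth between the two notations is just counting bits. Here is a minimal sketch in Python; the function names are made up for the example.

```python
def prefix_to_netmask(prefix):
    """Turn a CIDR prefix length (0-32) into a dotted-decimal netmask."""
    # Start from 32 one-bits, then shift in one zero for every rest bit.
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def netmask_to_prefix(netmask):
    """Count the leading 1 bits of a dotted-decimal netmask."""
    bits = "".join(format(int(octet), "08b") for octet in netmask.split("."))
    if "01" in bits:
        # A 1 after a 0 means the mask is not a contiguous run of 1 bits.
        raise ValueError("netmask 1 bits must be contiguous")
    return bits.count("1")

print(prefix_to_netmask(24))
print(netmask_to_prefix("255.255.252.0"))
```

The two calls print 255.255.255.0 and 22 respectively.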

Netmask          Netmask (binary)                     CIDR  Contains
255.255.255.255  11111111.11111111.11111111.11111111  /32   Host (single address)
255.255.255.254  11111111.11111111.11111111.11111110  /31   None usable (except point-to-point links, RFC 3021)
255.255.255.252  11111111.11111111.11111111.11111100  /30   2 usable
255.255.255.248  11111111.11111111.11111111.11111000  /29   6 usable
255.255.255.240  11111111.11111111.11111111.11110000  /28   14 usable
255.255.255.224  11111111.11111111.11111111.11100000  /27   30 usable
255.255.255.192  11111111.11111111.11111111.11000000  /26   62 usable
255.255.255.128  11111111.11111111.11111111.10000000  /25   126 usable
255.255.255.0    11111111.11111111.11111111.00000000  /24   "Class C" 254 usable
-
255.255.254.0    11111111.11111111.11111110.00000000  /23   2 Class C's
255.255.252.0    11111111.11111111.11111100.00000000  /22   4 Class C's
255.255.248.0    11111111.11111111.11111000.00000000  /21   8 Class C's
255.255.240.0    11111111.11111111.11110000.00000000  /20   16 Class C's
255.255.224.0    11111111.11111111.11100000.00000000  /19   32 Class C's
255.255.192.0    11111111.11111111.11000000.00000000  /18   64 Class C's
255.255.128.0    11111111.11111111.10000000.00000000  /17   128 Class C's
255.255.0.0      11111111.11111111.00000000.00000000  /16   "Class B"
-
255.254.0.0      11111111.11111110.00000000.00000000  /15   2 Class B's
255.252.0.0      11111111.11111100.00000000.00000000  /14   4 Class B's
255.248.0.0      11111111.11111000.00000000.00000000  /13   8 Class B's
255.240.0.0      11111111.11110000.00000000.00000000  /12   16 Class B's
255.224.0.0      11111111.11100000.00000000.00000000  /11   32 Class B's
255.192.0.0      11111111.11000000.00000000.00000000  /10   64 Class B's
255.128.0.0      11111111.10000000.00000000.00000000  /9    128 Class B's
255.0.0.0        11111111.00000000.00000000.00000000  /8    "Class A"
-
254.0.0.0        11111110.00000000.00000000.00000000  /7
252.0.0.0        11111100.00000000.00000000.00000000  /6
248.0.0.0        11111000.00000000.00000000.00000000  /5
240.0.0.0        11110000.00000000.00000000.00000000  /4
224.0.0.0        11100000.00000000.00000000.00000000  /3
192.0.0.0        11000000.00000000.00000000.00000000  /2
128.0.0.0        10000000.00000000.00000000.00000000  /1
0.0.0.0          00000000.00000000.00000000.00000000  /0    All IPv4 addresses (default route)

It should be noted here that in any size subnet you create you will be shy two usable IP addresses from the full number of slots. This is because the last IP of a subnet is its broadcast address, which reaches every host, and the lowest IP identifies the subnet itself in routes. Therefore 192.168.0.0 represents the subnet, 192.168.0.255 is the broadcast address and 192.168.0.1 - 192.168.0.254 can be used for real hosts, a total of 254 usable addresses.
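
Python's standard ipaddress module will do this arithmetic for you. The sketch below uses it to pull out the subnet address, the broadcast address and the usable count; describe is a name made up for the example, and it applies the "minus two" rule literally, so it reports 0 usable addresses for a /31 or /32.

```python
import ipaddress

def describe(cidr):
    """Return (subnet address, broadcast address, usable host count) for an IP/prefix."""
    # strict=False lets us pass a host address such as 192.168.0.1/24
    # instead of insisting on the subnet's own address.
    net = ipaddress.ip_network(cidr, strict=False)
    usable = max(net.num_addresses - 2, 0)  # minus subnet and broadcast addresses
    return str(net.network_address), str(net.broadcast_address), usable

print(describe("192.168.0.1/24"))
```

For 192.168.0.1/24 this returns 192.168.0.0, 192.168.0.255 and 254, matching the worked example above.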

ClearOS Installation Checklist

I'm writing this checklist as I set up a new router for the home office to remind me of the modifications I need to make to get a fresh deployment "just right" the first time. ClearOS is a CentOS-based router distribution that lets one rapidly and easily deploy and manage routers and miscellaneous network services. CentOS itself is a de-branded flavour of Red Hat Enterprise Linux. Back when ClearOS flew under the ClarkConnect label if one wanted certain parts of the product one had to either pay for an Office or Enterprise license or use well crafted google queries to find the ftp credentials for Enterprise repositories, grab the RPMs, rpm2tgz/rpm2cpio them and overlay them on the filesystem (and don't forget to fix the permissions!) to avoid unresolvable dependencies.

Fortunately with the morph to ClearOS the Clear Foundation folks stopped charging for commodity software (i.e. web configuration modules for DMZ and Multi-WAN (load balancing/failover)) and started focusing on services you can live without. I warn against paying just for the tech support, you're much better off in the community forum - sparse though the posts are. From my experience (last dating in 2008 mind you) their Level 1 cuts off at putting the cd in the drive and may flat out refuse to support their product if you reveal it's being used in any setting more complex than a small office. If you're an intermediate *nix user and you can't figure out the problem on your own or with google chances are tech support can't (or won't) help you anyway; drop by the users forum.

Most of the routers I make these days are virtual machines and that goes beyond the scope of this checklist, however I plan to cover my process in a future article. The machine this checklist will be based on is an AthlonXP 2500+ 1.83GHz with 384MB DDR266 and an 8GB CompactFlash card plugged into a CF-IDE adapter, which you can get on eBay for about $1CAD. The machine has an onboard NIC and two PCI NICs as well as a wifi card so it can be turned into an access point. I like to use CF cards instead of hard drives on my physical routers because - although the cheap ones can be quite slow - you don't have to worry about them up and dying on you for several years. The cards are worn out with repeated write cycles (though often in the high thousands) so if you choose to use them you should try to minimize the amount of data written to disk during day-to-day operation. A remote syslogd might be of great help.

We're going to assume you've already downloaded and burned the installation ISO to disc. At the time of writing the current version is 5.1 SP1; the instructions below may not apply to future versions. Once you've booted you'll eventually be asked if you would like to let the installer automatically partition your hard drive or if you'd like to manually configure the partitioning. It's usually fine to let the installer do its thing but if you're working in confined spaces like our 8GB flash card or have plenty of RAM the oft-defaulted swap partition size of 1GB is a tad generous. Also choose to manually configure the partitioning if you would like to use software RAID. If you choose to RAID your drives use anything but level 0; there is no need which I could conceive for high performance storage on a router (almost everything needed is loaded into RAM on startup) and reliability is priority one on mission-critical systems like these.

Chances are if you want to do anything with the storage on your router you want to outsource that operation to another machine. Your router is the gatekeeper for the network and if it becomes compromised the consequences could be worse than with any single workstation. The simplest way to reduce the risk of a service being exploited is to not run it, so your router shouldn't run anything it doesn't need for management or to route and protect the network (firewall, IPS, IDS etc). In following with that notion keep the number of users on the system to an absolute minimum. If you're the only person who should have access to the box you should be the only person with a user account. ClearOS allows root logins via SSH by default so you should create at least one user account for yourself in order to separate privileges.

Once you've completed the installation and rebooted you can connect to the management interface at https://lan-ip:81 and log in as root. You'll be asked to fill in a number of details to complete the installation process. Once that's completed register your router with ClearSDN. If you have not made an account with them yet you should do so at the ClearSDN portal first. I generally disable "send diagnostic reports" when I register the routers but you may be less paranoid and more helpful. Once your router has been registered go to Software Updates. You can enable or disable automatic updates. They are enabled by default and I don't like that one bit: what if one of the repositories gets hacked? What if a new RPM breaks something critical and I'm not around to fix it?

Don't waste your time with all the checkboxes; shell into your router and run the following:

  • yum update
    • Updates all packages currently installed
  • yum install screen
    • Screen is a handy tool for multitasking shells
  • yum install lynx
    • There's already a version of lynx that comes as part of the ClearOS text console and you can use it by symlinking it to /usr/bin and its config file to /etc but that's messy.
  • yum install links
    • You don't need this if you have lynx, I just like to install them both so I can type either. Depends on my mood.
  • yum install nmap
    • nmap is an invaluable network diagnostic and analysis tool.
  • yum groupinstall "Development Tools"
    • Install this on systems where you expect to be compiling third-party software. Wherever possible use RHEL/CentOS RPMs for the corresponding version of ClearOS. You probably don't want to install this on space-restricted systems. If you have space to kill it never hurts to be prepared.
  • yum install ncurses-devel
  • yum install kernel-devel
    • Install these on systems where you expect to be modifying the kernel; you will also need the Development Tools group to compile the kernel or modules.
  • yum install net-snmp
    • Install this so you can monitor system statistics remotely (e.g. with Cacti).
  • yum install wpa_supplicant
    • You need this if you want to run an access point; ClearOS is only configured for WEP by default and WPA can't be set up through the web config.
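
If you build these routers often, the checklist above collapses into a few commands. A rough sketch follows; the variable and function names are my own invention, the package names come straight from the list, and it only prints the commands so you can look them over before anything gets installed.

```shell
# Package names come from the checklist above; trim the lists to taste.
BASE_PKGS="screen lynx links nmap net-snmp"
BUILD_PKGS="ncurses-devel kernel-devel"

# build_install_cmds WITH_BUILD
# Print (not run) the yum commands for the chosen package set.
# Pass "yes" to include the compiler toolchain and kernel headers.
build_install_cmds() {
    echo "yum -y update"
    echo "yum -y install $BASE_PKGS"
    if [ "$1" = "yes" ]; then
        echo 'yum -y groupinstall "Development Tools"'
        echo "yum -y install $BUILD_PKGS"
    fi
}

# Review the output first, then pipe it to a shell as root:
#   build_install_cmds yes | sh
```

The wireless package is left out on purpose since most boxes won't need it; add it to BASE_PKGS on the ones that do.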

The ClearOS web config has an embedded MRTG package that graphs system vitals, but if you plan on remotely monitoring your router's statistics (load average, network traffic, etc.) you will probably want to install net-snmp. Depending on your configuration you may need to open UDP port 161. Here's a very short configuration sample that you can drop into /etc/snmp/snmpd.conf:

rouser  public
rocommunity  public localhost
syslocation  "Server Room"
syscontact  [email protected]
com2sec local     127.0.0.1/32    public
com2sec local     192.168.0.0/24    public
group MyROGroup v1         local
group MyROGroup v2c        local
group MyROGroup usm        local
view all    included  .1  80
access MyROGroup ""      any       noauth    exact  all    none   none

Replace 192.168.0.0/24 with the subnet or IP that should have access to SNMP data.
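
If more than one subnet or host needs access, each one gets its own com2sec line. A small sketch that prints them; the function name is hypothetical and the example addresses are placeholders.

```shell
# com2sec_lines COMMUNITY SOURCE...
# Print one com2sec line per SOURCE (an IP or CIDR subnet), mapping it to
# the "local" security name used by the group lines in the sample above.
com2sec_lines() {
    community=$1
    shift
    for src in "$@"; do
        printf 'com2sec local     %s    %s\n' "$src" "$community"
    done
}

# Example: com2sec_lines public 127.0.0.1/32 10.0.0.0/24
```

Once snmpd has been restarted, running snmpwalk -v 2c -c public <router-ip> system from one of the allowed machines is a quick way to see whether the daemon is answering.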

ClearOS is one of the few distributions that enable syncookies by default, so you probably don't need the lines below, since syncookies override tcp_max_syn_backlog. I like to add them anyway just in case something fails on bootup. Per my previous article, Defending Against the SYN Flood, add these lines to /etc/rc.d/rc.firewall.local:

echo 3096 > /proc/sys/net/ipv4/tcp_max_syn_backlog
echo 2 > /proc/sys/net/ipv4/tcp_syn_retries
echo 1 > /proc/sys/net/ipv4/tcp_synack_retries
echo 1 > /proc/sys/net/ipv4/tcp_syncookies

Or not. It won't kill you either way.
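
Whichever way you go, it's worth checking what the kernel is actually running with after a reboot. A minimal sketch that reads the four tunables back; the function name is made up, and the optional directory argument is only there so it can be pointed somewhere other than the live /proc tree.

```shell
# show_syn_settings [DIR]
# Print name=value for each of the SYN-related tunables. DIR defaults to
# the live /proc tree; anything that can't be read is reported as such.
show_syn_settings() {
    dir=${1:-/proc/sys/net/ipv4}
    for name in tcp_max_syn_backlog tcp_syn_retries tcp_synack_retries tcp_syncookies; do
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s=unreadable\n' "$name"
        fi
    done
}

# Example: show_syn_settings
```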

ClearOS comes with a special rule that snort will use to detect local SSH brute force attempts but, as we covered in my previous article Stifling Brute Force Attacks with fail2ban, fail2ban is highly extensible and can perform any operation that can be executed from the command line in response to any pattern match found in a given log file. Fail2ban is not available in the default ClearOS repositories but we can use the RHEL 5 RPM available at http://dag.wieers.com/rpm/packages/fail2ban/. After installing the packages listed above the RPM should have only one dependency: gamin-python. Install fail2ban thus:

# yum install gamin-python
# wget http://rpmforge.sw.be/redhat/el5/en/i386/rpmforge/RPMS/fail2ban-0.8.1-1.el5.rf.noarch.rpm
# rpm -iv fail2ban-0.8.1-1.el5.rf.noarch.rpm

Return to the webconfig and make sure you have installed all the components and third party applications listed that you need, like the Advanced Firewall Module which is not installed by default. Configure your firewall, DHCP and VPN(s). Back at the command line let's clean out all the packages we just downloaded:

# yum clean all

At the command line run chkconfig --list:

acpid           0:off   1:off   2:on    3:on    4:on    5:on    6:off
avahi-daemon    0:off   1:off   2:off   3:on    4:on    5:on    6:off
avahi-dnsconfd  0:off   1:off   2:off   3:off   4:off   5:off   6:off
clamd           0:off   1:off   2:off   3:off   4:off   5:off   6:off
cpuspeed        0:off   1:on    2:on    3:on    4:on    5:on    6:off
crond           0:off   1:off   2:on    3:on    4:on    5:on    6:off
cups            0:off   1:off   2:off   3:off   4:off   5:off   6:off
dansguardian-av 0:off   1:off   2:off   3:off   4:off   5:off   6:off
dnsmasq         0:off   1:off   2:on    3:on    4:on    5:on    6:off
fail2ban        0:off   1:off   2:off   3:on    4:on    5:on    6:off
firewall        0:off   1:off   2:on    3:on    4:on    5:on    6:off
freshclam       0:off   1:off   2:off   3:off   4:off   5:off   6:off
haldaemon       0:off   1:off   2:off   3:on    4:on    5:on    6:off
httpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off
ipsec           0:off   1:off   2:off   3:off   4:off   5:off   6:off
iscsi           0:off   1:off   2:off   3:off   4:off   5:off   6:off
iscsid          0:off   1:off   2:on    3:on    4:on    5:on    6:off
kudzu           0:off   1:off   2:off   3:on    4:on    5:on    6:off
l7-filter       0:off   1:off   2:off   3:off   4:off   5:off   6:off
ldap            0:off   1:off   2:off   3:on    4:on    5:on    6:off
ldapsync        0:off   1:off   2:off   3:on    4:on    5:on    6:off
lm_sensors      0:off   1:off   2:on    3:on    4:on    5:on    6:off
lvm2-monitor    0:off   1:on    2:on    3:on    4:on    5:on    6:off
mcstrans        0:off   1:off   2:off   3:off   4:off   5:off   6:off
mdmonitor       0:off   1:off   2:off   3:off   4:off   5:off   6:off
mdmpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off
messagebus      0:off   1:off   2:off   3:on    4:on    5:on    6:off
multipathd      0:off   1:off   2:off   3:off   4:off   5:off   6:off
mysqld          0:off   1:off   2:off   3:off   4:off   5:off   6:off
netconsole      0:off   1:off   2:off   3:off   4:off   5:off   6:off
netfs           0:off   1:off   2:off   3:on    4:on    5:on    6:off
netplugd        0:off   1:off   2:off   3:off   4:off   5:off   6:off
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
nmb             0:off   1:off   2:off   3:off   4:off   5:off   6:off
nscd            0:off   1:off   2:off   3:off   4:off   5:off   6:off
ntpd            0:off   1:off   2:off   3:off   4:off   5:off   6:off
openvpn         0:off   1:off   2:off   3:off   4:off   5:off   6:off
pptpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off
rdisc           0:off   1:off   2:off   3:off   4:off   5:off   6:off
restorecond     0:off   1:off   2:on    3:on    4:on    5:on    6:off
saslauthd       0:off   1:off   2:off   3:on    4:on    5:on    6:off
smb             0:off   1:off   2:off   3:off   4:off   5:off   6:off
snmpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off
snmptrapd       0:off   1:off   2:off   3:off   4:off   5:off   6:off
snort           0:off   1:off   2:off   3:off   4:off   5:off   6:off
snortsam        0:off   1:off   2:off   3:off   4:off   5:off   6:off
squid           0:off   1:off   2:off   3:off   4:off   5:off   6:off
sshd            0:off   1:off   2:on    3:on    4:on    5:on    6:off
suvad           0:off   1:off   2:on    3:on    4:on    5:on    6:off
syslog          0:off   1:off   2:on    3:on    4:on    5:on    6:off
system-mysqld   0:off   1:off   2:on    3:on    4:on    5:on    6:off
syswatch        0:off   1:off   2:on    3:on    4:on    5:on    6:off
vpnwatchd       0:off   1:off   2:off   3:off   4:off   5:off   6:off
webconfig       0:off   1:off   2:on    3:on    4:on    5:on    6:off
winbind         0:off   1:off   2:off   3:off   4:off   5:off   6:off
wpa_supplicant  0:off   1:off   2:off   3:off   4:off   5:off   6:off
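
That's a lot of rows to eyeball. A small sketch that boils the listing down to just the services switched on in a given runlevel; the function name is my own.

```shell
# enabled_in_runlevel N
# Read `chkconfig --list` output on stdin and print the names of the
# services marked "on" in runlevel N.
enabled_in_runlevel() {
    awk -v tag="$1:on" '{
        for (i = 2; i <= NF; i++)
            if ($i == tag) { print $1; break }
    }'
}

# Example: chkconfig --list | enabled_in_runlevel 3
```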

We want to turn off anything that starts in runlevels 2, 3, 4 or 5 that we don't need. This will make the router boot faster and use less RAM, which is particularly important if you're building a virtual machine. A fast power-cycle has obvious advantages for any connectivity device and one can comfortably fit a low-traffic ClearOS router into 96MB of RAM with room to breathe by disabling the right services. Based on the defaults shown here, these are some services you probably want to turn off - your mileage of course may vary:

  • avahi-daemon
  • avahi-dnsconfd
    • Zeroconf stuff. You only want it if you know what that means, and probably not even then.
  • haldaemon
    • Practically unused by anything but X
  • iscsi
  • iscsid
    • Obviously you want these if you really are using iSCSI.
  • kudzu
    • Checks for new hardware and can interrupt the boot process; it can be run from the command line anyway
  • lvm2-monitor
    • You only want this if you're using LVM
  • messagebus
    • Same as haldaemon
  • netfs
    • Leave this on if you're doing anything with NFS

Services you may want to disable include:

  • suvad
    • Talks to ClearSDN, disabling interferes with updates and ClearSDN services but it can be started on demand
  • lm_sensors
    • There's no hardware to monitor on a Xen virtual machine
  • cpuspeed
    • Ditto
  • acpid
    • Ditto

Use the following syntax to remove init scripts from these runlevels:

# chkconfig --level 2345 iscsid off

And enable anything that should be turned on:

# chkconfig --level 2345 snmpd on

Be sure not to touch any of the numerous LDAP services; ClearOS uses LDAP internally to manage the user accounts. If you don't know what a service does be sure to look it up before you disable it.
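
Typing that out once per service gets old. Here is a sketch of a loop that prints the commands for a whole batch; the names are made up and the service list is only a starting point taken from the first list above, so edit it to suit your box before running anything.

```shell
# Services to switch off; taken from the first list above, edit before use.
DISABLE="avahi-daemon avahi-dnsconfd haldaemon iscsi iscsid kudzu lvm2-monitor messagebus netfs"

# chkconfig_cmds STATE SERVICE...
# Print one chkconfig command per service so the batch can be reviewed
# before it is run. STATE is "on" or "off".
chkconfig_cmds() {
    state=$1
    shift
    for svc in "$@"; do
        echo "chkconfig --level 2345 $svc $state"
    done
}

# Review, then run as root:
#   chkconfig_cmds off $DISABLE | sh
#   chkconfig_cmds on snmpd | sh
```

Keep in mind chkconfig only changes what starts at boot; anything already running keeps running until you stop it or reboot.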

If your router includes a wireless card that requires firmware do not forget to download it to /lib/firmware.

Updatedb indexes the files on your mounted partitions for fast searching with the locate or slocate tool. You should run it once now that you have most of the files installed. By default, cron runs updatedb every night. This causes high I/O load, and you can turn it off by removing the execute bit from its cron script:

# chmod -x /etc/cron.daily/mlocate.cron

While the web config makes ClearOS what it is, I don't like the console configuration (slow, featureless and requires two logins...) - and I really don't like the graphical one (wtf?). These are a severe obstacle on Xen installations, where it's difficult to navigate the serial-based xenconsole (no ttys to alt+Fx to) out of the bottomless ncurses pit. Rather than loop-mount the image, configure the networking, reboot the VMs and shell in, I prefer to disable that rubbish altogether. Edit /etc/inittab to reflect:

# Run gettys in standard runlevels
#1:2345:respawn:/sbin/mingetty --autologin=clearconsole tty1
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2

If this router is going to be in a high traffic environment you may find that snortreport.sh progressively uses more and more resources to compile the webconfig-accessible IPS/IDS reports until it starts overlapping with itself, causing extreme load. You can delete it, move it, or remove its execute bit:

# chmod -x /usr/sbin/snortreport.sh

ClarkConnect used to come with some fairly dangerous default snort rules, particularly if your router is intended to be the firewall for a public network. Things look a lot better now: the two rules I could remember always having to comment out now come commented by default. As things run I'll keep a list of false-positive generating rules here.

NOTE: That list can be found here: Bad Snort Rules

Snort rules take the form of lines in files located in /var/lib/suva/services/intrusion-protection/rules/ (formerly /var/lib/suva/services/snort/rules/). Disabling a rule is as simple as prefixing it with a hash mark (#) and restarting snort:

# /etc/init.d/snort restart
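
Snort rules each carry a sid, so a rule can be commented out by number instead of hunting for it in an editor. A minimal sketch; the function name, the file name and the sid in the example are all hypothetical, and it assumes the rule is written with no space after "sid:".

```shell
# disable_snort_rule FILE SID
# Prefix the rule carrying "sid:SID;" with a hash mark. Lines that are
# already commented are passed through untouched.
disable_snort_rule() {
    file=$1
    sid=$2
    tmp=$(mktemp) || return 1
    awk -v needle="sid:${sid};" '
        /^[ \t]*#/        { print; next }
        index($0, needle) { print "#" $0; next }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (hypothetical file and sid), followed by a snort restart:
#   disable_snort_rule /var/lib/suva/services/intrusion-protection/rules/local.rules 1000001
```

Keep a note of what you disable; these files may well be replaced when the rules are updated.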

Check your Intrusion Prevention reports in the web config regularly when you first deploy your new firewall. Investigate any rule that appears multiple times to determine if your particular environment is triggering false positives. This is critical if you are protecting a public network, say a farm of web servers. One rule (that is now commented by default) would block an IP that sent or received a string of ASCII 'a's, like: 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaa' because a long string of 'a's is one signature of a certain buffer overflow attack. One day, when I had a freshly deployed router out, one of my users said "Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaah!" in a web-based chat room, and he and everyone who saw it were swiftly blocked from the network. One can either live without IPS (I wouldn't recommend it) or one can mitigate the downtime through careful monitoring.

You may wish to lock down SSH by following my article on key-exchange. While by default (in gateway mode) ssh is not accessible on the external addresses one should never discount the possibility of attack from within. Any machine behind your firewall that can be compromised will be in a unique position to compromise other machines if attention is not paid to internal network security. There is no such thing as a trusted network, only more trustworthy.
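
Before and after locking things down it helps to see what sshd is actually configured to do. A small sketch that pulls one setting out of an sshd_config file; the function name is mine, and the two keywords in the example are simply the ones I'd look at first, not a summary of the key-exchange article.

```shell
# sshd_setting FILE KEYWORD
# Print the value of the first uncommented KEYWORD line in FILE (sshd
# uses the first value it finds), or "unset" if there is none, in which
# case the compiled-in default applies. Keywords are case-insensitive.
sshd_setting() {
    awk -v key="$2" '
        tolower($1) == tolower(key) { print $2; found = 1; exit }
        END { if (!found) print "unset" }
    ' "$1"
}

# Example:
#   sshd_setting /etc/ssh/sshd_config PermitRootLogin
#   sshd_setting /etc/ssh/sshd_config PasswordAuthentication
```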