I was stoked to find out Xen 4 had finally made it into portage a couple weeks ago, The improvements are so sweet I had been checking gentoo-portage.com every few days or so. Let's take a look at a few of the advances Xen has made since 3x:
- Better performance and scalability: 128 vcpus per guest, 1 TB of RAM per host, 128 physical CPUs per host (as a default, can be compile-time increased to lots more).
- Blktap2 for VHD image support, including high-performance snapshots and cloning.
- Improved IOMMU PCI passthru using hardware accelerated IO virtualization techniques (Intel VT-d and AMD IOMMU).
- VGA primary graphics card passthru support to an HVM guest for high performance graphics using direct access to the graphics card GPU from the guest OS.
- TMEM allows improved utilization of unused (for example page cache) PV guest memory. more information: http://oss.oracle.com/projects/tmem/
- Memory page sharing and page-to-disc for HVM guests: Copy-on-Write sharing of identical memory pages between VMs.This is an initial implementation and will be improved in upcoming releases.
- New Linux pvops dom0 kernel 2.6.31.x as a default, 2.6.32.x also available. You can also use linux-2.6.18-xen dom0 kernel with Xen 4.0 hypervisor if you want.
- Netchannel2 for improved networking acceleration features and performance, smart NICs, multi-queue support and SR-IOV functionality.
- Online resize of guest disks without reboot/shutdown.
- Remus Fault Tolerance: Live transactional synchronization of VM state between physical servers. run guests synchronized on multiple hosts simultaneously for preventing downtime from hardware failures.
- RAS features: physical cpu/memory hotplug.
- Libxenlight (libxl): a new C library providing higher-level control of Xen that can be shared between various Xen management toolstacks.
- PV-USB: Paravirtual high-performance USB passthru to both PV and HVM guests, supporting USB 2.0 devices.
- gdbsx: debugger to debug ELF guests
- Support for Citrix WHQL-certified Windows PV drivers, included in XCP (Xen Cloud Platform).
- Pygrub improvements: Support for PV guests using GRUB2, Support for guest /boot on ext4 filesystem, Support for bzip2- and lzma-compressed bzImage kernels
What tickles me the most is the Remus Fault Tolerance, it basically lets you run a standby instance of a VM on a different physical server and it constantly updates that VM of the master's status, I/O etc. If the master VM dies, the standby kicks in so fast there may be no perceivable downtime. I've been dying for something that provides solid HA that's well supported and works out of the box - not to mention does its job transparently for years. Now that it's a core feature 0f Xen, competing technologies will be compelled to introduce their own easy-to-use HA solutions which I hope could usher in a golden age of reliability.
The original intent was to migrate 9 physical 32-bit servers the week it came out, most of them running kernel 2.6.21 on Xen 3.2.1. This was not to be, I was determined to make the new 2.6.32 kernel work (I had heard that .32 was going to be the new .18 in terms of adoption/support) and the thing just won't work with megaraid. I haven't tried it with cciss yet and I don't intend to, for the time being I have downgraded all of the dom0 kernels to 2.6.18. Things seem to be very stable now and much faster.
PAE, or physical address extension allows 32-bit processors to address up to 64GB of memory. When I first started working with Xen I had no idea that I had omitted PAE from my hypervisor build (it is not a default USE flag) nor that every shrinkwrapped Xen kernel out there required it. To make matters worse, the 2.6.21 dom0 kernel I was using on all of the servers for the sake of consistency lacked the ability to enable PAE at all - even by manually editing the .config, something I still haven't figured out. That kernel was eventually hard masked, then removed from portage. This situation cost me a lot of time because every new image I wanted to import required special preparation to work with my "foregin" domU kernel and without pleasantries like initramfs and pygrub.
I'm going to start off by showing you the make.conf that will be used in this article:
CFLAGS="-O2 -march=pentium4 -pipe -fomit-frame-pointer -mno-tls-direct-seg-refs -fstack-protector-all"
#CFLAGS="-O2 -march=pentium4 -pipe -fomit-frame-pointer -mno-tls-direct-seg-refs"
GENTOO_MIRRORS="http://gentoo.osuosl.org/ http://distro.ibiblio.org/pub/linux/distributions/gentoo/ http://www.gtlib.gatech.edu/pub/gentoo "
USE="-alsa cracklib curlwrappers -gnome -gtk -kde -X -qt sse png snmp cgi usb cli berkdb bzip2 crypt curl ftp ncurses snmp xml zip zlib sse2 offensive geoip nptl nptlonly acm flask xsm pae pygrub xen"
If you're upgrading from an existing Xen installation you may want to disable collision protection, past experience with 3x upgrades has been that sometimes portage will see every Xen-related file in /boot as a potential conflict. Note the second CFLAGs line that's commented; some packages don't compile well (or at all) with Stack Smashing Protection (particularly glibc) so I update them individually with the second CFLAGS enabled before any sort of emerge world or deep update. SSP is enabled by default if you are using the hardened profile. I choose not to use the hardened profile because it can be needlessly problematic/inflexible and in this application the term "hardened" is misleading. If your dom0 isn't going to be exposed to the Internet (and it certainly should not be) it might be safe to omit -fstack-protector-all altogether but there is no such thing as paranoia. Also bear in mind that an attacker gaining access to a dom0 can be more devastating than an attacker gaining access to all of the VMs running on it individually.
Depending on the circumstances at the time you read this, one may or may not have to add the following to /etc/portage/package.keywords:
Sync portage and run an emerge --update --deep --newuse world --ask, if you see xen 4 in the package list you're on the right track. Compile away. You might be interested in this article on global updates with gentoo.
Once Xen has been upgraded it's time to build the new kernels. Follow the usual routine, making sure to enable xen backend drivers in the dom0 and frontend drivers in the domU. I like to make a monolithic domU kernel so there's no mess with installing or updating modules to the VMs. Make sure you have IP KVM/Console redirection if you're going to be booting this machine remotely and a non-xen fallback kernel configured in /etc/grub/grub.conf in case the hypervisor fails. Xen and some x86 BIOSes can be configured to use a serial console; a null modem to another server in the rack is often all you need.
I got all sorts of shit from the 2.6.32/34 kernels, for instance the kernel won't build properly if you enable Export Xen atributes in sysfs (on by default in 2.6.32-xen-r1). I got this message at the end of make and was not particularly successful at tracking down solutions:
WARNING: vmlinux.o (__xen_guest): unexpected non-allocatable section. Did you forget to use "ax"/"aw" in a .S file? Note that for example <linux/init.h> contains section definitions for use in .S files.
I don't know what - if anything - needs the Xen /sys interface to work, so it's probably no big deal.
When trying to compile at least versions 2.6.32 and 2.6.34 if the Xen compatibility code is set to 3.0.2 you can expect this error at the end of building the kernel:
MODPOST vmlinux.o WARNING: vmlinux.o (__xen_guest): unexpected non-allocatable section. Did you forget to use "ax"/"aw" in a .S file? Note that for example contains section definitions for use in .S files. GEN .version CHK include/generated/compile.h UPD include/generated/compile.h CC init/version.o LD init/built-in.o LD .tmp_vmlinux1 ld: kernel image bigger than KERNEL_IMAGE_SIZE ld: kernel image bigger than KERNEL_IMAGE_SIZE make: *** [.tmp_vmlinux1] Error 1
This seems to be fixable by upping the lowest version to at least 3.0.4.
In all cases 2.6.29, 2.6.32 and the yet-un-portaged 2.6.34 kernels panicked on bootup if the megaraid driver was compiled in or made available in an initrd. After four days of dusk-until-dawn tinkering I got tired of fucking with it and decided to go with 2.6.18, which compiled and booted without a hitch.
Previously I had been doing all sorts of contorted things to the networking configuration but since I was dealing with a clean slate anyway I decided to set things up the Gentoo way. The Gentoo way of Xen networking is to abandon Xen networking. Suddenly life's great. In the set of four servers I migrated this week all of them have one physical interface on an external-facing VLAN and another interface on an internal VLAN as depicted in the diagram (left). I wanted to make it so I could take a router VM and move it from physical server to physical server as quickly as possible, and this is how I did it (thanks xming on the Gentoo forums):
- Edit /etc/conf.d/rc and change RC_PLUG_SERVICES to look like this: RC_PLUG_SERVICES="!net.*" this will prevent Gentoo's hotplug script from automatically starting your interfaces on bootup
- Remove existing interfaces from default runlevel, i.e. rc-update del net.eth0 default
- Configure one bridge that connects to the external VLAN and one bridge that connects to the internal VLAN
routes_xenbr0=("default via x.x.x.y")
- Create init scripts for the new bridges, i.e cd /etc/init.d; ln -s net.lo net.extbr0; ln -s net.lo net.xenbr0
- Add the bridges to the default runlevel: rc-update add net.extbr0 default; rc-update add net.xenbr0
- Edit /etc/xen/xend-config.sxp and comment out (network-script network-bridge) and add (vif-script vif-bridge bridge=xenbr0)
- Edit VM configuration files, edit VIFs to connect to the appropriate bridge, i.e: vif = ['mac=00:16:3e:XX:XX:XX,bridge=xenbr0' ]
I found that my Gentoo VMs needed this line added to their config in order to get any connection to xenconsole at all:
My ClearOS VMs, for the first time running the kernel that ships with them, needed a more dramatic approach. I added this line to their config files:
and then in the VM's /etc/inittab I added this line to make it talk on what would be its serial port:
I had some minor complaints from init about the dom0 kernel being too old for some udev feature so I added this line to /etc/portage/package.mask and rebuilt it: