=^.^=

Everything You Never Wanted to But Absolutely Have to Know About Sparse Files

karma

This article began as what I hoped would be a short and helpful sentence about a tiny caveat regarding managing sparsely allocated files which I felt was needed badly enough to edit a straightforward and very succinct little primer I wrote all the way back in 2010 called Managing Raw Disk/File System Image Files. This effort escalated so violently out of proportion into a subsection bigger and much more dense than the original article that I turned it into one. Fill yer boots!

So! What are sparse files? I'm so glad you didn't have to ask. Despite vicious rumours, they are not a surreptitious form of filesystem debt. They look something like this in the wild:

ls -lsah
total 9.4G
   0 drwxr-xr-x. 2 root root  101 Dec 10 00:09 .
   0 drwxr-xr-x. 7 root root   92 Dec 10 00:09 ..
724M -rw-r--r--. 1 root root 724M Dec  9 14:46 opnsense-23.7.9-xen.hdd.xz
4.0K -rw-------. 1 root root  466 Dec  9 14:31 opnsense.conf
6.1G -rw-r--r--. 1 root root  20G Dec 10 00:08 opnsense.hdd
2.6G -rw-r--r--. 1 root root  20G Dec  9 11:58 pristine.hdd

The two disk images have each been allocated 20GB to one day grow into - or not - but for now they weigh about 9GB on the books between them, and that's all you'll be billed for when you check df -h, because that is all the space they are in fact using at present.
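If you want to see the discrepancy for yourself, du will happily report both figures (the file name here is just the one from the listing above; substitute your own):

du -h opnsense.hdd
6.1G	opnsense.hdd
du -h --apparent-size opnsense.hdd
20G	opnsense.hdd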

Weird. So what gives?

Sparse files are almost always the best choice for a new VM's raw disk image because, while presenting a concrete size limit to the system contained therein, they are only allocated blocks on the host filesystem as they are actually written to. This permits the most efficient use of the truly available storage space and even makes overprovisioning possible.

To create a sparse file one will traditionally run:
dd if=/dev/zero of=image.img seek=X bs=1M count=0
Where X is the size of the image in MB.

Alternatively, you can use the truncate command, which is a fair bit more succinct:
truncate -s XM image.img
Where X is again the size of the image in MB (truncate also accepts other suffixes, e.g. G).

To install a fresh and empty filesystem in your sparse image call up your mkfs of choice and direct it at the filename. You may pull up a man page at this point and feel inclined to sprinkle some sensible, hopefully optimizing flags on top. The image file may now be treated as a single partition.
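For instance, assuming ext4 is your weapon of choice (substitute your own), it is as simple as:

mkfs.ext4 image.img
mount -o loop image.img /mnt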

To create a whole disk image out of it you can use fdisk and its competitors, the same as you would on a block device, but specifying the filename instead. This results in a somewhat more complicated scenario however as you will need to make use of loop devices to access the constituent partitions. This can be made easier (or even more convoluted!) with a bevy of tools and scripts that come with little hope of being already installed, readily found or easy to use. Luckily for you, I found a really good shell script that uses common tools and will likely work right out of the copypasta box for you when I was writing Mounting LUKS Encrypted Drives, Disk Images and Partitions Thereof. The first script posted in that article is for you. The second version, I kind of taught it crypto. Because I play a spooky hacker on TV.
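If you would rather see the bare-bones mechanics before reaching for the script, a minimal sketch (assuming a single ext4 partition and a free /mnt) goes something like this:

fdisk image.img                                      # partition the image just as you would a real disk
LOOP=$(losetup --find --show --partscan image.img)   # attach it; partitions appear as e.g. /dev/loop0p1
mkfs.ext4 "${LOOP}p1"
mount "${LOOP}p1" /mnt
umount /mnt
losetup -d "$LOOP"                                   # detach the loop device when finished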

When it comes to virtual machines, the choice of whole disk vs. single partition image is usually made for you. While oddball pre-built appliance images exist - and single partition images were my favourite way to build a PV image in the earlier Xen era - even a custom import/conversion is unlikely to occupy a single partition image. Most contemporary virtualization solutions tend to expect unmodified bootloaders (installed to the MBR or EFI) and in-system kernels etc. Additionally, one is hard pressed to find an OS installation medium that expects to perform anything less than a whole-disk install.

Essentially the only time it is actually preferable to create a directly, linearly allocated raw image is when handling situations where potential fragmentation is markedly detrimental - or in fact unacceptable - as is the case with swap image files:

dd if=/dev/zero of=swap.img bs=1M count=2048
mkswap swap.img
swapon swap.img

Don't forget to add the additional swap to /etc/fstab if it is to be a permanent addition. If different sources of swap space have different performance characteristics it is prudent to add a usage priority weighting to enable optimum performance. Add pri=X after the sw keyword (areas with higher priority are used first):

/dev/sda2 none swap sw,pri=1 0 0
/swap.img none swap sw,pri=2 0 0

Swap image files are an essential tool in production situations when physical RAM is exhausted and can not be allocated to a starved VM suffering out-of-memory process kills and crashes. Or, as I prefer to frame it: they are a last-minute hack permissible only in a crisis, and they demand revisiting with a real solution rather than consigning to ceaseless thrashing on storage what should properly be fixed in RAM. Try to avoid creating directly allocated image files from within running VMs that may draw on sparse images; create them externally on the host system (dom0/hypervisor manager) instead. It is usually possible to attach new storage images to a running VM.

It is also necessary to be aware of how tools interact with a sparse image in certain situations; programs that are not sparse-aware, or have not been told to be (usually via a flag, if supported), will often seem to inflate a sparse file as it is manipulated or moved (e.g. over sftp), spontaneously filling in the unallocated blocks with zeros at the destination and thereby undoing their purpose. A little forethought is required: rsync and tar both possess a -S or --sparse facility, and tar can be used to make a safe-for-transport archive which restores the sparse properties of its contents upon extraction, wherever that may be. It is also possible to leverage this capability with a convenient pipe:

tar cSvf - sparse.img | ssh user@remote "tar xSvf -"
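rsync will likewise keep the holes intact if asked nicely (the destination path here is just for illustration):

rsync -avS sparse.img user@remote:/path/to/images/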

When files and metadata are deleted from a filesystem contained in a sparsely allocated image, the unused blocks are not de-allocated in the host filesystem. There is no mechanism for the guest filesystem to communicate such operations to the host - something which, if it existed, would be akin to an SSD's TRIM functionality. That means a sparse image's actual size can only ever increase, never decrease. Furthermore, data written inside the image becomes increasingly fragmented over time, as the now "conventionally empty" blocks (allocated but empty, like a normal file's) are re-allocated to chunks of ever less contiguously arranged data. It is unlikely to ever become truly severe, but depending on the usage pattern, scale and value, you may find it is actually worth the space and performance reclaimed to simply start over: create a brand new sparse image and copy over the contents of the original so they are written in one efficient, contiguous burst at the beginning of the file. Where it is a simple matter of mounting both contained filesystems simultaneously to access their contents, you can easily leverage cp's archival capability to preserve the filesystem structure and details:

mkdir /mnt/old
mkdir /mnt/new
dd if=/dev/zero of=new.img seek=X bs=1M count=0
mkfs.ext4 new.img
mount old.img /mnt/old
mount new.img /mnt/new
cp -ax /mnt/old/. /mnt/new/
umount /mnt/new
umount /mnt/old

To enlarge a sparse file run:
dd if=/dev/zero of=image.img seek=X bs=1M count=0
Where X is the new total size of the image in MB (the unit being set by bs), i.e. the current size of the sparse file plus the amount of space you wish to grow it by.
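truncate will do the same job, and even accepts a relative size so you can skip the arithmetic:

truncate -s +2G image.img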

When you're working with a single partition image, expanding the existing filesystem usually has a fun, easy utility like resize2fs, which is so easygoing in spite of its name that it will even expand an ext4 filesystem for free, if you don't make a big deal about it (a short example follows below). If you are expanding or shuffling partitions in a full disk image however, good luck with that unmitigated nightmare. They don't pay me enough to do horror writing. Situations like that are the very archetype of why alternative, extremely flexible solutions like LVM are so popular. If you need to perform non-optional, invasive partition surgery on a file-backed drive (raw or otherwise) I would at that point strongly advise you to consider migrating your data to LVM instead.
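That example, for an ext4 single partition image freshly grown with the dd incantation above (run it while the image is unmounted):

e2fsck -f image.img     # resize2fs insists on a freshly checked filesystem
resize2fs image.img     # grow the filesystem to fill the enlarged image

Like mkfs, resize2fs is happy to operate on the image file directly; no loop device required.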
