Everything You Never Wanted to But Absolutely Have to Know About Sparse Files
This article began as what I hoped would be a short, helpful sentence about a tiny caveat in managing sparsely allocated files, one I felt was needed badly enough to edit a straightforward and very succinct little primer I wrote all the way back in 2010 called Managing Raw Disk/File System Image Files. The effort escalated so violently out of proportion, into a subsection bigger and much denser than the original article, that I turned it into an article of its own. Fill yer boots!
So! What are sparse files? I'm so glad you didn't have to ask. Despite vicious rumours, they are not a surreptitious form of filesystem debt. They look something like this in the wild:
These files have been allocated 20GB to one day grow into - or not - but for now they weigh about 9GB on the books, and that's all you'll be billed for when you check actual disk usage.
Weird. So what gives?
Sparse files are almost always the best choice for a new VM's raw disk image because, while presenting a concrete limit to the system contained therein, they are only allocated blocks on the host filesystem as they are written to. This permits the most efficient use of the storage space that is actually available and even makes overprovisioning possible.
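To see the gap between the two numbers for yourself, here is a minimal sketch, assuming a Linux host with GNU coreutils and a filesystem that supports holes. The file name demo.img and the 64 MiB size are placeholders of mine, not taken from any real VM.

```shell
#!/bin/sh
# Sketch only: compare a sparse file's apparent size with its allocated size.
# Assumes GNU coreutils (truncate) and a hole-capable filesystem (ext4, XFS, ...).
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
img="$workdir/demo.img"            # hypothetical name, for illustration only

# Set the file length to 64 MiB without writing any data blocks.
truncate -s 64M "$img"

apparent_bytes=$(wc -c < "$img")           # the size a guest would be shown
allocated_kib=$(du -k "$img" | cut -f1)    # what the host has actually allocated
echo "apparent size:  $apparent_bytes bytes"
echo "allocated size: $allocated_kib KiB"

# Write 1 MiB of real data at the start without truncating the rest.
dd if=/dev/urandom of="$img" bs=1M count=1 conv=notrunc 2>/dev/null
echo "allocated after writing 1 MiB: $(du -k "$img" | cut -f1) KiB"
```

On a typical ext4 or XFS host the allocated figure should start near zero and grow by roughly the amount written, while the apparent size stays put.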
To create a sparse file one will traditionally run:
Where
Alternatively, you can use the
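Since the original listings are not reproduced here, the following is only a hedged sketch of two common ways to do this, assuming GNU dd and GNU truncate. The file names and the 64 MiB size are hypothetical.

```shell
#!/bin/sh
# Sketch only: two common ways to create a 64 MiB sparse file.
# Assumes GNU dd and GNU truncate; file names are hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# 1) dd: copy zero blocks (count=0) but seek 64 blocks of 1 MiB into the
#    output, which sets the file length without writing any data.
dd if=/dev/zero of="$workdir/disk-dd.img" bs=1M count=0 seek=64 2>/dev/null

# 2) truncate: set the file length directly.
truncate -s 64M "$workdir/disk-truncate.img"

# -s adds a leading column with the blocks actually allocated.
ls -ls "$workdir"
```

Both files end up with the same apparent size; which tool you reach for is mostly a matter of habit and of what is installed.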
To install a fresh and empty filesystem in your sparse image call up your
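A minimal sketch of that step, assuming ext4 and the mkfs.ext4 tool from e2fsprogs. The choice of filesystem and the file name are my assumptions for illustration; substitute whatever your system actually uses.

```shell
#!/bin/sh
# Sketch only: put an empty ext4 filesystem straight into a sparse image file.
# Assumes GNU truncate and, optionally, mkfs.ext4 from e2fsprogs.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
img="$workdir/fs.img"              # hypothetical name

truncate -s 64M "$img"

made_fs=no
if command -v mkfs.ext4 >/dev/null 2>&1; then
    # -F: do not balk at the target being a regular file, not a block device
    # -q: keep the output quiet
    mkfs.ext4 -q -F "$img"
    made_fs=yes
else
    echo "mkfs.ext4 not found in PATH; skipping" >&2
fi

# Only the filesystem's own metadata has been written at this point.
echo "apparent:  $(wc -c < "$img") bytes"
echo "allocated: $(du -k "$img" | cut -f1) KiB"
```

No root privileges or loop device are needed for this part, because the tool writes to the image as an ordinary file.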
To create a whole disk image out of it you can use
When it comes to virtual machines, the choice between whole-disk and single-partition images is usually made for you. There are oddball pre-built appliance images, and a single-partition image was my favourite way to build a PV image in the earlier Xen era, but even a custom import/conversion is unlikely to occupy a single-partition image. Most contemporary virtualization solutions expect unmodified bootloaders (installed to the MBR or EFI), in-system kernels and so on. Additionally, one is hard pressed to find an OS installation medium that expects to perform anything less than a whole-disk install.
Essentially the only time it is actually preferable to create a directly, linearly allocated raw image is when potential fragmentation is markedly detrimental - or in fact unacceptable - as is the case with swap image files:
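As a hedged sketch of what a directly allocated swap image could look like: the location and the 64 MiB size are placeholders of mine, and the mkswap call assumes util-linux is installed. Activating the swap needs root, so that step is only shown as a comment.

```shell
#!/bin/sh
# Sketch only: build a fully allocated (non-sparse) swap image.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
swapfile="$workdir/swap.img"       # hypothetical location

# Actually write 64 MiB of zeros so the blocks are allocated up front.
dd if=/dev/zero of="$swapfile" bs=1M count=64 2>/dev/null

# Swap can hold sensitive memory contents: restrict access to the owner.
chmod 600 "$swapfile"

# Write the swap signature if mkswap is available.
if command -v mkswap >/dev/null 2>&1; then
    mkswap "$swapfile" >/dev/null
fi

# Enabling it requires root, e.g.:
#   swapon "$swapfile"
ls -ls "$swapfile"
```

The point of writing the zeros for real, rather than seeking past them, is that the host allocates the space in one go instead of piecemeal as the guest swaps.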
Don't forget to add the additional swap to
/dev/sda2 none swap sw,pri=1 0 0
/swap.img none swap sw,pri=2 0 0
Swap image files are an essential tool in production situations when physical RAM is exhausted and cannot be allocated to a starved VM suffering out-of-memory process kills and crashes. Or, as I prefer to frame it: they are a last-minute hack permissible only in a crisis, and they demand revisiting with a real solution, one that does not all but require ceaselessly thrashing to storage what should properly live in RAM. Try to avoid creating directly allocated image files from within running VMs that may themselves be backed by sparse images; create them externally on the host system (dom0/hypervisor manager) instead. It is usually possible to attach new storage images to a running VM.
It is also necessary to be aware of how tools interact with a sparse image in certain situations. Programs that are not sparse-aware, or that have not been told to be (usually via a flag, if supported), will often inflate a sparse file as it is being manipulated or moved (e.g. over sftp), filling in the unallocated blocks with zeros at the destination and thereby undoing their purpose. It is necessary to employ a little forethought:
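For local copies, here is a minimal sketch of the difference, assuming GNU cp. The file names and the remote target in the comments are hypothetical; the rsync and GNU tar switches are shown only as comments and are not run.

```shell
#!/bin/sh
# Sketch only: copy a sparse image with and without sparse awareness.
# Assumes GNU coreutils (truncate, cp --sparse).
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/src.img"

# A 64 MiB sparse file with four bytes of real data 1 MiB in.
truncate -s 64M "$src"
printf 'data' | dd of="$src" bs=1 seek=1048576 conv=notrunc 2>/dev/null

# Not sparse-aware: every zero is read and written back out as real data.
cat "$src" > "$workdir/inflated.img"

# Sparse-aware: runs of zeros become holes at the destination.
cp --sparse=always "$src" "$workdir/sparse-copy.img"

# Comparable switches for other tools (hypothetical targets, not run here):
#   rsync --sparse src.img host:/path/
#   tar --sparse -cf images.tar src.img

# Same apparent size everywhere; the allocated column should differ.
ls -ls "$workdir"
```

All three files are byte-for-byte identical when read back; only the space they occupy on the host is expected to differ.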
When files and metadata are deleted from a filesystem contained in a sparsely allocated image, the unused blocks are not de-allocated in the host filesystem. By default there is no communication between the two about such operations; a mechanism for it is akin to an SSD's TRIM functionality and only helps where discard requests are actually passed through to the host. Without it, a sparse image's actual size can only increase, never decrease. Furthermore, new data written inside the image becomes increasingly fragmented as the now "conventionally empty" blocks (allocated but empty, like in a normal file) are reused. When they are reused they are quite likely to be assigned chunks of even more non-contiguously arranged data, and it only gets worse over time.
This is highly unlikely to become severe, but depending on the usage pattern, scale and value, you may find it is actually worth the space and performance reclaimed to simply start over. Create a brand new sparse image and copy over the contents of the original so they are written in an efficient, contiguous burst at the beginning of the file. Where it is a simple matter of mounting the contained filesystems simultaneously to access their contents, you can easily leverage
To enlarge a sparse file run:
Where
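A hedged sketch of one way to grow an image, assuming GNU truncate and GNU dd; the file name and sizes are placeholders of mine. Growing the file only moves its end, so the filesystem or partition table inside still has to be resized separately.

```shell
#!/bin/sh
# Sketch only: grow a sparse image from 64 MiB to 96 MiB.
# Assumes GNU coreutils (truncate, dd); the file name is hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
img="$workdir/disk.img"

truncate -s 64M "$img"
printf 'data' | dd of="$img" bs=1 conv=notrunc 2>/dev/null
before=$(wc -c < "$img")

# A leading + makes the size relative: extend by 32 MiB, again without
# writing any data. Existing contents are left untouched.
truncate -s +32M "$img"
after=$(wc -c < "$img")

# dd equivalent with GNU dd: seek to the new end and write nothing.
#   dd if=/dev/zero of="$img" bs=1M count=0 seek=96

echo "before: $before bytes, after: $after bytes"
```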
When you're working with a single partition image, expanding the existing filesystem usually has a fun, easy utility like