Data compression is an incredibly useful tool in many situations, be it for backups, archiving, or even filesystems. But of the many compressors that there are, which of them is the one to be preferred? To properly answer this question, we first have to answer a different, but related one: What exactly makes a compression algorithm good?
Time for a closer investigation to find the best of the best of the best!
Pareto optimality or what makes a compression algorithm good
Pareto optimality is a situation where no […] preference criterion can be better off without making at least one […] preference criterion worse off […].
The concept of Pareto Optimality is tremendously useful, far beyond the scope of this blogpost, by acknowledging competing interests.
In our particular case of optimal compression, one of the obvious candidates is compression size. The smaller the resulting file is, the better the compressor, right?
But what if we are just compressing so we can speed up upload to a remote location? At this point, it doesn’t matter if the compression is supremely superior to everything else if we have to wait twice as long for it to happen compared to simply upload the uncompressed file.
So for our algorithms, we have at least two criteria that come into play: actual achievable compression and compression cost, which we will measure as “how long does it take to compress our content?”1. These two are the criteria this blogpost is focused on.
A practical example
Practically speaking, let’s assume two compression tools
B, having both two compression levels
2, which we use on a sample file.
The results might look like the following:
We find that while both algorithms increase their compression size with level,
B.1 is unconditionally better than
A.2, so there is no reason to ever use
B.1 is available.
B.2 continuously increase in compression effectiveness as well as time taken for that.
For easier digestion, we can visualize these results:
Here we clearly see what could already be taken from the table above, but significantly more intuitive.
Remembering our definition of Pareto optimality from before, we see that
A.2 is not Pareto optimal, as
B.1 is both better in time taken as well as in compression effect.
This shows more intuitively that in this scenario there is no reason to use
B.1 is available.
B.2 the choice is not so clear-cut, as the resulting file size can only be reduced further by investing more time into the compression. Hence, all three of them are Pareto optimal and constitute the Pareto frontier or the Pareto set.
One wants to always strive for choosing a Pareto optimal solution whenever possible, as non-Pareto-optimal solutions are always to some degree wasteful.
With this insight into Pareto optimality, we can now put our knowledge into practice and begin our journey to the Pareto frontiers of compression algorithms.
Setup and test procedure for real-world measurements
Data was gathered on a Linux system with a given sample file for the resulting filesize compared to the original as well as the time the process took for compressing. First, the sample file was read from disk entirely, ensuring it would be present in Linux’ filesystem cache, then compressed with each compression program at each level 15 times for a decent statistic, the compressed result is routed directly into
All presented compression times presented are the median of these 15 runs, compression size was only measured once after verifying that it was deterministic2.
Unless explicitly denoted otherwise, all compressors were run in their default configuration with a single compression thread. Furthermore, all applications on the machine used for benchmarking except the terminal holding the benchmarking process were closed to reduce interference by other processes as much as possible.
The tests for decompression were done analogously (with the obvious exception that the sample file is compressed according to the testcase in play beforehand), single-threaded decompression as well as routing the decompressed result to
All tests were run on several different CPUs:
Intel i5-3320M3 (both mobile CPUs),
Intel i7-7700K and
AMD Ryzen 7 3800X (both workstation CPUs).
While the absolute compression times changed, the Pareto frontiers did not, hence in this blogpost only the plots for the
i7-8550U are shown exemplarily.
Case study: initramfs-compression
Just recently, the current maintainer of mkinitcpio, Giancarlo Razzolini, announced the transition from
zstd as the default for initramfs-compression.
The initramfs is a little mini-system, whose only objective is to prepare the hardware and assemble all disks so that the main system is readily prepared to take over operation.
The initramfs needs to be loaded (and hence decompressed) at every boot as well as recreated occasionally, when components contained in it are updated.
mkinitcpio supports a variety of compression algorithms:
lz4 and most recently
Quantifying these algorithms and plotting the Pareto frontier for compression yields:
We find that the change of defaults from
zstd was well justified, as gzip can no longer be considered Pareto optimal for this type of file.
lzma as the default would make it even smaller, but this would be paid by a noticable higher resource usage for compression (which has to be invested on every update affecting the initramfs), so from the data
zstd is certainly the wiser choice4.
This can also be seen when we take a look of Pareto optimality of decompression (after all, this decompression needs to happen on every single system boot):
We clearly see that
zstd is blasting all other algorithms out of the water when it comes to decompression speed, making it even more of a good choice for this use case.
Given these numbers, it is even more of a good choice for an initramfs, not only does it compress fast, it also decompresses impressively fast, six times faster than lzma which was previously known for its quick decompression speed despite high compression factors.
Given the data for zstd, it cannot be ruled out completely that
zstd simply hit a non-CPU-bound on decompression, but even if it did, the conclusion for the choice of algorithm does not change.
If you happen to be on a Linux distribution that uses
dracut for initramfs-generation, the conclusions that can be drawn for dracut-initramfs-compressions are the almost identical given the compression and decompression data for a dracut-initrd, the Pareto frontier remains mostly unchanged with just some more levels of
zstd in it.
Real world usage recommendation
zstd in mkinitcpio, simply use version 30 and above.
You can modify the compression level (default is
-3) by adding a specific level to
COMPRESSION_OPTIONS, but given the data, this doesn’t seem to provide much of a benefit.
For dracut, add
/etc/dracut.conf to get zstd compression at the default level.
Case study: borgbackup
In the next scenario we will investigate the impact of compression when backing up data with borgbackup.
Borg comes with an integrated option to compress the backupped data, with algorithms available being
In addition to this, borg has an automated detection routine to see if the files backupped do actually compress enough to spend CPU cycles on compressing (see
borg help compression for details).
For this scenario, we do not only have one definite sample, but will consider three different samples:
A simple text file, being a dump of the Kernel log via the
dmesg command, representing textual data.
A binary, in this particular case the
dockerd ELF binary, (rather arbitrarily) representing binary data.
Finally, an mkinitcpio-image, intended as somewhat of a mixed-data sample.
We do not consider media files, as these are typically already stored in compressed formats, hence unlikely to compress further and thus dealt with by borg’s compressibility heuristics.
The resulting Pareto frontiers for compression are, starting with least effective compression:
We see that effectively, except for some brief occurrance of lz4 at the top, the relevant choices are
More details can be seen in the plots linked in the column headers.
Hence, as backups should be run often (with a tool as borg there is little reason for anything else but daily),
zstd with a level slightly below 10 seems to be a good compromise of speed and resulting data size.
Real world usage recommendation
--compression=auto,zstd,7 to the borg command used to create a backup will use
zstd on level 7 if borgs internal heuristics considers the file in question to compress well, otherwise no compression will be used.
This flag can be added on-the fly, without affecting existing repositories or borgs deduplication.
Already backupped data is not recompressed, meaning that adding this flag for use with an existing repository does not require a reupload of everything.
Consequentially, it also means that to recompress the entire repository with
zstd one effectively has to start from scratch.
Case study: Archiving with tar
Compression can not only be used for backup-tools like borg, it can also be used to archive files with tar.
Some compressors have explicit flags in tar, such as
-j), but any compression algorithm can be used via tar’s
Working with tar poses two challenges with regard to the compressor:
- streaming: tar concatenates data and streams the result through the compressor. Hence, to extract files at a certain position of a tarball, the entirety of the data before that file needs to be decompressed as well.
- compress-only: as a further consequence of this, tar lacks a feature testing if something is well compressible, so uncompressible data will also be sent through the compressor
If we want to investigate a good compressor without consideration for the input data, we aim for picking a Pareto optimal compressor that takes two properties into consideration:
- fast decompression, to be able to easily and quickly extract data from the archive again if need be
- good performance on non-compressible data (this particularly means that incompressible data should especially not increase in size).
To investigate the decompression capabilities of certain compressors, we can reuse the dataset used on the borg case study and add some incompressible data in form of a flac music file to the mix.
As tar has a larger variety of usable algorithms, we include
brotli as well.
We can exemplarily see the effectiveness of these on the dockerd elf binary (other datasets can be found below) first:
In summary, the Pareto frontiers for the different types of data overall turn out to be (with c for compression and d for decompression):
|text (c)||text (d)||binary (c)||binary (d)||initramfs (c)||initramfs (d)||flac (c)||flac (d)|
As can be seen in the linked plots, we again find that while
lzma still achieves the highest absolute compression,
zstd dominates the sweet spot right before computational cost skyrockets.
We also find
brotli to be an interesting contender here, making it into the Pareto frontier as well.
However, with only sometimes making it into the Pareto frontier, whereas
zstd robustly defend their inclusion in it, it seems more advisable to resort to either
zstd as this only provides a sample binary and actual data might vary.
Furthermore, when it comes to decompression brotli is not Pareto optimal anymore at all, also indicating
zstd as being the better choice.
Impact on incompressible files in detail
We will take another closer look at the incompressible case, represented by a flac file and strip away
lzma, as we could tell from the linked plots that these two clearly increase the size of the result and are hence not Pareto optimal (as they are already beaten by the Pareto optimal case “no compression”).
The results have a clear indication:
The recommended choice of algorithm for compression is either
zstd, but when it comes to decompression,
zstd takes the lead again.
This is of course a cornercase, the details of this might change with the particular choice of incompressible data.
However, I do not expect the overall impression to significantly change.
Real world usage recommendation
Concluding this section, the real-world recommendation resulting from this seems to simply use
zstd for any tarball compression if available.
To do so,
tar "-Izstd -10 -T0" can be a good choice, with
zstd to parallelize the compression onto all available CPU cores, speeding things up even more beyond our measurements.
Depending on your particular usecase it might be interesting to use an alias like
alias archive='tar "-Izstd -19 -T0" -cf'
which allows quickly taring data into a compressed archive via
archive myarchive.tar.zst file1 file2 file3 ….
Case study: filesystems
Another usecase for compression is filesystem compression. Conceptually similar to what borg does, files are transparently compressing and decompressed when written to or read from disk.
Among the filesystems capable of such inline-compression are ZFS and btrfs.
zstd, whereas ZFS supports
LZJB (which was not included in these benchmarks as no appropriate binary compressor was found),
ZLE (zero-length-encoding, only compressing zeroes, hence also not tested).
zstd support for OpenZFS has been merged, but apparently hasn’t made it into any stable version yet at time of writing according to the OpenZFS documentation.
This case is situated similar to the tar-case study, and as all compressors available for ZFS and btrfs have already been covered in the section above, there is no reason to reiterate these results here.
It shall, however, be noted, that at least for btrfs, the standard flag for filesystem compression adopts a similar heuristic as borg and hence the case of incompressible data might not be so relevant for a btrfs installation.
That being said, the conclusion here is a recommendation of
zstd, and as we have seen in the last section, the question of incompressible files doesn’t change the overall recommendation.
Real world usage recommendation
If you want to save diskspace by using compression, the mount option
compress in combination with
zstd is generally a good choice for btrfs.
This also includes the compressibility-heuristics (
compress-force would be the option that compresses without this heuristics).
For ZFS, the general recommendation is consequentially also
zstd once it makes its way into a release.
Concluding the experiments laid out in this blogpost, we can effectively state an almost unconditional and surprisingly clear recommendation to simply use
zstd for everything.
The exact level might depend on the usecase, but overall it has demonstrated to be the most versatile, yet effective compressor around when it comes to effectiveness and speed, both for compression and especially decompression.
Furthermore, it has, in contrast to most other contenders, flags for built-in parallelization, which not used in this blogpost at all, and yet
zstd still stomped almost the entire competition.
Only if resulting filesize should be pushed down as much as possible, without any regard for computational cost,
lzma retains an edge for most kinds of data.
In practice, however, the conclusion is to simply use
Which is a easy though little bit cheated way of asking “how much CPU-time do I have to burn on this?”. Obviously, there are plenty of other criteria that might be relevant, depending on the particular usecase, such as memory consumption. Furthermore, all these criteria also apply for decompression as well as compression, as we will investigate later, technically doubling the amount of criteria we can take into consideration. ↩︎
For algorithms that allow parallelized compressions, this might no longer necessarily be the case, but all data in this blogpost was gathered with non-parallelized compression for all tested algorithms. ↩︎
Fun fact on the site: the entire benchmarking suite (including some more data that is not included in this blogpost) runs 61 days straight on the i5-3320M. Fortunately it’s a bit faster on newer CPUs. :D ↩︎
Furthermore, mkinitcpio runs
-T0by default, which parallelizes compression to all available cores. This accellerates compression even further, but was not tested in this particular scenario and hence not included in the plot, as most compressors do not support parallelization. But even without parallelization,
zstdstill makes it to the pareto frontier. There might be another blogpost upcoming to take a look at parallelization at some point, though… ↩︎