Pareto-optimal compression

02. Mar 2021

Data compression is an incredibly useful tool in many situations, be it for backups, archiving, or even filesystems. But of the many compressors that there are, which of them is the one to be preferred? To properly answer this question, we first have to answer a different, but related one: What exactly makes a compression algorithm good?

Time for a closer investigation to find the best of the best of the best!

Pareto optimality or what makes a compression algorithm good

Pareto optimality is a situation where no […] preference criterion can be better off without making at least one […] preference criterion worse off […].

~ Wikipedia

The concept of Pareto Optimality is tremendously useful, far beyond the scope of this blogpost, by acknowledging competing interests.

In our particular case of optimal compression, one of the obvious candidates is compression size. The smaller the resulting file is, the better the compressor, right?

But what if we are just compressing so we can speed up upload to a remote location? At this point, it doesn’t matter if the compression is supremely superior to everything else if we have to wait twice as long for it to happen compared to simply upload the uncompressed file.

So for our algorithms, we have at least two criteria that come into play: actual achievable compression and compression cost, which we will measure as “how long does it take to compress our content?”¹. These two are the criteria this blogpost is focused on.

A practical example

Practically speaking, let’s assume two compression tools A and B, having both two compression levels 1 and 2, which we use on a sample file. The results might look like the following:

algorithm	level	file size	time
A	1	60%	2s
A	2	50%	4s
B	1	40%	3s
B	2	30%	11s

We find that while both algorithms increase their compression size with level, B.1 is unconditionally better than A.2, so there is no reason to ever use A.2 if B.1 is available. Besides that, A.1, B.1 and B.2 continuously increase in compression effectiveness as well as time taken for that. For easier digestion, we can visualize these results:

Compression results for different hypothetical compression algorithms, including the Pareto frontier indicated in blue.

Here we clearly see what could already be taken from the table above, but significantly more intuitive. Remembering our definition of Pareto optimality from before, we see that A.2 is not Pareto optimal, as B.1 is both better in time taken as well as in compression effect. This shows more intuitively that in this scenario there is no reason to use A.2 if B.1 is available. For A.1, B.1 and B.2 the choice is not so clear-cut, as the resulting file size can only be reduced further by investing more time into the compression. Hence, all three of them are Pareto optimal and constitute the Pareto frontier or the Pareto set.

One wants to always strive for choosing a Pareto optimal solution whenever possible, as non-Pareto-optimal solutions are always to some degree wasteful.

With this insight into Pareto optimality, we can now put our knowledge into practice and begin our journey to the Pareto frontiers of compression algorithms.

Setup and test procedure for real-world measurements

Data was gathered on a Linux system with a given sample file for the resulting filesize compared to the original as well as the time the process took for compressing. First, the sample file was read from disk entirely, ensuring it would be present in Linux’ filesystem cache, then compressed with each compression program at each level 15 times for a decent statistic, the compressed result is routed directly into /dev/null. All presented compression times presented are the median of these 15 runs, compression size was only measured once after verifying that it was deterministic².

Unless explicitly denoted otherwise, all compressors were run in their default configuration with a single compression thread. Furthermore, all applications on the machine used for benchmarking except the terminal holding the benchmarking process were closed to reduce interference by other processes as much as possible.

The tests for decompression were done analogously (with the obvious exception that the sample file is compressed according to the testcase in play beforehand), single-threaded decompression as well as routing the decompressed result to /dev/null.

All tests were run on several different CPUs: Intel i7-8550U, Intel i5-3320M³ (both mobile CPUs), Intel i7-7700K and AMD Ryzen 7 3800X (both workstation CPUs). While the absolute compression times changed, the Pareto frontiers did not, hence in this blogpost only the plots for the i7-8550U are shown exemplarily.

Case study: initramfs-compression

Just recently, the current maintainer of mkinitcpio, Giancarlo Razzolini, announced the transition from gzip to zstd as the default for initramfs-compression. The initramfs is a little mini-system, whose only objective is to prepare the hardware and assemble all disks so that the main system is readily prepared to take over operation. The initramfs needs to be loaded (and hence decompressed) at every boot as well as recreated occasionally, when components contained in it are updated.

mkinitcpio supports a variety of compression algorithms: none, gzip, bzip2, lzma, lzop, lz4 and most recently zstd.

Quantifying these algorithms and plotting the Pareto frontier for compression yields:

Compression results for a standard mkinitcpio-generated initramfs and the corresponding Pareto frontier. The difficult-to-decipher black culmination at about 34% resulting file size are the bzip2 results. — Compression results for a standard mkinitcpio-generated initramfs and the corresponding Pareto frontier. The difficult-to-decipher black culmination at about 34% resulting file size are the `bzip2` results.

We find that the change of defaults from gzip to zstd was well justified, as gzip can no longer be considered Pareto optimal for this type of file. Choosing lzma as the default would make it even smaller, but this would be paid by a noticable higher resource usage for compression (which has to be invested on every update affecting the initramfs), so from the data zstd is certainly the wiser choice⁴.

This can also be seen when we take a look of Pareto optimality of decompression (after all, this decompression needs to happen on every single system boot):

Decompression results for a standard mkinitcpio-generated initramfs and the corresponding Pareto frontier.

We clearly see that zstd is blasting all other algorithms out of the water when it comes to decompression speed, making it even more of a good choice for this use case. Given these numbers, it is even more of a good choice for an initramfs, not only does it compress fast, it also decompresses impressively fast, six times faster than lzma which was previously known for its quick decompression speed despite high compression factors.

Given the data for zstd, it cannot be ruled out completely that zstd simply hit a non-CPU-bound on decompression, but even if it did, the conclusion for the choice of algorithm does not change.

dracut

If you happen to be on a Linux distribution that uses dracut for initramfs-generation, the conclusions that can be drawn for dracut-initramfs-compressions are the almost identical given the compression and decompression data for a dracut-initrd, the Pareto frontier remains mostly unchanged with just some more levels of zstd in it.

Real world usage recommendation

To use zstd in mkinitcpio, simply use version 30 and above. You can modify the compression level (default is -3) by adding a specific level to COMPRESSION_OPTIONS, but given the data, this doesn’t seem to provide much of a benefit.

For dracut, add compress="zstd" to /etc/dracut.conf to get zstd compression at the default level.

Case study: borgbackup

In the next scenario we will investigate the impact of compression when backing up data with borgbackup. Borg comes with an integrated option to compress the backupped data, with algorithms available being lz4, zstd, zlib/gzip and lzma. In addition to this, borg has an automated detection routine to see if the files backupped do actually compress enough to spend CPU cycles on compressing (see borg help compression for details).

For this scenario, we do not only have one definite sample, but will consider three different samples: A simple text file, being a dump of the Kernel log via the dmesg command, representing textual data. A binary, in this particular case the dockerd ELF binary, (rather arbitrarily) representing binary data. Finally, an mkinitcpio-image, intended as somewhat of a mixed-data sample. We do not consider media files, as these are typically already stored in compressed formats, hence unlikely to compress further and thus dealt with by borg’s compressibility heuristics.

The resulting Pareto frontiers for compression are, starting with least effective compression:

text	binary	initramfs
lz4.2	lz4.1	lz4.1
zstd.1	zstd.1	zstd.1
zstd.2	zstd.2	zstd.2
zstd.4	zstd.3	zstd.3
zstd.3	zstd.4	zstd.4
zstd.5	zstd.5	zstd.5
zstd.6	zstd.6	zstd.6
zstd.7	zstd.7	zstd.7
zstd.8	zstd.8	zstd.8
zstd.9	zstd.9	zstd.9
zstd.10	lzma.0	lzma.0
zstd.11	lzma.1	lzma.1
zstd.12	lzma.2	lzma.2
lzma.2	lzma.3	lzma.3
lzma.3	lzma.4	lzma.4
lzma.6	lzma.5	lzma.5
zstd.19	lzma.6	lzma.6
.	lzma.7	lzma.7
.	lzma.8	lzma.8
.	lzma.9	lzma.9

We see that effectively, except for some brief occurrance of lz4 at the top, the relevant choices are lzma and zstd. More details can be seen in the plots linked in the column headers. Hence, as backups should be run often (with a tool as borg there is little reason for anything else but daily), zstd with a level slightly below 10 seems to be a good compromise of speed and resulting data size.

Real world usage recommendation

Adding --compression=auto,zstd,7 to the borg command used to create a backup will use zstd on level 7 if borgs internal heuristics considers the file in question to compress well, otherwise no compression will be used.

This flag can be added on-the fly, without affecting existing repositories or borgs deduplication. Already backupped data is not recompressed, meaning that adding this flag for use with an existing repository does not require a reupload of everything. Consequentially, it also means that to recompress the entire repository with zstd one effectively has to start from scratch.

Case study: Archiving with tar

Compression can not only be used for backup-tools like borg, it can also be used to archive files with tar. Some compressors have explicit flags in tar, such as gzip (-z), lzma (-J) or bzip2 (-j), but any compression algorithm can be used via tar’s -I flag.

Working with tar poses two challenges with regard to the compressor:

streaming: tar concatenates data and streams the result through the compressor. Hence, to extract files at a certain position of a tarball, the entirety of the data before that file needs to be decompressed as well.
compress-only: as a further consequence of this, tar lacks a feature testing if something is well compressible, so uncompressible data will also be sent through the compressor

If we want to investigate a good compressor without consideration for the input data, we aim for picking a Pareto optimal compressor that takes two properties into consideration:

fast decompression, to be able to easily and quickly extract data from the archive again if need be
good performance on non-compressible data (this particularly means that incompressible data should especially not increase in size).

Compression capabilities

To investigate the decompression capabilities of certain compressors, we can reuse the dataset used on the borg case study and add some incompressible data in form of a flac music file to the mix. As tar has a larger variety of usable algorithms, we include lz4, gzip, lzma, zstd, lzop and brotli as well.

We can exemplarily see the effectiveness of these on the dockerd elf binary (other datasets can be found below) first:

Compression results and the corresponding Pareto frontier for the dockerd elf binary. The unreadable black cluster at 25% compressed size is again bzip2, the one at 34% is predominantly lz4. — Compression results and the corresponding Pareto frontier for the dockerd elf binary. The unreadable black cluster at 25% compressed size is again `bzip2`, the one at 34% is predominantly `lz4`.

Decompression results and the corresponding Pareto frontier for the dockerd elf binary. The clusters are lzma for 20% resulting size, zstd at 25%, brotli at 24% and lz4 at 33%. — Decompression results and the corresponding Pareto frontier for the dockerd elf binary. The clusters are `lzma` for 20% resulting size, `zstd` at 25%, `brotli` at 24% and `lz4` at 33%.

In summary, the Pareto frontiers for the different types of data overall turn out to be (with c for compression and d for decompression):

text (c)	text (d)	binary (c)	binary (d)	initramfs (c)	initramfs (d)	flac (c)	flac (d)
lz4.2	lz4.9	lz4.1	lz4.8	lz4.1	lz4.7	lz4.2	zstd.1
zstd.1	zstd.13	lzop.1	lz4.9	lzop.6	lz4.9	brotli.1	zstd.5
zstd.2	zstd.11	lzop.3	zstd.15	zstd.1	zstd.1	brotli.2	zstd.7
zstd.4	zstd.12	zstd.1	zstd.16	zstd.2	zstd.9	brotli.3	zstd.19
zstd.3	zstd.14	zstd.2	zstd.17	zstd.3	zstd.14	zstd.18	.
zstd.5	zstd.15	zstd.3	zstd.19	zstd.4	zstd.15	zstd.19	.
zstd.6	zstd.19	zstd.4	lzma.8	zstd.5	zstd.16	.	.
zstd.7	.	zstd.5	lzma.9	zstd.6	zstd.17	.	.
zstd.8	.	zstd.6	.	zstd.7	zstd.18	.	.
zstd.9	.	zstd.7	.	zstd.8	zstd.19	.	.
zstd.10	.	zstd.8	.	zstd.9	lzma.8	.	.
brotli.5	.	zstd.9	.	brotli.5	lzma.9	.	.
brotli.6	.	brotli.5	.	lzma.0	.	.	.
brotli.7	.	lzma.1	.	lzma.1	.	.	.
lzma.3	.	lzma.2	.	lzma.2	.	.	.
lzma.6	.	lzma.3	.	lzma.3	.	.	.
zstd.19	.	lzma.4	.	lzma.4	.	.	.
.	.	lzma.5	.	lzma.5	.	.	.
.	.	lzma.6	.	lzma.6	.	.	.
.	.	lzma.7	.	lzma.7	.	.	.
.	.	lzma.8	.	lzma.8	.	.	.
.	.	lzma.9	.	lzma.9	.	.	.

As can be seen in the linked plots, we again find that while lzma still achieves the highest absolute compression, zstd dominates the sweet spot right before computational cost skyrockets. We also find brotli to be an interesting contender here, making it into the Pareto frontier as well. However, with only sometimes making it into the Pareto frontier, whereas lzma and zstd robustly defend their inclusion in it, it seems more advisable to resort to either lzma or zstd as this only provides a sample binary and actual data might vary. Furthermore, when it comes to decompression brotli is not Pareto optimal anymore at all, also indicating lzma and zstd as being the better choice.

Impact on incompressible files in detail

We will take another closer look at the incompressible case, represented by a flac file and strip away bzip2 and lzma, as we could tell from the linked plots that these two clearly increase the size of the result and are hence not Pareto optimal (as they are already beaten by the Pareto optimal case “no compression”).

The results have a clear indication:

Compression results on incompressible data.

Decompression results on incompressible data. The unreadable black cluster at 99.975% size is zstd, the one at 99.988% contains lzop and brotli. — Decompression results on incompressible data. The unreadable black cluster at 99.975% size is `zstd`, the one at 99.988% contains `lzop` and `brotli`.

The recommended choice of algorithm for compression is either brotli or zstd, but when it comes to decompression, zstd takes the lead again. This is of course a cornercase, the details of this might change with the particular choice of incompressible data. However, I do not expect the overall impression to significantly change.

Real world usage recommendation

Concluding this section, the real-world recommendation resulting from this seems to simply use zstd for any tarball compression if available. To do so, tar "-Izstd -10 -T0" can be a good choice, with -T0 telling zstd to parallelize the compression onto all available CPU cores, speeding things up even more beyond our measurements. Depending on your particular usecase it might be interesting to use an alias like

alias archive='tar "-Izstd -19 -T0" -cf'

which allows quickly taring data into a compressed archive via archive myarchive.tar.zst file1 file2 file3 ….

Case study: filesystems

Another usecase for compression is filesystem compression. Conceptually similar to what borg does, files are transparently compressing and decompressed when written to or read from disk.

Among the filesystems capable of such inline-compression are ZFS and btrfs. btrfs supports ZLIB (gzip), LZO (lzop) and zstd, whereas ZFS supports lz4, LZJB (which was not included in these benchmarks as no appropriate binary compressor was found), gzip and ZLE (zero-length-encoding, only compressing zeroes, hence also not tested). zstd support for OpenZFS has been merged, but apparently hasn’t made it into any stable version yet at time of writing according to the OpenZFS documentation.

This case is situated similar to the tar-case study, and as all compressors available for ZFS and btrfs have already been covered in the section above, there is no reason to reiterate these results here.

It shall, however, be noted, that at least for btrfs, the standard flag for filesystem compression adopts a similar heuristic as borg and hence the case of incompressible data might not be so relevant for a btrfs installation. That being said, the conclusion here is a recommendation of zstd, and as we have seen in the last section, the question of incompressible files doesn’t change the overall recommendation.

Real world usage recommendation

If you want to save diskspace by using compression, the mount option compress in combination with zstd is generally a good choice for btrfs. This also includes the compressibility-heuristics (compress-force would be the option that compresses without this heuristics). For ZFS, the general recommendation is consequentially also zstd once it makes its way into a release.

Conclusion

Concluding the experiments laid out in this blogpost, we can effectively state an almost unconditional and surprisingly clear recommendation to simply use zstd for everything. The exact level might depend on the usecase, but overall it has demonstrated to be the most versatile, yet effective compressor around when it comes to effectiveness and speed, both for compression and especially decompression. Furthermore, it has, in contrast to most other contenders, flags for built-in parallelization, which not used in this blogpost at all, and yet zstd still stomped almost the entire competition.

Only if resulting filesize should be pushed down as much as possible, without any regard for computational cost, lzma retains an edge for most kinds of data. In practice, however, the conclusion is to simply use zstd.

Which is a easy though little bit cheated way of asking “how much CPU-time do I have to burn on this?”. Obviously, there are plenty of other criteria that might be relevant, depending on the particular usecase, such as memory consumption. Furthermore, all these criteria also apply for decompression as well as compression, as we will investigate later, technically doubling the amount of criteria we can take into consideration. ↩︎
For algorithms that allow parallelized compressions, this might no longer necessarily be the case, but all data in this blogpost was gathered with non-parallelized compression for all tested algorithms. ↩︎
Fun fact on the site: the entire benchmarking suite (including some more data that is not included in this blogpost) runs 61 days straight on the i5-3320M. Fortunately it’s a bit faster on newer CPUs. :D ↩︎
Furthermore, mkinitcpio runs zstd with -T0 by default, which parallelizes compression to all available cores. This accellerates compression even further, but was not tested in this particular scenario and hence not included in the plot, as most compressors do not support parallelization. But even without parallelization, zstd still makes it to the pareto frontier. There might be another blogpost upcoming to take a look at parallelization at some point, though… ↩︎