Data compression is an incredibly useful tool in many situations, be it for backups, archiving, or even filesystems. But of the many compressors available, which one should be preferred? To properly answer this question, we first have to answer a different, but related one: What exactly makes a compression algorithm good?
Time for a closer investigation to find the best of the best of the best!
Pareto optimality or what makes a compression algorithm good
Pareto optimality is a situation where no […] preference criterion can be better off without making at least one […] preference criterion worse off […].
The concept of Pareto optimality is tremendously useful far beyond the scope of this blogpost, because it acknowledges competing interests.
In our particular case of optimal compression, one of the obvious candidates is the resulting file size. The smaller the resulting file is, the better the compressor, right?
But what if we are just compressing so we can speed up an upload to a remote location? At this point, it doesn’t matter if the compression is supremely superior to everything else if we have to wait twice as long for it compared to simply uploading the uncompressed file.
So for our algorithms, we have at least two criteria that come into play: actual achievable compression and compression cost, which we will measure as “how long does it take to compress our content?”^{1}. These two are the criteria this blogpost is focused on.
A practical example
Practically speaking, let’s assume two compression tools A and B, both having two compression levels 1 and 2, which we use on a sample file.
The results might look like the following:
| algorithm | level | file size | time |
|-----------|-------|-----------|------|
| A         | 1     | 60%       | 2s   |
| A         | 2     | 50%       | 4s   |
| B         | 1     | 40%       | 3s   |
| B         | 2     | 30%       | 11s  |
We find that while both algorithms compress more effectively at the higher level, B.1 is unconditionally better than A.2, so there is no reason to ever use A.2 if B.1 is available.
Besides that, A.1, B.1 and B.2 continuously increase in compression effectiveness as well as in the time taken to achieve it.
For easier digestion, we can visualize these results:
Here we clearly see what could already be taken from the table above, but in a significantly more intuitive way.
Remembering our definition of Pareto optimality from before, we see that A.2 is not Pareto optimal, as B.1 is better both in time taken and in compression effect.
This shows more intuitively that in this scenario there is no reason to use A.2 if B.1 is available.
For A.1, B.1 and B.2 the choice is not so clear-cut, as the resulting file size can only be reduced further by investing more time into the compression. Hence, all three of them are Pareto optimal and constitute the Pareto frontier, also called the Pareto set.
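This frontier can also be computed mechanically: sort the candidates by time and keep every row whose file size improves on the best size seen among all faster candidates. A minimal sketch in shell, using the toy values from the table above (columns: name, size in %, time in seconds):

```shell
# Toy Pareto-frontier computation: sort rows by time ascending, then
# keep each row whose size beats the best size seen so far.
printf '%s\n' 'A.1 60 2' 'A.2 50 4' 'B.1 40 3' 'B.2 30 11' |
  sort -k3 -n |
  awk 'NR == 1 || $2 < best { best = $2; print }'
# prints exactly the three Pareto optimal rows: A.1, B.1 and B.2
```

As expected, A.2 is the only row filtered out, matching the reasoning above.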
One should always strive to choose a Pareto optimal solution whenever possible, as non-Pareto-optimal solutions are always wasteful to some degree.
With this insight into Pareto optimality, we can now put our knowledge into practice and begin our journey to the Pareto frontiers of compression algorithms.
Setup and test procedure for real-world measurements
Data was gathered on a Linux system with a given sample file, measuring the resulting file size relative to the original as well as the time the compression took. First, the sample file was read from disk entirely, ensuring it would be present in Linux’s filesystem cache; then it was compressed with each compression program at each level 15 times for decent statistics, with the compressed result routed directly into /dev/null.
All compression times presented are the median of these 15 runs; the compressed size was only measured once, after verifying that it was deterministic^{2}.
Unless explicitly denoted otherwise, all compressors were run in their default configuration with a single compression thread. Furthermore, all applications on the machine used for benchmarking except the terminal holding the benchmarking process were closed to reduce interference by other processes as much as possible.
The tests for decompression were done analogously (with the obvious exception that the sample file is compressed according to the test case in play beforehand), with single-threaded decompression and the decompressed result routed to /dev/null.
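The per-level statistic is simple enough to express as a small helper; the timing loop itself is only indicated in comments here, since the actual runs invoked each compressor binary in turn (the `zstd` invocation shown is a hypothetical example of one such run):

```shell
# Minimal sketch of the statistic used here: each (compressor, level)
# pair is run 15 times and the median wall time is reported.
median() {
  # sort the timings numerically and print the middle one (odd count)
  printf '%s\n' "$@" | sort -n | awk -v n="$#" 'NR == (n + 1) / 2'
}

# Hypothetical shape of one measurement series (not executed here):
#   for run in $(seq 1 15); do
#     /usr/bin/time -f %e zstd -"$level" -c sample > /dev/null
#   done
median 2.4 1.9 2.0 2.3 1.8 2.1 2.2   # median of 7 example timings
```

The median is preferred over the mean here because it is robust against occasional outlier runs caused by background activity.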
All tests were run on several different CPUs: Intel i7-8550U, Intel i5-3320M^{3} (both mobile CPUs), Intel i7-7700K and AMD Ryzen 7 3800X (both workstation CPUs).
While the absolute compression times changed, the Pareto frontiers did not; hence this blogpost only shows the plots for the i7-8550U as an example.
Case study: initramfs compression
Just recently, the current maintainer of mkinitcpio, Giancarlo Razzolini, announced the transition from gzip to zstd as the default for initramfs compression.
The initramfs is a small mini-system whose only objective is to prepare the hardware and assemble all disks so that the main system is ready to take over operation.
The initramfs needs to be loaded (and hence decompressed) at every boot as well as recreated occasionally, when components contained in it are updated.
mkinitcpio supports a variety of compression algorithms: none, gzip, bzip2, lzma, lzop, lz4 and, most recently, zstd.
Quantifying these algorithms and plotting the Pareto frontier for compression yields:
We find that the change of defaults from gzip to zstd was well justified, as gzip can no longer be considered Pareto optimal for this type of file.
Choosing lzma as the default would make the initramfs even smaller, but this would be paid for with noticeably higher resource usage during compression (which has to be invested on every update affecting the initramfs), so from the data zstd is certainly the wiser choice^{4}.
This can also be seen when we take a look at the Pareto optimality of decompression (after all, this decompression needs to happen on every single system boot):
We clearly see that zstd blasts all other algorithms out of the water when it comes to decompression speed. Not only does it compress fast, it also decompresses impressively fast: six times faster than lzma, which was previously known for its quick decompression despite high compression factors. This makes it even more of a good choice for this use case.
Given the data for zstd, it cannot be completely ruled out that zstd simply hit a non-CPU bound on decompression, but even if it did, the conclusion for the choice of algorithm does not change.
dracut
If you happen to be on a Linux distribution that uses dracut for initramfs generation, the conclusions are almost identical: given the compression and decompression data for a dracut initrd, the Pareto frontier remains mostly unchanged, with just some more levels of zstd in it.
Real world usage recommendation
To use zstd in mkinitcpio, simply use version 30 or above.
You can modify the compression level (default is 3) by adding a specific level to COMPRESSION_OPTIONS, but given the data, this doesn’t seem to provide much of a benefit.
For dracut, add compress="zstd" to /etc/dracut.conf to get zstd compression at the default level.
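Put together, the two configurations might look like the following sketch; the file paths are the tools' standard locations, and the explicit level is only an example (the defaults work fine, as noted above):

```shell
# /etc/mkinitcpio.conf — zstd is the default compressor since v30;
# an explicit level can be passed via COMPRESSION_OPTIONS if desired:
COMPRESSION="zstd"
COMPRESSION_OPTIONS=(-3)

# /etc/dracut.conf — the equivalent setting for dracut:
compress="zstd"
```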
Case study: borgbackup
In the next scenario we will investigate the impact of compression when backing up data with borgbackup.
Borg comes with an integrated option to compress the backed-up data, with the available algorithms being lz4, zstd, zlib/gzip and lzma.
In addition, borg has an automated detection routine to check whether the files being backed up actually compress well enough to be worth spending CPU cycles on (see borg help compression for details).
For this scenario, we do not only have one definite sample, but will consider three different samples:
A simple text file, a dump of the kernel log via the dmesg command, representing textual data.
A binary, in this particular case the dockerd ELF binary, (rather arbitrarily) representing binary data.
Finally, an mkinitcpio image, intended as somewhat of a mixed-data sample.
We do not consider media files, as these are typically already stored in compressed formats, hence unlikely to compress further and thus dealt with by borg’s compressibility heuristics.
The resulting Pareto frontiers for compression are, starting with least effective compression:
| text    | binary | initramfs |
|---------|--------|-----------|
| lz4.2   | lz4.1  | lz4.1     |
| zstd.1  | zstd.1 | zstd.1    |
| zstd.2  | zstd.2 | zstd.2    |
| zstd.4  | zstd.3 | zstd.3    |
| zstd.3  | zstd.4 | zstd.4    |
| zstd.5  | zstd.5 | zstd.5    |
| zstd.6  | zstd.6 | zstd.6    |
| zstd.7  | zstd.7 | zstd.7    |
| zstd.8  | zstd.8 | zstd.8    |
| zstd.9  | zstd.9 | zstd.9    |
| zstd.10 | lzma.0 | lzma.0    |
| zstd.11 | lzma.1 | lzma.1    |
| zstd.12 | lzma.2 | lzma.2    |
| lzma.2  | lzma.3 | lzma.3    |
| lzma.3  | lzma.4 | lzma.4    |
| lzma.6  | lzma.5 | lzma.5    |
| zstd.19 | lzma.6 | lzma.6    |
| .       | lzma.7 | lzma.7    |
| .       | lzma.8 | lzma.8    |
| .       | lzma.9 | lzma.9    |
We see that, except for a brief occurrence of lz4 at the top, the relevant choices are effectively lzma and zstd.
More details can be seen in the plots linked in the column headers.
Hence, as backups should be run often (with a tool like borg there is little reason for anything other than daily), zstd with a level slightly below 10 seems to be a good compromise between speed and resulting data size.
Real world usage recommendation
Adding compression=auto,zstd,7 to the borg command used to create a backup will use zstd at level 7 if borg’s internal heuristics consider the file in question to compress well; otherwise, no compression will be used.
This flag can be added on the fly, without affecting existing repositories or borg’s deduplication.
Already backed-up data is not recompressed, meaning that adding this flag to an existing repository does not require a re-upload of everything.
Consequently, it also means that to recompress the entire repository with zstd, one effectively has to start from scratch.
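A full invocation might look like the following sketch; the repository path, archive name pattern and source directories are placeholders to adapt to your setup:

```shell
# Create a backup with zstd level 7, gated by borg's compressibility
# heuristic ("auto"): incompressible files are stored uncompressed.
borg create --compression auto,zstd,7 \
    /path/to/repo::'{hostname}-{now}' \
    /home /etc
```

The `{hostname}-{now}` placeholders are expanded by borg itself, yielding one uniquely named archive per run.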
Case study: Archiving with tar
Compression is not only useful for backup tools like borg; it can also be used when archiving files with tar.
Some compressors have explicit flags in tar, such as gzip (-z), lzma (-J) or bzip2 (-j), but any compression algorithm can be used via tar’s -I flag.
Working with tar poses two challenges with regard to the compressor:
- streaming: tar concatenates data and streams the result through the compressor. Hence, to extract a file at a certain position in a tarball, the entirety of the data before that file needs to be decompressed as well.
- compress-only: as a further consequence of this, tar lacks a feature for testing whether something is well compressible, so incompressible data will also be sent through the compressor.
If we want to pick a good compressor without regard for the input data, we aim for a Pareto optimal compressor that takes two further properties into consideration:
- fast decompression, to be able to easily and quickly extract data from the archive again if need be
- good performance on non-compressible data (in particular, incompressible data should not increase in size).
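The second point can be illustrated directly: piping incompressible input through a general-purpose compressor makes it slightly larger, since the format overhead cannot be recovered. A quick sketch, using random bytes as a stand-in for already-compressed media data and gzip as an example compressor:

```shell
# Feed 1 MB of random (hence incompressible) data through gzip and
# count the output bytes: the result exceeds the input size, because
# gzip falls back to stored blocks plus header/trailer overhead.
head -c 1000000 /dev/urandom | gzip -c | wc -c
```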
Compression capabilities
To investigate the capabilities of certain compressors, we can reuse the dataset from the borg case study and add some incompressible data, in the form of a flac music file, to the mix.
As tar allows a larger variety of algorithms, we also include lz4, gzip, lzma, zstd, lzop and brotli.
We can first see the effectiveness of these on the dockerd ELF binary as an example (other datasets can be found below):
In summary, the Pareto frontiers for the different types of data overall turn out to be (with c for compression and d for decompression):
| text (c) | text (d) | binary (c) | binary (d) | initramfs (c) | initramfs (d) | flac (c)  | flac (d) |
|----------|----------|------------|------------|---------------|---------------|-----------|----------|
| lz4.2    | lz4.9    | lz4.1      | lz4.8      | lz4.1         | lz4.7         | lz4.2     | zstd.1   |
| zstd.1   | zstd.13  | lzop.1     | lz4.9      | lzop.6        | lz4.9         | brotli.1  | zstd.5   |
| zstd.2   | zstd.11  | lzop.3     | zstd.15    | zstd.1        | zstd.1        | brotli.2  | zstd.7   |
| zstd.4   | zstd.12  | zstd.1     | zstd.16    | zstd.2        | zstd.9        | brotli.3  | zstd.19  |
| zstd.3   | zstd.14  | zstd.2     | zstd.17    | zstd.3        | zstd.14       | zstd.18   | .        |
| zstd.5   | zstd.15  | zstd.3     | zstd.19    | zstd.4        | zstd.15       | zstd.19   | .        |
| zstd.6   | zstd.19  | zstd.4     | lzma.8     | zstd.5        | zstd.16       | .         | .        |
| zstd.7   | .        | zstd.5     | lzma.9     | zstd.6        | zstd.17       | .         | .        |
| zstd.8   | .        | zstd.6     | .          | zstd.7        | zstd.18       | .         | .        |
| zstd.9   | .        | zstd.7     | .          | zstd.8        | zstd.19       | .         | .        |
| zstd.10  | .        | zstd.8     | .          | zstd.9        | lzma.8        | .         | .        |
| brotli.5 | .        | zstd.9     | .          | brotli.5      | lzma.9        | .         | .        |
| brotli.6 | .        | brotli.5   | .          | lzma.0        | .             | .         | .        |
| brotli.7 | .        | lzma.1     | .          | lzma.1        | .             | .         | .        |
| lzma.3   | .        | lzma.2     | .          | lzma.2        | .             | .         | .        |
| lzma.6   | .        | lzma.3     | .          | lzma.3        | .             | .         | .        |
| zstd.19  | .        | lzma.4     | .          | lzma.4        | .             | .         | .        |
| .        | .        | lzma.5     | .          | lzma.5        | .             | .         | .        |
| .        | .        | lzma.6     | .          | lzma.6        | .             | .         | .        |
| .        | .        | lzma.7     | .          | lzma.7        | .             | .         | .        |
| .        | .        | lzma.8     | .          | lzma.8        | .             | .         | .        |
| .        | .        | lzma.9     | .          | lzma.9        | .             | .         | .        |
As can be seen in the linked plots, we again find that while lzma still achieves the highest absolute compression, zstd dominates the sweet spot right before computational cost skyrockets.
We also find brotli to be an interesting contender here, making it into the Pareto frontier as well.
However, since brotli only sometimes makes it into the frontier, whereas lzma and zstd robustly defend their inclusion in it, it seems more advisable to resort to either lzma or zstd, as this is only a sample binary and actual data might vary.
Furthermore, when it comes to decompression, brotli is no longer Pareto optimal at all, also indicating lzma and zstd as the better choice.
Impact on incompressible files in detail
We will take another, closer look at the incompressible case, represented by a flac file, and strip away bzip2 and lzma, as the linked plots show that these two clearly increase the size of the result and are hence not Pareto optimal (they are already beaten by the Pareto optimal case of “no compression”).
The results have a clear indication:
The recommended choice of algorithm for compression is either brotli or zstd, but when it comes to decompression, zstd takes the lead again.
This is of course a corner case, and the details might change with the particular choice of incompressible data.
However, I do not expect the overall impression to change significantly.
Real world usage recommendation
Concluding this section, the real-world recommendation is to simply use zstd for any tarball compression, if available.
To do so, tar -I "zstd -10 -T0" can be a good choice, with -T0 telling zstd to parallelize the compression across all available CPU cores, speeding things up even further beyond our measurements.
Depending on your particular use case it might be interesting to use an alias like
alias archive='tar -I "zstd -19 -T0" -cf'
which allows quickly tar-ing data into a compressed archive via archive myarchive.tar.zst file1 file2 file3 ….
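The round trip might then look like the following sketch; archive and file names are placeholders:

```shell
# Create a zstd-compressed tarball via tar's -I flag, with an explicit
# level and multi-threaded compression:
tar -I "zstd -10 -T0" -cf myarchive.tar.zst file1 file2

# Extraction: tar invokes the given program with -d when decompressing,
# so plain "zstd" suffices here.
tar -I zstd -xf myarchive.tar.zst
```

Recent versions of GNU tar can also auto-detect zstd compression on extraction from the file contents, making the -I flag optional for unpacking.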
Case study: filesystems
Another use case for compression is filesystem compression. Conceptually similar to what borg does, files are transparently compressed and decompressed when written to or read from disk.
Among the filesystems capable of such inline compression are ZFS and btrfs.
btrfs supports ZLIB (gzip), LZO (lzop) and zstd, whereas ZFS supports lz4, LZJB (which was not included in these benchmarks, as no appropriate standalone compressor binary was found), gzip and ZLE (zero-length encoding, which only compresses zeroes and was hence also not tested).
zstd support for OpenZFS has been merged, but according to the OpenZFS documentation it apparently hasn’t made it into any stable version yet at the time of writing.
This case is similar to the tar case study, and as all compressors available for ZFS and btrfs have already been covered in the section above, there is no reason to reiterate those results here.
It shall, however, be noted that, at least for btrfs, the standard flag for filesystem compression adopts a heuristic similar to borg’s, so the case of incompressible data might not be so relevant for a btrfs installation.
That being said, the conclusion here is a recommendation of zstd, and as we saw in the last section, the question of incompressible files doesn’t change the overall recommendation.
Real world usage recommendation
If you want to save disk space by using compression, the mount option compress in combination with zstd is generally a good choice for btrfs.
This also includes the compressibility heuristics (compress-force would be the option that compresses without this heuristic).
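In practice this might look like the following sketch; the device name and mount point are placeholders:

```shell
# Mount a btrfs filesystem with zstd compression at the default level,
# keeping the compressibility heuristic active:
mount -o compress=zstd /dev/sdXn /mnt

# A specific level can be pinned with the compress=zstd:N syntax
# (supported since Linux 5.1):
mount -o compress=zstd:3 /dev/sdXn /mnt
```

The same options can of course be made permanent via the corresponding /etc/fstab entry.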
For ZFS, the general recommendation is consequently also zstd, once it makes its way into a release.
Conclusion
Concluding the experiments laid out in this blogpost, we can state an almost unconditional and surprisingly clear recommendation: simply use zstd for everything.
The exact level might depend on the use case, but overall zstd has demonstrated itself to be the most versatile yet effective compressor around in terms of both compression ratio and speed, for compression and especially decompression.
Furthermore, in contrast to most other contenders, it has flags for built-in parallelization, which were not used in this blogpost at all, and yet zstd still stomped almost the entire competition.
Only if the resulting file size needs to be pushed down as much as possible, without any regard for computational cost, does lzma retain an edge for most kinds of data.
In practice, however, the conclusion is to simply use zstd.

Which is an easy, though slightly cheaty, way of asking “how much CPU time do I have to burn on this?”. Obviously, there are plenty of other criteria that might be relevant depending on the particular use case, such as memory consumption. Furthermore, all these criteria apply to decompression as well as compression, as we will investigate later, technically doubling the number of criteria we can take into consideration. ↩︎

For algorithms that allow parallelized compression, this might no longer necessarily be the case, but all data in this blogpost was gathered with non-parallelized compression for all tested algorithms. ↩︎

Fun fact on the side: the entire benchmarking suite (including some more data that is not included in this blogpost) ran for 61 days straight on the i5-3320M. Fortunately it’s a bit faster on newer CPUs. :D ↩︎

Furthermore, mkinitcpio runs zstd with -T0 by default, which parallelizes compression across all available cores. This accelerates compression even further, but was not tested in this particular scenario and is hence not included in the plot, as most compressors do not support parallelization. But even without parallelization, zstd still makes it to the Pareto frontier. There might be another blogpost upcoming to take a look at parallelization at some point, though… ↩︎