Compression formats for kernel.org
What's driving this discussion is the availability of the XZ tool, which is based on the LZMA compression algorithm. XZ offers better compression performance - 53MB on that 2.6.32 tarball - but it suffers from a familiar problem: the tools are not yet widely available in distributions, especially those of the "enterprise" variety. This has led to pushback against the idea of standardizing on XZ in the near future, as can be seen in this comment from Ted Ts'o:
In fact, there is little pressure to replace the gzip format anytime in the near future. Its compression performance may not be the best, but it does have the advantage of being far faster than any of the alternatives. From the discussion, it is fairly clear that some users care about decompression time. What is more likely is that XZ might eventually displace files in the bzip2 format. Then there would be a clear choice: speed and widespread availability or the best available compression. Even that change, though, is likely to be at least a year away; in the meantime, kernel.org will probably carry files in all three formats.
(This discussion also included a side thread on changing the 2.6.xx
numbering scheme. Once again, though, the expected flame wars failed to
materialize. There just does not seem to be much interest in or energy for
this particular change.)
Posted Feb 18, 2010 10:07 UTC (Thu)
by intgr (subscriber, #39733)
[Link] (4 responses)
Also, maybe one shouldn't be measuring decompression time in isolation, but
Posted Feb 18, 2010 11:04 UTC (Thu)
by dlang (guest, #313)
[Link] (3 responses)
I frequently compress my logfiles with gzip -9 even though I know that I will read them a few hours later. I do this because I have measured and found that it's faster to read the compressed data from disk and uncompress it than to read the uncompressed data from disk (even on some fairly beefy disk systems).
With bzip2 this is very much not the case.
I have not had a chance to measure xz in similar conditions yet, but from the sounds of things there's a good possibility that it will be a similar win (and if the decompression can be multithreaded it may be even better).
Posted Feb 18, 2010 14:43 UTC (Thu)
by pointwood (guest, #2814)
[Link] (2 responses)
Posted Feb 18, 2010 17:42 UTC (Thu)
by intgr (subscriber, #39733)
[Link]
Just for some rough figures, I'm decompressing the Linux kernel 2.6.32 source tarball, on my quad-core Phenom II system:
So, parallel bzip2 decompression will probably beat gzip at 8 cores, whereas XZ would be on par with just 2 cores. While XZ is slow at compression, it will definitely beat gzip and bzip2 in parallel decompression.
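The core counts above follow from a back-of-the-envelope calculation on the timings quoted in this thread (15.6 s of CPU time for pbzcat, 4.7 s for xzcat, 2.3 s for zcat). A minimal sketch of that arithmetic, assuming decompression scales perfectly linearly across cores (real tools rarely do), that single-threaded wall-clock time equals CPU time, and that a parallel XZ decompressor exists; the function name is hypothetical:

```python
def cores_to_match(cpu_seconds: float, target_wall_seconds: float) -> float:
    """Cores needed to reach the target wall-clock time under ideal linear scaling."""
    return cpu_seconds / target_wall_seconds


# Timings quoted in the thread for the 2.6.32 tarball.
GZIP_WALL = 2.3   # zcat, single thread
BZIP2_CPU = 15.6  # pbzcat, total CPU time across four threads
XZ_CPU = 4.7      # xzcat, single thread (assumed equal to CPU time)

bzip2_cores = cores_to_match(BZIP2_CPU, GZIP_WALL)
xz_cores = cores_to_match(XZ_CPU, GZIP_WALL)

print(f"bzip2 needs about {bzip2_cores:.1f} cores to match gzip")
print(f"xz needs about {xz_cores:.1f} cores to match gzip")
```

Under these assumptions bzip2 needs a little under 7 cores, so an 8-core machine would be the first common configuration to beat gzip, while XZ lands at roughly 2 cores.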
Posted Feb 23, 2010 6:23 UTC (Tue)
by SEJeff (guest, #51588)
[Link]
Posted Feb 18, 2010 11:58 UTC (Thu)
by zuki (subscriber, #41808)
[Link]
Posted Feb 18, 2010 12:01 UTC (Thu)
by epa (subscriber, #39769)
[Link]
An alternative would be to distribute git trees for each release, but without any of the version history; just put all the files into a fresh git repository and do 'git pack' with maximum settings. Then compress that.
Note that there are at least two LZMA compression programs with a gzip-style interface: XZ and lzip. I have no idea why they haven't merged or at least standardized on a common file format.
Posted Feb 18, 2010 15:20 UTC (Thu)
by dunlapg (guest, #57764)
[Link] (4 responses)
Ubuntu 9.10 gives you a scary warning message:
You are about to do something potentially harmful
Not seeing this replacing bzip2 for at least another year or two.
Posted Feb 18, 2010 19:24 UTC (Thu)
by magnus (subscriber, #34778)
[Link] (3 responses)
Posted Feb 23, 2010 12:58 UTC (Tue)
by nye (guest, #51576)
[Link] (2 responses)
Posted Feb 23, 2010 15:44 UTC (Tue)
by johill (subscriber, #25196)
[Link] (1 responses)
Posted Feb 24, 2010 11:54 UTC (Wed)
by nye (guest, #51576)
[Link]
Posted Feb 18, 2010 18:46 UTC (Thu)
by clugstj (subscriber, #4020)
[Link] (4 responses)
Posted Feb 18, 2010 22:37 UTC (Thu)
by proski (subscriber, #104)
[Link]
Posted Feb 19, 2010 0:01 UTC (Fri)
by gdt (subscriber, #6284)
[Link] (2 responses)
But you are asking users to do exactly that when reporting errors. The LKML doesn't like reports against distribution kernels; it prefers reports against a recent kernel.org kernel. So if you want decent error reports then you've got to make it easy for users (even beginners who have no interest in Linux beyond this one bug that is making their life hell) to download, compile, install and run the kernel.org kernel on their otherwise stock operating system. If you don't want decent error reports and real user testing of recent kernels, then by all means use tools that aren't packaged with recent distributions. In summary: move tar.bz2 to whatever but keep tar.gz.
Posted Feb 19, 2010 0:19 UTC (Fri)
by jspaleta (subscriber, #50639)
[Link] (1 responses)
I don't really expect the vast majority of Linux beginners using Google Android or Palm WebOS to have the technical competence or desire to compile stock kernels on their own without the intervention of the Google or Palm employees who originally built and tested the patched kernel binaries those users run.
-jef
Posted Feb 19, 2010 10:51 UTC (Fri)
by nix (subscriber, #2304)
[Link]
I mean, sure, if you're using some out-of-tree thingy you got yourself, then obviously you're expected to be able to patch/compile/build your own kernel... but if it came from the distro, then *they* are the ones who should be interacting with upstream to pass on bug reports (although things might be interesting if there are bugs that only the end user can reproduce: in that situation I'd expect a three-way, with upstream providing diagnostic patches, the distro building them for the poor damn user or providing a script to do so, and the user running them and reporting the results. Maybe this is too much wild-eyed dreaming, but the alternative is that bugs in the manifold out-of-tree patches that some distros include will never be fixed unless upstream happens to have just the right hardware to reproduce them.)
Posted Feb 23, 2010 21:28 UTC (Tue)
by meyert (subscriber, #32097)
[Link] (1 responses)
Posted Feb 26, 2010 7:24 UTC (Fri)
by efexis (guest, #26355)
[Link]
Compression formats for kernel.org
format was especially designed to support parallelization, so on modern
quad-core processors it has the potential to be even faster than gzip.
Unfortunately, even though the file format can handle it, the current
xz-utils does not support parallelization yet.
add in download time as well? If the user spent 5 less seconds downloading
the tarball, then does it matter if it takes 5 seconds more to decompress
it?
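The trade-off raised here can be put into numbers. The sketch below is only a model: the function names are hypothetical, the 20 MB size difference is an illustrative placeholder rather than a measured figure, and the 2.4 s of extra decompression time is simply the gap between the xzcat (4.7 s) and zcat (2.3 s) timings quoted elsewhere in this thread.

```python
def total_seconds(size_mb: float, link_mb_per_s: float, decompress_s: float) -> float:
    """Time to fetch and unpack a tarball: download time plus decompression time."""
    return size_mb / link_mb_per_s + decompress_s


def break_even_link_speed(size_saved_mb: float, extra_decompress_s: float) -> float:
    """Link speed (MB/s) at which a smaller but slower format stops being a net win.

    Below this speed the download saving outweighs the extra decompression time.
    """
    return size_saved_mb / extra_decompress_s


# Hypothetical example: the smaller format saves 20 MB (placeholder value)
# and costs 2.4 s more to decompress.
threshold = break_even_link_speed(size_saved_mb=20.0, extra_decompress_s=2.4)
print(f"The smaller format wins on links slower than {threshold:.1f} MB/s")
```

On any link slower than the threshold, the user who downloads the smaller file finishes first even though decompression takes longer.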
Compression formats for kernel.org
pbzcat, four threads, takes 4.1 seconds of wall-clock time (15.6s CPU time).
xzcat, single thread, takes just 4.7 seconds.
zcat, single thread, takes 2.3 seconds
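For anyone wanting to repeat this kind of comparison, here is a rough sketch using Python's standard-library bindings (`gzip`, `bz2`, `lzma`) rather than the zcat, pbzcat and xzcat commands timed above, so absolute numbers will differ. The function name and the synthetic sample data are assumptions for illustration; a real test would feed in the kernel tarball contents.

```python
import bz2
import gzip
import lzma
import time


def time_decompression(payload: bytes) -> dict:
    """Compress payload with each codec, then time single-threaded decompression.

    Returns {codec_name: (compressed_size_in_bytes, decompress_seconds)}.
    """
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bzip2": (bz2.compress, bz2.decompress),
        "xz": (lzma.compress, lzma.decompress),  # lzma defaults to the .xz container
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        blob = compress(payload)
        start = time.perf_counter()
        restored = decompress(blob)
        elapsed = time.perf_counter() - start
        if restored != payload:
            raise ValueError(f"{name} round trip failed")
        results[name] = (len(blob), elapsed)
    return results


if __name__ == "__main__":
    # Synthetic, highly repetitive stand-in for source code (not real kernel data).
    sample = b"static int example(void) { return 0; }\n" * 50_000
    for name, (size, seconds) in time_decompression(sample).items():
        print(f"{name:5s} {size:8d} bytes  {seconds * 1000:8.2f} ms")
```

Timings from a single run are noisy; repeating the measurement several times and taking the minimum gives a steadier picture.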
Compression formats for kernel.org
It uses a ton of RAM and is still slow.
Compression formats for kernel.org
Take a look at lrzip. It is an LZMA version of the older 'rzip', and works by first doing a simple compression using a very large window (say, 200 megabytes) before feeding the data to LZMA. This often allows it to get a better space-speed tradeoff than other compressors. Its strongest performance is when compressing archives with several almost-identical copies of the same data, for example a set of different kernel releases. For just one release, plain LZMA as implemented by XZ might be as good.
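The benefit of a large window on near-identical copies can be illustrated without lrzip itself. The sketch below is not rzip's algorithm; it only contrasts zlib (a 32 KB match window) with Python's `lzma` module at its default preset (a multi-megabyte dictionary) on two identical copies of incompressible data placed 256 KB apart. The function name and block size are assumptions for illustration.

```python
import lzma
import os
import zlib


def compare_windows(block_size: int = 256 * 1024) -> tuple:
    """Compress two identical random blocks back to back with both codecs.

    Returns (zlib_size, lzma_size) in bytes.
    """
    block = os.urandom(block_size)  # random bytes: no redundancy inside one copy
    data = block + block            # the second copy is the only redundancy

    # zlib can only match within the previous 32 KB, so it never sees the first copy.
    small_window = len(zlib.compress(data, 9))
    # The default lzma dictionary is far larger than the 256 KB distance between copies.
    large_window = len(lzma.compress(data))
    return small_window, large_window


zlib_size, lzma_size = compare_windows()
print(f"zlib: {zlib_size} bytes, lzma: {lzma_size} bytes")
```

The small-window output stays at roughly the full input size, while the large-window output shrinks to about one copy. A set of several kernel releases behaves similarly at a much larger scale, which is where a very large first-pass window pays off.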
lrzip is often the winner
"...the tools are not yet widely available in distributions"
To continue type in the phrase Yes, do as I say!
?]
Compression formats for kernel.org
Exactly. The same is true from the security standpoint. Installing a utility is easier than installing a kernel.