JPEG XL Next-Generation Image Compression Architecture and Coding Tools
ABSTRACT
An update on the JPEG XL standardization effort: JPEG XL is a practical approach focused on scalable
web distribution and efficient compression of high-quality images. It will provide various benefits compared to
existing image formats: significantly smaller size at equivalent subjective quality; fast, parallelizable decoding and
encoding configurations; features such as progressive, lossless, animation, and reversible transcoding of existing
JPEG; support for high-quality applications including wide gamut, higher resolution/bit depth/dynamic range,
and visually lossless coding. Additionally, a royalty-free baseline is an important goal. The JPEG XL architecture
is traditional block-transform coding with upgrades to each component. We describe these components and
analyze decoded image quality.
Keywords: JPEG XL, image compression, DCT, progressive
1. INTRODUCTION
JPEG1 has reigned over the field of practical lossy image compression ever since its introduction 27 years ago.
Other mainstream media encodings, especially for video and sound, have gone through several generations of
significant and well-rounded improvements during the same time period. One can ask which characteristics of JPEG and of the field of lossy image compression have let JPEG keep its position in spite of all the efforts so far. We have identified two main recurring issues in previous efforts to replace JPEG with more efficient image encodings, namely insufficient psychovisual performance for professional-quality photography and the lack of gradual migration features. We have addressed both thoroughly in our new proposal, JPEG XL, while also generally improving compression performance.
As an outcome of the 79th JPEG meeting in April 2018, the JPEG Committee announced the JPEG XL
activity,2 aiming to standardize a new generation of image coding that offers substantially better compression
efficiency than existing image formats (e.g. 50% size reduction over JPEG), along with features desirable for web
distribution and efficient compression of high-quality images. Proposals were due September 1, 2018, and based
on psychovisual experiments, Google’s PIK3 and Cloudinary’s FUIF4 codecs were chosen as the base framework
for JPEG XL and its further core experiments. As of July 2019, JPEG XL has advanced into the committee
draft phase, and is being prepared for circulation among ISO's national bodies.
The authors believe that a key factor in the failure of new lossy image codecs to deliver outstanding psychovisual performance at high quality has been their tendency to target their best performance at bit rates that are too low, with coding methods that do not extend to higher bit rates without losing most of their efficiency along the way.
This can be most easily observed in image codecs that are derived from successful video encoding research. To
make sure that we get this right with JPEG XL, we have gathered real-world usage data on existing JPEG
utilization, and conducted independent experiments on how this could be mapped into bit rates and quality
settings in JPEG XL. We note that as a result of this focus we have aimed at higher image qualities and bit
rates than previous efforts.
2. MOTIVATION
Historically, technological advances have not always been associated with a direct improvement in user experience.
For example, with the initial deployment of terrestrial digital television, broadcasting companies often chose
a distribution model that slightly degraded the perceived quality of the program in favor of being able to
distribute more channels. Block artefacts started to appear in fast-moving complex scenes where the previously uncompressed signal had been of uncompromised quality. The economic pressure of having more segmentation in
the market led to a slight reduction in quality.
We would like to prevent this from happening again. Our aim is to build solutions that allow us to deliver
images faster, more economically, and more robustly, without surprises and with generally better perceived quality.
There was a time in the history of the internet when downgrading the quality of media was the only option
for internet transfers to happen at bearable latencies. In those times, recording devices also used to produce
images of much lower quality than they do today. However, we no longer live in that era. The average internet
connection speed around the world is steadily increasing; it was reported at 11.03 Mbps in May 2019, a 20.65% increase compared to one year before.5 At the same time, current phones, cameras and monitors often record and display at higher resolutions than an average end user can perceive. The current digital world is evolving towards richer and highly immersive user experiences, where images are expected to look natural. It is therefore safe to say that optimizing the quality of images should be regarded as a higher priority than optimizing the transfer speed of smaller images.
To find out the quality of images currently used on the web, we examined the distribution of bits per pixel
(BPP) of small, medium and large JPEG images loaded by users in Google Chrome (Figure 1). The data
was recorded by telemetry; it is pseudonymous and aggregated across users. Telemetry in Chrome is opt-out
by default and can be disabled from the browser’s “Settings” menu. We approximated the number of bytes
transferred at each BPP value by multiplying the image count at that BPP value by the BPP value itself.
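As a concrete illustration of this estimate, the following minimal sketch (with hypothetical input arrays standing in for the telemetry data, which is not reproduced here) computes the normalized bytes-transferred curve from a BPP histogram of image counts:

```python
import numpy as np

# Hypothetical stand-ins for the telemetry data: BPP histogram bin centers and
# the number of images observed in each bin for one size category.
bpp_bins = np.arange(0.05, 7.0, 0.1)
image_counts = np.random.poisson(1000, bpp_bins.size)  # placeholder counts

# Bytes transferred at each BPP value are approximated as image count times BPP
# (up to a constant factor given by the typical pixel count of the category).
bytes_estimate = image_counts * bpp_bins

# Normalize within the size category so the curves in Figure 1 are comparable.
bytes_estimate /= bytes_estimate.sum()
```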
Figure 1. Estimation of bytes transferred at each BPP value. BPP values are aggregated from pseudonymous data
collected by telemetry in Google Chrome. The images are classified by their smallest dimension (width or height) into
small (100–400 pixels), medium (400–1000 pixels) and large (1000+ pixels). The BPP distributions are aggregated across
all images loaded in Chrome across all platforms from June 1st to 28th, 2019. The estimated counts are normalized for each image size category. The mean for each image size is shown with dashed vertical lines. The BPP counts are obtained from on the order of billions of samples.
With these considerations in mind, JPEG XL is designed to handle high-quality images particularly well. Its
aim is to serve the current and future needs of an increasingly rich media experience. Ideally, an image should
be compressed up to a point where there is no perceptible loss of quality from the original.
To assess the efficiency of JPEG XL over that of legacy JPEG in the task of encoding images with no
perceptible loss of quality, we performed the following experiment. We selected a corpus of 43 images consisting
of 12 images from imagecompression.info6 (IC dataset) and 31 images from a corpus provided by Jyrki
Alakuijala7 (J31 dataset). We chose the IC dataset because it is a standard benchmark for image compression,
and the J31 dataset because it is designed to capture semantically important low intensity features on a variety
of background colors, such as the fine structure of flower petals; our experience with image compression showed
that red, purple, and magenta backgrounds often cause high frequency features to be lost and that image quality
metrics have difficulty normalizing the differences across different background colors. We used all images from
each dataset, except for leaves iso 1600 and nightshot iso 100 from IC, which had duplicates at a different
ISO level. As the IC images are very large, we rescaled them so that the maximum dimension would be 1024
pixels. All IC images and 6 of the J31 images hence corresponded to the large size category described above, while the other 25 J31 images corresponded to the medium size category.
We compressed each of the 43 images with legacy JPEG and JPEG XL at 24 different quality levels. For
images compressed with JPEG, we used libjpeg 1.5.28 with a quality parameter of 40, 44, 48, 52, 56, 60, 64, 68, 72, 76, 80, 82, 84, 86, 88, 90, 92, 94, 95, 96, 97, 98, 99 and 100, one for each of the 24 quality levels. For images compressed with JPEG XL, we used default settings on the latest development version of the codec and 24 corresponding values of a distance parameter∗.
Figure 2. All participant responses grouped by condition (JPEG and JPEG XL). The mode was obtained by grouping the
data into bins of size 0.1. The histogram bin size is 0.25. For JPEG, an additional 28 responses larger than 7, with a
maximum of 13.08, are not shown here.
∗ Approximately proportional to the inverse of the quality parameter of legacy JPEG, calibrated so that a distance of 1.0 corresponds to barely perceptible artefacts under normal viewing conditions.
Figure 4. Mean responses by participant and condition (JPEG and JPEG XL). For each participant, each value was obtained by averaging over each individual dataset (IC and J31) and then averaging the two resulting values. The histogram bin size is 0.25.
These results indicate that JPEG XL yields a large improvement in the representation efficiency of high-
quality images, where no perceptible loss of quality is desired. By replacing legacy JPEG with JPEG XL, it is
possible to represent such images while almost halving the required number of bits.
3. CODEC FEATURES
JPEG XL aims to be a full-featured image codec, with support for all the features that are expected by modern
image formats. Thus, JPEG XL supports an arbitrary number of channels, including alpha and depth channels;
it also allows parallel, progressive, and partial decoding. Moreover, JPEG XL supports encoding of multiple
animation frames, with different behaviours for how each frame should be blended with the previous ones: they
can be summed, replace the full frame or a sub-rectangle, or alpha-blended. JPEG XL also supports a fully
lossless mode and a responsive-by-design mode.
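As a rough illustration of these blending behaviours, the sketch below shows how a decoder might composite an animation frame onto the existing canvas; the function and argument names are hypothetical and not taken from the JPEG XL specification.

```python
import numpy as np

def blend_frame(canvas, frame, mode, x0=0, y0=0, alpha=None):
    """Composite `frame` onto `canvas` at offset (x0, y0) using one of the
    blending behaviours described above (illustrative only)."""
    h, w = frame.shape[:2]
    region = canvas[y0:y0 + h, x0:x0 + w]      # sub-rectangle being updated
    if mode == "add":                          # sum with the previous frame
        region += frame
    elif mode == "replace":                    # replace the full frame or a sub-rectangle
        region[...] = frame
    elif mode == "alpha":                      # alpha-blend on top of the canvas
        region[...] = alpha * frame + (1.0 - alpha) * region
    else:
        raise ValueError(f"unknown blend mode: {mode}")
    return canvas
```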
4. CODEC ARCHITECTURE
This section describes the architecture of the JPEG XL image codec, as per the committee draft of the specifi-
cation.9 An overview is given in Figure 5; the rest of the section gives more details about each component.
• small DCTs of sizes 8 × 4, 4 × 8, 4 × 4, and 2 × 2 cover the opposite use case of larger transforms, such as
very heterogeneous areas, for which it is important to reduce the propagation distance of artefacts. Note
that these small DCTs are still applied on 8 × 8 blocks: as an example, for the 4 × 4 DCT, the transform
is applied on each of the four 4 × 4 areas in the 8 × 8 block; coefficients are then mixed in such a way that
DC ends up in the top-left 2 × 2 corner of the block, and a 2 × 2 DCT is applied on the DC coefficients to
obtain a single DC value (see the sketch following this list). Similar techniques are used to obtain a DC value for other DCTs.
• the IDENTITY transform covers the case of some pixels in the block having very different values from
nearby pixels. As for the 4 × 4 DCT, it applies a transform to each of the four 4 × 4 areas in the 8 × 8 block,
and then proceeds with shuffling the resulting coefficients and doing a 2 × 2 DCT on the DC coefficients.
† DCT sizes other than 8 use a different DCT algorithm, described by Vashkevich and Petrovsky.11
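The following sketch illustrates the 4 × 4 case described in the list above. The particular coefficient shuffle used here (interleaving the four quadrants) is a plausible stand-in rather than the normative JPEG XL layout; it merely shows how the four quadrant DCs can end up in the top-left 2 × 2 corner, where a 2 × 2 DCT turns them into a single DC value.

```python
import numpy as np
from scipy.fft import dctn

def four_4x4_dcts(block8x8):
    """Apply a 4x4 DCT to each quadrant of an 8x8 block, gather the quadrant DCs
    into the top-left 2x2 corner, and apply a 2x2 DCT there (illustrative
    shuffle, not the normative coefficient layout)."""
    out = np.empty((8, 8))
    for by in range(2):
        for bx in range(2):
            quad = block8x8[4 * by:4 * by + 4, 4 * bx:4 * bx + 4]
            q = dctn(quad, norm='ortho')   # 4x4 DCT of one quadrant
            out[by::2, bx::2] = q          # interleave: quadrant DC q[0, 0] lands
                                           # at out[by, bx], inside the 2x2 corner
    out[:2, :2] = dctn(out[:2, :2], norm='ortho')  # 2x2 DCT of the four DCs;
                                                   # out[0, 0] is the block DC
    return out
```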
As with JPEG, JPEG XL also has special handling of DC coefficients. For transforms that are bigger than
8 × 8, this creates a problem: there is now only one DC coefficient that spans more than one 8 × 8 block.
JPEG XL handles this situation by computing X/8 × Y/8 pseudo-DC coefficients for each X × Y transform, roughly corresponding to the average value in each 8 × 8 block, in a reversible way. This can be done thanks to the
following key observation:
Lemma 4.1. Given N and K ≤ N, there exist non-zero constants c_0, …, c_{2^{N−K}−1} such that the following two procedures are equivalent for any sequence of numbers a_0, …, a_{2^{N−K}−1}:
For example, with N = 4 and K = 3, both procedures compute the same two values, namely the averages of the two halves of an IDCT of size 16 computed on an array with 2 nonzero entries followed by 14 zero entries.
This claim can easily be proven by induction on K. Choosing K = 3 and applying the lemma along both dimensions allows us to compute, for each 8 × 8 block covered by a large DCT, a value that is approximately its average (computed on pixel values after ignoring some higher frequencies), while still allowing an equivalent number of low-frequency DCT coefficients to be recovered completely, since the first of the two equivalent procedures in Lemma 4.1 is clearly reversible.
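Since the two procedures of Lemma 4.1 are not reproduced here, the following numerical sketch checks the underlying identity for N = 4 and K = 3 under a plausible reading of them: procedure (a) zero-pads the two coefficients to length 16, applies an orthonormal IDCT and averages each half, while procedure (b) scales the two coefficients by constants c_k and applies an IDCT of size 2. The constants, and the function names, are illustrative assumptions rather than values taken from the specification.

```python
import numpy as np
from scipy.fft import idct

N, K = 4, 3
n_small = 2 ** (N - K)   # 2 retained low-frequency coefficients
n_large = 2 ** N         # IDCT of size 16

def procedure_a(a):
    """Zero-pad to length 16, apply an orthonormal IDCT, average each half."""
    padded = np.zeros(n_large)
    padded[:n_small] = a
    return idct(padded, norm='ortho').reshape(n_small, -1).mean(axis=1)

# Probe the non-zero constants c_k of the lemma by feeding in unit vectors.
c = np.array([procedure_a(np.eye(n_small)[k])[0]
              / idct(np.eye(n_small)[k], norm='ortho')[0]
              for k in range(n_small)])

def procedure_b(a):
    """Scale the coefficients by c_k, then apply an IDCT of size 2."""
    return idct(c * a, norm='ortho')

a = np.random.default_rng(0).standard_normal(n_small)
assert np.allclose(procedure_a(a), procedure_b(a))  # the two procedures agree
```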
4.6 DC handling
One of the most noticeable artefacts produced by JPEG compression at lower quality is banding, the transforma-
tion of an area of the image containing a slowly varying color (such as the sky at sunset) into one that contains
a few bands, each of uniform color (see Figure 6). This creates very noticeable artefacts at the boundaries between
two different bands.
Banding in JPEG is typically caused by DC quantization. To avoid this effect, JPEG XL allows using finer
quantization steps for encoding residuals that are close to 0 in the original image. This is in practice equivalent
to using finer quantization steps in areas that are slowly varying, but without sacrificing compression ratio as
much as using finer quantization on the whole image would do.
To further reduce banding in areas with steeper gradients, JPEG XL applies a selective smoothing algorithm
to the DC image that is only allowed to move values within their quantization boundaries. If the smoothed value would fall outside the quantization boundaries, it is discarded and the original value is used.
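A minimal sketch of this constraint is shown below, assuming a dequantized DC image and a per-sample quantization step; the smoothing filter itself is a simple placeholder, not the filter used by JPEG XL.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def constrained_dc_smoothing(dc, quant_step):
    """Smooth the dequantized DC image, but keep a smoothed value only if it
    stays inside the quantization interval of the original value; otherwise the
    original value is used, so the result still matches the decoded DC."""
    smoothed = uniform_filter(dc, size=3)                 # placeholder smoother
    in_bounds = np.abs(smoothed - dc) <= 0.5 * quant_step
    return np.where(in_bounds, smoothed, dc)
```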
4.7 LF predictions
To reduce banding further, and to make block artefacts less noticeable in the resulting image, JPEG XL estimates
the low-frequency coefficients of each X × Y transform (the top-left corner of size X/4 × Y/4, excluding the X/8 × Y/8 corner) from the DC image. This procedure starts by extending the known X/8 × Y/8 coefficients to size X/4 × Y/4, filling the missing coefficients with zeros. It then uses the inverse of the first procedure described in Lemma 4.1 on each block to produce a 2× upsampled version of the DC image.
After applying a smoothing algorithm similar to the one used on the DC image, the upsampled DC is
converted back to DCT coefficients; the low-frequency values that are produced this way are then added to the
encoded low-frequency coefficients.
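The sketch below illustrates the overall flow for a single transform, using a plain DCT/IDCT round trip as a stand-in for the inverse of the Lemma 4.1 procedure and omitting the smoothing step (which is what makes the prediction non-trivial in practice); the function name and layout are illustrative only.

```python
import numpy as np
from scipy.fft import dctn, idctn

def predict_lf_from_dc(dc_block):
    """Given the X/8 x Y/8 DC values of one large transform, extend them to
    X/4 x Y/4 with zeros, produce a 2x upsampled DC image, and transform back
    to obtain predictions for the low-frequency corner (illustrative only)."""
    h, w = dc_block.shape
    extended = np.zeros((2 * h, 2 * w))
    extended[:h, :w] = dctn(dc_block, norm='ortho')  # fill missing coefficients with zeros
    upsampled = idctn(extended, norm='ortho')        # 2x upsampled DC image
    # ... a smoothing step similar to the DC smoothing would be applied here ...
    return dctn(upsampled, norm='ortho')             # predicted X/4 x Y/4 corner
```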
4.8 AC encoding
JPEG always encodes AC coefficients using the so-called zig-zag order, which proceeds in order of increasing
sum of coordinates in the block.
As JPEG XL uses different transform sizes, it generalizes the zig-zag order to those cases. Moreover, since this order is not necessarily optimal for a given image, JPEG XL also allows encoding a custom order. This custom order is encoded as a permutation of the zig-zag order, using a Lehmer-like code15 to achieve efficient encoding of the identity permutation.
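The sketch below shows one way to build such a generalized scan order (ties in the coordinate sum are broken arbitrarily here; the normative tie-breaking may differ) and the Lehmer-code idea that makes the identity permutation essentially free to encode:

```python
def zigzag_order(height, width):
    """Coefficient positions of a height x width block in order of increasing
    coordinate sum (ties broken by row; illustrative tie-breaking only)."""
    return sorted(((y, x) for y in range(height) for x in range(width)),
                  key=lambda p: (p[0] + p[1], p[0]))

def lehmer_code(perm):
    """Lehmer code of a permutation of 0..n-1: for each element, the number of
    remaining (not yet used) smaller elements. The identity permutation maps to
    all zeros, which is what makes an unchanged scan order cheap to encode."""
    remaining = sorted(range(len(perm)))
    code = []
    for value in perm:
        index = remaining.index(value)
        code.append(index)
        remaining.pop(index)
    return code

# A custom scan order is expressed relative to the zig-zag order, so keeping
# the default order produces an all-zero code:
assert lehmer_code(list(range(8))) == [0] * 8
```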
The encoding of AC coefficients proceeds in the order specified above; the encoder keeps track of the number of remaining non-zero coefficients in the current block and uses that value, combined with the current position, to assign coefficients to multiple entropy-coding distributions. No further values are present in the bitstream once it is known that no non-zero coefficient is left.
Those improvements enable 16% size savings on average in a corpus of 100 000 random images from the
Internet. For larger photographic images, the savings are in the range 13%–22%, depending on the JPEG
encoder and quality settings.
5. CONCLUSIONS
This paper introduces the coding tools used in JPEG XL: a new approach for responsive images, and a variable-
size DCT designed from the ground up for economical storage of high-quality images.
Large-scale user metrics indicate that most use cases on the internet target higher quality levels than previ-
ously assumed in codec design. This helps explain why codecs optimized for lower bitrates have not supplanted
the 27-year-old JPEG standard. We present new quality-of-experience data indicating JPEG XL requires less
than half the bitrate of JPEG to achieve perceptually lossless storage, with fewer outliers. In other experiments,
we have found JPEG XL encodings to be 33–40% of the size of libjpeg output at similar psychovisual quality.
We acknowledge that JPEG and JPEG XL will co-exist. Rather than reduce the quality of existing images
by transcoding, or increase hosting costs by storing both formats, we propose to build a bridge between JPEG and JPEG XL by reversibly transcoding existing JPEG images into the more compact JPEG XL representation.
REFERENCES
[1] Wallace, G. K., “The JPEG still picture compression standard,” IEEE Transactions on Consumer Electronics 38(1), xviii–xxxiv (1992).
[2] “Final call for proposals for a next-generation image coding standard (JPEG XL).” https://jpeg.org/
downloads/jpegxl/jpegxl-cfp.pdf. Accessed 2019-07-26.
[3] “Github - google/pik: A new lossy/lossless image format for photos and the internet.” https://github.
com/google/pik. Accessed 2019-07-26.
[4] “Github - cloudinary/fuif: Free universal image format.” https://github.com/cloudinary/fuif. Accessed
2019-07-26.
[5] “Worldwide broadband speed league 2019.” https://www.cable.co.uk/broadband/speed/
worldwide-speed-league/. Accessed: 2019-07-25.
[6] “Image compression benchmark.” http://imagecompression.info/. Accessed: 2019-07-25.
[7] Alakuijala, J., “Image compression benchmark.” https://drive.google.com/corp/drive/folders/0B0w_
eoSgaBLXY1JlYUVOMzM5VFk. Accessed: 2019-07-25.
[8] “libjpeg 1.5.2 release.” https://github.com/libjpeg-turbo/libjpeg-turbo/releases/tag/1.5.2. Released 2017-08-09.
[9] Rhatushnyak, A., Wassenberg, J., Sneyers, J., Alakuijala, J., Vandevenne, L., Versari, L., Obryk, R.,
Szabadka, Z., Kliuchnikov, E., Comsa, I.-M., Potempa, K., Bruse, M., Firsching, M., Khasanova, R., van
Asseldonk, R., Boukortt, S., Gomez, S., and Fischbacher, T., “Committee draft of JPEG XL image coding
system,” (2019).
[10] Arai, Y., Agui, T., and Nakajima, M., “A fast DCT-SQ scheme for images,” IEICE TRANSACTIONS
(1976-1990) 71(11), 1095–1097 (1988).
[11] Vashkevich, M. and Petrovsky, A., “A low multiplicative complexity fast recursive DCT-2 algorithm,” arXiv
preprint arXiv:1203.3442 (2012).
[12] Huffman, D. A., “A method for the construction of minimum-redundancy codes,” Proceedings of the
IRE 40(9), 1098–1101 (1952).
[13] Rissanen, J. and Langdon, G. G., “Arithmetic coding,” IBM Journal of research and development 23(2),
149–162 (1979).
[14] Duda, J., Tahboub, K., Gadgil, N. J., and Delp, E. J., “The use of asymmetric numeral systems as an
accurate replacement for Huffman coding,” in [2015 Picture Coding Symposium (PCS)], 65–69, IEEE (2015).
[15] Laisant, C.-A., “Sur la numération factorielle, application aux permutations,” Bulletin de la Société
Mathématique de France 16, 176–183 (1888).
[16] Buades, A., Coll, B., and Morel, J.-M., “A non-local algorithm for image denoising,” in [2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05)], 2, 60–65, IEEE (2005).
[17] Khasanova, R., Wassenberg, J., and Alakuijala, J., “Noise generation for compression algorithms,”
CoRR abs/1803.09165 (2018).
[18] Lakhani, G., “DCT coefficient prediction for JPEG image coding,” in [2007 IEEE International Conference on Image Processing], 4, IV-189–IV-192 (Sep. 2007).
[19] Richter, T., “JPEG on STEROIDS: Common optimization techniques for JPEG image compression,” in
[2016 IEEE International Conference on Image Processing (ICIP)], 61–65 (Sep. 2016).