Handout On Image File Formats
Storing and processing images are the two most important topics of discussion in this course. The images that we use in our day-to-day activities are stored in a variety of formats, and these images are processed (and saved) in MATLAB (the image processing software that we use in this course) in a variety of file-types. The handout on MATLAB discussed the commonly used file-types, viz. the binary and gray-scale file-types for black and white images and the RGB and indexed file-types for color images.

In this handout, we elaborate on the different formats in which an image is usually stored, either on the World Wide Web (WWW) or on disk. The big advantage of MATLAB for image processing is that one can work with images without worrying about the graphics format that the image is actually stored in. Some knowledge of these formats can, however, help us decide which file-type to use in MATLAB.

Some image formats have been designed to fulfill a particular need (for example, to transmit image data over a network); others have been designed around a particular operating system or environment.

The following are the more commonly found image formats for storage.
• JPEG: Images are created using the Joint Photographic Experts Group compression standard. We will see more about this in class.
• TIFF: “Tagged Image File Format”; a very general format that supports different compression standards, multiple images per file, and binary, gray-scale, RGB and indexed images.
• GIF: “Graphics Interchange Format”; this format was designed essentially for data transfer. It is popular and well supported, but is somewhat restricted in the image types it can handle.
• PNG: “Portable Network Graphics”; this is designed to overcome some of the limitations of GIF.
• BMP: Microsoft Bitmap; this format has become very popular through its use by Microsoft operating systems.
• HDF: “Hierarchical Data Format”; this is an extensible, versatile format designed primarily for use with scientific images.
• PCX: This was originally designed for use with the MS-DOS based software PC Paintbrush. It is also used by some Microsoft products.
• XWD: “X Window Dump”; this is used to store images created by screen dumps from the X Window system. This is the standard windowing system used by UNIX operating systems.
• ICO: This is the format used to display icons in Microsoft Windows operating systems. It allows multiple images per file.
• CUR: This is the format used to display the mouse cursor in Microsoft Windows operating systems.

In this handout, we will primarily discuss the GIF, JPEG, BMP and TIFF formats. Before turning to these formats, we look at how image information is stored.

Image information is stored in two broadly defined ways: the vector format and the raster format. Vector images store image information as a collection of lines or vectors, and raster images as a collection of dots.

Each storage format has its own set of advantages and disadvantages. The vector format allows us to magnify the image to any desired size without losing any sharpness. However, it is limited by the fact that most natural scenes lack straight-line patterns. The standard vector format is Adobe PostScript. PostScript is the format of choice for images consisting mainly of lines, mathematically described curves, architectural (and industrial) plans and mathematical figures.

The great bulk of image file formats fall in the raster set, i.e. they store the image as a list of gray (or color) intensities, one for each pixel. Images captured by digital means (cameras and scanners) are stored in the raster format. A raster file also contains some header information, which includes the size of the image, some documentation, a colormap, the kind of compression used, etc.

I. GIF
CompuServe GIF (pronounced “jif”) is an image format that was proposed in the late 1980s for distributing images over networks. GIF uses the raster format and allows no more than 256 different colors per image. Colors are stored using a colormap, and the format does not allow binary or gray-scale images (except for those that are produced using RGB combinations). Pixel data is compressed using the LZW (Lempel-Ziv-Welch) technique. This compression technique works by constructing a codebook of the data. That is, the first time a pattern is seen, it is placed in the codebook, and the encoder then outputs the code for that pattern on subsequent appearances. LZW compression can be used on any data, but it is limited by licensing terms (as LZW is patented).
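
The codebook idea behind LZW can be illustrated with a short Python sketch (Python rather than MATLAB is used here purely for illustration). It is a minimal textbook variant under stated assumptions: the codebook starts with all 256 single-byte patterns and codes are returned as plain integers. The variable-width code packing and special control codes of a real GIF encoder are deliberately left out, and the function names lzw_encode and lzw_decode are hypothetical.

```python
from typing import List


def lzw_encode(data: bytes) -> List[int]:
    """Encode bytes as a list of LZW codes (illustrative, not GIF-exact)."""
    # Assumption: the codebook starts with every single-byte pattern.
    codebook = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in codebook:
            # Pattern already known: keep extending it.
            current = candidate
        else:
            # New pattern: emit the code of the known prefix,
            # then place the new pattern in the codebook.
            codes.append(codebook[current])
            codebook[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(codebook[current])
    return codes


def lzw_decode(codes: List[int]) -> bytes:
    """Rebuild the original bytes; the decoder regrows the same codebook."""
    if not codes:
        return b""
    codebook = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = codebook[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in codebook:
            entry = codebook[code]
        elif code == next_code:
            # The code refers to the pattern that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        pieces.append(entry)
        codebook[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(pieces)


if __name__ == "__main__":
    codes = lzw_encode(b"ABABABA")
    print(codes)
    print(lzw_decode(codes))
```

For the 7-byte input b"ABABABA" this sketch emits the four codes 65, 66, 256 and 258: the repeated patterns "AB" and "ABA" are replaced by single codebook entries, and decoding needs no stored codebook because it is rebuilt from the codes themselves, which is why the technique is lossless.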

A GIF file contains a header that includes the image size, the colormap, the color resolution, the colormap size and a flag that indicates whether the colormap is ordered. The GIF format has become one of the standard formats supported by the World Wide Web and the Java programming language. The PNG format has been introduced more recently to replace GIF and to overcome some of its disadvantages. PNG is definitely to be preferred to GIF, but being newer, it is not yet as well supported.

II. JPEG
The compression techniques used by GIF and PNG are what are called lossless techniques, i.e. the original information can be recovered completely. The JPEG standard uses lossy compression, in which not all the original data can be recovered. Such methods result in much higher compression rates, and JPEG images are in general much smaller than GIF or PNG images.

Compression of JPEG images works by breaking the image into 8 × 8 blocks, applying the discrete cosine transform (which will be handled in class very soon) to each block, and removing the small values. JPEG images are best used for the representation of natural scenes, since there is more often than not some kind of redundancy in these images. JPEG is not very suitable for scientific images, as the loss might be quite damaging. The mechanics of the JPEG transform ensure that a JPEG image, when restored from its compressed form, looks very similar to the original image. The differences are in general too small to be noticeable by the human eye.

III. BMP
The Microsoft Windows BMP image format is a fairly simple example of a binary image format. It consists of a header followed by the image information. The header is divided into two parts: the first 14 bytes correspond to the “File Header” and the next 40 bytes to the “Information Header.”

The following is a more detailed description of this header information.
• Byte 0-1 Signature (“BM” in ASCII or 42 4D in hexadecimal)
• Byte 2-5 File size in bytes
• Byte 6-9 Reserved (all zeros)
• Byte 10-13 File offset to the raster data
• Byte 14-17 Size of the information header (= 40 bytes)
• Byte 18-21 Width of the image in pixels
• Byte 22-25 Height of the image in pixels
• Byte 26-27 Number of image planes
• Byte 28-29 Number of bits per pixel
• Byte 30-33 Type of compression standard used
• Byte 34-37 Size of the image data in bytes
• Byte 38-41 Horizontal resolution in pixels per meter
• Byte 42-45 Vertical resolution in pixels per meter
• Byte 46-49 Number of colors used in the image (Bytes 28-29 give the maximum possible number of colors, 46-49 what is actually used)
• Byte 50-53 The number of important colors in the image

After the header comes the color table, which is used only if the number of bits per pixel is less than or equal to 8; the total number of bytes used in this case is 4 × (number of colors in the image). This format uses the Intel “little-endian” convention for ordering bytes, wherein for each word of four bytes, the least significant byte comes first.

IV. TIFF
The Tagged Image File Format (TIFF) is one of the most comprehensive image formats. It can save multiple images per file. It also allows different compression routines (none at all, LZW, JPEG, Huffman, RLE) and different byte-orderings (little-endian, as in BMP, or big-endian, in which the bytes retain their order within words, most significant byte first). It also allows binary, gray-scale, RGB or indexed images. For that reason, it requires some skillful programming to write image-reading software that will read all possible TIFF images. Nevertheless, TIFF is a good format for data exchange.
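
The 54-byte BMP header layout from Section III, and the little-endian versus big-endian byte orderings that distinguish BMP from some TIFF files, can be made concrete with a small Python sketch (Python rather than MATLAB is used here purely for illustration). It relies only on the standard struct module. The names BMP_HEADER, FIELD_NAMES, parse_bmp_header, color_table_bytes and build_demo_header are hypothetical, and the demonstration header is synthetic rather than taken from a real file.

```python
import struct

# Layout follows the byte offsets listed in the BMP section:
# a 14-byte file header followed by a 40-byte information header.
# "<" selects little-endian byte order with no padding between fields;
# "4x" skips the four reserved bytes at offsets 6-9.
BMP_HEADER = struct.Struct("<2sI4xIIiiHHIIiiII")

# Hypothetical field names, one per value unpacked above.
FIELD_NAMES = (
    "signature", "file_size", "raster_offset", "info_header_size",
    "width", "height", "planes", "bits_per_pixel", "compression",
    "image_size", "x_pixels_per_meter", "y_pixels_per_meter",
    "colors_used", "important_colors",
)


def parse_bmp_header(data: bytes) -> dict:
    """Decode the first 54 bytes of a BMP file into a dictionary."""
    if len(data) < BMP_HEADER.size:
        raise ValueError("need at least 54 bytes of header data")
    fields = dict(zip(FIELD_NAMES, BMP_HEADER.unpack_from(data)))
    if fields["signature"] != b"BM":
        raise ValueError("missing 'BM' signature")
    return fields


def color_table_bytes(fields: dict) -> int:
    """Size of the color table: 4 bytes per color, only for <= 8 bits/pixel."""
    if fields["bits_per_pixel"] > 8:
        return 0
    return 4 * fields["colors_used"]


def build_demo_header(width: int = 2, height: int = 2) -> bytes:
    """Build a synthetic header for an uncompressed 24-bit image (demo only)."""
    bits_per_pixel = 24
    # BMP rows are padded to a multiple of 4 bytes.
    row_bytes = ((width * bits_per_pixel + 31) // 32) * 4
    image_size = row_bytes * height
    offset = BMP_HEADER.size  # no color table for 24-bit images
    return BMP_HEADER.pack(
        b"BM", offset + image_size, offset, 40, width, height, 1,
        bits_per_pixel, 0, image_size, 2835, 2835, 0, 0,
    )


if __name__ == "__main__":
    header = build_demo_header()
    fields = parse_bmp_header(header)
    print(fields["file_size"], fields["width"], fields["bits_per_pixel"])
    # Reading the same four file-size bytes as big-endian gives a wrong value,
    # which is why a reader must know the byte order a format uses.
    print(int.from_bytes(header[2:6], "little"),
          int.from_bytes(header[2:6], "big"))
```

Using a single format string keeps the byte offsets of the handout and the code in one-to-one correspondence. A TIFF reader would have to choose between "<" and ">" at run time, depending on the byte order declared at the start of the file.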