Grayscale Images and RGB Video:
Compression by Morphological Neural Network
Osvaldo de Souza¹, Paulo César Cortez¹, and Francisco A.T.F. da Silva²
¹ Federal University of Ceará, DETI, Fortaleza, Brazil
osvaldo@ufc.br, cortez@lesc.ufc.br
² National Institute for Space Research, ROEN, Eusébio, Brazil
tavares@roen.inpe.br
Abstract. This paper investigates image and RGB video compression by a supervised morphological neural network. This network was originally designed to compress grayscale images and was then extended to RGB video. It supports two kinds of thresholds: a pixel-component threshold and a pixel-error counting threshold. The activation function is based on an adaptive morphological neuron, which produces suitable compression rates even when working with three color channels simultaneously. Both intra-frame and inter-frame compression approaches are implemented. The PSNR level indicates that the compressed video is compliant with the desired quality levels. Our results are compared to those obtained with commonly used image and video compression methods. Network application results are presented for grayscale images and RGB video with a 352 × 288 pixel size.
Keywords: Supervised Morphological Neural Network, RGB Video Compression, and Image Compression.

1 Introduction
The loss of data is common in a variety of image and video compression techniques, and such losses generally occur in parts of the information (redundant data) that are not noticed by the human eye. Numerous compression algorithms utilize common techniques such as "color space sampling" and "redundancy reduction" [1]. The
color space sampling technique is used when it is necessary to reduce the amount of
data needed for the representation (coding) of an image. In the redundancy reduction
technique, compression can be obtained by eliminating the redundancies that appear
in a specific frame (intra-frame) or in a sequence of frames (inter-frame) in a video
stream. Several studies have investigated the use of artificial neural networks (ANN)
in image and video compression [1]. Some researchers [2] investigated image
compression and reconstruction using a radial basis function (RBF) network, while
others [3] proposed a technique called a “point process” that used a combination of
motion estimation, compression, and temporal frame sub-sampling with a random
neural network (RNN). In [4], the authors discussed various ANN architectures for
N. Mana, F. Schwenker, and E. Trentin (Eds.): ANNPR 2012, LNAI 7477, pp. 213–224, 2012.
© Springer-Verlag Berlin Heidelberg 2012
image compression and presented the results for a back-propagation network (BPN),
hierarchical back-propagation network (HBPN), and adaptive back-propagation
network (ABPN). The authors of [5] used a self-organizing map (SOM) network to reduce
the number of pixels in each frame of a video sequence. After this modification, each
frame was stored using a Hopfield neural network as a form of video codification. The authors of
[6] used the growing neural gas (GNG) learning method, another approach
based on a SOM network, in an incremental training algorithm. In [7], the authors
presented the details of an approach in which a neural network is used to determine
the best ratio for discrete cosine transform (DCT) compression. Although there have
been many works related to image and video compression, the use of supervised
morphological neural networks (SMNNs) in this context has not been extensively
investigated thus far. Therefore, in this paper, we investigate the extension of an
SMNN, which was originally designed to compress grayscale images, by applying it
to the compression of RGB video.
We organize the remaining sections of this paper as follows. We first provide a
brief review of the morphological operators involved in the design of the adaptive
morphological neuron. Second, we introduce the SMNN for grayscale image
compression and then extend its application to RGB video compression. Third, we
present the image and video compression results.
2 Brief Review of Morphological Operators
The morphological operators presented in this section were defined in [8] and briefly
in [9], while the researchers in [10] proposed a morphological approach for template
matching.
Definition 1. Let E be a non-void set, and consider an integer number between 0 and n. Four operators are defined on this setting: dilation, erosion, anti-dilation, and anti-erosion, respectively. Formal definitions for these operators are given in [8] and [10].
Definition 2. Let a window be a non-void subset of E; equation (1) fixes the notation for an individual element of the window.                                                                  (1)
Definition 3. Let a window and two non-void subsets of E be given, and let an integer number between 0 and n be fixed; the symbol ⊕ refers to Minkowski addition. We denote by ε and δ the operators defined in [10].
Definition 4. Let F and m be two integer constants. We define two functions, given by equations (2) and (3), that attach a gray-level tolerance to each value.                           (2), (3)

The lower and upper bounds of these functions are calculated according to equation (4), which takes, for each gray level, an interval of length F centered at that level, bounded to the admissible range of gray levels.                                                                (4)
Definition 5. Let ε and δ be the operators given by Definition 1, and let the tolerance functions be those defined by equations (2) and (3), respectively. Equation (5) defines an operator that composes these tolerance functions with an erosion and an anti-dilation over a window W, with indices 1, …, #W, where # denotes the number of elements of the window.              (5)
Definition 6. The pattern matching operator is given by equation (6).                      (6)

The operator represents the intersections between erosions and anti-dilations with the tolerances introduced by equations (2) and (3), which are controlled by the value of F. Therefore, such operations start to behave as morphological operators with gray-level tolerance. Observe that the operations in equation (6) result in adaptive pattern matching.
Definition 7. Let t be a threshold. The operator that localizes a concentration of gray levels greater than or equal to t is, according to [11], defined by equation (7): its output is 1 when its input reaches t, and 0 otherwise.                                                          (7)
This threshold operator is a morphological filter, which is useful for adaptive pattern detection, and it is a key component in the activation function of the morphological neuron used in this work, as discussed later. The equations and definitions presented in this section apply in the first place to grayscale images. Nevertheless, because each color component in the RGB schema can be represented by values between 0 and 255, all of the definitions and proofs available in [8-10] and [12] that were developed for grayscale images are suitable for processing color images, since we consider only one color component at a time. This strategy is adopted in this work, and we refer to the values between 0 and m of a color component as the "color variation of the component" (CVC), where 0 ≤ m ≤ 255.
3 SMNN for Image Compression

The activation potential and activation function of the SMNN for image compression are based on equations (6) and (7), and they are defined according to equations (8) and (9), respectively.                                                                           (8), (9)
During the supervised training, the weights first decay in order to accelerate the weight adjustment process, according to equation (10): the weight of neuron k at iteration n + 1 is obtained by scaling its weight at iteration n by a constant between 0 and 1, which is itself defined according to equation (11).                                                      (10), (11)

The weights are then adjusted according to equations (12) and (13), in which one operand is the desired value and β is an array defined by equations (14) and (15).           (12), (13), (14), (15)

These equations involve the learning constant, the complement of the pattern under processing, and arrays whose dimensions are those of the window; equations (16) and (17) complete the adjustment rule.                                                                    (16), (17)
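The decay-then-adjust pattern described above can be sketched schematically in Python (this is not the authors' exact update rule, whose equations (10)-(17) involve additional morphological terms; the names alpha and lam are illustrative):

```python
def train_step(weights, pattern, alpha, lam):
    """Decay-then-adjust sketch: weights first decay by a factor alpha
    (equation (10)-style), then move toward the desired pattern by a
    learning constant lam (equation (12)-style)."""
    decayed = [alpha * w for w in weights]
    return [w + lam * (p - w) for w, p in zip(decayed, pattern)]

w = train_step([100.0, 50.0], [120.0, 40.0], alpha=0.5, lam=0.5)
print(w)  # -> [85.0, 32.5]: halfway between the decayed weights and the pattern
```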
The activation function for the morphological neuron in the auxiliary layer is given by equation (18).                                                                                  (18)
Note that equation (18) is a morphological dilation. Figure 1 presents the
architecture for this SMNN; observe that the network is composed of an input layer,
an output layer, and a hidden layer with its auxiliary layer. The input layer receives
the patterns to be learned; in this case, “patterns” refers to the data of the image to
be compressed.
Fig. 1. Architecture of an SMNN
In this architecture, the patterns presented to the input layer must belong to the grayscale domain; the desired value, which is mandatory for error correction in the learning phase, is presented alongside them. The network's output is limited to the binary {0, 1} domain. In Figure 1, the input refers to a sub-image of fixed dimensions with a positive grayscale level within [0, m], and the second-layer outputs refer to the outputs of the morphological neurons.
According to Definition 4 and equations (2)-(4), the SMNN allows the definition of a pixel-component threshold, in fact, the tolerance interval of length F. This interval defines the tolerance of the SMNN when dealing with gray-level variations between the neurons' weights and a pattern under processing by the network. In addition, the SMNN also allows the definition of a pixel-error counting threshold, which is responsible for restricting the neuron's activation. In the following sections, we extend the SMNN in order to make it capable of compressing images and RGB video.
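A minimal sketch of how the two thresholds could interact in an activation decision (illustrative names, not the authors' implementation; the real activation additionally involves the morphological operators of Section 2): the pixel-component threshold F tolerates per-pixel gray-level deviation, while the pixel-error counting threshold t requires a minimum fraction of in-tolerance pixels before the neuron fires.

```python
def neuron_activation(window, weights, F, t):
    """Return 1 if at least a fraction t of pixels match the weights
    within a gray-level tolerance interval of length F, else 0."""
    assert len(window) == len(weights)
    in_tol = sum(
        1 for p, w in zip(window, weights)
        if w - F / 2 <= p <= w + F / 2
    )
    return 1 if in_tol / len(window) >= t else 0

weights = [100, 100, 100, 100]
print(neuron_activation([102, 97, 100, 180], weights, F=10, t=0.75))  # 3/4 in tolerance -> 1
print(neuron_activation([102, 97, 100, 180], weights, F=10, t=0.90))  # 3/4 < 0.90 -> 0
```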
3.1 Grayscale Image Compression
The image to be compressed is fragmented into a set of windows. Each element of this set is processed by all of the morphological neurons (MNs). The winner neuron produces an output with value 1 (high), while all of the others produce outputs with value 0 (low). Let an image with positive grayscale levels within [0, m] be given, and let its sub-images be such that the union of all of the sub-images reconstructs the original image. This reconstruction is expressed by equation (19) as the union of the sub-images.                                            (19)
One extension of the SMNN is required for image compression: the auxiliary neuron is loaded with the sequential number of the corresponding MN from the second layer. The output of the MN is received by its corresponding outstar neuron, which, when excited with a high input, outputs the value loaded during the training phase. Thus, the network effectively indicates the winner MN that has learned or recognized the input pattern (which in this particular application is a window). This output must be preserved, mapping which window an MN can reproduce. This is the key for decoding the compressed image: we use a mapping between a window and the neuron that has "learned" this window. In this way, the nth neuron's weights are used in order to reproduce the windows associated with it. This
mapping is defined by equation (20): it collects pairs (k, i) over the set of sub-images of the image, where the value k identifies the kth MN and (k, i) refers to the ith mapping between an MN and a sub-image (window).                                                      (20)

Figure 2 shows how an image or a component of the frame is processed by an SMNN.
Fig. 2. Illustration of how an image or frame is compressed by the SMNN
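The window-to-neuron mapping of equation (20) can be sketched with a toy codebook standing in for trained neuron weights (all names are hypothetical, not the authors' implementation): encoding stores only (neuron, window) index pairs, and decoding pastes the winning neuron's weights back into place.

```python
def encode(windows, codebook, F, t):
    """Map each window to the index of the first codebook entry (neuron)
    whose weights match it: per-pixel tolerance F, match fraction >= t."""
    mapping = []
    for i, win in enumerate(windows):
        for k, weights in enumerate(codebook):
            ok = sum(1 for p, w in zip(win, weights) if abs(p - w) <= F / 2)
            if ok / len(win) >= t:
                mapping.append((k, i))
                break
    return mapping

def decode(mapping, codebook, n_windows):
    """Rebuild the window list from the (neuron, window) mapping."""
    out = [None] * n_windows
    for k, i in mapping:
        out[i] = list(codebook[k])
    return out

codebook = [[0, 0, 0, 0], [200, 200, 200, 200]]
windows = [[1, 0, 2, 0], [199, 201, 200, 198], [0, 1, 0, 0]]
m = encode(windows, codebook, F=8, t=1.0)
print(m)  # -> [(0, 0), (1, 1), (0, 2)]
print(decode(m, codebook, 3))
```

The compression gain comes from storing one small index per window instead of the window's pixels; reconstruction is a lookup, with the fidelity loss bounded by the tolerance F.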
3.2 RGB Video Compression
For the compression of RGB video, each component of the stream is processed in an
individual instance of SMNN. Consequently, at the end of the compression, we obtain
three instances of SMNN. This process is depicted in Figure 3.
Fig. 3. RGB video compression by SMNN
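The per-component scheme of Figure 3 can be sketched as follows (a minimal illustration; the helper name and the nested-list frame format are assumptions, not the authors' code): each channel is extracted into its own single-component stream, which can then be fed to an independent network instance.

```python
def split_rgb_frames(frames):
    """Split a list of RGB frames (nested [row][col] lists of (R, G, B) pixels)
    into three single-component streams, one per channel, for independent compression."""
    streams = {"R": [], "G": [], "B": []}
    for frame in frames:
        for idx, name in enumerate("RGB"):
            streams[name].append(
                [[pixel[idx] for pixel in row] for row in frame]
            )
    return streams

frame = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (10, 20, 30)]]
streams = split_rgb_frames([frame])
print(streams["R"][0])  # -> [[255, 0], [0, 10]]
print(streams["B"][0])  # -> [[0, 0], [255, 30]]
```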
The use of the SMNN for video stream compression does not require any adaptation of its extended proposition for grayscale image compression. Each frame is split into a set of sub-images. Then, for each set, we create a map according to equation (20). Finally, an entire component stream is encoded according to equation (21) as the sequence of these per-frame maps, where the stream length is measured in frames.                                             (21)

In the next section we present the SMNN results related to the compression of grayscale images and RGB video.
4 Computer Simulations and Discussion

Table 1 presents the results for the compression of Figure 4(G), which is a grayscale image with 320×240 pixels and 8 bits per pixel (bpp), totaling 76,800 bytes without compression.
Table 1. Results obtained by applying 3 image compression methods to Figure 4(G)

Image  Format  Image size  Size (bytes)  Compression ratio  eRMS (fidelity criterion)
A      PNG     320×240     77,279        0.99               2.6
B      PNG     320×240     54,858        1.39               8.4
C      NMC     320×240     72,040        1.07               0.3
D      NMC     320×240     30,146        2.55               4.9
E      JPG     320×240     41,684        1.84               3.6
F      JPG     320×240     2,468         31.11              27.5
Measurements of the compression ratio and eRMS are always calculated in relation to image 4(G). In Table 1, JPG refers to the Joint Photographic Experts Group format, PNG refers to portable network graphics, and NMC refers to the neural morphological compression method, produced by the SMNN. Each method was used to produce images with the highest and lowest compression levels possible. To evaluate the fidelity criteria, we use the root mean square error (eRMS) for an objective evaluation of the images in Figure 4(A-G).
Fig. 4. Dog Lisbela in different image formats obtained with 3 compression methods
The compression ratio estimates listed in Table 1 were obtained in accordance with equation (22), in which N1 refers to the original image size in bytes and N2 refers to the compressed image size in bytes; eRMS is defined according to equation (23):

CR = N1 / N2,                                                                           (22)

eRMS = [ (1/(M·N)) · Σx Σy ( f'(x, y) − f(x, y) )² ]^(1/2),                             (23)

where f is the original image, f' is the reconstructed image, and M and N are the image dimensions.
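In code, equations (22) and (23) amount to the following (standard definitions; array and function names are illustrative):

```python
import math

def compression_ratio(original_bytes, compressed_bytes):
    # Equation (22): ratio of original to compressed size
    return original_bytes / compressed_bytes

def erms(original, reconstructed):
    # Equation (23): root-mean-square error over an M x N image
    m, n = len(original), len(original[0])
    total = sum(
        (reconstructed[x][y] - original[x][y]) ** 2
        for x in range(m) for y in range(n)
    )
    return math.sqrt(total / (m * n))

print(compression_ratio(76800, 30146))  # ~2.55, matching Table 1 row D
print(erms([[10, 10], [10, 10]], [[10, 12], [8, 10]]))
```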
Figure 5(A) depicts the results for compression with a window size of 3 and the following variations in the SMNN's parameters: t from 0.7 to 1.0 and F from 5 to 10. Note that in charts (A) and (B), the value of F was normalized. In chart (B), we can see the results for a window size of 3 and the same variation in the SMNN's parameters as seen in chart (A).
Fig. 5. Results for compression of grayscale image with variations of SMNN’s parameters
For RGB video compression we used the "foreman" stream [13]. This test video was obtained by converting a CIF video to the RGB color space, sampled at 10 frames. The results are presented for 8×8 and 16×16 window sizes. In Figure 6 we can see samples of frame ten compressed with variations of the SMNN's parameters.
Fig. 6. Frame ten compressed with various parameters.
Table 2. Samples of results obtained by applying the SMNN to the 10th frame of the Foreman video

Frame  Window size  F   t     Compression ratio  Objective quality (dB)
A      8×8          10  0.90  1.296              45.47
B      8×8          20  0.90  2.357              34.33
C      8×8          30  0.90  3.698              30.25
D      16×16        10  0.90  1.162              40.31
E      16×16        20  0.90  1.690              34.20
F      16×16        30  0.90  2.227              30.59
The Foreman video was successfully compressed by the SMNN: as we can see in Table 2, an 8×8 window size with a tolerance of F = 30 and a pixel-error threshold setting of t = 0.90 results in a good compression ratio (3.698) and an acceptable PSNR level. Note that these results refer to the 10th frame only. To evaluate the fidelity criteria for the compressed images and video, we utilize the peak signal-to-noise ratio (PSNR) for an objective evaluation, according to the equation:

PSNR = 10 · log10( 255² / MSE ),                                                        (24)

where MSE is the mean squared error between the original and the reconstructed frame.
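For 8-bit data, equation (24) can be computed as follows (a sketch; names are illustrative):

```python
import math

def psnr(original, reconstructed, peak=255):
    # Equation (24): PSNR in dB relative to the peak signal value
    m, n = len(original), len(original[0])
    mse = sum(
        (reconstructed[x][y] - original[x][y]) ** 2
        for x in range(m) for y in range(n)
    ) / (m * n)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(peak ** 2 / mse)

print(psnr([[100, 100], [100, 100]], [[100, 105], [95, 100]]))
```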
We extended the investigation of RGB video compression by compressing the first 100 frames of the Foreman video and compared the results of the SMNN with results from other well-known compression techniques. Table 3 shows these results. Figure 7 shows the PSNR and CR evolution, frame by frame, throughout the compression of the first 100 frames of the Foreman video, while Figure 8 shows the growth in the number of neurons during this compression.

In Table 3, the results for HEO-II refer to [14], KAMINSKY and JM9.5 refer to [15], and FS and ANEA refer to [16]. NMC1 refers to the results for the SMNN with t = 0.6, F = 5.0, and an 8×8-pixel window, and NMC2 refers to the results for the SMNN with t = 0.8, F = 20.0, and a 4×4-pixel window.
Table 3. Numerical results obtained by applying the SMNN to the first 100 frames of the Foreman video

Technique      Based on   Requires complex   PSNR   Bit rate      Compression
                          pre-processing?    (dB)   (bits/pixel)  ratio
HEO-II-100F    H264/AVC   yes                NA     0.660         1.51
KAMINSKY-100F  H264/AVC   yes                35.85  0.0702        14.282
JM9.5-100F     H264/AVC   yes                35.93  0.0705        14.182
FS             H264/AVC   yes                36.33  0.384         2.60
ANEA           H264/AVC   yes                36.29  0.543         1.84
NMC1           SMNN       none               34.30  0.380         2.63
NMC2           SMNN       none               27.93  0.104         9.59
In Figure 8, the sub-image counts refer to the number of elements in the set of sub-images, measured at the 100th frame (equation (21)). As we can see in Tables 1 and 3, the SMNN gave good results, demonstrating that the network is capable of RGB video compression. Note that the SMNN does not require pre-processing, and all of the results shown in this paper refer to the data without any secondary compression. Saving the SMNN results to a hard disk using trivial data compression can improve the final compression rates.
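The remark about trivial secondary compression can be illustrated with the standard-library zlib (a sketch on synthetic data; the byte layout of real SMNN output would differ):

```python
import zlib

# Hypothetical serialized SMNN output: a highly repetitive mapping/weight stream
payload = bytes([7, 3, 7, 3, 200] * 400)

packed = zlib.compress(payload, level=9)
ratio = len(payload) / len(packed)
print(len(payload), len(packed), round(ratio, 1))  # the packed size is far smaller
```

Because the mapping reuses a small set of neuron indices, the serialized stream is highly redundant, which is exactly the case where a generic entropy coder pays off.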
Fig. 7. Frame-by-frame evolution during compression of the first 100 frames
Fig. 8. Growth in the number of neurons during compression
5 Conclusion

This investigation and the detailed results for the SMNN demonstrate that it is practical for RGB video and grayscale image compression and capable of producing results comparable to those of well-known methods. The reconstruction of the compressed image essentially occurs through data translation from the neurons' weights to the respective windows, without requiring additional mathematical operations.
References
1. Winkler, S., van den Branden Lambrecht, C.J., Kunt, M.: Vision Models and Applications
to Image and Video Processing, p. 209. Springer (2001)
2. Reddy, et al.: Image Compression and Reconstruction Using a New Approach by Artificial
Neural Network. International Journal of Image Processing (IJIP) 6(2), 68–85 (2012)
3. Cramer, C., Gelenbe, E., Bakircloglu, H.: Low Bit-rate Video Compression with Neural
Networks and Temporal Subsampling. Proceedings of the IEEE 84(10), 1529–1543 (1996)
4. Vaddella, R.P.V., Rama, K.: Artificial Neural Networks for Compression of Digital
images: A Review. International Journal of Reviews in Computing, 75–82 (2010)
5. Singh, M.P., Arya, K.V., Sharma, K.: Video Compression Using Self-Organizing Map and
Pattern Storage Using Hopfield Neural Network. In: International Conference on Industrial
and Information Systems (ICIIS), December 28-31, pp. 272–278 (2009)
6. García-Rodríguez, J., Domínguez, E., Angelopoulou, A., Psarrou, A., Mora-Gimeno, F.J.,
Orts, S., García-Chamizo, J.M.: Video and Image Processing with Self-Organizing Neural
Networks. In: Cabestany, J., Rojas, I., Joya, G. (eds.) IWANN 2011, Part II. LNCS,
vol. 6692, pp. 98–104. Springer, Heidelberg (2011)
7. Khashman, A.: Neural Networks Arbitration for Optimum DCT Image Compression. In:
IEEE Eurocon (2007)
8. Banon, G.J.F.: Characterization of Translation Invariant Elementary Morphological
Operators Between Gray-level Images. INPE, São José dos Campos, SP, Brasil (1995)
9. Banon, G.J.F., Faria, S.D.: Morphological Approach for Template Matching. In: Brazilian
Symposium on Computer Graphics and Image Processing Proceedings. IEEE Computer
Society (1997)
10. Faria, S.D.: Uma abordagem morfológica para casamento de padrões (A morphological approach for template matching). Master's Thesis,
National Institute for Space Research, INPE-6346-RDI/597 (1997)
11. Silva, F.A.F.S., Banon, G.J.F.: Rede morfológica não supervisionada (RMNS) (Unsupervised morphological network). In: IV
Brazilian Conference on Neural Networks, pp. 400–405 (1999)
12. Banon, G.J.F., Barrera, J.: Decomposition of Mappings Between Complete Lattices by
Mathematical Morphology – Part I: General Lattices. Signal Processing 30(3), 299–327
(1993)
13. Foreman, Video stream for tests, http://trace.eas.asu.edu/yuv/
14. Heo, J., Ho, Y.-S.: Efficient Differential Pixel Value Coding in CABAC for H.264/AVC
Lossless Video Compression. Circuits, Systems and Signal Processing 31(2), 813–825
(2012)
15. Kaminsky, E., Grois, D., Hadar, O.: Dynamic Computational Complexity and Bit
Allocation for Optimizing H.264/AVC Video Compression. Journal of Visual
Communication and Image Representation 19(1), 56–74 (2008)
16. Saha, A., Mukherjee, J., Sural, S.: A Neighborhood Elimination Approach for Block
Matching in Motion Estimation. Signal Processing, Image Communication 26(8), 438–454
(2011)