Digital Image Processing
Introduction. Hardware for image processing - basics. The eye as the human vision sensor.
The Mars Orbiter image depicts the Valles Marineris canyon on Mars. The image was obtained by combining several images from a so-called stereo camera.
Course information
Lecturer: Dr Igor Đurović
Number of lectures: 3+1
Type of examination: depending on the number of students
Literature: book
Other resources: www.obradaslike.cg.ac.yu (PPT presentations, examples, test images), English textbooks available, etc.
Covered topics
- Introduction and image acquisition
- Human vision, color models and image formation
- Color and point transforms - histogram
- Geometrical transforms
- Interpolation
- Image in spectral domains
- Filtering
- Basics of image reconstruction
- Edge detection
- Basics of image recognition
Other topics
Compression, digital image protection, motion picture processing, stereo images, superresolution, computer graphics, etc. (covered in the Multimedia Systems course).
History
Photography appeared in the XIX century. Ideas for developing fax machines and for using telegraphic lines for image transmission were born during World War I. The idea of TV development was born around 1930. The key event in digital image processing was the development of the first electronic computing machines, which enabled simple and fast image processing. The second important event was astronomical exploration and the race to space. JPL from California had the first task related to digital image processing within the NASA space program.
Image acquisition
[Figure: a source of radiation illuminating a surface; the normal line to the surface is shown.]
Usually we will assume that the source of radiation is within the visible-light frequency band, but it could also be other electromagnetic bands (X-rays, gamma rays, radio waves), ultrasound, vibrations, etc. Sensors receive the signals reflected from the surface.
Image Acquisition
Optical system - Sensor - Digitizer
The optical system is the non-electronic part consisting of lenses and similar components; it can be modeled as a 2D linear space-invariant system with impulse response h(x,y) that is approximately known in advance.
Image Acquisition
f(x,y) can be considered as the power of the light (or of some other signal that is the subject of visualization). h(x,y) is the impulse response of the optical system. The output of the optical system is an optical signal. For linear space-invariant systems the output is given as:
b(x,y) = ∫∫ f(α,β)·h(x−α, y−β) dα dβ   (integration over the whole plane)
Since b(x,y) and f(x,y) represent optical power, they are non-negative. This is the 2D convolution (very similar to the 1D convolution for linear time-invariant systems).
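As an illustration, a minimal MATLAB sketch of this 2D convolution; the Gaussian impulse response used here is only an assumed stand-in for a real optical system, and the sample image name is a MATLAB demo file:

f = double(imread('cameraman.tif'));   % input "optical power" (non-negative)
[xg, yg] = meshgrid(-5:5, -5:5);       % support of the impulse response
h = exp(-(xg.^2 + yg.^2) / 4);         % hypothetical Gaussian h(x,y)
h = h / sum(h(:));                     % normalize so the total power is preserved
b = conv2(f, h, 'same');               % 2D convolution: output of the optical system
imshow(uint8(b))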
Image Acquisition
The optical system can deform the image. Manufacturers can estimate the distortion caused by h(x,y) and develop techniques to reduce its effects. The second important element is the sensor, which transforms the optical signal into its electrical equivalent. We will not study sensors in detail within this course since they are subject to quite fast development. Some details related to sensors are given in the textbook, but development in this area is fast and some of them may already be outdated.
Digitizer
The analog electric equivalent i(x,y) is transformed to a digital version in two steps:
Sampling and quantization. [Figure: sampling grid; based on the image content at a point or over an area, the luminance value of image pixel i(1,1) is determined.]
A pixel (picture element) is an elementary point of the image. Digitization exploits the fact that the eye perceives numerous closely spaced dots as a continuous image.
Digitizer
The sampling phase is followed by quantization: each sample is rounded to the closest multiple of the quantization step. The resulting integer can easily be represented by a binary number. The number of quantization levels is commonly 2^k, where k is an integer. [Figure: i(n1) for fixed n2.]
Fax.
Developed in some variant at the beginning of the XX century. Black points on the paper are digitized and transmitted in coded form. Standardized.
The quality improvement in the last 5 years is larger than in the previous 50 years.
Image Digitizers
Parts
- Sampling apparatus
- Scanning mechanical parts
- Source of light
- Quantizer (commonly an A/D converter)
- Memory medium
Scanning technologies:
- SCAN-IN: the source of light moves over the image part by part.
- SCAN-OUT: the entire image is illuminated while sampling is performed part by part.

Important characteristics:
- Pixel size
- Image size
- What is transformed to the image (for visible objects it is their transmittance)
- Linearity

Check the textbook and the Internet for more details on digitization technology.
TV
Digital TV (DVB) is currently in rapid progress. The digital TV standard is called HDTV. Since development is rapid, other standards may emerge. High-compression, lower-quality formats are developed for video streaming over the Internet. The development of new generations of digital video discs also requires new high-quality video and image formats.
It is possible to visualize signals that are not produced by electromagnetic rays, for example vibrations in seismology.
There are several methods for solving this problem. In addition, there are special cards for video data processing. There are also rather expensive special-purpose machines: graphics workstations.
Printers
Printers are devices for producing images on paper. There are numerous types of printers. Here we give some data about the most important ones:
- Matrix printers (an inked ribbon is pressed onto the paper with needles called pins).
- Line printers (similar technology to matrix printers but able to print an entire line at once).
- Laser and LED printers (electrophotography-based technology, described in 1938 but first applied in 1980 by Canon). As one of the most important technologies, it is described in detail on the next couple of slides.
- Inkjet and bubble-jet printers (one or more cartridges with ink are positioned above the paper; the appropriate amount of ink is deposited on the paper by a piezoelectric device in inkjet printers and by a heater in bubble-jet printers).
In LED printers, LED diodes are used to illuminate the photoconductive material. The LED technology is able to print an entire line at once.
Printing
Laser printers print in a single color; due to the imperfection of the human visual system we perceive different amounts of the same color as different shades. There are numerous alternative printing schemes and devices. The two most important characteristics of printers are dpi = dots per inch (determines the printer resolution and how fine the printed page is; 600 dpi is today assumed as a reasonable minimum) and ppm = pages per minute.
Displays
Displays are devices for presenting images. We will not consider here so-called permanent displays that produce permanent images (for example Xerox machines). Under displays we assume:
Computer monitors, Medical devices for displaying images, Projectors, etc.
Monitors
Three basic technologies:
- CRT: cathode ray tube technology, based on physical properties of illuminated phosphor.
- LCD: liquid crystal displays.
- PDP: plasma display panel (with a special-purpose gas at high voltage).
CRT monitors have numerous disadvantages (they are large, energy demanding, have various problems with image presentation such as flicker, are not usable in military applications, etc.). However, they still have important advantages: they are cheap, produced by numerous companies, allow viewing under large angles, etc.
For CRT monitors the typical width/height ratio is 4/3 with resolutions up to 1600x1200. Relevant questions: how many images are presented on the screen in 1 second? Note that the parts of the screen close to its edges are not usable in CRT monitors.
Lowpass pattern
Numerous pixels at small distances are perceived as a continuous shade, and we are unable to recognize separate pixels. A single pixel can be seen in the following manner.
[Figure: a pixel enlarged more than 100 times; the pixel region.] Due to numerous reasons (imperfection of the monitor, imperfection of the eyes, and others) we cannot see a pixel in an ideal manner.
Lowpass pattern
The luminance of a single pixel is modeled as A·exp(-(x^2+y^2)/R^2), where A is the maximal luminance and R depends on the monitor quality. What happens when pixels with the same luminance are adjacent? The resulting luminance is not flat.
[Figure: luminance profiles of single pixels placed at distance d along the x-axis and their sum; the y-axis is neglected for brevity.]
Lowpass pattern
Monitors have a good lowpass pattern when the oscillations in the resulting luminance are as small as possible. Details related to lowpass pattern optimization are given in the textbook.
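A small MATLAB sketch of the effect described above; the values of A, R, and the spacing d are arbitrary assumptions chosen only to show how the summed luminance oscillates instead of being flat:

A = 1; R = 0.6; d = 1;                 % assumed maximal luminance, width, and pixel spacing
x = -3:0.01:3;
total = zeros(size(x));
for c = -3:3                           % a few neighboring pixels with equal luminance
    total = total + A * exp(-(x - c*d).^2 / R^2);
end
plot(x, total), xlabel('x'), ylabel('resulting luminance')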
Highpass pattern
The highpass pattern is formed by alternating lines in different colors. Due to the previously described imperfections, the changes between colors are not abrupt; in the case of bad monitors the transition is smooth or blurred. Details related to the highpass pattern can be found in the textbook. Why is this pattern called highpass? With the same purpose a checkerboard pattern, formed like a chess board, is introduced.
Cameras
Light photons hit the semitransparent photocathode, which releases primary electrons. Accelerated by the electric field, these electrons hit the next cathodes, causing an avalanche effect with an amplified number of electrons (secondary electrons). [Figure: photomultiplier tube with cathodes at increasing voltages from 0 V up to +700 V; light input, resistor R, primary and secondary electrons.] The output of the system is the video signal, i.e., the electrical equivalent (significantly amplified) of the light signal.
Digital cameras
Analog cameras have several advantages, but increasingly cheap digital video cameras are in use. There are two types of digital camera sensors:
CCD (Charged Coupled Device) sensors CID (Charge Injection Device) sensors
CCD sensors are dominant (due to price reasons) and only they will be analyzed here. Details related to the CID sensors can be found in the textbook.
Digital cameras
The main part of this camera is photodiode field:
[Figure: photodiode array with MOS switches.]
The photodiode array is made on a single chip. The diodes work as capacitors in the light-integrating regime. Opening the MOS switches causes a current flow proportional to the charge integrated on the diodes.
Digital cameras
This current is proportional to the luminance within the given integration period. This is similar (with a little higher inertia) to the vidicon. There are problems with reading data from these sensors. There are three reading strategies.
[Figure: CCD readout organizations - image region, full-frame region, storage region, line shifts, and serial registers.]
Check the textbook for details on the third strategy. In the case of color image acquisition there are color filters that allow only a single basic color for each pixel, and the combination of images from diodes with various filters forms the color image.
Basic colors will be discussed later.
Human eye
The eye is one of the five human senses. About 5/6 of all information we receive comes through the eyes. The eye works similarly to other sensors.
1. posterior compartment 2. ora serrata 3. ciliary muscle 4. ciliary zonules 5. canal of Schlemm 6. pupil 7. anterior chamber 8. cornea 9. iris 10. lens cortex 11. lens nucleus 12. ciliary process 13. conjunctiva 14. inferior oblique muscle 15. inferior rectus muscle 16. medial rectus muscle 17. retinal arteries and veins 18. optic disc 19. dura mater 20. central retinal artery 21. central retinal vein 22. optical nerve 23. vorticose vein 24. bulbar sheath 25. macula 26. fovea 27. sclera 28. choroid 29. superior rectus muscle 30. retina
Light is transformed in the visual cells into electric impulses. The number of visual cells decreases with the distance from the yellow spot. At a relatively small distance from the yellow spot is the so-called blind spot, the position where the optical nerve exits. This nerve is connected with the brain. The optical nerve is connected with the visual cells through ganglions, which integrate the outputs of the visual cells.
Steps in vision
The optical nerve is rather long and connects the eye with a very sensitive part of the brain called the (visual) cortex. Our brains reconstruct object images. We look with the eyes, but we see with the brain! The human visual system is inertial: we are unable to create an image immediately after an abrupt luminance change. The transport of optical information through the optical nerve is relatively slow (compared to wired connections in machines). This leads to persistence of vision: we are able to see about 24 images per second as continuous motion.
Problems in eye
Problems in focusing light in the eye lead to the focus missing the yellow spot; then we have two well-known defects: short-sightedness and long-sightedness. Due to the slower reaction of the muscles that control the shape of the eye lens, a special type of long-sightedness common in older people can develop. A problem in the development of one of the types of cone cells can lead to daltonism (color blindness). There are other very dangerous diseases: trachoma, conjunctivitis, cataract. Various other problems in the eye, including physical damage, can cause vision problems.
[Figure: spectral sensitivities of the three groups of cone cells, with maxima around 460 nm, 580 nm and 650 nm; horizontal axis: wavelength in nm.]
The combination of the responses of these three groups of cone cells gives our color vision.
Exercise No 1
Sensitivity to grayscale
The procedure will be performed as follows
- We will create an image with 16x16 pixels containing 256 shades of gray.
- Then this image will be shown.
- Try to detect the different shades.
MATLAB program:
clear
k=0;
for m=1:16
  for n=1:16
    A(m,n)=k;
    k=k+1;
  end
end
pcolor(A), colormap(gray(256)), shading interp
Exercise No 1
Obtained image. Question:
How many different gray shades can you observe?
Exercise No 2
Create binary image with chess-board pattern. MATLAB code:
[m,n]=meshgrid(1:100,1:100);
r=rem(m+n,2);
imshow(r)
Adjust the image to 100x100 monitor pixels. What can you conclude about the quality of your monitor for highpass images?
For self-exercise
Inform yourselves about recent developments in CCD sensors. Consider the operation of a scanner and identify its basic parts. Consider the construction of your own scanner; try to find data about the required parts. Inform yourselves about the operation of LCD and plasma displays. Analyze prices and performance of video projectors.
For self-exercise
Find data about various types of printing machines, especially about their construction and parts. Find data about modern computer cards for graphics and video processing, their performance and construction. Find data about devices for medical imaging.
Eye sensitivity
The eye is more sensitive to lower frequencies (red colors) than to higher frequencies (blue colors). During daylight the eye is the most sensitive to a wavelength of 555 nm (close to green). With a decrease of luminance, the frequency of the most sensitive point decreases and approaches the dark red colors (the wavelength increases). The eye has the ability to adapt itself to the luminance level. In order to have the impression of looking at the same object, it is only required to keep the ratio between the object luminance and the total luminance constant.
Eye sensitivity
The eye's ability to detect luminance variations depends on the total luminance of the object. For bright objects we need a larger variation in luminance in order to detect it, while for dark objects a smaller variation suffices. The ratio between the detectable variation of luminance and the total luminance is constant for a particular eye. There is a threshold in the detection of noise (disturbances) in an image. The threshold level increases with luminance, and it is more difficult to detect noise in bright than in dark images.
Eye sensitivity
Sensitivity to contrast refers to the ability of the eye to distinguish a certain number of different luminance levels. This sensitivity decreases as the noise level increases. Noise is more difficult to detect in images with a large number of details than in images without details. Noise correlated with the image can be detected more easily than uncorrelated noise. Noise in motion (video) sequences causes fatigue in spectators.
RGB cube
The simplest method for representing colors is the RGB cube in Cartesian coordinates. The coordinates correspond to the three basic colors: red, green and blue. The maximal luminance in each color is 1, while the minimal is 0.
[Figure: RGB cube with axes R, G, B. Blue = (0,0,1); Magenta = (1,0,1), lack of green; Green = (0,1,0); Yellow = (1,1,0), lack of blue; Red = (1,0,0).]
RGB cube
Black = (0,0,0), White = (1,1,1). On the main diagonal r=g=b lie the grayscale (achromatic) colors. Achromatic colors can be described by a single coordinate only (luminance).
Example: Section of the color cube with the plane 0.5r+0.3g+0.2b=1.
RGB model
The RGB model is used in computer and TV monitors. For example, classical cathode ray tube monitors have phosphor grains that, when bombarded with electrons, produce red, green and blue light. The RGB model is called an additive model since the resulting color is obtained as a sum of the three basic colors. Based on Newton's experiment, white (the highest luminance) is obtained with full luminance in all basic colors. Memory representation of RGB data requires MxN (pixels) x 3 (colors) x k (bits for representing each color). Commonly k=8, but there are alternative models.
Color table
The RGB model is memory demanding. There are several techniques for reducing memory requirements. The simplest is to reduce the number of bits used for representing the blue color, since humans have the smallest sensitivity to it. Other, more sophisticated, methods are based on the colormap or CLUT (color look-up table). Each pixel is coded with r bits. Numbers from 0 to 2^r-1 represent one of the colors from the RGB model.
Color table
Information about color coding are given in the special table (CLUT) that can be recorded together with image. The most common type of the CLUT is:
color code | amount of red | amount of green | amount of blue
0          | r0            | g0              | b0
1          | r1            | g1              | b1
2          | r2            | g2              | b2
3          | r3            | g3              | b3
Color table
What should be done with colors from the RGB model that exist in the image but are not given in the CLUT? Most commonly these colors are replaced with the CLUT color that is, according to some criterion, closest to the desired color. Memory requirements of the CLUT-based representation are: MxNxr bits for image representation + (r+3x8)·2^r bits for the CLUT. There are alternatives; for example, instead of recording the color table we can record and transmit only the information about which table is selected, since the decoder has the same set of CLUTs. There are several color tables commonly used in practice.
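A minimal MATLAB illustration of the CLUT idea using an indexed image; 'trees.tif' is a standard MATLAB sample file used here only as an example:

[X, map] = imread('trees.tif');   % X holds the color codes, map is the color look-up table
size(map)                         % each row of map holds the R, G, B amounts of one coded color
rgb = ind2rgb(X, map);            % decode: replace every code with its CLUT entry
imshow(rgb)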
CMY model
Relationship between colors in the CMY and RGB models:
C = 1 - R,  M = 1 - G,  Y = 1 - B
A problem of the CMY model is its inability to reproduce a large number of colors. It is difficult to reproduce fluorescent and similar colors and, most importantly, black. Approximately only about 1 million colors can be reproduced with good quality. The main problem is black, since it is very important for humans (important for image recognition, edges of shapes, etc.). Reason: mixing cyan, magenta and yellow gives a dark but not black color (or requires a huge amount of ink, which is not economical).
CMYK model
Printing machines usually print images in the CMYK model (a 4-channel model). K means blacK, since B is reserved for Blue. Relationship between the CMY and CMYK models:
K = min(C,M,Y),  C = C - K,  M = M - K,  Y = Y - K
CMY image
CMYK image
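A hedged MATLAB sketch of these relationships for an RGB image with values normalized to [0,1]; the file name 'peppers.png' and the variable names are illustrative only:

rgb = im2double(imread('peppers.png'));
C = 1 - rgb(:,:,1);  M = 1 - rgb(:,:,2);  Y = 1 - rgb(:,:,3);   % CMY channels
K = min(min(C, M), Y);                                          % black channel
C = C - K;  M = M - K;  Y = Y - K;                              % CMYK channels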
RGBGRAY
Sometimes it is required to transform the RGB color space to grayscale. Since the grayscale colors lie on the main diagonal of the color cube, we can simply use the corresponding projection onto this diagonal as the grayscale variant of the color:
GRAY = (R^2 + G^2 + B^2)^(1/2)
The square root is a numerically demanding operation. The result should be quantized to the proper number of gray levels. It is sometimes considered that the obtained results are not realistic with respect to the image colors.
RGBGRAY
The most common RGB-to-GRAY conversion is the mean value of the channels: GRAY = (R+G+B)/3. Even simpler techniques, such as taking one of the channels (commonly red or green) as the grayscale, are sometimes used: GRAY = R or GRAY = G. The blue channel is rarely used since it is considered unrealistic. There are also alternative schemes, some of them presented later within this lecture. GRAY-to-RGB conversion will be taught at the end of the course when we present pseudo-coloring.
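A short MATLAB comparison of the conversions mentioned above; the file name is a placeholder and the division by sqrt(3) in the first variant is added here only to keep the displayed values in [0,1]:

a = im2double(imread('peppers.png'));
g1 = sqrt(a(:,:,1).^2 + a(:,:,2).^2 + a(:,:,3).^2) / sqrt(3);   % diagonal projection (rescaled)
g2 = (a(:,:,1) + a(:,:,2) + a(:,:,3)) / 3;                      % mean of the channels
g3 = a(:,:,2);                                                  % green channel only
figure, imshow([g1 g2 g3])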
Binary image
A binary image has only two colors, usually black and white. White can be represented as 1 while black is 0. Usually this image is obtained from a grayscale image by comparing the grayscale value with a threshold:
b(n,m) = 1 if g(n,m) ≥ threshold, and b(n,m) = 0 otherwise.
Binary images are used in industrial applications, edge detection, etc. Threshold selection will be discussed later.
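A minimal MATLAB sketch of obtaining a binary image by thresholding; the sample file and the threshold value 0.5 are arbitrary choices:

g = im2double(imread('coins.png'));   % grayscale image with values in [0,1]
T = 0.5;                              % assumed threshold
b = g >= T;                           % binary image: 1 (white) above the threshold, 0 below
imshow(b)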
h_i = C_i / (C_1 + C_2 + C_3),  i = 1, 2, 3
An alternative for storing colors is the color space (h1, h2, Y), where Y = C1 + C2 + C3 is the total luminance. This procedure is used in the development of numerous color models.
X, Y, Z should represent the wavelengths at which the cone and rod cells are the most sensitive. Y corresponds to the most sensitive wavelength of the rod cells.
R_CIE =  1.167·R − 0.146·G − 0.151·B
G_CIE =  0.114·R + 0.753·G + 0.159·B
B_CIE = −0.001·R + 0.059·G + 1.128·B
V = Y
W = (−X + 3Y + Z)/2
U* = 13·W*·(u − u0)
V* = 13·W*·(v − v0)
Colorimetry
Colorimetry is a scientific area specialized in color comparison. For example, in industry we may have a process that is finished when the color of some object is the same as, or close to, some color known in advance. Assume that the current color in the RGB model is (R1,G1,B1) while the target color is (R2,G2,B2). The distance between these two colors can be described as:
sqrt[ (R1 − R2)^2 + (G1 − G2)^2 + (B1 − B2)^2 ]
This is the Euclidean distance, but alternative distances are also used. Unfortunately, a distance defined in this manner in the RGB model does not produce reliable results, since similar colors can produce a large distance and quite different colors a relatively small one.
Colorimetry
All models with a linear dependency on RGB suffer from the same problem as RGB. Therefore, for colorimetry applications the Lab color space is defined. The Lab model can be defined in various manners, but here we adopt a definition based on the XYZ model:
Colorimetry
(X0,Y0,Z0) is the reference white (almost always (1,1,1)). The Euclidean distance in the Lab coordinates is considered a good measure of color difference. However, there are alternative approaches for defining color difference measures.
Numerous color models have been developed for these purposes; we will describe only the probably most popular one, the HSL.
RGBHSL
Step 2. From Cartesian coordinates (xHS,yHS) to polar coordinates (radius is measure of saturation while angle is measure of hue)
ρ = sqrt(x_HS^2 + y_HS^2)
φ = angle of the vector (x_HS, y_HS)
The obtained coordinate system (ρ, φ, L) corresponds to the HSL, but commonly several additional operations are performed.
Step 3. Normalized saturation:
S = 1 − 3·min(R,G,B)/(R+G+B) = 1 − min(R,G,B)/L
RGBHSL
Step 4. Additional processing of angle (hue).
θ = arccos{ 0.5·[(R − G) + (R − B)] / sqrt[(R − G)^2 + (R − B)(G − B)] }

H = θ,        if B ≤ G
H = 2π − θ,   if B > G
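A hedged MATLAB sketch of these hue and saturation formulas, as reconstructed above, for a single RGB color (the example color values are arbitrary):

R = 0.8; G = 0.4; B = 0.2;                              % example color
L = (R + G + B) / 3;                                    % luminance
S = 1 - min([R G B]) / L;                               % normalized saturation
theta = acos(0.5*((R-G) + (R-B)) / ...
             sqrt((R-G)^2 + (R-B)*(G-B) + eps));        % eps avoids division by zero
if B <= G
    H = theta;
else
    H = 2*pi - theta;
end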
HSL model
The HSL-to-RGB transformation can be performed similarly. Derive this transform as a self-exercise with the help of the textbook.
[Figure: HSL color space with White at one pole and Black at the other.]
Exercise No.1
Realize relationship between RGB CIE and standard RGB model.
Here, we will realize several aspects of the problem:
- We will create the matrix that produces the inverse transform from the RGB CIE to the RGB model.
- We will determine the limits within which the discretization of the RGB CIE model should be performed.
- We will visualize the channels of an image for the RGB and RGB CIE models.
A = [1.167 -0.146 -0.151; 0.114 0.753 0.159; -0.001 0.059 1.128];
B = inv(A)
B =
    0.8417    0.1561    0.0907
   -0.1290    1.3189   -0.2032
    0.0075   -0.0688    0.8972
Exercise No.1
The limits of the RGB model are usually 0 and 1 along all coordinates, but they are different for the RGB CIE model.
The minimum of the R component in the CIE model is obtained for R=0, G=1, B=1 and equals -0.297, while the maximum is produced with R=1, G=0, B=0 and equals 1.167. The minimum of G in the CIE model follows for R=G=B=0 and equals 0, while the maximum follows for R=G=B=1 and equals 1.026. The B component achieves its maximum for R=0, G=B=1, where it equals 1.186, while the minimum is produced for R=1, G=B=0 and equals -0.001.
For visualization of the color channels we can use the following commands:
a=double(imread('spep.jpg'));
b(:,:,1)=1.167*a(:,:,1)-0.146*a(:,:,2)-0.151*a(:,:,3);
b(:,:,2)=0.114*a(:,:,1)+0.753*a(:,:,2)+0.159*a(:,:,3);
b(:,:,3)=-0.001*a(:,:,1)+0.059*a(:,:,2)+1.128*a(:,:,3);
The channels can be represented with commands such as:
pcolor(flipud(b(:,:,1))), shading interp
For self-exercise
List of mini-projects and tasks for self-exercise:
1. Solve problems from the textbook.
2. Realize all color models given on these slides and in the textbook and create transformations between them. Visualize the channels for the considered models.
3. Consider the following experiment. Assume that colors that cannot be printed in the CMYK model are those having any channel, except the black channel, represented with more than 90% of its maximal value. Assume in addition a color space with three alternative colors (for example rose, green and orange). Colors can be printed in CMYK or in the alternative model with an appropriate amount of black. The rules for printing in the alternative model are the same as for CMY (it is possible to print up to 90% of any color). How many colors from the RGB model can be printed in the CMYK model, and how many in the model with the three additional alternative colors?
For self-exercise
List of mini-projects and tasks for self-exercise:
4. Create the introduced color models and the transformations between these color models and RGB. Make images of cakes in the process of baking, or perform some other similar kitchen experiment. The main results of the first set of experiments should be: the average color of the cake several minutes before we assume it is done, and the average color when the cake is done. The second set of experiments is performed after that. Try to determine an algorithm for automatically turning off the baking appliance based on the first set of experiments, and check whether it performs well on the second set of experiments. Determine the number of correct and wrong results.
Histogram
The histogram is a simple (but very useful) image statistic. H(X) = number of pixels with luminance X.
Σ_{X=0}^{255} H(X) = M·N

The sum of the histogram values over all luminances (here a grayscale image with b bits/pixel and M×N pixels is considered) equals the number of pixels in the image.
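A direct MATLAB sketch of computing H(X) for an 8-bit grayscale image (the built-in imhist does the same thing; 'pout.tif' is a MATLAB sample image used only as an example):

I = imread('pout.tif');               % 8-bit grayscale image, values 0..255
H = zeros(256, 1);
for X = 0:255
    H(X+1) = sum(I(:) == X);          % number of pixels with luminance X
end
sum(H) == numel(I)                    % the sum of the histogram equals M*N
bar(0:255, H)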
Example of histogram
Histogram has numerous applications. It is very useful in techniques that use probabilistic model of image with probability density function of image luminance (or at least estimation of the pdf). How to connect histogram and probability density function?
Unimodal histograms appear for dark and for bright images; a bimodal histogram can be used for threshold determination and for obtaining binary images (how?).
Histogram extension
Optical sensors very often concentrate the image in a very narrow region of luminance. Software is then usually employed to solve this problem, using information obtained from the histogram. Let the image be contained in the luminance domain [A,B] (the estimation of A and B can be performed using the histogram). Assume that we want to extend the histogram over the entire 8-bit grayscale domain [0,255]:
luminance of the image with extended histogram:

f(X) = 255·(X − A)/(B − A),  for A ≤ X ≤ B

where X is the luminance in the original histogram.
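A hedged MATLAB sketch of this histogram extension (contrast stretching); here A and B are simply estimated as the minimal and maximal luminance, which is an assumption rather than the histogram-based estimate discussed above:

I = double(imread('pout.tif'));       % low-contrast sample image
A = min(I(:));  B = max(I(:));        % simple estimates of the luminance limits
J = uint8(255 * (I - A) / (B - A));   % extended histogram over [0,255]
imshow(J), figure, imhist(J)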
Histogram equalization
Histogram equalization is also one of the most common histogram operations. In the equalized image the histogram is approximately flat, i.e., the goal is to have an image with approximately the same number of pixels for all luminances. In other words, we want to transform the histogram so that it becomes approximately uniform. Images with an equalized histogram have good contrast, and this is the main reason for performing this operation.
Histogram Equalization
[Figure: original histogram H(P) and the flat histogram of the equalized image.]
This can be considered as the following problem: there is a random variable with probability density function f_x(x) (it can be estimated using the histogram of the original image). We are looking for a transform y = g(x) producing the probability density function f_y(y); in this case f_y(y) is proportional to the equalized (flat) histogram.
Histogram equalization
From probability theory follows:
f_y(y) = f_x(x1) / |g'(x)|  evaluated at x = x1
Histogram Equalization
Since f_y(y) is constant, |g'(x1)| is proportional to f_x(x1). Assuming that g(x) is a monotonically increasing function, it follows that g'(x) = c·f_x(x), where c is a constant.
Select c = 1 (this means that the output image has the same luminance domain as the input one):

g(x) = ∫ f_x(x1) dx1,  integrated from the lowest luminance up to x
Histogram Equalization
Since the image is not a continuous but a discrete function, we do not have a continuous probability density function but its discrete version (the histogram). The MATLAB realization is quite simple:
I=imread('pout.tif');
a=imhist(I);
g=cumsum(a)/sum(a);
J=uint8(255*g(double(I)+1));   % +1 because MATLAB indices start at 1
[Figure: original and output images with the corresponding histograms and the mapping function g. The obtained density is not exactly uniform due to the discrete nature of images.]
Histogram matching
Histogram equalization is an operation that produces a uniform probability density function. Similarly, the histogram can be matched to any desired probability density function. The procedure is the same as in the equalization case for any monotone function g(x) (increasing or decreasing). Otherwise we need a more complicated operation involving segmentation of g(x) into monotone regions.
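A hedged MATLAB sketch of histogram matching using the built-in histeq with a desired target histogram; the triangular target shape is an arbitrary example:

I = imread('pout.tif');
hgram = [1:128, 128:-1:1];        % desired (triangular) histogram shape with 256 bins
J = histeq(I, hgram);             % match the image histogram to hgram
imshow(J), figure, imhist(J)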
Applications of Histogram
All methods that use an estimate of the probability density function are histogram based. Applications include: improvement of image contrast (equalization), histogram matching, and modification of colors. The histogram can also be applied locally to image parts. For example, for a very bright object on a dark background we can perform histogram-based operations on a selected region (the object or the background, depending on the task). Also, the histogram can be calculated for parts of an image or for individual channels of color images.
Image Negative
There are numerous operations that can be applied to the image luminance. Some of them are applied to each pixel independently of the other pixels; these are called point operations. One of the simplest is the computation of the image negative (or the positive, if we have the negative). This operation is performed differently depending on the image format.
Image negative
Grayscale: Negative(n,m) = 2^k − 1 − Original(n,m), where k is the number of bits per pixel.
RGB: the same operation is applied to each of the three channels.
Color Clipping
Color clipping is an operation performed on colors; it is not related to geometrical clipping. Some image colors are kept as in the original image, while other colors are limited to selected limits:
b(i,j) = c_max,    if a(i,j) > c_max
b(i,j) = a(i,j),   if c_min ≤ a(i,j) ≤ c_max
b(i,j) = c_min,    if a(i,j) < c_min
Brightening (Darkening)
There are several methods to perform these operations. For example, f(n,m) = g(n,m) + r increases the luminance for r > 0, while for r < 0 the image becomes darker. The second technique is f(n,m) = g(n,m)·r: brightening for r > 1, darkening for 0 < r < 1. These techniques are not of high quality since they have several drawbacks. The most common procedure is the power-law (gamma) correction:
f(n,m) = (2^k − 1)·[g(n,m)/(2^k − 1)]^γ
where k is the number of bits per pixel; γ > 1 gives darkening and γ < 1 gives brightening.
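A minimal MATLAB sketch of this correction, assuming an 8-bit image (k = 8); the gamma value is an arbitrary example:

g = double(imread('pout.tif'));       % 8-bit grayscale image
gamma = 0.5;                          % gamma < 1: brightening; gamma > 1: darkening
f = 255 * (g / 255).^gamma;
imshow(uint8(f))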
Luminance correction
Histogram modification function:
[Figure: histogram modification curves for brightening and darkening; both input and output luminance are normalized to [0,1].]
This way of representing point operations is accepted by almost all image processing software (for example Photoshop).
Geometrical transforms
In the case of a geometrical transform, a pixel from position (x,y) moves to position (x1,y1) in the target image. Here we are in fact considering a transformation of coordinates in the digital image that can be written as:

X1 = [x1; y1] = g(X) = g([x; y])
The simplest transform is translation where entire image is moved with a given vector (x0,y0).
Translation
X1 = X + [x0; y0] = [x + x0; y + y0]
In this case we keep the dimensions of the target image the same as those of the original image. In the region appearing due to the translation we put white or some other default color. This strategy is called cropping. An alternative strategy (when we want to change the image dimensions) is to enlarge the image so that the entire original image is kept in the target image.
g(x,y) = f(x − x0, y − y0)

[Figure: original image f(x,y) translated by (x0, y0) to obtain g(x,y).]
Also, it is possible to have a cyclical translation, where the parts that are shifted out of the image are cyclically moved to its beginning.
Cropping
Cropping is an operation where a part of the original image is used as a new image. Of course, this image has smaller dimensions than the original one. For example, let the original image f(x,y) have dimensions (M,N) and let us crop the region between (M1,N1) and (M2,N2), where 0<M1<M2<M and 0<N1<N2<N:

g(x − M1 + 1, y − N1 + 1) = f(x,y),  for M1 ≤ x ≤ M2, N1 ≤ y ≤ N2
Rotation
The obtained image is given as: g(x,y) = f(x·cosθ + y·sinθ, −x·sinθ + y·cosθ). We assumed that the coordinate transform is performed around the origin; this is a rare situation in digital images. Develop the coordinate transform for rotation around a pixel (x0,y0). The positive direction of rotation is counter-clockwise (the negative direction is clockwise).
Distortion
[Figure: distortion (shear) mapping a point (x,y) to (x',y').]
Scaling
Coordinate scaling can be described as:
X' = [a 0; 0 b]·X
Determine the function of the output image based on the input image. Determine the dimensions of the output image as a function of a and b. For which parameter values is the image enlarged? This is scaling along the x and y axes. Is it possible to define scaling along other directions? Could reflection with respect to the coordinate axes or the origin be described using scaling?
Nonlinear transforms
There are numerous non-linear transforms used in digital image processing; their number is significantly greater than that of the linear ones. Here we give a simple example: g(x,y) = f(x + A·sin(by), y + A·sin(bx)). An important example is the fish-eye nonlinearity.
Fish-eye transform
The fish-eye effect is caused by the shape and the limited (relatively small) dimensions of the camera lens. It causes objects in the middle of the scene to appear larger than objects at the borders of the scene. Sometimes this effect is desired in photography, and photographers simulate it or produce it using a special form of lenses. Try to simulate the fish-eye transform and to propose a method for removing this effect.
Nearest neighbor
The nearest-neighbor technique is the simplest interpolation strategy. For a pixel on the grid we take the value of the nearest pixel of the interpolated image. This technique suffers from low quality.
[Figure: original rectangle and the result after rotation by 5 degrees with this interpolation technique.] The human eye is very sensitive to the broken edges and disturbed small details caused by this form of interpolation.
Bilinear Interpolation
The bilinear interpolation strategy gives slightly better image quality than the nearest neighbor but is a little slower. However, the computational burden of this strategy is still reasonable. Let a pixel of the original image be surrounded by four pixels of the transformed image.
[Figure: the pixel whose luminance g(x,y) we want to determine, surrounded by four transformed pixels at relative distances x, 1−x, y, 1−y; we assume that the dimensions of the square in which the interpolation is performed are not changed.]
Bilinear interpolation
[Figure: the four neighboring pixels f(m,n), f(m+1,n), f(m,n+1), f(m+1,n+1) and the interpolated point f(m+x,n+y). For simpler determination we will rotate the coordinate system.]
Bilinear interpolation determines the luminance in the point (m+x, n+y) as: f(m+x, n+y) = a·xy + b·x + c·y + d, where the constants a, b, c and d should be determined.
Bilinear interpolation
The constants can be determined from the following conditions:
f(m,n)     = a·0·0 + b·0 + c·0 + d  →  d = f(m,n)
f(m+1,n)   = a·1·0 + b·1 + c·0 + d  →  b = f(m+1,n) − f(m,n)
f(m,n+1)   = a·0·1 + b·0 + c·1 + d  →  c = f(m,n+1) − f(m,n)
f(m+1,n+1) = a + b + c + d          →  a = f(m+1,n+1) + f(m,n) − f(m+1,n) − f(m,n+1)
Consider the following case. We are not performing a geometrical transform, but we want to change the number of pixels in the image (for example, instead of NxM we want kNxlM, where k and l are integers, k>1 and l>1). Determine the relationship that connects the original and target images with bilinear interpolation (a sketch is given below). This operation is called image resize.
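A hedged MATLAB sketch of image resize by an integer factor using the bilinear formula above; this is only a minimal illustration with simplified boundary handling, and the sample file name is an assumption:

f = double(imread('cameraman.tif'));    % original image, size N x M
k = 2;                                  % integer enlargement factor in both directions
[N, M] = size(f);
g = zeros(k*N, k*M);
for np = 1:k*N
    for mp = 1:k*M
        n = min(floor((np-1)/k) + 1, N-1);             % surrounding original pixel (simplified)
        m = min(floor((mp-1)/k) + 1, M-1);
        x = (np-1)/k - (n-1);  y = (mp-1)/k - (m-1);   % fractional offsets
        a = f(n+1,m+1) + f(n,m) - f(n+1,m) - f(n,m+1);
        b = f(n+1,m) - f(n,m);  c = f(n,m+1) - f(n,m);  d = f(n,m);
        g(np,mp) = a*x*y + b*x + c*y + d;              % bilinear formula
    end
end
imshow(uint8(g))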
b = zeros(M, N);   % target image
% we assume that the target image has the same dimensions as the source image
% we will perform cropping of the remaining parts
for xp = 1 : M
    for yp = 1 : N
        % determination of the origin of the pixel mapped to (xp,yp), i.e.,
        % where it lies in the original image (inverse transform)
        x = (xp - x0) * cos(theta) - (yp - y0) * sin(theta) + x0;
        y = (xp - x0) * sin(theta) + (yp - y0) * cos(theta) + y0;
        % nearest-neighbor assignment with cropping of pixels falling outside
        % the original image a (this completion of the fragment is an assumption)
        xr = round(x); yr = round(y);
        if xr >= 1 && xr <= M && yr >= 1 && yr <= N
            b(xp, yp) = a(xr, yr);
        end
    end
end
Write a program for distortion. Write a program for rotation with nearest-neighbor interpolation. Rotate an image by 5 degrees using the nearest neighbor two times and then perform rotation by -5 degrees two times. Perform the same operations with bilinear interpolation and compare the results.
[Figure: point c surrounded by pixels p1, p2, p3 and p4.]

f(c) = [f(p1)/ρ1 + f(p2)/ρ2 + f(p3)/ρ3 + f(p4)/ρ4] / [1/ρ1 + 1/ρ2 + 1/ρ3 + 1/ρ4]

where ρi is the distance between pi and c, and f(·) is the luminance of the corresponding pixel.
For self-exercise
Write your own program for evaluation of the image histogram. Write a program for histogram adjustment where the upper and lower bounds are set so that the 5% darkest and 5% brightest pixels are rejected; pixels outside this range should be set to the maximal, i.e., minimal luminance. How can the negative of an image stored using a colormap be determined? How is the image negative calculated for color models different from RGB? Write programs for calculation of the image negative. Create a target image, based on the original image, over a hexagonally shaped neighborhood of size 2-4-2, where the pixels of the destination image are equal to the mean of the corresponding pixels of the original image. Realize your own functions for all variants of translation.
For self-exercise
Write a coordinate transform where rotation is performed by an arbitrary angle. Write a coordinate transform that performs distortion parallel to an arbitrary line y=ax+b. Determine the functional relationship between the output and input images for all transforms defined within the lectures. Is the original image enlarged or shrunk with respect to a and b in the case of scaling? Can scaling be defined along directions other than the x and y axes? Can reflection with respect to the coordinate axes and the origin be described using scaling? Realize all introduced linear geometric transforms. Realize the coordinate transform g(x,y) = f(x + A·sin(by), y + A·sin(bx)). Perform experiments with A and b.
For self-exercise
Create a program for image resize based on bilinear interpolation. This program should be able to handle non-integer values of the scales k and l, as well as the possibility that k and l are smaller than 1. Write a program for distortion. Write a program for rotation with nearest-neighbor interpolation. Perform rotation by 5 degrees twice using the nearest neighbor and after that by -5 degrees twice. Repeat these operations with bilinear interpolation and compare the results. Check whether the bilinear interpolation used for transformation from a polar to a rectangular raster is the same as the standard bilinear interpolation introduced previously.
Project
Write a program that allows users to select colors and adjust them interactively, including the usage of the curves presented on slide 18. When the user defines several different points, the curve should be interpolated using Lagrange interpolation. Write a program that performs the fish-eye transform, as well as a program able to return an image distorted with fish-eye to its normal (or close to normal) shape.
Project
Review the Lagrange interpolation formula and use it for polynomial interpolation on the grid. Find Internet resources related to interpolation and write a seminar paper about your findings.
FT of multidimensional signals
Images are 2D signals. The Fourier transform of 2D continuous time signal x(t1,t2) is:
X(ω1,ω2) = ∫∫ x(t1,t2)·exp(−j(ω1·t1 + ω2·t2)) dt1 dt2

x(t1,t2) = 1/(2π)^2 · ∫∫ X(ω1,ω2)·exp(j(ω1·t1 + ω2·t2)) dω1 dω2
FT of multidimensional signals
Signal and its FT represent the Fourier transform pair. This pair can be written in compact form by introducing vectors (allowing larger number of coordinates than 2):
X(ω) = ∫ x(t)·exp(−j ω·t) dt

where t = (t1, ..., tQ), dt = dt1 dt2 ... dtQ, the integration goes over all coordinates t1, ..., tQ from −∞ to ∞, and ω·t = ω1·t1 + ... + ωQ·tQ.
FT of multidimensional signals
Inverse multidimensional Fourier transform:
x(t) = 1/(2π)^Q · ∫ X(ω)·exp(j ω·t) dω
We will consider only 2D signals. Since we are dealing with discretized signals, we will not consider in detail the FT of continuous-time signals. Before we proceed with the story about discretized signals, we give several general comments about the FT.
Fourier transform
The FT establishes a one-to-one mapping with the signal. Roughly speaking, the signal in the time domain and in the spectral domain (its Fourier transform) are different representations of the same signal. In addition, the FT and its inverse have quite similar definitions (they differ in a constant and in the sign of the complex exponential). Why do we in fact use the FT?
2D FT of discrete signals
Here we will consider discrete signals. We will therefore skip the properties of the 2D FT of continuous-time signals, but I propose that you check them in the textbook. Discrete-time signals are obtained from their continuous-time counterparts by sampling. Sampling of 1D signals is simple: we can take equidistantly separated samples:
x(n) = c·xa(nT)   (discrete-time signal)
2D FT of discrete signals
In order to reconstruct the continuous-time signal from its discrete-time counterpart, the sampling theorem should be satisfied. This theorem is satisfied if the sampling rate satisfies T ≤ 1/(2·fm), where fm is the maximal signal frequency. If the sampling theorem is not satisfied, we make a smaller or bigger error. How should sampling be performed in the case of digital images?
Sampling of 2D signals
The simplest sampling of digital images is: x(n,m) = c·xa(nT1, mT2). The constant c is commonly selected as c = T1·T2. The sampling rate is usually equal for both coordinates, T1 = T2. The sampling theorem is satisfied when T1 ≤ 1/(2·fm1) and T2 ≤ 1/(2·fm2). Here, fm1 and fm2 are the maximal frequencies along the corresponding coordinates (note that the 2D FT of a continuous signal has two frequency coordinates ω1 and ω2; then fmi = ωmi/(2π)).
Sampling of 2D signals
The previously described rectangular sampling is not the only sampling scheme. In rectangular sampling we replace each rectangle of the image with a single sample:
The entire rectangle is replaced with a single sample. This is a practical sampling scheme, but alternative sampling patterns can also be applied.
2D signal samplings
Some alternative sampling schemes are given below:
It can also be a rhombus, but the hexagon is the best pattern with respect to some well-established criteria.
However, we will continue with rectangular sampling due to its simplicity and for practical reasons!
Quantization
The discretized signal is not used directly; it is quantized. Instead of the exact values, we take values rounded (or truncated) to the closest value from the set of possible values (quanta). The error caused by rounding is smaller than the error caused by truncation, but truncation is used more often in practice.
Quantization
Quantization can be performed with the possible values equidistantly separated, but some systems and sensors use non-uniform quantization. The number of quantization levels is commonly 2^k, and these quantization levels are commonly represented as integers in the domain [0, 2^k−1]. We will almost always assume that the signals are discretized and quantized (such signals are called digital).
2D FT of discrete signals
Fourier transform pair between discrete-time signal and corresponding FT can be represented using the following relationships:
X(ω1,ω2) = Σ_{n=−∞}^{∞} Σ_{m=−∞}^{∞} x(n,m)·exp(−j(ω1·n + ω2·m))

x(n,m) = 1/(2π)^2 · ∫_{−π}^{π} ∫_{−π}^{π} X(ω1,ω2)·exp(j(ω1·n + ω2·m)) dω1 dω2
The 2D FT of a discrete-time signal is a function of continuous frequency variables, and in this form it is not suitable for computation on computers. The 2D FT is periodic with period 2π along both coordinates.
2D DFT
We will not explain in detail the properties of the 2D FT of discrete-time signals since we will not use them further. Our goal is to have a discretized transform that is suitable for processing on computers. To achieve this we use the periodic extension property. Namely, assume that the signal x(n,m) is defined within a limited domain (this is always the case for digital images). Let the size of the signal (image) be NxM.
2D DFT
Perform a periodic extension of the original signal with period N1xM1 (it should satisfy N1 ≥ N and M1 ≥ M, but here for brevity we assume N1 = N and M1 = M):

xp(n,m) = Σ_{r=−∞}^{∞} Σ_{p=−∞}^{∞} x(n + rN, m + pM)
2D DFT
The FT of the periodically extended signal is:

Xp(ω1,ω2) = Σ_{n} Σ_{m} [ Σ_{r} Σ_{p} x(n + rN, m + pM) ]·exp(−j(ω1·n + ω2·m))

Here we exchanged the order of the sums and used the property of the FT of a translated (shifted) signal, possibly neglecting some multiplicative constants. The summation over an infinite number of terms produces analog Dirac pulses (generalized functions) in the frequency domain.
2D DFT
Finally we obtain:
Xp(ω1,ω2) is nonzero only at the frequencies ω1 = 2πk1/N, ω2 = 2πk2/M, where it is determined by X(2πk1/N, 2πk2/M).

Thus, the FT of the periodically extended signal is given by the samples of the 2D FT taken on the discrete grid k1 ∈ [0,N) and k2 ∈ [0,M). The periodic extension produces the discretized FT (the DFT). The periodic extension is commonly not performed in practice, due to the infinite number of terms in the sums and the usage of generalized functions. However, we should keep in mind that it is assumed, and that the smallest period for the extension is equal to the image dimensions NxM.
2D DFT
2D discrete signal and 2D DFT are transformation pair
X(k1,k2) = Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} x(n,m)·exp(−j2π(k1·n/N + k2·m/M))

It is the FT of the discrete signal calculated over a limited interval and for discretized frequencies.
x(n,m) = 1/(NM) · Σ_{k1=0}^{N−1} Σ_{k2=0}^{M−1} X(k1,k2)·exp(+j2π(k1·n/N + k2·m/M))
Important fact: the inverse DFT can be calculated in almost the same way as the direct one, using sums. The differences are very small (the sign of the complex exponential and the normalization constant 1/NM).
The domain of the 2D DFT is the discrete set of points (k1,k2) ∈ [0,N)x[0,M). We have to determine the relationship between the frequencies in the two domains! The relationships ω1 = 2πk1/N and ω2 = 2πk2/M can be satisfied only for 0 ≤ k1 < N/2 and 0 ≤ k2 < M/2, while for larger k1 and k2 these relationships would produce frequencies outside the ω1, ω2 domain (they correspond to negative frequencies).
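A short MATLAB illustration: fft2 computes exactly this 2D DFT, and fftshift moves the (0,0) frequency to the center of the display (the sample file name is an assumption):

x = double(imread('cameraman.tif'));
X = fft2(x);                                          % 2D DFT of size N x M
imagesc(log(1 + abs(fftshift(X)))), colormap(gray), axis image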
These systems can be for various purposes but we will assume their application for image filtering and denoising
System is linear when linear combination of inputs produces linear combination of outputs: If x(n,m) is input and T{x(n,m)} is transformation of input produced by the LSIS then it holds: T{ax1(n,m)+bx2(n,m)}=aT{x1(n,m)}+bT{x2(n,m)}
y(n,m) = Σ_{n1=0}^{N1−1} Σ_{m1=0}^{M1−1} h(n1,m1)·x(n − n1, m − m1)
y(n,m) = 1/((N+N1)(M+M1)) · Σ_{k1=0}^{N+N1−1} Σ_{k2=0}^{M+M1−1} X'(k1,k2)·H'(k1,k2)·exp(+j2π(k1·n/(N+N1) + k2·m/(M+M1)))

where X'(k1,k2) and H'(k1,k2) are the DFTs of the zero-padded signals of size (N+N1)x(M+M1).
There are cases when calculating the convolution using three 2D DFTs (two direct and one inverse) is faster than direct computation. This is possible when fast algorithms are used for evaluating the 2D DFTs.
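A hedged MATLAB sketch of computing the linear convolution through DFTs of zero-padded signals; the image file and the averaging impulse response are assumed examples:

x = double(imread('cameraman.tif'));          % image of size N x M
h = ones(9) / 81;                             % small averaging impulse response (example)
[N, M]   = size(x);
[N1, M1] = size(h);
P1 = N + N1 - 1;  P2 = M + M1 - 1;            % sizes that avoid circular overlap
Y = fft2(x, P1, P2) .* fft2(h, P1, P2);       % product of DFTs of zero-padded signals
y = real(ifft2(Y));                           % equals the linear convolution conv2(x, h)
max(abs(y(:) - reshape(conv2(x, h), [], 1)))  % should be a tiny numerical error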
Since digital images have many more samples than 1D signals, these algorithms are even more important than in the 1D case. We can freely claim that modern digital image processing would not have developed without FFT algorithms (the FFT is the same transform as the DFT; the name only indicates a fast evaluation algorithm!).
Direct evaluation for a single frequency requires NxM complex multiplications (a complex multiplication is 4 real multiplications and 2 real additions) and NxM complex additions (2 real additions each). These operations are repeated for each (k1,k2), i.e., NxM times. The computational complexity is therefore of the order of: N^2·M^2 complex additions (2N^2·M^2 real) and N^2·M^2 complex multiplications (4N^2·M^2 real multiplications + 2N^2·M^2 real additions). For example, N=M=1024 on a PC that can perform 1x10^9 operations per second requires 8N^2·M^2 > 8x10^12 real operations, i.e., about 8x10^3 sec, which is more than 2 h.
X(k1,k2) = Σ_{n=0}^{N−1} X(n,k2)·exp(−j2π·k1·n/N)

where X(n,k2) denotes the 1D DFT computed along the second coordinate in the first step. The second sum represents the FT of the result obtained in the first step. All applied DFTs are 1D and there are N+M of them in total. All 1D DFTs can be realized using fast algorithms.
Step-by-Step algorithm
For simplicity assume that N=M. For each of the FFTs over rows and columns we need N·log2(N) complex additions and multiplications, and for the entire frequency-frequency plane 2N·N·log2(N) complex additions and multiplications are required. This is equal to 8N^2·log2(N) real additions and multiplications, i.e., 16N^2·log2(N) operations. For N=M=1024 the required number of operations is about 160x10^6, which corresponds to 0.16 sec on the considered machine.
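A one-line MATLAB check of the step-by-step idea (1D FFTs along columns, then along rows); the sample image is an assumption:

x = double(imread('cameraman.tif'));
X1 = fft(fft(x).').';                         % 1D FFT of the columns, then of the rows
max(abs(X1(:) - reshape(fft2(x), [], 1)))     % agrees with fft2 up to rounding errors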
Step-by-step algorithm
For 1D signals two FFT algorithms are in use: decimation in time and decimation in frequency. The complexity of both algorithms is similar. The step-by-step algorithm is not optimal for 2D signals, but it is quite popular due to its simplicity. Earlier PCs and today's mobile devices have problems with the memory demands of the 2D FFT algorithms, since some of these machines still have only moderate amounts of memory. In the step-by-step algorithm we need three matrices for:
- the original image,
- the FT of the columns or rows (a complex matrix, stored in memory as two real-valued matrices),
- the 2D DFT (again a complex-valued matrix).
For a real-valued signal it holds that x*(n,m) = x(n,m), and therefore:

X*(k1,k2) = Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} x(n,m)·exp(+j2π(k1·n/N + k2·m/M))
          = Σ_{n=0}^{N−1} Σ_{m=0}^{M−1} x(n,m)·exp(−j2π((N−k1)·n/N + (M−k2)·m/M))
          = X(N−k1, M−k2)

You can find in the textbook how this relationship can be used to save memory space!
We assume N=M and denote WN = exp(−j2π/N). The 2D DFT

X(k1,k2) = Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} x(n,m)·WN^(n·k1)·WN^(m·k2)

can be split into 4 subsums over even and odd indices:
X(k1,k2) = Σ_{n even, m even} x(n,m)·WN^(n·k1)·WN^(m·k2)
         + Σ_{n even, m odd} x(n,m)·WN^(n·k1)·WN^(m·k2)
         + Σ_{n odd, m even} x(n,m)·WN^(n·k1)·WN^(m·k2)
         + Σ_{n odd, m odd} x(n,m)·WN^(n·k1)·WN^(m·k2)
where

S00(k1,k2) = Σ_{m1=0}^{N/2−1} Σ_{m2=0}^{N/2−1} x(2m1, 2m2)·WN^(2m1·k1 + 2m2·k2)
S01(k1,k2) = Σ_{m1=0}^{N/2−1} Σ_{m2=0}^{N/2−1} x(2m1, 2m2+1)·WN^(2m1·k1 + 2m2·k2)
S10(k1,k2) = Σ_{m1=0}^{N/2−1} Σ_{m2=0}^{N/2−1} x(2m1+1, 2m2)·WN^(2m1·k1 + 2m2·k2)
S11(k1,k2) = Σ_{m1=0}^{N/2−1} Σ_{m2=0}^{N/2−1} x(2m1+1, 2m2+1)·WN^(2m1·k1 + 2m2·k2)
The subsums are periodic with period N/2 along both coordinates, so for 0 ≤ k1 ≤ N/2−1 and 0 ≤ k2 ≤ N/2−1:

X(k1, k2)         = S00 + WN^k2·S01 + WN^k1·S10 + WN^(k1+k2)·S11
X(k1+N/2, k2)     = S00 + WN^k2·S01 − WN^k1·S10 − WN^(k1+k2)·S11
X(k1, k2+N/2)     = S00 − WN^k2·S01 + WN^k1·S10 − WN^(k1+k2)·S11
X(k1+N/2, k2+N/2) = S00 − WN^k2·S01 − WN^k1·S10 + WN^(k1+k2)·S11

[Figure: butterfly diagram combining S00, S01, S10 and S11 into X(k1,k2), X(k1+N/2,k2), X(k1,k2+N/2) and X(k1+N/2,k2+N/2) using the twiddle factors WN^k1, WN^k2, WN^(k1+k2) and signs ±1.]
The decomposition can be performed in the next stage on each Sij(k1,k2) block.

[Figure: full decomposition flow graph for an image with 4x4 pixels; the input samples are ordered x(0,0), x(0,2), x(2,0), x(2,2), x(0,1), x(0,3), x(2,1), x(2,3), x(1,0), x(1,2), x(3,0), x(3,2), x(1,1), x(1,3), x(3,1), x(3,3), and the butterfly coefficients are ±1 and ±j.]
Features of 2D DFT
The 2D DFT values around the origin (corresponding to frequency (0,0)) are shown as white, and they are up to 10^10 times larger than the values at the dark positions.
Features of 2D DFT
In the considered image Lena of dimension 256x256 pixels, fewer than 10 samples of the 2D DFT contain more than 99% of the energy. Can we then store the image with just 10 2D DFT samples (compared to 256x256 pixels)? The answer is NO, NO and NO! Namely, the main part of the energy is related to the image luminance, while the image details, which correspond to features very important for human vision, are at higher frequencies. This is a very important feature of the human eye: components of small energy at higher frequencies carry the main part of the information that humans receive. High-frequency components have small energy and they are subject to noise influence.
For self-exercise
Determine the properties of the 2D DFT of real-valued signals. Prove the properties of the 2D FT of continuous-time signals given in the textbook. Do these properties hold for the 2D FT of discrete signals and for the 2D DFT? Assume that a 2D signal is discretized in some arbitrary manner (diamond or hexagonal instead of rectangular sampling); reconstruct the original signal based on these samples. Consider the convolution of 2D signals. Can the evaluation of the convolution be made more efficient using the 2D DFT? We demonstrated within the slides one 2D FFT algorithm based on decimation of the signal into 4 subsignals. Is this decimation in time or decimation in frequency?
For self-exercise
Realize the 2D FFT using the alternative decimation algorithm. Is it possible to combine decimations, for example decimation in frequency along rows and decimation in time along columns? If it is possible, perform this decimation for a 4x4 image and present the full decomposition; if it is not possible, explain the reasons. Interpolate an image using zero-padding of the 2D DFT of the original image! Solve the problems given in the textbook at the end of the corresponding chapter.
Radon transform
The Radon transform was developed at the beginning of the XX century. The aim was to reconstruct the interior of objects by using projections made along different angles. The entire scientific area called computer tomography is based on this transform. In practice, this transform uses signals that can penetrate through the object (X-rays or some other signals), and we record the attenuation of these signals on their way through the object. The first application of this transform was imaging of the Sun's interior based on recordings from the Earth (the Sun was the source of light, i.e., of the projections, in this experiment).
Radon transform
Medicine is the main consumer of the Radon transform, but it is also used in other fields such as astronomy. Recently it has been used intensively in geology: the subsurface of the Earth is searched for oil and other mineral resources, using sound signals for these recordings.
[Figure: seismic recording below the Earth's surface.]
Radon transform
Assume that we have a signal (wave, ray) that can penetrate through objects. This signal attenuates locally in the point (x,y) with some attenuation function f(x,y). This function can tell us important information about the material through which the ray is penetrating (ultrasound is able to propagate through liquid materials, X-rays attenuate significantly on bones, etc.). Therefore, our goal is visualization of the attenuation function f(x,y) for each point of the object. However, we know only the total attenuation of the beam passing through the object (accumulated along the considered path), and based on this information we want to reconstruct f(x,y).
Radon transform
Consider the attenuation along a direction s. A and B are the points where the beam enters and exits the object. The total attenuation is:

∫_{AB} f(x,y) ds

Under relatively mild assumptions the beam propagates linearly inside the object, and s can be parameterized as:

x·cosθ + y·sinθ = t

θ is the angle with respect to the considered coordinate system, and t is the parameter selecting one line from all possible parallel lines with the same θ. All lines can be parameterized in this manner.
Attenuation function
Now we can write the attenuation as a function determined by the angle θ and the parameter t:

Pθ(t) = ∫∫ f(x,y)·δ(x·cosθ + y·sinθ − t) dx dy

This relationship holds since (x,y) along s satisfies the relationship from the previous slide.
Usage of the 2D FT
[Figure: beams transmitted toward the object at angle θ1 for various t, and at angle θ2 for various t.]

Our goal is to reconstruct f(x,y) based on the known Pθ(t) for various angles θ and various t. Consider the 2D FT of f(x,y):

F(u,v) = ∫∫ f(x,y)·exp(−j(u·x + v·y)) dx dy

Introduce the FT of a projection:

Sθ(ω) = ∫ Pθ(t)·exp(−jωt) dt

Substituting Pθ(t) = ∫∫ f(x,y)·δ(x·cosθ + y·sinθ − t) dx dy into this expression, we get:

Sθ(ω) = F(ω·cosθ, ω·sinθ)

Our problem can be solved in 4 steps: 1. Calculating the projections. 2. Determination of the FT of the projections. 3. Determination of the 2D FT of f(x,y) from these slices. 4. Evaluation of f(x,y) using the 2D inverse FT.
Since the image is discrete, the rays do not pass exactly through the pixels.
This procedure should be performed for all angles. Note that this is not the most efficient technique for calculating the Radon transform, but it is simple to understand, which is why it is quite common in practice.
Example. Let the image contain a single line:

f(x,y) = δ(y − ax − b)

Then the projection is given as:

Pθ(t) = ∫∫ δ(y − ax − b)·δ(x·cosθ + y·sinθ − t) dx dy

which has a strong peak at the point of the (θ,t) plane corresponding to the line y = ax + b.
The image is a square of dimension 50x50 (check it with imshow(Z)). The second argument of the radon function is the set of angles for which the transform is evaluated. The image shown with the pcolor function has 4 peaks corresponding to the 4 edges of the square. The inverse transform does not reconstruct the image ideally. Why?
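A hedged MATLAB sketch reproducing the described experiment; the exact definition of the square image Z and the angle set are assumptions consistent with the text:

Z = zeros(50);  Z(15:35, 15:35) = 1;     % 50x50 image containing a square
theta = 0:179;                           % projection angles in degrees
[P, t] = radon(Z, theta);                % Radon transform: projections P_theta(t)
figure, pcolor(theta, t, P), shading interp
Zr = iradon(P, theta);                   % inverse Radon transform (filtered backprojection)
figure, imshow(Zr, [])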
DFT - Conclusions
The Fourier transform, with its variants (the DFT and the FT of discrete-time signals), should be a quite clear concept for an engineer. The FT decomposes the signal into a series expansion of sinusoidal functions. Coefficients at low frequencies correspond to sinusoids with larger periods (slowly varying components), while sinusoids at higher frequencies correspond to rapidly varying signal components. For flat (slowly varying) images the FT is concentrated around the origin, while for fast-varying (textured) images the FT has components at high frequencies.
DFT - Conclusion
The DFT has two serious problems in digital image processing:
The DFT of a real image gives a complex signal. This means that the memory requirements are double those for real-valued signals. However, there are techniques to reduce this by using properties of the DFT of real signals. How? Even so, this does not solve the problem of handling the data. The second problem is more important. Namely, when an image is corrupted by noise, we want to remove those samples of the DFT that are significantly corrupted by the noise. Removing these components in the DFT domain produces undesired oscillatory effects in the filtered image (caused by the so-called Gibbs phenomenon). Sometimes it is better to keep the noisy image than the filtered one, since these artifacts can be very annoying. The effect can be reduced by smoothing (rather than truncating) the DFT coefficients, but this introduces other drawbacks. This is the reason why alternative transforms have been developed for image filtering and image compression.
DCT
The discrete cosine transform (DCT) of a signal x(n), n = 0, 1, ..., N-1, is defined as:

C(k) = \sum_{n=0}^{N-1} x(n)\cos\frac{(2n+1)\pi k}{2N}, \qquad k = 0, 1, \ldots, N-1
Inverse DCT
Inverse DCT is defined as:
x(n) = \frac{1}{N}\,C(0) + \frac{2}{N}\sum_{k=1}^{N-1} C(k)\cos\frac{(2n+1)\pi k}{2N}
For self-exercise, try to prove that the DCT and the inverse DCT are mutually inverse; a quick numerical check is sketched below.
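The proof is left to you, but a quick numerical check of the pair defined above can be done in MATLAB (a minimal sketch using the unnormalized definition from these slides, not the built-in dct):

N = 8;  x = randn(N, 1);
n = (0:N-1).';  k = 0:N-1;
A = cos(pi * (2*n + 1) .* k / (2*N));    % A(n+1, k+1) = cos((2n+1)*pi*k/(2N))
C = A.' * x;                             % forward DCT from the slide
w = [1/2; ones(N-1, 1)];                 % weight 1/2 for k = 0, 1 otherwise
xr = (2/N) * A * (w .* C);               % inverse DCT from the slide
disp(norm(x - xr))                       % should be close to zero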
Fast DCT
Since the DCT is very useful in digital image processing, it is important to have fast algorithms for its evaluation. There are several approaches to this problem:
One is to use the property:

\cos\frac{(2n+1)\pi k}{2N} = \frac{\exp\left(j\frac{(2n+1)\pi k}{2N}\right) + \exp\left(-j\frac{(2n+1)\pi k}{2N}\right)}{2}

and, using several simple relationships, to reduce the evaluation of the 1D DCT to a fast evaluation of the 1D DFT. Try this for homework!!!
Fast DCT
The second technique for fast DCT evaluation is based on a specific methodology for signal extension; check this methodology as described in the textbook. Again, using this methodology, the fast DCT can be reduced to a fast DFT evaluation (a sketch of this extension-based approach is given below). Finally, it is possible to decompose a DCT into two DCTs with N/2 samples. Do it for homework.
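As an illustration of the extension-based approach, here is a minimal sketch (an assumed variant, not necessarily the textbook's exact procedure): the signal is extended symmetrically to length 2N, a 2N-point FFT is computed, and the unnormalized DCT coefficients defined above are recovered from the first N FFT bins.

function C = dct_via_fft(x)
% Unnormalized DCT (as defined on the previous slides) via a 2N-point FFT
% of the symmetrically extended signal y = [x; flip(x)].
x = x(:);
N = numel(x);
Y = fft([x; flipud(x)]);                     % 2N-point DFT of the extended signal
k = (0:N-1).';
C = real(0.5 * exp(-1j * pi * k / (2*N)) .* Y(1:N));
end

It can be verified against the direct matrix evaluation from the previous sketch, or against MATLAB's dct after compensating for the different scaling.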
The dct function is used in MATLAB for 1D DCT evaluation, while idct computes its inverse. Note that these built-in functions use the orthonormal definition, which differs from the one above only in the scaling of the coefficients.
2D DCT
There is no unique form of the 2D DCT. The simplest realization technique is calculation of the 1D DCT along the columns and then along the rows of the newly obtained matrix. However, there are alternative techniques for direct evaluation of the 2D DCT. Again, there are several definitions of the 2D DCT that can be used in practice, but here we adopt:
C(k_1,k_2) = \sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1} x(n_1,n_2)\cos\frac{(2n_1+1)\pi k_1}{2N_1}\cos\frac{(2n_2+1)\pi k_2}{2N_2}
Inverse 2D DCT
The inverse 2D DCT (for our 2D DCT form) is:
x(n_1,n_2) = \frac{4}{N_1 N_2}\sum_{k_1=0}^{N_1-1}\sum_{k_2=0}^{N_2-1} w(k_1)\,w(k_2)\,C(k_1,k_2)\cos\frac{(2n_1+1)\pi k_1}{2N_1}\cos\frac{(2n_2+1)\pi k_2}{2N_2}

where

w(k_i) = \begin{cases}1/2, & k_i = 0\\ 1, & 1 \le k_i \le N_i - 1\end{cases}
For homework, prove that the 2D DCT and its inverse defined on this slide form a transformation pair, i.e., that they are mutually inverse. If this is not satisfied, propose a modification of one or both of them!!!
Fast 2D DCT
The same three methodologies used for the 1D DCT realization can be applied to the fast realization of the 2D DCT. However, an additional problem is related to the problem dimensions, since we should decide between a step-by-step realization and a direct 2D evaluation. The step-by-step approach reduces the problem to 1D DCTs; with the help of the textbook, try to apply these variants to the realization of the 2D DCT (a sketch of the step-by-step approach is given below). I share your excitement about this task!
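A minimal sketch of the step-by-step realization (an assumed example using MATLAB's orthonormal dct/dct2 rather than the definition adopted above): the 1D DCT is applied along the columns and then along the rows, and the result matches the direct 2D routine.

X = double(imread('cameraman.tif'));     % any grayscale test image
C1 = dct(X);                             % 1D DCT along each column
C  = dct(C1.').';                        % 1D DCT along each row of the result
disp(norm(C - dct2(X), 'fro'))           % negligible difference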
[Figure: 2D DCT of an image — the low-frequency coefficients have very high values; this part corresponds to the image luminance.]
A general linear transform of a signal x(n) with transformation matrix coefficients w(n,k) is:

X(k) = \sum_{n=0}^{N-1} x(n)\,w(n,k)
The transform is orthogonal if the transformation matrix W satisfies W^{-1} = W^T, and unitary if W^{-1} = W^H (conjugate transpose). Obviously, all orthogonal transforms are at the same time unitary for real-valued W. At first glance, the DFT does not belong to either of these two important classes. Therefore, the DFT is sometimes defined as:
X(k) = \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} x(n)\,W_N^{nk}

x(n) = \frac{1}{\sqrt{N}}\sum_{k=0}^{N-1} X(k)\,W_N^{-nk}

where W_N = e^{-j2\pi/N}.
It is easy to prove that this form of the DFT is a unitary transform. In order to avoid complications, we consider as orthogonal or unitary all transforms that can be reduced to such transforms by simply introducing multiplicative constants.
The basic reason for using these two groups of transforms is the fact that inverse matrix calculation is a very demanding operation, and for these transforms it is avoided: the inverse is obtained simply by (conjugate) transposition.
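A minimal sketch (assumed) confirming both points for the unitary DFT: the matrix W satisfies W^H W = I, so the inverse transform is obtained by conjugate transposition, without any matrix inversion.

N = 8;
[n, k] = meshgrid(0:N-1);
W = exp(-1j * 2*pi * n .* k / N) / sqrt(N);   % unitary DFT matrix
disp(norm(W' * W - eye(N)))                   % close to zero: W^H is the inverse
x = randn(N, 1);
X = W * x;                                    % forward transform
disp(norm(x - W' * X))                        % inverse via conjugate transpose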
Basis signals
Now we want to introduce the concept of basis signals. Consider the inverse transform:
x(n) = \sum_{k=0}^{N-1} X(k)\,g(n,k)
Obviously, this can be written in matrix form, and between the matrix G (whose rows are g(n,k)) and the matrix W there exists a simple relationship.
Note that, by using the formalism of the transform definition, we very often forget the reasons for its introduction. A signal (under some conditions, for example a signal of finite energy) can be represented by an expansion over elementary functions g(n,k), with weights equal to the transformation coefficients.
Basis signals
Since the considered transforms are linear, the transform of a sum of signals equals the sum of their transforms. Write the transform as:
X(k) = \sum_{k_1=0}^{N-1} X_{k_1}\,\delta(k - k_1)
This practically means that the transform can be considered as a sum of N transforms, each equal to X(k_1) = X_{k_1} for k = k_1 and 0 elsewhere. Now combine the last two relationships.
Basis signals
x(n) = \sum_{k=0}^{N-1}\sum_{k_1=0}^{N-1} X_{k_1}\,\delta(k-k_1)\,g(n,k) = \sum_{k_1=0}^{N-1} X_{k_1}\sum_{k=0}^{N-1}\delta(k-k_1)\,g(n,k) = \sum_{k_1=0}^{N-1} X_{k_1}\,g(n,k_1)
Thus, any signal can be represented as a weighted sum of the signals g(n,k_1), i.e., of the rows (or columns) of the inverse transform matrix, which is commonly in a simple relation with the transform matrix. The signals g(n,k_1) are called basis signals, and their analysis can give us a lot of information about the nature of a transform.
Basis signals describe the nature of a transform. Namely, a weighted sum of the basis functions reproduces the signal. When the weights of the low-frequency components (small k) are larger, the signal is predominantly low-frequency, while in the opposite case it is dominated by high frequencies.
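For example, the basis signals of a transform can be obtained as inverse transforms of unit impulses in the transform domain. A minimal MATLAB sketch (assumed) for the orthonormal DCT with N = 8:

N = 8;
figure
for k = 1:N
    e = zeros(N, 1);  e(k) = 1;               % impulse at coefficient k
    subplot(N, 1, k), stem(0:N-1, idct(e))    % k-th DCT basis signal
end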
A separable 2D transform

X(k_1,k_2) = \sum_{n=0}^{N-1}\sum_{m=0}^{M-1} h_c(k_1,n)\,x(n,m)\,h_r(m,k_2)

can be written in matrix form as:

X = H_c\,x\,H_r

The inverse transform can be written as (recall basic matrix algebra):

x = H_c^{-1}\,X\,H_r^{-1}

Commonly the transforms applied to rows and columns are the same, H_c = H_r = T, and it follows:

X = T\,x\,T, \qquad x = T^{-1}\,X\,T^{-1}

For T a unitary or orthogonal matrix the inverse 2D transform can be performed as:

x = T^H\,X\,T^H \quad\text{or}\quad x = T^T\,X\,T^T
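A minimal sketch (assumed) of the matrix formulation, using MATLAB's dctmtx to build the orthogonal 1D DCT matrix T; note that in MATLAB's convention the 2D DCT is written T*x*T', and the inverse uses only transposes, so no matrix inversion is needed.

x = magic(8);                 % any N-by-N test matrix
T = dctmtx(8);                % orthogonal DCT matrix, T*T' = I
X = T * x * T';               % separable 2D transform as two matrix products
xr = T' * X * T;              % inverse using transposes only
disp(norm(x - xr, 'fro'))     % close to zero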
Basis images
In analogy to the basis signals we can introduce basis images. For an image of dimensions NxM, one can define NxM basis images equal to the inverse transform of \delta(i-p, j-q). These NxM basis images are obtained when p and q vary within the range (p,q) \in [0,N) \times [0,M). If P = T^{-1}, we have a separable transform where the basis image (p,q) is equal to f^{(p,q)}(n,m) = P(n,p)P(m,q). Then any image can be represented as a sum of the basis images:
x(n,m) = \sum_{p=0}^{N-1}\sum_{q=0}^{M-1} X_{p,q}\,f^{(p,q)}(n,m)
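A minimal sketch (assumed) that displays the 8x8 DCT basis images, obtained as inverse 2D DCTs of unit impulses in the transform domain:

N = 8;
figure, colormap gray
for p = 1:N
    for q = 1:N
        E = zeros(N);  E(p, q) = 1;            % impulse at coefficient (p, q)
        subplot(N, N, (p-1)*N + q)
        imagesc(idct2(E)), axis off            % basis image f^{(p,q)}
    end
end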
Sinusoidal transforms
A signal can be expanded over functions of various types, but sinusoidal (or cosinusoidal) functions are the most common. They are a quite natural concept. In electrical engineering they correspond to the current produced in generators, which gives this waveform in our power lines. Numerous phenomena in communication systems are also associated with sinusoidal functions. In mechanics, this type of function appears in the case of oscillations. In addition, sinusoidal functions offer an elegant mathematical apparatus useful in the analysis of numerous practical phenomena.
Sinusoidal transforms
The DFT has transformation matrix coefficients:

w(n,k) = \exp(-j2\pi nk/N)

It can be multiplied by a constant (e.g., 1/\sqrt{N}) to obtain the unitary form discussed earlier. The DCT has transformation matrix coefficients:

c(n,k) = \alpha(k)\cos\frac{(2n+1)\pi k}{2N}, \qquad \alpha(k) = \begin{cases}\sqrt{1/N}, & k = 0\\ \sqrt{2/N}, & k \neq 0\end{cases}

There are other sinusoidal transforms, such as the DST (discrete sine transform), with transformation matrix coefficients:

s(n,k) = \sqrt{\frac{2}{N+1}}\,\sin\frac{(n+1)(k+1)\pi}{N+1}
Hartley transform
The Hartley transform (DHT) is relatively common in practice. Its coefficients are defined using the cas function, cas(x) = cos(x) + sin(x), typically as h(n,k) = cas(2\pi nk/N), up to a normalization constant such as 1/\sqrt{N}.
Sinusoidal transforms
This concludes the most common sinusoidal transforms used in practice. In the next lecture we will learn that there are alternatives to the sinusoidal transforms in the case of digital images. For exercise, determine the relationships between these transforms in the 1D and 2D cases. In addition, try to calculate the basis images for these transforms.
For self-exercise
Here we just repeat the self-exercise tasks mentioned within the lecture. Apply the Radon transform in MATLAB to several simple images and try to perform the reconstruction using the iradon command. What are your conclusions? The apparatus for recording and creating the Radon transform moves with uniform angular velocity around the object being recorded. Which image has a Radon transform concentrated in a single pixel? Is there a shape more complicated than the straight line that can achieve this kind of functional relationship?
For self-exercise
Prove that the DCT and the inverse DCT are mutually inverse. Determine the inverse transforms for all introduced sinusoidal transforms. Realize the fast DCT using the specific periodic extension described in the textbook, by writing the cosine as two exponential functions, and by the direct procedure. Compare these solutions.
Realize the fast 2D DCT using these three algorithms. Compute the 2D DFT and the 2D DCT of some real-valued image, mask (filter out) part of the coefficients, calculate the inverse transform, and compare it with the original image (a sketch is given below). Present your conclusions.
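A minimal sketch (assumed) for the masking task, keeping only a low-frequency block of the 2D DCT coefficients; the same experiment can be repeated with fft2/ifft2.

x = double(imread('cameraman.tif'));
C = dct2(x);
M = zeros(size(C));  M(1:64, 1:64) = 1;   % keep a 64x64 low-frequency block (assumed choice)
xr = idct2(C .* M);                       % reconstruction from masked coefficients
figure, imshowpair(uint8(x), uint8(xr), 'montage')
fprintf('Reconstruction MSE: %.2f\n', mean((x(:) - xr(:)).^2))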
For self-exercise
Determine the relationships between the 4 introduced sinusoidal transforms and realize fast algorithms for their evaluation. It is especially important (and hard) to realize, for example, the DHT with N samples by decimation into two DHTs with N/2 samples. For the introduced sinusoidal transforms, determine the inverse transforms. For a given N, determine the basis signals and basis images of the introduced sinusoidal transforms by using MATLAB. Also, you can check the next slides.