Proceedings of Vccc'08
of National Conference on
VLSI for
Communication, Computation and
Control
VCCC’ 08
15th March
Editors
Mrs. C. Kezi Selva Vijila
(Head, Department of ECE)
Mrs. G. Josemin Bala
(Assistant Professor, Department of ECE )
Organized by
Department of Electronics and Communication Engineering
KARUNYA UNIVERSITY
(Declared as Deemed to be University under Sec. 3 of the UGC Act,1956)
Coimbatore, Tamilnadu.
NATIONAL CONFERENCE ON VLSI FOR
COMMUNICATION, COMPUTATION AND CONTROL (VCCC’ 08)
PATRON
Dr. Paul Dhinakaran
Chancellor,
Karunya University, Coimbatore
ADVISORY COMMITTEE
Dr. S. Arumugam
Additional Director, DOTE, Chennai.
Dr. V. Palaniswami
Principal, GCT, Coimbatore.
Dr.E. Kirubakaran
BHEL, Tiruchirappalli.
ORGANISING COMMITTEE
Mrs.D.Jackuline Moni
Mrs.D.Synthia
Mrs.T.Anita JonesMary
Mr.D.Sugumar
Mrs.S.Sridevi Sathyapriya
Mrs.J.Anitha
Mr.N.Satheesh Kumar
Mrs.D.S.Shylu
Mrs.Jennifer S. Raj
Mr.S.Immanuel Alex
Ms.S.Sherine
Ms.K.Prescilla
Ms.J.Grace Jency Gananammal
Ms.G.Christina
Ms.F.Agi Lydia Prizzi
Mr.J.Samuel Manoharan
Mr.S.Smys
Mr.D.Narain Ponraj
Mr.D.Nirmal
Ms.B.Manjurathi
Mrs.G.Shine Let
Ms.Linda Paul
Ms.Cynthia Hubert
Mr.A.Satheesh
Mr.P.Muthukrishnan
Ms.Anu Merya Philip
Mr.Jaganath
M.ReebaRex
Mr.T.Retnam
Mr.B.Jeyachandran
Mr.Arul Rajkumar
Mr.C.R.Jeyaseelan
Mr.Wilson Christopher
Mr.Manohar Livingston
Mr.J.Jebavaram
EDITORIAL TEAM:
IV Yr - Lisbin
II Yr - Nixon
S.Arock Roy
I.Kingsly Jeba
J.John Christo
M.Muthu Kannan
D.Arun Premkumar
R.PrawynJebakumar
PROFILE OF KARUNYA UNIVERSITY
Sec. 3 of the UGC Act, 1956 ) is located 25 kms away from Coimbatore
The Siruvani river with its crystal clear water has its origin here and it
institution. During leisure time, one can feast on the mountains, the
skies with its rainbow colours and the horizon. One with an aesthetic
sense will not miss the waters trickling down the hills, the birds that
sing sweetly on the trees, the cool breeze and the drizzles. One cannot
HISTORY
including the tragic death of their dear daughter during the course of
this great endeavor. But nothing could stop them from reaching the
goal.
THE VISION
received from the Lord Almighty, the Institute was established with
academics and values. They will be total persons with the right
spiritual values.
THE MISSION
competency.
personal, social and public life and reach the highest level of
humanism, such that they shall always uphold and promote a high
social order and be ready and willing to work for the emancipation of
THE MISSION
activities.
&outreach programs.
SPECIAL FEATURES
software like Mentor Graphics, MATLAB, LabVIEW, Tanner tools, FPGA
MANIFESTO OF FUTURE
CONTENTS
Messages
Organizing Committee
Advisory Committee
SUBSESSION A.1
VL 23. 2-D Fractal Array Design for 4-D Ultrasound Imaging 113
Mrs.C Kezi Selva Vijila,Ms Alice John
Karunya University ,Coimbatore
SPC 01. Secured Digital Image Transmission Over Network Using 118
Efficient Watermarking Techniques On Proxy Server
Jose Anand, M. Biju, U. Arun Kumar
JAYA Engineering College, Thiruninravur, Chennai 602024
SPC 07. VLSI Design Of Impulse Based Ultra Wideband Receiver For 142
Commercial Applications
G.Srinivasa Raja, V.Vaithianathan
SSN College of Engineering, Chennai
SPC 08. Distributed Algorithms for Energy Efficient Routing in Wireless 147
Sensor Networks
T.Jingo M.S.Godwin Premi, K.S.Shaji
Sathyabama university, Chennai
SUBSESSION B.2:
SPC 12. MRI Image Classification Using Orientation Pyramid and 166
Multiresolution Method
R.Catharine Joy, Anita Jones Mary
Karunya University, Coimbatore
SPC 13. Dimensionality reduction for Retrieving Medical Images Using 170
PCA and GPCA
J.W Soumya
Karunya University, Coimbatore
SPC 14. Efficient Whirlpool Hash Function 175
J.Piriyadharshini , D.S.Shylu
Karunya University, Coimbatore
SPC 15. 2-D Fractal Array Design For 4-D Ultrasound Imaging 181
Alice John, Mrs.C Kezi Selva Vijila
Karunya University, Coimbatore
SPC 16. PC Screen Compression for Real Time Remote Desktop Access 186
Jagannath.D.J,Shanthini Pandiaraj
Karunya University, Coimbatore
SPC 17. Medical Image Classification using Hopfield Network and 191
Principal Components
G.L Priya
Karunya University, Coimbatore
SPC 19. Analysis of MAC Protocol for Wireless Sensor Network 200
Jeeba P.Thomas, Mrs.M.Nesasudha
Karunya University, Coimbatore
SPC 20. Improving Security and Efficiency in WSN Using Pattern Codes 204
Anu jyothy,Mrs.M.Nesasudha
Karunya University, Coimbatore
CC 05. Design and Simulation of Microstrip Patch Antenna for Various 224
Substrate
T.Jayanthy, A.S.A.Nisha, Mohemed Ismail, Beulah Jackson
Sathyabama university, Chennai.
SUBSESSION C.2:
CC10. Reed Solomon Encoders and Decoders using Concurrent Error 252
Detection Schemes
Rani Deepika.B.J, K.Rahimunnisa
Karunya University, Coimbatore
CC11. Design of High Speed Architectures for MAP Turbo Decoders 258
Lakshmi .S.Kumar ,Mrs.D.Jackuline Moni
Karunya University, Coimbatore
1
VCCC‘08
Therefore the constant value to be stored for the comparison should be equal to:
(124137931 × 51 × 51) / (382.36 × 9.83) = 85905087
Thus, the constant 85905087 has been stored as the meter constant. After reaching this value, the energy reading displayed is incremented by 0.01 kWh.
III. IMPLEMENTATION
The FPGA forms the core of the FPGA based energy meter. In addition to the FPGA, various other hardware components were used to convert the voltage and current inputs to digital form for processing by the FPGA. The energy consumed must be displayed, and seven-segment displays were used for the purpose. The consumed energy was transmitted to a PC using an RS-232 interface, which required additional external circuitry. The hardware details of the FPGA based single phase energy meter are provided in this section. The working of each component is also presented in brief.
A. Sensing unit
The function of the sensing unit is to sense the voltage and current through the mains and to convert them into a 0-5 V signal which is then fed into the ADC. The sensing unit is composed of the current transformer, the potential transformer and the adder circuit.
The potential transformer is used to step down the mains voltage to a fraction of its actual value, so that it can be safely fed into the adder circuit. The current transformer is used to detect the current flowing through the mains. A burden resistance is used at the secondary side to convert the current into an equivalent voltage signal, as current cannot be directly fed to the ADC.
Two op-amps in the IC are used as voltage followers and the remaining two are configured as non-inverting amplifiers with a gain of 2. They also act as level shifters, adding a d.c. voltage of 2.5 V to the input a.c. signal and thus changing the a.c. signal range from -2.5 V to +2.5 V into 0 V to +5 V, as the A/D converter used can only operate in the 0 V to +5 V range [3].
B. Analog to Digital Conversion
The basic function of an analog to digital (A/D) converter is to convert an analog input to its binary equivalent. ADC 0808, an 8-bit successive approximation A/D converter from National Semiconductor, is employed for converting the sampled voltage and current signals into equivalent 8-bit binary values [4].
A Sample and Hold (SAH) circuit is needed as the input voltage keeps varying during A/D conversion. If a Sample and Hold circuit is not used, and the input signal changes during the A/D conversion, the output digital value will be unpredictable. To overcome this, the input voltage is sampled and held constant for the ADC during the conversion. Two LF 398 ICs from National Semiconductor have been used to sample and hold the values of voltage and current during the A/D conversion. The working of a Sample and Hold (SAH) circuit is illustrated in Fig.2.
Fig.2 Working of SAH
The sampling frequency used was 3.45 kHz, which helps the user to accurately measure the power contained in the harmonics up to the 33rd harmonic. This significantly increases the accuracy of the energy meter, and the meter can be used in environments where the presence of harmonics in the supply is significant.
C. Field Programmable Gate Array (FPGA)
The FPGA is the key unit of the energy meter presented in this paper. It is programmed to perform the following functions:
1) Find the product of the instantaneous values of voltage and current to get the instantaneous power.
2) Accumulate the power and compare the accumulated value to the meter constant.
3) When the meter constant is exceeded, the energy consumed is incremented by 00.01 kWh and displayed.
4) Drive the seven-segment displays.
5) Send the energy reading to the PC via RS-232.
The instantaneous power consumption is calculated using an implementation of the Booth multiplier algorithm, which provides a fast means of multiplying the 8-bit values of voltage and current obtained from the ADC. The resultant value is the instantaneous power, which can be of a maximum 17-bit length. These instantaneous power values are accumulated and the accumulated value is compared to the meter constant already stored in the FPGA. Once that meter constant is exceeded, the display is incremented by 00.01 kWh, the accumulator gets reset and the amount by which the accumulator reading exceeded the meter constant is loaded into the accumulator. The meter constant is chosen to correspond to 00.01 kWh primarily due to limitations of the FPGA kit which is used for implementing the energy meter. Now the next set of digital values for
voltage and current are available at the input and the process of power calculation and accumulation repeats.
The FPGA used for implementing the energy meter is Spartan2 from Xilinx. The Hardware Description Language (HDL) used for the purpose is VHDL [5].
D. Seven segment display
To display the total energy consumed, four seven-segment displays are used, which can display energy from 00.00 to 99.99 kWh. Each of the displays needs a base drive signal for enabling it, and the seven-segment equivalent of the digit it has to display. The base drive is provided by the FPGA at the rate of 0.25 MHz per display; at the same time it sends the seven-segment equivalent of the digit to that display. Hence, all four displays appear to be displaying the digits simultaneously.
E. Serial Communication Interface
RS-232 (Recommended Standard 232) is a standard interface approved by the Electronic Industries Association (EIA) for connecting serial devices. Each byte of data is synchronized using its start bit and stop bit. A parity bit can also be included as a means of error checking. Fig.3 shows the TTL/CMOS serial logic waveform when using the common 8N1 format. 8N1 signifies 8 data bits, no parity and 1 stop bit. The RS-232 line, when idle, is in the Mark state (Logic 1). A transmission starts with a start bit, which is Logic 0. Then each bit is sent down the line, one at a time. The LSB (Least Significant Bit) is sent first. A stop bit (Logic 1) is then appended to the signal to make up the transmission. The data sent using this method is said to be framed; that is, the data is framed between a start bit and a stop bit.
side. The RS-232 level converter used is MAX-232, which generates +10 V and -10 V from a single 5 V supply. On the PC side, Microsoft Comm control in Visual Basic is used to read and display the incoming data from the FPGA.
IV. RESULTS
The FPGA based single phase energy meter was designed, simulated and implemented on a Spartan 2 FPGA. The sensing circuit consisting of the op-amps and the sample and hold ICs was implemented on a printed circuit board. The results of simulation and the test results are presented in this section.
A. Simulation Results
Simulation results for the adder circuit
The aim of the adder circuit, implemented using LM 324 op-amps, is to shift the input a.c. voltage (maximum allowed is 2.5 Vmax) up by 2.5 V, so that the input is in the range 0-5 V. This is required as the ADC used is unipolar and can only convert signals in the range 0-5 V to their 8-bit binary equivalent.
The results of simulating the adder circuit are presented in Fig. 5. The input signal provided was an a.c. signal of 2.5 Vmax and the output obtained was the same as the input, but shifted up by 2.5 V.
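As a rough numerical illustration of the adder stage and of the accumulate-and-compare step described in Section III, the following Python sketch models an ideal level shifter followed by an ideal 8-bit, 0-5 V quantizer, and a software accumulator that uses the meter constant 85905087 quoted above. The function names `adc_code` and `accumulate` are hypothetical, and the ideal-converter behaviour is an assumption; the actual design is written in VHDL and is not reproduced here.

```python
V_OFFSET = 2.5   # d.c. level added by the adder stage, in volts
V_SPAN = 5.0     # unipolar ADC input range, 0 V to 5 V
LEVELS = 256     # 8-bit converter (idealized, an assumption)

def adc_code(v_ac):
    """Level-shift an a.c. sample (-2.5 V..+2.5 V) and quantize it to 8 bits."""
    shifted = v_ac + V_OFFSET
    code = int(shifted / V_SPAN * LEVELS)
    return max(0, min(LEVELS - 1, code))   # clamp to 0..255

def accumulate(power_samples, meter_constant=85905087):
    """Count 0.01 kWh steps; the excess over the constant is carried forward."""
    acc, steps = 0, 0
    for p in power_samples:
        acc += p
        if acc > meter_constant:
            acc -= meter_constant   # reload the accumulator with the excess
            steps += 1              # one step corresponds to 0.01 kWh
    return steps, acc

print(format(adc_code(0.0), "08b"))            # mid-scale code for a 0 V input
print(accumulate([60_000_000, 30_000_000]))    # one step plus the remainder
```

Under these assumptions a 0 V a.c. input maps to the mid-scale code, so codes above it represent positive half-cycles and codes below it negative half-cycles.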
Fig.8 Multiplication
The next process after multiplication is accumulation. The process of accumulation is triggered after every third end of conversion signal. The product obtained has to be either added to or subtracted from the accumulated value, depending on whether the inputs were of the same polarity or of opposite polarity. When both 'voltage' and 'current' are positive (i.e. greater than '10000000') or both of them are negative (i.e. less than '10000000'), the product is positive and has to be added to the accumulator. Otherwise, the
Fig.10 Energy Updating
B. Hardware Implementation Results
Design overview
The design overview was generated using Xilinx ISE. It gives an overview of the resources utilized on the FPGA board on which the design is implemented. The details such as the number of slice registers used as flip-flops, latches etc. can be found from the design
Pin locking
The I/O pins for the FPGA were configured using Xilinx ISE. A total of twenty pins were configured as outputs, including those for control of the ADC, the base drive for the seven-segment displays, the data for the seven-segment displays and the serial communication. A pin was configured exclusively to give the sample/hold signal to the LF 398 Sample and Hold IC. Eleven pins were configured as inputs, including eight pins to detect the ADC output, the reset signal and the clock signal for the FPGA. Fig.13 shows the pin locking and the location of the pins on the board.
Fig.14 GUI form for serial communication
The experimental setup used to implement and test the design is shown in Fig.15.
REFERENCES
I. INTRODUCTION
A. Input Signal Conditioning
The input signal is scaled using op-amp based scaling circuits to bring it to the voltage range accepted by the A/D converter. After scaling, this signal is fed to the A/D converter, which converts the signal to its digital equivalent.[2] The control signals to the A/D converter are provided from the FPGA itself. The circuit designed draws very little power, thus minimizing loading effects. An additional advantage of having the A/D converter as a separate hardware unit is that any commercially available A/D converter can be used, depending on user requirement, with little or no change in the interface code.
B. Signal Processing
The digital values obtained after A/D conversion are stored in a 640 x 8 bit RAM created inside the FPGA. The control signals are sent to the memory for allowing data to be written to it. The clock is fed to a counter that generates the memory's sequential addresses. An option to store the captured wave information is also provided through a flash memory interface so that the information can be stored for future reference. It also provides the user with the ability to log the waveform data for a duration limited only by the size of the flash memory used.
Additional functions like integration, differentiation, Fast Fourier Transform and mathematical operations like addition, subtraction and multiplication of signals are also implemented. Integration is done using the rectangular rule method. Differentiation is done using the finite difference method.[9] The Fast Fourier Transform is implemented using the CORDIC algorithm. Addition, subtraction and multiplication are done with basic operations like the ripple adder, 2's complement addition and the Booth algorithm respectively.[7]
C. VGA Interface & Display
A VGA monitor interface was also developed and the waveforms are displayed on the screen at 640 x 480 resolution. All the necessary signals, such as horizontal synchronization and vertical synchronization along with the RGB color information sent to the VGA monitor, are generated by the FPGA.[4] The timing diagram of the VGA signals is shown in Fig. 2.
and numbers. A sample grid for displaying the letter 'A' on the screen is given in Fig.3.[4]
Fig 3. Matrix values of letter "A"
III. LABORATORY TESTS
A sample result of the laboratory tests is shown in Fig.4. In this sample test a sine wave of 2.5 kHz is displayed in the second channel. The interface options are displayed in the Tamil language. Additionally, the maximum and minimum values of both waves are displayed. All the waveforms are in different colors so that it is easy to differentiate between them. Each waveform is in the same color as its associated options to avoid confusion between waves.
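The integration and differentiation functions mentioned under Signal Processing can be sketched in a few lines of Python. This is only an illustrative software model under assumed conventions (a fixed sample spacing `dt` and a backward difference); the function names are hypothetical and the FPGA implementation described in the paper is not shown here.

```python
def integrate_rect(samples, dt):
    """Running integral by the rectangular rule: accumulate sample * dt."""
    total, out = 0.0, []
    for s in samples:
        total += s * dt
        out.append(total)
    return out

def differentiate(samples, dt):
    """Finite difference between consecutive samples: (x[n] - x[n-1]) / dt."""
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]

# A constant input integrates to a ramp; a quadratic differentiates to a ramp.
print(integrate_rect([1, 1, 1, 1], 0.5))
print(differentiate([0, 1, 4, 9], 1.0))
```

Both routines need only an adder (plus a subtractor for the difference) and a scaling by the sample spacing, which is consistent with the basic arithmetic blocks listed in the text.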
IV. CONCLUSIONS
of the Digital Storage Oscilloscope presented here is approximately USD 184. The system was successfully tested with several forms of input waveforms, such as sinusoidal, square, and triangular signals. The developed system can be expanded with additional signal processing modules. This DSO can be used for real-time data acquisition in most common-purpose low-power and low-frequency-range applications for high school laboratories. It can also be used as an instructional tool for undergraduate data acquisition courses for illustrating complex concepts concerning parallel port programming, A/D conversion, and detailed circuit development. The entire system is shown in Fig.5.
REFERENCES
[3] R. Lincke, I. Bijii, Ath. Trutia, V. Bogatu, B. Logofzitu, "PC based Oscilloscopes."
[4] A. N. Netravali, P. Pirsch, "Character Display on CRT," IEEE Transactions on Broadcasting, Vol. BC-29, No. 3, September 1983.
[5] IEEE Standard specification of general-purpose laboratory Cathode-Ray Oscilloscopes, IEEE Transactions on Instrumentation and Measurement, Vol. IM-19, No. 3, August 1970.
[6] Oscilloscope Primer, XYZs of Oscilloscopes.
[7] Morris Mano, "Digital Design."
[8] Douglas Perry, "VHDL Programming by Example."
[9] W. Cheney, D. Kincaid, "Numeric methods and Computing."
[10] Product Documentation of Tektronix and Aplab.
[11] www.tektronix.com
Abstract: NULL Convention Logic (NCL) is a self-timed logic paradigm in which the control is inherent in each datum. There are 27 fundamental NCL gates. This paper proposes a logic element that can be configured as any one of the NCL gates. Two versions of the reconfigurable logic element are developed for implementing an asynchronous FPGA: one with embedded registration logic and the other without embedded registration logic. Both versions can be configured as any one of the 27 fundamental NCL gates, including resettable and inverting variations. Both can utilize embedded registration for gates with three or fewer inputs. Only the version with extra embedded registration logic can utilize it for gates with four inputs. The two approaches are compared with an existing approach, showing that both versions developed herein yield more area-efficient NCL circuit implementations.
Index Terms: Asynchronous logic design, delay-insensitive circuits, field-programmable gate array (FPGA), NULL convention logic (NCL), reconfigurable logic.
I.INTRODUCTION
Though synchronous circuit design presently dominates the semiconductor design industry, there are major limiting factors to this design approach, including clock distribution, increasing clock rates, decreasing feature size, and excessive power consumption [6]. As a result of the problems encountered with synchronous circuit design, asynchronous design techniques have received more attention. One such asynchronous approach is NULL Convention Logic (NCL). NCL is a clock-free delay-insensitive logic design methodology for digital systems. The separation between data and control representations provides self-synchronization, without the use of a clock signal.
NCL is a self-timed logic paradigm in which control is inherent in each datum. NCL follows the so-called weak conditions of Seitz's delay-insensitive signaling scheme.
II. NCL OVERVIEW
NCL uses threshold gates as its basic logic elements [4]. The primary type of threshold gate, shown in Fig. 1, is the THmn gate, where 1 ≤ m ≤ n. THmn gates have n single-wire inputs, where at least 'm' of the 'n' inputs must be asserted before the single-wire output will become asserted. In a THmn gate, each of the 'n' inputs is connected to the rounded portion of the gate. The output emanates from the pointed end of the gate and the gate's threshold value 'm' is written inside the gate. NCL circuits are designed using a threshold gate network for each output rail [3] (i.e., two threshold gate networks would be required for a dual-rail signal D, one for D0 and another for D1).
Fig. 1. THmn threshold gate.
Another type of threshold gate is referred to as a weighted threshold gate, denoted as THmnWw1w2…wR. Weighted threshold gates have an integer value m ≥ wR > 1 applied to inputR. Here 1 ≤ R < n, where n is the number of inputs, m is the gate's threshold and w1, w2, …, wR, each > 1, are the integer weights of input1, input2, …, inputR, respectively. For example, consider the TH34W2 gate shown in Fig. 2, whose n = 4 inputs are labeled A, B, C and D. The weight of input A, W(A), is therefore 2. Since the gate's threshold is 3, this implies that in order for the output to be asserted, either inputs B, C and D must all be asserted, or input A must be asserted along with any other input B, C or D.
Fig. 2. TH34w2 threshold gate: Z = AB + AC + AD + BCD.
NCL threshold gates are designed with hysteresis state-holding capability, such that all asserted inputs must be deasserted before the output is deasserted. Hysteresis ensures a complete transition of inputs back to NULL before asserting the output associated with the next wavefront of input data.
NCL threshold gate variations include resetting THnn and inverting TH1n gates. Circuit diagrams designate resettable gates by either 'd' or 'n' appearing inside the gate, along with the gate's threshold. 'd' denotes the gate as being reset to logic '1' and 'n' to logic '0'. Both resettable and inverting gates are used in the design of delay-insensitive registers [8].
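The threshold-with-hysteresis behaviour described in this section can be modelled in software. The Python sketch below is a behavioural illustration only, not the transistor-level gate; the class name `ThresholdGate` and the helper names are hypothetical. It sets the output when the weighted sum of asserted inputs reaches the threshold, clears it only when all inputs are deasserted, and otherwise holds the previous value.

```python
from itertools import product

class ThresholdGate:
    """Behavioural model of a (weighted) NCL threshold gate with hysteresis."""

    def __init__(self, threshold, weights):
        self.threshold = threshold
        self.weights = weights
        self.output = 0          # assumed initial (NULL) state

    def step(self, inputs):
        total = sum(w for w, x in zip(self.weights, inputs) if x)
        if total >= self.threshold:
            self.output = 1      # set: threshold reached
        elif not any(inputs):
            self.output = 0      # reset: every input deasserted
        return self.output       # otherwise hold the previous value

def th34w2():
    # Threshold 3, four inputs, input A carries weight 2.
    return ThresholdGate(3, [2, 1, 1, 1])

def matches_fig2():
    """Compare the set condition with Z = AB + AC + AD + BCD from a NULL state."""
    for a, b, c, d in product([0, 1], repeat=4):
        expected = int((a and b) or (a and c) or (a and d) or (b and c and d))
        if th34w2().step([a, b, c, d]) != expected:
            return False
    return True

print(matches_fig2())
```

Starting each check from a fresh gate isolates the set condition; stepping a single gate through a sequence of inputs shows the hold behaviour instead.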
B. Reset Logic
The reset logic consists of a programmable latch and
transmission gate MUX [1]. During the programming
phase when P is asserted (nP is deasserted), the
latch stores the value Rv. The gate will be reset when
rst is asserted. rst is the MUX select input, such that
when it is logic ‘0’, the output of the PUPD function
passes through the MUX to be inverted and output on
Z. When rst is logic 1, the inverse of Rv is passed
through the MUX.
A. Reconfigurable Logic
The reconfigurable logic portion consists of the same
16-address LUT used in the previous version and a
revised PUPD function that includes additional
embedded registration logic. When embedded
registration is disabled (i.e., ER=logic ‘0’ during the
programming phase), Ki should be connected to logic
‘0’, and the PUPD logic functions the same as
explained. However, when embedded registration is
enabled, the output of the PUPD function will only be
logic ‘0’ when both F and Ki are logic ‘1’, and will
only be logic ‘1’ when all gate inputs (i.e., A,B,C and
D) and Ki are logic ‘0’.
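The two conditions stated above for the PUPD output with embedded registration enabled can be written as a small state-holding function. This Python sketch is a behavioural reading of the text only: the name `pupd_next` is hypothetical, F is taken as a given boolean, and holding the previous value in every other case is an assumption based on the hysteresis behaviour of NCL gates.

```python
def pupd_next(prev, f, ki, inputs):
    """Next PUPD output with embedded registration enabled (ER = 1).

    prev   : previous PUPD output (0 or 1)
    f      : value of F (treated here as a given boolean)
    ki     : registration input Ki
    inputs : gate inputs A, B, C, D
    """
    if f and ki:
        return 0                      # pulled to '0' only when F and Ki are both '1'
    if not ki and not any(inputs):
        return 1                      # pulled to '1' only when A..D and Ki are all '0'
    return prev                       # assumed: otherwise the value is held

print(pupd_next(1, 1, 1, [1, 1, 0, 0]))
print(pupd_next(0, 0, 0, [0, 0, 0, 0]))
```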
B. Embedded Registration
REFERENCES:
TABLE II
Propagation delay comparison based on input
transition
Abstract: The great interest in RF CMOS comes from the obvious advantages of CMOS technology in terms of production cost, high-level integration, and the ability to combine digital, analog and RF circuits on the same chip. This paper reviews the development of an ASIC cell library especially for RF applications. The developed cell library includes cells like filters, oscillators, impedance matching circuits, low noise amplifiers, mixers, modulators and power amplifiers. All cells were developed using standard 0.25 µm and 0.18 µm CMOS technology. Circuit design criteria and measurement results are presented. Applications are low power, high speed data transfer RF applications.
Index Terms: ASIC Cell Library, RF VLSI
I.INTRODUCTION
The use of analog CMOS circuits at high frequencies has garnered much attention in the last several years. CMOS is especially attractive for many of the applications because it allows integration of both analog and digital functionality on the same die, increasing performance at the same time as keeping system sizes modest. The engineering literature has shown a marked increase in the number of papers published on the use of CMOS in high frequency applications, especially since 1997. These applications cover such diverse areas as GPS, micro power circuits, GSM, and other wireless applications at frequencies from as low as 100 MHz for low earth orbiting satellite system to 1000 MHz and beyond. Many of the circuits designed are of high performance and have been designed with and optimized for the particular application in mind.
At the heart of rapid integrated system design is the use of cell libraries for various system functions. In digital design, these standard cells are both at the logic primitive level (NAND and NOR gates, for example) as well as higher levels of circuit functionality (ALUs, memory). For baseband analog systems, standard cell libraries are less frequently used, but libraries of operational amplifiers and other analog circuits are available. In the design of a CMOS RF cell library, the cells must be designed to be flexible in terms of drive requirements, bandwidth and circuit loading. For RF applications, the most common drive requirements for off-chip loads are based on 50 Ω impedances. A factor governing the bandwidth of the RF cells is the nodal capacitance to ground, primarily the drain and source sidewall capacitances. Transistors making up the library elements are usually designed with multiple gate fingers to reduce the sidewall capacitance. Since these cells are to be used with digital and baseband analog systems, control by on-chip digital and analog signals is another factor in the design.
The choice of cells in such a cell library should be based on the generalized circuit layout of a wireless system front end. A typical RF front end will have both a receiver and transmitter connected to an antenna through some type of control device. For the receiver chain, the RF signal is switched to the upper arm and enters the low noise amplifier and then to a down converting mixer. For the transmit chain, the RF signal enters an upconverting mixer and is then sent to the output amplifier and through the control device to the antenna. A number of CMOS cells should be designed for the library. These cells include an RF switch for control of microwave and RF energy flow from the antenna to the transmitter or receiver, a transmitter output amplifier capable of driving a 50 Ω antenna directly and at low distortion, and a mixer that may be used in either circuit branch. An active load is included for use wherever a load may be required.
The cell library for RF applications presented here attempts to address many of the design factors. The library consists of cells designed using 0.18 µm and 0.25 µm CMOS processes. The cells described in this paper can be used separately or combined to construct more complex functions such as an RF application. Each of the cells will be discussed separately for the sake of clarity of presentation and understanding of the operation of the circuit. There was no post-processing performed on any of the circuit topologies presented in this paper. The systems were designed to maintain 50 Ω system compatibility. The cells have been designed for flexibility in arrangement to meet the designer's specific application. The larger geometry devices may also be used for education purposes since there are a number of low cost fabrication options available for the technology. In the design of any cell library, a trade-off between
speed/frequency response and circuit complexity is always encountered. A portion of this work is to show the feasibility of the cell library approach in RF design. The approach taken in this work with the technologies listed above is directly applicable to small device geometries. These geometries will yield even better circuit performance than the cells discussed here.
II.LOW NOISE AMPLIFIER DESIGN
The most critical point for the realization of a highly integrated receiver is the RF input. The first stage of a receiver is a low noise amplifier (LNA), which dominates the noise figure of the whole receiver. Besides low noise, the other key requirements are low power consumption, high linearity and small chip size. Because of this, the design of the LNA is really a challenge.
Fig.1. Amplifiers with input matching circuits: (a) inductor Lg connected directly to the transistor, (b) pad capacitance Cpad connected directly to the transistor.
Among a few possible solutions for the LNA core, a cascode amplifier with inductive degeneration, shown in Fig. (1), is often preferred. The transistor in common-gate (CG) configuration of the cascode amplifier reduces the Miller effect. It is well known that a capacitance connected between the output and input of an amplifier with inverting gain is seen at its input and output multiplied by the gain. The gain of the common-source (CS) configuration is gmRL, where RL is the output impedance, and the input impedance of the CG configuration is 1/gm. Therefore, if both transistors have similar gm, the gain of the transistor in CS configuration decreases and the Miller capacitance is reduced. At the output of the cascode amplifier, the overlap capacitance does not affect the Miller effect since the gate of the amplifier is grounded. Thus, the tuned capacitor of the LC tank only has to be large enough to make the tank insensitive to Cgd2. In addition, with a low impedance point at the output of the common source amplifier, the instability caused by the zero of the transfer function is highly reduced. Finally, with an AC ground at the gate of the cascode amplifier, the output is decoupled from the input, giving the cascode configuration a high reverse isolation. Although in Fig. (1) the LC tank is shown explicitly, in practical situations another configuration can be made, while for small signal circuits it does not matter if the second node of the capacitor C is connected to Vdd or ground. However, in any case a series output capacitor is needed to block the DC path. This capacitor, not shown in Fig. (1), can contribute to the output matching, so it has to be chosen very carefully. The output pad capacitance can additionally be used for output matching.
In order to connect the LNA to measurement equipment, a package or an antenna, bonding pads (Cpad) are needed. Fig. (1) shows two LNAs with different input matching networks. In the networks from Fig. (1) all components are placed on the chip. This principle is very often used, therefore we start the LNA analysis from this point. The bonding pads are parallel to the input of the LNA, and as long as their impedance is much higher than the input impedance of the LNA, they do not introduce any significant effects to the input impedance of the whole circuit. In our case, assuming a practical value of 150 fF for Cpad and a frequency of 2 GHz, the impedance of the pad can be neglected in comparison with the required 50 Ω. However, if the influence of Cpad cannot be neglected, only the imaginary part of Zin is affected.
The use of inductive degeneration results in no additional noise generation since the real part of the input impedance does not correspond to a physical resistor. The source inductor Ls generates a resistive term in the input impedance
\[ Z_{in} = \frac{g_m L_s}{C_{gs}} + j\,\frac{\omega^2 (L_g + L_s) C_{gs} - 1}{\omega C_{gs}} \]
where Ls and Lg are the source and gate inductors, respectively, and gm and Cgs denote small signal parameters of transistor M1 (Cgd, gds and Cpad are neglected).
The inductor Lg connected in series with the gate cancels out the admittance due to the gate-source capacitor. Here, it is assumed that the tuned load (L, C) is in resonance at angular frequency ω0 and therefore appears to be a pure resistive load RL.
To obtain a pure resistive term at the input, the capacitive part of the input impedance introduced by the capacitance Cgs should be compensated by inductances. To achieve this cancellation and input matching, the source and gate inductances should be set to
\[ L_s = \frac{R_s C_{gs}}{g_m} \]
\[ L_g = \frac{1 - \omega_0^2 L_s C_{gs}}{\omega_0^2 C_{gs}} \]
where Rs is the required input resistance, normally 50 Ω.
The noise figure of the whole amplifier with the noise contribution of transistor M2 neglected can be given as
$$F = 1 + \frac{\gamma}{\alpha}\,\frac{1}{Q}\,\frac{\omega_0}{\omega_T}\left[1 + \frac{\delta\alpha^2}{k\gamma}\,(1+Q^2) + 2|c|\sqrt{\frac{\delta\alpha^2}{k\gamma}}\right]$$

where $\alpha = g_m/g_{d0}$; $\gamma$, $\delta$, c, k are bias dependent transistor parameters and $Q = 1/(\omega_0 C_{gs} R_s)$ is the quality factor of the input circuit. It can be seen that the noise figure is improved by the factor $(\omega_T/\omega_0)^2$. Note that for currently used sub-micron MOS technologies $\omega_T$ is in the order of 100 GHz. The noise figure of the LNA can also be expressed in a simplified form, with induced gate noise neglected, which is easier for first order analysis

$$F \approx 1 + \frac{k\, g_m R_s}{(\omega_T/\omega_0)^2}$$

where the leading coefficient is a bias dependent constant and Rs is the source resistance. Although at first sight this suggests a low transconductance gm for a low noise figure, taking into account that $\omega_T \approx g_m/C_{gs}$ one can see that this is not true. Increasing gm lowers the noise figure, but at the cost of higher power consumption. Since Cgs contributes to the $(\omega_T/\omega_0)^2$ factor, lowering this capacitance leads to improved noise. The last possibility of noise reduction is reducing the signal source resistance Rs. However, this resistance is normally fixed.

Decreasing the Cgs capacitance is done by reducing the size of the transistor. This also has an impact on the linearity of the amplifier and, according to the input matching requirements, very large inductors Lg have to be used that can no longer be placed on chip. For this reason the inductor Lg is placed off-chip. Between the inductor and the amplifier the on-chip pad capacitance Cpad is located, as shown in Fig. (1)b. It consists of the pad structure itself and the on-chip capacitance of the ESD structure and signal wiring. In this case the pad capacitance and Cgs are of similar order. Therefore, the pad has to be treated as a part of the amplifier and taken into account in the design process.

It should be noted that input pads in particular need special consideration. It has been proven that shielded pads ideally have no resistive component, and so they neither consume signal power nor generate noise. They consist of two metal plates drawn on the top and bottom metals to reduce the pad capacitance value down to 50 fF. Unfortunately, this is not the whole capacitance which should be taken into account. One has to realize that all connections to the pad increase this value.

The input matching circuit is very important for the low noise performance of the LNA. Low noise cascode amplifiers using different approaches for input impedance matching have been analyzed and compared in terms of noise figure performance for bipolar technology. The effect of noise filtering caused by the matching network has been pointed out. Furthermore, a parallel-series matching network has been proposed, which allows dominant noise contributions to be reduced. A very low noise figure can be achieved in this way. This matching consists of a series inductance and a parallel capacitance connected between base and emitter of the common source transistor.

The input matching presented in Fig. (1)b is quite similar to that of the bipolar amplifier. Here, the pad capacitance is used instead of the base emitter capacitance. It can be expected that taking the pad capacitance as a part of the input matching can lower the noise figure of a FET LNA. RF-CMOS LNAs have achieved the lowest noise values when the pad capacitance was taken into consideration. The reason for this behavior has not been discussed enough so far.

OSCILLATOR DESIGN

Oscillators can generally be categorised as either amplifiers with positive feedback satisfying the well-known Barkhausen criteria, or as negative resistance circuits. At RF and microwave frequencies the negative resistance design technique is generally favoured.

The procedure is to design an active negative resistance circuit which, under large-signal steady-state conditions, exactly cancels out the load and any other positive resistance in the closed loop circuit. This leaves the equivalent circuit represented by a single L and C in either parallel or series configuration. At one frequency the reactances will be equal and opposite, and this resonant frequency is given by the standard formula

$$f = \frac{1}{2\pi\sqrt{LC}}$$

It can be shown that in the presence of excess negative resistance in the small-signal state, any small perturbation caused, for example, by noise will rapidly build up into a large signal steady-state resonance at the frequency given by the formula above.

Negative resistors are easily designed by taking a three terminal active device and applying the correct amount of feedback to a common port, such that the magnitude of the input reflection coefficient becomes greater than one. This implies that the real part of the input impedance is negative. The input of the 2-port negative resistance circuit can now simply be terminated in the opposite sign reactance to complete the oscillator circuit. Alternatively, high-Q series or parallel resonator circuits can be used to generate higher quality and therefore lower phase noise oscillators. Over the years several RF oscillator configurations have become standard. The Colpitts, Hartley and Clapp circuits are examples of negative resistance oscillators, shown here using bipolars as the active devices. The Pierce circuit is an op-amp with positive feedback, and is widely utilised in the crystal oscillator industry.

The oscillator design here concentrates on a worked example of a Clapp oscillator, using a varactor tuned ceramic coaxial resonator for voltage control of the output frequency. The frequency under
frequency is important because it includes several commercial communications bands. The design goals were met for all the cell library elements. The designed amplifier can be given an on-off function by connecting the gate of the transistor M2 through a switch to Vdd or ground. Although not fully integrated on chip, this architecture is a good solution for multistandard systems, which operate at different frequency bands. In reality, the small-signal simulation is vital to ensure that adequate negative resistance is available for start-up of oscillation. With the emergence of new and more advanced semiconductor processes, the proper integrated mixer circuit topology with the highest overall performance can be devised and implemented.

REFERENCES
[1] D. Jakonis, K. Folkesson, J. Dabrowski, P. Eriksson, C. Svensson, "A 2.4-GHz RF Sampling Receiver Front-End in 0.18um CMOS", IEEE Journal of Solid-State Circuits, Volume 40, Issue 6, June 2005, pp. 1265-1277.
[2] C. Ruppel, W. Ruile, G. Schall, K. Wagner, and O. Manner, "Review of Models for Low-Loss Filter Design and Applications," IEEE Ultrasonics Symp. Proc. 1994, pp. 313-324.
[3] H. T. Ahn, and D. J. Allstot, "A 0.5-8.5-GHz fully differential CMOS distributed amplifier," IEEE J. Solid-State Circuits, vol. 37, no. 8, pp. 985-993, Aug 2002.
[4] B. Kleveland, C. H. Diaz, D. Vook, L. Madden, T. H. Lee, and S. S. Wong, "Exploiting CMOS Reverse Interconnect Scaling in Multigigahertz Amplifier and Oscillator Design," IEEE J. Solid-State Circuits, vol. 36, no. 10, pp. 1480-1488, Oct 2001.
[5] M. Yamaguchi, and K.-I. Arai, "Current status and future prospects of RF integrated inductors", J. Magn. Soc. Jpn., vol. 25, no. 2, pp. 59-65, 2001.
[6] N. Ratier, M. Bruniaux, S. Galliou, R. Brendel, J. Delporte, "A very high speed method to simulate quartz crystal oscillator", Proc. of the 19th EFTF, Besançon, March 2005, to be published.
[7] J. Park, C.-H. Lee, B. Kim, and J. Laskar, "A low flicker noise CMOS mixer for direct conversion receivers," presented at the IEEE MTT-S Int. Microw. Symp., Jun. 2006.
[8] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits. Cambridge, United Kingdom: Cambridge University Press, 2002.
[9] D. Pienkowski, R. Kakerow, M. Mueller, R. Circa, and G. Boeck, "Reconfigurable RF Receiver Studies for Future Wireless Terminals," Proceedings of the European Microwave Association, 2005, vol. 1, June 2005.
[10] S. Camerlo, "The Implementation of ASIC Packaging, Design, and Manufacturing Technologies on High Performance Networking Products," 2005 Electronic Components and Technology Conference Proceedings, June 2005, pp. 927-932.
[11] J. Grad, J. Stine, "A Standard Cell Library for Student Projects", Technical Report, Illinois Institute of Technology, 2002, http://www.ece.iit.edu/ cad/scells
[12] D. Stone, J. Schroeder, R. Kaplan, and A. Smith, "Analog CMOS Building Blocks for Custom and Semicustom Applications", IEEE JSSC, Vol. SC-19, No. 1, February, 1984.
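
As a numerical cross-check of the input matching conditions quoted in the LNA discussion, Ls = RsCgs/gm and Lg = (1 - ω0²LsCgs)/(ω0²Cgs), the following minimal Python sketch computes both inductors and then evaluates Zin at the design frequency. The device values (gm = 40 mS, Cgs = 200 fF, f0 = 2.4 GHz) and all function names are illustrative assumptions, not figures taken from the paper.

```python
import math


def matching_inductors(rs, gm, cgs, f0):
    """Return (Ls, Lg) in henries for a resistive input Rs at frequency f0."""
    w0 = 2.0 * math.pi * f0
    ls = rs * cgs / gm                                  # sets Re(Zin) = Rs
    lg = (1.0 - w0**2 * ls * cgs) / (w0**2 * cgs)       # cancels Im(Zin) at w0
    return ls, lg


def input_impedance(gm, cgs, ls, lg, f):
    """Zin of the inductively degenerated stage (Cgd, gds, Cpad neglected)."""
    w = 2.0 * math.pi * f
    real = gm * ls / cgs
    imag = (w**2 * (lg + ls) * cgs - 1.0) / (w * cgs)
    return complex(real, imag)


# Assumed example values, chosen only for illustration.
RS, GM, CGS, F0 = 50.0, 0.04, 200e-15, 2.4e9

LS, LG = matching_inductors(RS, GM, CGS, F0)
ZIN = input_impedance(GM, CGS, LS, LG, F0)

print(f"Ls = {LS * 1e9:.3f} nH, Lg = {LG * 1e9:.3f} nH")
print(f"Zin(f0) = {ZIN.real:.3f} + j{ZIN.imag:.3e} ohm")
```

With these assumed values Lg comes out in the tens of nanohenries against a fraction of a nanohenry for Ls, which is in line with the remark that Lg has to be placed off-chip.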
Abstract – In computer vision systems, image feature separation is a very difficult and important step. The efficient and powerful approach is to do unsupervised clustering of the resulting data set. This paper presents the mapping of the unsupervised histogram peak-climbing clustering algorithm to a novel high-speed architecture suitable for VLSI implementation and real-time performance. It is the first special-purpose architecture that has been proposed for this important problem of clustering massive amounts of data, which is a very computationally intensive task, and the performance is improved by making the architecture truly systolic. The architecture has also been prototyped using a Xilinx FPGA development environment.

I. INTRODUCTION

into the image domain results in the desired segmentation.

Components of a Clustering Task

Typical pattern clustering activity involves the steps: pattern representation (optionally including feature extraction and/or selection), definition of a pattern proximity measure appropriate to the data domain, clustering or grouping, data abstraction (if needed), and assessment of output (if needed) [4], as shown in Fig.1.
choose the best fitting features and/or clustering method based on problem domain [6], type of imagery, or lighting conditions.

N → dimensions of the features;
CS(k) → length of the histogram cell in the kth dimension;
fmax(k) → maximum value of the kth dimension of the features;
fmin(k) → minimum value of the kth dimension of the M features;
Q → total number of quantization levels for each dimension of the N-dimensional histogram;
dk → index for a histogram cell in the kth dimension associated with a given feature f.
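
The quantities defined above can be tied together in a short Python sketch of histogram generation. The formulas themselves are not in the surviving text, so the sketch assumes uniform cells, CS(k) = (fmax(k) - fmin(k))/Q and dk = floor((f(k) - fmin(k))/CS(k)), with the top edge clamped into the last cell. The function names and the sample features are hypothetical.

```python
from collections import Counter


def cell_sizes(features, q):
    """Return (fmin, cs): per-dimension minimum and assumed cell length CS(k)."""
    n = len(features[0])
    fmin = [min(f[k] for f in features) for k in range(n)]
    fmax = [max(f[k] for f in features) for k in range(n)]
    # Fall back to a unit cell when a dimension has zero range.
    cs = [((fmax[k] - fmin[k]) / q) or 1.0 for k in range(n)]
    return fmin, cs


def cell_index(f, fmin, cs, q):
    """Return the tuple (d_1, ..., d_N) of cell indices for one feature f."""
    return tuple(
        min(int((f[k] - fmin[k]) / cs[k]), q - 1)   # clamp f = fmax into cell Q-1
        for k in range(len(f))
    )


def build_histogram(features, q):
    """Count how many features fall into each N-dimensional histogram cell."""
    fmin, cs = cell_sizes(features, q)
    return Counter(cell_index(f, fmin, cs, q) for f in features)


# Four hypothetical 2-D features (N = 2, M = 4) quantized to Q = 4 levels.
FEATURES = [(0.0, 0.0), (0.1, 0.2), (0.9, 1.0), (1.0, 0.9)]
HIST = build_histogram(FEATURES, q=4)
print(dict(HIST))
```

A sparse counter is used here because most of the Q^N cells stay empty; a hardware implementation would more likely use a fixed memory array indexed by the dk values.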
B. Peak-climbing Approach
A. Histogram Generation
A. Overall Steps
V. FPGA IMPLEMENTATION
REFERENCES
locations might contain matching data, but no checking is done for this. Storage for the multi-cycle CAM can be either in distributed RAM (registers) or block RAM.

Word (M-1 bits)                     Bit
Last data word of previous entry    0
First tag word                      1
Second tag word                     0
Third tag word                      0
First data word                     0
Second data word                    0
Third data word                     0
Fourth data word                    0
First tag word of next entry        1
(each word is M bits wide; one entry has 3*(M-1) tag bits and 4*(M-1) data bits)

Fig.2. CAM Storage Organization

In a typical CAM, each memory word is divided into two parts: (1) a tag field and (2) an address field. Each tag field is associated with one comparator. Each comparator compares the associated tag with the input tag bits, and if a match occurs, the corresponding data bits are driven out of the memory. Although this is fast, it is not suitable for applications with a very wide tag or data width. Wide tags lead to large comparators, which are area inefficient, power hungry, and often slow.

[Figure: input address 01101 applied to CAM entries 0110X (location 01), 011XX (location 10) and 10011 (location 11); a decoder addresses a RAM holding port B (01), port C (10) and port D (11); the output is port B]

Fig. 4. CAM based implementation of the routing table.

All four entries in the table are 5-bit words, with the don't care bit "X" matching both a 0 and a 1 in that position. Due to the "X" bits, the first three entries in the table represent a range of input addresses, i.e., entry 1 maps all addresses in the range 10100 to 10111 to port A. The router searches this table for the destination address of each incoming packet, and selects the appropriate output port.

For example, if the router receives a packet with the destination address 10100, the packet is forwarded to port A. In the case of the incoming address 01101, the address lookup matches both entry 2 and entry 3 in the table. Entry 2 is selected since it has the fewest "X" bits, or alternatively it has the longest prefix, indicating that it is the most direct route to the destination. This lookup method is called longest-prefix matching.

The figure illustrates how a CAM accomplishes address lookup by implementing the routing table shown in Table I. On the left of the figure, the packet destination-
address of 01101 is the input to the CAM. As in the table, two locations match, with the (priority) encoder choosing the upper entry and generating the match location 01, which corresponds to the most direct route.

This match location is the input address to a RAM that contains a list of output ports, as depicted in the figure. A RAM read operation outputs the port designation, port B, to which the incoming packet is forwarded. We can view the match location output of the CAM as a pointer that retrieves the associated word from the RAM. In the particular case of packet forwarding, the associated word is the designation of the output port. This CAM/RAM system is a complete implementation of an address-lookup engine for packet forwarding.

III. THE RECONFIGURABLE CONTENT ADDRESSABLE MEMORY (RCAM)

The Reconfigurable Content Addressable Memory or RCAM makes use of run-time reconfiguration to efficiently implement a CAM circuit. Rather than using the FPGA flip-flops to store the data to be matched, the RCAM uses the FPGA Look up Tables or LUTs. Using LUTs rather than flip-flops results in a smaller, faster CAM. The approach uses the LUT to provide a small piece of CAM functionality. In the figure, a LUT is loaded with data which provides "match 5" functionality. That is, whenever the binary encoded value "5" is sent to the four LUT inputs, a match signal is generated.

[Figure: 4-input LUT loaded with the contents 0000 0100 0000 0000]

Fig. 5. A simple Look up Table.

Note that using a LUT to implement CAM functionality, or any functionality for that matter, is not unique. An N-input LUT can implement any arbitrary function of N inputs, including a CAM. This circuit demonstrates the ability to embed a mask in the configuration of a LUT, permitting arbitrary disjoint sets of values to be matched within the LUT. This function is important in many matching applications, particularly networking. This approach can be used to provide matching circuits such as match all or match none or any combination of possible LUT values.

One currently popular use for CAMs is in networking. Here data must be processed under demanding real-time constraints. As packets arrive, their routing information must be processed. In particular, destination addresses, typically in the form of 32-bit Internet Protocol (IP) addresses, must be classified. This typically involves some type of search. Current software based approaches rely on standard search schemes such as hashing.

This results in savings not only in the cost of the processor itself, but in other areas such as power consumption and overall system cost. In addition, an external CAM provides networking hardware with the ability to achieve packet processing in essentially constant time. Provided all elements to be matched fit in the CAM circuit, the time taken to match is independent of the number of items being matched.

[Figure: a row of LUTs (L) feeding AND gates (&)]

Fig.6. IP Match circuit using the RCAM.

The figure above shows an example of an IP Match circuit constructed using the RCAM approach. Note that this example assumes a basic 4-input LUT structure for simplicity. Other optimizations, including using special-purpose hardware such as carry chains, are possible and may result in substantial circuit area savings and clock speed increases.

This circuit requires one LUT input per matched bit. In the case of a 32-bit IP address, this circuit requires 8 LUTs to provide the matching, and three additional 4-input LUTs to provide the ANDing for the MATCH signal. This basic 32-bit matching block may be replicated in an array to produce the CAM circuit.

IV. FINITE STATE MACHINES

If a combinational logic circuit is an implementation of a Boolean function, then a sequential circuit can be considered an implementation of a finite state machine.

The goal of an FSM is not accepting or rejecting things, but generating a set of outputs, given a set of inputs. They describe how the inputs are being processed, based on input and state, to generate outputs. This FSM uses only entry actions, that is, output depends only on the state.

In a Moore finite state machine, the output of the circuit is dependent only on the state of the machine and not its inputs. This FSM uses only input actions,
that is, output depends on input and state. The usage of this machine results in the reduction in the number of states.

In a Mealy finite state machine, the output is dependent both on the machine state as well as on the inputs to the finite state machine. Notice that in this case, outputs can change asynchronously with respect to the clock.

One of the best ways of describing a Mealy finite state machine is by using two always statements, one for describing the sequential logic, and one for describing the combinational logic (this includes both next state logic and output logic). It is necessary to do this since any changes on the inputs directly affect the outputs, which are described by the combinational logic; the state of the machine is held in a reg variable.

V. THE AUTOMATED TELLER MACHINES

An ATM is also known, in English, as Automated Banking Machine, Money Machine, Bank Machine.

A. Usage

On most modern ATMs, the customer identifies him or herself by inserting a plastic card with a magnetic stripe or a plastic smartcard with a chip, that contains his or her card number and some security information, such as an expiration date or CVC (CVV). The customer then verifies their identity by entering a pass code, often referred to as a Personal Identification Number.

B. Types of ATMs

Mono-function devices are those in which only one type of mechanism for financial transactions is present (such as cash dispensing or statement printing). Multi-function devices incorporate multiple mechanisms to perform multiple services (such as accepting deposits, dispensing cash, printing statements, etc.), all within a single footprint. There is now a capability (in the U.S. and Europe, at least) for no-envelope deposits with a unit called a Batch- or Bunch-Note Acceptor, or BNA, that will accept up to 40 bills at a time. There is another unit called a Cheque Processing Machine, or Module (CPM), that will accept your cheque, take a picture of both sides, read the magnetic ink code line which is at the bottom of every cheque, read the amount written on the cheque, and capture your cheque into a bin, giving you instant access to your money, if your account allows.

There are two types of ATM installations, on and off premise. On premise ATMs are typically more advanced, multi-function machines that complement an actual bank branch's capabilities and are thus more expensive. Off premise machines are deployed by financial institutions and also ISOs (or Independent Sales Organizations) where there is usually just a straight need for cash, so they typically are the cheaper mono-function devices.

C. Hardware

An ATM typically is made up of the following devices: CPU (to control the user interface and transaction devices); magnetic and/or chip card reader (to identify the customer); PIN pad (similar in layout to a touch tone or calculator keypad), often manufactured as part of a secure enclosure; secure crypto processor, generally within a secure enclosure; display (used by the customer for performing the transaction); function key buttons (usually close to the display) or a touch screen (used to select the various aspects of the transaction); record printer (to provide the customer with a record of their transaction); vault (to store the parts of the machinery requiring restricted access); housing (for aesthetics and to attach signage to); Cheque Processing Module; Batch Note Acceptor. Recently, due to heavier computing demands and the falling price of computer-like architectures, ATMs have moved away from custom hardware architectures using microcontrollers and/or application-specific integrated circuits to adopting a hardware architecture that is very similar to a personal computer.

Many ATMs are now able to use operating systems such as Microsoft Windows and Linux. Although it is undoubtedly cheaper to use commercial off-the-shelf hardware, it does make ATMs vulnerable to the same sort of problems exhibited by conventional computers.

D. Future

ATMs were originally developed as just cash dispensers; they have evolved to include many other bank-related functions. ATMs can also act as an advertising channel for companies to advertise their own products or third-party products and services.

VI. IMPLEMENTATION

[Figure: one server connected to three ATMs]

Fig.9. Current ATM System
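
The CAM/RAM address lookup described earlier for the routing table can be mimicked in software with a short Python sketch. It models each CAM word as a string over 0, 1 and X, lets a priority encoder pick the first matching location, and uses that location to index the RAM of output ports. The pattern 101XX for entry 1 is inferred from the stated range 10100 to 10111; the function and variable names are illustrative assumptions, and a real CAM compares all words in parallel rather than in a loop.

```python
# Routing table: entries are ordered so that longer prefixes come first,
# which lets a simple priority encoder implement longest-prefix matching.
CAM = ["101XX", "0110X", "011XX", "10011"]
RAM = ["port A", "port B", "port C", "port D"]


def word_matches(pattern, address):
    """True if every pattern bit equals the address bit or is a don't care."""
    return len(pattern) == len(address) and all(
        p in ("X", a) for p, a in zip(pattern, address)
    )


def match_lines(address):
    """One match line per CAM word (evaluated in parallel in hardware)."""
    return [word_matches(pattern, address) for pattern in CAM]


def lookup(address):
    """Return (match location, output port), or None when nothing matches."""
    for location, hit in enumerate(match_lines(address)):
        if hit:                       # priority encoder: upper entry wins
            return location, RAM[location]
    return None


print(match_lines("01101"))   # entries 2 and 3 both match
print(lookup("01101"))        # upper entry selected, port B
print(lookup("10100"))        # port A
```

The correctness of the result depends on the table being stored in order of decreasing prefix length, since the priority encoder only knows about position, not about the number of X bits.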
[Figure: block diagram with SERVER, DPD and four RCAM blocks]
Fig.11.Block Diagram
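
The LUT-based matching of Section III can be sketched in Python as follows. Each 4-input LUT is modelled as a 16-entry truth table, eight such tables cover a 32-bit word, and three further AND functions (two 4-input stages and one combining stage) form the MATCH signal, as described for the IP Match circuit. The helper names, the nibble ordering (most significant first) and the example address 192.168.1.10 are assumptions made for illustration only.

```python
def lut_config(values):
    """16-entry truth table of a 4-input LUT that outputs 1 for each value."""
    table = [0] * 16
    for v in values:
        table[v] = 1
    return table


def nibble(word, i):
    """Return the i-th 4-bit group of a 32-bit word, most significant first."""
    return (word >> (28 - 4 * i)) & 0xF


def build_match_circuit(ip):
    """Configure eight LUTs so that together they match one 32-bit value."""
    return [lut_config([nibble(ip, i)]) for i in range(8)]


def ip_match(luts, address):
    """Evaluate the eight match LUTs and the three-LUT AND tree."""
    outs = [luts[i][nibble(address, i)] for i in range(8)]
    upper = all(outs[:4])          # first 4-input AND LUT
    lower = all(outs[4:])          # second 4-input AND LUT
    return upper and lower         # third LUT combines both halves


MATCH_5 = lut_config([5])                  # the "match 5" LUT of Fig. 5
LUTS = build_match_circuit(0xC0A8010A)     # assumed address 192.168.1.10

# Embedding a mask: reconfigure the last LUT to accept any value.
MASKED = list(LUTS)
MASKED[7] = lut_config(range(16))

print(MATCH_5)
print(ip_match(LUTS, 0xC0A8010A), ip_match(LUTS, 0xC0A8010B))
print(ip_match(MASKED, 0xC0A8010B))
```

Changing what is matched only means rewriting truth tables, which is the software analogue of the run-time reconfiguration the RCAM relies on.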
REFERENCES
Abstract: The paper describes the architecture and design of the pipelined execution unit of a 32-bit RISC processor. Organization of the blocks in different stages of the pipeline is done in such a way that the pipeline can be clocked at high frequency. Control and forwarding of 'data flow' among the stages are taken care of by dedicated hardware logic. Different blocks of the execution unit and the dependencies among them are explained in detail with the help of relevant block diagrams. The design has been modeled in VHDL and the functional verification policies adopted for it have been described thoroughly. Synthesis of the design is carried out at 0.13-micron standard cell technology.

Keywords: ALU, Pipeline, RISC, VLSI, Multistage

I. INTRODUCTION

The worldwide development of high-end, sophisticated digital systems has created a huge demand for high speed, general-purpose processors. The performance of processors has increased exponentially since their launch in 1970. Today's high performance processors have a significant impact on the commercial marketplace. This high growth rate of the processors is possible due to dramatic technical advances in computer architecture, circuit design, CAD tools and fabrication methods. Different processor architectures have been developed and optimized to achieve better performance. RISC philosophy [1] has attracted microprocessor designers to a great extent. Most computation engines used these days in different segments like server, networking and signal processing are based on RISC philosophy [1]. To cater to the needs of multi-tasking, multi-user applications in high-end systems, a 32-bit generic processor architecture, CORE-I, has been designed based on RISC philosophy [2].

II. PROCESSOR OVERVIEW

CORE-I is a 32-bit RISC processor with a 6 stage pipelined execution unit based on load-store architecture. The ALU supports both single-precision and double-precision floating-point operations. The CORE-I Register-File has 45 General-Purpose Registers (GPRs), 19 Special-Purpose Registers (SPRs) and 64 Double-Precision-Floating-Point-Unit (DPFPU) registers. The Instruction Set Architecture (ISA) has a total of 136 instructions. The processor has two modes of operation, user mode and supervisor mode (protected mode). The Dependency-Resolver detects and resolves data hazards within the pipeline. The execution unit is interfaced with an instruction channel and a data channel. Both channels operate in parallel and communicate with external devices through a common Bus Interface Unit (BIU). The instruction channel has a 128-bit Instruction Line Buffer, a 64-KB Instruction Cache [1], a Memory Management Unit (MMU) [1] and a 128-bit Prefetch Buffer [3]. The data channel has a 128-bit Data Line Buffer, a 64-KB Data Cache, an MMU and a Swap Buffer [3].

The Pre-fetch Buffer and Swap Buffer are introduced to reduce memory latency during instruction fetch and data cache misses respectively. The external data flow through the instruction channel and data channel is controlled by the respective controller-state-machine. The processor also has seven interrupt-request (IRQ) inputs and one non-maskable interrupt (NMI) input. The Exception Processing Unit controls the interrupts and on-chip exceptions.

III. EXECUTION UNIT

The CORE-I execution unit contains an ALU unit, a branch/jump unit, a single-precision floating-point unit (SPFPU) and a double-precision floating-point unit (DPFPU) [4]. The execution unit is implemented in six pipeline stages - IF (Instruction Fetch), ID (Instruction Decode), DS (Data Select), EX (Execution Unit), MEM (Memory) and WB (Write Back). The main blocks in different stages of the pipeline are shown in Figure 1.

A. Program Counter Unit (PCU):

The Program Counter Unit provides the value of the Program Counter (PC) in every cycle for fetching the next instruction from the Instruction Cache. In every cycle, it checks for the presence of any branch instruction in MEM stage, jump instruction in EX stage, any interrupt or on-chip exceptions and the presence of an RTE (return from exception) instruction [2] inside the pipeline. In the absence of any one of
and DS/EX stage. The figure below shows the stall enable and Freeze signal generation blocks in the ID, DS and EX stages. The EX-MEM-WB stages are not stalled, so the EX result moves to MEM and is forwarded to DS.

D.II Forwarding

In the CORE-I architecture data are forwarded from the MEM, WB, WB+ and WB++ stages to the DS stage. WB+ and WB++ are stages used for forwarding only and contain the 'flopped' data of the previous stage for the purpose of forwarding. Generation of all the control signals for data forwarding as well as the actual transfer of data is time critical. The uniqueness of the CORE-I data forwarding is that all the control signals used for the forwarding multiplexers are generated one clock cycle earlier. They are then latched and used in the DS stage as shown in the figure below.

The consequences of the early select signal generation are -

1. The forwarding instruction has to be checked one stage before the actual stage of forwarding. For example, to forward data from the MEM stage, the instruction in the EX stage has to be checked with the receiving instruction.

2. The receiving instruction also has to be compared with the forwarding instruction one stage before. The receiving stage is always DS. In most situations the receiving instruction in DS was in ID one clock cycle earlier, so the ID stage instruction is compared with the forwarding instruction. But in the case of successive dependency, when the IF, ID and DS stages are stalled for one cycle, the receiving instruction remains in the DS stage itself before the forwarding. In that case the DS instruction is compared with the forwarding instruction. Because of the time constraint, the Dependency Resolver generates the control signals for both cases and finally the correct one is selected in the DS stage.

3 cycles and for other instructions like 32x32 multiplication, single-precision and double-precision floating-point operations, the number of cycles is programmable. It can be programmed through instructions or by setting values on dedicated input ports. When the instruction reaches the EX stage, the pipeline is frozen for the required number of cycles.

Fig3. Forwarding Scheme

F. Branch and Jump Unit:

CORE-I supports 14 conditional branch instructions, 2 jump instructions and 1 return instruction. Jump and return instructions are scheduled in the EX stage, i.e. the PC value is updated when the instruction is in the EX stage. For the conditional branch instructions, however, the condition is evaluated in the EX stage and the PC value is updated in the MEM stage. All these instructions have 3 delay slots [1].

G. Exception Processing Unit:

CORE-I has seven external IRQs, 1 NMI, and 1 PIO interrupt [2]. In addition to these external interrupts, the on-chip interrupt controller serves SIO and Timer interrupts, and on-chip exceptions due to arithmetic operations, bus errors and illegal opcodes. CORE-I also supports software interrupts with TRAP instructions. The Interrupt Service Routine (ISR) address for the interrupt is calculated in the EX stage and fed to the PCU. The return PC value and processor status word are saved in the interrupt stack pointer before transferring control to the routine. If higher priority interrupts arrive during exception processing, the interrupt controller serves the higher priority one.
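
The early generation of forwarding selects described in section D.II can be illustrated with a small Python model. It checks the instruction that sits one stage before each forwarding stage (EX for MEM, MEM for WB, WB for WB+, WB+ for WB++), computes a select for both candidate receivers (the instruction in ID and the one in DS), and then picks the right one depending on whether the front of the pipeline was stalled. The priority order (youngest producer wins), the dictionary representation and all names are assumptions of this sketch; the paper does not give the actual logic equations.

```python
# (stage checked one cycle early, stage that supplies the data one cycle later)
EARLY_CHECK = [("EX", "MEM"), ("MEM", "WB"), ("WB", "WB+"), ("WB+", "WB++")]


def early_select(dest_regs, src_reg):
    """Select a forwarding source for one operand, one cycle ahead of use.

    dest_regs maps a stage name to the destination register of the
    instruction currently in that stage (None if it writes no register).
    """
    if src_reg is None:
        return "REGFILE"
    for checked_stage, forward_stage in EARLY_CHECK:   # youngest result first
        if dest_regs.get(checked_stage) == src_reg:
            return forward_stage
    return "REGFILE"


def ds_mux_select(dest_regs, id_src, ds_src, front_stalled):
    """Generate both selects, as the Dependency Resolver does, and pick one."""
    select_if_from_id = early_select(dest_regs, id_src)   # normal flow
    select_if_in_ds = early_select(dest_regs, ds_src)     # IF/ID/DS stalled
    return select_if_in_ds if front_stalled else select_if_from_id


# Hypothetical snapshot: r3 is produced in EX and (older) in WB, r5 in MEM.
DEST = {"EX": 3, "MEM": 5, "WB": 3, "WB+": None}

print(ds_mux_select(DEST, id_src=3, ds_src=5, front_stalled=False))
print(ds_mux_select(DEST, id_src=3, ds_src=5, front_stalled=True))
```

Computing both selects and choosing late keeps the comparison logic off the critical path of the DS stage, which appears to be the motivation given in the text.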
V. CONCLUSION
REFERENCES
Abstract - The embedded remote electronic measuring system board, with its interface modules, uses an embedded board to replace a computer. The embedded board has the advantages of being portable, operating in real time, being low cost and being programmable with an on-board operating system.

The design provides step by step functions to help the user operate it, such as keying in the waveform parameters with the embedded board keyboard, providing test waveforms and then connecting the circuit-under-test to the embedded electronic measurement system. This design can also display the output response waveform measured by the embedded measurement system (from the circuit-under-test) on the embedded board LCM (Liquid Crystal Monitor).

I. INTRODUCTION

The initially designed remote electronic measurement system used interface chips to design and implement the interface cards for the functions of the traditional electronic instruments of a laboratory, such as a power supply, a signal generator and an oscilloscope. It integrated communication software to control the communication between the computer and the interface card by means of an ISA (Industry Standard Architecture) bus to transmit or receive the waveform data. By utilizing widely used software and hardware, the users at the client site convey the waveform data through the computer network and can not only receive the input waveforms from the clients, but can also upload the measurement waveforms to a server. This remote electronic measurement system has some disadvantages: it requires operation with a computer and it is not portable.

The intended embedded electronic measurement system overcomes these disadvantages by replacing the computer with the embedded board, since it has the advantages of portability and real time operation, is low cost, and is programmable.

In this design the users need only key in the waveform and voltage parameters using the embedded keyboard. Then the embedded remote measurement system will output the voltage waveform into the circuit under test. The users can then observe the waveforms by means of the oscilloscope interface of the embedded board. If the users are not satisfied with this waveform, they can set up another waveform. The dual channel LCM (Liquid Crystal Monitor) provides a comparison between the input and response waveforms to determine the characteristics of the circuit under test. The network function can also transfer the measured waveforms to a distant server. Our design supports different applications, in which any electronic factory can build the design house and the product lines at different locations. The designer can measure the electronic products through the Internet and the necessary interfaces. If there are any mistakes in the circuit boards on the production line, with our design the engineers can solve these problems through network communications, which will reduce the time and cost.

II. HARDWARE OF THE EMBEDDED REMOTE ELECTRONIC MEASUREMENT SYSTEM

The embedded remote electronic measurement system includes a power supply, a signal generator and an oscilloscope. Fig. 1 shows the embedded remote measurement system architecture.

Fig 1. The embedded remote electronic measurement system architecture

Fig. 2 shows the hardware interface modules of the embedded remote electronic measurement system. We can divide these interface modules into three parts: the ADC, the DAC and the control modules. The ADC module serves the oscilloscope and is used mainly to convert the analog signal to the digital format for the measurement waveform. The function of the DAC module is to convert the digital signal to analog signal
programming data for both the devices and the interconnections among them can make a difference in shortening the design time. The problems that we must solve include fast and easy implementation of a prototype of the embedded system, and validation of hardware and software communication (synchronization between hardware and software heavily impacts the performance of the final product).
Our approach is based on the following basic ideas:
1. Use of a programmable board, a sort of universal printed circuit board providing re-programmable connections between components. With a programmable board as the prototyping vehicle, the full potential of the FPGA can be exploited: FPGA programming is no longer affected by constraints such as a fixed pin assignment due to a custom printed board or a wire-wrap prototype.
2. Debugging the prototype by loading the code on the target emulator and making it run, programming the FPGA, providing signals to the board via a pattern generator and analyzing the output signals via a logic analyzer. This approach can be significantly improved by using debugging tools for both software and hardware in order to execute step by step the software part and the clock-triggered hardware part.
We argued that prototyping is essential to validate an embedded system. However, to take full advantage of the prototyping environment, it is quite useful to simulate the design as much as feasible at all levels of the hierarchy. Simulation is performed at different stages along the design flow. At the specification level we use an existing co-simulation environment for heterogeneous systems, which provides an interface to a well-developed set of design aids for digital signal processing.
Prototyping can be used to gain a better understanding of the kind of product required in the early stages of system development, where several different sketch designs can be presented to users and to members of the development team for critique. The prototype is thrown away in the end, although it is an important resource during the product's development into a working model. The prototype gives the designers a functional working model of their design so they can work with the design and identify some of its possible pros and cons before it is actually produced. The prototype also allows the user to be involved in testing design ideas. Prototyping can resolve uncertainty about how well a design fits the user's needs. It helps designers to make decisions by obtaining information from users about the necessary functionality of the system, user help needs, a suitable sequence of operations, and the look of the interface. It is important that the proposed system have the necessary functionality for the tasks that users may want to perform, anywhere from gathering information to task analysis. Information on the sequence of operations can tell the designers what users need to interact successfully with the system. Exchanges can be fixed and supportive, but potentially constraining, or free and flexible. In interactive systems design, prototypes are critical in the early stages of design, unlike other fields of engineering design where design decisions can initially be carried out analytically without relying on a prototype. In systems design, prototyping is used to create interactive systems designs where the completeness and success of the user interface can depend entirely on the testing.
Embedded systems are found everywhere. An embedded system is a specialized computer system that is part of a larger system or machine. Typically, an embedded system is housed on a single microprocessor board with the programs stored in ROM. Virtually all appliances that have a digital interface (watches, microwaves, VCRs, cars) utilize embedded systems. Some embedded systems include an operating system, but many are so specialized that the entire logic can be implemented as a single program.
In order to deliver correct-the-first-time products with complex system requirements and time-to-market pressure, design verification is vital in the embedded system design process. A possible choice for verification is to simulate the system being designed. Since debugging of real systems has to take into account the behavior of the target system as well as its environment, runtime information is extremely important. Therefore, static analysis with simulation methods is too slow and not sufficient, and simulation cannot reveal deep issues in the real physical system. A hardware prototype is a faithful representation of the final design, guaranteeing its real-time behavior. It is also the basic tool to find deep bugs in the hardware. For these reasons, it has become a crucial step in the whole design flow. Traditionally, a prototype is designed similarly to the target system, with all the connections fixed on the PCB (printed circuit board).
As embedded systems are getting more complex, the need for thorough testing becomes increasingly important. Advances in surface-mount packaging and multiple-layer PCB fabrication have resulted in smaller boards and more compact layout, making traditional test methods, e.g., external test probes and "bed-of-nails" test fixtures, harder to implement. As a result, acquiring signals on boards, which is beneficial to hardware testing and software development, becomes infeasible, and tracking bugs in the prototype becomes increasingly difficult. Thus the prototype design has to take account of testability. If errors on the prototype are detected, such as misconnections of signals, it could be impossible to correct them on the multiple-layer PCB with all the components mounted. All these would lead to another round of prototype fabrication, extending development time and increasing cost.
Besides testability, it is important to maintain high flexibility during development of the prototype, as design specification changes are common. Nowadays
complex systems are often not built from scratch but are assembled by reusing previously designed modules or off-the-shelf components such as processors, memories or peripheral circuitry in order to cope with more aggressive time-to-market constraints. Following the top-down design methodology, a lot of effort in the design process is spent on decomposing the customers' requirements into proper functional modules and interfacing them to compose the target system. Some previous research works have suggested that an FPLD (field programmable logic device) could be added to the final design to provide flexibility, as FPLDs can offer programmable interconnections among their pins and many more advantages. However, extra devices may increase production cost and power dissipation, weakening the market competitiveness of the target system. To address these problems, there are also suggestions that FPLDs could be used in the hardware prototype as an intermediate approach. Moreover, modules on the prototype cannot be reused directly. In industry, there have been companies that provide commercial solutions based on FPLDs for rapid prototyping. Their products are aimed at SOC (system on a chip) functional verification instead of embedded system design and development. It also encourages concurrent development of different parts of system hardware as well as module reuse.

Fig. 1. FPGA architecture

II. THE DESIGN OF A RAPID PROTOTYPING PLATFORM
A. Overview
ARM based embedded processors are widely used in embedded systems due to their low cost, low power consumption and high performance. An ARM based embedded processor is a highly integrated SOC including an ARM core with a variety of different system peripherals. Many ARM based embedded processors adopt an architecture similar to the one shown in Fig. 1. The integrated memory controller provides an external memory bus interface supporting various memory chips and various operation modes (synchronous, asynchronous, burst modes). It is also possible to connect bus-extended peripheral chips to the memory bus. The on-chip peripherals may include an interrupt controller, OS timer, UART, I2C, PWM, AC97, etc. Some of these peripheral signals are multiplexed with general-purpose digital I/O pins to provide flexibility to the user, while other on-chip peripherals, e.g. USB host/client, may have dedicated peripheral signal pins. By connecting or extending these pins, the user may use these on-chip peripherals. When the on-chip peripherals cannot fulfill the requirements of the target system, extra peripheral chips have to be added.
To enable rapid prototyping, the platform should be capable of quickly assembling parts of the system into a whole through flexible interconnection. Our basic idea is to insert a reconfigurable interconnection module composed of FPLDs into the system to provide adjustable connections between signals, and to provide testability as well. To determine where to place this module, we first analyze the architecture of the system. The embedded system shown in Fig. 2 can be divided into two parts. One is the minimal system composed of the embedded processor and memory devices. The other is made up of peripheral devices extended directly from on-chip peripheral interfaces of the embedded processor, and specific peripheral chips and circuits extended by the bus. The minimal system is the core of the embedded system, determining its processing capacity. Embedded processors are now routinely available at clock speeds of up to 400 MHz, and will climb still further. The speed of the bus connecting the processor and the memory chips exceeds 100 MHz. At 100 MHz a bus cycle lasts only 10 ns, so with the pin-to-pin propagation delay of an FPLD on the order of a few nanoseconds, inserting such a device into this path will greatly impair the system performance. The peripherals enable the embedded system to communicate and interact with its environment in the real world. In general, peripheral circuits are highly modularized and independent of each other, and there is hardly any need for flexible connections between them.
Here we apply a reconfigurable interconnection module to replace the connections between the microcomputer and the peripherals, which enables flexible adjustment of connections to facilitate interfacing extended peripheral circuits and modules. As the speed of the data communication between the peripherals and the processor is much slower than that in the minimal system, the FPLD solution is feasible. Following this idea, we design the Rapid Prototyping Platform as shown in Fig. 2. We define the interface ICB between the platform and the embedded processor core board that holds the minimal system of the target embedded system. The interface IPB
REFERENCES
[1] S. Trimberger, “A Reprogrammable Gate Array and Applications,” Proc. IEEE, Vol. 81, No. 7, July 1993, pp. 1030-1041.
[2] Hauck, S, "The roles of FPGA’S in
Abstract-This paper presents the implementation of high throughput and low power FIR filtering IP cores. Multiple datapaths are utilized for high throughput, and low power is achieved through coefficient segmentation, block processing and combined segmentation and block processing algorithms. Also, a coefficient reduction algorithm is proposed for modifying the values and the number of non-zero coefficients used to represent the FIR digital pulse shaping filter response. With this algorithm, the FIR filter frequency and phase response can be represented with a minimum number of non-zero coefficients, thereby reducing the arithmetic complexity needed to get the filter output. Consequently, the system characteristics, i.e. power consumption, area usage, and processing time, are also reduced. The paper presents the complete architectural implementation of these algorithms for high performance applications. Finally, this FIR filter is designed and implemented in FPGA.

I. INTRODUCTION

One of the fastest growing areas in the computing industry is the provision of high throughput DSP systems in a portable form. With the advent of SoC technology and the intensive use of FIR filters in video and communication systems, high performance in speed, area and power consumption is demanded. Basically, digital filters are used to modify the characteristics of signals in the time and frequency domains and have been recognized as primary digital signal processing operations. For high performance low power applications, there is a continuous demand for DSP cores which provide high throughput while minimizing power consumption. Recently, more and more traditional applications and functionalities have been targeted to palm-sized devices, such as Pocket PCs and camera-enabled mobile phones with color screens. Consequently, not only is there a demand for high data processing capability for multimedia and communication purposes, but the requirement for power efficiency has also increased significantly.
Furthermore, power dissipation is becoming a crucial factor in the realization of parallel mode FIR filters. There is an increasing number of published techniques to reduce the power consumption of FIR filters. The authors in [1] utilize the differential coefficients method (DCM), which involves using various orders of differences between coefficients along with stored intermediate results, rather than using the coefficients themselves directly, for computing the partial products in the FIR equation.
To minimize the overhead while retaining the benefit of DCM, the differential coefficient and input method (DCIM) [2] and decorrelating (DECOR) [3] have been proposed. Another approach used in [4] is to optimize the word-lengths of input/output data samples and coefficient values. This involves using a general search based methodology, which is based on statistical precision analysis and the incorporation of cost/performance/power measures into an objective function through word-length parameterization. In [5], Mehendale et al. present an algorithm for optimizing the coefficients of an FIR filter so as to reduce power consumption in its implementation on a programmable digital signal processor.
This paper presents the implementation of high throughput and low power FIR filtering Intellectual Property (IP) cores. It shows their implementation for increased throughput as well as low power applications, through employing multiple datapaths. The paper studies the impact of parameterization, in terms of datapath parallelization, on the power/speed/area performance of these algorithms.

II. GENERAL BACKGROUND

Finite Impulse Response filters have been used in signal processing for applications such as ghost cancellation and channel equalization. FIR filtering, of which the output is described in Equation 1, is realized by a large number of adders, multipliers and delay elements.

$$Y[n] = \sum_{k=0}^{N-1} h[k]\, X[n-k] \tag{1}$$

where Y[n] is the filter output, X[n-k] is the input data, h[k] is the filter coefficient, and N is the number of coefficients. Direct form of a finite word length FIR filter generally begins with rounding or truncating the optimum infinite precision
B. Data Block-Processing
The main objective of block processing is to implement signal processing schemes with high inherent parallelism. A number of researchers have studied block processing methods for the development of computationally efficient high order recursive filters, which are less sensitive to roundoff error and coefficient accuracy. During filtering, data samples in fixed size blocks, L, are processed consecutively. This procedure reduces power consumption by decreasing the switching activity, by a factor depending on L, in the following: (1) the coefficient input of the multiplier, (2) the data and coefficient memory buses, (3) the data and coefficient address buses. Due to the successive change of both coefficient and data samples at each clock cycle, there is high switching activity within the multiplier unit of the datapath. This high switching activity can be reduced significantly if the coefficient input of the multiplier is kept unchanged and multiplied with a block of data samples.
Once a block of data samples is processed, a new coefficient is obtained and multiplied with a new block of data samples. However, this process requires a set of accumulator registers corresponding to the size of the data block. The previous results have shown that a block size of 2 provides the best

Fig. 5. The block processing algorithm with 2 data paths.

C. Combination Coefficient Segmentation and Block Processing Algorithm

The architectures of the coefficient segmentation and block processing algorithms can be merged together. This will reduce the switching activity at both coefficient and data inputs of the multiplier units within the data paths with only slight overhead in area. The algorithm commences by processing the coefficient set through the segmentation algorithm. The algorithm segments the coefficients into two primitive parts.
The first part, sk, is processed through a shifter and the remaining part mk is applied to the
multiplier input. The algorithm performs the segmentation by selecting a value of sk which leaves mk as the smallest positive number. This results in a significant reduction in the amount of switched capacitance. The resulting sk and mk values are then stored in the memory for filtering operations. The filtering operation commences by fetching the sk and mk values and applying these to the shifter and multiplier inputs respectively. Next, a block of L data samples (x0, x1, ..., xL-1) is fetched from the data memory and stored in the register file.
This is followed by applying the first data sample x0 in the register file to both shifter and multiplier units. The resulting values from both shifter and multiplier units are then summed together and the final result is added to the first accumulator. The process is repeated for all other data samples. The contents of the register file are updated with the addition of a single new data entry which will replace the first entry in the previous cycle.
This procedure reduces the switching activity at the coefficient inputs of the multiplier, since the same coefficient is used for all data samples in the block. In addition, fewer memory accesses to both data and coefficient memories are required, since coefficient and data samples are obtained through internal registers.

Registers R0, R1, ..., RL-1 respectively. This will form the first block of data samples.
5. Apply R0 to both the multiplier and shifter units. Add their results and the content of accumulator ACC0 together and store the final result into accumulator ACC0. Repeat this for the remaining data registers R1 to RL-1, this time using accumulators ACC1 to ACCL-1 respectively.
6. Get the multiplier part, m(N-2), and the shifter part, s(N-2), of the next coefficient, h(N-2), and apply these to the multiplier and shifter inputs respectively.
7. Update the data block formed in step (4) by getting the next data sample, x[n-(N-L-1)], and storing it in data register R0, overwriting the oldest data sample in the block.
8. Process the new data block as in step (5). However, start processing with R1, followed by R2, ..., RL-1, R0 in a circular manner. During this procedure use the accumulators in the same order as the data registers.
9. Process the remaining multiplier and shifter parts as in steps (6) to (8).
10. Get the first block of filter outputs y(n), y(n-1), ..., y(n-L) from ACC0, ACC1, ..., ACCL-1.
11. Increment n by L and repeat steps (1) to (10) to obtain the next block of filter outputs.
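The combined scheme described above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the paper's hardware architecture: coefficients are taken to be integers, the shifter part is assumed to be the largest power of two not exceeding the coefficient magnitude (so the multiplier part is the smallest non-negative remainder), and the sign is handled separately. The function names `segment`, `fir_direct` and `fir_block_segmented` are hypothetical. The sketch checks that holding one coefficient fixed across a block of L samples, with one accumulator per sample, gives the same outputs as the direct-form FIR sum.

```python
def segment(h):
    """Split integer coefficient h into (sign, shift, m) with
    |h| = (1 << shift) + m and m >= 0 as small as possible.
    Assumption: the shifter part is the largest power of two <= |h|."""
    if h == 0:
        return 0, 0, 0
    sign = 1 if h > 0 else -1
    mag = abs(h)
    shift = mag.bit_length() - 1
    return sign, shift, mag - (1 << shift)


def fir_direct(h, x):
    """Reference direct-form FIR: y[n] = sum_k h[k] * x[n-k] (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_block_segmented(h, x, L=2):
    """Block processing with segmented coefficients.
    For each block of L outputs, every coefficient is fetched once and
    applied to all samples of the block (one accumulator per output)."""
    segs = [segment(c) for c in h]
    y = [0] * len(x)
    for start in range(0, len(x), L):
        block = range(start, min(start + L, len(x)))
        acc = [0] * L                                  # ACC0 .. ACC(L-1)
        for k, (sign, shift, m) in enumerate(segs):    # coefficient held constant
            if sign == 0:
                continue
            for j, n in enumerate(block):
                if n - k < 0:
                    continue
                sample = x[n - k]
                shifted = sample << shift              # shifter path
                product = m * sample                   # multiplier path (small operand)
                acc[j] += sign * (shifted + product)
        for j, n in enumerate(block):
            y[n] = acc[j]
    return y


h = [3, -5, 12, 7]
x = [1, -2, 4, 0, 5, 3, -1]
assert fir_block_segmented(h, x, L=2) == fir_direct(h, x)
```

In this model the inner loop changes only the data operand while the coefficient parts stay fixed, which mirrors the reduced switching at the coefficient input that the text describes; actual power savings depend on the hardware and are not modeled here.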
REFERENCES
Abstract This paper presents the design and implementation of a stacked MOSFET circuit for output drivers in low voltage CMOS technologies. A monolithic implementation of series connected MOSFETs for high voltage switching is presented. Using a single low voltage control signal to trigger the bottom MOSFET in the series stack, a voltage division across parasitic capacitances in the circuit is used to turn on the entire stack of devices. Voltage division provides both static and dynamic voltage balancing, preventing any device in the circuit from exceeding its nominal operating voltage. This circuit, termed the stacked MOSFET, is n× scalable, allowing for on-die control of voltages that are n× the fabrication process's rated operating voltage. The governing equations for this circuit are derived, and reliable operation is demonstrated through simulation and experimental implementation in a 180 nm SOI CMOS process.

Key Words CMOS integrated circuits, high voltage techniques, buffer circuits, input/output (I/O).

I. INTRODUCTION

High-voltage switching in current MOSFET technology is becoming increasingly difficult due to the decreasing gate-oxide thickness. Devices with reduced gate oxide are optimized for speed, power consumption and size. Stacked MOSFETs in combination with level shifters are one circuit technique to switch high voltages and overcome the decreased gate-oxide breakdown. The stacked MOSFET enables rail-to-rail high voltage switching. On-die high-voltage switching (where high voltage is defined as any voltage greater than the rated operating voltage of the CMOS fabrication process being used) is a system-on-chip (SOC) design challenge that is becoming ever more problematic. Such difficulty is a direct result of the reduced breakdown voltages that have arisen from the deep sub-micrometer and nanometer scaling of MOSFET geometries. While these low-voltage processes are optimized for minimum power consumption, high speed, and maximum integration density, they may not meet the requirements of system applications where high-voltage capabilities are needed. Such applications of on-die high-voltage switching include MEMS device control, monolithic power converter switching, high-voltage aperture control, ultrasonic transducer control, electro-static device control, piezoelectric positioning, and many others. Existing methods for handling such high-voltage switching can be divided into two general categories: device techniques and circuit techniques [1]. Device techniques include lateral double-diffused MOSFETs (LDMOSFETs) and mixed voltage fabrication processes. These methods increase the individual transistor's breakdown voltage by modifying the device layout. In the past, LDMOSFETs have been used to achieve extremely high operating voltages in standard CMOS technologies [2]. This is accomplished by adding a lightly-doped drift region between the drain and gate channel. The layout of such devices is unique to each process they are implemented in, and as such, they are very labor and cost intensive. Further, because modern fabrication processes utilize thinner gate oxides and reduced overall process geometries, new LDMOSFETs are becoming less effective. Even circular LDMOSFETs, the most effective device shape for mitigating high e-field stress, are less effective than they once were.
Mixed voltage fabrication processes essentially take a step back in time, allowing for the fabrication of large geometry, thick oxide devices on the same substrate as sub-micrometer geometry devices [3]. Although effective, these processes are more expensive due to their added mask and process steps, and still exhibit an upper limit on operating voltage. Further, because more die space per transistor is required, the performance per area is relatively poor.
Circuit techniques used for on-die high-voltage control include level shifters and monolithic high-voltage input/output (I/O) drivers. Taking many different forms, level shifters work by upwardly translating low-voltage signals, such that the voltage across any two terminals of each device in the circuit never exceeds the rated operating voltage [1],[4]. In doing this, an output voltage that is greater than the individual transistor breakdown voltages can be controlled. However, the magnitude of the output voltage swing is still limited by the individual transistor breakdown. This requires an output signal that does not operate rail-to-rail. As such, these level shifters are only suitable for applications where the addition of off-die high-voltage transistors is possible. Monolithic high-voltage I/O drivers are a relatively new technique for the on-die switching of voltages greater than the rated operating voltage of the process [6]. These circuits enable high-voltage switching using only the native low-voltage FETs of the
fabrication process. Reference [5] reports a circuit that switches 2(1/2)× the rated voltage of the process. While this topology is theoretically n× scalable, it requires an excessive number of devices in the signal path, not only taking up a large amount of die area, but also increasing the on-resistance. Ref. [6] reports achieving 3× the rated voltage of the process using only three devices in the signal path. This minimizes the on-resistance, but the design is not n× scalable.
In this paper, we present a circuit technique for on-die high-voltage switching that uses a minimum number of devices in the signal path while remaining n× scalable. Termed the stacked MOSFET, this circuit uses the native low-voltage FETs of the fabrication process to switch voltages n× greater than the rated breakdown voltage of the process used. That is, this circuit is scalable to arbitrarily high output voltages, limited only by the substrate breakdown voltage.
The goal of this paper is to show that the stacked MOSFET is scalable in integrated circuits [7]. The topology is not changed from [7], but the governing equations are rederived here such that the design variables are those that are commonly available in any IC process ([7] derived the governing equations based on design variables commonly available for discrete high voltage power MOSFETs). First, an overview of the stacked MOSFET topology is presented, along with a derivation of its governing equations. This discussion focuses on the specific realization of a two-MOSFET stack, with a generalization given for extending to an n-MOSFET stack. Second, circuit simulation results are presented, giving validity to our mathematical model. Third, and finally, experimental results are presented, revealing excellent correlation between the analytic models, simulation results, and measured results.

II. DERIVATION OF CHARACTERISTIC EQUATIONS

Fig. 1 shows the topology of a two-device stacked MOSFET. By placing MOSFETs in series and equally dividing the desired high voltage across them for the entire switching period, reliable high-voltage control can be achieved. In switching applications this circuit will act as a single MOSFET switch, controlled by a low-voltage logic level. Hess and Baker implemented this circuit using discrete power MOSFETs. As such, their characterization of the circuit was well suited to the discrete design process, utilizing spec sheet parameters such as MOSFET input and output capacitance. To realize this circuit concept in IC technology, the governing equations need to be recharacterized for IC design parameters. The following is a derivation of the governing equations for the two-device stacked MOSFET based on conservation of charge principles. From this derivation, equations representing an n-device stacked MOSFET will be generated.

Fig. 1. Schematic of two stacked MOSFET.

A. Derivation for a Two-Device Stacked MOSFET

Fig. 2. Two-device stacked MOSFET, including parasitic capacitances, with definition of notation used in derivation.

The triggering of the stacked MOSFET is accomplished through capacitive voltage division. As shown in Fig. 2, there exists an inherent parasitic capacitance Cp between the gate and source of M2. This capacitance, along with a capacitor C2 inserted in the gate leg of M2, will set the value of Vgs2 that turns on M2.
Consider the circuit in Fig. 2 with an initial condition of both devices turned off. If the resistors are sized such that

$$R_{bias} \ll R_1 + R_2 \tag{1}$$
then the output voltage rises to Vdd. (Note that this assumes that the off-state leakage current through M1 and M2 is much less than the current through R1 and R2.) Since M1 is off, the node Vdrain is free to take on the value dictated by the voltage divider of R1 and R2.
If R1 and R2 are sized equally then

$$V_{drain} = \frac{V_{dd}}{2} \tag{2}$$

This voltage is greater than Vg2 (the reason for this will become more apparent later in the derivation), and causes the diode to be forward biased. The resulting voltage at the gate of M2 will be

$$V_{g2} = V_{drain} - V_{diode} = \frac{V_{dd}}{2} - V_{diode} \tag{3}$$

where Vdiode is the forward voltage across the diode. Equations (2) and (3) dictate a gate-source voltage Vgs2 of

$$V_{gs2} = V_{g2} - V_{drain} = -V_{diode} \tag{4}$$

keeping M2 off. As such, the off condition, with the output voltage at Vdd and Vdrain at Vdd/2, exhibits static voltage balancing and results in this condition being safely held.
When Vin rises high, M1 is turned on, pulling Vdrain to ground. This reverse biases the diode, leaving the gate-source voltage of M2 to be set by the capacitive voltage divider of C2 and Cp. Cp represents the lumped total parasitic capacitance across the gate-source of M2 and can be solved for as

$$C_p = C_{diode} + C_{gs} + C_{gb} + C_{gd}(1 - E_{v1}) + C_{ds}(1 - E_{v2}) \tag{5}$$

where Cdiode is the reverse bias junction capacitance of the diode and Cgs, Cgb, Cgd, and Cds are the corresponding MOSFET junction capacitances. Ev1 and Ev2 are used to approximate the Miller capacitance resulting from Cgd and Cds, respectively, and are defined as

$$E_{v1} = \frac{\Delta V_{ds}}{\Delta V_{gs}} = -\frac{V_{dd}/2}{V_{gs} + V_{diode}} \tag{6a}$$

$$E_{v2} = \frac{\Delta V_{dg}}{\Delta V_{gs}} = -\frac{V_{dd}/2 + V_{gs} + V_{diode}}{V_{gs} + V_{diode}} \tag{6b}$$

At turn-on C2 and Cp are in parallel, resulting in the final gate-source voltage being dictated by the total charge on the parallel combination of the two capacitors. By the conservation of charge, the total charge on the parallel combination of C2 and Cp will be the sum of their initial charges

$$Q_{total} = Q_{2(initial)} + Q_{p(initial)} \tag{7}$$

where

$$Q_{2(initial)} = C_2\left(\frac{V_{dd}}{2} - V_{diode}\right), \qquad Q_{p(initial)} = C_p(-V_{diode}). \tag{8}$$

The resulting gate-source voltage will be the total charge divided by the parallel capacitance C2 + Cp:

$$V_{gs2(final)} = \frac{Q_{total}}{C_2 + C_p} \tag{9}$$

Substituting in (8)

$$V_{gs2} = \frac{C_2\left(\frac{V_{dd}}{2} - V_{diode}\right) + C_p(-V_{diode})}{C_2 + C_p} \tag{10}$$

This simplifies to

$$V_{gs2} = \frac{C_2}{C_2 + C_p}\left(\frac{V_{dd}}{2} - V_{diode}\right) + \frac{C_p}{C_2 + C_p}(-V_{diode}) \tag{11}$$

Solving (11) for C2, an expression for setting the desired gate-source voltage to turn on M2 can be found as

$$C_2 = C_p\left(\frac{V_{gs} + V_{diode}}{\frac{V_{dd}}{2} - (V_{gs} + V_{diode})}\right) \tag{12}$$

M2 will then be held on as long as the charge on C2 maintains a voltage greater than the threshold of M2. This implies an inherent low-frequency limitation, due to on-state leakage current dissipating the charge on C2. If frequencies of operation lower than those allowed by the given value of C2 are desired, C2 and Cp can be simultaneously scaled up according to the ratio

$$\frac{C_2}{C_p} = \frac{V_{gs} + V_{diode}}{\frac{V_{dd}}{2} - (V_{gs} + V_{diode})} \tag{13}$$

Because MOSFETs are majority charge carrier devices and each device in this circuit is capacitively coupled to the next, it is expected that all devices in the stack will turn on and turn off together, maintaining dynamic voltage balancing. This will be experimentally verified in the following.
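As a numerical sanity check of the two-device sizing relation, the sketch below evaluates (12) and then substitutes the result back into (10) to confirm that the target gate-source voltage is recovered. The 10 V supply and the 4 V gate-source target are taken from the design discussion in this paper; the lumped parasitic capacitance of 1 pF and the 0.7 V diode drop are assumed placeholder values (the extracted values of Table I are not reproduced here), and the function names `size_c2` and `vgs2_final` are hypothetical.

```python
def size_c2(cp, vdd, vgs, vdiode):
    """Eq. (12): C2 = Cp * (Vgs + Vdiode) / (Vdd/2 - (Vgs + Vdiode))."""
    swing = vgs + vdiode
    headroom = vdd / 2 - swing
    if headroom <= 0:
        # No finite positive C2 can reach the target in this case.
        raise ValueError("Vgs + Vdiode must be below Vdd/2")
    return cp * swing / headroom


def vgs2_final(c2, cp, vdd, vdiode):
    """Eq. (10): charge sharing between C2 and Cp at turn-on."""
    q_total = c2 * (vdd / 2 - vdiode) + cp * (-vdiode)
    return q_total / (c2 + cp)


cp = 1.0e-12       # assumed lumped parasitic capacitance, farads (placeholder)
vdiode = 0.7       # assumed diode forward voltage, volts (placeholder)
vdd, vgs = 10.0, 4.0

c2 = size_c2(cp, vdd, vgs, vdiode)
print(f"C2/Cp = {c2 / cp:.2f}")
print(f"Vgs2 from (10) = {vgs2_final(c2, cp, vdd, vdiode):.3f} V")
```

With these placeholder numbers the required C2 is many times larger than Cp because Vgs + Vdiode is close to Vdd/2; the absolute value of C2 will differ from the paper's design value, which depends on the extracted parasitics.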
$$\ldots + \frac{C_{p(i)}}{C_{(i)} + C_{p(i)}}(-V_{diode}) \tag{14}$$

$$C_{(i)} = C_{p(i)}\left(\frac{V_{gs} + V_{diode}}{(i-1)\cdot\frac{V_{dd}}{n} - (V_{gs} + V_{diode})}\right) \tag{15}$$

The (i-1)(Vdd/n) term in the denominator of (15) will increase for devices higher in the stack, and result in a value for C(i) that is less than C(i-1). This reduction in C(i) implies that the ratio of die space occupied to output voltage decreases for higher voltages. In other words, less overhead space is required for devices at the top of the stack than at the bottom.
As with the two-device stacked MOSFET, if frequencies of operation lower than those allowed by the given value of C(i) are desired, C(i) and Cp(i) can be simultaneously scaled up according to the ratios as follows:

$$\frac{C_{(i)}}{C_{p(i)}} = \frac{V_{gs} + V_{diode}}{(i-1)\cdot\frac{V_{dd}}{n} - (V_{gs} + V_{diode})} \tag{16}$$

Fig. 3. Generalized n-device stacked MOSFET.

III. DESIGN AND SIMULATION

Utilizing the previous equations, a two-device stacked MOSFET has been designed and simulated for implementation in Honeywell's 0.35-µm PD SOI CMOS process. This process is rated for 5-V operation. The models used are the BSIMSOI models provided by Honeywell. Also, to illustrate the validity of the design equations for the general n-device stacked MOSFET, simulation results for an eight-device stacked MOSFET are included.

TWO-DEVICE STACKED MOSFET

Consider the two-device stacked MOSFET shown in Fig. 1. If each FET used in the stack is sized to have a W/L of 500 and a gate-source voltage of 4 V, then the parasitic capacitances, under the desired operating conditions, can be extracted from the device models as shown in Table I. This table also includes the extracted diode junction capacitance at the appropriate biasing conditions. Accordingly, C2 can be sized using (5) and (12) to be 14.6 pF. The simulated drain voltages resulting from the previous design values are shown in Fig. 4. The top trace is the drain voltage for M2 and the lower trace is the drain voltage for M1. Note that the voltages are evenly distributed, so that neither device exceeds its drain-source breakdown voltage. The gate-source voltage
Fig. 4. Drain voltages for two-device Stacked MOSFET operating with a 10-V supply.

TABLE I
MODELED JUNCTION CAPACITANCES

Capacitance      Extracted values
Gate-Source      838.63
Gate-Bulk        16.62
Gate-Drain       52.03
Drain-Source     10.87
Diode            9.96

Fig. 5. Gate-source voltage for M2 in a two-device Stacked MOSFET operating with a 10-V supply.

The previously simulated two-device Stacked MOSFET has been implemented in the same Honeywell 0.35-µm PD SOI CMOS process. The layout and test structure is shown in Fig. 9. In implementing this circuit it is important to take into account any parasitics that are introduced in layout as well as in the test and measurement setup. All capacitances will affect the operation of the Stacked MOSFET. For this reason, good layout techniques, coupled with post-layout parasitic simulation of the circuit, are critical. Further, realistic models of capacitances and inductances introduced by probe tips, bond wires, or other connections should be considered.

Fig. 8 shows a drain voltage characteristic similar to the simulation results shown in Fig. 4. This characteristic results from the two-device Stacked MOSFET being biased with a 10-V supply, operating at 50 kHz. As predicted, these measurements show that in the off state static voltage balancing is achieved. This balancing ensures that each device is supporting an even 5-V share of the 10-V output. When the stack turns on, both devices turn on almost simultaneously, pulling the output to ground.

As discussed previously, because the MOSFET is a majority charge carrier device, and each device is capacitively coupled to the next, all of the devices in the stack rise and fall together. This dynamic voltage sharing is what allows each component in the circuit to operate very near the edge of its rating.
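The generalized sizing rule (15) can be explored numerically with a short Python sketch. It assumes, for illustration only, that every stage has the same normalized parasitic capacitance Cp(i) = 1, an eight-device stack on a hypothetical 40-V supply (5 V per device), a 4-V gate drive, and a 0.68 V diode drop. None of these stack values come from the paper, and the function names are illustrative.

```python
def stage_coupling_capacitor(i, n, c_p, vdd, vgs, vdiode):
    """C(i) from (15) for device i (2 <= i <= n) of an n-device stack."""
    if not 2 <= i <= n:
        raise ValueError("(15) is evaluated here for devices 2..n")
    swing = vgs + vdiode
    headroom = (i - 1) * (vdd / n) - swing
    if headroom <= 0:
        raise ValueError("(i-1)*Vdd/n must exceed Vgs + Vdiode")
    return c_p * swing / headroom


def stack_capacitors(n, c_p, vdd, vgs, vdiode):
    """Map each device index 2..n to its coupling capacitor C(i)."""
    return {i: stage_coupling_capacitor(i, n, c_p, vdd, vgs, vdiode)
            for i in range(2, n + 1)}


# Hypothetical eight-device stack, Cp(i) normalized to 1 for every stage.
caps = stack_capacitors(n=8, c_p=1.0, vdd=40.0, vgs=4.0, vdiode=0.68)
for i, c in caps.items():
    print(f"C({i}) / Cp = {c:.3f}")
```

The printed ratios shrink as i grows, which is the behaviour the text describes: devices higher in the stack need less coupling capacitance than those near the bottom. For n = 2 the expression reduces to (12).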
(88%). A test set created by a conventional ATPG tool aiming at single detection may have up to 6% of stuck-at faults detected only once, and up to 10% of stuck-at faults detected only once or twice. This may result in inadequate coverage of node-to-node bridging defects. The experimental results show that, in general, n-detection tests can effectively improve the diagnostic algorithm's ability to locate the real fault locations even when a single-stuck-at-fault based diagnosis algorithm is used.

III. CALCULATION OF THE OUTPUT DEVIATION

Output deviation is the metric which tells how much the output deviates from the expected value. We use ISCAS-85 benchmark circuits for calculation of the output deviations.

A. Fault model

Using the method of [7], we calculate the output probability of the above gate. With the input combination 00, the output probabilities are pc0 = 0.1 and pc1 = 0.9, where pc0 is the probability of the output being 0 and pc1 is the probability of the output being 1 for the corresponding input combination.

B. Importance of test selection

Therefore, test selection is necessary to ensure that the most effective test patterns are chosen from large test sets during time-constrained and high-volume production testing. If highly effective test
patterns are applied first, a defective chip can fail earlier, further reducing test application time. Moreover, test compression is not effective if the patterns are delivered without a significant fraction of don't-care bits. In such cases, test set selection can be a practical method to reduce test time and test data volume.

In this paper, we use the output deviation metric for test selection. To evaluate the quality of the selected test patterns, we determine the coverage that they provide for single non-feedback zero-resistance bridging faults (s-NFBFs) and stuck-open faults. Experimental results show that patterns selected using the probabilistic fault model and output deviations provide higher fault coverage than patterns selected using other methods.

V. ALGORITHMS

The test pattern selection algorithm is built on the theory of output deviations. It selects a small number of test patterns, T11, from a large test set called T1. To generate T1, we run ATALANTA, a single stuck-at fault ATPG tool. The ATPG tool generates n-detection test patterns for each single stuck-at fault. Each time a test pattern is selected, we perform fault simulation and drop those faults that are already detected n times. The set T1 is randomly reordered before being provided as input to Procedure 1. The flow chart for the procedure is shown in Fig. 3.

Then we sort T1 such that test patterns with high deviations can be selected earlier than test patterns with low deviations. For each primary output (PO), all test patterns in T1 are sorted in descending order based on their output deviations, which gives test set T2.

The sorted test set is then applied to Procedure 1, which yields the optimized n-detection test set; it normally contains a smaller number of test patterns and achieves high defect coverage.

In Procedure 2, test patterns with low output deviations are selected earlier than test patterns with high output deviations. This procedure takes one more parameter, called the threshold [7].

VI. EXPERIMENTAL RESULTS

The work is in progress. All experiments are being performed on a Pentium 4 PC running Linux with a 2.6 GHz processor and 1 GB of memory. The program to compute output deviations is to be implemented in C. Atalanta and its associated simulation engine are used to generate n-detection test sets. We have written the fault simulation program in C so that we can add constraints to the simulation in future. We are also implementing a bridging fault simulator to calculate the coverage of single non-feedback, zero-resistance bridging faults (s-NFBFs). To eliminate any bias in the comparison of different methods for test set selection, we use two arbitrarily chosen sets of confidence level vectors for our experiments. One more way to evaluate the test patterns selected using the output deviation method is the gate exhaustive (GE) testing metrics [8], which are computed using an in-house simulation tool based on the fault simulation program FSIM [9]. A CMOS combinational circuit in the presence of a stuck-open (SOP) fault behaves like a sequential circuit [10]. In CMOS circuits, the traditional line stuck-at fault model does not represent the behavior of SOP faults properly. A sequence of two test patterns is required to detect a SOP fault. SOPRANO [11] is an efficient automatic test pattern generator for stuck-open faults in CMOS combinational circuits. We also apply the output deviation algorithms to stuck-open faults to evaluate the quality of selected test patterns in a high-volume production testing environment. We are currently concentrating on obtaining the tools to evaluate the stuck-open faults.

VII. CONCLUSION

Evaluation of pattern grading using the fault coverage for stuck-open faults and non-feedback bridging faults is being done to demonstrate the effectiveness of output deviation as a metric to model the quality of test patterns. This proves especially useful in a high-volume and time-constrained production testing environment. The work is in progress and final results will be exhibited at the time of presentation.

REFERENCES

[1] M. Abramovici, M. A. Breuer, and A. D. Friedman, "Digital Systems Testing and Testable Design", Computer Science Press, 1990, pp. 94-95.
[2] Y. Tian, M. Mercer, W. Shi, and M. Grimaila, "An optimal test pattern selection method to improve the defect coverage", in Proc. ITC, 2005.
[3] Z. Wang, M. Marek-Sadowska, K.-H. Tsai, and J. Rajski, "Multiple Fault Diagnosis Using n-Detection Tests", Proceedings of the 21st International Conference on Computer Design (ICCD'03).
[4] E. J. McCluskey and C.-W. Tseng, "Stuck-fault tests vs. actual defects", Proc. Int'l Test Conference, 2000, pp. 336-342.
[5] Z. Wang, K. Chakrabarty, and M. Goessel, "Test set enrichment using a probabilistic fault model and the theory of output deviations", in Proc. DATE Conf., 2006, pp. 1275-1280.
[6] K. P. Parker and E. J. McCluskey, "Probabilistic treatment of general combinational networks", IEEE Trans. Computers, vol. C-24, pp. 668-670, Jun. 1975.
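The deviation-ordered selection described in Section V can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the authors' Procedure 1: each pattern carries a single deviation value (the paper sorts per primary output), the set of faults each pattern detects is given up front instead of coming from a fault simulator, and all names and data are hypothetical.

```python
def select_patterns(patterns, n):
    """Pick patterns in descending deviation order with n-detection dropping.

    patterns: list of (name, deviation, set_of_detected_faults).
    A fault is dropped once it has been detected n times; a pattern is
    kept only if it detects at least one fault that is not yet dropped.
    """
    ordered = sorted(patterns, key=lambda p: p[1], reverse=True)

    # Detections still needed for every fault seen in the test set.
    remaining = {}
    for _, _, faults in ordered:
        for fault in faults:
            remaining.setdefault(fault, n)

    selected = []
    for name, _, faults in ordered:
        useful = [f for f in faults if remaining[f] > 0]
        if not useful:
            continue  # every fault this pattern detects is already dropped
        selected.append(name)
        for fault in useful:
            remaining[fault] -= 1
    return selected


# Hypothetical test set T1: (pattern, output deviation, detected faults).
t1 = [
    ("t1", 0.10, {"a", "b"}),
    ("t2", 0.40, {"a"}),
    ("t3", 0.25, {"b", "c"}),
    ("t4", 0.05, {"a", "c"}),
]
print(select_patterns(t1, n=1))
print(select_patterns(t1, n=2))
```

Sorting in ascending order instead, and stopping at a deviation threshold, would give a rough analogue of the low-deviation-first variant mentioned for Procedure 2.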
Abstract: In today's world, digital-to-analog converters are used in a wide range of applications such as wireless networking (WLAN, voice/data communication and Bluetooth), wired communication (WAN and LAN), and consumer electronics (DVD, MP3, digital cameras, video games, and so on). Therefore the DAC unit must be fault free, and there is a need for a system to detect the occurrence of faults. This paper deals with designing an efficient system to detect and classify faults in the DAC unit. An R-2R DAC has been used for analysis, and back propagation neural network algorithms are used to classify the faults. An efficiency of 77% is achieved in classifying the faults by implementing three back propagation neural network algorithms.

I. INTRODUCTION

There are many challenges for mixed signal design to be adaptable for SOC implementation. The major considerations in designing these mixed signal circuits for the complete SOC are high speed, low power, and low voltage. Both cost and high speed operation are limitations of the complete SOC. Accordingly, to remove the speed gap between a processor and circuits in the complete SOC implementation, architectures must not only be fast but also cheap. The next challenge is low power consumption. In the portable device market, reducing the power consumption is one of the main issues. Low voltage operation is one of the difficult challenges in mixed-signal ICs. Above all, the circuits designed must be fault free. If any fault occurs then it must be detected. Therefore fault classification is one of the major needs in mixed signal ICs. This paper aims at implementing efficient fault classification in a DAC unit using a neural network.

II. R-2R DAC

The R-2R D/A converter works on the principle of voltage division, and this configuration consists of a network of resistors alternating in value between R and 2R. Fig 1 illustrates an 8-bit R-2R ladder. Starting at the right end of the network, notice that the resistance looking to the right of any node to ground is 2R. The digital input determines whether each resistor is switched to ground (non inverting input) or to the inverting input of the op-amp. Each node voltage is related to VREF by a binary-weighted relationship caused by the voltage division of the ladder network. The total current flowing from VREF is constant, since the potential at the bottom of each switched resistor is always zero volts (either ground or virtual ground). Therefore, the node voltages will remain constant for any value of the digital input.

Fig 1 Schematic of R-2R digital to analog converter

The output voltage, vout, is dependent on currents flowing through the feedback resistor, RF (= R), such that

$$v_{out} = -i_{TOT} \cdot R_F \tag{1}$$

where iTOT is the sum of the currents selected by the digital input:

$$i_{TOT} = \sum_{k=0}^{N-1} D_k \frac{V_{REF}}{2^{N-k} \cdot 2R} \tag{2}$$

where Dk is the k-th bit of the input word with a value that is either a 1 or a 0. The voltage scaling DAC structure is very regular and thus well suited for MOS technology. An advantage of this architecture is that it guarantees monotonicity, for the voltage at each tap cannot be less than the tap below. The area required for the voltage scaling DAC is large if the number of bits is eight or more. Also, the conversion speed of the converter will be sensitive to parasitic capacitance at each of its internal nodes.
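Equations (1) and (2) can be evaluated directly, as in the following Python sketch. The function name and the normalized component values are illustrative. The sketch assumes that bit k = 0 is the least significant bit and that RF = R unless stated otherwise, and it models the ideal fault-free transfer only, with no switch resistance or op-amp non-ideality.

```python
def r2r_output(code, n_bits, v_ref, r, r_f=None):
    """Ideal R-2R DAC output voltage from (1) and (2)."""
    if r_f is None:
        r_f = r  # the text sets RF = R
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code does not fit in n_bits")
    i_tot = 0.0
    for k in range(n_bits):
        d_k = (code >> k) & 1  # k-th bit of the input word, k = 0 is the LSB
        i_tot += d_k * v_ref / (2 ** (n_bits - k) * 2 * r)
    return -i_tot * r_f  # inverting op-amp stage


# Normalized example: VREF = 1 V, R = 1 ohm, 8-bit input word.
for code in (0, 1, 128, 255):
    print(code, r2r_output(code, 8, v_ref=1.0, r=1.0))
```

Sweeping the code from 0 to 255 gives a staircase whose magnitude never decreases, which is the ideal response against which INL and DNL are measured.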
FIG 2 OUTPUT RESPONSE OF 8-BIT R-2R DAC

The output of the R-2R DAC for the 8-bit pattern counter input is shown in Fig 2. The output is very linear, glitch free and rises to the supply voltage of 2.5 V within 256 µs.

The INL and DNL curves for the fault free case are plotted using MATLAB and are shown in Fig 3. The maximum INL and DNL are found to be 0.038 LSB and -0.012 LSB respectively.

FIG 3.1 INL CURVES OF R-2R DAC

Offset error 0.002484 LSB

1) Gate-to-Source Short (GSS)
2) Gate-to-Drain Short (GDS)
3) Drain-to-Source Short (DSS)
4) Resistor Short (RS)
5) Capacitance Short (CS)
6) Gate Open (GO)
7) Drain Open (DO)
8) Source Open (SO)
9) Resistor Open (RO)

The structural faults are illustrated in Fig 4. A low resistance (1 Ω) and a high resistance (10 MΩ) are frequently used to simulate structural faults. Restated, a transistor short is modeled using a low resistance (1 Ω) between the shorted terminals, and an open transistor is modeled as a large resistance (10 MΩ) in series with the open terminals.
[Fig 4 panels: Gate Open, Drain Open, Source Open, each modeled with a 10 Mohm series resistance]

IV. MONTE CARLO ANALYSIS

All types of faults are introduced in each transistor and resistor, and Monte Carlo simulation is done for each case. The Monte Carlo analysis in T-Spice is used to perform simulation by varying the value of the threshold voltage (parameter). The iteration value of the Monte Carlo analysis specifies the number of times the file should be run by varying the threshold value. Syntax in T-Spice to invoke the Monte Carlo analysis:

.param VTHO_N=unif(0.3694291,.05,2) VTHO_P=unif(0.3944719,.05,2)    (3)

The result thus obtained is stored in a spreadsheet for further fault classification using the neural network.

V. FAULT CLASSIFICATION USING BACK PROPAGATION NEURAL NETWORK

Any function from input to output can be implemented as a three-layer neural network. In order to train a neural network to perform some task, the weights and bias values of each unit must be adjusted at each iteration in such a way that the error between the desired output and the actual output is reduced. This process requires that the neural network compute the error derivative of the weights (EW). In other words, it must calculate how the error changes as each weight is increased or decreased slightly. The back propagation algorithm is the most widely used method for determining the EW. The goal now is to set the interconnection weights based on the training

(A) TRAINBFG

Trainbfg can train any network as long as its weight, net input, and transfer functions have derivative functions. Back propagation is used to calculate derivatives of performance with respect to the weight and bias variables X. Each variable is adjusted according to the following:

X = X + a*dX;    (4)

where dX is the search direction. The parameter a is selected to minimize the performance along the search direction. The first search direction is the negative of the gradient of performance. In succeeding iterations the search direction is computed according to the following formula:

dX = -H\gX;    (5)

where gX is the gradient and H is an approximate Hessian matrix.

(B) TRAINCGB

Traincgb can train any network as long as its weight, net input, and transfer functions have derivative functions. Back propagation is used to calculate derivatives of performance with respect to the weight and bias variables X. Each variable is adjusted according to the following:

X = X + a*dX;    (6)

where dX is the search direction. The parameter a is selected to minimize the performance along the search direction. The first search direction is the negative of the gradient of performance. In succeeding iterations the search direction is computed from the new gradient and the previous search direction according to the formula:

dX = -gX + dX_old*Z;    (7)

where gX is the gradient. The parameter Z can be computed in several different ways.

(C) TRAINOSS

Trainoss can train any network as long as its weight, net input, and transfer functions have derivative functions. Back propagation is used to calculate derivatives of performance with respect to the weight
Epochs for Training: 100 to 1500 epochs

Fig 7: Performance graph for the trainoss algorithm with parameter values learning rate = 0.03, hidden layer = 8
Fig 9: Performance graph for the traincgp algorithm with parameter values learning rate = 0.01, epochs = 1000

Fig 12: Performance graph for the trainbfg algorithm with parameter values no. of hidden layers = 8, epochs = 1000
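The conjugate-gradient update of (6) and (7) can be illustrated on a toy problem in Python. This is a sketch under assumptions and not the MATLAB training routine: the performance function is a small quadratic rather than a network error, the step a comes from an exact line search that is only available for quadratics, and Z is computed with the Fletcher-Reeves formula, which is one of the several possible choices the text alludes to and may differ from the one used by the toolbox function.

```python
def matvec(a, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def conjugate_gradient(a, b, x, iterations):
    """Minimize 0.5*x'Ax - b'x with the updates of (6) and (7)."""
    g = [ax_i - b_i for ax_i, b_i in zip(matvec(a, x), b)]  # gradient gX
    d = [-g_i for g_i in g]  # first direction: negative gradient
    for _ in range(iterations):
        gg = dot(g, g)
        if gg < 1e-24:
            break  # gradient is numerically zero, nothing left to do
        ad = matvec(a, d)
        step = -dot(g, d) / dot(d, ad)  # a: exact minimizer along dX
        x = [x_i + step * d_i for x_i, d_i in zip(x, d)]  # X = X + a*dX
        g_new = [g_i + step * ad_i for g_i, ad_i in zip(g, ad)]
        z = dot(g_new, g_new) / gg  # Fletcher-Reeves choice of Z (assumed)
        d = [-gn_i + z * d_i for gn_i, d_i in zip(g_new, d)]  # (7)
        g = g_new
    return x


# Toy 2-variable quadratic standing in for the performance function.
A = [[4.0, 1.0], [1.0, 3.0]]
B = [1.0, 2.0]
print(conjugate_gradient(A, B, [0.0, 0.0], iterations=5))
```

For a quadratic in two variables the method reaches the minimizer in two iterations. A network error surface is not quadratic, so in training the same update is simply repeated for many epochs with an approximate line search.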
VII. CONCLUSION
REFERENCES
1. Chi Hung Lin, Klass Bult., “A 10-b 500-
MSamples CMOS DAC in 0.6m2 “, IEEE
J.Solid State Circuit, pp. 1948-1958, Dec 1998.
2. Swapna Banerjee et al., “A 10-bit 80-MSPS
2.5-V 27.65-mW 0.185mm2 Segmented
Current Steering CMOS DAC”,18th
International Conference on VLSI Design, pp.
319-322, Jan.2005,.
3. Jan M. Rabaey, ”Digital integrated circuits: a
design perspective”, Prentice Hall of India Pvt
Ltd, new delhi, 2002.
4. Morris Mano M.”Digital Design-third edition,”
Prentice Hall of India Pvt Ltd, new delhi, 2000.
5. S N Sivanandam, S Sumathi, S N
Deepa,”Introduction to Neural Network using
MATLAB 6.0”,2006.
6. Grzechca. D, Rutkowski. J, “New Concept to
Analog Fault Diagnosis by Creating Two
Fuzzy-Neural Dictionaries Test”, IEEE
MELCON, May 2004.
Abstract- Path delay testing of FPGAs is especially important since path delay faults can render an otherwise fault-free FPGA unusable for a given design layout. In this approach, we select a set of paths in FPGA-based circuits that are tested in the same test configuration. Each path is tested for all combinations of signal inversions along the path length. Each configuration consists of a sequence generator, a response analyzer and circuitry for controlling inversions along tested paths, all of which are formed from FPGA resources not currently under test. The goal is to determine by testing whether the delay along any of the paths in the test exceeds the clock period. Two algorithms are presented for target path partitioning to determine the number of required test configurations. Test circuitry associated with these methods is also described.

Index terms- Design automation, Field Programmable Gate Arrays, Programmable logic devices, testing.

I. INTRODUCTION

This paper is concerned with testing paths in lookup-table (LUT) based FPGAs after they have been routed. While this may be regarded as user testing, we are considering an environment in which a large number of manufactured FPGA devices implementing a specific design are to be tested to ensure correct operation at the specified clock speed. It is thus akin to manufacturing tests in that the time needed for testing is important. Ideally, we would like to verify that the actual delay of every path between flip-flops is less than the design clock period. Since the number of paths in most practical circuits is very large, testing must be limited to a smaller set of paths. Testing a set of paths whose computed delay is within a small percentage of the clock period may be sufficient in most cases. Thus, our goal is to determine by testing whether the delay along any of the paths in the set exceeds the clock period.

II. BASIC APPROACH

This path delay testing method is applicable to FPGAs in which the basic logic elements are implemented by LUTs. The goal of this work is to test a set of paths, called target paths, to determine whether the maximum delay along any of them exceeds the clock period of the circuit. These paths are selected based on static timing analysis using nominal delay values and actual routing information. Circuitry for applying test patterns and observing results is configured using parts of the FPGA that are not under test.

INTRODUCTION TO APPROACH

The delay of a path segment usually depends on the direction of signal transition in it. The direction of the signal transition in any segment is determined by that of the transition at the source and the inversions along the partial path leading to the particular segment. A test to determine whether the maximum delay along a path is greater than the clock period must propagate a transition along the path and produce a combination of side-input values that maximizes the path delay. This approach is not usually feasible because of the difficulty of determining the inversions that maximize the path delay and the necessary primary input values to produce them. Instead, we propose to test each target path for all combinations of inversions along it, guaranteeing that the worst case will also be included.

Although the number of combinations is exponential in the number of LUTs along the path, the method is feasible because application of each test requires only a few cycles of the rated clock. However, the results may be pessimistic in that a path that fails a test may operate correctly in the actual circuit, because the combination of inversions in the failing test may not occur during normal operation.

The method of testing a single path in a circuit reprograms the FPGA to isolate each target path from the rest of the circuit and make inversions along the path controllable by an on-chip test controller. Every LUT along the path is re-programmed based on its original function. If it is positive unate in the on-path input, the LUT output is made equal to the on-path input independent of its side inputs. Similarly, negative unate functions are replaced by inverters. If the original function is binate in the on-path input, the LUT is re-programmed to implement the exclusive-OR (XOR) of the on-path input and one of its side-inputs, which we shall call its controlling side-input.
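The LUT re-programming rule above can be sketched in Python. The truth-table encoding (a list of 2^k output bits indexed by the input word, with input j stored in bit j) and the function names are assumptions made for illustration and do not describe a particular FPGA tool flow. A function that does not depend on the on-path input at all is reported as positive here, which is an arbitrary choice for this sketch.

```python
def classify_unateness(truth_table, num_inputs, on_path):
    """Return 'positive', 'negative' or 'binate' for the on-path input."""
    rising = falling = False
    mask = 1 << on_path
    for idx in range(1 << num_inputs):
        if idx & mask:
            continue  # visit each side-input assignment once (on-path = 0)
        low, high = truth_table[idx], truth_table[idx | mask]
        if low < high:
            rising = True   # output follows the on-path input
        elif low > high:
            falling = True  # output inverts the on-path input
    if rising and falling:
        return "binate"
    return "negative" if falling else "positive"


def reprogram_lut(truth_table, num_inputs, on_path, control):
    """Build the test-mode truth table for one LUT on the target path.

    control is the index of the side input chosen as the controlling
    side-input; it is only used when the function is binate.
    """
    kind = classify_unateness(truth_table, num_inputs, on_path)
    new_table = []
    for idx in range(1 << num_inputs):
        a = (idx >> on_path) & 1
        p = (idx >> control) & 1
        if kind == "positive":
            new_table.append(a)        # buffer: output equals on-path input
        elif kind == "negative":
            new_table.append(1 - a)    # inverter
        else:
            new_table.append(a ^ p)    # XOR with the controlling side-input
    return kind, new_table


AND2 = [0, 0, 0, 1]
NAND2 = [1, 1, 1, 0]
XOR2 = [0, 1, 1, 0]
for name, table in (("AND", AND2), ("NAND", NAND2), ("XOR", XOR2)):
    print(name, reprogram_lut(table, 2, on_path=0, control=1))
```

A counter bit driving the controlling side-input of each binate LUT then lets the test controller step through every inversion combination along the path.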
As mentioned earlier, this change of functionality does not affect the delay of the path under test because the delay through an LUT is unaffected by the function implemented. Inversions along the path are controlled by the signal values on the controlling side inputs. For each combination of values on the controlling side inputs we apply a signal transition at the source of the path and observe the signal value at the destination after one clock period. The absence of a signal transition will indicate that the delay along the tested path exceeds the clock period for the particular combination of inversions.

The basic method described above can be implemented by the circuitry shown in Fig. 1, consisting of a sequence generator, a response analyzer and a counter that generates all combinations of values in some arbitrary order. A linear feedback shift register modified to include the all-0's output may be used as the counter. The controlling side inputs are connected to the counter. The controller and the circuitry for applying tests and observing results are also formed during configuration in parts of the FPGA that do not affect the behavior of the path(s) under test.

The sequence generator produces a sequence of alternating zeros and ones, with period equal to 6T, where T is the operational clock period. The response analyzer checks for an output transition for every test, and sets an error flip-flop if no transition is observed at the end of a test. The flip-flop is reset only at the beginning of the test session, and will indicate an error if and only if no transition is produced in some test. The counter has as many bits as the number of binate LUTs along the tested path.

The test for a path for each direction of signal transition consists of two parts, an initialization part and a propagation part, each of duration 3T. A path is tested in time 6T by overlapping the initialization part of each test with the propagation part of the preceding test. In addition, the change of counter state for testing a path for a new combination of inversions is also done during the initialization phase of rising transition tests.

Fig. 2 shows the timing of the signals during the application of a test sequence. It can be seen from the figure that the source s of the test path toggles every three clock cycles. For correct operation, the input transition occurring at 3T must reach the destination within time T (i.e., before 3T+T). On the following clock edge at 3T+T, the result of the transition is clocked into the destination flip-flop at d. A change must be observed at the destination for every test, otherwise a flip-flop is set to indicate an error. In Fig. 2, a test for the rising edge starts at time 3T, with s steady at zero for the preceding three clock cycles. A test for the falling transition starts at 6T, with the input steady at one for the preceding three clock cycles. Results are sampled at d at time 4T (for the rising-edge s transition) and 7T (for the falling-edge s transition), respectively. Thus, both rising and falling transitions are applied at the source for each combination of inversions in time 6T. As the falling transition is applied at 6T, the enable input E of the counter is set to 1. This action starts a state (counter) change at 7T to test the path for the next combination of inversions. A counter change at this time point allows 2T of settling time before the following transition occurs at the source s. By ensuring that the counter reaches its final value within T and propagates to the path destination d within an additional T, d is ensured to be stable before the following source transition. Thus, the destination will reach the correct stable value corresponding to the new combination of inversions if no path from the counter to the destination has a delay greater than 2T. This delay explains the need for a 3T period between s transitions (1T to perform the test, 1T for possible counter state changes, and 1T for subsequent propagation of the counter changes to d).

III. TEST STRATEGY

The method described in the preceding section requires the test control circuitry to be reconfigured for every path to be tested. The total time for testing a set of target paths in a circuit consists of the test application time and the reconfiguration time. Our goal is to reduce both components of the total time for testing a specified set of paths. Since the time needed for configuring the test structure is usually larger than that for applying test patterns generated on chip, we shall focus on reducing the number of test configurations needed by testing as many paths as possible in each configuration.

Two approaches to maximize the number of paths tested in a test configuration suggest themselves. First, we can try to select a set of target paths that can be tested simultaneously. This will also have the effect of reducing test application time. Secondly, we can try to select a set of simultaneously testable sets that can be tested in sequence with the same configuration. In this case, the number of simultaneously tested paths may have to be reduced so as to maximize the total number of paths tested with the configuration. These two approaches will be elaborated in the next two sections, but first we define a few terms.

The simultaneous application of a single rising or falling transition at the sources of one or more paths and observing the response at their destinations is called a test. The set of tests for both rising and falling transitions for all combinations of inversions along each path is called a test phase, or simply, a
The set of sessions may not be unique and depends on the choices made. Also note that not all sessions obtained are multiphase sessions. Session 3, for example, became a single-phase session because no path qualified as a side path of mDGKMz, which was arbitrarily chosen as the main path. No paths could be concurrently tested with those in Sessions 4, 5, and 6 because all paths to z had already been targeted. The sets of target paths obtained by Procedure 1 are such that each 2-path LUT has a main path and a side path through it. Thus, a single binary signal is sufficient to select the input through which the signal is to be propagated. Since the side path continues along the main path, selecting the appropriate input at the 2-path LUT where it meets the main path is sufficient for selecting the side path for testing. By using the same path selection signal, one side path to each destination can be selected simultaneously and tested in parallel.

The FPGA configuration for a test session is obtained by the following procedure:

PROCEDURE 2
1) Configure a sequence generator and connect its output to the sources of all target paths of the session.
2) Configure a counter to control inversion parity, with the number of bits equal to the largest number of binate LUTs along any target path for the test session.
3) Configure a path selector to select the set of paths tested in each test phase, with the number of bits equal to the number of side paths to a destination.
4) Designate a free input of each LUT as its inversion control input p, and connect it to the counter output corresponding to its level.
5) Designate another free input of each 2-path LUT as its selector input s, and connect it to the path selector.
6) Modify the LUT of each 1-path LUT with on-path input a to implement f = a ⊕ p, if the original function is binate in a; otherwise f = a if it is positive or f = ā if it is negative in a.
7) Modify the LUT of each 2-path LUT to implement
f =
where a and b are on the main path and a side path, respectively.

The above modification for 2-path LUTs assumes that they are binate in both on-path inputs. If the output of a 2-path LUT is unate in a or b or both, a slightly different function f is needed. For example, if the LUT output is binate in a and negative in b, the modified LUT must implement

Figure 4 shows the test structure for the circuit of Fig. 3. Only target paths that were selected for the first test session are shown, and all LUT functions are assumed to be binate in their inputs. The test circuitry consists of a sequence generator that produces a sequence of alternating 1's and 0's, a four-bit counter for inversion control and a path selector. The path selector is a shift register that produces an output sequence, 000, 100, 010, 001 for the 4-phase test of the first session in our example.

It can be verified from the figure that the main paths are selected when all selector outputs are 0. When any output is 1, exactly one side path to each destination is selected. Input transitions are applied to all paths simultaneously, but propagate only up to the first 2-path LUT on all paths except the selected ones. Thus, only one path to each destination will have transitions along its entire length. Since these paths are disjoint, no interaction can occur among them.

IV. CONCLUSION

In this paper, a new approach to testing selected sets of paths in FPGA-based circuits is presented. Our approach tests these paths for all combinations of inversions along them to guarantee that the maximum delays along the tested paths will not exceed the clock period during normal operation. While the test method requires reconfiguring the FPGA for testing, the tested paths use the same connection wires, multiplexers and internal logic connections as the original circuit, ensuring the validity of the tests. Following testing, the test circuitry is removed from the device and the original user circuit is programmed into the FPGA. Two methods have been presented for reducing the number of test configurations needed for a given set of paths. In one method, called the single-phase method, paths are selected so that all paths in each configuration can be tested in parallel. The second method, called the multi-phase method, attempts to test the paths in a configuration with a sequence of test phases, each of which tests a set of paths in parallel. Our experimental results with benchmark circuits show that these methods are viable, but the preferable method depends on the circuit structure. The use of other criteria, such as the total time for configuration and test application for each configuration, or better heuristics may lead to more efficient testing with the proposed approach.
REFERENCES
A lightweight controller provides the system-level interface for the IP module and executes a program that dictates how the IP is used in the system. Localizing the control for the IP to this program simplifies any necessary redesign of the IP for other applications. Most of the applications are ready to use; hence, with slight modification, they can be made compatible with other applications or more complicated architectures.

decoupling the processing rate of a CE from the inter-CE communication rate.

For the purposes of this discussion, we assume a FIFO width of 33 bits, but leave the depth variable.
modules as programmable coarse grained functional units. Designers can then reprogram the IP module's usage in the system to adapt to the requirements of new applications.

[Figure: hardware IP blocks connected through OCP interfaces to Bus A and Bus B]

B. CE Abstraction
The strength of the SIMPPL model is the CE abstraction, which allows designers to decouple the functionality of a module from system-level communication and control via a programmable controller. This design aims to reduce design time by facilitating design reuse, system integration, and system verification. The CE is an abstraction of software or hardware IP that facilitates design reuse by separating the datapath (computation), the inter-CE communication, and the control. Researchers have demonstrated some of the advantages of isolating independent control units for a shared datapath to support sequential procedural units in hardware. This is similar to the case where a CE is implemented as software on a processor (software CE): the software is designed with the communication protocols, the control sequence, and the computation as independent functions. Ideally, a controller customized to the datapath of each CE could be used as a generic system interface, optimized for that specific CE's datapath. To this end, we have created two versions of a fast, programmable, lightweight controller, an execution-only (execute) version and a run-time debugging (debug) version, that are both adaptable to different types of computations suitable to SoC designs, including those on field-programmable gate arrays (FPGAs). Fig. 3 illustrates how the control, communications and the datapath are decoupled in hardware CEs. The processing element (PE) represents the datapath of the CE or the IP module, where an IP module implements a functional block having data ports and control and status signals. It performs a specific function, be it a computation or communication with an off-chip peripheral, and interacts with the rest of the system via the SIMPPL controller, which interfaces with the internal communication links to receive and transmit instruction packets. The SIMPPL Control Sequencer (SCS) module allows the designer to specify, or "program", how the PE is used in the SoC. It contains the sequence of instructions that are executed by the controller for a given application. The controller then manipulates the control bits of the PE based on the current instruction being executed by the controller and the status bits provided by the PE.

Fig. 3. Hardware CE abstraction.

III. SIMPPL CONTROLLER
The SIMPPL controller acts as the physical interface of the IP core to the rest of the system. Its instruction set is designed to facilitate controlling the core's operations and reprogramming the core's use for different applications. As noted above, two versions of the controller are designed: an execution-only version and a run-time debugging version, in other words, an execute controller and a debug controller. The Execute controller has 3 parts, namely, consumer execute, producer execute and full execute. The Debug controller also has 3 parts: consumer debug, producer debug and full debug.

A. Instruction Packet Format
SIMPPL uses instruction packets to pass both control and data information over the internal communication links shown in Fig. 1. Fig. 4 provides a description of the generic instruction packet structure transmitted over an internal link. Although the current SIMPPL controller uses a 33-bit wide FIFO, the data word is only 32 bits. The remaining bit is used to indicate whether the transmitted word is an instruction or data. The instruction word is divided into the least significant byte, which is designated for the opcode, and the upper 3 bytes, which represent the number of data words (NDWs) sent or received in an instruction packet. The current instruction set uses only the five least significant bits (LSBs) of the opcode byte to represent the instruction. The remaining bits are
reserved for future extensions of the controller instruction set.

Fig. 4. An internal link's data packet format.

B. Controller Architecture
where only one instruction is in flight at a time, to reduce design complexity and to simplify program writing for the user. The SIMPPL controller also monitors the PE-specific status bits that are used to generate status bits for the SCS, which are used to determine the control flow of a program. The format of an output data packet sent via the internal transmit (Tx) link is dictated by the instruction currently being executed. The inputs multiplexed to the Tx link are the Executing Instruction Register (EX IR), an immediate address that is required in some instructions, the address stored in the address register a0 and any data that the hardware IP transmits. Data can only be received and transmitted via the internal links and cannot originate from the SCS. Furthermore, the controller can only send and receive discrete packets of data, which may not be sufficient for certain types of PEs requiring continuous data streaming. To solve this problem, the controller supports the use of optional asynchronous FIFOs to buffer the data transmissions between the controller and the PE.
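The instruction word layout described under "Instruction Packet Format" can be illustrated with a short Python sketch. It follows the stated field widths (opcode in the least significant byte with five bits in use, NDW in the upper 3 bytes, one extra bit marking instruction versus data). The position of the flag bit (bit 32), its polarity (1 = instruction), and all function names are assumptions made for illustration; the text does not specify them.

```python
# Hypothetical model of a 33-bit SIMPPL FIFO word (names and flag polarity assumed).
INSTR_FLAG = 1 << 32      # assumed: bit 32 set means "instruction word"
OPCODE_MASK = 0x1F        # only the five LSBs of the opcode byte are in use
WORD_MASK = 0xFFFFFFFF    # 32-bit payload


def encode_instruction(opcode, ndw):
    """Pack an opcode and a number of data words (NDW) into a 33-bit word."""
    if not 0 <= opcode <= OPCODE_MASK:
        raise ValueError("opcode must fit in five bits")
    if not 0 <= ndw < (1 << 24):
        raise ValueError("NDW must fit in the upper three bytes")
    return INSTR_FLAG | (ndw << 8) | opcode


def encode_data(word):
    """Pack a 32-bit data word; the flag bit stays clear."""
    if not 0 <= word <= WORD_MASK:
        raise ValueError("data word must fit in 32 bits")
    return word


def decode(fifo_word):
    """Split a 33-bit FIFO word back into its fields."""
    payload = fifo_word & WORD_MASK
    if fifo_word & INSTR_FLAG:
        return ("instruction", payload & OPCODE_MASK, payload >> 8)
    return ("data", payload)
```

For example, `decode(encode_instruction(3, 5))` yields `("instruction", 3, 5)`, an instruction announcing five data words; the opcode value 3 is arbitrary and does not correspond to an entry of Table 1.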
TABLE 1. Current Instruction Set Supported by the SIMPPL Controller

Although some instructions required to fully support the reconfigurability of some types of hardware PEs may be missing, the instructions in Table 1 support the hardware CEs that have been built to date. Furthermore, the controller supports the expansion of the instruction set to meet future requirements. The first column in Table 1 describes the operation being performed by the instruction. Columns 2 through 4 are used to indicate whether the different instruction types can be used to request data (Rd Req), receive data (Rx), or write data (Wr). The next two columns are used to denote whether each instruction may be issued from or executed from the SCS (S) or internal Receive Communication Link (R). Finally, the last two columns are used to denote whether the instruction requires an address field (Addr Field) or a data field (Data Field) in the packet transmission. The first instruction type described in Table 1 is the immediate data transfer instruction. It consists of one instruction word of the format shown in Figure 4, excluding the address field, where the two LSBs of the opcode indicate whether the data transfer is a read request, a write, or a receive. The immediate data plus immediate address instruction is similar to the immediate data transfer instruction except that an address field is required as part of the instruction packet.

Designers can reduce the size of the controller by tailoring the instruction set to the PE. Although some CEs receive and transmit data, thus requiring the full instruction set, others may only produce data or consume data. The Producer controller (Producer) is designed for CEs that only generate data. It does not support any instructions that may read data from a CE. The Consumer controller (Consumer) is designed for CEs that receive input data without generating output data. It does not support any instructions that try to write PE data to a Tx link.

IV. SIMPPL CONTROL SEQUENCER
The SIMPPL Control Sequencer provides the local program that specifies how the PE is to be used by the system. The operation of a SIMPPL controller is analogous to a generic processor, where the controller's instruction set is akin to assembly language. For a processor, programs consist of a series of instructions used to perform the designed operations. Execution order is dictated by the processor's Program Counter (PC), which specifies the address of the next instruction of the program to be fetched from memory. While a SIMPPL controller and program perform the equivalent operations to a program running on a generic processor, the controller uses a remote PC in the SCS to select the next instruction to be fetched. Figure 6 illustrates the SCS structure and its interface with the SIMPPL controller via six standardized signals. The 32-bit program word and the program control bit, which indicates if the program word is an instruction or address, are only valid when the valid instruction bit is high. The valid instruction signal is used by the SIMPPL controller in combination with the program instruction read to fetch an instruction from the Store Unit and update the PC. The continue program bit indicates whether the current program instruction has higher priority than the instructions received on the CE Rx link. It can be used in combination with PE-specific and controller status bits to help ensure the correct execution order of instructions.

Fig. 6. Standard SIMPPL control sequencer structure and interface to the SIMPPL controller.

A. Consumer Controller
We have 4 interfacing blocks for communication within the consumer execute controller. They are Master, Slave, Processing Element, and Programmable Interface. The Consumer writes data to the Master and reads data from the Slave. The signals of the Master block are Master clock, Master write, Master data, Master control, and Master full. The signals of the Slave block are Slave clock, Slave data, Slave control, Slave read, and Slave exist. Two more signals are generated from the Processing Element to the Consumer: can_write_data and can_write_addr. The signals generated from the Programmable Interface to the Consumer are program control bit, program valid instruction, cont_program, and program instruction. The signal generated from the Consumer to the Programmable Interface is prog_instruction_read. The input signals of the blocks are given to the consumer controller and the output
signals are directed to the blocks from the consumer controller.

Initially, when the process begins, the controller checks whether the instruction is valid. If it is not, the instruction is not executed, because the valid instruction bit is not set high. On receiving a valid instruction, the valid instruction bit goes high, and the word is then identified as data or instruction by the control bit. When data is received from the slave, the consumer reads the data and stores it in the Processing Element. When the slave read pin becomes '1', the slave data is transferred. Once this data is received, the Processing Element checks whether it is ready to set the can_write_data pin or can_write_address. This is known once the data is sent to the consumer, and hence can_write_data is set. After this, the corresponding acknowledge signals are sent, and once the data transfer is ensured, the can_write_address pin is set to '1' by the Processing Element. Once this write_address is received, the data in the slave is transferred to the Processing Element. When the consumer communicates with the Master, all the data is transferred to the Master. The Master block deals with pure data transfer; hence, on receiving pure data instead of an instruction, the Slave_data is stored as Master_data. The address at which this Master_data is stored is governed by the Consumer controller.

Two important inputs in this module are the program instruction and the slave data. The slave data for this module is a fixed value. The program instruction is given any random value. It contains the instruction and the size of the data packet, that is, the number of data words. These data words are in a continuous format and are generated as per the counter.

V RESULTS

VI FUTURE WORK
decryption of data will be done at the producer and consumer controller ends respectively as an enhancement part of this project.

REFERENCES
[1] M. Keating and P. Bricaud, Reuse Methodology Manual for System-on-a-Chip Designs. Norwell, MA: Kluwer Academic, 1998.
[2] H. Chang, L. Cooke, M. Hung, G. Martin, A. J. McNelly, and L. Todd, Surviving the SOC Revolution: A Guide to Platform-Based Design. Norwell, MA: Kluwer Academic, 1999.
[3] L. Shannon and P. Chow, "Maximizing system performance: Using reconfigurability to monitor system communications," in Proc. IEEE Int. Conf. on Field-Programm. Technol., Dec. 2004, pp. 231–238.
[4] L. Shannon and P. Chow, "Simplifying the integration of processing elements in computing systems using a programmable controller," in Proc. IEEE Symp. on Field-Programm. Custom Comput. Mach., Apr. 2005, pp. 63–72.
[5] E. Lee and T. Parks, "Dataflow process networks," Proc. IEEE, vol. 83, no. 5, pp. 471–475, May 1995.
[6] K. Jasrotia and J. Zhu, "Stacked FSMD: A power efficient micro-architecture for high level synthesis," in Proc. Int. Symp. on Quality Electronic Des., Mar. 2004, pp. 425–430.
Clock Period Minimization of Edge Triggered Circuit
D. Jackuline Moni (1), S. Arumugam (2), Anitha A. (1)
(1) ECE Department, Karunya University
(2) Chief Executive, Bannari Amman Educational Trust
Abstract--In a sequential VLSI circuit, due to differences in interconnect delays on the clock distribution network, clock signals do not arrive at all of the flip-flops (FF) at the same time. Thus there is a skew between the clock arrival times at different latches. Among the various objectives in the development of sequential circuits, clock period minimization is one of the most important. Clock skew can be exploited as a manageable resource to improve circuit performance. However, due to the limitation of race conditions, optimal clock skew scheduling often does not achieve the lower bound of sequential timing optimization. This paper presents the clock period minimization of edge-triggered circuits. The objective here is not only to optimize the clock period but also to minimize the required inserted delay for resolving the race conditions. This is done using ModelSim XE II 5.8c.

I. INTRODUCTION

Most integrated circuits of sufficient complexity utilize a clock signal in order to synchronize different parts of the circuit and to account for the propagation delays. As ICs become more complex, the problem of supplying accurate and synchronized clocks to all the circuits becomes difficult. One example of such a complex chip is the microprocessor, the central component of modern computers. A clock signal might also be gated or combined by a controlling signal that enables or disables the clock signal for a certain part of a circuit. In a synchronous circuit, the clock signal is a signal used to coordinate the actions of two or more circuits. A clock signal oscillates between a high and a low state and is usually in the form of a square wave. Circuits using the clock signal for synchronization may become active at either the rising edge, the falling edge, or both edges of the clock cycle. A synchronous circuit is one in which all the parts are synchronized by a clock. In ideal synchronous circuits, every change in the logical levels of its storage components is simultaneous. These transitions follow the level change of a special signal called the clock. Ideally, the input to each storage element has reached its final value before the next clock occurs, so the behavior of the whole circuit can be predicted exactly. Practically, some delay is required for each logical operation, resulting in a maximum speed at which each synchronous system can run. To make these circuits work correctly, a great deal of care is needed in the design of the clock distribution network. This paper deals with the clock period minimization of edge triggered circuits. Clock skew is a phenomenon in synchronous circuits in which the clock signal arrives at different components at different times. This can be due to wire-interconnect length, temperature variations, capacitive coupling, material imperfections etc. As design complexity and clock frequency continue to increase, more techniques are developed for clock period minimization. An application of optimal clock skew scheduling to enhance the speed characteristics of functional blocks of an industrial chip was demonstrated in [1].

II. PROJECT DESCRIPTION

This paper deals with the clock period minimization of edge triggered circuits. Edge triggered circuits are sequential circuits that use the edge-triggered clocking scheme. Such a circuit consists of registers and combinational logic gates with wires connecting them. Each logic gate has one output pin and one or more input pins. A timing arc is used to denote the signal propagation from an input pin to the output pin, and a suitable delay value for the timing arc is also taken into account. In the design of an edge-triggered circuit, if the clock edge arrives at each register exactly simultaneously, the clock period cannot be shorter than the longest path delay. If this circuit has timing violations caused by long paths, an improvement can be made by an optimization step. There are two approaches to resolve the timing violations of long paths. One is to apply logic optimization techniques for reducing the delays of long paths; the other is to apply sequential timing optimization techniques, such as clock skew scheduling [7] and retiming transformation [5] [8], to adjust the timing slacks among the data paths. Logic optimization techniques are applied earlier. For those long paths whose delays are difficult to reduce further, sequential timing optimization techniques are necessary.

It is well known that the clock period of a nonzero clock skew circuit can be shorter than the longest path delay if the clock arrival times of registers are properly scheduled. The optimal clock skew scheduling problem can be formulated as a constraint graph and solved by polynomial time complexity algorithms like the cycle detection method [6], binary search algorithms, shortest path algorithms [2] etc. Given a circuit graph G, the optimal clock skew scheduling problem is to determine the smallest feasible clock period and find an optimal clock skew schedule, which specifies the clock arrival times of registers for this circuit to work with the smallest feasible clock period. Due to the limitation of race conditions, optimal clock skew scheduling often does not achieve the lower bound of sequential timing optimization. Thus, a combination of optimal clock skew scheduling and delay insertion may lead to further clock period reduction. For this, the circuit graph shown below is taken for analysis. This approach of combining optimal clock skew scheduling and delay insertion for the synthesis of nonzero clock skew circuits was carried out using the Delay Insertion and Nonzero Skew (DIANA) algorithm.

The DIANA algorithm is an iteration process between the construction of an effective delay-inserted circuit graph and the construction of an irredundant delay-inserted circuit graph. The iteration process repeats until
the clock period cannot be further reduced. The delay component is then applied to the edge triggered circuit that we have taken.

III. METHOD 1
A. LOWER BOUND OF SEQUENTIAL TIMING OPTIMIZATION
Fig 1 shows an edge triggered flip-flop circuit. It consists of registers and combinational logic gates with wires connecting them. The circuit has four registers and eight logic gates. Each logic gate has one or more input pins and one output pin. A timing arc is defined to denote the signal propagation from input to output. The delays of the timing arcs in the edge triggered circuit are initialized as shown in the table below. A data path from register Ri to register Rj, denoted as Ri → Rj, includes the combinational logic from Ri to Rj. The circuit can also be modeled as a circuit graph G(V, E) for timing analysis, where V is the set of vertices and E is the set of directed edges. Each vertex represents a register, and a special vertex called host is used to synchronize the input and output. A directed edge (Ri, Rj) represents a data path Ri → Rj, and it is associated with a weight which represents the minimum and maximum propagation delay of the data path. The circuit graph of the edge triggered flip-flop is shown in fig 2. From the graph it is clear that the maximum propagation delay path is TPD3,4(max) and is 6 time units (tu).

The delay-to-register ratio of a directed cycle C is given by (maximum delay of C) / (number of registers in C). The maximum of this ratio over all directed cycles gives the lower bound of sequential timing optimization. From the circuit graph it is clear that the maximum delay-to-register ratio [9] of the directed cycle is 4 tu. The waveform of the edge triggered circuit is shown in fig 3.

Fig. 2. Circuit Graph

Fig. 3. Waveform of edge triggered flip-flop
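As an illustration of the lower bound discussed in this section, the following Python sketch computes the maximum delay-to-register ratio over the simple directed cycles of a small graph. It is a minimal sketch under stated assumptions: every vertex is taken to count as one register, the edge delays in the example are hypothetical and are not those of Fig. 2, and the brute-force cycle enumeration is only practical for small graphs.

```python
def max_delay_to_register_ratio(num_regs, max_delay):
    """Return the largest (total max delay) / (register count) over simple cycles.

    max_delay maps a directed edge (u, v) to the maximum propagation delay
    of the data path u -> v; vertices are numbered 0 .. num_regs - 1.
    """
    adjacency = {}
    for (u, v), delay in max_delay.items():
        adjacency.setdefault(u, []).append((v, delay))

    best = 0.0

    def dfs(start, node, total, visited):
        nonlocal best
        for nxt, delay in adjacency.get(node, []):
            if nxt == start:
                # Closed a cycle: one register per visited vertex (assumption).
                best = max(best, (total + delay) / len(visited))
            elif nxt > start and nxt not in visited:
                # Only extend through larger vertices so each cycle is
                # enumerated once, from its smallest vertex.
                dfs(start, nxt, total + delay, visited | {nxt})

    for s in range(num_regs):
        dfs(s, s, 0, {s})
    return best


# Hypothetical three-register example (delays in time units).
example = {(0, 1): 6, (1, 0): 2, (1, 2): 1, (2, 0): 2}
bound = max_delay_to_register_ratio(3, example)
```

In this example the cycle 0 → 1 → 0 has ratio (6 + 2) / 2 = 4 and the cycle 0 → 1 → 2 → 0 has ratio (6 + 1 + 2) / 3 = 3, so the bound is 4 time units even though the longest single path delay is 6.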
METHOD 2
circuit graph works with the clock period P only if the clock skew schedule satisfies the clocking constraints. The optimal clock skew scheduling problem is to determine the smallest feasible clock period of a circuit graph and find the corresponding clock skew schedule for the circuit graph to work with the smallest feasible clock period.

The optimum clock skew scheduling problem is solved by applying a binary search approach. At each step in the binary search [3], for a constant value of the clock period P, a check for a negative cycle is done. The binary search is repeated until the smallest feasible clock period is attained. After applying this approach, we get the smallest feasible clock period as 5 tu. The corresponding constraint graph is shown in fig 4(b). When the clock period is 5 tu, there exists a critical cycle R3 → R4 → R3 in the constraint graph. If the clock period is less than 5 tu, this cycle will become a negative cycle. From fig 4(b), optimum clock skew scheduling is limited by the critical cycle R3 → R4 → R3, which is not a critical z-cycle. This critical cycle has a critical D-edge ed(R3 → R4). The weight of the D-edge is the minimum delay from register R3 to register R4. Thus, if we increase the minimum delay, the cycle becomes a noncritical one. The optimal clock skew schedule is taken as Thost = 0 tu, TC1 = 2 tu, TC2 = 2 tu, TC3 = 2 tu and TC4 = 3 tu. The corresponding waveform representation is also shown in fig 5.

But due to the limitation of race conditions, optimal clock skew scheduling often does not achieve the lower bound of sequential timing optimization. Also, different clock skew schedules will have different race conditions. So delay insertion [10] is taken into account in determining the clock skew schedule.

skew schedule is derived by taking zero clocking constraints into account. In the second step, delay insertion is applied to resolve the race conditions. Consider the circuit graph shown in fig 6(a). Here the lower bound of sequential timing optimization is 3 tu. There is no negative cycle in the constraint graph of fig 6(b). The clock skew schedule is taken as Thost = 0 tu, TC1 = 0 tu, TC2 = 0 tu and TC3 = 1 tu. Here the lower bound is achieved without any delay insertion.

Fig. 6. (a) Circuit Graph ex2 (b) Constraint Graph Gcg(ex2, 3).

On the other hand, fig 7 shows the two step process for obtaining a delay inserted circuit graph which works with a clock period P = 3 tu. In the first step, since only zero clocking constraints are considered, the clock skew schedule is taken as Thost = 0 tu, TC1 = 2 tu, TC2 = 2 tu and TC3 = 3 tu. This is shown in fig 7(a). Then, in the second step, delay insertion is applied to resolve the race conditions. Here the required increase of the minimum delay from host to R2 is 1 tu and the required increase of the minimum delay from host to R3 is 2 tu. Fig 7(b) shows this process. The two step process results in extra delay insertion. The corresponding waveform is also shown in fig 8.
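The binary search over the clock period described in this section can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes the usual difference-constraint form of the clocking constraints with setup and hold times ignored (setup: Ti + d_max ≤ Tj + P, hold: Ti + d_min ≥ Tj for a data path Ri → Rj), integer clock periods, and a Bellman-Ford pass for the negative cycle check. The function names and the two-register example are hypothetical.

```python
def has_negative_cycle(num_nodes, edges):
    """Bellman-Ford with an implicit zero-weight source to every node."""
    dist = [0] * num_nodes
    for _ in range(num_nodes):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False      # distances converged: no negative cycle
    return True               # still relaxing after num_nodes passes


def is_feasible(num_regs, paths, period):
    """Build the constraint graph for a given period and test it.

    paths holds tuples (i, j, d_min, d_max) for each data path Ri -> Rj.
    A constraint x_a - x_b <= w becomes an edge b -> a with weight w.
    """
    edges = []
    for i, j, d_min, d_max in paths:
        edges.append((j, i, period - d_max))  # setup: Ti - Tj <= P - d_max
        edges.append((i, j, d_min))           # hold:  Tj - Ti <= d_min
    return not has_negative_cycle(num_regs, edges)


def min_clock_period(num_regs, paths):
    """Smallest integer period with a feasible clock skew schedule."""
    lo, hi = 0, max(d_max for _, _, _, d_max in paths)  # zero skew works at hi
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(num_regs, paths, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Hypothetical two-register example: (from, to, min delay, max delay).
example_paths = [(0, 1, 1, 6), (1, 0, 2, 4)]
period = min_clock_period(2, example_paths)
```

In this example the zero-skew period would be 6 tu (the longest path), while skew scheduling reaches 5 tu. At 4 tu the hold constraint and the setup constraint of the path 0 → 1 form a negative cycle of weight 1 + (4 − 6) = −1, which is the kind of race condition that delay insertion (raising the minimum delay) is meant to relieve.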
V. PROPOSED METHOD TO CIRCUIT GRAPH
VII. CONCLUSION
This paper describes the clock period minimization of edge triggered circuits. It uses a delay insertion and nonzero skew algorithm to optimize the clock period. Experimental results for the various sections of this project are shown above. The results indicate that this algorithm achieves a smaller clock period than the other approaches considered. The algorithm is applied to a series of benchmark circuits, and the results are shown above.
REFERENCES
Abstract- Floorplanning is important in very large scale integrated (VLSI) circuit design automation as it determines the performance, size and reliability of VLSI chips. This paper presents a floorplanning method based on hybrid Particle Swarm Optimization (HPSO). The B*-tree floorplan structure is adopted to generate an initial floorplan without any overlap, and then HPSO is applied to find the optimal solution. HPSO has been implemented and tested on popular MCNC and GSRC benchmark problems for nonslicing and hard module VLSI floorplanning. Experimental results show that HPSO can quickly produce optimal or nearly optimal solutions for all popular benchmark circuits.

I. INTRODUCTION
In this paper, we adopted a non-slicing representation, the B*-tree, with the Hybrid Particle Swarm Optimization (HPSO) algorithm. HPSO [1] utilizes the basic mechanism of PSO [7, 8] and the natural selection method, which is usually utilized by evolutionary computation (EC) methods such as the genetic algorithm (GA). Since the search procedure of PSO depends deeply on pbest and gbest, the searching area may be limited by pbest and gbest. On the contrary, by the introduction of natural selection, a broader area search can be realized.

The remainder of this paper is organized as follows. Section 2 describes the PSO and HPSO methodology. Section 3 presents the B*-tree representation and our proposed methods for floorplanning. The experimental results are reported in Section 4. Finally, the conclusion is in Section 5.
where v_{i,j} is the velocity of the particle in the jth dimension for all j in 1…s, w is the inertia weight of velocity, c1 and c2 denote the acceleration coefficients, r1 and r2 are elements from two uniform random sequences in the range (0, 1), and t is the number of generations. The new position of the particle is calculated as follows:

$$x_i(t+1) = x_i(t) + v_i(t+1) \tag{2}$$

The local best position of each particle is updated by (3):

$$y_i(t+1) = \begin{cases} y_i(t), & \text{if } f(x_i(t+1)) \ge f(y_i(t)) \\ x_i(t+1), & \text{if } f(x_i(t+1)) < f(y_i(t)) \end{cases} \tag{3}$$

The global best position ŷ found from all particles during the previous three steps is defined as

$$\hat{y}(t+1) = \arg\min_{y_i} f(y_i(t+1)), \quad 1 \le i \le n \tag{4}$$

B. Hybrid particle swarm optimization (HPSO)
The structure of the hybrid model is illustrated below:
begin
  initialize
  while (not terminate-condition) do
  begin
    evaluate
    calculate new velocity vectors
    move
    Natural Selection
  end
end

The breeding is done by first determining which of the particles should breed. This is done by iterating through all the particles and, with probability pi, marking a given particle for breeding. Note that the fitness is not used when selecting particles for breeding. From the pool of marked particles we now select two random particles for breeding. This is done until the pool of marked particles is empty. The parent particles are replaced by their offspring particles, thereby keeping the population size fixed. Here pi is a uniformly distributed random value between 0 and 1. The velocity vectors of the offspring are calculated as the sum of the velocity vectors of the parents, normalized to the original length of each parent velocity vector. The flow chart of HPSO is shown in figure 1.

C. Steps of Hybrid Particle Swarm Optimization
Step 1:
Generation of the initial condition of each agent. Initial searching points (si0) and velocities (vi0) of each agent are usually generated randomly within the allowable range. The current searching point is set to pbest for each agent. The best-evaluated value of pbest is set to gbest and the agent number with the best value is stored.
Step 2:
Evaluation of the searching point of each agent. The objective function value is calculated for each agent. If the value is better than the current pbest of the agent, the pbest value is replaced by the current value. If the best value of pbest is better than the current gbest, gbest is replaced by the best value and the agent number with the best value is stored.
Step 3:
Natural selection using the evaluation value of each searching point is done.
Step 4:
Modification of each searching point. The current searching point of each agent is changed.
Step 5:
Checking the exit condition. If the current iteration number reaches the predetermined maximum iteration number, then exit; otherwise go to step 2.

III. B*-TREE REPRESENTATION
Given an admissible placement P, we can represent it by a unique (horizontal) B*-tree T. Fig 2(b) gives an example of a B*-tree representing the placement of Fig 2(a). A B*-tree is an ordered binary tree whose root corresponds to the module on the bottom-left corner. Similar to the DFS procedure, we construct the B*-tree T for an admissible placement P in a recursive fashion: starting from the root, we first recursively construct the left subtree and then the right subtree. Let Ri denote the set of modules located on the right hand side and adjacent to bi. The left child of the node ni corresponds to the lowest module in Ri that is unvisited. The right child of the node ni represents the lowest module located above bi and with its x coordinate equal to that of bi. Following the above mentioned DFS procedure and definitions, we can guarantee the 1-to-1 correspondence between an admissible placement and its induced B*-tree.
Fig 2: (a) An admissible placement (b) The (horizontal) B*-tree representing the placement

As shown in Fig 2, module a becomes the root of T since module a is on the bottom-left corner. Constructing the left subtree of na recursively makes nh the left child of na. Since the left child of nh does not exist, it then constructs the right subtree of nh (which is rooted by ni). The construction is recursively performed in the DFS order. After completing the left subtree of na, the same procedure applies to the right subtree of na. The resulting B*-tree for the placement of Fig 2(a) is shown in Fig 2(b). The construction takes only linear time.
Given a B*-tree T, we shall compute the x and y coordinates for each module associated with a node in the tree. The x and y coordinates of the module associated with the root are (xroot, yroot) = (0, 0), since the root of T represents the bottom-left module. The B*-tree keeps the geometric relationship between two modules as follows. If node nj is the left child of node ni, module bj must be located on the right-hand side of and adjacent to module bi in the admissible placement; xj = xi + wi. Besides, if node nj is the right child of node ni, module bj must be located above bi, with the x coordinate of bj equal to that of bi, i.e. xj = xi. Therefore, given a B*-tree, the x coordinates of all modules can be determined by traversing the tree once. The contour data structure is adopted to efficiently compute the y coordinates from a B*-tree. Overall, given a B*-tree, we can determine the corresponding packing (i.e. compute the x and y coordinates for all modules) in amortized linear time.

B*-tree Perturbations
Given an initial B*-tree, we perturb the B*-tree to another using the following three operations.
• Op 1: Rotate a module
• Op 2: Move a module to another place
• Op 3: Swap two modules
Op 1 rotates a module, and the B*-tree structure is not changed. Op 2 deletes and inserts a node. Op 2 and Op 3 need to apply the deletion and insertion operations for deleting and inserting a node from and to a B*-tree.

A. Floorplanning using B*-tree

IV. EXPERIMENT RESULTS
The experiments in this study employed GSRC and MCNC benchmarks [22] for the proposed floorplanner, and the results were compared with [2]. The simulation programs were written in C++ and compiled using Microsoft Visual C++, and the results were obtained on a Pentium 4 2 GHz with 256 MB RAM. The PSO parameters w, c1 and c2 were initialized to 0.4, 1.4 and 1.4 respectively. For HPSO, the probability of selection is chosen as 0.6. The particle number is set to twenty. The floorplanner was run 10 times and average values of chip area and run time were taken.
The results are shown in Table 1. Compared with [2], our method finds a better placement solution in less computation time. Under the same tree structure, our approach is more efficient and has better solution searching ability for floorplanning.

Table 1 Results of Hard Modules using B*-tree based HPSO

Circuit   # of blocks   With B*-tree                 Our method
                        Area (mm2)   Time (sec)      Area (mm2)   Time (sec)
Apte      9             46.92        7               46.829       1.31
Xerox     10            20.06        25              19.704       3.69
Ami33     33            1.27         3417            1.26         4.44

V. CONCLUSION AND FUTURE WORK
In this paper, we proposed a floorplanner based on HPSO with the B*-tree structure for placing blocks. HPSO exhibits the ability to search the solution space more efficiently than SA. The experimental results showed that the proposed HPSO method can lead to better and more reasonable solutions for the hard IP module placement problem. Our future work is to deal with soft IP modules and also to include constraints such as alignment and performance constraints.
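To make the B*-tree packing rules of Section III concrete (left child at xj = xi + wi, right child at xj = xi, y taken from what is already placed below), here is a small sketch. It is an illustration under stated assumptions, not the paper's implementation: instead of the contour data structure it scans all previously placed modules, which is quadratic rather than amortized linear, and the names `Node`, `pack` and `bounding_area` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    w: float                      # module width
    h: float                      # module height
    left: Optional["Node"] = None   # module to the right of and adjacent to this one
    right: Optional["Node"] = None  # module above this one with the same x


def pack(root: Node) -> Dict[str, Tuple[float, float, float, float]]:
    """Map each module name to (x, y, w, h) by a DFS over the B*-tree."""
    placed: List[Tuple[float, float, float, float]] = []
    result: Dict[str, Tuple[float, float, float, float]] = {}

    def place(node: Node, x: float) -> None:
        # y is the highest top edge among placed modules overlapping [x, x + w).
        # SIMPLIFICATION: linear scan instead of the contour structure.
        y = max((py + ph for px, py, pw, ph in placed
                 if px < x + node.w and x < px + pw), default=0)
        placed.append((x, y, node.w, node.h))
        result[node.name] = (x, y, node.w, node.h)
        if node.left is not None:
            place(node.left, x + node.w)   # xj = xi + wi
        if node.right is not None:
            place(node.right, x)           # xj = xi

    place(root, 0)                          # root sits at (0, 0)
    return result


def bounding_area(packing: Dict[str, Tuple[float, float, float, float]]) -> float:
    """Area of the bounding rectangle of a packing."""
    width = max(x + w for x, _, w, _ in packing.values())
    height = max(y + h for _, y, _, h in packing.values())
    return width * height


# Usage: a 2x1 root with a 1x2 module to its right and a 1x1 module above it.
tree = Node("a", 2, 1, left=Node("b", 1, 2), right=Node("c", 1, 1))
packing = pack(tree)
```

The bounding-box area returned by `bounding_area` is the kind of quantity a floorplanner would evaluate as the chip area of a candidate tree.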
REFERENCES
[1] P.J. Angeline, “Using Selection to Improve Particle Swarm Optimization,” in Proceedings of the IEEE Congress on Evolutionary Computation, 1998, pages 84-89, IEEE Press.
[2] Y.-C. Chang, Y.-W. Chang, G.-M. Wu and S.-W. Wu, “B*-trees: A New Representation for Non-Slicing Floorplans,” DAC 2000, pp. 458-463.
[3] R.C. Eberhart and J. Kennedy, “A New Optimizer using Particle Swarm Theory,” in Proceedings of the Sixth International Symposium on Micromachine and Human Science, 1995, pages 39-43.
[4] P.-N. Guo, C.-K. Cheng and T. Yoshimura, “An O-tree Representation of Non-Slicing Floorplan,” DAC ’99, pp. 268-273.
[5] X. Hong et al., “Corner Block List: An Effective and Efficient Topological Representation of Non-Slicing Floorplan,” ICCAD 2000, pp. 8-13.
[6] A. B. Kahng, “Classical floorplanning harmful?” ISPD 2000, pp. 207-213.
[7] J. Kennedy and R.C. Eberhart, “Particle Swarm Optimization,” in Proceedings of the IEEE International Joint Conference on Neural Networks, 1995, pages 1942-1948, IEEE Press.
[8] J. Kennedy, “The Particle Swarm: Social Adaptation of Knowledge,” in Proceedings of the IEEE International Conference on Evolutionary Computation, 1997, pages 303-308.
[9] M. Lai and D. Wong, “Slicing Tree Is a Complete Floorplan Representation,” DATE 2001, pp. 228-232.
[10] J.-M. Lin and Y.-W. Chang, “TCG: A Transitive Closure Graph-Based Representation for Non-Slicing Floorplans,” DAC 2001, pp. 764-769.
[11] J.-M. Lin and Y.-W. Chang, “TCG-S: Orthogonal Coupling of P*-admissible Representations for General Floorplans,” DAC 2002, pp. 842-847.
[12] H. Murata, K. Fujiyoshi, S. Nakatake and Y. Kajitani, “VLSI Module Placement Based on Rectangle-Packing by the Sequence Pair,” IEEE Trans. on CAD 15(12), pp. 1518-1524, 1996.
[13] H. Murata and E. S. Kuh, “Sequence-Pair Based Placement Methods for Hard/Soft/Pre-placed Modules,” ISPD 1998, pp. 167-172.
[14] S. Nakatake, K. Fujiyoshi, H. Murata and Y. Kajitani, “Module Placement on BSG Structure and IC Layout Applications,” Proc. ICCAD, pp. 484-491, 1998.
[15] K.E. Parsopoulos and M.N. Vrahatis, “Recent Approaches to Global Optimization Problems through Particle Swarm Optimization,” Natural Computing, 2002, 1(2-3):235-306.
[16] X. Tang, R. Tian and D. F. Wong, “Fast Evaluation of Sequence Pair in Block Placement by Longest Common Subsequence Computation,” DATE 2000, pp. 106-111.
[17] X. Tang and D. F. Wong, “FAST-SP: A Fast Algorithm for Block Placement Based on Sequence Pair,” ASPDAC 2001, pp. 521-526.
[18] D.F. Wong and C.L. Liu, “A New Algorithm for Floorplan Design,” DAC 1986, pp. 101-107.
[19] B. Yao et al., “Floorplan Representations: Complexity and Connections,” ACM Trans. on Design Autom. of Electronic Systems 8(1), pp. 55-80, 2003.
[20] E.F.Y. Young, C.C.N. Chu and Z.C. Shen, “Twin Binary Sequences: A Nonredundant Representation for General Nonslicing Floorplan,” IEEE Trans. on CAD 22(4), pp. 457-469, 2003.
[21] S. Zhou, S. Dong, C.-K. Cheng and J. Gu, “ECBL: An Extended Corner Block List with Solution Space including Optimum Placement,” ISPD 2001, pp. 150-155.
[22] http://www.cse.ucsc.edu/research/surf/GSRC/progress.html
II. BLOCK DIAGRAM
2. Develop a unique number for the above, giving weights to various constructs.
REFERENCES
Abstract - In this paper we propose a new scheme for built-in self-test (BIST). We propose different architectures that reduce power dissipation; each is designed with techniques that reduce the power dissipated during test. These BIST techniques decrease the transitions that occur at scan inputs during scan shift operations and hence reduce power dissipation in the CUT. We compare the different BIST architectures. In this paper we fix the values at the inputs of the BIST architecture, and at the output we restructure the scan chain to get optimized results. Experimental results of the proposed technique show that the power dissipation is reduced significantly compared to existing work.

I. INTRODUCTION
Circuit power dissipation in test mode is much higher than the power dissipation in functional mode [21]. High power consumption in BIST mode is an especially serious concern because of at-speed testing. Low power BIST techniques are gaining attention in recent publications [11]. The first advantage of low power BIST is to avoid the risk of damaging the Circuits Under Test (CUT). Low power BIST techniques also save the cost of expensive packages or external cooling devices for testing. Power dissipation in BIST mode is made up of three major components: the combinational logic power, the sequential circuit power, and the clock power. In the clock power reduction category, disabling or gating the clock of scan chains has been proposed [2]. By modifying the clock tree design, these techniques effectively reduce the clock power consumption, which is shown to be a significant component of the test power [23]. However, clock trees are sensitive to changes of timing; even small modifications can sometimes cause serious failure of the whole chip. Modifying the clocks, therefore, not only increases the risk of skew problems but also imposes constraints on test pattern generation. The low transition random test pattern generator (LT-RTPG) has been proposed to reduce the number of toggles of the scan input patterns. In the 3-weight weighted random BIST (3-weight WRBIST) technique, values at the input are fixed, which reduces transitions and hence power.
Switching activity in a circuit can be significantly higher during BIST than during its normal operation. Finite-state machines are often implemented in such a manner that vectors representing successive states are highly correlated, to reduce power dissipation [16]. The use of scan allows patterns that cannot appear during normal operation to be applied to the state inputs of the CUT during test application. Furthermore, the values applied at the state inputs of the CUT during scan shift operations represent shifted values of test vectors and circuit responses and have no particular temporal correlation. Excessive switching activity due to low correlation between consecutive test patterns can cause several problems [14]. Since heat dissipation in a CMOS circuit is proportional to switching activity, a CUT can be permanently damaged due to excessive heat dissipation if switching activity in the circuit during test application is much higher than that during its normal operation. Heat dissipated during test application is already influencing the design of test methodologies for practical circuits [14].

II. MINIMIZING POWER DISSIPATION BY REDUCING SWITCHING ACTIVITY
The BIST TPG proposed in this paper reduces switching activity in the CUT during BIST by reducing the number of transitions at the scan input during scan shift cycles (Fig. 1). If a scan input is assigned a value (0 or 1) at one scan shift cycle and the opposite value at the next cycle, then a transition occurs at that scan input. The transition that occurs at the scan input can propagate into internal circuit lines, causing more transitions. During scan shift cycles, the response to the previous scan test pattern is also scanned out of the scan chain. Hence, transitions at scan inputs can be caused by both test patterns and responses. Since it is very difficult for a random pattern generator to generate test patterns that cause a minimal number of transitions while they are scanned into the scan chain and whose responses also cause a minimal number of transitions while they are scanned out of the scan chain, we focus on minimizing the number of transitions caused only by test patterns that are scanned in. Even so, our extensive experiments show that the proposed TPG can still reduce switching activity significantly during BIST. Since circuit responses typically have higher correlation among neighboring scan outputs than test patterns, responses cause fewer transitions than test patterns while being scanned out. A transition at the input of the scan chain at a scan shift cycle, which is caused by scanning in a value that is opposite to the value that was scanned in at the previous scan shift cycle, continuously causes transitions at scan inputs while the value travels through the scan chain during the following scan shift cycles. Fig. 1 describes scanning a scan test pattern 01100 into a scan chain that has five scan flip-flops. Since a 0 is scanned into the scan chain first, the 1 that is scanned into the scan chain at the following cycle causes a transition at the input of the scan chain and continuously causes transitions at the scan flip-flops it passes through until it arrives at its final destination.
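The effect described for the pattern 01100 can be checked with a small simulation. This is only a sketch under simplifying assumptions that are not in the paper: the scan chain is modeled as a plain shift register that initially holds all zeros (the scanned-out response is ignored), bits are listed in the order they are scanned in, and only flip-flop toggles are counted, not transitions in the combinational logic. The function names are hypothetical.

```python
def input_transitions(stream):
    """Number of transitions seen at the scan chain input."""
    return sum(1 for prev, cur in zip(stream, stream[1:]) if prev != cur)


def shift_in(stream, chain_length, initial=None):
    """Shift `stream` into a scan chain; return (final chain, flip-flop toggles)."""
    # ASSUMPTION: the chain starts at all zeros unless `initial` is given.
    chain = list(initial) if initial is not None else [0] * chain_length
    toggles = 0
    for bit in stream:
        new_chain = [bit] + chain[:-1]   # one scan shift cycle
        toggles += sum(1 for old, new in zip(chain, new_chain) if old != new)
        chain = new_chain
    return chain, toggles


pattern = [0, 1, 1, 0, 0]                # bits in scan-in order
final_chain, ff_toggles = shift_in(pattern, chain_length=5)
alternating_toggles = shift_in([0, 1, 0, 1, 0], chain_length=5)[1]
```

Under these assumptions the pattern 01100 has two input transitions and causes six flip-flop toggles, while the alternating pattern 01010 has four input transitions and causes ten, which illustrates why fewer transitions at the scan input mean less switching in the chain.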
In contrast, the 1 that is scanned into the scan chain at the next cycle causes no transition at the input of the scan chain and arrives at its final destination without causing any transition at the scan flip-flops it passes through [14]. This shows that transitions that occur in the entire scan chain can be reduced by reducing transitions at the input of the scan chain. Since transitions at scan inputs propagate into internal circuit lines, causing more transitions, reducing transitions at the input of the scan chain can eventually reduce switching activity in the entire circuit.

Fig. 1. Transitions at scan chain input

III. ARCHITECTURE OF 3WT-WRBIST

Fig. 2. Generator: (a) with toggle flip-flops TF0 and TF1 and (b) without toggle flip-flops.

Fig. 3. 3wt-WRBIST

Fig. 3 shows an implementation of the 3-weight WRBIST for a set of generators. The shift counter is an (m+1) modulo counter, where m is the number of scan elements in the scan chain (the generators are 9 bits wide). When the content of the shift counter is k, where k = 0, 1, …, 8, a value for input pk is scanned into the scan chain. The generator counter selects the appropriate generator: its content determines which generator is used to generate test patterns. Pseudo-random pattern sequences generated by an LFSR are modified (fixed) by controlling the AND and OR gates with overriding signals s0 and s1. Fixing a random value to 0 is achieved by setting s0 to 1 and s1 to 1. The overriding signals s0 and s1 are driven by T flip-flops TF0 and TF1. The inputs of TF0 and TF1 are driven by D0 and D1 respectively, which are generated from the outputs of the shift counter and the generator counter. The shift counter is required by all scan-based BIST techniques and is not particular to the proposed 3-weight WRBIST scheme. All BIST controllers need a pattern counter that counts the number of test patterns applied. The generator counter can be implemented from log G bits, where G is the number of generators, so no additional hardware is required. Hardware overhead for implementing a 3-weight WRBIST is incurred only by the decoding logic and the fixing logic, which includes two toggle flip-flops (T flip-flops), an AND gate and an OR gate. Since the fixing logic can be implemented with very little hardware, the overall hardware overhead for implementing the serial fixing 3-weight WRBIST is determined by the hardware overhead of the decoding logic. In cycles when a scan value for pk is scanned in, both D0 and D1 are set to 0 and hence the T flip-flops hold their previous state. Also assume that T flip-flop TF0 is initialized to 1 and TF1 is initialized to 0. Flip-flops are placed in the scan chain in descending order of their subscript number; hence the value of p0 is scanned first and p8 is scanned last. Random patterns generated by the LFSR can also be fixed by controlling the AND/OR gates directly by the decoding logic, without the two T flip-flops. However, this scheme incurs larger hardware overhead for the decoding logic and also more transitions in the circuit under test (CUT) during BIST than the scheme with T flip-flops. The values of TF0, TF1, D0 and D1 are shown for the implemented scheme with T flip-flops.

IV. ARCHITECTURE OF LT-RTPG BIST
The LT-RTPG reduces switching activity during BIST by reducing transitions at scan inputs during scan shift operations. An example LT-RTPG is shown in Fig. 4. The LT-RTPG is comprised of an LFSR, a multi-input AND gate, and a toggle flip-flop (T flip-flop). Hence, it can be implemented with very little hardware. Each input of the AND gate is connected to either a normal or an inverting output of the LFSR stages. If the AND gate has many inputs, large sets of neighboring state inputs will be assigned identical values in most test patterns, resulting in a decrease in fault coverage or an increase in test sequence length. Hence, like [15], in this paper only LT-RTPGs whose AND gates have few inputs (up to 3) are used. Since a T flip-flop holds its previous value until its input is assigned a 1, the same value (0 or 1) is repeatedly scanned into the scan chain until the value at the output of the AND gate becomes 1. Hence, adjacent scan flip-flops are assigned identical values in most test patterns and scan inputs have fewer transitions
during scan shift operations. Since most switching activity during scan BIST occurs during scan shift operations (a capture cycle occurs only once every several cycles), the LT-RTPG can reduce heat dissipation during overall scan testing. Various properties of the LT-RTPG have been studied, and a detailed methodology for its design has been presented, in earlier work. It has been observed that many faults that escape random patterns are highly correlated with each other and can be detected by continuously complementing values of a few inputs from a parent test vector. This observation is exploited in [22] to improve fault coverage for circuits that have large numbers of random pattern resistant faults (RPRFs). We have also observed that tests for faults that escape LT-RTPG test sequences share many common input assignments. This implies that RPRFs that escape LT-RTPG test sequences can be effectively detected by fixing selected inputs to binary values specified in deterministic test cubes for these RPRFs and applying random patterns to the rest of the inputs. This technique is used in the 3-weight WRBIST to achieve high fault coverage for random pattern resistant circuits. In this paper we demonstrate that augmenting the LT-RTPG with the serial fixing 3-weight WRBIST proposed in [15] can attain high fault coverage without excessive switching activity or large area overhead, even for circuits that have large numbers of RPRFs.

Fig. 4. LT-RTPG

V. CONCLUSION
This paper presents a low hardware overhead TPG for scan-based BIST that can reduce switching activity in CUTs during BIST. The main objective of most recent BIST techniques has been the design of TPGs that achieve low power dissipation. Since the correlation between consecutive patterns applied to a circuit during BIST is significantly lower, switching activity in the circuit can be significantly higher during BIST than during its normal operation.

REFERENCES
[1] Z. Barzilai, D. Coppersmith, and A. L. Rosenberg, “Exhaustive generation of bit patterns with applications to VLSI self-testing,” IEEE Trans. Comput., vol. C-32, no. 2, pp. 190-194, Feb. 1983.
[2] L. T. Wang and E. J. McCluskey, “Circuits for pseudo-exhaustive test pattern generation,” in Proc. IEEE Int. Test Conf., 1986, pp. 25-37.
[3] W. Daehn and J. Mucha, “Hardware test pattern generators for built-in test,” in Proc. IEEE Int. Test Conf., 1981, pp. 110-113.
[4] S. Hellebrand, S. Tarnick, and J. Rajski, “Generation of vector patterns through reseeding of multiple-polynomial linear feedback shift registers,” in Proc. IEEE Int. Test Conf., 1992, pp. 120-129.
[5] N. A. Touba and E. J. McCluskey, “Altering a pseudo-random bit sequence for scan-based BIST,” in Proc. IEEE Int. Test Conf., 1996, pp. 167-175.
[6] M. Chatterjee and D. K. Pradhan, “A new pattern biasing technique for BIST,” in Proc. VLSITS, 1995, pp. 417-425.
[7] N. Tamarapalli and J. Rajski, “Constructive multi-phase test point insertion for scan-based BIST,” in Proc. IEEE Int. Test Conf., 1996, pp. 649-658.
[8] Y. Savaria, B. Lague, and B. Kaminska, “A pragmatic approach to the design of self-testing circuits,” in Proc. IEEE Int. Test Conf., 1989, pp. 745-754.
[9] J. Hartmann and G. Kemnitz, “How to do weighted random testing for BIST,” in Proc. IEEE Int. Conf. Comput.-Aided Design, 1993, pp. 568-571.
[10] J. Waicukauski, E. Lindbloom, E. Eichelberger, and O. Forlenza, “A method for generating weighted random test patterns,” IEEE Trans. Comput., vol. 33, no. 2, pp. 149-161, Mar. 1989.
[11] H.-C. Tsai, K.-T. Cheng, C.-J. Lin, and S. Bhawmik, “Efficient test point selection for scan-based BIST,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 6, no. 4, pp. 667-676, Dec. 1998.
[12] W. Li, C. Yu, S. M. Reddy, and I. Pomeranz, “A scan BIST generation method using a Markov source and partial BIST bit-fixing,” in Proc. IEEE-ACM Design Autom. Conf., 2003, pp. 554-559.
[13] N. Z. Basturkmen, S. M. Reddy, and I. Pomeranz, “Pseudo random patterns using Markov sources for scan BIST,” in Proc. IEEE Int. Test Conf., 2002, pp. 1013-1021.
[14] S. B. Akers, C. Joseph, and B. Krishnamurthy, “On the role of independent fault sets in the generation of minimal test sets,” in Proc. IEEE Int. Test Conf., 1987, pp. 1100-1107.
[15] S. W. Golomb, Shift Register Sequences. Laguna Hills, CA: Aegean Park, 1982.
[16] C.-Y. Tsui, M. Pedram, C.-A. Chen, and A. M. Despain, “Low power state assignment targeting two- and multi-level logic implementation,” in Proc. IEEE Int. Conf. Comput.-Aided Des., 1994, pp. 82-87.
[17] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, “A test vector inhibiting technique for low energy BIST design,” in Proc. VLSI Test. Symp., 1999, pp. 407-412.
[18] J. A. Waicukauski, E. B. Eichelberger, D. O. Forlenza, E. Lindbloom, and T. McCarthy, “Fault simulation for structured VLSI,” VLSI Syst. Design, pp. 20-32, Dec. 1985.
[19] R. M. Chou, K. K. Saluja, and V. D. Agrawal, “Scheduling tests for VLSI systems under power constraints,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 5, no. 2, pp. 175-185, Jun. 1997.
[20] T. Schuele and A. P. Stroele, “Test scheduling for minimal energy consumption under power constraints,” in Proc. VLSI Test. Symp., 2001, pp. 312-318.
[21] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, 2nd ed. Reading, MA: Addison-Wesley, 1992.
[22] B. Pouya and A. L. Crouch, “Optimization trade-offs for vector volume and test power,” Proc. Int’l Test Conf., 2000, pp. 873-881.
[23] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault and S. Pravossoudovitch, “A gated clock scheme for low power scan testing of logic ICs or embedded cores,” Proc. 10th Asian Test Symp., 2001, pp. 253-258.
[24] Y. Zorian, “A distributed BIST control scheme for complex VLSI design,” Proc. 11th IEEE VLSI Test Symp., 1993, pp. 4
[25] P. Girard, “Survey of low-power testing of VLSI circuits,” IEEE Design and Test of Computers, May-June 2002, pp. 82-92.
Abstract--Advances in built-in self-test (BIST) techniques have enabled IC testing using a combination of external automated test equipment and a BIST controller on the chip. A new low power test pattern generator using a linear feedback shift register (LFSR), called LP-TPG, is presented to reduce the average and peak power of a circuit during test. The correlation between the test patterns generated by LP-TPG is higher than that of a conventional LFSR. LP-TPG inserts intermediate patterns between the random patterns. The goal of having intermediate patterns is to reduce the transitional activities of primary inputs, which eventually reduces the switching activities inside the circuit under test and hence the power consumption. The random nature of the test patterns is kept intact.
Keywords: LP-LFSR, R-injection, test patterns

I. INTRODUCTION

The Linear Feedback Shift Register (LFSR) is commonly used as a test pattern generator in low overhead built-in self-test (BIST). This is due to the fact that an LFSR can be built with little area overhead and used not only as a TPG, which attains high fault coverage for a large class of circuits, but also as an output response analyzer. An LFSR TPG requires an unacceptably long test sequence to attain high fault coverage for circuits that have a large number of random pattern resistant faults. The main objective of most recent BIST techniques has been the design of TPGs that achieve high fault coverage at acceptable test lengths. Another objective is to reduce the heat dissipation during test application.
A significant correlation exists between consecutive vectors applied to a circuit during its normal operation. This fact has motivated several architectural concepts, such as cache memories, and is also exploited in high speed circuits that process digital audio and video signals. In contrast, the consecutive vectors of a sequence generated by an LFSR are proven to have low correlation. Since the correlation between consecutive test vectors applied to a circuit during BIST is significantly lower, the switching activity in the circuit can be significantly higher during BIST than during its normal operation.
Excessive switching activity during test can cause several problems. Firstly, since heat dissipation in a CMOS circuit is proportional to switching activity, a circuit under test (CUT) can be permanently damaged due to excessive heat dissipation if the switching activity in the circuit during test application is much higher than that during its normal operation. The seriousness of excessive heat dissipation during test application is worsened by trends such as circuit miniaturization for portability and high performance. These objectives are typically achieved by using circuit designs that decrease power dissipation and by reducing the package size to aggressively match the average heat dissipation during the circuit’s normal operation. In order to ensure non-destructive testing of such a circuit, it is necessary either to apply test vectors which cause a switching activity that is comparable to that during normal circuit operation or to remove any excessive heat generated during test using special cooling equipment. The use of special cooling equipment to remove excessive heat dissipated during test application becomes increasingly difficult and costly as tests are applied at higher levels of circuit integration, such as BIST at board and system levels. Elevated temperature and current density caused by excessive switching activity during test application will severely decrease the reliability of circuits under test due to metal migration or electro-migration.
In the past, tests were typically applied at rates much lower than a circuit’s normal clock rate. Circuits are now tested at higher clock rates, possibly at the circuit’s normal clock rate (at-speed testing). Consequently, heat dissipation during test application is on the rise and is fast becoming a problem. A new low power test pattern generator using a linear feedback shift register, called LP-TPG, is presented to reduce the power consumption of a circuit during test. The original patterns are generated by an LFSR, and the proposed technique generates and inserts intermediate patterns between each pair of patterns to reduce the primary inputs’ (PIs’) activities.

II. LOW POWER TEST PATTERN GENERATION

The basic idea behind low power BIST is to reduce the PI activities. Here we propose a new test pattern generation technique which generates three intermediate test patterns between each two consecutive random patterns generated by a conventional LFSR. The proposed test pattern generation method does not decrease the random nature of the test patterns. This technique reduces the PIs’ activities and eventually the switching activities in the CUT.
Assume that Ti and Ti+1 are two consecutive test patterns generated by a pseudorandom pattern generator. Suppose the two vectors are
Ti = {t_1^i, t_2^i, …, t_n^i} and
Ti+1 = {t_1^{i+1}, t_2^{i+1}, …, t_n^{i+1}},
where n is the number of bits in the test patterns, which is equal to the number of PIs in the circuit under test.
Assume that Tk1, Tk2, and Tk3 are the intermediate patterns between Ti and Ti+1. Tk2 is generated as
Tk2 = {t_1^i, …, t_{n/2}^i, t_{n/2+1}^{i+1}, …, t_n^{i+1}}
Tk2 is generated using one half of each of the two random patterns Ti and Ti+1. Tk2 is also a random pattern because it is generated using two random patterns. The other two patterns are generated using Tk2. Tk1 is generated between Ti and Tk2, and Tk3 is generated between Tk2 and Ti+1.
Tk1 is obtained by
t_j^{k1} = t_j^i   if t_j^i = t_j^{k2}
t_j^{k1} = R       if t_j^i ≠ t_j^{k2}
where j ∈ {1, 2, …, n} and R is a random bit. This method of generating Tk1 and Tk3 is called R-injection. If two corresponding bits in Ti and Ti+1 are the same, the same bit is positioned in the corresponding bit of Tk1; otherwise a random bit (R) is positioned. R can come from the output of the random generator. In this method, the sum of the PIs’ activities between Ti and Tk1 (Ntrans_{i,k1}), Tk1 and Tk2 (Ntrans_{k1,k2}), Tk2 and Tk3 (Ntrans_{k2,k3}), and Tk3 and Ti+1 (Ntrans_{k3,i+1}) is equal to the activities between Ti and Ti+1 (Ntrans_{i,i+1}):
Ntrans_{i,k1} + Ntrans_{k1,k2} + Ntrans_{k2,k3} + Ntrans_{k3,i+1} = Ntrans_{i,i+1}

III. LP-TPG

The proposed technique is designed into the LFSR architecture to create LP-TPG. Figure 2 shows LP-TPG with added circuitry to generate intermediate test patterns.

Step 1: en1en2=10, sel1sel2=11
The first half of the LFSR is active and the second half is in idle mode. With sel1sel2=11, both halves of the LFSR are sent to the outputs O1 to On. Here Ti is generated.
Step 2: en1en2=00, sel1sel2=10
Both halves of the LFSR are in idle mode. The first half of the LFSR is sent to the outputs O1 to On/2, but the injector circuit outputs are sent to the outputs On/2+1 to On. Tk1 is generated.
Step 3: en1en2=01, sel1sel2=11
The second half of the LFSR is active and the first half is in idle mode. Both halves are transferred to the outputs O1 to On, and Tk2 is generated.
Step 4: en1en2=00, sel1sel2=01
Both halves of the LFSR are in idle mode. From the first half, the injector outputs are sent to the outputs O1 to On/2, and the second half sends the exact bits in the LFSR to the outputs On/2+1 to On. Thus Tk3 is generated.
Step 5:
The process continues by going through step 1 to generate Ti+1.
The LP-TPG with the R-injection circuit keeps the random nature of the test patterns intact. The FSM controls the test pattern generation throughout the steps and is independent of the LFSR size and polynomial. The clk and test_en signals are the inputs of the FSM.
When test_en=1, the FSM starts with step 1 by setting en1en2=10 and sel1sel2=11. It continues the process by going through steps 1 to 4. One pattern is generated in each clock cycle. The size of the FSM is very small and fixed. The FSM can be part of the BIST controller used in the circuit to control the test process.
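The R-injection scheme of Section II, and the sequence Ti, Tk1, Tk2, Tk3, Ti+1 that the four steps above produce, can be sketched in software. This is a behavioral illustration only, not a model of the hardware: patterns are plain bit lists with an even length n, a fresh random bit R is drawn for every differing position (the hardware takes R from the random generator output), and the rule for Tk3 is assumed to mirror the rule given for Tk1. The function names are hypothetical.

```python
import random


def hamming(a, b):
    """Number of bit positions in which two patterns differ (Ntrans)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def inject(src, dst, rng):
    """Keep bits on which src and dst agree; inject a random bit R elsewhere."""
    return [s if s == d else rng.randint(0, 1) for s, d in zip(src, dst)]


def intermediate_patterns(ti, ti1, rng=None):
    """Return (Tk1, Tk2, Tk3) for consecutive patterns Ti and Ti+1."""
    rng = rng or random.Random()
    half = len(ti) // 2
    tk2 = ti[:half] + ti1[half:]   # first half of Ti, second half of Ti+1
    tk1 = inject(ti, tk2, rng)     # between Ti and Tk2
    tk3 = inject(tk2, ti1, rng)    # between Tk2 and Ti+1 (ASSUMED by analogy)
    return tk1, tk2, tk3


# Usage: check that the transitions are spread over four steps, not added.
rng = random.Random(0)
ti = [0, 1, 0, 0, 1, 0, 1, 1]
ti1 = [1, 1, 0, 1, 0, 0, 0, 1]
tk1, tk2, tk3 = intermediate_patterns(ti, ti1, rng)
spread = (hamming(ti, tk1) + hamming(tk1, tk2)
          + hamming(tk2, tk3) + hamming(tk3, ti1))
```

Whatever values R takes, `spread` equals `hamming(ti, ti1)`: each differing bit flips exactly once on the way from Ti to Ti+1, so the intermediate patterns lower the number of transitions per clock cycle without adding to the total.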
The example shows an LP-TPG using an 8-bit LFSR with polynomial x^8 + x + 1 and seed = 01001011. Two consecutive patterns T1 and T2 and three intermediate patterns are generated.
The first and second halves of Tk2 are equal to those of T1 and T2 respectively. Tk1 and Tk3 are generated using R-injection (R = 0 injected in the corresponding bits of Tk1 and Tk3).
Ntrans_{1,2} = 7, Ntrans_{1,k1} = 2, Ntrans_{k1,k2} = 1, Ntrans_{k2,k3} = 2, Ntrans_{k3,2} = 2.
This reduction of the PIs’ activities reduces the switching activities inside the circuit and eventually the power consumption. Having three intermediate patterns between each pair of consecutive patterns may seem to prolong the test session by a factor of 3. However, empirically many of the intermediate patterns can do as well as conventional LFSR patterns in terms of fault detection.

Fig 3. Block diagram of 8-bit LP-TPG.

                                     I(mA)   P(mW)
Total estimated power consumption:             9
---
Vccint 2.50V:                          1       3
Vcco33 3.30V:                          2       7
---
Clocks:                                0       0
Inputs:                                0       0
Logic:                                 0       0
Outputs:
  Vcco33                               0       0
Signals:                               0       0
---
Quiescent Vccint 2.50V:                1       3
Quiescent Vcco33 3.30V:                2       7

Thermal summary:
------------------------------------------
Estimated junction temperature:  25C
Ambient temp:                    25C
Case temp:                       25C
Theta J-A:                       34C/W

Decoupling Network Summary:   Cap Range (uF)      #
------------------------------------------
Capacitor Recommendations:
Total for Vccint :                                8
                              470.0 - 1000.0  :   1
                              0.0470 - 0.2200 :   1
                              0.0100 - 0.0470 :   2
                              0.0010 - 0.0047 :   4
---
Total for Vcco33 :                                1
                              470.0 - 1000.0  :   1
Vcco33 3.30V: 2 7
---
Clocks: 0 0
Inputs: 0 0
Logic: 0 0
Outputs:
Vcco33 0 0
Signals: 0 0
---
Quiescent Vcco33 3.30V: 2 7
Thermal summary:
-----------------------------------------
Estimated junction temperature: 25C
Ambient temp: 25C
Case temp: 25C
Theta J-A: 34C/W
Decoupling Network Summary: Cap Range (uF) #
-----------------------------------------
Capacitor Recommendations:
Total for Vccint : 8
470.0 - 1000.0 : 1
0.0470 - 0.2200 : 1
0.0100 - 0.0470 : 2
0.0010 - 0.0047 : 4
---
Total for Vcco33 : 3
470.0 - 1000.0 : 1
0.0010 - 0.0047 : 2
Analysis completed: Fri Jan 25 11:00:00 2008

The power report of the LP-TPG shows a total power
consumption of 7 mW. This is a clear reduction in power for
the LP-TPG compared to a normal LFSR.

VI. RESULTS

The LP-LFSR was simulated using Xilinx software. The
conventional LFSR consumed a total power of 9 mW, whereas the
LP-TPG consumed 7 mW, a reduction of about 22%. The output
waveform is shown in Fig. 4.

Fig. 4. Waveform of LP-LFSR

VII. PAPER OUTLINE

The proposed technique reduces the correlation between the
test patterns. The original patterns are generated by an LFSR,
and the proposed technique generates and inserts intermediate
patterns between each pair of patterns to reduce the activity
of the primary inputs (PIs), which reduces the switching
activity inside the CUT and hence the power consumption.
Adding test patterns does not prolong the overall test length,
so the application time remains the same. The technique of
R-injection is embedded into a conventional LFSR to create the
LP-TPG.

REFERENCES

[1] Y. Zorian, "A Distributed BIST Control Scheme for
Complex VLSI Devices," in Proc. VLSI Test Symp. (VTS'93),
pp. 4-9, 1993.
[2] S. Wang and S. Gupta, "DS-LFSR: A New BIST TPG for
Low Heat Dissipation," in Proc. Int. Test Conf. (ITC'97), pp.
848-857, 1997.
[3] F. Corno, M. Rebaudengo, M. Reorda, G. Squillero and M.
Violante, "Low Power BIST via Non-Linear Hybrid Cellular
Automata," in Proc. VLSI Test Symp. (VTS'00), pp. 29-34,
2000.
[4] P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch,
H.-J. Wunderlich, "A Modified Clock Scheme for a Low Power
BIST Test Pattern Generator," in Proc. VLSI Test Symp.
(VTS'01), pp. 306-311, 2001.
[5] D. Gizopoulos et al., "Low Power/Energy BIST Scheme
for Datapaths," in Proc. VLSI Test Symp. (VTS'00), pp. 23-28,
2000.
[6] X. Zhang, K. Roy and S. Bhawmik, "POWERTEST: A
Tool for Energy Conscious Weighted Random Pattern
Testing," in Proc. Int. Conf. VLSI Design, pp. 416-422, 1999.
[7] S. Wang and S. Gupta, "LT-RTPG: A New Test-Per-Scan
BIST TPG for Low Heat Dissipation," in Proc. Int. Test Conf.
(ITC'99), pp. 85-94, 1999.
[8] P. Girard et al., "Low Energy BIST Design: Impact of the
LFSR TPG Parameters on the Weighted Switching Activity,"
in Proc. Int. Symp. on Circuits and Systems (ISCAS'99), pp. ,
1999.
[9] P. Girard et al., "A Test Vector Inhibiting Technique for
Low Energy BIST Design," in Proc. VLSI Test Symp.
(VTS'99), pp. 407-412, 1999.
[10] S. Manich et al., "Low Power BIST by Filtering Non-Detecting
Vectors," in Proc. European Test Workshop
(ETW'99), pp. 165-170, 1999.
[11] F. Corno, M. Rebaudengo, M. Sonza Reorda and M.
Violante, "A New BIST Architecture for Low Power
Circuits," in Proc. European Test Workshop (ETW'99), pp.
160-164, 1999.
[12] X. Zhang and K. Roy, "Peak Power Reduction in Low
Power BIST," in Proc. Int. Symp. on Quality Elect. Design
(ISQED'00), pp. 425-432, 2000.
[13] Synopsys Inc., "User Manuals for SYNOPSYS Toolset
Version 2002.05," Synopsys, Inc., 2002.
[14] S. Manich and J. Figueras, "Sensitivity of the Worst Case
Dynamic Power Estimation on Delay and Filtering Models,"
in Proc. PATMOS Workshop, 1997.
Abstract: In this paper, a design-for-testability (DFT)
technique is proposed to detect resistive opens in the
conducting paths of the clocked inverter stage in CMOS
latches and flip-flops. The main benefit of the technique is
that it is able to detect a parametric range of resistive
open defects. The testability of the added DFT circuitry is
also addressed. Application to a large number of cells is
also considered. A comparison with other previously proposed
testable latches is carried out. Circuits with the proposed
technique have been simulated and verified using TSPICE.

Index Terms: Design-for-testability (DFT), flip-flop,
latches, resistive open.

I. INTRODUCTION

Conventional tests cannot detect FET stuck-open faults in
several CMOS latches and flip-flops. Stuck-open faults can
change static latches and flip-flops into dynamic devices, a
danger to circuits whose operation requires static memory,
since undetected FET stuck-open faults can cause
malfunctions. Designs have been given for several memory
devices in which all single FET stuck-open faults are
detectable. These memory devices include common latches,
master-slave flip-flops, and scan-path flip-flops that can be
used in applications requiring static memory elements whose
operation can be reliably ascertained through conventional
fault testing methods. Stuck-at faults occur due to thin-oxide
shorts (the n-transistor gate to Vss or the p-transistor gate
to Vdd) and metal-to-metal shorts. Stuck-open or stuck-closed
faults are due to a missing source, drain or gate connection.
An open or break at the drain or source of a MOSFET gives rise
to a class of conventional failures called stuck-open faults.
If a stuck-open fault exists, a test vector may not always
guarantee a unique, repeatable logic value at the output,
because there is no conducting path from the output node to
either Vdd or Vss. Undetectable opens may occur in some
branches of CMOS latches and flip-flops. These undetectable
opens occur in the clocked inverter stage (CIS) of the
symmetric D-latch, because the input data is correctly
written through the driver stage despite the defective stage.
Opens in vias and contacts are likely to occur. The number of
vias and contacts is high in actual integrated circuits due to
the many metal levels. In the damascene-copper process, vias
and metal are patterned and etched prior to the additive
metallization. The open density in copper is higher than that
found in aluminum. Random particle-induced contact defects are
the main test target in production testing. In addition,
silicided opens can occur due to excess anneal during
manufacturing. A low-temperature screening technique can
detect cold delay defects such as silicide resistive opens.

Memory elements like latches and flip-flops are widely used
in the design of digital CMOS integrated circuits. Their
application depends on the requirements of performance, gate
count, power dissipation, area, etc. Resistive opens affecting
certain branches of fully static CMOS memory elements are
undetected by logic and delay testing. For these opens the
input data is correctly written and memorized. However, for
highly resistive opens the latch may fail to retain the
information after some time in the presence of leakage or
noise. Testable latches have been proposed for making
stuck-open faults detectable in these otherwise undetectable
branches. Reddy and Reddy [2] proposed a testable latch where
an additional controllable input is added to the last stage
of the latch. A proper sequence of vectors is then generated
for testing these opens. The delay is penalized due to the
added series transistors. Rubio et al. [3] also proposed a
testable latch. The number of test vectors is lower than that
proposed by Reddy. One additional input is required. The
delay is also penalized due to the added series transistors.
In this paper, a design-for-testability (DFT) technique for
testing full and resistive opens in undetectable branches of
fully static CMOS memory elements is proposed. This is the
first testable latch able to cover both full opens and
parametric resistive opens in otherwise undetectable faulty
branches. Design considerations for the DFT circuitry are
stated. The results are compared with previously reported
testable structures.

Here a fault-free circuit is taken and simulated. A faulty
circuit is then taken, the DFT circuitry is added, and the
circuit is simulated. The results of the two simulations are
compared and the fault is located.

II. DESIGN FLOW

Fig.1. DFT design flow
III. METHODOLOGY

The methodology used here is DFT circuitry, which is used to
detect faults in the undetectable branches of CMOS latches
and flip-flops. These opens cannot be detected by delay and
logic testing. The approach considers not only stuck-open
faults but also resistive opens in the CIS branches. Opens
are modeled with a lumped resistance which can take a
continuous range of values. The proposed testable CMOS latch
cell requires four additional transistors and only one
control signal. The network under test (NMOS or PMOS) is
selected by proper initialization of the latch state.

IV. DFT PROPOSAL

... latch is higher (lower) than for the defect-free latch.
When the transistors MTP and MTN are deactivated, the cell
evolves to a stable quiescent state. The transistors MTP and
MTN are sized such that the defective latch flips its state
but the state of the defect-free latch remains unchanged.
Let Vpg be the PMOS gate voltage and Vng the NMOS gate
voltage. L and W correspond to the length and width of the
transistors. Rop corresponds to the resistive open. Based on
the values of Vpg, Vng, L, W, we get different resistive open
V. TESTABILITY OF THE DFT CIRCUITRY

The testability of the added DFT circuitry is analyzed here.
The DFT circuitry is composed of the transistors MTP, MTN and
the inverter (see Fig. 3). Let us focus on the transistors
MTP and MTN. Defects affecting the DFT inverter can be
analyzed in a similar way. Stuck-open faults, resistive open
defects and stuck-on faults are considered. Resistive opens
located in the conducting paths of the two DFT transistors
can be tested using the same procedure as for opens affecting
undetectable branches of the latch. For a stuck-open fault at
the NMOS DFT transistor (see Fig. 3) the latch is initialized
to logic one. When the two DFT transistors are activated in
the memory phase, the voltage at Qbar (Q) increases
(decreases). The voltage at Qbar (Q) tends to a higher (lower)
value than in the defect-free case because the NMOS
transistor is off. After the two DFT transistors are
deactivated, the defect-free (defective) latch maintains
(changes) the initialized state. Hence, the defect is
detected. Resistive opens are tested in a similar way. Low
values of resistive opens can be detected. For the latch
topology used, resistive opens as low as 5 kΩ are detectable.

... simultaneously activated the current drawn from the power
supply could be important. Due to the high current density,
mass transport due to the momentum transfer between
conducting electrons and diffusing metal atoms can occur.
This phenomenon is known as electromigration. As a
consequence the metal lines can be degraded and even an open
failure can occur. The activation of the DFT circuitry for
blocks of scan cells can be skewed to minimize the stress on
the power buses during test mode. This is implemented by
inserting delay circuitry in the path of the control signal
of blocks of scan cells (see Fig. 7). In this way, the
activation of the DFT circuitry of each block of scan cells
is time skewed. Hence, at a given time there is a stressing
current pulse due to only one block of flip-flops. For
comparison purposes, the current drawn from the power supply
for 4 symmetrical flip-flop cells simultaneously activated
and time skewed is shown in Fig. 8 and Fig. 9. In this
example, the scan chain has been divided into three blocks of
4 cells each. A delay circuitry composed of 4 inverters has
been implemented.
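The effect of skewing the activation of the blocks can be
illustrated with a small numerical sketch. It only sums
idealized current pulses and is not a circuit simulation: the
pulse shape, its amplitude (arbitrary units) and the skew of
one full pulse width are hypothetical choices, while the
three blocks of four cells follow the example in the text.

```python
def supply_current(num_blocks, cells_per_block, cell_pulse, skew):
    """Sum the per-block current pulses, delaying each block by `skew` samples."""
    length = skew * (num_blocks - 1) + len(cell_pulse)
    total = [0.0] * length
    for block in range(num_blocks):
        start = block * skew
        for i, amplitude in enumerate(cell_pulse):
            # All cells of one block switch together.
            total[start + i] += cells_per_block * amplitude
    return total


# Hypothetical current pulse of a single cell, four samples long.
CELL_PULSE = [0.0, 1.0, 0.5, 0.0]

simultaneous = supply_current(3, 4, CELL_PULSE, skew=0)
skewed = supply_current(3, 4, CELL_PULSE, skew=len(CELL_PULSE))
print(max(simultaneous), max(skewed))
```

Under these assumptions the peak supply current with skewed
activation is that of a single block, one third of the peak
seen when all three blocks are activated together.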
VII. COMPARISON WITH OTHER TESTABLE LATCHES

Technique       Add. Input   Add. Trans.   RDET
[2]             1            4             R∞
[3]             2            4             R∞
This Proposal   1            4             >40 kΩ - ∞

Table 1. Comparison with other testable latches

Table 1 shows a comparison between our proposal and other
testable latch structures [2], [3]. This proposal requires
one additional input. The number of additional inputs for
previously reported proposals is also given. In this
proposal, the number of additional transistors per cell is
smaller than for the other techniques. The delay penalty of
our proposal is small. This technique requires eight vectors
for testing both CIS branches of the latch. For testing one
branch, the first vector writes the desired state into the
latch. The second vector memorizes this state. Then, the
third vector activates the DFT circuitry and the fourth
vector deactivates it. A similar sequence is required for the
complementary branch. The main benefit of this proposal is
that it can detect a parametric range of the resistance of
the open. The other proposals only detect a completely open
line (an open of infinite resistance).

VIII. CONCLUSION

A DFT technique to test resistive opens in otherwise
undetectable branches in fully static CMOS latches and
flip-flops has been proposed. The main benefit of this
proposal is that it is able to detect a parametric range of
resistive opens with reduced performance degradation. The
DFT technique can also be applied to other flip-flops.

REFERENCES

[1] A. Zenteno Ramirez, G. Espinosa, and V. Champac,
“Design-for-test techniques for opens in undetected branches
in CMOS latches and flip-flops,” IEEE Trans. VLSI Syst.,
vol. 15, no. 5, May 2007.
[2] M. K. Reddy and S. M. Reddy, “Detecting FET stuck-open
faults in CMOS latches and flip-flops,” IEEE Design Test,
vol. 3, no. 5, pp. 17–26, Oct. 1986.
[3] A. Rubio, S. Kajihara, and K. Kinoshita, “Class of
undetectable stuck open branches in CMOS memory elements,”
Proc. Inst. Elect. Eng.-G, vol. 139, no. 4, pp. 503–506,
1992.
[4] C.-W. Tseng, E. J. McCluskey, X. Shao, and D. M. Wu,
“Cold delay defect screening,” in Proc. 18th IEEE VLSI Test
Symp., 2000, pp. 183–188.
[5] A. Noore, “Reliable detection of CMOS stuck open faults
due to variable internal delays,” IEICE Electronics Express,
vol. 2, no. 8, pp. 292–297.
[6] S. M. Samsom, K. Baker, and A. P. Thijssen, “A
comparative analysis of the coverage of voltage and tests of
realistic faults in a CMOS flip-flop,” in Proc. ESSCIRC 20th
Eur. Solid-State Circuits Conf., 1994, pp. 228–231.
[7] K. Banerjee, A. Amerasekera, N. Cheung, and C. Hu,
“High-current failure model for VLSI interconnects under
short-pulse stress conditions,” IEEE Electron Devices Lett.,
vol. 18, no. 9, pp. 405–407, Sep. 1997.
• Component sharing: a given component instance can be
included (or shared) by more than one component.

• Binding components: a single abstraction for component
connections, called bindings. Bindings can embed any
communication semantics, from synchronous method calls to
remote procedure calls.

• Execution model independence: no execution model is
imposed. Components can therefore be run within execution
models other than the classical thread-based model, such as
event-based models and so on.

• Open: extra-functional services associated with a
component can be customized through the notion of a control
membrane.

A. Sierpinski Fractal

• Sierpinski triangle
• Sierpinski carpet

B. Sierpinski Triangle

In this paper we mainly consider the carpet layout because we
are considering a 2-D array.

• Transmitter array: the transmit array is drawn using a
matrix M consisting of ones and zeros. These arrays have been
constructed by considering a large array of elements
surrounded by a small matrix. In the carpet fractal array we
first draw a square at the right middle; this small square
occupies 1/3rd of the original big array. Surrounding this
square we construct small squares.

• Receiver array: in the sparse 2-D array layout, to avoid
overlapping we select different receiver and transmitter
arrays. In our paper we have taken for the receiver array
those elements which will never cause an overlap.

D. Pulse-Echo Response

The layout should have optimal pulse-echo performance, i.e.
the pulse-echo radiation pattern should have as low a
sidelobe level as possible for a specified mainlobe width,
for all angles and depths of interest. Computing the
pulse-echo response for a given transmit and receive layout
is time consuming. A commonly used simplification is to
evaluate the radiation properties in continuous wave mode in
the far field. An optimal set of layouts for continuous waves
does not necessarily give optimal pulse-echo responses. To
ensure reasonable pulse-echo performance, additional criteria
which ensure a uniform distribution of elements could be
introduced. This will limit the interference in the sidelobe
region between pulses transmitted from different elements and
reduce the sidelobe level.

III. RESULTS AND DISCUSSION

The fractal layout exploits the advantages of both periodic
and random arrays. Our main aim is to suppress the sidelobes
and to narrow the mainlobe. First we created the transmit and
receive array layouts. Both layouts have been constructed in
such a way that they do not overlap each other. The transmit
array is designed using a matrix M. Up to 3 iterations were
used to construct the transmit array. The intensity
distributions were computed to find the spreading of the
sidelobes and the mainlobe.
In our paper we have taken into consideration different
specifications such as the speed of the sound wave, i.e. 1540
m/s, the initial frequency, a sampling frequency of
100·10^6 Hz, and the width and height of the array. The kerf,
i.e. the gap between the elements of the array, is also
considered.
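A matrix of ones and zeros for a carpet-style layout can be
generated with a few lines of Python. The sketch below builds
the textbook Sierpinski carpet (centre block removed at every
iteration) and, as one possible non-overlapping choice, uses
the complementary positions for the receive layout. Both the
orientation of the pattern and the use of the complement are
assumptions made for illustration; the function name is
ours.

```python
def sierpinski_carpet(iterations):
    """Return a (3**iterations) x (3**iterations) matrix of 0/1 values.

    A value of 1 marks an active array element.
    """
    matrix = [[1]]
    for _ in range(iterations):
        size = len(matrix)
        grown = [[0] * (3 * size) for _ in range(3 * size)]
        for block_row in range(3):
            for block_col in range(3):
                if block_row == 1 and block_col == 1:
                    continue  # the centre block stays empty
                for r in range(size):
                    for c in range(size):
                        grown[block_row * size + r][block_col * size + c] = matrix[r][c]
        matrix = grown
    return matrix


# Three iterations, as used for the transmit array in the text.
transmit = sierpinski_carpet(3)
# Assumed receive layout: every position not used for transmission.
receive = [[1 - value for value in row] for row in transmit]

active = sum(map(sum, transmit))
overlap = sum(t * r for t_row, r_row in zip(transmit, receive)
              for t, r in zip(t_row, r_row))
print(len(transmit), active, overlap)
```

Three iterations give a 27 x 27 grid in which 512 of the 729
positions are active transmit elements, and by construction
no position is shared between the two layouts.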
A. Case I: kerf = 0
(c) Pulse-Echo Response
IV. CONCLUSION
To construct a 2-D array for 4-D ultrasound imaging, many
constraints must be met, an important one being the mainlobe
and sidelobe levels. We evaluate these through the pulse-echo
response. We have shown that it is possible to suppress the
unwanted sidelobe levels by adjusting different parameters of
the array layout. We have also shown the changes in the
intensity level when the spacing between array elements is
adjusted. As future work, we will calculate the mainlobe BW,
the ISLR and the sidelobe peak value in order to select the
correct fractal, since these parameters affect the image
quality.
Abstract: With the rapid growth of Internet technologies
and the wide availability of multimedia computing facilities,
the enforcement of multimedia copyright protection has become
an important issue. Digital watermarking is viewed as an
effective way to deter content users from illegal
distribution. The watermark can be used to authenticate the
data file and for tamper detection. This is very valuable in
the use and exchange of digital media, such as audio and
video, on emerging handheld devices. However, watermarking is
computationally expensive and adds to the drain of the
available energy in handheld devices. This paper analyzes the
energy, average power and execution time of various
watermarking algorithms. We also propose a new approach in
which the watermarking algorithm is partitioned for embedding
and extraction by migrating some tasks to the proxy server.
Security measures have been provided by DWT, which leads to a
lower energy consumption on the handheld device without
compromising the security of the watermarking process. The
proposed approach shows that executing the watermarking tasks
partitioned between the proxy and the handheld devices
reduces the total energy consumed by a good factor, and
improves the performance by two orders of magnitude compared
to running the application on the handheld device only.

Keywords: energy consumption, mobile computing, proxy
server, security, watermarking.

I. INTRODUCTION
Watermarking is used to provide copyright protection for
digital content. A distributor embeds a mark into a digital
object, so that ownership of this digital object can be
proved. This mark is usually a secret message that contains
the distributor's copyright information. The mark is normally
embedded into the digital object by exploiting the usually
inherent information redundancy.
The problem arises when a dishonest user tries to delete the
mark in the digital object before redistribution in order to
claim ownership. In consequence, the strength of watermarking
schemes must be based on the difficulty of locating and
changing the mark. There are many watermarking approaches
that try to protect the intellectual property of multimedia
objects, especially images, but unfortunately very little
attention has been given to software watermarking.
There are two kinds of digital watermarking, visible and
invisible. Visible watermarking contains visible information
like a company logo to indicate the ownership of the
multimedia. Visible watermarking causes distortion of the
cover image, and hence invisible watermarking is more
practical. In invisible watermarking, as the name suggests,
the watermark is imperceptible in the watermarked image.
Invisible watermarking can be classified into three types:
robust, fragile and semi-fragile.
A popular application of watermarking techniques is to
provide a proof of ownership of digital data by embedding
copyright statements into video or image digital products.
Automatic monitoring and tracking of copyrighted material on
the web, automatic audit of radio transmissions, data
augmentation, fingerprinting applications, and all kinds of
data like audio, image, video, formatted text models and
model animation parameters are examples where watermarking
can be applied.
To allow the architecture to use a public-key security model
on the network while keeping the devices themselves simple,
we create a software proxy for each device. All objects in
the system, e.g., appliances, wearable gadgets, software
agents, and users, have associated trusted software proxies
that run either on an embedded processor on the appliance or
on a trusted computer.
In the case of the proxy running on an embedded processor on
the appliance, we assume that device to proxy communication
is inherently secure. If the device has minimal computational
power and communicates with its proxy through a wired or
wireless network, we force the communication to adhere to a
device to proxy protocol. The proxy is software that runs on
a network-visible computer. The proxy's primary function is
to make access-control decisions on behalf of the device it
represents. It may also perform secondary functions such as
running scripted actions on behalf of the device and
interfacing with a directory service. The device to proxy
protocol varies for different types of devices. In
particular, we consider lightweight devices with
low-bandwidth wireless network connections and slow CPUs, and
heavyweight devices with higher-bandwidth connections and
faster CPUs.
It is assumed that heavyweight devices are capable of running
proxy software locally. With a local proxy, a sophisticated
protocol for secure device to proxy communication is
unnecessary, assuming critical parts of the device are tamper
resistant. For lightweight devices, the proxy must run
elsewhere.
The proxy and device communicate through a secure channel
that encrypts and authenticates all the messages. Different
algorithms are used for authentication and encryption. It may
use symmetric keys. In this paper the
energy profile of various watermarking algorithms is
analyzed, along with the impact of security and image quality
on energy consumption.
We then present a task partitioning scheme for wavelet based
image watermarking algorithms in which computationally
expensive portions of the watermarking are offloaded to a
proxy server. The proxy server acts as an agent between the
content server and the handheld device and is used for
various other tasks such as data transcoding and load
management. The partitioning scheme can be used to reduce the
energy consumption associated with watermarking on the
handheld without compromising the security of the
watermarking process.

II. WATERMARKING

The increasing computational capability and availability of
broadband in emerging handheld devices have made them true
endpoints of the internet. They enable users to download and
exchange a wide variety of media such as e-books, images,
etc. Digital watermarking has been proposed as a technique
for protecting the intellectual property of digital data.
It is the process of embedding a signature/watermark into a
digital media file so that it is hidden from view, but can be
extracted on demand to verify the authenticity of the media
file. The watermark can be binary data, a logo, or a seed
value to a pseudorandom number generator to produce a
sequence of numbers with a certain distribution.
Watermarking can be used to combat fraudulent use of wireless
voice communications, authenticating the identity of cell
phones and transmission stations, and securing the delivery
of music and other audio content. Watermarking bears a large
potential in securing such applications, for example, e-fax
for owner verification, customer authentication in service
delivery, and customer support.
Watermarking algorithms are designed for maximum security
with little or no consideration for other system constraints
such as computational complexity and energy availability.
Handheld devices such as PDAs and cell phones have a limited
battery life that is directly affected by the computational
burden placed by the application. Digital watermarking tasks
place an additional burden on the available energy in these
devices.
Watermarking, like steganography, seeks to hide information
inside another object. Therefore, it should be resilient to
intentional or unintentional manipulations and resistant to
watermark attacks. Although several techniques have been
proposed for remote task execution for power management,
these do not account for application security during the
partitioning process.

Figure 1 shows our implementation of a watermarking system in
which multimedia content is streamed to a handheld device via
a proxy server. This system consists of three components:
mobile devices, proxy servers, and content servers.
A mobile or handheld device refers to any type of networked
resource; it could be a handheld (PDA), a gaming device, or a
wireless security camera. Content servers store multimedia
and database content and stream data (images) to a client on
request. All communication between the mobile devices and the
servers is relayed through the proxy servers.
Proxy servers are powerful servers that can, among other
things, compress/decompress images, transcode video in
real-time, access/provide directory services, and provide
services based on a rule base for specific devices. Figure 2
shows the general process of watermarking image data, where
the original image (host image) is modified using a signature
to create the watermarked image.
In this process, some error or distortion is introduced. To
ensure transparency of the embedded data, the amount of image
distortion due to the watermark embedding process has to be
small. There are three basic tasks in the watermarking
process with respect to an image, as shown in Figure 2. A
watermark is embedded either in the spatial domain or in the
frequency domain. Detection and extraction refers to
determining whether an image has a watermark and extracting
the full watermark from the image. Authentication refers to
comparing the extracted watermark with the original
watermark.

Figure 2 Watermarking process (a) watermark generation and
embedding (b) watermark extraction and authentication

Watermarks are used to detect unauthorized modifications of
data and for ownership authentication. Watermarking
techniques for images and video differ in that watermarking
in video streams takes advantage of the temporal relation
between frames to embed watermarks.
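The three basic tasks (embedding, extraction and
authentication) can be illustrated with a deliberately simple
spatial-domain scheme that hides one signature bit in the
least significant bit of each pixel. This is a swapped-in toy
technique chosen for brevity, not one of the wavelet based
algorithms analyzed in this paper; the pixel values, the
signature and the function names are hypothetical.

```python
def embed(pixels, bits):
    """Hide one watermark bit in the least significant bit of each pixel."""
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the host image")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract(pixels, length):
    """Read back the first `length` least significant bits."""
    return [p & 1 for p in pixels[:length]]


def authenticate(extracted, original):
    """Authentication is a plain comparison of the two watermarks."""
    return extracted == original


host = [52, 55, 61, 66, 70, 61, 64, 73]   # hypothetical 8-bit grey levels
signature = [1, 0, 1, 1, 0, 0, 1, 0]      # hypothetical watermark bits

marked = embed(host, signature)
distortion = max(abs(a - b) for a, b in zip(host, marked))
recovered = extract(marked, len(signature))
print(marked, distortion, authenticate(recovered, signature))

# Flipping a single embedded bit makes authentication fail.
tampered = list(marked)
tampered[0] ^= 1
print(authenticate(extract(tampered, len(signature)), signature))
```

The distortion introduced is at most one grey level per
pixel, which reflects the transparency requirement, while the
final check shows how a modification of the marked data is
revealed by the comparison step.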
TABLE I Embedding Energy, Power and Execution Time Analysis

Algorithm      Energy (J)   Avg. Power (W = J/s)   Exec. Time (s)
Bruyndonckx      1.47          0.11                  13.46
Corvi           83.20          0.61                 136.15
Cox            126.00          1.10                 115.23
Dugad           68.70          0.50                 136.64
Fridrich       196.00          1.15                 171.00
Kim             73.50          0.52                 140.81
Koch             2.19          0.17                  12.64
Wang            85.80          0.61                 140.20
Xia             90.00          0.67                 133.82
Xie            154.80          1.05                 147.07
Zhu            163.30          1.14                 143.74

Table I lists the energy usage, average power
(energy/execution time), and execution time for watermark
embedding by the various watermarking algorithms when they
are executed on the handheld device. Calculating wavelet and
inverse-wavelet transforms is computationally expensive and,
thus, also power hungry.
The large variation in the power consumption of the different
algorithms can be attributed in part to the difference in the
type of instructions executed in each case. The instruction
sequence executed is largely dependent on algorithmic
properties which enable certain optimizations such as
vectorization, and on the code generated by the compiler.
We present the energy, power, and execution time analysis of
watermark extraction in Table II. Watermark extraction is
more expensive than watermark embedding. During extraction,
the transform is carried out on both the input image and the
output image, and the corresponding coefficients are
normalized.
The correlation between the normalized coefficients of the
input and output is used as a measure of the fidelity of the
watermarked image. The overhead of computing band-wise
correlation and image normalization accounts for the higher
energy consumption.
In Table III, we list the energy, power, and execution time
for watermark authentication. This task is computationally
inexpensive, since it involves a simple comparison of the
extracted watermark and the original watermark.

TABLE II Extracting Energy, Power and Execution Time Analysis

Algorithm      Energy (J)   Avg. Power (W = J/s)   Exec. Time (s)
Bruyndonckx      0.22          0.79                   0.28
Corvi           70.30          0.47                 150.77
Cox            121.00          0.95                 128.02
Dugad           38.40          0.49                  79.00
Fridrich       191.00          1.10                 173.60
Kim             91.30          0.55                 166.57
Koch             0.61          0.61                   1.00
Wang            88.00          0.59                 147.90
Xia             82.70          0.57                 144.51
Xie             74.06          1.00                  73.88
Zhu            158.80          1.16                 137.38

TABLE III Authentication Energy, Power and Execution Time
Analysis

Algorithm      Energy (J)   Avg. Power (W = J/s)   Exec. Time (s)
Bruyndonckx      0.02          0.59                   0.034
Corvi            0.10          0.73                   0.138
Cox              0.05          1.35                   0.037
Dugad            0.03          0.97                   0.031
Fridrich         0.18          1.36                   0.132
Kim              0.10          0.76                   0.131
Koch             0.04          1.25                   0.032
Wang             0.08          1.36                   0.059
Xia              0.08          1.40                   0.057
Xie              0.04          1.00                   0.039
Zhu              0.06          1.20                   0.050

V. CONCLUSION

In this paper the energy characteristics of several wavelet
based image watermarking algorithms are analyzed, and a
proxy-based partitioning technique for energy efficient
watermarking on mobile devices is designed. The energy
consumption due to watermarking tasks can be minimized for
the handheld device by offloading the tasks completely to the
proxy server with sufficient security. This approach
therefore maximizes the energy savings and ensures security.
These approaches can be enhanced by adding error correction
codes in the embedding and extraction stages.
Abstract-Internet-enabled wireless devices continue to proliferate and are expected to surpass traditional Internet clients in the near future. This has opened up exciting new opportunities in the mobile e-commerce market. However, data security and privacy remain major concerns in the current generation of “Wireless Web” offerings. All such offerings today use a security architecture that lacks end-to-end security. This unfortunate choice is driven by perceived inadequacies of standard Internet security protocols like SSL on less capable CPUs and low-bandwidth wireless lines. This article presents our experiences in implementing and using standard security mechanisms and protocols on small wireless devices. We have created new classes for the Java 2 Micro-Edition platform that offer fundamental cryptographic operations, such as message digests and ciphers, as well as higher-level security protocols, providing a solution for ensuring end-to-end security of wireless Internet transactions even within today’s technological constraints.

I. CRYPTOGRAPHY

Cryptography is the science of using mathematics to encrypt and decrypt data. Cryptography enables you to store sensitive information or transmit it across insecure networks (like the Internet) so that it cannot be read by anyone except the intended recipient. While cryptography is the science of securing data, cryptanalysis is the science of analyzing and breaking secure communication.
Classical cryptanalysis involves an interesting combination of analytical reasoning, application of mathematical tools, pattern finding, patience, determination, and luck. Cryptanalysts are also called attackers. Cryptology embraces both cryptography and cryptanalysis. PGP is also about the latter sort of cryptography. Cryptography can be strong or weak. Cryptographic strength is measured in the time and resources it would require to recover the plaintext. The result of strong cryptography is cipher text that is very difficult to decipher without possession of the appropriate decoding tool. How difficult? Given all of today’s computing power and available time (even a billion computers doing a billion checks a second), it is not possible to decipher the result of strong cryptography before the end of the universe. One would think, then, that strong cryptography would hold up rather well against even an extremely determined cryptanalyst. Who’s really to say? No one has proven that the strongest encryption obtainable today will hold up under tomorrow’s computing power. However, the strong cryptography employed by PGP is the best available today.

II. CONVENTIONAL CRYPTOGRAPHY

In conventional cryptography, also called secret-key or symmetric-key encryption, one key is used both for encryption and decryption. The Data Encryption Standard (DES) is an example of a conventional cryptosystem that is widely employed by the Federal Government. Figure is an illustration of the conventional encryption process.

Key management and conventional encryption
Conventional encryption has benefits. It is very fast. It is especially useful for encrypting data that is not going anywhere. However, conventional encryption alone as a means for transmitting secure data can be quite expensive simply due to the difficulty of secure key distribution. Recall a character from your favorite spy movie: the person with a locked briefcase handcuffed to his or her wrist. What is in the briefcase, anyway? It’s the key that will decrypt the secret data. For a sender and recipient to communicate securely using conventional encryption, they must agree upon a key and keep it secret between themselves. If they are in different physical locations, they must trust a courier, the Bat Phone, or some other secure communication medium to prevent the disclosure of the secret key during transmission. Anyone who overhears or intercepts the key in transit can later read, modify, and forge all information encrypted or authenticated with that key.

III. PUBLIC KEY CRYPTOGRAPHY

The problems of key distribution are solved by public key cryptography, the concept of which was introduced by Whitfield Diffie and Martin Hellman in 1975. Public key cryptography is an asymmetric scheme that uses a pair of keys for encryption: a public key, which encrypts data, and a corresponding private, or secret, key for decryption. You publish your public key to the world while keeping your private key secret. Anyone with a copy of your public key, even people you have never met, can then encrypt information that only you can read. It is computationally infeasible to deduce the private key from the public key. Anyone who has a public key can encrypt information but cannot decrypt it. Only the person who has the corresponding private key can decrypt the information.
the information. These features are every bit as fundamental to cryptography as privacy, if not more. A digital signature serves the same purpose as a handwritten signature. However, a handwritten signature is easy to counterfeit. A digital signature is superior to a handwritten signature in that it is nearly impossible to counterfeit, plus it attests to the contents of the information as well as to the identity of the signer. Some people tend to use signatures more than they use encryption. For example, you may not care if anyone knows that you just deposited $1000 in your account, but you do want to be darn sure it was the bank teller you were dealing with. The basic manner in which digital signatures are created is illustrated. Instead of encrypting information using someone else’s public key, you encrypt it with your private key. If the information can be decrypted with your public key, then it must have originated with you.
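The sign-with-private-key, check-with-public-key idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it borrows the toy RSA key from the worked example in Section V (N = 943, e = 7, d = 503), the function names `sign` and `verify` are hypothetical, and it signs the raw number directly, whereas practical schemes sign a padded message digest and use far larger keys.

```python
# Toy key taken from the worked RSA example in Section V (p = 23, q = 41).
# Far too small for real use; chosen only so the arithmetic can be followed.
N, E, D = 943, 7, 503

def sign(message: int) -> int:
    """'Encrypt' the message with the private exponent D (hypothetical helper)."""
    return pow(message, D, N)

def verify(message: int, signature: int) -> bool:
    """Undo the signature with the public exponent E and compare with the message."""
    return pow(signature, E, N) == message

sig = sign(35)
print(verify(35, sig))  # True: the signature matches the message
print(verify(36, sig))  # False: a different message does not match
```

Anyone holding the public pair (N, e) can run `verify`, but only the holder of d can produce a signature that passes the check.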
V. RSA ENCRYPTION

Public Key Cryptography

One of the biggest problems in cryptography is the distribution of keys. Suppose you live in the United States and want to pass information secretly to your friend in Europe. If you truly want to keep the information secret, you need to agree on some sort of key that you and he can use to encode/decode messages. But you don't want to keep using the same key, or you will make it easier and easier for others to crack your cipher. But it's also a pain to get keys to your friend. If you mail them, they might be stolen. If you send them cryptographically, and someone has broken your code, that person will also have the next key. If you have to go to Europe regularly to hand-deliver the next key, that is also expensive. If you hire some courier to deliver the new key, you have to trust the courier, etcetera.

RSA Encryption

In the previous section we described what is meant by a trap-door cipher, but how do you make one? One commonly used cipher of this form is called RSA Encryption, where RSA are the initials of the three creators: Rivest, Shamir, and Adleman. It is based on the following idea: it is very simple to multiply numbers together, especially with computers. But it can be very difficult to factor numbers. For example, if I ask you to multiply together 34537 and 99991, it is a simple matter to punch those numbers into a calculator and get 3453389167. But the reverse problem is much harder. Suppose I give you the number 1459160519. I'll even tell you that I got it by multiplying two integers together.

1. Person A selects two prime numbers. We will use $p = 23$ and $q = 41$ for this example, but keep in mind that the real numbers person A should use should be much larger.
2. Person A multiplies $p$ and $q$ together to get $N = pq = (23)(41) = 943$. 943 is the "public key", which he tells to person B (and to the rest of the world, if he wishes).
3. Person A also chooses another number $e$, which must be relatively prime to $(p - 1)(q - 1)$. In this case, $(p - 1)(q - 1) = (22)(40) = 880$, so $e = 7$ is fine. $e$ is also part of the public key, so B also is told the value of $e$.
4. Now B knows enough to encode a message to A. Suppose, for this example, that the message is the number $M = 35$.
5. B calculates the value of $C = M^e \pmod{N} = 35^7 \pmod{943}$.
6. $35^7 = 64339296875$ and $64339296875 \pmod{943} = 545$. The number 545 is the encoding that B sends to A.
7. Now A wants to decode 545. To do so, he needs to find a number $d$ such that $ed \equiv 1 \pmod{(p - 1)(q - 1)}$, or in this case, such that $7d \equiv 1 \pmod{880}$. A solution is $d = 503$, since $7 \times 503 = 3521 = 4(880) + 1 \equiv 1 \pmod{880}$.
8. To find the decoding, A must calculate $C^d \pmod{N} = 545^{503} \pmod{943}$. This looks like it will be a horrible calculation, and at first it seems like it is, but notice that $503 = 256 + 128 + 64 + 32 + 16 + 4 + 2 + 1$ (this is just the binary expansion of 503). So this means that
$$545^{503} = 545^{256+128+64+32+16+4+2+1} = 545^{256} \cdot 545^{128} \cdots 545^{1}.$$
But since we only care about the result (mod 943), we can calculate all the partial results in that modulus, and by repeated squaring of 545, we can get all the exponents that are powers of 2. For example, $545^2 \pmod{943} = 545 \cdot 545 = 297025 \pmod{943} = 923$. Then square again: $545^4 \pmod{943} = (545^2)^2 \pmod{943} = 923 \cdot 923 = 851929 \pmod{943} = 400$, and so on. We obtain the following table:

$545^{1} \pmod{943} = 545$
$545^{2} \pmod{943} = 923$
$545^{4} \pmod{943} = 400$
$545^{8} \pmod{943} = 633$
$545^{16} \pmod{943} = 857$
$545^{32} \pmod{943} = 795$
$545^{64} \pmod{943} = 215$
$545^{128} \pmod{943} = 18$
$545^{256} \pmod{943} = 324$

So the result we want is:
$$545^{503} \pmod{943} = 324 \cdot 18 \cdot 215 \cdot 795 \cdot 857 \cdot 400 \cdot 923 \cdot 545 \pmod{943} = 35.$$
Using this tedious (but simple for a computer) calculation, A can decode B's message and obtain the original message.
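The worked example above can be reproduced with a short Python sketch. The helper name `mod_pow` is an assumption made for illustration; it implements the repeated-squaring idea from step 8 by walking through the binary expansion of the exponent. Python's built-in three-argument `pow` does the same job and is used here only to find the decoding exponent d (modular inverse, available from Python 3.8).

```python
def mod_pow(base: int, exponent: int, modulus: int) -> int:
    """Square-and-multiply: multiply in base^(2^k) for every set bit k of the exponent."""
    result = 1
    square = base % modulus              # holds base^(2^k) mod modulus
    while exponent > 0:
        if exponent & 1:                 # this power of two is part of the exponent
            result = (result * square) % modulus
        square = (square * square) % modulus
        exponent >>= 1
    return result

p, q, e = 23, 41, 7                      # toy values from the example
n = p * q                                # public modulus, 943
phi = (p - 1) * (q - 1)                  # 880
d = pow(e, -1, phi)                      # decoding exponent with e*d = 1 (mod 880)

message = 35
cipher = mod_pow(message, e, n)          # B encodes
decoded = mod_pow(cipher, d, n)          # A decodes
print(n, d, cipher, decoded)             # 943 503 545 35
```

Every intermediate value stays below the square of the modulus, which is why the calculation is cheap even though $545^{503}$ itself is enormous.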
VII. CONCLUSIONS AND FUTURE WORK
Our experiments with RSA and other cryptographic algorithms show that SSL is a viable technology even for today’s mobile devices and wireless networks. By carefully selecting and implementing a subset of the protocol’s many features, it is possible to ensure acceptable performance and compatibility with a large installed base of secure Web servers while maintaining a small memory footprint. Our implementation brings mainstream security mechanisms, trusted on the wired Internet, to wireless devices for the first time.
The use of standard SSL ensures end-to-end security, an important feature missing from current wireless architectures. The latest version of J2ME MIDP incorporating KSSL can be downloaded.
In our ongoing effort to further enhance cryptographic performance on small devices, we plan to explore the use of smart cards as hardware accelerators and Elliptic Curve Cryptography in our implementations.

[Figure: chart comparing SYMMETRIC, ECC and DH/RSA; vertical axis 0 to 10000, horizontal axis 1 to 4]
Abstract- This paper discusses face recognition, which refers to an automated or semi-automated process of matching facial images. Since visual face recognition has its own disadvantages, thermal face recognition is used. The major advantage of using thermal infrared imaging is to improve the face recognition performance. While conventional video cameras sense reflected light, thermal infrared cameras primarily measure emitted radiation from objects such as faces [1]. Thermal infrared (IR) imagery offers a promising alternative to visible face recognition as it is relatively insensitive to variations in face appearance caused by illumination changes. The fusion of visual and thermal face recognition can increase the overall performance of face recognition systems. Visual face recognition systems perform relatively well under controlled illumination conditions. Thermal face recognition systems are advantageous for detecting disguised faces or when there is no control over illumination. Thermal images of individuals wearing eyeglasses may result in poor performance since eyeglasses block the infrared emissions around the eyes, which are important features for recognition. By taking advantage of both visual and thermal images, new fused systems can be implemented by combining low-level data fusion and high-level decision fusion [4, 6]. This survey was further carried out through neural networks and support vector machines. Neural networks have been applied successfully in many pattern recognition problems, such as optical character recognition, object recognition, and autonomous robot driving. The advantage of using neural networks for face recognition is the feasibility of training a system to capture the face patterns. However, one drawback of a network architecture is that it has to be extensively tuned (number of layers, number of nodes, learning rates, etc.) to get exceptional performance. Support Vector Machines can also be applied to face detection [8]. Support vector machines can be considered as a new paradigm to train polynomial functions or neural networks.

I. INTRODUCTION

Face recognition has developed over 30 years and is still a rapidly growing research area due to increasing demands for security in commercial and law enforcement applications. Although face recognition systems have reached a significant level of maturity with some practical success, face recognition still remains a challenging problem due to large variation in face images. Face recognition is usually achieved through three steps: acquisition, normalization and recognition. Acquisition can be accomplished by digitally scanning an existing photograph or by taking a photograph of a live subject [2]. Normalization includes the segmentation, alignment and normalization of the face images. Finally, recognition includes the representation and modeling of face images as identities, and the association of novel face images with known models. In order to realize such a system, acquisition, normalization and recognition must be performed in a coherent manner.
The thermal infrared (IR) spectrum comprises mid-wave infrared (MWIR), ranging from 3 to 5 µm, and long-wave infrared (LWIR), ranging from 8 to 12 µm, both longer in wavelength than the visible spectrum (0.4-0.7 µm). Thermal IR imagery is independent of ambient lighting since thermal IR sensors only measure the heat emitted by objects [3]. The use of thermal imagery has great advantages in poor illumination conditions, where visual face recognition systems often fail. It will be a highly challenging task if we want to solve those problems using visual images only.

II. VISUAL FACE RECOGNITION

A face is a three-dimensional object and can be seen differently according to inside and outside elements. Inside elements are expression, pose, and age, which make the face seen differently. Outside elements are brightness, size, lighting, position, and other surroundings. In face recognition, only a single image or at most a few images of each person are available, and a major concern has been scalability to large databases containing thousands of people. Face recognition addresses the problem of identifying or verifying one or more persons by comparing input faces with the face images stored in a database [6].
While humans quickly and easily recognize faces under variable situations or even after several years of separation, the problem of machine face recognition is still a highly challenging task in pattern recognition and computer vision. Face recognition in outdoor environments is a challenging task especially where illumination varies greatly. Performance of visual face recognition is sensitive to variations in illumination conditions. Since faces are essentially 3D objects, lighting changes can cast significant shadows on a face. This is one of the primary reasons why current face recognition technology is constrained to indoor access control applications where illumination is well controlled. Light reflected from human faces also varies significantly from person to person. This variability, coupled with dynamic lighting conditions, causes a serious problem.
Face recognition can be classified into two broad categories: feature-based and holistic methods. The analytic or feature-based approaches compute a set of geometrical features from the face such as the eyes, nose, and the mouth. The holistic or appearance-based methods consider the global properties of the human face pattern.
Data reduction and feature extraction schemes make the face recognition problem computationally
tractable. Some of the commonly used methods for visual face recognition are as follows.

NEURAL NETWORK BASED FACE RECOGNITION

A neural network can be used to detect frontal views of faces. Each network is trained to output the presence or absence of a face [9]. The training methods are designed to be general, with little customization for faces. Many face detection methods have used the idea that facial images can be characterized directly in terms of pixel intensities. In the neural network-based face detection method, a retinally connected neural network examines small windows of an image and decides whether each window contains a face. It arbitrates between multiple networks to improve performance over a single network.
Training a neural network for the face detection task is challenging because of the difficulty in characterizing prototypical “no face” images. The two classes to be discriminated in face detection are “images containing faces” and “images not containing faces”. It is easy to get a representative sample of images which contain faces, but much harder to get a representative sample of those which do not contain faces.

CELLULAR NEURAL NETWORK

Cellular neural networks or cellular nonlinear networks (CNN) provide an attractive paradigm for very large-scale integrated (VLSI) circuit architecture in applications devoted to pixel-parallel image processing. The resistive-fuse network is well-known as an effective model for image segmentation, and some analog circuits implementing it have been proposed. Gabor filtering is an effective method for extracting the features of images, and it is known that such filtering is used in the human vision system. A flexible face recognition technique using this method has also been proposed [19]. To implement Gabor-type filters using analog circuits, CNN models have been proposed. A pulse-width modulation (PWM) approach is used for achieving time-domain analog information processing. The pulse signals have digital values in the voltage domain and analog values in the time domain. The PWM approach is suitable for the large-scale integration of analog processing circuits because it matches the scaling trend in Si CMOS technology and leads to low voltage operation [20]. It also has high controllability and allows highly effective matching with ordinary digital systems.

III. THERMAL FACE RECOGNITION
DTDA = SA = CA
Abstract:- In this paper, three different impulse noise removal algorithms are implemented and their performances are analysed. The first algorithm uses an alpha trimmed mean based approach to detect the impulse noise. The second algorithm follows the principle of the multi-state median filter. The third algorithm works under the principle of thresholding. Experimental results show that these algorithms are capable of removing impulse noise effectively compared to many of the standard filters in terms of quantitative and qualitative analysis.

I. INTRODUCTION

The acquisition or transmission of digital images through sensors or communication channels is often interfered with by impulse noise. It is very important to eliminate noise in the images before subsequent processing, such as image segmentation, object recognition, and edge detection. Two common types of impulse noise are the salt and pepper noise and the random value impulse noise. A large number of techniques have been proposed to remove impulse noise from corrupted images. Many existing methods use an impulse detector to determine whether a pixel should be modified. In images corrupted by salt and pepper noise, the noisy pixels can take only the maximum and minimum values. The median filter [6] was once the most popular nonlinear filter for removing impulse noise, because of its good denoising power and computational efficiency. However, when the noise level is over 50%, some details and edges of the original image are smeared by the filter. Different remedies of the median filter have been proposed, e.g. the adaptive median filter and the multi-state median filter. A switching strategy is another method: it identifies the noisy pixels and then replaces them by using the median filter or its variants. These filters are good at detecting noise even at a high noise level. The main drawback of the median filter is that details and edges are not recovered satisfactorily, especially when the noise level is high. The NASM [4] filter achieves performance fairly close to that of the ideal switching median filter. The weighted median filter controls the filtering performance in order to preserve the signal details. In the centre weighted median filter, only the centre pixel of the filtering window has a weighting factor. Filtering should be applied to corrupted pixels only, while leaving the uncorrupted ones unchanged. Switching based median filter [4] methodologies apply no filtering to true pixels and a standard median filter to remove impulse noise. The mean filter, rank filter and alpha trimmed mean filter are also used to remove impulse noise.

II. IMPULSE NOISE DETECTION ALGORITHM
An alpha trimmed mean based approach [1] is used to detect the impulse noise. This algorithm consists of three steps: impulse noise detection, refinement, and impulse noise cancellation, which replaces the values of identified noisy pixels with the median value.

A. IMPULSE NOISE DETECTION

Let $I$ denote the corrupted, noisy image of size $l_1 \times l_2$, and let $x_{ij}$ be its pixel value at position $(i, j)$. Let $W_{ij}$ denote the window of size $(2L_d + 1) \times (2L_d + 1)$ centered about $x_{ij}$. The alpha trimmed mean of the window is

$$M_{ij}^{\alpha}(I) = \frac{1}{t - 2\lfloor \alpha t \rfloor} \sum_{k = \lfloor \alpha t \rfloor + 1}^{t - \lfloor \alpha t \rfloor} X_{(k)}$$

where $t = (2L_d + 1)^2$, $\alpha$ is the trimming parameter that assumes values between 0 and 0.5, and $X_{(k)}$ represents the $k$th data item in the increasingly ordered samples of $W_{ij}$, i.e. $X_{(1)} \le X_{(2)} \le \dots \le X_{(t)}$. That is, $X_{(k)} = k\text{th smallest}(W_{ij}(I))$.
The alpha trimmed mean $M_{ij}^{\alpha}(I)$, with appropriately chosen $\alpha$, represents approximately the average of the noise free pixel values within the window $W_{ij}(I)$. The absolute difference between $x_{ij}$ and $M_{ij}^{\alpha}(I)$ is

$$r_{ij} = \left| x_{ij} - M_{ij}^{\alpha}(I) \right|.$$

$r_{ij}$ should be relatively large for noisy pixels and small for noise free pixels.
First, when the pixel $x_{ij}$ is an impulse, it takes a value substantially larger than or smaller than those of its neighbors. Second, when the pixel $x_{ij}$ is a noise-free pixel, which could belong to a flat region, an edge, or even a thin line, its value will be very similar to those of some of its neighbors. Therefore, we can distinguish image details from noisy pixels by counting the number of pixels whose values are similar to that of $x_{ij}$ in its local window.

$$\delta_{i-u,\,j-v} = \begin{cases} 1, & \left| x_{i-u,\,j-v} - x_{ij} \right| < T \\ 0, & \text{otherwise} \end{cases}$$

$T$ is a predetermined parameter. $\delta_{i-u,\,j-v} = 1$ indicates that the pixel $x_{i-u,\,j-v}$ is similar to the pixel $x_{ij}$. $\xi_{ij}$ denotes the number of pixels in the window which are similar to $x_{ij}$.
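The detection quantities described so far can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: the function names, the default values of `alpha` and `T`, and the border handling (coordinates are clamped so that edge pixels are repeated) are all assumptions made for illustration. The count of similar pixels follows the convention that a neighbour is "similar" when its absolute difference from the centre is below T, and it includes the centre pixel itself.

```python
import math

def window(image, i, j, ld):
    """Samples of the (2*ld+1) x (2*ld+1) window around (i, j).
    Border pixels are repeated (clamping), which is an assumption of this sketch."""
    rows, cols = len(image), len(image[0])
    return [image[min(max(i + u, 0), rows - 1)][min(max(j + v, 0), cols - 1)]
            for u in range(-ld, ld + 1) for v in range(-ld, ld + 1)]

def alpha_trimmed_mean(samples, alpha):
    """Mean of the ordered samples after dropping floor(alpha*t) values at each end."""
    ordered = sorted(samples)
    t = len(ordered)
    k = math.floor(alpha * t)
    kept = ordered[k:t - k]            # non-empty because alpha < 0.5
    return sum(kept) / len(kept)

def detection_features(image, i, j, ld=1, alpha=0.2, T=20):
    """Return (r, xi): absolute difference from the trimmed mean and the
    number of window pixels whose value is within T of the centre pixel."""
    samples = window(image, i, j, ld)
    x = image[i][j]
    r = abs(x - alpha_trimmed_mean(samples, alpha))
    xi = sum(1 for s in samples if abs(s - x) < T)
    return r, xi

# A flat patch with one impulse in the middle.
img = [[100, 100, 100],
       [100, 255, 100],
       [100, 100, 100]]
print(detection_features(img, 1, 1))   # (155.0, 1): large r, almost no similar pixels
print(detection_features(img, 0, 0))   # (0.0, 8): small r, many similar pixels
```

The impulse stands out on both measures at once, which is what the refinement step relies on.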
$$\xi_{ij} = \sum_{-L_d \le u,\,v \le L_d} \delta_{i-u,\,j-v}$$

$$\varphi_{ij} = \begin{cases} 0, & \xi_{ij} \ge N \\ 1, & \text{otherwise} \end{cases}$$

$N$ is a predetermined parameter. $\varphi_{ij} = 0$ indicates that $x_{ij}$ is a noise free pixel.

$$r_{ij} \cdot \varphi_{ij} = \begin{cases} 0, & \varphi_{ij} = 0 \\ r_{ij}, & \varphi_{ij} = 1 \end{cases}$$

$$R^{(1)} = r_{ij} \times \varphi_{ij}.$$

Fig 1 Impulse noise detection (a) Corrupted by 20% fixed value impulse noise (b) Absolute difference image (c) Binary flag (d) Product of binary image and absolute difference image (e) Restored image.

B. SPACE INVARIANT MEDIAN FILTER

This algorithm falls under median based switching schemes and is called the multi-state median (MSM) filter [2]. By using simple thresholding logic, the output of the MSM filter [5] is adaptively switched among those of a group of center weighted median (CWM) filters [6] that have different center weights. As a result, the MSM filter is equivalent to an adaptive CWM filter with a space varying center weight which is dependent on local signal statistics. The efficacy of this filter has been evaluated by extensive simulations.
$S_{ij}$ and $X_{ij}$ denote the intensity values of the original image and the observed noisy image, respectively, at pixel location $(i, j)$.

C. CWM FILTER

The output of CWM filters, in which a weight adjustment is applied to the center or origin pixel $X_{ij}$ within a sliding window, can be defined as

$$Y_{ij} = \mathrm{median}\left( X_{ij}^{w} \right)$$
$$X_{ij}^{w} = \left\{ X_{i-s,\,j-t},\; w \diamond X_{ij} \right\}$$

where $w \diamond X_{ij}$ denotes $w$ copies of the center pixel. The median is then computed on the basis of those $8 + w$ samples. Here $w$ denotes the centre weight.
The output of a CWM filter with center weight $w$ can also be represented by

$$Y_{ij}^{w} = \mathrm{median}\left\{ X_{ij}(k),\; X_{ij},\; X_{ij}(N + 1 - k) \right\}$$

where $k = (N + 2 - w)/2$ and $X_{ij}(k)$ is the $k$th smallest of the $N$ samples in the window.
CWM filters with different center weights have different capabilities of suppressing noise and preserving details, so the filter output can be switched among them by a simple thresholding operation as follows.
For the current pixel $X_{ij}$, we first define the differences

$$d_w = \left| Y_{ij}^{w} - X_{ij} \right|, \qquad w = 1, 3, 5, \dots, N - 2.$$
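A small Python sketch may help make the CWM outputs and the differences $d_w$ concrete. It is illustrative only: the function names are hypothetical, the window is passed as a flat row-major list of N samples with the centre pixel in the middle, and the differences are taken as absolute values, which is an assumption of this sketch.

```python
from statistics import median

def cwm_output(samples, w):
    """Median of the window in which the centre pixel is counted w times
    (w odd), i.e. the 8 + w samples of a 3 x 3 window."""
    mid = len(samples) // 2
    centre = samples[mid]
    neighbours = samples[:mid] + samples[mid + 1:]
    return median(neighbours + [centre] * w)

def cwm_differences(samples):
    """d_w = |Y^w - X| for w = 1, 3, ..., N - 2, where N is the window size."""
    n = len(samples)
    centre = samples[n // 2]
    return {w: abs(cwm_output(samples, w) - centre) for w in range(1, n - 1, 2)}

# Isolated impulse: every centre weight still rejects the centre value.
impulse = [10, 10, 10,
           10, 200, 10,
           10, 10, 10]
print(cwm_differences(impulse))   # {1: 190, 3: 190, 5: 190, 7: 190}

# Centre pixel on a thin bright line: larger centre weights preserve it.
line = [10, 10, 10,
        200, 200, 200,
        10, 10, 10]
print(cwm_differences(line))      # {1: 190, 3: 190, 5: 0, 7: 0}
```

The two patterns give different profiles of $d_w$ across the weights, which is the kind of information a threshold-based classifier can use to tell an impulse from fine image detail.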
Together, the differences $d_1$ through $d_{N-2}$ reveal even more information about the presence of a corrupted pixel. A classifier based on the differences $d_w$ is employed to estimate the likelihood of the current pixel being contaminated. An attractive merit of the MSM filtering technique is that it provides an adaptive mechanism to detect the likelihood of a pixel being corrupted by an impulse. As a result, it satisfactorily trades off detail preservation against noise removal by adjusting the center weight of CWM filtering, which is dependent on the local signal characteristics. Furthermore, it possesses a simple computation structure for implementation.

D. MULTIPLE THRESHOLD

A novel decision-based filter, called the multiple thresholds switching (MTS) filter [3], is to restore images

Fig 3 Multiple threshold (a) Noisy image (20%) (b) Restored image.

[Figure: COMPARISONS. PSNR (0 to 45) versus noise density (10 to 50) for the ATM, SIMF, multiple threshold, median, CWM and WM filters.]
Abstract-In today's fast-moving world, conveying highly confidential information secretly has become an important aspect of life. As the distance between communicating parties has grown, computer technology has made such communication simple. One example of hidden communication is STEGANOGRAPHY. The older practice of hiding secrets behind text has given way to hiding secrets behind cluttered images, where the appearance of the picture is changed rather than its features.
This paper combines two processes. A simple OBJECT IMAGE is used for the steganography process, which is based on the F5 algorithm. The prepared stego images are then placed on a BACKGROUND IMAGE; this is COLLAGE STEGANOGRAPHY. Here the patchwork is done by changing the type of each object as well as its location. An increased number of images leads to an increased amount of hidden information.

The F5 algorithm is different in that it uses subtraction and matrix encoding to embed data into the DCT coefficients.

II. PROPOSED SYSTEM

1. Select the image for stego image preparation.
2. Generate the stego image using the F5 algorithm.
3. Select the background image.
4. Embed the stego image in the background image (Collage Steganography).
5. Finally, extract it.

A COMPLETE SYSTEM DESIGN
SENDER SIDE :
EXTRACTING :
CONCLUSION
ADVANTAGES
BIBLIOGRAPHY
Abstract: Wavelet Transform has been successfully applied in different fields, ranging from pure mathematics to applied science. Software implementation of the Discrete Wavelet Transform (DWT), however greatly flexible, appears to be the performance bottleneck in real-time systems. Hardware implementation, in contrast, offers high performance but is poor in flexibility. A compromise between these two is reconfigurable hardware. For 1-D DWT, the architectures are mainly convolution-based and lifting-based. On the other hand, direct and line-based methods are the most feasible implementations for the 2-D DWT. The lifting scheme to construct VLSI architectures for DWT outperforms the convolution based architectures in many aspects, such as fewer arithmetic operations, in-place implementation and easy management of boundary extension. But the critical path of the lifting based architectures is potentially longer than that of the convolution based ones, and this can be reduced by employing pipelining in the lifting based architecture. 1-D and 2-D DWT using the lifting scheme have been obtained for signals and images respectively through MATLAB simulation. The Liftpack algorithm for calculating the DWT has been implemented using the ‘C’ language. The lifting algorithm for 1-D DWT has also been implemented in VHDL.

1. INTRODUCTION
Mathematical transformations are applied to signals to obtain further information from that signal that is not readily available in the raw signal. Most of the signals in practice are time-domain signals (time-amplitude representation) in their raw format. This representation is not always the best representation of the signal for most signal processing related applications. In many cases, the most distinguished information is hidden in the frequency content (frequency spectrum) of the signal. Often, information that cannot be readily seen in the time domain can be seen in the frequency domain. Fourier Transform (FT) is a reversible transform, that is, it converts a time-domain signal into a frequency-domain signal and vice versa. However, only one of them is available at any given time. That is, no frequency information is available in the time-domain signal, and no time information is available in the Fourier transformed signal. Wavelet Transform (WT) addresses this issue by providing a time-frequency representation of a signal or an image. The objectives proposed in the thesis are:
1. To implement the 1-D and 2-D Lifting Wavelet Transform (LWT) in MATLAB to understand the concept of the lifting scheme.
2. To develop the lifting algorithm for 1-D and 2-D DWT using the C language.
3. To implement the 1-D LWT in VHDL using the prediction and updating scheme.
4. To implement the 5/3 wavelet filter using the lifting scheme.

Lifting Scheme Advantages

The lifting scheme is a new method for constructing biorthogonal wavelets. In this way lifting can be used to construct second-generation wavelets, that is, wavelets that are not necessarily translates and dilates of one function. Compared with first generation wavelets, the lifting scheme has the following advantages:

• Lifting leads to a speedup when compared to the classic implementation. The classical wavelet transform has a complexity of order n, where n is the number of samples. For long filters, the lifting scheme speeds up the transform by another factor of two. Hence it is also referred to as the fast lifting wavelet transform (FLWT).
• All operations within the lifting scheme can be done entirely in parallel, while the only sequential part is the order of lifting operations.

In addition, the lifting scheme can be used in situations where no Fourier transform is available. Typical examples include wavelets on bounded domains, wavelets on curves and surfaces, weighted wavelets, and wavelets and irregular sampling.

II. Lifting Algorithm

The basic idea behind the lifting scheme is very simple: use the correlation in the data to remove redundancy. To this end, first the data is split into two sets (Split phase): the odd samples and the even samples (Figure 2). If the samples are indexed beginning with 0 (the first sample is the 0th sample), the even set comprises all the samples with an even index and the odd set contains all the samples with an odd index. Because of the assumed smoothness of the data, it is predicted that the odd samples have a value that is closely related to their neighboring even samples. N even samples are used to predict the value of a neighboring odd value (Predict phase). With a good prediction method, the chance is high
that the original odd sample is in the same range as its prediction. The difference between the odd sample and its prediction is calculated and this is used to replace the odd sample. As long as the signal is highly correlated, the newly calculated odd samples will on average be smaller than the original ones and can be represented with fewer bits. The odd half of the signal is now transformed. To transform the other half, we will have to apply the predict step on the even half as well. Because the even half is merely a sub-sampled version of the original signal, it has lost some properties that are to be preserved. In the case of images, for instance, the intensity (mean of the samples) is likely kept constant throughout different levels. The third step (Update phase) updates the even samples using the newly calculated odd samples such that the desired [...] the even samples and each time half of the even samples [...] detail.

III.THE INVERSE TRANSFORM

One of the great advantages of the lifting scheme realization of a wavelet transform is that it decomposes the wavelet filters into extremely simple elementary steps, and each of these [...]

Here follows a summary of the steps to be taken for both forward and inverse transform.

Figure.3 The lifting Scheme, inverse transform: Update, Predict and Merge stages

IV.RESULTS

Figure.4 The input signal with noise signals (amplitude versus sampling instant n)

[Plot: Approximation A1 and Detail D1, amplitude versus sampling instant n]
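The split, predict and update phases of Section II, and their reversal in Section III, can be sketched for the Haar wavelet as follows. This is a minimal illustration and not the MATLAB or VHDL code used in the paper: the integer-rounded update step and the function names `haar_lift_forward` and `haar_lift_inverse` are assumptions chosen for the example.

```python
def haar_lift_forward(signal):
    """One level of an integer Haar transform built from lifting steps."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    # Split phase: even-indexed and odd-indexed samples
    even = signal[0::2]
    odd = signal[1::2]
    # Predict phase: each odd sample is predicted by its even neighbour,
    # and only the prediction error (detail) is kept
    detail = [o - e for o, e in zip(odd, even)]
    # Update phase: adjust the even samples so that the (rounded) mean of
    # each pair is preserved in the approximation
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def haar_lift_inverse(approx, detail):
    """Inverse transform: undo update, undo predict, then merge."""
    even = [a - (d >> 1) for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    signal = [0] * (2 * len(even))
    signal[0::2] = even
    signal[1::2] = odd
    return signal


samples = [100, 102, 98, 97, 150, 160, 3, 0]
approx, detail = haar_lift_forward(samples)
restored = haar_lift_inverse(approx, detail)
```

Because every lifting step is undone by subtracting exactly what was added, `restored` equals `samples`, which is the lossless behaviour reported in the results.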
Figure.6 The cameraman input image

Here the Haar wavelet is used as the mother wavelet function and is lifted using elementary lifting steps. This new lifted wavelet is then used to find the wavelet transform of the input signal. This results in two output signals, called the approximation signal and the detail signal. The approximation represents the low frequency components present in the original input signal. The detail gives the high frequency components in the signal and represents its hidden details. If the detail does not provide sufficient information, the approximation is again decomposed into approximation and details. This decomposition continues until sufficient information about the image is recovered. Finally the inverse lifting scheme is performed on the approximation and detail images to reconstruct the original image. If we compare the original (Figure.6) and reconstructed (Figure.8) images, they look exactly the same and the transform is lossless.

Figure.7 Approximation and detail images (panels: Approximation image, Detail image-Horizontal, Detail image-Vertical, Detail image-Diagonal)

Figure.8 Reconstructed image

V.CONCLUSION

The lifting scheme to construct VLSI architectures for DWT outperforms the convolution based architectures in many aspects such as fewer arithmetic operations, in-place implementation and easy management of boundary extension. But the critical path of the lifting based architectures is potentially longer than that of the convolution based ones and this can be reduced by employing pipelining in the lifting based architecture. 1-D and 2-D DWT using lifting scheme have been obtained for signals and images respectively through MATLAB simulation.

REFERENCES:
• A VLSI Architecture for Lifting-Based Forward and Inverse Wavelet Transform, Kishore Andra et al., IEEE, 2002.
• Flipping Structure: An Efficient VLSI Architecture for Lifting-Based Discrete Wavelet Transform, Chao-Tsung Huang et al., IEEE, 2004.
• Generic RAM-Based Architectures for Two-Dimensional Discrete Wavelet Transform With Line-Based Method, Chao-Tsung Huang et al., IEEE, 2005.
• Evaluation of design alternatives for the 2-D-discrete wavelet transform, Zervas N. D. et al., IEEE, 2001.
• Efficient VLSI architectures of lifting-based discrete wavelet transform by systematic design method, Huang C.-T. et al., Proc. IEEE, 2002.
• Lifting factorization-based discrete wavelet transform architecture design, Jiang W. et al., IEEE, 2001.
Abstract: An impulse based ultra-wide band (UWB) receiver front end is presented in this paper. Gaussian modulated pulses in the 3.1-10.6 GHz frequency range, satisfying the Federal Communication Commission spectral mask, are received through an omnidirectional antenna and fed into the corresponding LNAs, filters and detectors. The low noise amplifiers, filters and detectors are integrated on a single chip and simulated using 0.18 µm CMOS technology. All the simulations are done using the Tanner EDA tool along with Puff software for supporting filter and amplifier designs.

I.INTRODUCTION

Ultra-wide band technology based on the Wi-Media standard brings the convenience and mobility of wireless communications to high-speed interconnects in devices throughout the digital home and office. Designed for low-power, short-range, wireless personal area networks, UWB is the leading technology for freeing people from wires, enabling wireless connection of multiple devices for transmission of video, audio and other high-bandwidth data.
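[Figure: emitted signal power versus frequency (GHz) for the UWB spectrum and coexisting services (GPS, PCS, Bluetooth, 802.11b, cordless phones, microwave, 802.11). The UWB spectrum lies at the "Part 15 Limit" of -41 dBm/MHz; frequency axis marks at 1.6, 1.9, 2.4, 3.1, 5 and 10.6.]

Table.1 Comparison of wireless technologies

Technology        Fractional bandwidth (Bf)
Narrowband        Bf < 1%
Wideband          1% < Bf < 20%
Ultra-Wideband    Bf > 20%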
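The classes in Table 1 can be applied to a given band with a short calculation. The sketch below assumes the usual definition of fractional bandwidth, the occupied bandwidth divided by the centre frequency, which the table itself does not spell out. The handling of the exact 1% and 20% boundaries is also an assumption, and the function names are illustrative.

```python
def fractional_bandwidth(f_low_hz, f_high_hz):
    """Bandwidth divided by centre frequency (assumed definition of Bf)."""
    centre = (f_high_hz + f_low_hz) / 2.0
    return (f_high_hz - f_low_hz) / centre


def classify(bf):
    """Map a fractional bandwidth onto the three classes of Table 1."""
    if bf < 0.01:
        return "Narrowband"
    if bf <= 0.20:  # boundary handling is an assumption
        return "Wideband"
    return "Ultra-Wideband"


# The 3.1-10.6 GHz band quoted in the abstract
bf_uwb = fractional_bandwidth(3.1e9, 10.6e9)
label_uwb = classify(bf_uwb)

# A hypothetical 1 MHz channel centred at 2.44 GHz, for contrast
bf_channel = fractional_bandwidth(2.4395e9, 2.4405e9)
label_channel = classify(bf_channel)
```

For the 3.1-10.6 GHz band this gives a fractional bandwidth of about 1.09 (109%), well above the 20% threshold.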
Types of receiver:
1. Impulse Type
2. MultiCarrier Type

Impulse – UWB
Pulse of very short duration (typically a few nanoseconds).
Merits and Demerits:
1. High resolution in multipath reduces fading margins; low complexity implementation.
2. Highly precise synchronization is needed, and the power during the brief interval increases the possibility of interference.

Multi Carrier – UWB
A single data stream is split into multiple data streams of reduced rate, with each stream transmitted on a separate frequency (sub carrier). Sub carriers must be properly spaced so that they do not interfere.
Merits and Demerits:
1. Well suited for avoiding interference because its carrier frequency can be precisely chosen to avoid narrowband interference.
2. Front-end design can be challenging due to variation in power.
3. High speed FFT is needed.

Applications:
1. Military.
2. Indoor applications such as WPAN (Wireless Personal Area Network).
3. Outdoor (substantial) applications, but with very low data rates.
4. High-data-rate communications, multimedia applications, and cable replacement.

Impulse: Radio technology that modulates impulse based waveforms instead of continuous carrier waves.

Pulse Types:
1. Gaussian first derivative, second derivative.
2. Gaussian modulated sinusoidal pulse.
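The pulse types named above can be written down directly from the Gaussian envelope. The following sketch is illustrative only: the width parameter `SIGMA` (100 ps) and the carrier `F_CENTRE` (6.85 GHz, the midpoint of 3.1-10.6 GHz) are assumed values, not parameters taken from the paper.

```python
import math

SIGMA = 100e-12     # assumed pulse width parameter (100 ps)
F_CENTRE = 6.85e9   # assumed carrier: midpoint of the 3.1-10.6 GHz band


def gaussian(t, sigma=SIGMA):
    """Gaussian envelope exp(-t^2 / (2 sigma^2))."""
    return math.exp(-t * t / (2.0 * sigma * sigma))


def gaussian_first_derivative(t, sigma=SIGMA):
    """d/dt of the Gaussian envelope."""
    return -(t / sigma ** 2) * gaussian(t, sigma)


def gaussian_second_derivative(t, sigma=SIGMA):
    """Second time derivative of the Gaussian envelope."""
    return (t * t / sigma ** 4 - 1.0 / sigma ** 2) * gaussian(t, sigma)


def gaussian_modulated_sinusoid(t, sigma=SIGMA, fc=F_CENTRE):
    """Gaussian envelope multiplying a cosine carrier at fc."""
    return gaussian(t, sigma) * math.cos(2.0 * math.pi * fc * t)


# Sample the modulated pulse from -500 ps to +500 ps in 10 ps steps
times = [(n - 50) * 10e-12 for n in range(101)]
pulse = [gaussian_modulated_sinusoid(t) for t in times]
```

The first derivative is odd about t = 0 and the modulated pulse peaks at the centre of the envelope, which is what gives these waveforms their short duration and wide spectrum.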
Fig.5 UWB Frequency-domain Behavior
[...] frequency response of the antenna filter that precedes the LNA will deviate from its normal operation if there are reflections from the LNA back to the filter. Furthermore, undesirable reflections from the LNA back to the antenna must also be avoided. An impedance match is when the reflection coefficient is equal to zero, and occurs when Z_S = Z_L. There is a subtle difference between impedance matching and power matching. As stated in the previous paragraph, the condition for impedance matching occurs when the load impedance is equal to the characteristic impedance. However, the condition for power matching occurs when the load impedance is the complex conjugate of the characteristic impedance. When the impedances are real, the conditions for power matching and impedance matching are equal.

For the analysis of LNA design for low noise, the origins of the noise must be identified and understood. The important noise sources in CMOS transistors are thermal noise and induced gate noise. Thermal noise is due to the random thermal motion of the carriers in the channel. It is commonly referred to as a white noise source because its power spectral density holds a constant value up to very high frequencies (over 1 THz). Thermal noise is given by

$$\frac{\overline{i_d^2}}{\Delta f} = 4kT\,\frac{\mu}{L^2}\,(-Q)$$

Induced gate noise is a high frequency noise source that is caused by the non-quasi-static effects influencing the power spectral density of the drain current. Induced gate noise has a power spectral density given by

$$\frac{\overline{i_g^2}}{\Delta f} = 4kT\,\delta\,\frac{\omega^2 C_{gs}^2}{5\,g_{ds0}}$$

Noise Figure: Noise figure (NF) is a measure of signal-to-noise ratio (SNR) degradation as the signal traverses the receiver front-end. Mathematically, NF is defined as the ratio of the input SNR to the output SNR of the system.

$$NF = \frac{\text{Total Output Noise Power}}{\text{Output Noise Power due to Source}}$$

NF may be defined for each block as well as for the entire receiver. NF_LNA, for instance, determines the inherent noise of the LNA, which is added to the signal through the amplification process.

[...] It only needs to be large enough to ensure that interference is suppressed to a point where it does not cause undesirable effects. To satisfy these specifications, the BPF can be implemented using a passive LC filter. The LC filter can be combined with the input-matching network of the LNA. A low-pass filter is a filter that passes low-frequency signals but attenuates (reduces the amplitude of) signals with frequencies higher than the cutoff frequency.

Fig .9 Band pass and Low pass filter Design

Butterworth filter-3rd order
Normalized values:
C22 = 0.6180 F
C4 = 2.0000 F
C19 = 0.6180 F
L13 = 1.6180 H
L15 = 1.6180 H

Square law detector: A square law means that the DC component of the diode output is proportional to the square of the AC input voltage. So if you reduce the RF input voltage by half, you get one quarter as much DC output. Or if you apply ten times as much RF input, you get 100 times as much DC output as before.

Op-Amp: An operational amplifier, usually referred to as an op-amp for brevity, is a DC-coupled high-gain electronic voltage amplifier with differential inputs and, usually, a single output. In its typical usage, the output of the op-amp is controlled by negative feedback, which largely determines the magnitude of its output voltage gain, the input impedance at one of its input terminals, and the output impedance. The output of the op-amp is then fed into an A/D converter for specific applications.

Simulated results:
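Independently of the authors' Tanner EDA simulation, the relations quoted in this section can be checked numerically. The sketch below is only an illustration under stated assumptions. It uses the reflection coefficient (Z_L - Z_S)/(Z_L + Z_S), the fraction of available power delivered to the load, the standard Butterworth low-pass prototype formula g_k = 2 sin((2k - 1)π / 2n), a unit detector constant, and NF expressed in dB. None of the function names come from the paper.

```python
import math


def reflection_coefficient(z_load, z_source):
    """Zero when the load impedance equals the source impedance."""
    return (z_load - z_source) / (z_load + z_source)


def delivered_power_fraction(z_load, z_source):
    """Fraction of the available source power absorbed by the load.

    Equals 1 when the load is the complex conjugate of the source.
    """
    return 4.0 * z_source.real * z_load.real / abs(z_source + z_load) ** 2


def butterworth_prototype(order):
    """Normalized element values of a Butterworth low-pass prototype."""
    return [2.0 * math.sin((2 * k - 1) * math.pi / (2 * order))
            for k in range(1, order + 1)]


def square_law_dc(v_rf, k=1.0):
    """DC output of an ideal square-law detector (k is an assumed constant)."""
    return k * v_rf ** 2


def noise_figure_db(snr_in, snr_out):
    """Noise figure in dB from input and output SNR (linear power ratios)."""
    return 10.0 * math.log10(snr_in / snr_out)


z_s = complex(50.0, 10.0)
gamma_matched = reflection_coefficient(z_s, z_s)               # impedance match
p_conjugate = delivered_power_fraction(z_s.conjugate(), z_s)   # power match
g_values = [round(g, 4) for g in butterworth_prototype(5)]
```

With this formula, the five normalized element values listed above (0.6180, 1.6180, 2.0000, 1.6180, 0.6180) coincide with the prototype values for n = 5, while n = 3 gives 1, 2, 1.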
III.CONCLUSION
REFERENCES:
Abstract:- Sensor networks have appeared as a promising technology with various applications, where power efficiency is one of the critical requirements. Each node has a limited battery energy supply and can generate information that needs to be communicated to a sink node. We assume that each node in the wireless network has the capacity to transfer information in the form of packets, and that each node is able to dynamically adjust its transmission power depending on the distance over which it transmits a packet. To improve the power efficiency without affecting the network delay, we propose and study a number of schemes for deletion of obsolete information from the network nodes, and we propose distributed algorithms to compute an optimal routing scheme that maximizes the time at which the first node in the network runs out of energy. For computing such a flow we analyze a partially distributed algorithm and a completely distributed algorithm. The resulting algorithms have low computational complexity and are guaranteed to converge to an optimal routing scheme that maximizes the lifetime of a network. To reduce the power consumption, we take the source node to be mobile, moving dynamically from one location to another, while the sensor nodes are static and cannot move from the location where they are created. The results of our study will allow a network designer to implement such a system and to tune its performance in a delay-tolerant environment with intermittent connectivity, so as to ensure with some chosen level of confidence that the information is successfully carried through the mobile network and delivered within some time period.

I. INTRODUCTION

Consider a network of wireless sensor nodes distributed in a region. Each node has a limited battery energy supply and can generate information that needs to be communicated to a sink node. It is assumed that each wireless node has the capability to relay packets. Also each node is assumed to be able to adjust its transmission power depending on the distance over which it transmits a packet. We focus on the problem of computing a flow that maximizes the lifetime of the network, where the lifetime is taken to be the time at which the first node runs out of energy. Since sensor networks need to self configure in many situations, the goal of this paper is to find algorithms that do this computation in a distributed manner. We analyze a partially distributed algorithm and a completely distributed algorithm to compute such a flow. The algorithms described can be used in static networks, or in networks in which the topology changes slowly enough that there is enough time between topology changes to optimally balance the traffic.

Energy efficient algorithms for routing in wireless networks have received considerable attention over the past few years. Distributed algorithms to form sparse topologies containing minimum-energy routes were proposed in "Minimum energy mobile wireless networks" [1] and "Minimum energy mobile wireless networks revisited" [2]. An approximate approach based on discretization of the coverage region of a node into cones was described in "Distributed topology control for power efficient operation in multi-hop wireless ad hoc networks" [3] and "Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks" [4]. All the above mentioned works focused on minimizing the total energy consumption of the network. However, as pointed out in [...], this can lead to some nodes in the network being drained of energy very quickly. Hence, instead of trying to minimize the total energy consumption, routing to maximize the network lifetime was considered in "Energy conserving routing in wireless ad-hoc networks" [5] and "Routing for maximum system lifetime in wireless ad-hoc networks" [6]. The problem was formulated as a linear program, and heuristics were proposed to select routes in a distributed manner to maximize the network lifetime. However, as illustrated in these papers, these heuristics do not always lead to selection of routes that are globally optimum. A similar problem formulation for selection of relay nodes was given in "Topology control for wireless sensor networks" [7]. We note that distributed iterative algorithms for the computation of the maximum lifetime routing flow were described in "Energy efficient routing in ad hoc disaster recovery networks" [8]. Each iteration involved a bisection search on the network lifetime, and the solution of a max-flow problem to check the feasibility of the network lifetime. The complexity of the algorithm was shown to be polynomial in the number of nodes in the special case of one source node. We use a different approach based on the subgradient algorithm for the solution of the dual problem. We exploit the separable nature of the problem using dual decomposition to obtain partially and fully distributed algorithms. This is similar to
the dual decomposition approaches applied to other problems in communication networks.

When power efficiency is considered, ad hoc networks will require a power-aware metric for their routing algorithms. Typically, there are two main optimization metrics for energy-efficient broadcast/multicast routing in wireless ad hoc networks:
(1) Maximizing the network lifetime; and
(2) Minimizing the total transmission power assigned to all nodes.
Maximum lifetime broadcast/multicast routing algorithms can distribute packet relaying loads for each node in a manner that prevents nodes from being overused or abused. By maximizing the lifetime of all nodes, the time before the network is partitioned is prolonged.

II. OBJECTIVE

• We reduce the power consumption for packet transmission.
• We achieve maximum lifetime using the partially and fully distributed processing techniques.

III.GENERAL BLOCK DIAGRAM

[...] reducing the power consumption. The problems faced in the existing systems are overcome through the proposed system. Each mobile node estimates its lifetime based on the traffic volume and battery state. The extension fields in the route request (RREQ) and route reply (RREP) packets are used to carry the lifetime (LT) information. The LT field is also included in the routing tables. When a RREQ packet is sent, LT is set to the maximum value (all ones). When an intermediate node receives the RREQ, it compares the LT field of the packet to its own LT. The smaller of the two is written into the forwarded RREQ packet. When a node having a path to the destination hears the RREQ packet, it compares the LT field of the RREQ with the LT field in its routing table and puts the smaller of the two into the RREP. If the destination hears the RREQ, it simply sends a RREP with the lifetime field equal to the LT in the RREQ. All intermediate nodes that hear the RREP store the path along with the lifetime information. If the source receives several RREPs, it selects the path having the largest LT.

• Unattended operation
• Robustness under dynamic operating conditions
• Scalability to thousands of sensors
• Energy consumption is low
• Efficiency is high
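The lifetime (LT) handling in the RREQ/RREP exchange described above amounts to a max-min selection: each route carries the smallest LT of the nodes along it, and the source picks the route whose smallest LT is largest. The sketch below illustrates only that rule. The 16-bit width of the LT field, the route names and the node lifetimes are hypothetical, and no packet formats are modelled.

```python
MAX_LT = 0xFFFF  # "all ones" initial value; a 16-bit field is an assumption


def rreq_lifetime(node_lifetimes):
    """LT carried by an RREQ after passing the given intermediate nodes.

    The field starts at the maximum value and each node that forwards the
    packet overwrites it with the smaller of the field and its own LT.
    """
    lt = MAX_LT
    for node_lt in node_lifetimes:
        lt = min(lt, node_lt)
    return lt


def select_route(candidate_routes):
    """Source-side choice: the route whose reported LT is the largest."""
    return max(candidate_routes,
               key=lambda name: rreq_lifetime(candidate_routes[name]))


# Hypothetical lifetimes (arbitrary units) of the nodes on three routes
routes = {
    "A": [40, 15, 60],
    "B": [30, 25, 28],
    "C": [90, 10],
}
route_lifetimes = {name: rreq_lifetime(lts) for name, lts in routes.items()}
best = select_route(routes)
```

Route B is chosen here even though route C contains the longest-lived node, because the weakest node on B lasts longer than the weakest node on A or C.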
VI.OVERVIEW
VII. MODULES
VIII.ALGORITHM USED
Decomposition of EEG Signal Using Source Separation
Algorithms
Kiran Samuel, PG student, Karunya University, Coimbatore, and Shanty Chacko, Lecturer, Department of Electronics & Communication Engineering, Karunya University. Samnov17@gmail.com, ShantyChacko@gmail.com
Fig: 1 Normal EEG wave in time domain
memory and learning, especially in the temporal
lobes. Theta rhythms are very strong in rodent
hippocampi and entorhinal cortex during learning and
memory retrieval. They can equally be seen in cases of
focal or generalized subcortical brain damage and
epilepsy.
C. Alpha waves
VIII. CONCLUSION AND FUTURE PLANS
REFERENCES
Segmentation of Multispectral Brain MRI using Source
Separation Algorithm
Krishnendu K, PG student and Shanty Chacko, Lecturer, Department of Electronics & communication
Engineering, Karunya university, Karunya Nagar, Coimbatore – 641 114, Tamil Nadu, India.
Email addresses: krishnenduk@gmail.com, shantychacko@gmail.com
Abstract-- The aim of our paper is to implement an algorithm for segmenting multispectral MRI brain images and to check whether there is any performance improvement. One set of multispectral MRI brain images consists of spin-lattice relaxation time, spin–spin relaxation time, and proton density weighted images (T1, T2, and PD). The algorithm to be used is the 'source separation algorithm'. Source separation is the more general term, as we can use algorithms like ICA, BINICA, JADE etc. For implementing the algorithm, the first thing needed is the database of multispectral MRI brain images. Sometimes this database is called the 'test database'. After the image database is acquired, we implement the algorithm, calculate the performance parameters and check for performance improvement with respect to any already implemented technique.

Keywords – Multispectral MRI, Test Database, Source Separation Algorithm, Segmentation.

I. INTRODUCTION

In the image processing field, segmentation [1] refers to the process of partitioning a digital image into multiple regions (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. The result of image segmentation is a set of regions that collectively cover the entire image. Several general-purpose algorithms and techniques have been developed for image segmentation. Since there is no general solution to the image segmentation problem, these techniques often have to be combined with domain knowledge in order to effectively solve an image segmentation problem for a problem domain. The methods most commonly used are Clustering Methods, Histogram-Based Methods, Region-Growing Methods, Graph Partitioning Methods, Model based Segmentation, Multi-scale Segmentation, Semi-automatic Segmentation and Neural Networks Segmentation.

Some of the practical Medical Imaging applications of image segmentation are:
o Locate tumors and other pathologies
o Measure tissue volumes
o Diagnosis
o Treatment planning
o Study of anatomical structure

A. Need for Segmentation
The purposes of segmenting magnetic resonance (MR) images are:
1) to quantify the volume sizes of different tissue types within the body, and
2) to visualize the tissue structures in three dimensions using image fusion.

B. Magnetic Resonance Imaging (MRI)
Magnetic Resonance Imaging (MRI) is a technique primarily used in medical imaging to demonstrate pathological or other physiological alterations of living tissues. Medical MRI most frequently relies on the relaxation properties of excited hydrogen nuclei in water and lipids. When the object to be imaged is placed in a powerful, uniform magnetic field, the spins of atomic nuclei with a resulting non-zero spin have to arrange in a particular manner with the applied magnetic field according to quantum mechanics. Nuclei of hydrogen atoms (protons) have a simple spin 1/2 and therefore align either parallel or antiparallel to the magnetic field. The MRI scanners used in medicine have a typical magnetic [...] The spin polarization determines the basic MRI signal strength. For protons, it refers to the population difference of the two energy states that are associated with the parallel and antiparallel alignment of the proton spins in the magnetic field. The tissue is then exposed to pulses of electromagnetic energy (RF pulses) in a plane perpendicular to the magnetic field, causing some of the magnetically aligned hydrogen nuclei to assume a temporary non-aligned high-energy state. In other words, the steady-state equilibrium established in the static magnetic field becomes perturbed and the population difference of the two energy levels is altered. In order to selectively image different voxels (volume picture elements) of the subject, orthogonal magnetic gradients are applied. The RF transmission system consists of an RF synthesizer, power amplifier and transmitting coil. This is usually built into the body of the scanner. The power of the transmitter is variable. Magnetic gradients are generated by three orthogonal coils, oriented in the x, y and z directions of the scanner.
These are usually resistive electromagnets powered by sophisticated amplifiers which permit rapid and precise adjustments to their field strength and direction. Some time constants are involved in the relaxation processes that establish equilibrium following the RF excitation. These time constants are T1 and T2; proton density (PD) is a further contrast parameter. In the brain, T1-weighting causes the nerve connections of white matter to appear white, and the congregations of neurons of gray matter to appear gray, while cerebrospinal fluid appears dark. The contrast of "white matter," "gray matter" and "cerebrospinal fluid" is reversed using T2 or PD imaging.

In clinical practice, MRI is used to distinguish pathologic tissue (such as a brain tumor) from normal tissue. One advantage of an MRI scan is that it is thought to be harmless to the patient. It uses strong magnetic fields and non-ionizing radiation in the radio frequency range.

C. Multispectral MR Brain Images
Magnetic resonance imaging (MRI) is an advanced medical imaging technique providing rich information about the human soft tissue anatomy. It has several advantages over other imaging techniques, enabling it to provide three-dimensional data with high contrast between soft tissues. A multi-spectral image (fig.1) is a collection of several monochrome images of the same scene, each of them taken with a different sensor. The advantage of using MR images is their multispectral character, with relaxation time (i.e., T1 and T2) and proton density (i.e., PD) information.

[...] dark in the T1-weighted image and bright in the T2-weighted image. A tissue with a short T1 and a long T2 (like fat) is bright in the T1-weighted image and gray in the T2-weighted image. Gadolinium contrast agents reduce T1 and T2 times, resulting in an enhanced signal in the T1-weighted image and a reduced signal in the T2-weighted image.

T1 (Spin-lattice Relaxation Time)
Spin-lattice relaxation time, known as T1, is a time constant in Nuclear Magnetic Resonance and Magnetic Resonance Imaging. T1 characterizes the rate at which the longitudinal Mz component of the magnetization vector recovers. The name spin-lattice relaxation refers to the time it takes for the spins to give the energy they obtained from the RF pulse back to the surrounding lattice in order to restore their equilibrium state. Different tissues have different T1 values. For example, fluids have long T1s (1500-2000 ms), and water based tissues are in the 400-1200 ms range, while fat based tissues are in the shorter 100-150 ms range. T1 weighted images can be obtained by setting short TR (< 750 ms) and TE (< 40 ms) values in conventional Spin Echo sequences.
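The effect of the T1 ranges quoted above on a short-TR acquisition can be illustrated with a small calculation. The sketch assumes the standard exponential recovery model Mz(t) = M0 (1 - exp(-t / T1)) after the magnetization has been tipped away by the RF pulse; the text itself only states that T1 characterizes the recovery rate. The representative T1 values and the repetition time of 500 ms are assumptions picked from within the quoted ranges.

```python
import math


def longitudinal_recovery(t_ms, t1_ms):
    """Fraction of equilibrium Mz recovered after time t (assumed model)."""
    return 1.0 - math.exp(-t_ms / t1_ms)


# Representative T1 values chosen from within the ranges quoted in the text
T1_EXAMPLES_MS = {
    "fluid": 1800.0,               # quoted range 1500-2000 ms
    "water-based tissue": 800.0,   # quoted range 400-1200 ms
    "fat-based tissue": 125.0,     # quoted range 100-150 ms
}

TR_MS = 500.0  # a short repetition time (< 750 ms), as used for T1 weighting

recovered = {tissue: longitudinal_recovery(TR_MS, t1)
             for tissue, t1 in T1_EXAMPLES_MS.items()}
```

With a short TR, the fat-based tissue has almost fully recovered while the fluid has recovered only a small fraction, which is consistent with fat appearing bright and cerebrospinal fluid dark in T1-weighted images.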
[...] more water than normal tissue around it, it is usually brighter on T2.

PD (Proton Density)
Proton density denotes the concentration of mobile hydrogen atoms within a sample of tissue. A PD weighted image is produced by controlling the selection of scan parameters to minimize the effects of T1 and T2, resulting in an image dependent primarily on the density of protons in the imaging volume.

Fig 4. PD weighted image

A T1 weighted image is usually acquired using short TR (the repetition time of a pulse sequence) and short TE (the spin-echo delay time). Similarly, a T2 weighted image is acquired using relatively long TR and TE, and a PD weighted image with long TR and short TE. Since the three images are strongly correlated (or spatially registered) over the patient space, the information extracted by means of image processing from the images together is more valuable than that extracted from each image individually. Therefore, tissue segmentation from the three MR images is expected to produce more accurate 3D reconstruction and visualization than the segmentation obtained from each image individually or from the addition of the three images' segmentations.

Some examples are:
1) Dark on T1, bright on T2: this is a typical pathology. Most cancers have these characteristics.
2) Bright on T1, bright on T2: blood in the brain has these characteristics.
3) Bright on T1, less bright on T2: this usually means the lesion is fatty or contains fat.
4) Dark on T1, dark on T2: chronic blood in the brain has these characteristics.

Following is a table of approximate values of the two relaxation time constants for nonpathological human tissues.

II. METHODOLOGY

A. Algorithm
1) Loading T1, T2 and PD images.
2) Converting to double precision format.
3) Converting each image matrix to a row matrix.
4) Combining three row matrices to form a matrix.
5) Computing independent components of the matrix using FastICA algorithm.
6) Separating the rows of the resultant matrix into three row matrices.
7) Reshaping each row matrix to 256x256.
8) Executing dynamic pixel range correction. The ICA algorithm
9) Converting to unsigned integer format.
10) Plotting the input images and segmented ICA rotates the whitened matrix back to the
output images. original space. It performs the rotation by minimizing
the Gaussianity of the data projected on both axes
B. Independent Component Analysis (fixed point ICA). By rotating the axis and
• Introduction to ICA minimizing Gaussianity of the projection, ICA is able
• Whitening the data to recover the original sources which are statistically
• The ICA algorithm independent (this property comes from the central
limit theorem which states that any linear mixture of 2
• ICA in N dimensions
independent random variables is more Gaussian than
• ICA properties
the original variables).
Introduction to ICA
ICA in N dimensions
ICA can deal with an arbitrary high number
ICA is a quite powerful technique and is able
of dimensions. ICA components are the matrix that
to separate independent sources linearly mixed in
allows projecting the data in the initial space to one of
several sensors. For instance, when recording
magnetic resonance images (MRI) on the scalp, ICA can separate out artifacts embedded in the data (since they are usually independent of each other). ICA is a technique to separate linearly mixed sources. We used the FastICA algorithm for segmenting the images, as the code is directly available on the World Wide Web.

Whitening the data

Most ICA algorithms perform some preprocessing steps before actually applying ICA. A first step in many ICA algorithms is to whiten (or sphere) the data. This means that we remove any correlations in the data, i.e. the different channels of, say, matrix Q are forced to be uncorrelated. The reason for whitening is that it restores the initial "shape" of the data, so that ICA then only has to rotate the resulting matrix. After whitening, the variance on both axes is equal and the correlation of the projection of the data on both axes is 0 (meaning that the covariance matrix is diagonal and that all the diagonal elements are equal). Applying ICA then only means to "rotate" this representation back to the original axis space. The whitening process is simply a linear change of coordinates of the mixed data. Once the ICA solution is found in this "whitened" coordinate frame, we can easily reproject the ICA solution back into the original coordinate frame.

Putting it in mathematical terms, we seek a linear transformation V of the data D such that when P = V*D we have Cov(P) = I (I being the identity matrix, with zeros everywhere and 1s on the diagonal; Cov being the covariance). It thus means that all the rows of the transformed matrix are uncorrelated.

... the axis found by ICA. The weight matrix is the full transformation from the original space.

When we write
S = W X,
X is the data in the original space, S is the source activity and W is the weight matrix that goes from the X space to the S space.

The rows of W are the vectors with which we can compute the activity of one independent component. After transforming from the X space to the S space, we need to reproject each component back to the X space. W^{-1} is the inverse matrix that goes from the source space S to the data space X:
X = W^{-1} S

If we take one row of S and multiply it by the corresponding column of the inverse matrix above, we obtain the projected activity of one component. All the components together form a matrix S, whose rows are the time courses of the component activities.

ICA properties
• ICA can only separate linearly mixed sources.
• Since ICA deals with clouds of points, changing the order in which the points are plotted has virtually no effect on the outcome of the algorithm.
• Changing the channel order also has no effect on the outcome of the algorithm.
• Since ICA separates sources by maximizing their non-Gaussianity, perfectly Gaussian sources cannot be separated.
• Even when the sources are not independent, ICA finds a space where they are maximally independent.
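The whitening condition Cov(P) = I described above can be illustrated with a minimal sketch for two channels. It is not the FastICA code used by the authors: the function names `covariance` and `whiten_2ch` are hypothetical, and V is taken as the inverse of the Cholesky factor of Cov(D), which is only one of several valid whitening matrices (any of them differs from the others by a rotation).

```python
import math

def covariance(rows):
    """Return (sample covariance matrix, mean-centered channels)."""
    n = len(rows[0])
    means = [sum(r) / n for r in rows]
    centered = [[x - m for x in r] for r, m in zip(rows, means)]
    cov = [[sum(a * b for a, b in zip(ri, rj)) / (n - 1) for rj in centered]
           for ri in centered]
    return cov, centered

def whiten_2ch(data):
    """Whiten two-channel data D so that Cov(V*D) = I.

    Assumption: V = L^{-1}, where L is the lower-triangular Cholesky
    factor of Cov(D). The channels must not be perfectly correlated.
    """
    cov, centered = covariance(data)
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    # Cholesky factor L = [[l11, 0], [l21, l22]] with L * L^T = Cov(D)
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    # Inverse of the lower-triangular factor
    v = [[1.0 / l11, 0.0],
         [-l21 / (l11 * l22), 1.0 / l22]]
    # P = V * D, applied sample by sample
    p = [[v[i][0] * x0 + v[i][1] * x1 for x0, x1 in zip(*centered)]
         for i in range(2)]
    return v, p

# Two correlated channels; after whitening the covariance is close to I.
data = [[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]]
V, P = whiten_2ch(data)
cov_p, _ = covariance(P)
print(cov_p)
```

After this step the remaining ICA work is a pure rotation of P, which is why whitening is usually done first.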
III. RESULT
IV. CONCLUSION AND FUTURE PLAN
REFERENCES
MR Brain Tumor Image Segmentation Using Clustering
Algorithm
Lincy Annet Abraham1, D.Jude Hemanth2
PG Student of Applied Electronics1, Lecturer2
Department of Electronics & Communication Engineering
Karunya University, Coimbatore.
lincyannet@gmail.com,jude_hemanth@rediffmail.com
Abstract- In this study, unsupervised clustering methods are examined to develop a medical diagnostic system, and fuzzy clustering is used to assign patients to the different clusters of brain tumor. We present a novel algorithm for obtaining fuzzy segmentations of images that are subject to multiplicative intensity inhomogeneities, such as magnetic resonance images. The algorithm is formulated by modifying the objective function in the fuzzy algorithm to include a multiplier field, which allows the centroids of each class to vary across the image. Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give a visual representation of the original data. The results are compared with the results of clustering according to classification performance. This application shows that fuzzy clustering methods can be an important supportive tool for medical experts in diagnosis.

I. INTRODUCTION
With the rapid development of medical devices, traditional manual data analysis has become inefficient and computer-based analysis is indispensable. Statistical methods, fuzzy logic, neural networks and machine learning algorithms are being tested on many medical prediction problems to provide a decision support system.
Image segmentation plays an important role in a variety of applications such as robot vision, object recognition, and medical imaging. There has been considerable interest recently in the use of fuzzy segmentation methods, which retain more information from the original image than hard segmentation methods. The fuzzy c means algorithm (FCM), in particular, can be used to obtain segmentation via fuzzy pixel classification. Unlike hard classification methods, which force pixels to belong exclusively to one class, FCM allows pixels to belong to multiple classes with varying degrees of membership. The approach allows additional flexibility in many applications and has recently been used in the processing of magnetic resonance (MR) images.
In this work, unsupervised clustering methods are used to cluster the patients' brain tumors. Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give a visual representation of the original data. In this study, the fuzzy c means algorithm is used to separate the tumor from the brain so that it can be identified in a particular color. Supervised and unsupervised segmentation techniques provide broadly similar results.

II. PROPOSED METHODOLOGY
Figure 1 shows the proposed methodology of segmentation of images. Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give a visual representation of the original data with three approaches: the literal and approximate fuzzy c means unsupervised clustering algorithms and a supervised computational neural network, a dynamic multilayered perceptron trained with the cascade correlation learning algorithm. Supervised and unsupervised segmentation techniques provide broadly similar results. The unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies.
In computer vision, segmentation refers to the process of partitioning a digital image into multiple regions (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to
analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images.
The result of image segmentation is a set of regions that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as color, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). Some of the practical applications of image segmentation are:

• Medical Imaging
   o Locate tumors and other pathologies
   o Measure tissue volumes
   o Computer-guided surgery
   o Diagnosis
   o Treatment planning
   o Study of anatomical structure
• Locate objects in satellite images (roads, forests, etc.)
• Face recognition
• Fingerprint recognition
• Automatic traffic controlling systems
• Machine vision

III. FUZZY C - MEANS CLUSTERING
Fuzzy C-means Clustering (FCM), also known as Fuzzy ISODATA, is a clustering technique that differs from hard k-means, which employs hard partitioning. FCM employs fuzzy partitioning such that a data point can belong to all groups with different membership grades between 0 and 1.
FCM is an iterative algorithm. The aim of FCM is to find cluster centers (centroids) that minimize a dissimilarity function.
To accommodate the introduction of fuzzy partitioning, the membership matrix (U) is randomly initialized according to Equation (1):

$$\sum_{i=1}^{c} u_{ij} = 1, \quad \forall j = 1, \ldots, n \qquad (1)$$

The dissimilarity function used in FCM is given in Equation (2):

$$J(U, c_1, c_2, \ldots, c_c) = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} \qquad (2)$$

u_{ij} is between 0 and 1;
c_i is the centroid of cluster i;
d_{ij} is the Euclidean distance between the ith centroid (c_i) and the jth data point;
m ∈ [1, ∞) is a weighting exponent.

To reach a minimum of the dissimilarity function there are two conditions. These are given in Equation (3) and Equation (4).

$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \qquad (3)$$

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{d_{ij}}{d_{kj}} \right)^{2/(m-1)}} \qquad (4)$$

3.1 ALGORITHM
The algorithm consists of the following steps.
Step 1. Randomly initialize the membership matrix (U) subject to the constraint in Equation (1).
Step 2. Calculate the centroids (c_i) using Equation (3).
Step 3. Compute the dissimilarity between centroids and data points using Equation (2). Stop if its improvement over the previous iteration is below a threshold.
Step 4. Compute a new U using Equation (4). Go to Step 2.

FCM is not guaranteed to converge to an optimal solution, because the cluster centers (centroids) are initialized from a randomly initialized U (Equation (3)).

3.2 FLOW CHART
Figure 2 shows the systematic procedure of the algorithm, which is summarized as follows:
1) Read the input image
2) Set the number of clusters = 4
3) Calculate the Euclidean distance
4) Randomly initialize the membership matrix
5) Calculate the centroids
6) Calculate the membership coefficient
7) If the improvement is above the threshold of 0.01, update the membership matrix and repeat
8) If the improvement is at or below 0.01, display the segmented image
9) The image is converted into colour
10) The segmented tumor is displayed in a particular colour and the rest in another colour

IV. IMPLEMENTATION
The set of MR images consists of 256*256 12-bit images. The fuzzy segmentation was done in MATLAB. There are four types of brain tumor used in this study, namely astrocytoma, meningioma, glioma, and metastase.
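Steps 1 to 4 of Section 3.1 can be sketched as follows. This is a minimal illustration in Python rather than the authors' MATLAB implementation, and it clusters 1-D intensity values instead of full images. The function name `fcm`, its parameters, and the small guard against zero distances are assumptions made for the example; the 0.01 threshold is the value quoted in the flow chart.

```python
import random

def fcm(points, c, m=2.0, tol=0.01, max_iter=100, seed=0):
    """Fuzzy c-means on 1-D data, following Equations (1)-(4)."""
    rng = random.Random(seed)
    n = len(points)
    # Step 1: random membership matrix U (c x n); each column sums to 1, Eq. (1)
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= col
    prev_cost = float("inf")
    centroids = []
    for _ in range(max_iter):
        # Step 2: centroids as membership-weighted means, Eq. (3)
        centroids = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            centroids.append(sum(wj * x for wj, x in zip(w, points)) / sum(w))
        # Step 3: distances and dissimilarity J, Eq. (2)
        # (1e-12 replaces an exact zero distance to avoid division by zero)
        d = [[abs(ci - x) or 1e-12 for x in points] for ci in centroids]
        cost = sum(u[i][j] ** m * d[i][j] ** 2
                   for i in range(c) for j in range(n))
        if abs(prev_cost - cost) < tol:
            break  # improvement below threshold: stop
        prev_cost = cost
        # Step 4: new memberships, Eq. (4)
        p = 2.0 / (m - 1.0)
        for j in range(n):
            for i in range(c):
                u[i][j] = 1.0 / sum((d[i][j] / d[k][j]) ** p for k in range(c))
    return centroids, u

# Two well-separated intensity groups, e.g. dark tissue and a bright region.
centroids, memberships = fcm([10, 11, 12, 200, 205, 210], c=2)
print(sorted(centroids))
```

Because Step 1 is random, different seeds can lead to different local minima, which matches the remark that FCM is not guaranteed to reach an optimal solution.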
Figure 2. Algorithm for identification (flowchart; legible boxes: Start; Read the input MR images; If threshold <= 0.01; Stop)

Table 1. Types and number of data

DATA TYPE      NUMBER OF IMAGES
Astrocytoma    15
Meningioma     25
Glioma         20
Metastase      10
TOTAL          70

Figure 3. Input image
Table 2. Segmentation results
ACKNOWLEDGMENT
MRI Image Classification Using Orientation Pyramid and Multi
resolution Method
R. Catharine Joy, Anita Jones Mary
PG student of Applied Electronics, Lecturer
Department of Electronics and Communication Engineering
Karunya University, Coimbatore.
catherinejoy85@gmail.com, tajmp8576@yahoo.com
Abstract--In this paper, a multi-resolution volumetric texture segmentation algorithm is used. Textural measurements are extracted from 3-D data by sub-band filtering with an Orientation Pyramid method. Segmentation is used to detect objects by dividing the image into regions based on colour, motion, texture etc. Texture relates to the surface or structure of an object, depends on the relation of contiguous elements, and may be characterised by granularity or roughness, principal orientation and periodicity. We describe the 2-D and 3-D frequency domain texture feature representation by illustrating and quantitatively comparing results on example 2-D images and 3-D MRI. First, the algorithm is tested with 3-D artificial data, and natural textures of human knees are used to describe the frequency and orientation multi-resolution sub-band filtering. Next, three magnetic resonance imaging sets of human knees are used to discriminate anatomical structures that can serve as a starting point for other measurements such as cartilage extraction.

Index Terms- Volumetric texture, Texture Classification, sub-band filtering, Multi-resolution.

I.INTRODUCTION

... like MRI head-neck studies has been addressed by supervised statistical classification methods, notably EM-MRF. The segmented portions cannot be seen clearly in a 2-D slice image, so we use 3-D rendering. The cartilage image also cannot be segmented and viewed clearly. A tessellation or tiling of a plane is a collection of plane figures that fills the plane with no overlaps and no gaps.

In this paper we fully describe a 3-D texture description scheme using multi-resolution sub-band filtering and develop a strategy for selecting the most discriminant texture features conditioned on a set of training images. We propose a sub-band filtering scheme for volumetric textures that provides a series of measurements which capture the different textural characteristics of the data. The filtering is performed in the frequency domain with filters that are easy to generate and give powerful results. A multi-resolution classification scheme is then developed which operates on the joint data-feature space within an oct-tree structure. This benefits the efficiency of the computation and ensures that only the certain labelling at a given resolution is propagated to the next. Interfaces between regions (planes), where the label decisions are uncertain, are smoothed by the use of 3-D "butterfly" filters which focus the inter-class labels.
III.VOLUMETRIC TEXTURE
Fig: 3 Sub-band filter images of the second orientation pyramid containing 13 sub-band regions of the human knee MRI.

The filter bank serves to isolate different frequency components in a signal. This is useful because for most applications some frequencies are more important than others. For example, these important frequencies can be coded with a fine resolution. Small differences at these frequencies are significant and a coding scheme that preserves these differences must be used. On the other hand, less important frequencies do not have to be exact. A coarser coding scheme can be used, even though some of the finer details will be lost in the coding.

B.PYRAMIDS

Pyramids are an example of a multi-resolution representation of the image. Pyramids separate information into frequency bands. In the case of images, we can represent high frequency information (textures, etc.) in a finely sampled grid. Coarse information can be represented in a coarser grid (a lower sampling rate is acceptable). Thus, coarse features can be detected in the coarse grid using a small template size. This is often referred to as a multi-resolution or multi-scale resolution.

V.MULTIRESOLUTION CLASSIFICATION

A multi-resolution classification strategy can exploit the inherent multi-scale nature of texture, and better results can be achieved. The multi-resolution procedure consists of three main stages: climb, decide and descend. The climbing stage represents the decrease in resolution of the data by means of averaging a set of neighbours on one level (children elements or nodes) up to a parent element on the upper level. Two common climbing methods are the Gaussian Pyramid and the Quad tree. The decrease in resolution correspondingly reduces the uncertainty in the elements' values since they tend toward their mean. In contrast, the positional uncertainty increases at each level. At the highest level, the new reduced space can be classified either in a supervised or unsupervised scheme.

Fig: 4 K-means classification of MR image of a human knee based on frequency and orientation regions.

Once the phase congruency map of an image has been constructed we know the feature structure of the image. However, thresholding is coarse, highly subjective, and in the end eliminates much of the important information in the image. Some other method of compressing the feature information needs to be considered, and some way of extracting the non-feature information, or the smooth map of the image, needs to be developed. In the absence of noise, the feature map and the smooth map should comprise the whole image. When noise is present, there will be a third component to any image signal, and one that is independent of the other two.

VI.EXPERIMENTAL RESULTS

The 3-D MRI sets of human knees were acquired with different protocols, one set with Spin Echo and two sets with SPGR. In the three cases each slice had dimensions of 512 x 512 pixels, with 87, 64, and 60 slices respectively. The bone, background, muscle and tissue classes were labelled to provide for evaluation. Four training regions of size 32 x 32 x 32 elements were manually selected for the classes of background, muscle, bone and tissue. These training regions were small relative to the size of the data set, and they remained as part of the test data. Each training sample was filtered with the OP sub-band filtering scheme.

Fig: 5 One slice from a knee MRI data set is filtered with a sub-band filter with a particular frequency.
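The climbing stage described in Section V (averaging children up to a parent element) can be sketched for a 2-D grid as below. This is a quad-tree style illustration under simplifying assumptions: the paper works on 3-D data with an oct-tree, where eight children would be averaged instead of four, and the function names `climb` and `build_pyramid` are hypothetical.

```python
def climb(level):
    """One climbing step: average each non-overlapping 2x2 block of children
    into one parent element (a trailing odd row or column is dropped)."""
    rows, cols = len(level), len(level[0])
    return [[(level[r][c] + level[r][c + 1]
              + level[r + 1][c] + level[r + 1][c + 1]) / 4.0
             for c in range(0, cols - 1, 2)]
            for r in range(0, rows - 1, 2)]

def build_pyramid(image):
    """Return all levels, from full resolution up to the coarsest grid."""
    levels = [image]
    while len(levels[-1]) > 1 and len(levels[-1][0]) > 1:
        levels.append(climb(levels[-1]))
    return levels

image = [[1, 3, 5, 7],
         [1, 3, 5, 7],
         [2, 2, 6, 6],
         [2, 2, 6, 6]]
for level in build_pyramid(image):
    print(level)
```

Each level has fewer elements whose values lie closer to the local mean, which is the trade-off noted above: less uncertainty in value, more uncertainty in position.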
The SPGR (Spoiled Gradient Recalled) MRI data sets were classified and the bone was segmented with the objective of using this as an initial condition for extracting the cartilage of the knee. The cartilage adheres to the condyles of the bones and appears as a bright, curvilinear structure in SPGR MRI data. In order to segment the cartilage out of the MRI sets, two heuristics were used: cartilage appears bright in the SPGR MRIs, and cartilage resides in the region between bones. This is translated into two corresponding rules: threshold voxels above a certain gray level, and discard those not close to the region of contact between bones.

VII.CONCLUSION

REFERENCES

[1] C. C. Reyes-Aldasoro and A. Bhalerao, "Volumetric texture description and discriminant feature selection for MRI," in Proc. Information Processing in Medical Imaging, C. Taylor and A. Noble, Eds., Ambleside, U.K., Jul. 2003.
[2] W. M. Wells, W. E. L. Grimson, R. Kikinis, and F. A. Jolesz, "Adaptive Segmentation of MRI Data," IEEE Trans. Med. Imag., vol. 15, no. 4, Aug. 1996.
[3] C. Reyes-Aldasoro and A. Bhalerao, "The Bhattacharyya space for feature selection and its application to texture segmentation," Pattern Recognit., vol. 39, no. 5, pp. 812-826, 2006.
[4] G. B. Coleman and H. C. Andrews, "Image Segmentation by Clustering," Proc. IEEE, vol. 67, no. 5, pp. 773-785, May 1979.
[5] P. J. Burt and E. H. Adelson, "The Laplacian Pyramid as a compact Image Code," IEEE Trans. Commun., vol. COM-31, no. 4, pp. 532-540, Apr. 1983.
[6] V. Gaede and O. Günther, "Multidimensional access methods," ACM Computing Surveys, vol. 30, no. 2, pp. 170-231, 1998.
Dimensionality reduction for Retrieving Medical Images
Using PCA and GPCA
W Soumya, ME, Applied Electronics, Karunya University, Coimbatore

Abstract- Retrieving images from large and varied collections using image content is a challenging and important problem in medical applications. In this paper, to improve the generalization ability and efficiency of the classification, a feature selection method called principal component analysis is presented to select the most discriminative features from the extracted regional features. A new feature space reduction method, called Generalized Principal Component Analysis (GPCA), is also presented, which works directly with images in their native state, as two-dimensional matrices. In principle, redundant information is removed and relevant information is encoded into feature vectors for efficient medical image retrieval under limited storage. Experiments on databases of medical images show that, for the same amount of storage, GPCA is superior to PCA in terms of memory requirement, quality of the compressed images, and computational cost.

... selection or feature extraction. Some of the feature space reduction methods include Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), and Canonical Correlation Analysis (CCA). Among these, PCA finds principal components, ICA finds independent components [11], CCA maximizes correlation [5], and LDA maximizes the interclass variance [10]. PCA is the best-known statistical approach for mapping the original high-dimensional features into low-dimensional ones by eliminating the redundant information from the original feature space [1]. The advantage of the PCA transformation is that it is linear and that any linear correlations present in the data are automatically detected. Generalized Principal Component Analysis (GPCA), a novel feature space reduction technique that is superior to PCA, is also presented [2].
Fig 2: Schematic view of the key difference between GPCA and PCA. GPCA works on the original matrix representation of images directly, while PCA applies matrix-to-vector alignment first and works on the vectorized representation of images, which may lead to loss of spatial locality information.

Formulation of GPCA: Let A_k, for k = 1, ..., n, be the n images in the dataset, and calculate the mean using equation (5) given below:

$$M = \frac{1}{n} \sum_{k=1}^{n} A_k \qquad (5)$$

Let the images be centered,

$$A_j \leftarrow A_j - M \quad \text{for all } j \qquad (6)$$

GPCA aims to compute two matrices L and R with orthonormal columns, such that the variance var(L, R) is maximum, using equations (7) and (8). The main observation, which leads to an iterative algorithm for GPCA, is stated in the following theorem:

$$M_L = \sum_{j=1}^{n} A_j R_i R_i^{T} A_j^{T} \qquad (7)$$

$$M_R = \sum_{j=1}^{n} A_j^{T} L_i L_i^{T} A_j \qquad (8)$$

Theorem: Let L, R be the matrices maximizing the variance var(L, R). Then,
• For a given R, matrix L consists of the l1 eigenvectors of the matrix M_L corresponding to the largest l1 eigenvalues.

Step 5: Compute the d eigenvectors (R_i) of M_R corresponding to the largest d eigenvalues.

Step 6: Form the matrix M_R to obtain l2 eigenvectors using equation (8).

Step 7: Compute the d eigenvectors (L_i) of M_L corresponding to the largest d eigenvalues.

Step 8: Obtain the reduced representation using the equation

$$D_j = L^{T} A_j R \qquad (9)$$

EXPERIMENT RESULTS

In this experiment, we applied PCA and GPCA to the 40 images of size 124x124 in the medical image dataset, which contains brain, chest, breast and elbow images, as shown in figure 3. Both PCA and GPCA can be applied for medical image retrieval. The experimental comparison of PCA and GPCA is based on the assumption that they both use the same amount of storage. Hence it is important to understand how to choose the reduced dimension for PCA and GPCA for a specific storage requirement. We use p = 9 (where p corresponds to the principal components) in PCA (as shown in TABLE I) and set d = 4 (where d corresponds to the largest two eigenvalues) for GPCA (as shown in TABLE II) correspondingly.
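Equations (5), (6) and (9) of the GPCA formulation can be illustrated with a short sketch that centers a set of image matrices and projects one of them with given L and R. It deliberately skips the eigenvector computations of Steps 5 to 7, so L and R are supplied by hand here rather than learned. All function names are hypothetical, and the tiny 2x2 "images" are made-up values for illustration only.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def mean_image(images):
    """Equation (5): element-wise mean M of the n images."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(cols)]
            for r in range(rows)]

def center(images):
    """Equation (6): subtract the mean image from every image."""
    m = mean_image(images)
    return [[[img[r][c] - m[r][c] for c in range(len(m[0]))]
             for r in range(len(m))] for img in images]

def reduce_image(a, l, r):
    """Equation (9): reduced representation D = L^T A R."""
    return matmul(matmul(transpose(l), a), r)

images = [[[1, 2], [3, 4]], [[3, 2], [1, 0]]]
centered = center(images)
L = [[1, 0], [0, 1]]   # assumed left transform (identity, for illustration)
R = [[1], [0]]         # assumed right transform keeping the first column
print(reduce_image(centered[0], L, R))
```

In the full method L and R would come from the leading eigenvectors of M_L and M_R, and the reduced matrix D_j is what gets stored and compared at retrieval time.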
TABLE I
Features obtained for PCA

Image  Eigenvector   Brain     Chest     Breast    Elbow
1      V1            0.5774    0.5774    0.5774    0.5774
                     0.5774    0.5774    0.5774    0.5774
                     0.5774    0.5774    0.5774    0.5774
       V2            0.4082   -0.0775    0.7071   -0.8128
                     0.4082   -0.6652   -0.7071    0.4735
                    -0.8165    0.7427    0         0.3393
       V3            0.7071   -0.8128    0.4082   -0.0775
                    -0.7071    0.4735    0.4082   -0.6652
                     0         0.3393   -0.8165    0.7427
2      V1            0.5774    0.5774    0.5774    0.5774
                     0.5774    0.5774    0.5774    0.5774
                     0.5774    0.5774    0.5774    0.5774
       V2            0.7071   -0.7946   -0.7887   -0.7573
                    -0.7071    0.5599    0.5774    0.6430
                     0         0.2348    0.2113    0.1144
       V3            0.4082   -0.1877   -0.2113   -0.3052
                     0.4082   -0.5943   -0.5774   -0.5033
                    -0.8165    0.7820    0.7887    0.8084

GPCA computes the optimal feature matrices L and R such that the original matrices are transformed to reduced 2 x 2 matrices, whereas in PCA the feature vectors are obtained as a 3x3 matrix; these are listed in Tables I and II.

TABLE II
Features obtained for GPCA

Images  Matrix
Brain   Db1    -3.3762    1.1651
               -0.2207   -0.6612
        Db2     4.6552    2.6163
               -0.4667    0.7519
        Db3     4.6552   -2.7397
                2.6163   -1.6044
        Db4    -1.7318    0.1744
                0.7202   -0.4391
        Db5    -1.6252    0.0462
               -0.0010    0.1173

Therefore, GPCA has asymptotically minimum memory requirements and lower time complexity than PCA, which is desirable for large medical image databases. GPCA also uses transformation matrices that are much smaller than those of PCA. This significantly reduces the space needed to store the transformation matrices and reduces the computational time in computing the reduced representation for a query image. Experiments show superior performance of GPCA over PCA, in terms of quality of compressed images and query precision, when using the same amount of storage.
The feature vectors obtained through the feature selection methods are fed to a Hopfield neural classifier for efficient medical image retrieval. A Hopfield neural network is a type of recurrent neural network in which a physical path exists from the output of a neuron to the input of all neurons except for the corresponding input neuron. If PCA features are fed to the Hopfield network, then 9 neurons are used in the input layer, since the size of the PCA feature vector is 1×9; if GPCA features are used as classifier input, then 4 neurons are used in the input layer, since the GPCA feature vector is of size 1×4. Energy is calculated using equation (10).

$$E = -0.5 \, S \, W \, S^{T} \qquad (10)$$

where E is the energy of a particular pattern (S) and
W is the weight value.

The test pattern energy is compared with the stored pattern energy, and the images having energy close to the test pattern energy are retrieved from the database.

CONCLUSION

To overcome problems associated with high dimensionality, such as high storage and retrieval times, a dimension reduction step is usually applied to the vectors to concentrate relevant information in a small number of dimensions. In this paper, two subspace analysis methods, Principal Component Analysis (PCA) and Generalized Principal Component Analysis (GPCA), are presented and compared. PCA is a simple, well-known dimensionality reduction technique that applies matrix-vector alignment first and works on the vectorized representation of images, which may lead to loss of spatial locality information, while GPCA works on the original matrix representation of images directly. GPCA is found superior to PCA: with GPCA the dimensionality is reduced to a 2x2 matrix, whereas in PCA the eigenvectors are obtained as a 3x3 matrix. GPCA works directly with images in their native state, as two-dimensional matrices, by projecting the images to a vector space that is the tensor product of two lower-dimensional vector spaces.

REFERENCES

[1] U. Sinha, H. Kangarloo, Principal component analysis for content-based image retrieval, RadioGraphics 22 (5) (2002) 1271-1289.
[2] J. Ye, R. Janardan, and Q. Li. GPCA: An efficient dimension reduction scheme for image compression and retrieval. In KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 354-363, New York, NY, USA, 2004. ACM Press.
[3] Henning Muller, Nicolas Michoux, David Bandon, Antoine Geissbuhler, "A review of content-based image retrieval systems in medical applications - clinical benefits and future directions", International Journal of Medical Informatics, vol. 73, pp. 1-23, 2004.
[4] Imola K. Fodor, A survey of dimension reduction techniques, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory.
[5] Marco Loog, Bram van Ginneken, and Robert P.W. Duin, "Dimensionality Reduction by Canonical Contextual Correlation Projections," T. Pajdla and J.
[6] P. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, Z. Protopapas, Fast and effective retrieval of medical tumor shapes, IEEE Transactions on Knowledge and Data Engineering 10 (6) (1998) 889-904.
[7] S. Baeg, N. Kehtarnavaz, Classification of breast mass abnormalities using denseness and architectural distortion, Electronic Letters on Computer Vision and Image Analysis 1 (1) (2002) 1-20.
[8] F. Schnorrenberg, C. S. Pattichis, C. N. Schizas, K. Kyriacou, Content-based retrieval of breast cancer biopsy slides, Technology and Health Care 8 (2000) 291-297.
[9] Xipeng Qiu, Lide Wu, Two-dimensional nearest neighbor discriminant analysis, Elsevier B.V., 2007. doi:10.1016/j.neucom.2007.02.001
[10] B. Bai, P. Kantor, N. Cornea, and D. Silver. Toward content-based indexing and retrieval of functional brain images. In Proceedings of the (RIAO07), 2007.
[11] D. Comaniciu, P. Meer, D. Foran, A. Medl, Bimodal system for interactive indexing and retrieval of pathology images, in: Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision (WACV'98), Princeton, NJ, USA, 1998, pp. 76-81.
[12] M. R. Ogiela, R. Tadeusiewicz, Semantic-oriented syntactic algorithms for content recognition and understanding of images in medical databases, in: Proceedings of the second International Conference on Multimedia and Exposition (ICME'2001), IEEE Computer Society, Tokyo, Japan, 2001, pp. 621-624.
[13] www.e-radiography.net/ibase5/index.htm-xray2000 Image base v6 July 2007.
Efficient Whirlpool Hash Function
D.S.Shylu J.Piriyadharshini
Sr.Lecturer, ECE Dept, II ME(Applied Electronics)
Karunya University, Karunya University
Coimbatore- 641114. Coimbatore- 641114.
mail id:mail2shylu@yahoo.com mail id: riya_harshini@rediffmail.com
Contact No: 9443496082 Contact No: 9842107110
Abstract- Recent breakthroughs in cryptanalysis of standard hash functions like SHA-1 and MD5 raise the need for alternatives. The latest cryptographic applications demand both high speed and high security. In this paper, an architecture and VLSI implementation of the newest powerful standard in the hash families, Whirlpool, is presented. It reduces the required hardware resources and achieves high-speed performance. The architecture permits a wide variety of implementation tradeoffs. The implementation is examined and compared in terms of security level and hardware performance. This is the first Whirlpool implementation allowing fast execution and effective substitution of any previous hash families' implementations such as MD5, RIPEMD-160, SHA-1, SHA-2 etc., in any cryptography application.

I. INTRODUCTION

A hash function is a function that maps an input of arbitrary length into a fixed number of output bits, the hash value. Hash functions are used as building blocks in various cryptographic applications. The most important uses are in the protection of information authentication and as a tool for digital signature schemes.
In recent years the demand for effective and secure communications in both wired and wireless networks has been especially noted in the consumer electronics area. In modern consumer electronics, security applications play a very important role. The interest in financial and other electronic transactions has grown, so security applications can provide an important way for consumers and businesses to decide which electronic communications they can trust.
The best-known hash function is the Secure Hash Algorithm-1 (SHA-1). The security parameter of SHA-1 was chosen in such a way as to guarantee a similar level of security, in the range of 2^80 operations, as required by the best currently known attacks. But the security level of SHA-1 does not match the security guaranteed by the newly announced AES encryption standard, which specifies 128-, 192-, and 256-bit keys. Many attempts have been made to put forward new hash functions and match the security level of the new encryption standard. The National Institute of Standards and Technology (NIST) announced the updated Federal Information Processing Standard (FIPS 180-2), which introduced three new hash functions referred to as SHA-2 (256, 384, 512). In addition, the New European Schemes for Signatures, Integrity, and Encryption (NESSIE) project was responsible for introducing a hash function with a similar security level. In February 2003, it was announced that the hash function included in the NESSIE portfolio is Whirlpool. All the above-mentioned hash functions are adopted by the International Organization for Standardization (ISO/IEC) 10118-3 standard.
The Whirlpool hash function is byte-oriented and consists of the iterative application of a compression function. This is based on an underlying dedicated 512-bit block cipher that uses a 512-bit key and runs in 10 rounds in order to produce a hash value of 512 bits.
In this paper, an architecture and VLSI implementation of the new hash function, Whirlpool, is proposed. It reduces the required hardware resources and achieves high-speed performance. The proposed implementation is examined and compared in terms of the offered security level and hardware performance. In addition, because no other Whirlpool implementations exist, comparisons with other hash families' implementations are provided. The comparison results show that the proposed implementation performs better and constitutes an effective substitution for any previous hash families such as MD5, RIPEMD-160, SHA-1, SHA-2 etc., in almost all cases.

II. WHIRLPOOL HASH FUNCTION

Whirlpool is a one-way, collision resistant 512-bit hash function operating on messages less than 2^256 bits in length. It consists of the iterated application of a compression function, based on an underlying dedicated 512-bit block cipher that uses a 512-bit key. Whirlpool is based on a dedicated block cipher, W, which operates on a 512-bit hash state using a chained key state, both derived from the input data. The round function and the key schedule of W are designed according to the Wide Trail strategy. In the following, the round function of the block cipher, W, is defined, and then the complete hash function is specified. The block diagram of the W block cipher basic round is shown in Fig. 1. The round is built from three algebraic functions. These functions are the non-linear layer γ, the
cyclical permutation π, and the linear diffusion layer θ. So, the round function is the composite mapping ρ[k], parameterized by the key matrix k, and given by:

$$\rho[k] \equiv \sigma[k] \circ \theta \circ \pi \circ \gamma \qquad (1)$$

... as necessary to obtain a bit string whose length is an odd multiple of 256, and finally with the 256-bit right-justified binary representation of L, resulting in the padded message m, partitioned in t blocks m_1, m_2, ..., m_t.

$$\eta_i = \mu(m_i),$$
$$H_i = W[H_{i-1}](\eta_i) \oplus H_{i-1} \oplus \eta_i, \quad 1 \le i \le t \qquad (7)$$
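The padding rule quoted above can be sketched as follows for byte-aligned messages. The surviving text begins mid-sentence, so the first part of the rule used here (append a single 1-bit, then as few 0-bits as necessary) is an assumption taken from the public Whirlpool specification rather than from this paper. The function name `whirlpool_pad` is hypothetical, and the sketch covers padding only, not the hash itself.

```python
def whirlpool_pad(message: bytes) -> bytes:
    """Pad a byte string so that it splits into t blocks of 512 bits.

    Assumed rule: one 1-bit, then 0-bits until the length is an odd
    multiple of 256 bits, then the 256-bit big-endian bit length L.
    """
    bit_len = len(message) * 8
    padded = message + b"\x80"      # a 1-bit followed by seven 0-bits
    # An odd multiple of 256 bits is 32 bytes modulo 64 bytes.
    while len(padded) % 64 != 32:
        padded += b"\x00"
    padded += bit_len.to_bytes(32, "big")   # 256-bit right-justified L
    return padded

def split_blocks(padded: bytes):
    """Partition the padded message m into 64-byte blocks m_1 .. m_t."""
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]

blocks = split_blocks(whirlpool_pad(b"abc"))
print(len(blocks), len(blocks[0]))
```

Each block m_i would then be fed through the compression step of equation (7) in order.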
• Similarly, the direct input of ei is XORed with the bits held in r.
• Finally, the result of e XOR r is fed to the final mini box eo, and the result of ei XOR r is fed to eoi.
of the diffusion layer are given below (equation (8)). Bytes b_i0, b_i1, … , b_i7 represent the eight bytes of row i of the hash state at the output of the layer θ. Table X implements the multiplication by the polynomial g(x) = x modulo (x^8 + x^4 + x^3 + x^2 + 1) in GF(2^8) (i.e. X[u] ≡ x·u, where u denotes the input of the table). X^2 and X^3 denote two and three successive applications of X.

$$
\begin{aligned}
b_{i0} &= a_{i0} \oplus a_{i1} \oplus a_{i3} \oplus a_{i5} \oplus a_{i7} \oplus X[a_{i2}] \oplus X^2[a_{i3} \oplus a_{i6}] \oplus X^3[a_{i1} \oplus a_{i4}] \\
b_{i1} &= a_{i0} \oplus a_{i1} \oplus a_{i2} \oplus a_{i4} \oplus a_{i6} \oplus X[a_{i3}] \oplus X^2[a_{i4} \oplus a_{i7}] \oplus X^3[a_{i2} \oplus a_{i5}] \\
b_{i2} &= a_{i1} \oplus a_{i2} \oplus a_{i3} \oplus a_{i5} \oplus a_{i7} \oplus X[a_{i4}] \oplus X^2[a_{i5} \oplus a_{i0}] \oplus X^3[a_{i3} \oplus a_{i6}] \\
b_{i3} &= a_{i0} \oplus a_{i2} \oplus a_{i3} \oplus a_{i4} \oplus a_{i6} \oplus X[a_{i5}] \oplus X^2[a_{i6} \oplus a_{i1}] \oplus X^3[a_{i4} \oplus a_{i7}] \\
b_{i4} &= a_{i1} \oplus a_{i3} \oplus a_{i4} \oplus a_{i5} \oplus a_{i7} \oplus X[a_{i6}] \oplus X^2[a_{i7} \oplus a_{i2}] \oplus X^3[a_{i5} \oplus a_{i0}] \\
b_{i5} &= a_{i0} \oplus a_{i2} \oplus a_{i4} \oplus a_{i5} \oplus a_{i6} \oplus X[a_{i7}] \oplus X^2[a_{i0} \oplus a_{i3}] \oplus X^3[a_{i6} \oplus a_{i1}] \\
b_{i6} &= a_{i1} \oplus a_{i3} \oplus a_{i5} \oplus a_{i6} \oplus a_{i7} \oplus X[a_{i0}] \oplus X^2[a_{i1} \oplus a_{i4}] \oplus X^3[a_{i7} \oplus a_{i2}] \\
b_{i7} &= a_{i0} \oplus a_{i2} \oplus a_{i4} \oplus a_{i6} \oplus a_{i7} \oplus X[a_{i1}] \oplus X^2[a_{i2} \oplus a_{i5}] \oplus X^3[a_{i0} \oplus a_{i3}]
\end{aligned}
\qquad (8)
$$

performance of the Whirlpool implementation. It is possible to insert a negative-edge pipeline register in the round function ρ, as Fig. 3 shows (dashed line, after the permutation π). This register is inserted roughly in the middle of the round function. This is an efficient way to reduce the critical path delay, with a small area penalty (one 512-bit register). So, the clock frequency can be roughly doubled and the time performance increases without any increase in algorithm execution latency. Another way to improve the implementation performance is to use more pipeline stages. It is possible to insert 3 pipeline stages in the implementation of the round function ρ. The first positive-edge pipeline register is inserted in the same position as described in the previous paragraph.
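As a quick check of the diffusion-layer equations (8), the following minimal Python sketch computes one output row from one input row. It is only a software illustration, not the hardware design described in the paper; the function names `xtime` and `diffuse_row` are illustrative assumptions, and X^2 and X^3 are taken to mean two and three successive applications of table X.

```python
# Software sketch of equation (8): one row of the linear diffusion layer.
# Names are illustrative; this is not the paper's hardware implementation.

REDUCTION_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1


def xtime(u):
    """Multiply byte u by x in GF(2^8) modulo the reduction polynomial."""
    u <<= 1
    if u & 0x100:
        u ^= REDUCTION_POLY
    return u & 0xFF


# Table X: X[u] = x * u
X = [xtime(u) for u in range(256)]


def diffuse_row(a):
    """Map input bytes a[0..7] of one row to output bytes b[0..7]."""
    b = []
    for k in range(8):
        # Terms that enter without multiplication
        plain = (a[k] ^ a[(k + 1) % 8] ^ a[(k + 3) % 8]
                 ^ a[(k + 5) % 8] ^ a[(k + 7) % 8])
        # X[.], X^2[.] and X^3[.] terms
        x1 = X[a[(k + 2) % 8]]
        v2 = a[(k + 3) % 8] ^ a[(k + 6) % 8]
        x2 = X[X[v2]]
        v3 = a[(k + 1) % 8] ^ a[(k + 4) % 8]
        x3 = X[X[X[v3]]]
        b.append(plain ^ x1 ^ x2 ^ x3)
    return b


if __name__ == "__main__":
    print(diffuse_row([1, 0, 0, 0, 0, 0, 0, 0]))
```

Feeding a row with a single 1 in position 0 exposes the multiplier that each output byte applies to that input byte, which makes it easy to compare the equations against a reference description of the layer.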
IV. RESULTS
2-D FRACTAL ARRAY DESIGN FOR 4-D ULTRASOUND
IMAGING
Ms. Alice John, Mrs.C.Kezi Selva Vijila
M.E. Applied electronics, HOD-Asst. Professor
Dept. of Electronics and Communication Engineering
Karunya University, Coimbatore
Abstract- One of the most promising techniques for limiting complexity in real-time 3-D ultrasound systems is to use sparse 2-D layouts. For a given number of channels, optimization of performance is desirable to ensure high quality volume images. To find optimal layouts, several approaches have been followed with varying success. The most promising designs proposed are Vernier arrays, but these also suffer from high peaks in the sidelobe region compared with a dense array. In this work, we propose a new method based on the principle of suppression of grating lobes. The proposed method extends the concept of fractal layout. Our design offers simplicity in construction, flexibility in the number of active elements and the possibility of suppressing grating lobes.

Index Terms- 4-D Ultrasound imaging, sparse 2-D array, fractal layout, Sierpinski carpet layout.

1. INTRODUCTION

The new medical image modality, volumetric imaging, can be used for several applications including diagnostics, research and non-invasive surgery. Existing 3-D ultrasound systems are based on mechanically moving 1-D arrays for data collection and preprocessing of data to achieve 3-D images. The main aim is to minimize the number of channels without compromising image quality and to suppress the sidelobes. New generations of ultrasound systems will be able to collect and visualize data in near real time. To develop the full potential of such a system, an ultrasound probe with a 2-D transducer array is needed.
Current systems use linear arrays with more than 100 elements. A 2-D transducer array will contain between 1500 and 10,000 elements. Such arrays represent a technological challenge because of the high channel count [1]. To overcome this challenge, undersampling the 2-D array by connecting only some of the possible elements [2] is a suitable solution. For a given set of constraints, the problem is to choose those elements that give the most appropriate beam pattern or image. The analysis of such sparse array beam patterns has a long history. A short review of some of these works can be found in [3].
Several methods for finding sparse array layouts for 4-D ultrasound imaging have been reported. Random approaches have been suggested by Turnbull et al. [4], [5] and this work has been followed up at Duke University [6]-[7]. Weber et al. have suggested using genetic algorithms. Similar layouts have been found by Holm et al. using linear programming and by Trucco using simulated annealing.
Sparse arrays can be divided into 3 categories: random, fractal and periodic. One of the promising categories is sparse periodic arrays [8]. These are based on the principle of different transmit and receive layouts, where the grating lobes in the transmit array response are suppressed by the receive array response and vice versa. Periodic arrays utilize partial cancellation of transmit and receive grating lobes. Sparse periodic arrays have a few disadvantages; one is the use of overlapping elements, another is the strict geometry which fixes the number of elements. An element in a 2-D array will occupy a small area compared to an element in a 1-D array. The sparse periodic array has high resolution but sidelobes occur frequently.
In sparse random arrays, each element is chosen at random according to a chosen distribution function. Due to randomness, the layouts are very easy to find. Sparse random arrays have low resolution but give maximum suppression of sidelobes. By exploiting the properties of sparse random arrays and sparse periodic arrays, we arrive at fractal arrays. In fractal arrays, we can obtain high resolution with a low sidelobe level by using the advantages of both periodic and random arrays.
To simplify future integration of electronics into the probe, the sparse transmit and receive layouts should be chosen to be non-overlapping. This means that some elements should be dedicated to transmit while others should be used to receive. To increase system performance, future 2-D arrays should possibly include pre-amplifiers directly connected to the receive elements.
The paper is organized in the following manner. Section II describes fractal array design, starting with the Sierpinski fractal, then the carpet fractal and then the pulse-echo response. Section III describes the simulation and performance of different designs obtained by adjusting the kerf value. In Section IV, we summarize the paper.
II. FRACTAL ARRAY LAYOUTS

A fractal is generally a rough or fragmented geometric shape that can be subdivided into parts, each of which is (at least approximately) a reduced-size copy of the whole, a property called self-similarity. The Fractal component model has the following important features:

• Recursivity: components can be nested in composite components.
• Reflectivity: components have full introspection and intercession capabilities.
• Component sharing: a given component instance can be included (or shared) by more than one component.
• Binding components: a single abstraction for component connections that is called bindings. Bindings can embed any communication semantics from synchronous method calls to remote procedure calls.
• Execution model independence: no execution model is imposed. Components can be run within execution models other than the classical thread-based model, such as event-based models and so on.
• Open: extra-functional services associated with a component can be customized through the notion of a control membrane.

A. Sierpinski Fractal

In the Sierpinski fractal we have considered mainly two types:

• Sierpinski triangle
• Sierpinski carpet

B. Sierpinski Triangle

C. Sierpinski Carpet

• Transmitter array: the transmit array is drawn using a matrix M consisting of both ones and zeros. These arrays have been constructed by considering a large array of elements surrounded by a small matrix. In the carpet fractal array we first draw a square at the right middle; this small square occupies 1/3rd of the original big array. Surrounding this square we construct small squares.
• Receiver array: in the sparse 2-D array layout, to avoid overlapping, we select different receiver and transmitter arrays. In our paper we have taken for the receiver array those elements which will never cause an overlap.

D. Pulse-Echo Response

The layout should have optimal pulse-echo performance, i.e. the pulse-echo radiation pattern should have as low a sidelobe level as possible for a specified mainlobe width for all angles and depths of interest. Computing the pulse-echo response for a given transmit and receive layout is time consuming. A simplification commonly used is to evaluate the radiation properties in continuous wave mode in the far field. An optimal set of layouts for continuous waves does not necessarily give optimal pulse-echo responses. To ensure reasonable pulse-echo performance, additional criteria which ensure a uniform distribution of elements could be introduced. This will limit the interference in the sidelobe region between pulses transmitted from different elements and reduce the sidelobe level.
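The carpet construction in Section II-C can be illustrated with a short Python sketch that builds a ones-and-zeros matrix M by repeated subdivision. This is a sketch under stated assumptions rather than the authors' layout code: it assumes the usual Sierpinski carpet rule (the centre third of every square is set to zero), treats ones as candidate transmit elements, and takes the complement as the pool of non-overlapping receive candidates. The names `sierpinski_carpet` and `complement` are illustrative.

```python
# Illustrative Sierpinski carpet layout; not the authors' exact construction.

def sierpinski_carpet(iterations):
    """Return a (3**iterations)-sized square matrix of ones and zeros."""
    layout = [[1]]
    for _ in range(iterations):
        size = len(layout)
        # Top and bottom bands: three copies of the previous layout side by side
        outer = [row * 3 for row in layout]
        # Middle band: previous layout, an empty centre square, previous layout
        middle = [row + [0] * size + row for row in layout]
        layout = [r[:] for r in outer] + middle + [r[:] for r in outer]
    return layout


def complement(layout):
    """Positions not used by `layout`, so the two layouts never overlap."""
    return [[1 - cell for cell in row] for row in layout]


if __name__ == "__main__":
    tx = sierpinski_carpet(3)          # three iterations give a 27 x 27 grid
    rx_candidates = complement(tx)
    print(len(tx), sum(map(sum, tx)), sum(map(sum, rx_candidates)))
```

In practice a receive layout would be a chosen subset of `rx_candidates`; the complement is used here only to make the non-overlap property explicit.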
III. RESULTS AND DISCUSSION

The fractal layout exploits the advantages of both the periodic and random arrays. Our main aim is to suppress the sidelobes and to narrow the mainlobe. First we created the transmit and receive array layouts. Both layouts were constructed in such a way that they do not overlap each other. The transmit array is designed using a matrix M. Up to 3 iterations were used to construct the transmit array. The intensity distributions were computed to examine the spreading of the sidelobes and the mainlobe.
In our paper we have taken into consideration different specifications such as the speed of the sound wave (1540 m/s), the initial frequency, the sampling frequency (100 × 10^6 Hz), and the width and height of the array. The kerf, that is, the spacing between adjacent elements in the array, is also considered.

A. Case I: kerf = 0

D. Case IV: kerf = λ

In the last case the kerf value is taken as λ, and because of this there is a spacing of λ between the elements in the array. Fig. 5(a)-(b) show the transmitter and receiver layouts. Fig. 5(c) shows the pulse-echo response; here the mainlobe is very sharp but the sidelobe level starts spreading towards both sides. Fig. 5(d) shows its intensity distribution, which shows the spreading of the sidelobes clearly. The sidelobe level in this case is high compared to all other cases.
Fig. 2. (a)-(b) show array layout and (c)-(d) show pulse-echo response and intensity distribution for kerf = 0
Fig. 3. (a)-(b) show array layout and (c)-(d) show pulse-echo response and intensity distribution for kerf = λ/2
Fig. 4. (a)-(b) show array layout and (c)-(d) show pulse-echo response and intensity distribution for kerf = λ/4
IV. CONCLUSION
REFERENCES
PC Screen Compression
for Real Time Remote Desktop Access
Shanthini Pandiaraj, Assistant Professor, Department of Electronics & Communication Engineering
Karunya University, Coimbatore and Jagannath.D.J, Final Year Masters degree in Engineering, Karunya
University, Coimbatore.
Shanthini@karunya.edu, jj_jagannath@yahoo.co.in
III. SERVER - CLIENT COMMUNICATION

Suitable software should be implemented on both computers, the one that transmits the desktop image and the one that receives the desktop image (double-sided software). We call them the Server and the Client. The Server is the computer that transmits its desktop so that it can be accessed by the other computer. The Client is the one that receives that image and proceeds to access it.
The software should be implemented in the server and client such that the server compresses its desktop image and transmits the encoded data to the client. We are using a Visual Basic based IP communication technique for this purpose.

Fig. 2. Block diagram of server and client

IV. GEC --- ALGORITHM

GEC segments the server compound image into two classes of pixels: text/graphics blocks and picture blocks. There are normally four types of blocks: smooth background blocks (one color), text blocks (two colors), graphics blocks (four colors), and picture blocks (more than four colors). In fact, the first three types can be grouped into a larger text/graphics class, which greatly simplifies the segmentation. The combined text/graphics class can be coded by a lossless method.
Shape primitives are those elementary building units that compose text/graphics in a compound image, such as dots, lines, curves, triangles, rectangles, and others. Four different types of shape primitives are used in GEC: 1. isolated pixels, 2. horizontal lines, 3. vertical lines, 4. rectangles. A shape primitive has a single interior color. Two shape primitives can have the same shape but different colors. A shape primitive can be represented by a color tag and its position information, i.e., (a, b) for an isolated pixel, (a, b, w) for a horizontal line, (a, b, h) for a vertical line, and (a, b, w, h) for a rectangle. Shape primitives can be used to compactly represent the textual contents. To encode pixels of text and graphics, a simple lossless coding algorithm is designed to utilize the information of the extracted shape primitives. Shape primitives can be efficiently encoded with wavelet-based SPIHT coding. The reason that we use JPEG instead of JPEG-2000 to encode pictorial pixels is; on one hand, as the algorithm
coefficients, which are truncated to finite precision. For
perfectly reversible compression, one must use an integer
multiresolution transform, such as the S+P transform
introduced in, which yields excellent reversible compression
results when used with the new extended EZW techniques.
In GEC the pictorial pixels in picture blocks are
compressed using a simple JPEG coder. In order to reduce
ringing artifacts and to achieve higher compression ratio,
text/graphics pixels in the picture block are removed before
the JPEG coding. These pixels are coded by lossless coding
algorithm. Actually, the values left in place of the removed pixels can be arbitrarily chosen, but it is better if these values are similar to the neighboring pictorial pixels, since this produces a smooth picture block. We fill in these holes with the average color of the pictorial pixels in the block.
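The block typing by color count and the hole-filling step described for GEC can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB or Visual Basic code: the function names are hypothetical, pixels are single grayscale values for brevity, and blocks with three colors are assumed to fall in the graphics class, since the text only names the one-, two-, four- and more-than-four-color cases.

```python
# Hypothetical sketch of two GEC steps; names and thresholds are assumptions.

def classify_block(pixels):
    """Type a block by the number of distinct colors it contains."""
    colors = len(set(pixels))
    if colors == 1:
        return "background"
    if colors == 2:
        return "text"
    if colors <= 4:          # assumption: 3 or 4 colors count as graphics
        return "graphics"
    return "picture"


def fill_holes(pixels, is_text):
    """Replace removed text/graphics pixels with the mean pictorial value."""
    pictorial = [p for p, t in zip(pixels, is_text) if not t]
    if not pictorial:
        return list(pixels)
    average = round(sum(pictorial) / len(pictorial))
    return [average if t else p for p, t in zip(pixels, is_text)]


if __name__ == "__main__":
    print(classify_block([0, 255, 0, 255]))
    print(fill_holes([10, 20, 255, 30], [False, False, True, False]))
```

Filling the holes with the block average keeps the picture block smooth, which is the property the text relies on to limit ringing in the JPEG-coded output.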
We have presented an efficient PC screen compound image-coding scheme with very low complexity and high compression ratio for transmission of computer screen images. Two significant contributions are the segmentation to extract text and graphics, and a wavelet-based lossless SPIHT coding algorithm. The advantages of our image coding scheme are low complexity, high compression ratio, and visually lossless image quality. The algorithm has been implemented in both client and server, with the help of MATLAB and Visual Basic coding. The reconstructed image showed a significant reduction in size, from 2.25 MB for the original compound image to 216 KB for the compressed compound image, as shown in Fig. 7. Our future work is to implement the coding for real-time access of a remote desktop computer.

REFERENCES

[10] L. Bottou, P. Haffner, P. G. Howard, P. Simard, Y. Bengio, and Y. LeCun, “High quality document image compression with DjVu,” J. Electron Imag., vol. 7,
Medical Image Classification using Hopfield Network
and Principal Components
Abstract - The medical domain is one of the principal application domains for image classification. Medical image classification deals with classifying input medical images into the particular class to which they are most similar. This paper deals with classification of a query image into one of four classes, namely brain, chest, breast and elbow, using a Hopfield Neural Network Classifier. The curse of dimensionality problem is solved by using extracted principal components of images as input to the classifier. Finally, the results obtained using the Hopfield neural classifier are compared with those of a Back Propagation neural classifier.

Index Terms - Feature extraction, Hopfield Neural Classifier, Principal components, Query image.

INTRODUCTION

The number of digitally produced medical images is rising

images in a database is measured by some form of distance metrics in feature space. In the image classification task, a similarity measure technique is applied on the low-dimensional feature space. For this, a dimensionality reduction technique is used for dimension reduction and a classifier is used for online category prediction of query and database images [3].

Fig 1. Block Diagram of an Image Classification System (blocks: Query Image, Images in database, Feature Extraction, Model, Classification using Neural Network)
[Figure: Hopfield network with inputs x1 ... xn, neurons n1 ... nn, outputs y1 ... yn and interconnection weights Wi,j]
C. Class 3 – Breast

CONCLUSION

Image classification finds a lot of applications in the medical field. A survey of classifiers revealed that neural classifiers outperformed other parametric and non-parametric classifiers. This paper dealt with classifying a query image into one of the classes in a general medical image database containing four classes, namely brain, chest, breast and elbow, using a Hopfield neural classifier. The experimental work showed that the Hopfield neural classifier gives better performance than the other neural classifier it was compared with.

REFERENCES
Delay Minimization of Sequential Circuits through Weight
Replacement
S. Nireekshan Kumar1, Grace Jency Gnannamal2
PG Scholar of VLSI Design1, Lecturer2
Department of Electronics & Communication Engineering
Karunya University, Coimbatore.
nireekshankumar@karunya.edu.in, shinywesley@gmail.com
Abstract- Optimizing sequential cycles is essential for many types of high-performance circuits, such as pipelines for packet processing. Retiming is a powerful technique for speeding up pipelines, but it is stymied by tight sequential cycles. Designers usually attack such cycles by manually applying retiming and Shannon decomposition, effectively a form of speculation, but such manual application is error prone. This paper proposes an efficient algorithm that applies retiming and Shannon decomposition algorithmically to optimize circuits with tight sequential cycles.

I. INTRODUCTION

increased by the level of parallelism. Similar to pipelining, parallel processing can also be used for reduction of power consumption.

Consider the three-tap finite impulse response (FIR) digital filter

Y(n) = ax(n) + bx(n-1) + cx(n-2). (1.1)

The block diagram implementation of this filter is shown in Fig. 1.1. The critical path or the
same time. Consider the simple structure in Fig. 1.2(a), where the computation time of the critical path is 2TA. Fig. 1.2(b) shows the 2-level pipelined structure, where 1 latch is placed between the 2 adders and hence the critical path is reduced by half. Its 2-level parallel processing structure is shown in Fig. 1.2(c), where the same hardware is duplicated so that 2 inputs can be processed at the same time and 2 outputs are produced simultaneously; therefore, the sample rate is increased by two.

PIPELINING OF FIR DIGITAL FILTERS

Consider the pipelined implementation of the 3-tap FIR filter of (1.1) obtained by introducing 2 additional latches as shown in Fig. 1.3. The critical path is now reduced from TM + 2TA to TM + TA. In this arrangement, while the left adder initiates the computation of the current iteration, the right adder is completing the computation of the previous iteration result.

Retiming is a transformation technique used to change the location of delay elements in a circuit without affecting the input/output characteristics of the circuit. For example, consider the recursive (IIR) filter in Fig. 2.1(a). This filter is described by

W(n) = ay(n-1) + by(n-2)
Y(n) = w(n-1) + x(n)
     = ay(n-2) + by(n-3) + x(n)

The filter in Fig. 2.1(b) is described by

W1(n) = ay(n-1)
W2(n) = by(n-2)
Y(n) = w1(n-1) + w2(n-1) + x(n)
     = ay(n-2) + by(n-3) + x(n)

Although the filters in Fig. 2.1(a) and Fig. 2.1(b) have delays at different locations, these filters have the same input/output characteristics. These 2 filters can be derived from one another using retiming.

Retiming has many applications in synchronous circuit design. These applications include reducing the clock period of the circuit, reducing the number of registers in the circuit, reducing the power consumption of the circuit, and logic synthesis.
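The claim that the two filter descriptions above share the same input/output behaviour can be checked numerically. The Python sketch below simulates both sets of difference equations from zero initial conditions; the function names and the test input are illustrative assumptions, and integer coefficients are used so that the comparison is exact.

```python
# Simulate the two filter descriptions and compare their outputs.
# Names and input values are illustrative, not taken from the paper.

def past(seq, n):
    """Value of seq at index n, or 0 before the start (zero initial state)."""
    return seq[n] if n >= 0 else 0


def filter_a(x, a, b):
    """w(n) = a*y(n-1) + b*y(n-2);  y(n) = w(n-1) + x(n)."""
    y, w = [], []
    for n in range(len(x)):
        y.append(past(w, n - 1) + x[n])
        w.append(a * past(y, n - 1) + b * past(y, n - 2))
    return y


def filter_b(x, a, b):
    """w1(n) = a*y(n-1);  w2(n) = b*y(n-2);  y(n) = w1(n-1) + w2(n-1) + x(n)."""
    y, w1, w2 = [], [], []
    for n in range(len(x)):
        y.append(past(w1, n - 1) + past(w2, n - 1) + x[n])
        w1.append(a * past(y, n - 1))
        w2.append(b * past(y, n - 2))
    return y


if __name__ == "__main__":
    impulse = [1, 0, 0, 0, 0, 0]
    print(filter_a(impulse, 2, 3))
    print(filter_b(impulse, 2, 3))
```

Both functions return the same sequence for any input, because each one reduces to y(n) = a·y(n-2) + b·y(n-3) + x(n); only the position of the delays differs.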
Fig 1.3 A Sequential Circuit

1. Let M = t_max × n, where t_max is the maximum computation time of the nodes in G and n is the number of nodes in G. Since t_max = 2 and n = 4, M = 2 × 4 = 8.

$$
R^{(3)} = \begin{bmatrix} 12 & 5 & 7 & 15 \\ 7 & 12 & 14 & 22 \\ 5 & -2 & 12 & 20 \\ 5 & -2 & 12 & 20 \end{bmatrix},
\qquad
S'(U, V) = \begin{bmatrix} 12 & 5 & 7 & 15 \\ 7 & 12 & 14 & 22 \\ 5 & -2 & 12 & 20 \\ 5 & -2 & 12 & 20 \end{bmatrix}
$$

4. Determine W(U, V) and D(U, V), where W(U, V) is the minimum number of registers on any path from node U to node V and D(U, V) is the maximum computation time among all paths from node U to node V with weight W(U, V).
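Step 4 can be reproduced numerically with a few lines of Python. The sketch below assumes the S'(U, V) matrix of the example with rows [12 5 7 15], [7 12 14 22], [5 -2 12 20], [5 -2 12 20], node computation times t = (1, 1, 2, 2) as read off the diagonal of D(U, V), and the rules W(U, V) = ⌈S'(U, V)/M⌉ and D(U, V) = M·W(U, V) − S'(U, V) + t(V) for U ≠ V. The function name `w_and_d` is illustrative.

```python
# Numerical check of step 4; the function name is illustrative.

M = 8                         # M = t_max * n = 2 * 4
T = [1, 1, 2, 2]              # node times, taken from the diagonal of D
S_PRIME = [
    [12, 5, 7, 15],
    [7, 12, 14, 22],
    [5, -2, 12, 20],
    [5, -2, 12, 20],
]


def w_and_d(s_prime, t, m):
    """Compute the W and D matrices from the shortest-path matrix S'."""
    n = len(t)
    w = [[0] * n for _ in range(n)]
    d = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            if u == v:
                d[u][v] = t[u]                 # W(U, U) = 0, D(U, U) = t(U)
            else:
                w[u][v] = -(-s_prime[u][v] // m)   # integer ceiling of S'/M
                d[u][v] = m * w[u][v] - s_prime[u][v] + t[v]
    return w, d


if __name__ == "__main__":
    w, d = w_and_d(S_PRIME, T, M)
    for row in w:
        print(row)
    for row in d:
        print(row)
```

The ceiling is written with integer arithmetic so that the negative entry -2 maps to 0 without any floating-point rounding.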
If U = V then W(U, V) = 0 and D(U, V) = t(U).
If U ≠ V then W(U, V) = ⌈S'(U, V) / 8⌉ and D(U, V) = M × W(U, V) – S'(U, V) + t(V).

$$
W(U, V) = \begin{bmatrix} 0 & 1 & 1 & 2 \\ 1 & 0 & 2 & 3 \\ 1 & 0 & 0 & 3 \\ 1 & 0 & 2 & 0 \end{bmatrix},
\qquad
D(U, V) = \begin{bmatrix} 1 & 4 & 3 & 3 \\ 2 & 1 & 4 & 4 \\ 4 & 3 & 2 & 6 \\ 4 & 3 & 6 & 2 \end{bmatrix}
$$

5. The values of W(U, V) and D(U, V) are used to determine if there is a retiming solution that can achieve a desired clock period. Given a clock period c, there is a feasible retiming solution r such that Φ(G_r) ≤ c if the following constraints hold.

1. (Feasibility constraints) r(U) – r(V) ≤ w(e) for every edge e from U to V of G.

2. (Critical path constraints) r(U) – r(V) ≤ W(U, V) – 1 for all vertices U, V in G such that D(U, V) > c.

The feasibility constraints force the number of delays on each edge in the retimed graph to be non-negative, and the critical path constraints enforce Φ(G_r) ≤ c. If D(U, V) > c, then W(U, V) + r(V) – r(U) ≥ 1 must hold for the critical path to have computation time less than or equal to c. This leads to the critical path constraints.

If there is a solution to the 12 inequalities above, then the solution is a feasible retiming solution such that the circuit can be clocked with period c = 5. The constraint graph, shown below, does not have any negative cycles.

Fig 1.5: Restructured circuit without negative cycles

C. ISCAS 99 Sequential Benchmark Circuits

The following sequential benchmark circuits are used:

b01.blif
b02.blif
b03.blif
b04.blif
b05.blif

III. IMPLEMENTATION

The benchmark circuits are synthesized and run through timing and power analysis. The Bellman-Ford algorithm is then applied to the benchmark circuits, which are synthesized again and finally run through timing and power analysis.

These two results are compared and tabulated in the results section.
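The feasibility and critical path constraints are difference constraints of the form r(U) − r(V) ≤ k, and such systems have a solution exactly when the associated constraint graph has no negative cycle. The Python sketch below shows the standard Bellman-Ford treatment of that problem. It is a generic illustration on small hypothetical constraint sets, not the paper's tool flow and not the 12 inequalities of the example; the function name `solve_difference_constraints` is an assumption.

```python
# Generic Bellman-Ford solver for difference constraints r[u] - r[v] <= k.
# The constraint sets used below are hypothetical examples.

def solve_difference_constraints(n, constraints):
    """Return a list r satisfying every (u, v, k) as r[u] - r[v] <= k, or None.

    Each constraint is an edge v -> u of weight k in the constraint graph.
    Starting every distance at 0 is equivalent to adding a virtual source
    with zero-weight edges to all nodes.
    """
    dist = [0] * n
    for _ in range(n):
        changed = False
        for u, v, k in constraints:
            if dist[v] + k < dist[u]:
                dist[u] = dist[v] + k
                changed = True
        if not changed:
            return dist          # converged: dist is a feasible retiming
    return None                  # still relaxing after n passes: negative cycle


if __name__ == "__main__":
    feasible = [(0, 1, 1), (1, 2, 0), (2, 0, -1)]
    infeasible = [(0, 1, 0), (1, 0, -1)]
    print(solve_difference_constraints(3, feasible))
    print(solve_difference_constraints(2, infeasible))
```

A returned list can be used directly as the retiming values r(U); a result of None indicates that the requested clock period cannot be met by retiming alone.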
Sl. No.   Benchmark circuit   Time period before applying algorithm   Time period after applying algorithm
1         B01                 2.489 ns (401.99 MHz)                   1.103 ns (884.12 MHz)
2         B02                 1.657 ns (603.500 MHz)                  1.012 ns (889.52 MHz)
3         B04                 9.132 ns (109.505 MHz)                  3.203 ns (512.023 MHz)

REFERENCES

[8] K. J. Singh, “Performance optimization of digital circuits,” Ph.D. dissertation, Univ. California, Berkeley, CA, 1992.
Analysis of MAC Protocol for Wireless Sensor Network
Jeeba P.Thomas, Mrs.M.Nesasudha,
ME Applied Electronics student, Sr. Lecturer
Department of Electronics & Communication Engineering,
Karunya University, Coimbatore
jeebathomas@gmail.com
wireless voice and data networks, but in sensor networks they are secondary.
The following are the major sources of energy waste. The first one is collision. When a transmitted packet is corrupted it has to be discarded, and the follow-on retransmissions increase energy consumption. Collision increases latency as well. The second source is overhearing, meaning that a node picks up packets that are destined for other nodes. The third source is control packet overhead. Sending and receiving control packets consumes energy too, and fewer useful data packets can be transmitted. The last major source of inefficiency is idle listening, i.e., listening to receive possible traffic that is not sent.
The aim here is to design a new MAC protocol explicitly for wireless sensor networks, with reduced energy consumption as the primary goal. To achieve energy efficiency, it is necessary to identify the main sources of inefficient energy use as well as the trade-offs that can be made to reduce energy consumption. The new MAC tries to reduce the energy wasted by existing protocols. Therefore the new MAC lets its nodes periodically sleep, thus avoiding idle listening. In the sleep mode, a node turns off its radio. The design reduces the energy consumption due to idle listening.

II. PROTOCOL DESIGN

The purpose of the implementation is to demonstrate the effectiveness of the new MAC protocol and to compare the new protocol with 802.11 and TDMA. The steps to be followed in this implementation are:
1. Study of existing protocols (IEEE 802.11 & TDMA)
2. Design of new MAC protocol
3. Comparing existing MAC protocols with new MAC protocol.

A. SIMULATOR

The simulator used for implementing the new protocol is Network Simulator (version 2). NS (version 2) is an object-oriented, discrete event simulator, developed under the VINT project as a joint effort by UC Berkeley, USC/ISI, LBL, and Xerox PARC. It is written in C++ with OTcl as a front-end. The simulator supports a class hierarchy in C++ (compiled hierarchy), and a similar class hierarchy within the OTcl interpreter (interpreted hierarchy).
The network simulator uses two languages because the simulator has two different kinds of things it needs to do. On one hand, detailed simulations of protocols require a systems programming language which can efficiently manipulate bytes and packet headers, and implement algorithms that run over large data sets. For these tasks run-time speed is important and turn-around time (run simulation, find bug, fix bug, recompile, re-run) is less important. On the other hand, a large part of network research involves slightly varying parameters or configurations, or quickly exploring a number of scenarios. In these cases, iteration time (change the model and re-run) is more important. Since configuration runs once (at the beginning of the simulation), run-time of this part of the task is less important.
ns meets both of these needs with two languages, C++ and OTcl. C++ is fast to run but slower to change, making it suitable for detailed protocol implementation. OTcl runs much slower but can be changed very quickly (and interactively), making it ideal for simulation configuration. ns (via tclcl) provides glue to make objects and variables appear in both languages. The Tcl interface can be used in cases where small changes in the scenarios are easily implemented. The simulator is initialized using the Tcl interface. The energy model can be implemented quite simply in NS-2. After every packet transmission or reception the energy content is decreased. The time taken to transmit or receive, along with the power consumed for transmission or reception of a bit/byte of data, is passed as parameters to the functions, and these functions then decrease the energy content of the node.

B. Analysis of IEEE 802.11

In the IEEE 802.11 MAC layer protocol, the basic access method is the Distributed Coordination Function (DCF), which is based on CSMA/CA. DCF is designed for ad hoc networks, while the point coordination function (PCF, or infrastructure mode) adds support where designated access points (or base-stations) manage wireless communication. IEEE 802.11 adopted all the features of CSMA/CA, MACA and MACAW in its distributed coordination function. Among contention-based protocols, 802.11 does a very good job of collision avoidance. Here the analysis of the IEEE 802.11 protocol has to be conducted.
The steps to be followed for the analysis are:

• Identifying sensor nodes (10 in number)
• Giving an energy model to each node
• Analyzing nodes by transmitting and receiving packets.

The simulator used in the analysis is NS 2.29, a network simulator tool. The first step is to identify the nodes; their number is set to 10. Then the energy model is given to each node. The transmission takes place in a random manner from the first node to the last node. When the simulation is run, two files are produced: the NAM trace file and the trace file. The NAM file makes the topology of the design visible, and the trace file gives the events that occurred during transmission.

III. RESULT

NAM file:

energy (in Joules) in Y axis and period in X axis. The graph clearly shows the decrease in energy as the transmission progresses.
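The energy bookkeeping described for the simulator (energy is reduced after every transmission or reception, in proportion to the power drawn and the time taken) can be sketched in a few lines of Python. This is a stand-alone illustration, not NS-2's C++ or OTcl energy model; the class name `EnergyModel`, its methods and all numeric values are hypothetical.

```python
# Stand-alone illustration of per-packet energy accounting.
# Class name, method names and values are hypothetical, not NS-2's API.

class EnergyModel:
    def __init__(self, initial_energy_j, tx_power_w, rx_power_w):
        self.energy_j = initial_energy_j
        self.tx_power_w = tx_power_w
        self.rx_power_w = rx_power_w

    def _consume(self, power_w, duration_s):
        # Energy (J) = power (W) * time (s); never drop below zero
        self.energy_j = max(0.0, self.energy_j - power_w * duration_s)

    def transmit(self, duration_s):
        self._consume(self.tx_power_w, duration_s)

    def receive(self, duration_s):
        self._consume(self.rx_power_w, duration_s)


if __name__ == "__main__":
    node = EnergyModel(initial_energy_j=10.0, tx_power_w=0.5, rx_power_w=0.25)
    history = [node.energy_j]
    for _ in range(3):
        node.transmit(2.0)
        history.append(node.energy_j)
        node.receive(4.0)
        history.append(node.energy_j)
    print(history)
```

Plotting `history` against time gives the kind of monotonically decreasing energy curve that the result graph describes.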
REFERENCES
[5] Mark Stemm and Randy H. Katz, “Measuring and reducing energy consumption of network interfaces in hand-held devices,” IEICE Transactions on Communications, vol. E80-B, no. 8, pp. 1125-1131, Aug. 1997.
[6] Jason Hill, Robert Szewczyk, Alec Woo, Seth Hollar, David Culler, and Kristofer Pister, “System architecture directions for networked sensors,” in Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, Cambridge, MA, USA, Nov. 2000, pp. 93-104, ACM.
Improving Security and Efficiency in WSN Using Pattern
Codes
Anu jyothy,Student ,ME (Applied Electronics)
Mrs M.Nesasudha,Sr Lecturer, Department of ECE
Karunya University, Coimbatore
anujyothi@gmail.com
ABSTRACT: Wireless sensor networks are fast becoming one of the largest growing types of networks today and, as such, have attracted quite a bit of research interest. They are used in many aspects of our lives including environmental analysis and monitoring, battlefield surveillance and management etc. Their reliability, cost-effectiveness, ease of deployment and ability to operate in an unattended environment, among other positive characteristics, make sensor networks the leading choice of networks for these applications. Much research has been done to make these networks operate more efficiently, including the application of data aggregation. Recently, more research has been done on the security of wireless sensor networks using data aggregation. Here pattern generation for data aggregation is performed securely by allowing a sensor network to aggregate encrypted data without first decrypting it. In this pattern generation process, when a sensor node senses an event from the environment, a pattern code is generated and sent to the cluster head. This generated pattern code is needed for further processes such as comparison with the existing pattern codes in the cluster head and reception of the acknowledgement, so that authentication is done and the actual data can be sent. The simulator used for the implementation is the GloMoSim Network Simulator (Global Mobile Information Systems Simulation Library). The approach is more efficient due to aggregated data transmission, and it is secure and bandwidth efficient.

Keywords - Wireless sensor networks, Security, Pattern codes, pattern generation and comparison

I. INTRODUCTION
The primary function of a wireless sensor network is to determine the state of the environment being monitored by sensing some physical event. Wireless sensor networks consist of hundreds or thousands or, in some cases, even millions of sensor devices that have limited amounts of processing power, computational abilities and memory and are linked together through some wireless transmission medium such as radio and infrared media. These sensors are equipped with sensing and data collection capabilities and are responsible for collecting and transmitting data back to the observer of the event. Sensors may be distributed

an aircraft as it flies over the environment to be monitored may deploy them. Once distributed, they may either remain in the locations in which they landed or they may begin to move if necessary. Sensor networks are dynamic because of the addition and removal of sensors due to device failure, in addition to mobility issues. Security in wireless sensor networks is a major challenge. The limited amount of processing power, computational abilities and memory with which each sensor device is equipped makes security a difficult problem to solve. The simulator used is the GloMoSim network simulator (Global Mobile Information Systems Simulation Library), which is a scalable simulation environment for large wireless and wireline communication networks. GloMoSim uses a parallel discrete-event simulation capability provided by Parsec. GloMoSim simulates networks with up to a thousand nodes linked by a heterogeneous communications capability that includes multicast, asymmetric communications using direct satellite broadcasts, multi-hop wireless communications using ad-hoc networking, and traditional Internet protocols.

II. USE OF GLOMOSIM SIMULATOR

After successfully installing GloMoSim, a simulation can be started by executing the following command in the BIN subdirectory.
./glomosim <inputfile>
The <inputfile> contains the configuration parameters for the simulation (an example of such a file is CONFIG.IN). A file called GLOMO.STAT is produced at the end of the simulation and contains all the statistics generated.
GloMoSim has a Visualization Tool that is platform independent because it is coded in Java. To initialize the Visualization Tool, we must execute the following from the java-gui directory: java GlomoMain. This tool makes it possible to debug and verify models and scenarios; stop, resume and step execution; show packet transmissions; show mobility groups in different colors; and show statistics. The radio layer is displayed in the Visualization Tool as follows: when a node transmits a packet, a yellow link is drawn from this node to all nodes within its power range. As each node receives the packet, the link is erased and a green line is drawn for successful reception and a red line is drawn for unsuccessful reception. GloMoSim requires a C compiler to run and works with most C/C++ compilers on many common platforms
randomly and may be installed in fixed locations or
they may be mobile. For example, dropping them from
203
3.1 Pattern Generation

Output: Pattern-code (PC)
• Sensing data from the environment.
• Defining intervals from threshold values set for the environment parameters.
• Assigning critical values for intervals using pattern seed from cluster-head.
• Generating the lookup table.
• Generating pattern codes using pattern generation algorithm.
• Sending pattern codes to cluster-heads.

This explains how the PG algorithm generates a pattern code. Let D (d1, d2, d3) denote the sensed data with three parameters d1, d2, and d3 representing temperature, pressure and humidity respectively in a given environment. Each parameter sensed is assumed to have threshold values in the range 0 to 100 as shown in Table 1.

The pattern generation algorithm performs the following steps:
• The pattern code to be generated is initialized to the empty pattern code.
• The algorithm iterates over sensor reading values for the parameters of data that are being sensed. In this case, it first considers temperature.
• The temperature parameter is extracted from sensor reading D.
• For the temperature parameter, the algorithm first checks whether a new pattern seed has been received from the cluster-head. Arrival of a seed refreshes the mapping of critical values to data intervals; Table 1 shows an example configuration.
• The data interval that contains the sensed temperature is found from the interval table. Then, from the interval value, the corresponding critical value is determined from the critical value table. Table 2 shows the critical values for different sensor readings if the same lookup tables of Table 1 are used for temperature, pressure and humidity.
• PC is set to the new critical value found. For the pressure and humidity, the corresponding critical values are appended to the end of the partially formed PC.
• The previous steps are applied for the pressure and humidity readings.
• When the full pattern code is generated, the timestamp and sensor identifier are sent with the pattern code to the cluster-head.

Table 1: Look Up Table For Data Intervals And Critical Values

Table 2: Pattern Code Generation Table

Pattern codes with the same value are referred to as a redundant set. In this example, the data sensed by sensor 1 and sensor 3 are the same, as determined from the comparison of their pattern code values (pattern code value 747), and they form Redundant Set #1. Similarly, the data sensed by sensor 2, sensor 4 and sensor 5 are the same (pattern code value 755) and form Redundant Set #2. The cluster-head selects only one sensor from each redundant set (sensor 1 and sensor 5 in this example) to transmit the actual data of that redundant set, based on the timestamps.

IV. ALGORITHM: PATTERN COMPARISON

The cluster-head runs the pattern comparison algorithm to eliminate the redundant pattern codes, resulting in prevention of redundant data transmission. Cluster heads choose a sensor node for each distinct pattern code to send corresponding data of that pattern code,
and then chosen sensor nodes send the data in encrypted form to the base station over the cluster-head. In the pattern comparison algorithm, upon receiving all of the pattern codes from sensor nodes in a period of T, the cluster-head classifies all the codes based on redundancy. While this increases the computation overhead at the sending and receiving nodes, due to the significant energy consumption difference between computation and communication, the overall gains achieved by transmitting a smaller number of pattern code bits outweigh the computational energy required at either end.

4.1 ALGORITHM: PATTERN COMPARISON

Input: Pattern codes
Output: Request sensor nodes in the selected-set to send actual encrypted data.

Send the pattern-codes or data packet along with the reference data
else
send the differential pattern-codes or data packets
endif
endWhile
end

Choosing Sensor Nodes for Data Transmission by Cluster heads. The technique of using lookup tables and pattern seed ensures that the sensed data cannot be re-generated from the pattern codes, which in turn helps the sensor nodes to send pattern codes without encryption. Only sensor nodes within the cluster know the pattern seed, which ensures the security of the sensed data during the data aggregation.

4.3: Differential Data Transmission from Sensor Nodes to Cluster head
ATED PATTERN CODE
5.2: SELECTED UNIQUE SET OF PATTERN CODES

VI. CONCLUSION

Sensor nodes receive the secret pattern seed from the cluster head. The interval values for the data are defined, based on the given threshold values set for each environment parameter. The number of threshold values and the variation of intervals may depend on the user requirement and the precision defined for the given environment in which the network is deployed. The algorithm then computes the critical values for each interval using the pattern seed to generate the lookup table, where the pattern seed is a random number generated and broadcast by the cluster-head. The Pattern Generation (PG) algorithm first maps the sensor data to a set of numbers. Then, based on the user requirements and precision defined for the environment in which the network is deployed, this set of numbers is divided into intervals such that the boundaries and width of intervals are determined by the predefined threshold values. The PG algorithm then computes the critical values for each interval using the pattern seed and generates the interval and critical value lookup tables. The interval lookup table defines the range of each interval and the critical value lookup table maps each interval to a critical value. Upon sensing data from the environment, the sensor node compares the characteristics of the data with the intervals defined in the lookup table of the PG algorithm. Then, a corresponding critical value is assigned to each parameter of the data; concatenation of these critical values forms the pattern code of that particular data.
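
The pattern generation and comparison steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (build_critical_values, pattern_code, select_senders) are hypothetical, the derivation of critical values from the pattern seed (a seeded shuffle of single-digit values) is an assumption, since the text does not specify it, and choosing the earliest timestamp within a redundant set is likewise an assumed rule.

```python
import bisect
import random


def build_critical_values(n_intervals, seed):
    """Map each interval index to a critical value.

    Assumption: the pattern seed drives a seeded shuffle of the values
    1..n_intervals, so every node holding the same seed builds the same table.
    """
    values = list(range(1, n_intervals + 1))
    random.Random(seed).shuffle(values)
    return values


def pattern_code(reading, thresholds, critical):
    """Concatenate the critical value of each sensed parameter."""
    code = ""
    for value in reading:
        # Find the interval [thresholds[i], thresholds[i + 1]) holding value.
        idx = bisect.bisect_right(thresholds, value) - 1
        idx = min(max(idx, 0), len(critical) - 1)  # clamp the upper bound
        code += str(critical[idx])
    return code


def select_senders(reports):
    """Pick one sensor per distinct pattern code (one per redundant set).

    reports: iterable of (sensor_id, timestamp, code).
    Assumption: the sensor with the earliest timestamp is chosen.
    """
    chosen = {}
    for sensor_id, timestamp, code in reports:
        if code not in chosen or timestamp < chosen[code][1]:
            chosen[code] = (sensor_id, timestamp)
    return {code: sensor_id for code, (sensor_id, _) in chosen.items()}


if __name__ == "__main__":
    thresholds = [0, 20, 40, 60, 80, 100]  # illustrative thresholds in 0..100
    critical = build_critical_values(len(thresholds) - 1, seed=1234)
    # (temperature, pressure, humidity) reading of one sensor node
    print(pattern_code((25, 45, 70), thresholds, critical))
```

Because only the pattern codes travel to the cluster-head, two readings that fall in the same intervals produce the same code and are treated as one redundant set, whatever their exact values.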
REFERENCES
[4] A. Perrig, R. Szewczyk, J.D. Tygar, V. Wen, and
D.E. Culler, “SPINS: Security protocols for sensor
networks”, Wireless Networks, vol. 8, no. 5, pp. 521-
534, 2002.
NCVCCC’08
the fitness values of all the previous individuals).
The accumulated fitness of the last individual
should of course be 1 (otherwise something went
wrong in the normalization step!).
3. A random number R between 0 and 1 is chosen.
4. The selected individual is the first one whose
accumulated normalized value is greater than R.
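
The selection steps listed above can be sketched in Python as follows. This is an illustrative sketch only: the function name roulette_wheel_select and the rng parameter are hypothetical, and the normalization step (dividing each fitness by the total) is assumed, since it is not shown in this part of the text.

```python
import random
from itertools import accumulate


def roulette_wheel_select(population, fitness, rng=random):
    """Return one individual, chosen with probability proportional to fitness.

    rng is any object with a random() method returning a float in [0, 1).
    """
    total = sum(fitness)
    normalized = [f / total for f in fitness]  # assumed normalization step
    cumulative = list(accumulate(normalized))  # accumulated normalized fitness
    cumulative[-1] = 1.0                       # guard against rounding error
    r = rng.random()                           # random number R in [0, 1)
    for individual, acc in zip(population, cumulative):
        if acc > r:                            # first accumulated value > R
            return individual
    return population[-1]


if __name__ == "__main__":
    print(roulette_wheel_select(["a", "b", "c"], [1.0, 1.0, 2.0]))
```

Forcing the last accumulated value to 1 mirrors the check mentioned in the text: if normalization is correct, the accumulated fitness of the last individual is 1.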
approach using multi-thresholding of three gray-level
regions is implemented using threshold values selected from
grayscale value range of 0 to 255.
III. SIMULATION RESULTS

Deterministic selection has the ability to reach the highest maximum fitness, followed by roulette-wheel and tournament for all the test images as shown in figure 4.

V. CONCLUSIONS

REFERENCES:
[7] Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag Berlin Heidelberg New York, second edition, 1999.
Abstract - Harmonics are unwanted components, dynamic in nature, that are generated by the use of non-linear loads. The increase in the number of non-linear loads has increased harmonic pollution in industries. Switched-mode power supplies, PWM inverters, voltage source inverters, fluorescent lighting with electronic ballasts, and computers are the major sources of harmonic current. Today’s automated facility relies heavily on electronic systems, which increase harmonics on the line side of the power distribution plant. While the fundamental current levels may be within specification, the harmonics can add the same amount of current as the fundamental. Unmitigated, these harmonics produce heat, unwanted trips, lockups and poor power factor. Preventive solutions for the harmonics are phase cancellation or harmonic control in power converters, and developing procedures and methods to control or eliminate harmonics in power system equipment. Remedial solutions are the use of filters and circuit detuning, which involves reconfiguration of feeders or relocation of capacitor banks to overcome resonance. Passive solutions will correct one problem but add another. If the load condition varies, passive systems can actually cause resonances that can accelerate failure. Active filters overcome these problems associated with passive filters. This project uses the adaline algorithm to identify and measure the current harmonics present in the power line. Since this measurement technique measures the harmonics in a shorter time, it can be effectively used in active filters. The adaline algorithm is implemented on an FPGA platform.

I. INTRODUCTION

Active filters are an effective means for harmonic compensation of nonlinear power electronic loads, of particular importance as electric utilities enforce harmonic standards such as IEEE 519. Harmonic compensation is an extremely cost-sensitive application since the value added to the user is not apparent. Today active filters are more easily available for loads greater than 10 kVA and are costly. For effective active filtering, measurement of harmonics is required. This project proposes a method in which the measurement of harmonics is performed on an FPGA incorporating an adaptive neural network called Adaline. Shunt active filter systems are used to improve the power factor (Chongming Qiao et al 2001).
The shunt active filter is a boost topology based current controlled voltage source converter. The shunt active filter (SAF) is connected in parallel to the source and the nonlinear load as shown in Figure 1.1, and the SAF response is shown in Figure 1.2. The power factor is improved by compensating for harmonic currents. The control objective of the shunt active filter is defined as: shift the phase angle of the input current with the phase angle of the fundamental component of the load current.
The proposed control strategy produces a current reference using the phase shifting method on the sensed input currents, which is then applied to the resistive emulator type input current shaping strategy. The phase shifting control technique has an advantage of compensating only for harmonic current, but this technique is capable of compensating for reactive current along with harmonic current as well (Hasan Komurcugil et al 2006).
Figure 1.2 SAF response

II. REDUCTION OF HARMONICS

A. Preventive solutions
Phase cancellation or harmonic control in power converters.
Developing procedures and methods to control, reduce or eliminate harmonics in power system equipment.

B. Remedial solutions
Use of filters.
Circuit detuning, which involves reconfiguration of feeders or relocation of capacitor banks to overcome resonance.

C. Harmonic filters
Important general specifications to consider when selecting harmonic filters include the filter type and signal type. Harmonic filters isolate harmonic current to protect electrical equipment from damage due to harmonic voltage distortion. They can also be used to improve power factor. Harmonic filters generally require careful application to ensure their compatibility with the power system and all present and future non-linear loads. Harmonic filters tend to be relatively large and can be expensive. Harmonic filter types include:
1. Passive Filter
2. Active Filter

D. Active filters
Active filters are those which consist of active components like thyristors, IGBTs, MOSFETs etc. Active filtering techniques have drawn great attention in recent years. Active filters are mainly used to compensate the transient and harmonic components of the load current iL so that only fundamental components remain in the grid current. Active filters are available mainly for low voltage networks. The active filter uses power electronic switching to generate harmonic currents that cancel the harmonic currents from a nonlinear load (Victor M. Moreno, 2006).
By sensing the nonlinear load harmonic voltages and/or currents, active filters use either:
1. Injected harmonics at 180 degrees out of phase with the load harmonics, or
2. Injected/absorbed current bursts to hold the voltage waveform within an acceptable tolerance.
A shunt active filter consists of a controllable voltage source behind a reactance acting as a current source. The Voltage Source Inverter (VSI) based SAF is by far the most common type used today, due to its well-known topology and straightforward installation procedure. It consists of a DC-link capacitor, power electronic switches and filter inductors between the VSI and the supply line. The operation of shunt active filters is based on injection of current harmonics in phase with the load current harmonics, thus eliminating the harmonic content of the line current (Jong-Gyu Hwang et al 2004). When using an active filter, it is possible to choose the current harmonics to be filtered and the degree of attenuation. The size of the VSI can be limited by using selective filtering and removing only those current harmonics that exceed a certain level, e.g. the level set in IEEE Std. 519-1992. Together with the active filtering, it is also possible to control power factor by injecting or absorbing reactive power from the load.
Active harmonic compensation: an active harmonic filter (conditioner) is a device using at least one static converter to meet the “harmonic compensation” function (Hasan Komurcugil 2006). This generic term thus actually covers a wide range of systems, distinguished by:
1. The number of converters used and their association mode,
2. Their type (voltage source, current source),
3. The global control modes (current or voltage compensation),
4. Possible associations with passive components (or even passive filters).
The only common feature between these active systems is that they all generate currents or voltages which oppose the harmonics created by non-linear loads. The most intuitive application is the one shown in Figure 2.1, which is normally known as the “shunt” (parallel) topology (Souvik Chattopadhyay et al 2004). It generates a harmonic current which cancels current harmonics on the power network side. When the current reference applied to this control is (for example) equal to the harmonic content of the current absorbed by an external non-linear load, the rectifier cancels all harmonics at the point of common coupling: this is known as an active harmonic filter, as shown in Figure 2.2.

Figure 2.1 Shunt-type Active harmonic Filter (blocks: Active Harmonic Filter, Non-linear Load(s))

Figure 2.2 Operation of Active harmonic Filter (blocks: Power Network, Non-linear load, Control, Converter)

III. HARMONIC ESTIMATION IN A POWER SYSTEM USING ADALINE ALGORITHM

An adaptive neural network approach is used for the estimation of harmonic components of a power system. The neural estimator is based on the use of an adaptive perceptron comprising a linear adaptive neuron called Adaline (Shatshat El R et al 2004). The learning parameters in the proposed algorithm are adjusted to force the error
between the actual and desired outputs to satisfy a stable difference error equation. The estimator tracks the Fourier coefficients of the signal data corrupted with noise and decaying DC components very accurately. Adaptive tracking of harmonic components of a power system can easily be done using this algorithm. Several numerical tests have been conducted for the adaptive estimation of harmonic components of power system signals mixed with noise and decaying DC components.
Estimation of the harmonic components in a power system is a standard approach for the assessment of quality of the delivered power. There is a rapid increase in harmonic currents and voltages in the present AC systems due to the increased introduction of solid-state power switching devices. Transformer saturation in a power network produces an increased amount of current harmonics. Consequently, to provide the quality of the delivered power, it is imperative to know the harmonic parameters such as magnitude and phase. This is essential for designing filters for eliminating and reducing the effects of harmonics in a power system. Many algorithms are available to evaluate the harmonics, of which the Fast Fourier Transform (FFT) developed by Cooley and Tukey is widely used. Other algorithms include recursive DFT, spectral observer and Hartley transform for selecting the range of harmonics. The use of a more robust algorithm is described by (Narade Pecharanin et al 1994), which provides a fixed gain Kalman filter for estimating the magnitudes of sinusoids of known frequencies embedded in an unknown measurement noise, which can be a mixture of both stochastic and deterministic signals.
In tracking harmonics for a large power system, where it is difficult to locate the magnitude of the unknown harmonic sources, a new algorithm based on learning principles is used by (Narade Pecharanin et al 1994). This method uses neural networks to make initial estimates of the harmonic source in the power system with nonlinear loads. To predict the voltage harmonics, an artificial neural network based on the back propagation learning technique is used. An analogue neural method of calculating harmonics uses the optimization technique to minimize error. This is an interesting application from the point of view of VLSI implementation.
This new approach is to find the adaptive estimation of harmonics using a Fourier linear combiner. The linear combiner is realized using a linear adaptive neural network called Adaline. An Adaline has an input sequence, an output sequence and a desired response-signal sequence. It also has a set of adjustable parameters called the weight vector. The weight vector of the adaline generates the Fourier coefficients of the signal using a nonlinear weight adjustment algorithm based on a stable difference error equation.
This approach is substantially different from the back propagation approach and allows one to better control the stability and speed of convergence by the appropriate choice of parameters of the error difference equation (S N Sivanandam, 2006). Several computer simulation tests are conducted to estimate the magnitude and phase angle of the harmonic components from power system signals embedded in noise very accurately. Further, the estimation technique is highly adaptive and is capable of tracking the variations of amplitude and phase angle of the harmonic components. The performance of this algorithm shows its superiority and accuracy in the presence of noise. To obtain the solution for the on-line estimation of the harmonics, an adaptive neural network comprising a linear adaptive neuron called Adaline is used as shown in Figure 3.2. The performance of the proposed neural estimation algorithm is very dependent on the initial choice of weight vector w and the learning parameters. An optimal choice of the weight vector can produce faster convergence to the true values of the signal parameters. This can be done by minimising the RMS error between the actual and estimated signals starting with an initial random weight vector.

Figure 3.1 Block diagram of the adaline
x(t) - input to adaline
w - weight value
y(c) - estimated value
err - error value
y(t) - target value

Once the weight vector is optimised, it can be used for online tracking of the changes in the amplitude and phase of the fundamental and harmonic components in the presence of noise etc.

IV. MATHEMATICAL DESCRIPTION

The general form of the waveform is

y(t) = \sum_{n=1}^{N} A_n \sin(n \omega_0 t + \phi_n) + \varepsilon(t)    (4.1)

where,
A_n - Amplitude of harmonics
\phi_n - Phase of harmonics
\omega_0 - Fundamental angular frequency, 2\pi f_o
\varepsilon(t) - Noise

The discrete-time version of the signal represented by (4.1) is

Y(k) = \sum_{n=1}^{N} A_n \sin\left( \frac{2 \pi n k}{N_s} + \phi_n \right) + \varepsilon(k)    (4.2)

The input to the Adaline is given by

X(k) = \left[ \sin\frac{2 \pi k}{N_s} \;\; \cos\frac{2 \pi k}{N_s} \;\; \sin\frac{4 \pi k}{N_s} \;\; \cos\frac{4 \pi k}{N_s} \;\; \ldots \;\; \sin\frac{2 N \pi k}{N_s} \;\; \cos\frac{2 N \pi k}{N_s} \right]^T    (4.3)

where,
N_s = f_s / f_o
f_s = Sampling frequency
f_o = Nominal power system frequency
T = Transpose of a quantity

The weight vector of the Adaline is updated using the Widrow-Hoff delta rule

W(k+1) = W(k) + \frac{\alpha \, e(k) \, X(k)}{X^T(k) \, X(k)}    (4.4)

where,
X(k) = input vector at time k
\hat{Y}(k) = estimated signal amplitude at time k
Y(k) = actual signal amplitude at time k
e(k) = Y(k) - \hat{Y}(k), error at time k
\alpha = reduction factor

Then the signal Y(k) becomes

Y(k) = W_o^T X(k)    (4.5)

where,
W_o = weight vector after final convergence

The amplitude and phase of the Nth harmonic are given by

A_N = \sqrt{ W_o^2(2N-1) + W_o^2(2N) }    (4.6)

\phi_N = \tan^{-1}\{ W_o(2N) / W_o(2N-1) \}    (4.7)

V. RESULT AND DISCUSSIONS

The code for the adaline algorithm is developed and verified using MATLAB. For the implementation, its equivalent source is written in C and modified for compatibility with Code Composer Studio (CCS). The output values in the memory can be seen by using the watch window as shown in Figure 5.1.

Figure 5.1 Output values in the memory

In this experiment a personal computer is used as the non-linear load. The supplied voltage and the current drawn by the personal computer are shown in Figure 5.2. This waveform is captured using a Power Quality Analyzer (PQA).

Figure 5.2 Waveform captured using Power Quality Analyzer

The harmonic coefficients measured by the Power Quality Analyzer and the harmonic coefficients estimated using the adaline algorithm are compared in Table 5.1. Using the adaline algorithm we can measure the harmonics in a single cycle.

Table 5.1 Comparison of harmonic orders

Harmonics   Values obtained   Output after 10 epochs     Error
Order       from PQA          using Adaline algorithm    %
1           68.4              67.61                      0.71
2           18.1              18.19                      0.51
3           57.8              57.24                      0.96
4           11.6              11.44                      1.36
5           39.7              39.28                      1.05
6           4.7               4.356                      7.30
7           19.5              19.68                      0.94
8           2.6               2.79                       7.48
9           5.4               5.37                       0.52
10          4.6               4.53                       1.45
11          4.2               4.39                       4.59
12          4.1               3.90                       4.66
13          4.3               4.45                       3.58
14          2.7               2.91                       7.94
15          1.5               1.35                       9.87
16          4.0               4.23                       5.95
17          2.0               2.08                       4.15
18          3.1               3.11                       0.42
19          2.3               2.19                       4.47
20          1.5               1.54                       2.91
21          1.5               1.44                       3.99
22          1.5               1.55                       3.62
23          0.5               0.61                       23.40
24          1.5               1.61                       7.43
25          1.0               1.14                       14.53
26          1.4               1.33                       4.29
27          1.0               1.08                       8.03
28          1.2               1.33                       11.20
29          1.0               1.06                       6.77
30          0                 0                          --
31          0                 0                          --
49          0                 0                          --
THD         113.2%            112.8%                     0.35%
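
The estimation procedure of Section IV can be illustrated with a short Python sketch. It is not the authors' MATLAB or C code: the function names (adaline_estimate, synthesize), the reduction factor value 0.1 and the number of epochs are assumptions chosen for illustration. The input vector alternates sine and cosine terms as in (4.3), the weights follow the normalised Widrow-Hoff update (4.4), and each amplitude is the root of the sum of squares of a weight pair as in (4.6). The phase is taken as atan2(cosine weight, sine weight), which follows from expanding A sin(θ + φ) = A cos φ sin θ + A sin φ cos θ.

```python
import math


def adaline_estimate(samples, n_harmonics, ns, alpha=0.1, epochs=50):
    """Estimate (amplitude, phase) of harmonics 1..n_harmonics.

    samples : signal values Y(k) for k = 0, 1, 2, ...
    ns      : samples per fundamental cycle, Ns = fs / fo
    alpha   : reduction factor (0.1 is an assumed, illustrative value)
    """
    weights = [0.0] * (2 * n_harmonics)
    for _ in range(epochs):
        for k, y in enumerate(samples):
            # Input vector X(k) = [sin, cos, sin, cos, ...]
            x = []
            for n in range(1, n_harmonics + 1):
                theta = 2.0 * math.pi * n * k / ns
                x.extend((math.sin(theta), math.cos(theta)))
            y_hat = sum(w * xi for w, xi in zip(weights, x))
            error = y - y_hat
            norm = sum(xi * xi for xi in x)  # X^T(k) X(k)
            # Normalised Widrow-Hoff (delta rule) update
            weights = [w + alpha * error * xi / norm
                       for w, xi in zip(weights, x)]
    result = []
    for n in range(n_harmonics):
        w_sin, w_cos = weights[2 * n], weights[2 * n + 1]
        result.append((math.hypot(w_sin, w_cos), math.atan2(w_cos, w_sin)))
    return result


def synthesize(ns, cycles, harmonics):
    """Noise-free test signal; harmonics is a list of (amplitude, phase)."""
    signal = []
    for k in range(ns * cycles):
        signal.append(sum(a * math.sin(2.0 * math.pi * (n + 1) * k / ns + p)
                          for n, (a, p) in enumerate(harmonics)))
    return signal


if __name__ == "__main__":
    ns = 64
    test_signal = synthesize(ns, 1, [(1.0, 0.3), (0.0, 0.0), (0.5, -0.2)])
    for order, (amp, phase) in enumerate(adaline_estimate(test_signal, 3, ns), 1):
        print(order, round(amp, 4), round(phase, 4))
```

With a noise-free synthetic input the weights settle on the Fourier coefficients of the signal; on measured data the attainable accuracy depends on noise and on the chosen reduction factor.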
VI. CONCLUSION
REFERENCES
NET = \sum_{i=1}^{n} X_i W_i    (1)

The design of ANN has two distinct steps [1]:
1) Choosing a proper network architecture, and
2) Adjusting the parameters of a network so as to minimize a certain fit criterion.

They are broadly classified into three main forms - evolution strategies, genetic algorithms, and evolutionary programming. Unlike gradient-based training methods, viz. back propagation, GAs rely on a probabilistic search technique. Though their search space is bigger, they can ensure that better solutions are being generated over generations [4].
The typical approach, called the ‘non-invasive’ technique, uses a Back propagation Neural Network for weight learning and an evolutionary algorithm for network optimisation. Back
propagation is a method for calculating the gradient of error with respect to weights and requires differentiability. Therefore back propagation cannot handle discontinuous optimality criteria or discontinuous node transfer functions. When nearly global minima are well hidden among the local minima, back propagation can end up bouncing between local minima without much overall improvement. This leads to very slow training [3]. The back propagation neural network has some influence over the evolutionary algorithm, which causes local optimisation. So it is necessary to use a new method employing the evolutionary algorithm [4].

III. INVASIVE METHOD

In the proposed technique, named the modified invasive technique, weight adaptation and network evolution are carried out using GA. More importantly, GAs relying on the crossover operator do not perform very well in searching for optimal network topologies, so more preference is given to mutation [2]. A modified Genetic Algorithm (GA) for Neural Networks with more preference given to mutation, named Mutation based Genetic Neural Network (MGNN), is therefore proposed to evolve the network structure and adapt its weights at the same time [5].
The applications of Genetic Algorithms in ANN design and training are mostly concentrated in finding suitable network topologies and then training the network. GAs can quickly locate areas of high quality solutions when the search space is infinite, highly dimensional and multimodal.

A. Evolution of Connection Weights
Training a given network topology to recognize its purpose generally means determining an optimal set of connection weights. This is formulated as the minimization of some network error function, over the training data set, by iteratively adjusting the weights [4]. The mean square error between the target and actual output averaged over all output nodes serves as a good estimate of the fitness of the network configuration corresponding to the current input.

B. Evolution of Architecture
A neural network’s performance depends on its structure. The representation and search operators used in GAs are the two most important issues in the evolution of architectures. An ANN structure is not unique for a given problem, and there may exist different ways to define a structure corresponding to the problem. Hence, deciding on the size of the network is also an important issue [1] [4].

Fig 2. ANN architecture (node labels: 1, 2, 3, 4; a, b, c; x, y; weight matrices W1, W2)

Too small a network will prohibit it from learning the desired input to output mapping; too big a one will fail to match inputs properly to previously seen ones and lose generalization ability. The structure is reduced by following ‘CoDi-1’ encoding.

IV. FITNESS COMPUTATION

A proper balance has to be maintained between the ANN’s network complexity and generalization capability. Here the fitness function (Qfit) considers three important criteria [5]: classification accuracy (Qacc); training error, the percentage of normalized mean-squared error (Qnmse); and network complexity (Qcomp).
They are defined as follows:

Q_{acc} = 100 \left( 1 - \frac{Correct}{Total} \right)    (2)

Q_{nmse} = \frac{100}{N P} \sum_{j=1}^{P} \sum_{i=1}^{N} (T_i - O_i)^2    (3)

Q_{comp} = \frac{C}{C_{tot}}    (4)

Q_{fit} = \alpha Q_{acc} + \beta Q_{nmse} + \gamma Q_{comp}    (5)

Where:
N - Total number of input patterns
P - Total number of training patterns
T - Target
O - Network output
The value of Ctot is based on the size of its input (in), output (out), and the user-defined maximum number of hidden nodes (hid).

C_{tot} = in \times hid + hid \times out

The user-defined constants \alpha, \beta and \gamma are set to small values ranging between 0 and 1. They are used to control the strength of influence of their respective factors in the overall fitness measure. In the implemented ANN for the parity function, accuracy is favoured over training error and complexity by setting \alpha = 1, \beta = 0.70 and \gamma = 0.30.

V. ALGORITHM

In the proposed algorithm, a population of chromosomes is created initially. Then, the chromosomes are evaluated by a defined fitness function. After that, any two chromosomes are selected for performing genetic operations based on their fitness. Then, the genetic operations, namely crossover and mutation, are performed (with more preference given to mutation). The produced offspring replace their parents in the initial population. This GA process repeats until a user-defined goal is reached. In this paper, the standard GA is modified and a different method [2][5] of generating offspring is introduced to improve its performance.

A. Initial Population
First the weight matrix size is defined; it depends on the number of hidden nodes. Then a population is generated by assigning random numbers.
P = {p1, p2, p3, ..., p_pop-size}
Here pop-size denotes the population size. Each member in this set denotes a weight matrix having a particular order corresponding to the number of connections in the network from one layer to another.

B. Evaluation
Each weight matrix in the population will be evaluated by the defined fitness function Qfit.

C. Selection
The weight matrix having the best fitness value is selected based on the modified GA approach [5].

D. Genetic Operations
The genetic operations, namely mutation and crossover, are used to generate new offspring; finally the offspring with maximum fitness is considered.
1) Crossover: Two weight matrices p1 and p2 are taken from P. Four new offspring are formed by the crossover mechanism [5] based on the modified scheme. Pmax and Pmin are two matrices formed by the maximum and minimum range of the elements in the population, w ∈ [0, 1]. Then max(p1, p2) and min(p1, p2) denote the vectors with each element obtained by taking the maximum and minimum among the corresponding elements of p1 and p2 respectively.
2) Mutation: The offspring OSc is taken and the mutation is performed by selecting an element with a certain probability; its value is modified randomly based on the error at the output [2]. From the above four offspring, the one with the largest fitness value is used as the offspring of the crossover operation [5].

E. Stopping criterion
Depending on the training performance and validation performance, generation of offspring will stop only if the convergence rate is too low or the network output reaches the specified goal [1].

3 a) 4-Bit parity
4 b) 5-Bit Parity
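
The fitness measure of equations (2) to (5) can be written out as a small Python sketch. The function names and argument layout are hypothetical; the weights alpha = 1, beta = 0.70 and gamma = 0.30 are the values quoted in the text, and the pattern and output counts follow the symbols N and P as the paper defines them.

```python
def q_acc(correct, total):
    """Equation (2): percentage of wrongly classified patterns."""
    return 100.0 * (1.0 - correct / total)


def q_nmse(targets, outputs):
    """Equation (3): percentage of normalized mean-squared error.

    targets, outputs: P rows (one per pattern), each with N values.
    """
    p = len(targets)
    n = len(targets[0])
    squared = sum((t - o) ** 2
                  for t_row, o_row in zip(targets, outputs)
                  for t, o in zip(t_row, o_row))
    return 100.0 / (n * p) * squared


def q_comp(connections, n_in, n_hid, n_out):
    """Equation (4): connections used over Ctot = in*hid + hid*out."""
    return connections / (n_in * n_hid + n_hid * n_out)


def q_fit(acc, nmse, comp, alpha=1.0, beta=0.70, gamma=0.30):
    """Equation (5): weighted sum of the three criteria."""
    return alpha * acc + beta * nmse + gamma * comp


if __name__ == "__main__":
    acc = q_acc(correct=8, total=16)
    nmse = q_nmse([[1.0], [0.0]], [[0.5], [0.5]])
    comp = q_comp(connections=10, n_in=4, n_hid=5, n_out=1)
    print(q_fit(acc, nmse, comp))
```

Keeping the three criteria as separate functions makes it easy to change the weighting constants and see how strongly each factor influences the overall measure.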
The effect of the GNN's performance against the standard GA
was measured using the following dependent variables:
• Percentage of wrong classification (Class Error);
• Number of connection weights (Connections);
• Number of epochs (generations).

        4-bit                           5-bit                           6-bit
Rank    Fitness  Epochs  Bit errors     Fitness  Epochs  Bit errors     Fitness  Epochs  Bit errors
1*      73       30      1              64       40      2              53       46      3
10*     25       58      4              31       70      5              20       86      10
12      4.2      ts      6              5.6      ts      6              3        ts      15
* values in plot (Fig. 3); ts: training stopped

TABLE II
Simulation results of the Proposed & standard GA
                               Proposed GA          Standard GA
Parity  Rank  Hidden nodes     Epochs  Fitness      Epochs  Fitness
4-bit   1     5                30      72.54        36      63.26
4-bit   10    5                58      25.3         60      16.51
5-bit   1     6                38      66.25        25      51.1
5-bit   10    6                66      32           65      28.1
6-bit   1     7                46      54.81        37      43
6-bit   10    7                80      23.41        76      11.46

VII. CONCLUSION
... illustrate that fitness-based selection is better than the usual
rank-based selection in terms of size, performance and time
of computation. By pruning the neural network structure
using this algorithm, hardware implementation becomes
easy and efficient. The NN can be modelled into a
reconfigurable device and tested.

VIII. REFERENCES
[1] X. Yao, “Evolving artificial neural networks,” Proc. IEEE, vol. 87, no. 9,
pp. 1423–1447, Sep. 1999.
[2] Paulito P. Palmes and Taichi Hayasaka, “Mutation based Genetic Neural
Network,” IEEE Trans. on Neural Networks, vol. 16, no. 3, pp. 587–600, 2005.
[3] J. D. Schaffer, D. Whitley, and L. J. Eshelman, “Combinations of GA
and neural networks: A survey,” in Proc. Combinations of Genetic
Algorithms and Neural Networks, pp. 1–37, 1992.
[4] V. Maniezzo, “Genetic evolution of the topology and weight distribution
of neural networks,” IEEE Trans. Neural Networks, vol. 5, pp. 39–53, 1992.
[5] F. Leung, H. Lam, S. Ling, and P. Tam, “Tuning the structure and
parameter of a neural network using an improved genetic algorithm,” IEEE
Trans. on Neural Networks, vol. 14, no. 1, pp. 79–88, Jan. 2003.
Abstract-Local positioning signifies finding the
position of a fixed or moving object in a closed or
indoor environment. This paper deals with the FPGA
implementation of a Time-of-Arrival (TOA) Estimator using the
Distorted Template Architecture. The WLAN standard
IEEE 802.11a frame format is used as the basis for
building the localization system. This paper serves as a
reference for any future work on localization scheme
implementation, since as of now no hardware model exists.

I. INTRODUCTION
... requires integrate and dump operation at the symbol rate [3].
Using this algorithm, fewer training symbols are
necessary for synchronization. It can be observed from Fig
1 that TOA estimation consists of a Channel Impulse
Response Estimator (maximum likelihood channel
estimation is chosen), Candidate IR, Convolutor and
Cross-Correlator.
... resource on FPGA, should operate at a higher speed and
should consume less power.
Each block shown in Fig 1 has been separately modeled and
verified for its functionality and performance.
a) Maximum Likelihood Channel Estimator:
It has been observed through computation that the Maximum
Likelihood technique, though computationally intensive,
gives more accurate results than other methods such as Least
Square and Minimum Mean Square. With an appropriate
algorithm, this technique can be realized with less hardware.
The Maximum Likelihood Channel Estimator [3] is governed by
equation 1. It can be observed that the received signal
has to be multiplied with the PseudoInverse to get the
channel estimates.
The block diagram for the channel impulse response estimator
is shown in Fig 3. The PseudoInverse part of equation
1.1 is stored in ROMs. The multiplication operation is
split into two parts, which are combined before the shift and
addition operation. Thus the matrix multiplication can be
done using a minimum number of multipliers.

... the output of the candidate impulse response is represented
as 1, 2, 3, 4, 5.
Convolution and correlation are performed using 3 signals at a
time; this is carried out using the signals 1, 2, 3; 2, 3, 4 and
3, 4, 5 respectively.

V. SIMULATION RESULTS
The model has been simulated using MATLAB, which is
further used to validate the HDL results.
RTL code has been written in Verilog HDL using Xilinx
ISE 9.1i, and simulation has been performed using ModelSim
XE 6.1e.
Fig 4 shows the output obtained from the channel impulse
response estimator. Fig 5 shows the output
obtained from the candidate impulse response. Fig 6 shows the
output obtained from maximum selection after correlation.
Fig 7 shows the output obtained from the final Time-of-Arrival
Estimator; 3 outputs along with their positions can be
observed in Fig 7.
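As a rough illustration of the pseudoinverse-based channel estimate described for the Maximum Likelihood Channel Estimator, the sketch below precomputes the pseudoinverse of a known training matrix (the part that would sit in ROM) and multiplies it with the received samples. It assumes real-valued data, a full-column-rank training matrix, and a tiny 2-tap channel; the training sequence and all function names are hypothetical, and the actual hardware splits the multiplication differently.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b (a square, b a matrix) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("training matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pseudo_inverse(a):
    """Left pseudoinverse (A^T A)^-1 A^T; computed once, offline."""
    at = transpose(a)
    return solve(matmul(at, a), at)


def estimate_channel(pinv, received):
    """Channel estimate = pseudoinverse times received samples."""
    return [sum(p * r for p, r in zip(row, received)) for row in pinv]


if __name__ == "__main__":
    # Hypothetical training matrix: row n holds [s[n], s[n-1]] for s = 1, -1, 1, 1.
    A = [[1, 0], [-1, 1], [1, -1], [1, 1]]
    received = [1.0, -0.5, 0.5, 1.5]  # noiseless output of a channel [1.0, 0.5]
    print(estimate_channel(pseudo_inverse(A), received))
```

Because the training sequence is fixed, only `estimate_channel` needs to run at symbol time, which is consistent with storing the pseudoinverse in ROM.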
REFERENCES
Abstract – This paper presents a practical design
procedure for Micro strip Patch Antenna for low,
medium and high dielectric constant substrate with
single, double and four patches in series and parallel.
The design process starts with the theoretical design of
the antenna. Finally, the results of the implementation of
the designs are presented using SONNET software and
compared to get the best possible design.

Key words: Micro strip patch antenna, substrate,
radiation.

I INTRODUCTION

A micro strip patch antenna is a narrowband, wide-beam
antenna fabricated by etching the antenna element pattern in
metal trace bonded to an insulating substrate [1]. Because
such antennas have a very low profile, are mechanically
rugged and can be conformable, they are often mounted on
the exterior of aircraft and spacecraft, or are incorporated
into mobile radio communications devices.
Micro strip antennas have several advantages compared to
conventional microwave antennas; therefore many
applications cover the broad frequency range from 100 MHz
to 100 GHz.
Some of the principal advantages compared to conventional
microwave antennas are:
• Light weight, low volume, and thin profile
configurations, which can be made conformal.
• Low fabrication cost.
• Linear, circular and dual polarization antennas can
be made easily.
• Feed lines and matching networks can be fabricated
simultaneously with the antenna.
However, micro strip antennas also have limitations
compared to conventional microwave antennas:
• Narrow bandwidth and lower gain.
• Most micro strip antennas radiate into half space.
• Polarization purity is difficult to achieve.
• Lower power handling capability.

The general layout of a parallel coupled microstrip patch
antenna is shown in Figure 1.

There are many substrates with various dielectric constants
that are used in wireless applications. Those with high
dielectric constants are more suitable for lower frequency
applications in order to help minimize the size. Alumina
laminates are some of the most widely used materials in the
implementation of microwave circuits. Alumina laminate is
most widely used for frequencies up to 20 GHz.
The Alumina laminate has several advantages over the less
expensive FR4 substrate [2]. While FR4 becomes very
unstable at high frequencies above 1 GHz, the Alumina
laminate has very stable characteristics even beyond 10
GHz. Furthermore, the high dielectric constant of the
ceramic-filled Alumina reduces the size of the micro strip
circuit significantly compared [3] to one that is designed
using FR4.

II RADIATION MECHANISM

Micro strip antennas are essentially suitably shaped
discontinuities that are designed to radiate. The
discontinuities represent abrupt changes in the micro strip
line geometry [4]. Discontinuities alter the electric and
magnetic field distributions. This results in energy storage
and sometimes radiation at the discontinuity. As long as the
physical dimensions and relative dielectric constant of the
line remain constant, virtually no radiation occurs.
However, the discontinuity introduced by the rapid change in
line width at the junction between the feed line and patch
radiates. The other end of the patch, where the metallization
abruptly ends, also radiates. When the field on a micro strip
line encounters an abrupt change in width at the input to the
patch, electric fields spread out. This creates fringing fields at
this edge, as indicated.

III MICROSTRIP LINES

A micro strip line consists of a single ground plane and a
thin strip conductor on a low loss dielectric substrate above
the ground plate. Due to the absence of the top ground plate
and the dielectric substrate above the strip, the electric field
lines remain partially in the air and partially in the lower
dielectric substrate. This makes the mode of propagation not
pure TEM but what is called quasi-TEM [5]. Due to the
open structure and the presence of any discontinuity, the micro
strip line radiates electromagnetic energy. The use of thin
and high dielectric materials reduces the radiation loss of the
open structure, where the fields are mostly confined inside
the dielectric.
Losses in micro strip lines:

Two types of losses exist:
(1) Dielectric loss in the substrate: Typical dielectric
substrate material creates a very small power loss at
microwave frequencies. The calculation of dielectric loss in
a filled transmission line is easily carried out provided exact
expressions for the wave mechanisms are available, but for
micro strip this involves extensive mathematical series and
numerical methods.
(2) Conductor loss: This is by far the most significant loss
effect over a wide frequency range and is created by high
current density in the edge regions of the thin conducting
strip. Surface roughness and strip thickness also have some
bearing on the loss mechanism.

The total attenuation constant can be expressed as
$\alpha = \alpha_d + \alpha_c$, where $\alpha_d$ and $\alpha_c$ are the
dielectric and ohmic attenuation constants.

QUASI TEM MODE OF PROPAGATION

The electromagnetic waves in free space propagate
in the transverse electromagnetic mode (TEM). The electric
and magnetic fields are mutually perpendicular and further
in quadrature with the direction of propagation, i.e. along the
transmission line. Coaxial and parallel wire transmission lines
employ the TEM mode of propagation. In this mode the
electromagnetic field lines are contained entirely within the
dielectric between the lines.
But the micro strip structure involves an
abrupt dielectric interface between the substrate and the air
above it. Any transmission line system which is filled with a
uniform dielectric can support a single well defined mode of
propagation at least over a specific range of frequencies
(TEM for coaxial lines, TE or TM for wave guides).
Transmission lines which do not have such a uniform
dielectric filling cannot support a single mode of
propagation. Micro strip falls in this category [9]. Here the
bulk of energy is transmitted along the micro strip with a
field distribution which quite closely resembles TEM and is
usually referred to as Quasi-TEM.
The micro strip design consists of finding the
values of width (w) and length (l) corresponding to the
characteristic impedance (Zo) defined at the design stage of
the network. A substrate of permittivity (Er) and thickness (h)
is chosen. The effective micro strip permittivity (Eeff) is
unique to a fixed dielectric transmission line system and
provides a useful link between various wavelengths,
impedances and velocities [6].
The micro strip in general will have a finite strip
thickness, 't', which influences the field distribution for
moderate power applications. The thickness of the
conducting strip is quite significant when considering
conductor losses.
For micro strip with t/h ≤ 0.005, 2 ≤ Er ≤ 10
and w/h ≥ 0.1, the effects of the thickness are negligible.
But at smaller values of w/h or greater values of t/h the
significance increases.

IV DESIGN PROCESS OF ANTENNA

Through all the design process, air gap has been used to
build the antenna structures. The reason for choosing this is
that by using certain dielectric substrates the efficiency
of the antenna will be reduced; secondly, it is very easy to
construct the antenna. Based on the antenna knowledge,
concentration has been put on the linearly polarized
transmitted signal, because the bandwidth of the linearly
polarized antenna is greater than that of the circularly polarized
antenna. Linear polarization is preferred to
circular polarization because of the convenience of a single
feed rather than a double feed. Moreover, the construction of a
linearly polarized rectangular patch antenna [7], [8] is
simpler than the other polarization configurations.

DESIGN CALCULATION FORMULAE

The operating frequency: $f_r$

Thickness of the dielectric medium,
$$h \le 0.3 \times \frac{c}{2 \pi f_r \sqrt{\varepsilon_r}}$$

Thickness of the grounded material alumina,
$$h \le 0.3 \times \frac{c}{2 \pi f_r \sqrt{\varepsilon_r}}$$

Width of metallic patch,
$$W = \left(\frac{c}{2 f_r}\right) \times \left[\frac{\varepsilon_r + 1}{2}\right]^{-1/2}$$

Length of metallic patch,
$$L = \frac{c}{2 f_r \sqrt{\varepsilon_{reff}}} - 2 \Delta l$$

Where,
$$\Delta l = 0.412 \times h \times \left[\frac{(\varepsilon_{reff} + 0.3)(W + 0.264\,h)}{(\varepsilon_{reff} - 0.258)(W + 0.8\,h)}\right]$$

$$\varepsilon_{reff} = \frac{\varepsilon_r + 1}{2} + \frac{\varepsilon_r - 1}{2} \times \left(1 + \frac{12\,h}{W}\right)^{-1/2}$$

V IMPLEMENTATION OF THE PROJECT

Fig 1 Single patch antenna
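The design calculation formulae of Section IV can be evaluated with a short script. This is a sketch under stated assumptions: the square roots and the 0.3 constant follow the standard rectangular-patch design equations, c is taken as 3e8 m/s, and the example values (2.4 GHz, relative permittivity 4.4, 1.6 mm substrate) are hypothetical and not taken from the paper.

```python
import math

C = 3.0e8  # speed of light in free space, m/s (rounded)


def max_substrate_thickness(f_r, eps_r):
    """Upper bound on substrate thickness h, in metres."""
    return 0.3 * C / (2 * math.pi * f_r * math.sqrt(eps_r))


def patch_dimensions(f_r, eps_r, h):
    """Return (W, L, eps_reff, delta_l) for a rectangular patch.

    f_r   : operating frequency in Hz
    eps_r : relative permittivity of the substrate
    h     : substrate thickness in metres
    """
    # Patch width.
    w = (C / (2 * f_r)) * ((eps_r + 1) / 2) ** -0.5
    # Effective permittivity seen by the quasi-TEM mode.
    eps_reff = (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 * h / w) ** -0.5
    # Length extension caused by the fringing fields.
    delta_l = 0.412 * h * ((eps_reff + 0.3) * (w + 0.264 * h)) / (
        (eps_reff - 0.258) * (w + 0.8 * h))
    # Physical patch length.
    length = C / (2 * f_r * math.sqrt(eps_reff)) - 2 * delta_l
    return w, length, eps_reff, delta_l


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    f_r, eps_r, h = 2.4e9, 4.4, 1.6e-3
    w, length, eps_reff, delta_l = patch_dimensions(f_r, eps_r, h)
    print(f"h limit  = {max_substrate_thickness(f_r, eps_r) * 1e3:.2f} mm")
    print(f"W        = {w * 1e3:.2f} mm")
    print(f"eps_reff = {eps_reff:.3f}")
    print(f"L        = {length * 1e3:.2f} mm")
```

Raising `eps_r` in this script shrinks both W and L, which matches the earlier remark that high dielectric constant substrates help minimize antenna size.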
VI PERFORMANCE ANALYSIS
Fig 11 Double patch with High dielectric 12.9 for 6.5GHz
Fig 12 Four patch with low dielectric 2.2 for 9.5GHz
VII RESULT
Thus the micro strip patch antenna was designed
and simulated with various substrates for single, double and
four patches in series and parallel to observe the difference
in the performance and in turn the responses. Performance
analysis shows that when the dielectric constant increases,
the magnitude of the response increases. Thus increasing the
magnitude correspondingly decreases the antenna size.
Further increment of the number of patches in series and parallel
enhances the performance of the antenna.
VIII. CONCLUSION
This paper has concentrated on an antenna design.
A method for the rigorous calculation of the antenna has also
been developed. The measured responses have good
agreement with the theoretical predictions. The main quality
of the proposed antenna is that it allows an effective design
maintaining all the advantages of micro strip antennas in
terms of size, weight and ease of manufacturing. The
compactness in the circuit size makes the design quite
attractive for further developments and applications in
modern radio systems especially in the field of Software
Defined Radio receivers. It has been shown that the new
class of antennas holds promise for wireless and mobile
communications applications.
REFERENCES
Motion Estimation of the Vehicle Detection and Tracking System
Mr. A. Yogesh, PG Scholar, and Mrs. C. Kezi Selva Vijila, Assistant Professor,
Electronics and Communication Engineering, Karunya University
Abstract—In this paper we are dealing with
increasing congestion on freeways and problems
associated with existing detectors. Existing
commercial image processing systems work well in
free-flowing traffic, but the systems have difficulties
with congestion, shadows and lighting transitions.
These problems stem from vehicles partially
occluding one another and the fact that vehicles
appear differently under various lighting conditions.
We are proposing a feature-based tracking system
for detecting vehicles under these challenging
conditions. This paper describes the issues associated
with feature based tracking, presents the real-time
implementation of a prototype system, and reports the
performance of the system on a large data set.

Index Terms -- Vehicle Tracking, Video Image
Processing.

I. INTRODUCTION

In recent years, traffic congestion has
become a significant problem. Early solutions attempted
to lay more pavement to avoid congestion, but adding
more lanes is becoming less and less feasible.
Contemporary solutions emphasize better information
and control to use the existing infrastructure more
efficiently. The hunt for better traffic information, and
thus an increasing reliance on traffic surveillance, has
resulted in a need for better vehicle detection such as
wide-area detectors, while the high costs and safety
risks associated with lane closures have directed the
search towards noninvasive detectors mounted beyond
the edge of pavement. One promising approach is
vehicle tracking via video image processing, which can
yield traditional traffic parameters such as flow and
velocity, as well as new parameters such as lane
changes and vehicle trajectories.
Vehicle detection [1]–[10] is an
important problem in many related applications, such as
self-guided vehicles, driver assistance systems,
intelligent parking systems, or measurement of traffic
parameters. One of the most common approaches to vehicle
detection is using vision-based techniques to analyze
vehicles from images or videos. However, due to the
variations of vehicle colors, sizes, orientations and shapes,
developing a robust and effective system of vision-based
vehicle detection is very challenging. To address the above
problems, different approaches using different features
and learning algorithms for locating vehicles have been
investigated. Background subtraction [2-5] is used to
extract motion features for detecting moving vehicles
from video sequences. However, this kind of motion
feature is not available in still images. For
dealing with static images, Wu et al. [6] used the wavelet
transform to extract texture features for locating
possible vehicle candidates from roads. Then, each
vehicle candidate is verified using a principal
component analysis (PCA) classifier. In addition, Sun et al. [7]
used Gabor filters to extract different textures and then
verified each vehicle candidate using a support
vector machine (SVM) classifier. In addition to textures,
symmetry is another important feature used for vehicle
detection. In [8], Broggi et al. described a detection
system that searches for areas with a high vertical symmetry
as vehicle candidates. However, this cue is prone to
false detections such as symmetrical doors or other
objects. Furthermore, in [9], Bertozzi et al. used corner
features to build four templates of vehicles for vehicle
detection and verification. In [10], Tzomakas and
Seelen found that the shadow area underneath a vehicle
is a good cue to detect vehicles. In [11], Ratan et al.
developed a scheme to detect vehicles' wheels as
features to find possible vehicle positions and then used
a method called Diverse Density to verify each vehicle
candidate. In addition, stereo vision methods and
3-D vehicle models are used in [12-13] to detect vehicles
and obstacles. The major drawback of the above
methods is the need for a time-consuming search that
scans all pixels of the whole image.
For the color feature, although color is an important
perceptual descriptor to describe objects, few
color-based works have addressed vehicle
detection since vehicles have very large variations in
their colors. A color transform to project all road pixels
on a color plane such that vehicles can be identified
from road backgrounds is explained in [14]. Similarly,
in [15], Guo et al. used several color balls to model road
colors in color space, and vehicle pixels can then be
identified if they are classified as non-road regions.
However, since these color models are not compact and
general in modeling vehicle colors, many false
detections were produced, which led to the degradation
of accuracy of vehicle detection. In this paper we are
proposing a feature based tracking algorithm.
II. FEATURE BASED VEHICLE TRACKING STRATEGIES
An alternative to tracking objects as a
whole is to track sub-features such as distinguishable
points or lines on the object. The advantage of this
approach is that even in the presence of partial
occlusion, some of the features of the moving object
remain visible. Furthermore, the same algorithm can be
used for tracking in daylight, twilight or night-time
conditions; it is self-regulating because it selects the
most salient features under the given day and night
conditions.
III. FEATURE BASED TRACKING ALGORITHM
This section presents our vehicle tracking
system, which includes camera calibration, feature
detection, feature tracking, and feature grouping
modules. First, the camera calibration is conducted
once, off-line, for a given location; then the other
modules are run continuously online in real-time.
• E.g., window corners, bumper edges, etc.
during the day and tail lights at night.
• To avoid confusion, "trajectory" will be used
when referring to entire vehicles and "track"
will be used when referring to vehicle features.
A. Off-Line Camera Definition
Before running the tracking and grouping
system, the user specifies camera-specific parameters
off-line. These parameters include:
• Line correspondences for a projective
mapping, or homography, as explained in
Figure 1.
• A detection region near the image bottom and
an exit region near the image top, and
• Multiple fiducial points for camera
stabilization.
Since most road surfaces are flat, the grouper exploits
an assumption that vehicle motion is parallel to the road
plane. To describe the road plane, the user simply
specifies four or more line or point correspondences
between the video image of the road (i.e., the image
plane) and a separate 'world' road plane, as shown in
Figure 1. In other words, the user must know the
relative distance in world coordinates between four
points visible in the image plane. Ideally, this step
involves a field survey; however, it is possible to
approximate the calculations using a video tape
recorder, known lane widths and one or more vehicles
traveling at a constant velocity.
The vehicle velocity can be used to measure relative
distance along the road at different times and the lane
widths yield relative distance between two points on the
edge of the road, coincident with the vehicle's position.
Based on this off-line step, our system computes a
projective transform, or homography, H, between the
image coordinates (x,y) and world coordinates (X,Y).
This transformation is necessary for two reasons. First,
features are tracked in world coordinates to exploit
known physical constraints on vehicle motion. Second,
the transformation is used to calculate distance based
measures such as position, velocity and density. Once
the homography has been computed, the user can
specify the detection region, exit region and fiducial
points in the image plane.

Fig.1 A projective transform, H, or homography is used to
map from image coordinates, (x,y), to world
coordinates, (X,Y).

B. On-Line Tracking and Grouping
A block diagram for our vehicle tracking and
grouping system is shown in Figure 2. First, the raw
camera video is stabilized by tracking manually chosen
fiducial points to subpixel accuracy and subtracting
their motion from the entire image. Second, the
stabilized video is sent to a detection module, which
locates corner features in a detection zone at the bottom
of the image. In our detection module, "corner" features
are defined as regions in the gray level intensity image
where brightness varies in more than one direction. This
detection is operationalized by looking for points in the
image, I, where the rank of the windowed second
moment matrix, $\nabla I \cdot \nabla I^T$, is two. It shows some example
corners detected by the system. Next, these corner
features are tracked over time in the tracking module.
The tracking module uses Kalman filtering to predict a
given corner's location and velocity in the next frame,
$(X, Y, \dot{X}, \dot{Y})$, using world coordinates. Normalized
correlation is used to search a small region of the
image around the estimate for the corner location. If the
corner is found, the state of the Kalman filter is updated;
otherwise, the feature track is dropped. It shows the
temporal progression of several corner features in the
image plane. Vehicle corner features will eventually
reach a user defined exit region that crosses the entire
road near the top of the image (or multiple exit regions
if there is an off ramp). Once corner features reach the
exit region, they are grouped into vehicle hypotheses by
the grouping module,
selected as being representative of the vehicle trajectory.
In particular, the grouper selects the feature point
closest to the camera because it is likely to be near the
ground plane and thus, is less likely to suffer from
distortions due to the viewing angle. Finally, traffic
parameters such as flow, average speed, and density are
computed from the vehicle trajectories.
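The projective mapping used in Section III, and the speed computation it enables, can be sketched as follows. This is an illustration only: the matrix `H_EXAMPLE` is a made-up homography rather than a calibrated one, and the function names are hypothetical.

```python
import math


def image_to_world(h, x, y):
    """Map image coordinates (x, y) to world coordinates (X, Y).

    h is a 3x3 homography given as a list of rows. The point is lifted to
    homogeneous coordinates, multiplied by h, and divided by the third
    component.
    """
    xw = h[0][0] * x + h[0][1] * y + h[0][2]
    yw = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return xw / w, yw / w


def world_speed(h, p_start, p_end, dt):
    """Average speed in world units per second between two image points."""
    x0, y0 = image_to_world(h, *p_start)
    x1, y1 = image_to_world(h, *p_end)
    return math.hypot(x1 - x0, y1 - y0) / dt


# Hypothetical homography with a mild perspective term in the last row.
H_EXAMPLE = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.01, 1.0],
]

if __name__ == "__main__":
    print(image_to_world(H_EXAMPLE, 10.0, 100.0))
    print(world_speed(H_EXAMPLE, (0.0, 0.0), (10.0, 100.0), 2.0))
```

Measuring distance after the mapping, rather than in pixels, is what makes quantities such as speed and density independent of where in the image the vehicle appears.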
RESULTS AND DISCUSSION
IV. CONCLUSION
[4] S. Gupte et al., “Detection and classification of
vehicles,” IEEE Trans. Intell. Transport. Syst., vol. 1,
no. 2, pp. 119–130, Jun. 2000.
[5] G. L. Foresti, V. Murino, and C. Regazzoni,
“Vehicle recognition and tracking from road image
sequences,” IEEE Trans. Veh. Technol., vol. 48, no. 1,
pp. 301–318, Jan. 1999.
[6] J. Wu, X. Zhang, and J. Zhou, “Vehicle detection in
static road images with PCA and wavelet-based
classifier,” in Proc. IEEE Intelligent Transportation
Systems Conf., Oakland, CA, Aug. 25–29, 2001, pp.
740–744.
[7] Z. Sun, G. Bebis, and R. Miller, “On-road vehicle
detection using Gabor filters and support vector
machines,” presented at the IEEE Int. Conf. Digital
Signal Processing, Santorini, Greece, Jul. 2002.
[8] A. Broggi, P. Cerri, and P. C. Antonello,
“Multi-resolution vehicle detection using artificial vision,” in
Proc. IEEE Intelligent Vehicles Symp.,
Jun. 2004, pp. 310–314.
[9] M. Bertozzi, A. Broggi, and S. Castelluccio, “A
real-time oriented system for vehicle detection,” J. Syst.
Arch., pp. 317–325, 1997.
[10] C. Tzomakas and W. Seelen, “Vehicle detection in
traffic scenes using shadow,” Tech. Rep. 98-06, Inst. für
Neuroinformatik, Ruhr-Universität, Germany, 1998.
[11] A. L. Ratan, W. E. L. Grimson, and W. M. Wells,
“Object detection and localization by dynamic template
warping,” Int. J. Comput. Vis.,
vol. 36, no. 2, pp. 131–148, 2000.
[12] A. Bensrhair et al., “Stereo vision-based feature
extraction for vehicle detection,” in Proc. IEEE
Intelligent Vehicles Symp., Jun. 2002, vol. 2,
pp. 465–470.
[13] T. Aizawa et al., “Road surface estimation against
vehicles’ existence for stereo-based vehicle detection,”
in Proc. IEEE 5th Int. Conf. Intelligent Transportation
Systems, Sep. 2002, pp. 43–48.
[14] J. C. Rojas and J. D. Crisman, “Vehicle detection
in color images,” in Proc. IEEE Conf. Intelligent
Transportation System, Nov. 9–11, 1997, pp. 403–408.
[15] D. Guo et al., “Color modeling by spherical
influence field in sensing driving environment,” in
Proc. IEEE Intelligent Vehicles Symp., Oct. 3–5, 2000,
pp. 249–254.
Architecture for ICT (10,9,6,2,3,1) Processor
Mrs. D. S. Shylu, M.Tech., Sr. Lecturer, Karunya University, Coimbatore, shylusam@karunya.edu
Miss. V. C. Tintumol, 2nd ME (VLSI) Student, Karunya University, Coimbatore, vctintumol@gmail.com
Abstract—The Integer Cosine Transform (ICT)
presents a performance close to the Discrete Cosine
Transform (DCT) with a reduced computational
complexity. The ICT kernel is integer-based, so
computation only requires adding and shifting
operations. This paper presents a parallel-pipelined
architecture of an ICT(10,9,6,2,3,1) processor for
image encoding. The main characteristics of the ICT
architecture are high throughput, parallel processing,
and high efficiency in all its computational elements.
The arithmetic units are distributed and are made
up of adders/subtractors operating at half the
frequency of the input data rate. In this transform,
the truncation and rounding errors are only
introduced at the final normalization stage. The
normalization coefficient word length has been
established using the requirements of IEEE standard
1180–1990 as a reference.

Index Terms—Integer cosine transform, Discrete
Cosine transform, image compression, parallel
processing, VLSI

I. INTRODUCTION

The Discrete Cosine Transform (DCT) is widely
considered to provide the best performance for
transform coding and image compression. The DCT has
become an international
standard for sequential codecs such as JPEG, MPEG,
H.261 etc. However, DCT matrix elements contain real
numbers represented by a finite number of bits, which
inevitably introduce truncation and rounding errors
during compaction. Thus many applications that use this
transform can be classified under the heading of “lossy”
encoding schemes. This implies that the reconstructed
image is always only an approximation of the original
image.
VLSI implementation of the DCT using floating-point
arithmetic is highly complex and requires
multiplications. Different multiplication-free algorithms,
which are approximations to the DCT, have been
proposed in order to reduce implementation complexity.
In some cases, they can be used for lossless
compression applications since the round-off error can
be completely eliminated. In these algorithms,
coefficients are scaled or approximated so that the
floating-point multiplication can be implemented
efficiently by binary shifts and additions. The Integer
Cosine Transform (ICT) is generated by applying the
concept of dyadic symmetry and presents a similar
performance to, and compatibility with, the DCT. The ICT
basis components are integers, so they do not require
floating-point multiplications; these are substituted
by fixed-point addition and shifting operations, which
have a more efficient hardware implementation.
This paper describes the architecture of a 1-D ICT
processor chip for image coding. In this architecture the
arithmetic units are based on highly efficient
adders/subtractors operating at half the frequency of the
data input rate. The output coefficients can be selected
with or without normalization. With normalization, the
normalization coefficient's word length is 18 bits,
of which only 13 bits are necessary if the specifications
of the IEEE standard 1180–1990 are adhered to.
The paper is organized as follows: A decomposition
of the ICT to obtain a signal flow chart, which leads to
an efficient hardware implementation, is presented in
Section II. Generation of the order-8 ICT and its application
to a real input sequence are explained in Sections III and IV.
In Section V, a pipeline structure is proposed for the 1-D
eighth-order transform, based on three processing
blocks operating in parallel with adders/subtractors
combined with wired-shift operations as the only
arithmetic elements.

II. DECOMPOSITION OF THE ICT

The ICT was derived from the DCT by the concept of
dyadic symmetry. The definition of dyadic symmetry is as
follows:

A vector of $2^m$ elements $[a_0, a_1, \ldots, a_{2^m-1}]$
is said to have the ith dyadic symmetry if and only if
$a_j = s \cdot a_{j \oplus i}$, where $\oplus$ is the exclusive OR
operation, j lies in the range $[0, 2^m - 1]$, i lies in the
range $[1, 2^m - 1]$, and s = 1 when the symmetry is even
and s = -1 when the symmetry is odd.
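The dyadic symmetry definition in Section II can be checked directly with a few lines of code. The function name `dyadic_symmetry` and the test vectors below are illustrative assumptions; the vectors use the ICT(10,9,6,2,3,1) coefficient values as an example of an odd and an even basis vector.

```python
def dyadic_symmetry(vec, i):
    """Classify the ith dyadic symmetry of a vector of 2**m elements.

    Returns  1 if vec[j] ==  vec[j XOR i] for every j (even symmetry),
            -1 if vec[j] == -vec[j XOR i] for every j (odd symmetry),
             0 if neither relation holds.
    """
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("vector length must be a power of two")
    if not 1 <= i <= n - 1:
        raise ValueError("i must lie in [1, 2**m - 1]")
    if all(vec[j] == vec[j ^ i] for j in range(n)):
        return 1
    if all(vec[j] == -vec[j ^ i] for j in range(n)):
        return -1
    return 0


if __name__ == "__main__":
    odd_row = [10, 9, 6, 2, -2, -6, -9, -10]   # mirror-antisymmetric
    even_row = [3, 1, -1, -3, -3, -1, 1, 3]    # mirror-symmetric
    # For length 8, j XOR 7 equals 7 - j, so i = 7 tests mirror symmetry.
    print(dyadic_symmetry(odd_row, 7), dyadic_symmetry(even_row, 7))
```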
Let T be the matrix that represents the order-N DCT. The dyadic symmetry present in J reveals that to ensure
The mn-th element of this matrix is defined as

$$T_{mn} = \left(\frac{1}{N}\right)^{1/2} \left[ k_m \cos\left(\frac{m\,(n + \tfrac{1}{2})\,\pi}{N}\right) \right], \qquad m, n = 0, 1, \ldots, N-1 \tag{1}$$

where

$$k_m = \begin{cases} 1 & \text{if } m \neq 0 \text{ or } N \\ (1/2)^{1/2} & \text{if } m = 0 \text{ or } N \end{cases} \tag{2}$$

III. GENERATION OF ORDER-8 ICTs

The steps to convert the order-8 DCT kernel into an order-8 ICT kernel are as follows.

Step 1 - Substitute the value of N in the DCT transform matrix

Equation (1) gives the order-N DCT kernel. Substituting N = 8 in equation (1) gives the order-8 DCT kernel, which can be expressed as

$$T = KJ \tag{3}$$

where K is the diagonal normalization matrix and J is an orthogonal matrix made up of the basis components of the DCT.

With N = 8, T is an 8x8 matrix

$$[T] = [k_0 j_0,\; k_1 j_1,\; k_2 j_2,\; k_3 j_3,\; k_4 j_4,\; k_5 j_5,\; k_6 j_6,\; k_7 j_7]^t$$

where k_i j_i is the i-th basis vector and k_i is a scaling constant such that |k_i j_i| = 1.

As T10 = -T17 = -T32 = T35 = -T51 = T56 = -T73 = T74, the magnitudes of J10, J17, J32, J35, J51, J56, J73 and J74 may be represented by a single variable, say 'a'. Similarly, the components of all eight basis vectors can be expressed in terms of the constants a, b, c, d, e and f, with g equal to 1. Hence the orthogonal matrix J can be expressed in terms of the variables a, b, c, d, e, f and g.

For the basis vectors to retain their orthogonality, the constants a, b, c and d must satisfy only the following condition

$$ab = ac + bd + cd \tag{4}$$

Step 3 - Set up boundary conditions and generate new transforms

Equation (1) implies that for the DCT

$$a \geq b \geq c \geq d \quad \text{and} \quad e \geq f \tag{5}$$

To make the basis vectors of the new transforms resemble those of the DCT, inequality (5) has to be satisfied. Furthermore, to eliminate truncation error due to the non-exact representation of the basis components a, b, c, d, e and f, expression (6) has to be satisfied, i.e.,

a, b, c, d, e and f are integers (6)

Those T matrices that satisfy (4), (5) and (6) are referred to as order-8 integer cosine transforms (ICTs), denoted ICT(a, b, c, d, e, f).

IV. APPLYING 1-D ICT FOR A REAL INPUT SEQUENCE

The 1-D ICT for a real input sequence x(n) is defined as

$$X = Tx = KJx = KY \tag{7}$$

where X and x are dimension-8 column matrices, and K is the diagonal normalization matrix.

The input sequence and the transform coefficients are reordered according to the rules

$$x'(n) = x(n), \qquad x'(7-n) = x(n+4), \qquad n \in [0,3] \tag{8}$$

$$X'(m) = X(Br_8[m]), \qquad X'(m+4) = X(2m+1), \qquad m \in [0,3] \tag{9}$$
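The reordering rules (8) and (9) can be illustrated with a short Python sketch. It assumes that the bit-reverse operation of length 8 means reversing the three address bits of the index; the function names are hypothetical and chosen only for this illustration.

```python
def bit_reverse(m, bits=3):
    """Reverse the lowest `bits` bits of m (assumed meaning of Br8 for length 8)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (m & 1)
        m >>= 1
    return r


def reorder_input(x):
    """Rule (8): x'(n) = x(n) and x'(7-n) = x(n+4) for n = 0..3."""
    xp = [0] * 8
    for n in range(4):
        xp[n] = x[n]
        xp[7 - n] = x[n + 4]
    return xp


def output_order():
    """Rule (9): index of X held at each position of X'."""
    order = [0] * 8
    for m in range(4):
        order[m] = bit_reverse(m)   # X'(m)   = X(Br8[m])
        order[m + 4] = 2 * m + 1    # X'(m+4) = X(2m+1)
    return order


if __name__ == "__main__":
    print(reorder_input(list(range(8))))  # input indices after rule (8)
    print(output_order())                 # coefficient indices after rule (9)
```

Under this reading, rule (8) pairs each sample with its mirror image four positions later, and rule (9) places the even coefficients first and the odd coefficients second.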
In rule (9), Br8[m] represents the bit-reverse operation of length 8. The 1-D ICT can then be expressed as

$$X' = T_R\, x' = K_R\, J_R\, x' = K_R\, Y' \tag{10}$$

The reordered basis components of the ICT can be expressed as

$$J_R = \begin{bmatrix} J_{4e} & 0 \\ 0 & J_{4o} \end{bmatrix} \begin{bmatrix} I_4 & I_4 \\ I_4 & -I_4 \end{bmatrix} \tag{11}$$

$$J_{4e} = \begin{bmatrix} g & g & g & g \\ g & -g & -g & g \\ e & f & -f & -e \\ f & -e & e & -f \end{bmatrix} \tag{12}$$

$$J_{4o} = \begin{bmatrix} a & b & c & d \\ b & -d & -a & -c \\ c & -a & d & b \\ d & -c & b & -a \end{bmatrix} \tag{13}$$

Applying the decomposition rules defined in (8) and (9) to the J4e matrix results in

$$J_{4e} = \begin{bmatrix} J_{2e} & 0 \\ 0 & J_{2o} \end{bmatrix} \begin{bmatrix} I_2 & I_2 \\ I_2 & -I_2 \end{bmatrix} R_4 \tag{14}$$

where R4 is the reordering matrix of length 4, I2 is the dimension-2 identity matrix, and

$$J_{2e} = \begin{bmatrix} g & g \\ g & -g \end{bmatrix} \tag{15}$$

$$J_{2o} = \begin{bmatrix} e & f \\ f & -e \end{bmatrix} \tag{16}$$

The J matrix of ICT(10,9,6,2,3,1) is obtained by substituting a=10, b=9, c=6, d=2, e=3, f=1 in the transform matrix.

Fig. 1. Signal flow graph of ICT(10,9,6,2,3,1)

As can be seen in Fig. 1, the first computing stage operates on the input data ordered according to rule (8); additions and subtractions of the data pairs formed with the sequences x'(n) and x'(n+4) (n = 0, 1, 2, 3) are executed. In the second computing stage, the transformations J4e and J4o are carried out, their nuclei being the matrices defined earlier. The transformation J4e is applied to the first half of the intermediate data sequence (a0, a1, a2, a3), giving the even coefficients (Y0, Y4, Y2, Y6) of the ICT. Similarly, J4o is applied to the other half of the intermediate data sequence (a7, a6, a5, a4), giving the odd coefficients (Y1, Y3, Y5, Y7) of the ICT. In the third computing stage, the coefficients Yi are normalized and the transform sequence of the coefficients X(m) appears reordered according to rule (9).
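The conditions that define an order-8 ICT (the orthogonality condition ab = ac + bd + cd, the ordering a ≥ b ≥ c ≥ d and e ≥ f, and integer components) can be checked numerically for ICT(10,9,6,2,3,1). The sketch below is only an illustration: the 8x8 layout of J in `ict_matrix` is an assumption, written in the usual DCT sign pattern and consistent with the relation T10 = -T17 = -T32 = T35 = -T51 = T56 = -T73 = T74 stated in the text, and the function names are hypothetical.

```python
def ict_matrix(a, b, c, d, e, f, g=1):
    """Assumed 8x8 layout of J in natural row order (rows 0..7)."""
    return [
        [g, g, g, g, g, g, g, g],
        [a, b, c, d, -d, -c, -b, -a],
        [e, f, -f, -e, -e, -f, f, e],
        [b, -d, -a, -c, c, a, d, -b],
        [g, -g, -g, g, g, -g, -g, g],
        [c, -a, d, b, -b, -d, a, -c],
        [f, -e, e, -f, -f, e, -e, f],
        [d, -c, b, -a, a, -b, c, -d],
    ]


def is_valid_ict(a, b, c, d, e, f):
    """Check integer components, the orthogonality condition and the ordering."""
    integers = all(isinstance(v, int) for v in (a, b, c, d, e, f))
    orthogonal = a * b == a * c + b * d + c * d
    ordered = a >= b >= c >= d and e >= f
    return integers and orthogonal and ordered


def rows_orthogonal(j):
    """True when every pair of distinct rows has a zero inner product."""
    n = len(j)
    return all(
        sum(p * q for p, q in zip(j[r], j[s])) == 0
        for r in range(n)
        for s in range(r + 1, n)
    )


if __name__ == "__main__":
    params = (10, 9, 6, 2, 3, 1)
    print(is_valid_ict(*params))
    print(rows_orthogonal(ict_matrix(*params)))
```

For a = 10, b = 9, c = 6, d = 2 the orthogonality condition reads 90 = 60 + 18 + 12, which is why this parameter set qualifies.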
V. ONE-DIMENSIONAL J(10,9,6,2,3,1) ARCHITECTURE

The computations shown in the signal flow graph (Fig. 1) can be realized using processing blocks, i.e., an individual processing block for each computing stage. The 1-D J(10,9,6,2,3,1) multiplication-free processor architecture is shown in Fig. 2. This architecture has been designed to implement the transformation JR according to the computing diagram of Fig. 1. The 1-D J processor consists of three processing blocks: the input processing block, which processes the input sequence; the even processing block, which processes half of the intermediate data sequence to produce the even coefficients of the ICT, i.e., computes the transformation J4e; and the odd processing block, which processes the other half of the intermediate data sequence to produce the odd coefficients of the ICT, i.e., computes the transformation J4o. These three processing blocks have a parallel architecture, allowing the operating frequency to be reduced to fs/2, where fs is the input data sampling frequency. The final output mixer arranges, in natural order, the coefficient sequence of the ICT at a frequency of fs. The control of the processor is very simple and is carried out using four signals: Clk1, the external clock at frequency fs; Clk2, the internal clock at frequency fs/2; and the multiplexer selection signals S1 at frequency fs/4 and S2 at frequency fs/8. The arithmetic multiplications have been reduced to add and shift operations. The adders/subtractors in the processor are based on the binary carry look-ahead adder.

In the input processing block, the two 4:1 multiplexers select the data to be processed by AE1 and AE2 in parallel at a sampling frequency of fs/2. The input data sequence is entered into the shift register at the sampling frequency fs. The output of the shift register is selected with the help of the two 4:1 multiplexers. The outputs of the two multiplexers are then given to two registers, REG1 and REG2, which are driven by CLK2. The outputs of the registers are finally given to an adder and a subtractor module, which perform the addition and subtraction of the selected signals. Adder AE1 and subtractor AE2 are driven by CLK2. The outputs of AE1 and AE2 provide the inputs for the even and odd processing blocks. Simulation results for the input processing block are shown in Fig. 4.

Fig. 3. Architecture of input processing block
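shown in (17) and (18). Applying the decomposition rule to the J4e matrix, we get

$$\begin{bmatrix} Y_0 \\ Y_4 \\ Y_2 \\ Y_6 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 3 & 1 \\ 0 & 0 & 1 & -3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} \tag{17}$$

$$\begin{bmatrix} Y_0 \\ Y_4 \\ Y_2 \\ Y_6 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 3 & 1 \\ 0 & 0 & 1 & -3 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_3 \\ b_4 \end{bmatrix} \tag{18}$$

Operating on (18), we get

$$\begin{bmatrix} Y_0 \\ Y_4 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} Y_2 \\ Y_6 \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 1 & -3 \end{bmatrix} \begin{bmatrix} b_3 \\ b_4 \end{bmatrix}$$

sequence of the ICT. Simulation results for the J4e processor are shown in Fig. 8.

Fig. 6. Timing diagram of J4e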
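The factorization in (17) can be evaluated without multipliers, since the only non-trivial weight is 3 = 2 + 1. The following Python sketch evaluates (17) once as a plain matrix product and once with additions, subtractions and one-bit left shifts only; the function names and the local variable names are hypothetical, and integer inputs are assumed.

```python
def even_stage_matrix(a):
    """Evaluate (17) literally as two matrix-vector products."""
    butterfly = [
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, -1, 0],
        [0, 1, 0, -1],
    ]
    weights = [
        [1, 1, 0, 0],
        [1, -1, 0, 0],
        [0, 0, 3, 1],
        [0, 0, 1, -3],
    ]
    b = [sum(w * v for w, v in zip(row, a)) for row in butterfly]
    return [sum(w * v for w, v in zip(row, b)) for row in weights]


def times3(v):
    """3*v as a shift and an add, the kind of operation the text refers to."""
    return (v << 1) + v


def even_stage_add_shift(a):
    """Same result as (17), returned as (Y0, Y4, Y2, Y6), without multiplications."""
    a0, a1, a2, a3 = a
    s0, s1 = a0 + a2, a1 + a3   # sums produced by the right-hand factor
    d0, d1 = a0 - a2, a1 - a3   # differences produced by the right-hand factor
    y0 = s0 + s1
    y4 = s0 - s1
    y2 = times3(d0) + d1
    y6 = d0 - times3(d1)
    return [y0, y4, y2, y6]


if __name__ == "__main__":
    sample = [1, 2, 3, 4]
    print(even_stage_matrix(sample))
    print(even_stage_add_shift(sample))
```

Both functions return the coefficients in the order (Y0, Y4, Y2, Y6) used in (17).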
The processor has been designed to calculate the odd coefficients of the 1-D transform. The implementation of this processor can be simplified through the decomposition of the matrix. The odd coefficients of the 1-D transform can be implemented simply in terms of add and shift operations. Fig. 9 shows the signal flow graph. It has three computing stages with intermediate data d, e, f and g. Fig. 10 illustrates the architecture, made up of five shift registers, ten 4:1 multiplexers and five arithmetic units operating in parallel.

VI. CONCLUSION

This paper presents an architecture of an ICT processor for image encoding. The 2-D ICT architecture is made up of two 1-D ICT processors and a transpose buffer used as intermediate memory. The pipelined adders/subtractors operate at half the frequency of the input data rate. The characteristics of this architecture are high throughput and parallel processing.
Fig. 9. Signal flow graph of J4o

Fig. 10. Architecture of J4o
Row Column Decomposition Algorithm for 2D Discrete Cosine
Transform
Caroline Priya.M and Mrs.D.S.Shylu, Lecturer, Karunya University
I.INTRODUCTION
$$Z_{pj} = \sum_{i=0}^{N-1} X_{ij}\, c_{pi}, \qquad p, j = 0, 1, 2, \ldots, N-1 \tag{3}$$

In order to compute an N x N-point DCT (where N is even), N row transforms and N column transforms need to be performed. However, by exploiting the symmetries of the cosine function, the number of multiplications can be reduced from N*N to N*N/2. In this case, each row transform given by (3) can be written as matrix-vector multipliers via

architecture, all the pairs of input data enter the adder/subtractor cells at the same time. Fig. 1 shows that the architecture also consists of N VIPs, where half are used for the added pairs as described by (5) and the other half for the subtracted pairs as described by (6). Each VIP consists of N/2 multiplier/accumulator cells. Each cell stores one coefficient Cpi in a register and evaluates one specific term of the summation in (4). The multiplications of the terms Cpi with the corresponding data are performed simultaneously, and the resulting products are then added together in parallel.
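The saving from N*N to N*N/2 multiplications per row transform can be illustrated in Python. The sketch below is written under stated assumptions: the coefficient is taken as the unnormalised cosine weight c_pi = cos((2i+1)pπ/(2N)), and the added and subtracted input pairs are formed from mirrored samples; the exact form of the paper's equations for the added and subtracted pairs is not reproduced here, and all function names are hypothetical.

```python
import math


def cos_weight(p, i, n):
    """Assumed unnormalised DCT weight c_pi = cos((2i+1) p pi / (2N))."""
    return math.cos((2 * i + 1) * p * math.pi / (2 * n))


def row_transform_direct(x):
    """Direct row transform: N*N multiplications."""
    n = len(x)
    return [sum(x[i] * cos_weight(p, i, n) for i in range(n)) for p in range(n)]


def row_transform_folded(x):
    """Row transform on added/subtracted pairs: N*N/2 multiplications."""
    n = len(x)
    half = n // 2
    added = [x[i] + x[n - 1 - i] for i in range(half)]       # feeds even outputs
    subtracted = [x[i] - x[n - 1 - i] for i in range(half)]  # feeds odd outputs
    out = []
    for p in range(n):
        pairs = added if p % 2 == 0 else subtracted
        out.append(sum(pairs[i] * cos_weight(p, i, n) for i in range(half)))
    return out


if __name__ == "__main__":
    row = [52, 55, 61, 66, 70, 61, 64, 73]
    direct = row_transform_direct(row)
    folded = row_transform_folded(row)
    print(max(abs(u - v) for u, v in zip(direct, folded)))
```

The folding works because the weight at the mirrored position N-1-i equals the weight at i multiplied by (-1)^p, so even outputs need only sums and odd outputs only differences.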
Meanwhile, the second-row pixels f10–f17 are loaded into R8–R15. In the 16th–23rd cycles, R8–R15 are selected as inputs to the computation kernel for computing the second-row coefficients. At the same time, the third-row pixels are loaded into R0–R7. Repeating this schedule, one block of pixels is transformed into 1-D coefficients row by row during 64 cycles. One of the register banks, R0–R7 or R8–R15, is chosen by multiplexers controlled by the Clk_Enable signal, 0 and 1 respectively. The addition or subtraction of two pixels is performed beforehand for computing the even or odd coefficients, and can be implemented using two's complement control with XOR gates. The weights 1–4 are cosine coefficients, which can easily be generated using a finite state machine. The computational order is regular, from coefficient F0, F1, ... to F7. The same computation schedule is again employed for the new block transformation. The timing schedule and VLSI architecture for DCT computations are illustrated
IV. SHIFT REGISTER CELL AND CONTROL TIMING

The accessing schedule of the shift register is shown in Fig. 4 at the 71st cycle. The shift-register array is designed with a serial-in/parallel-out structure. The first 1-D-DCT results, m[00], m[10], ..., m[70], are loaded into R0–R7 in parallel for the 2-D-DCT computation at the 71st cycle. Due to the one-stage pipeline delay and the output latch, the first 2-D-DCT coefficient F[00] is obtained at the 74th cycle. Then the 2-D-DCT coefficients F[10], F[20], ... are output sequentially during the 75th–81st cycles. For the next column, one clock is sent to the shift-register array, whose output then becomes m[01], m[11], ..., m[71]. These 1-D-DCT coefficients are loaded into R8–R15 in parallel at the 79th cycle, and the second-column 2-D-DCT coefficients are obtained during the 82nd–89th cycles. Repeating this computation schedule, the last-column 1-D-DCT coefficients m[07], m[17], ..., m[77] are loaded into R8–R15 at the 116th cycle, and the 2-D coefficients F[70]–F[77] are obtained sequentially. For the next block, the new pixels are written sequentially into R0–R7 from the 117th to the 125th cycle. For a cost-effective design, a special shift-register cell is designed with a MOS circuit to reduce the memory size, as in Fig. 5. The shift operation is based on a capacitor energy-transferring methodology.

Fig. 4. Shift-register cell and its control timing

D1 data since Q2 turns off. The D1 data in capacitor c2 passes through the inverter and transfers to c3 since Q3 turns on. In the next half cycle, the φ2 clock becomes high; D2 and D1 are shifted to the c2 and c4 capacitors, respectively. By repeating this, the shift function is performed with the energy-transferring technique. The ratio of channel width to length of Q1, Q2 and the inverter can be adjusted to set the c1 and c2 capacitances.
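At the data-flow level, the schedule above amounts to a first 1-D pass over the rows, a transposition performed by the serial-in/parallel-out shift-register array, and a second 1-D pass. The Python sketch below models only that data flow, not the cycle timing or the capacitor-based cell; it assumes an unnormalised cosine kernel, and the function names are hypothetical.

```python
import math


def dct_1d(v):
    """Unnormalised 1-D DCT of a sequence (assumed kernel for illustration)."""
    n = len(v)
    return [
        sum(v[i] * math.cos((2 * i + 1) * p * math.pi / (2 * n)) for i in range(n))
        for p in range(n)
    ]


def transpose(m):
    """Role played by the shift-register array: rows in, columns out."""
    return [list(col) for col in zip(*m)]


def dct_2d_row_column(block):
    """Row pass, transposition, second pass, then restore the orientation."""
    first_pass = [dct_1d(row) for row in block]
    second_pass = [dct_1d(col) for col in transpose(first_pass)]
    return transpose(second_pass)


def dct_2d_direct(block):
    """Reference 2-D DCT as a double sum, for comparison only."""
    n = len(block)
    c = lambda k, u: math.cos((2 * k + 1) * u * math.pi / (2 * n))
    return [
        [
            sum(block[i][j] * c(i, u) * c(j, v) for i in range(n) for j in range(n))
            for v in range(n)
        ]
        for u in range(n)
    ]


if __name__ == "__main__":
    block = [[(3 * i + 5 * j) % 17 for j in range(8)] for i in range(8)]
    a = dct_2d_row_column(block)
    b = dct_2d_direct(block)
    print(max(abs(a[u][v] - b[u][v]) for u in range(8) for v in range(8)))
```

Reusing one 1-D routine for both passes mirrors the use of a single 1-D-DCT core together with the transposition array.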
Fig. 6. Serial-in/parallel-out shift register output.

The capacitor c1 is dominated by the Q1 source capacitance and the Q2 drain capacitance. To satisfy c1 ≫ c2, one can increase the c1 capacitance with a large ratio of width to length for Q1, and use a uniform ratio for Q2 and the inverter to minimize the memory size.

The shift-register cell can be implemented with two nMOS transistors and one inverter circuit, so that one bit cell uses only four transistors. The circuit complexity of the transpose memory is much less than that of a conventional SRAM or flip-flop. Moreover, no extra controller, such as READ/WRITE access control or an address decoder, is needed. The shift register is modeled as a function block for full-system simulations. First, the preprocessing and computational core is realized as in Fig. 3. Then the 2-D-DCT core is integrated from one 1-D-DCT core and the shift-register array and verified with logic simulations.

V. CONCLUSION

The 2-D-DCT processor is realized with a particular schedule consisting of a 1-D-DCT core and the shift-register array. The shift-register array performs data transposition with a serial-in/parallel-out structure based on the capacitor energy-transferring technique. The shift-register-based transposition reduces the control overhead, since the address generator and decoder for memory access can be removed. In comparison with transposition-based DCT chips, the memory size and the full 2-D-DCT complexity can be reduced. This paper presents a cost-effective DCT architecture for video coding applications.
VLSI Architecture for Progressive Image Encoder
Abstract: This paper presents a VLSI architecture for progressive image coding based on a new algorithm called Tag Setting In Hierarchical Tree (TSIHT). This algorithm is based on Set-Partitioning In Hierarchical Trees (SPIHT). The new algorithm has the advantage of requiring less memory than SPIHT. VHDL code for the encoder core is developed.

Index Terms: Image compression; VLSI; Progressive coding

I. INTRODUCTION

Progressive image transmission (PIT) is an elegant method for making effective use of communication bandwidth. Unlike conventional sequential transmission, an approximate image is transmitted first, which is then progressively improved over a number of transmission passes. PIT allows the user to quickly recognize an image and is essential for databases with large images and for image transmission over low-bandwidth connections. Newer coding techniques, such as the JPEG2000 [1] and MPEG4 [2] standards, support the progressive transmission feature.

PIT via wavelet coding using the Embedded Zerotree Wavelet (EZW) algorithm was first presented by Shapiro [3] in 1993. The EZW algorithm is a simple, yet remarkably effective, image compression algorithm, having the property that the bits in the bit stream are generated in order of importance, yielding a fully embedded code. Using an embedded coding algorithm, an encoder can terminate the encoding at any point, thereby allowing a target rate or target distortion metric to be met exactly. Also, given a bit stream, the decoder can cease decoding at any point in the bit stream and still produce exactly the same image that would have been encoded at the bit rate corresponding to the truncated stream. In addition to producing a fully embedded bit stream, EZW consistently produces compression results that are competitive with virtually all known compression algorithms.

Said and Pearlman presented a faster and more efficient codec in 1996 [4], called Set-Partitioning in Hierarchical Trees (SPIHT), building on the principles of the EZW method. The SPIHT algorithm is a generalization of the EZW algorithm. It uses a partitioning of the trees, called spatial orientation trees, in a manner that tends to keep insignificant coefficients together in large subsets. SPIHT-based algorithms are not well suited for hardware implementation because of their memory requirements. This paper presents a new algorithm for progressive image transmission based on Tag Setting In Hierarchical Tree, which keeps the same low-bit-rate quality as the SPIHT algorithm and has three improved features. To reduce memory usage, tag flags are introduced to store the significance information instead of the coordinate lists in SPIHT. The flags are four two-dimensional binary tag arrays including Tag of Significant Pixels (TSP), Tag of Insignificant Pixels (TIP) and Tag of Significant Trees (TST). Compared with SPIHT coding, the algorithm needs only 26 K bytes of memory to store four tag arrays for a 256×256 gray-scale image. The sorting pass and the refinement pass of SPIHT coding are merged into one coding pass in order to simplify hardware control and save unnecessary memory. The algorithm uses the Depth-First-Search (DFS) traversal order to encode the bit stream rather than the Breadth-First-Search (BFS) method of SPIHT coding. Given the hierarchical pyramid nature of the spatial orientation tree, DFS provides a better architecture than the BFS method. The VLSI image compressor core, called PIE (Progressive Image Encoder), has been synthesized using VHDL coding. The PIE is designed to handle 256×256 gray-scale images.

The remaining sections of this paper are organized as follows. Section 2 gives the background of progressive image transmission. Section 3 addresses the proposed algorithm for progressive image encoding. Section 4 presents the VLSI architecture of the proposed PIE core. Finally, the conclusion is given in Section 5.

II. PROGRESSIVE IMAGE TRANSMISSION

Progressive image transmission requires the application of a multi-resolution decomposition to the target image. The multi-resolution decomposition provides a multi-resolution representation of an image. Let p_{i,j} be a two-dimensional image, where i and j are the indices of the pixel coordinates. The multi-resolution decomposition of image p_{i,j} is written as

$$c = \Omega(p) \tag{1}$$

where Ω(p) is the multi-resolution decomposition transformation. The two-dimensional coefficient array c has the same dimensions as image p, and each element c_{i,j} is
the transformation coefficient of p at coordinate (i,j). In progressive image transmission, the receiver updates the received reconstruction coefficients c_r according to the coded message until an approximate or exact set of coefficients has been received. The decoder can then obtain a reconstructed image by applying the inverse transformation

$$p_r = \Omega^{-1}(c_r) \tag{2}$$

where p_r is the reconstructed image and c_r are the progressively received coefficients. The distortion of the reconstructed image p_r with respect to the original image p can be measured using the Mean Squared Error (MSE), that is

$$D_{MSE}(p - p_r) = D_{MSE}(c - c_r) \tag{3}$$

$$= \frac{1}{MN} \sum_{i} \sum_{j} \left( c(i,j) - c_r(i,j) \right)^2 \tag{4}$$

where MN is the total number of image pixels. In a progressive image transmission process, the transmitter rearranges the details within the image in decreasing order of importance. From Equations (3) and (4), it is clear that if the exact value of the transform coefficient c(i,j) is sent to the decoder, then the MSE decreases by |c_{i,j}|^2 / MN [4]. This means that the coefficients with larger magnitude should be transmitted first, because they carry more information.

III. NEW ALGORITHM FOR PROGRESSIVE IMAGE ENCODING

The new algorithm is based on the SPIHT algorithm. The essence of the SPIHT coding algorithm is to identify which coefficients are significant, sort the selected coefficients in each sorting pass, and transmit the ordered refinement bits. A function S_n(T) is used to indicate the significance of a set of coordinates T, i.e.,

$$S_n(T) = \begin{cases} 1 & \text{when } \max_{(i,j) \in T} \{ |c_{i,j}| \} \geq 2^n \\ 0 & \text{otherwise} \end{cases}$$

In our opinion, the new encoder has three essential advantages: (1) less memory required, (2) an improved refinement pass, and (3) an efficient depth-first search (DFS).

Let TSP, TIP and TST be two-dimensional binary arrays whose entries are either '0' or '1'. The overall coding algorithm includes six steps, as follows.

(1) Initialization: output n = ⌊log2(max{|c_{i,j}|})⌋; set the value of all entries in the TSP, TIP and TST arrays to '0'.
(2) Refinement output:
    (a) for each entry (i,j) in the TSP do:
        (i) if TSP=1 then output the n-th most significant bit of |c_{i,j}|;
(3) TIP testing:
    (a) for each entry (i,j) in the TIP do:
        (i) if TIP=1 and S_n(c_{i,j}) = 1 then
            (A) output '1' and output the sign of c_{i,j};
            (B) set value TIP := 0 and TSP := 1;
        (ii) otherwise, if TIP=1 and S_n(c_{i,j}) = 0 then output '0';
(4) TST update:
    (a) for each entry (k,l) ∈ O(i,j) do:
        (i) if TST=0 and S_n(c_{i,j}) = 1 then set value TST := 1;
(5) Spatial orientation tree encoding:
    (a) for each entry (i,j) using the DFS method do:
        (i) if TSP=0 and TIP=0 then
            (A) if S_n(i,j) = 1 then output '1', the sign of c_{i,j} and the value of TST; set value TSP := 1;
            (B) otherwise, if S_n(i,j) = 0 then output '0' and the value of TST; set value TIP := 1;
(6) Quantization-step update: decrease n by 1 and go to Step 2.

In Step 1, the algorithm calculates the initial threshold and sets the values of the three tag flags TSP, TIP and TST to '0'. In Step 2, an entry marked with TSP=1, which was evaluated in the previous Step 5, is significant. An entry with TIP=1, tested as insignificant in the previous Step 5, may be significant in Step 3 because of the different threshold. Thus, the algorithm performs TIP testing to update the TIP value in Step 3. In Step 4, it updates the TST value of each coefficient except the leaf nodes and prepares to perform tree encoding in the next step. If a node has TST=0, its descendants are all insignificant; in other words, the tree led by that node is a zerotree. The algorithm searches for those nodes with TST=0 using the depth-first search (DFS) method and outputs a '0' in Step 5 to keep the bit rate low, as SPIHT coding does. Finally, it decreases the quantization step n by 1 and goes to Step 2 iteratively. The proposed algorithm does the same as SPIHT coding but uses different data structures. For instance, in the refinement output and TIP testing steps, the algorithm uses the tag flags TSP and TIP to indicate whether a node is significant or not, and then outputs and encodes the image stream by investigating the TSP and TIP tags. SPIHT coding, on the other hand, uses the coordinate sets LSP and LIP to store the coordinate information of nodes. Comparing the two methods, the information stored in TSP (TIP) is the same as in LSP (LIP). Besides, in the spatial orientation tree encoding step of TSIHT coding, if a node has TST=1, the algorithm proceeds to search its descendants using the DFS method without any output. In the sorting pass of SPIHT coding, however, each node of type A in the LIS list may change to type B and be encoded again. Thus, in the general case, the proposed algorithm has a lower bit rate than SPIHT does.

IV. VLSI ARCHITECTURE

The Progressive Image Encoder reads the wavelet coefficients from external memory using a 16-bit input signal, and it reads the tag flags of TSP, TIP and TST from external tag memory using 8-bit input signals. For
reading coefficients or tags from memory, the encoder first generates the address of the data and then reads the data using the input signals. The PIE outputs the encoded bit stream using the signal bit_out when sync asserts. Figure 1 shows the overall architecture of the encoder. It has six blocks in addition to the external coefficient and tag memory.

Fig. 1. Progressive Image Encoder hardware architecture

The Clock Divider generates three clock signals with different frequencies to synchronize the internal circuit. The Threshold Generator calculates the initial value n and updates its value at every iteration. The Tag Access Unit controls the access to the three tags, TSP, TIP and TST. The Address Generator generates the location addresses of the coefficient and tag memory. The Bit-Stream Generator outputs the encoded bit stream of the encoder. The Controller is the master of all blocks. Each block is discussed in the following sections.

Fig. 2. Threshold Generator

3) Tag Access Unit (TAU): To store the three two-dimensional tag arrays, two 256×256-bit RAM blocks and one 128×128-bit RAM block are needed, controlled by the Tag Access Unit. In this work, each tag memory is 8 bits wide, whereas each tag flag is one bit of data. To access each bit of the 8-bit-wide memory using a 16-bit address signal, Addr[15:0], the TAU uses an architecture similar to that shown in Figure 3. When the TAU reads one bit from tag memory, it first generates a 13-bit address signal, Addr[15:3], to read one byte of data, and then uses the lowest 3 address bits, Addr[2:0], to select the one-bit tag. When the TAU writes one bit to tag memory, it first reads the byte as in the read operation and then writes the byte back to tag memory with that one-bit tag replaced. Thus, the TAU needs one clock cycle to read each bit and two clock cycles to write it.
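The TAU addressing scheme, where Addr[15:3] selects a byte and Addr[2:0] selects the bit inside it, can be sketched in Python as follows. This is a behavioural illustration only: the class and method names are hypothetical, and the assumption that bit index 0 is the least significant bit of the byte is not stated in the text.

```python
class TagMemory:
    """Byte-wide tag memory addressed bit by bit (behavioural model only)."""

    def __init__(self, n_bits):
        self.data = bytearray((n_bits + 7) // 8)

    def read_bit(self, addr):
        byte = self.data[addr >> 3]          # Addr[15:3] selects the byte
        return (byte >> (addr & 0x7)) & 1    # Addr[2:0] selects the bit (LSB = 0 assumed)

    def write_bit(self, addr, value):
        byte = self.data[addr >> 3]          # first step: read the byte
        mask = 1 << (addr & 0x7)
        if value:
            byte |= mask
        else:
            byte &= ~mask & 0xFF
        self.data[addr >> 3] = byte          # second step: write the byte back


if __name__ == "__main__":
    tsp = TagMemory(256 * 256)   # one 256x256 one-bit tag array
    tsp.write_bit(5, 1)
    print(tsp.read_bit(5), tsp.read_bit(4), len(tsp.data))
```

The read-modify-write in `write_bit` corresponds to the two clock cycles quoted for a write, against one cycle for a read.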
following.
state set of AG controller. The finite state machine of the AG controller is shown in Fig. 7.

coefficient is significant or not. According to the TSIHT algorithm, the BG output values depend on the threshold, the TST signal, and the magnitude and sign of the coefficient. The output signals of the BG include the bit_out bit stream and the synchronous sync signal. Note that the bit stream appearing at the bit_out signal is meaningful only when sync asserts.
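The quantities the bit-stream generator depends on (threshold, significance, sign and magnitude bits) can be illustrated with a small Python sketch of the initialization, significance test, refinement bit and TIP-testing step of the algorithm in Section III. It is a software illustration under assumptions, not the hardware design: coefficients are assumed to be integers, the "n-th most significant bit" is read as the bit of weight 2^n, a sign bit of 1 is assumed to mean negative, and all function names are hypothetical.

```python
def initial_n(coeffs):
    """n = floor(log2(max |c|)), computed exactly for integer coefficients."""
    peak = max(abs(c) for row in coeffs for c in row)
    return peak.bit_length() - 1


def significance(coeffs, coords, n):
    """S_n(T): 1 if any coefficient in the coordinate set T reaches 2**n."""
    return 1 if max(abs(coeffs[i][j]) for (i, j) in coords) >= (1 << n) else 0


def refinement_bit(c, n):
    """Bit of weight 2**n in |c| (assumed reading of the refinement output)."""
    return (abs(c) >> n) & 1


def tip_testing(coeffs, tsp, tip, n):
    """Step 3: test entries tagged insignificant against the current threshold."""
    bits = []
    for i, row in enumerate(coeffs):
        for j, c in enumerate(row):
            if tip[i][j] == 1:
                if significance(coeffs, [(i, j)], n):
                    bits.append(1)
                    bits.append(1 if c < 0 else 0)  # sign bit, 1 = negative (assumed)
                    tip[i][j] = 0
                    tsp[i][j] = 1
                else:
                    bits.append(0)
    return bits


if __name__ == "__main__":
    coeffs = [[34, -20], [5, 9]]
    tsp = [[0, 0], [0, 0]]
    tip = [[0, 1], [1, 0]]
    print(initial_n(coeffs))
    print(tip_testing(coeffs, tsp, tip, 4), tsp, tip)
```

The tag arrays are updated in place, which mirrors how the tag memory replaces the coordinate lists of SPIHT.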
V. CONCLUSION
REFERENCES
[1] C. Christopoulos, A. Skodras, and T. Ebrahimi, “The JPEG2000 still image coding system: An overview,” IEEE Transactions on Consumer Electronics, vol. 46, pp. 1103–1127, Nov. 2000.
[2] T. Sikora, “The MPEG-4 video standard verification model,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 1, pp. 19–31, Feb. 1997.
[3] J. M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Transactions on Signal Processing, vol. 41, pp. 3445–3462, Dec. 1993.
[4] A. Said and W. A. Pearlman, “A new, fast, and efficient image codec based on set partitioning in hierarchical trees,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243–250, June 1996.
[5] D. Mukherjee and S. K. Mitra, “Vector SPIHT for embedded wavelet video and image coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 3, pp. 231–246, Mar. 2003.
[6] Z. Wang and A. C. Bovik, “Embedded foveation image coding,” IEEE Transactions on Image Processing, vol. 10, no. 10, pp. 1397–1410, Oct. 2001.
[7] T. Kim, S. Choi, R. E. V. Dyck, and N. K. Bose, “Classified zerotree wavelet image coding and adaptive packetization for low-bit-rate transport,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 9, pp. 1022–1034, Sept. 2001.
[8] W. A. Pearlman, A. Islam, N. Nagaraj, and A. Said, “Efficient, low complexity image coding with a set-partitioning embedded block coder,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 11, pp. 1219–1228, Nov. 2004.
[9] A. Munteanu, J. Cornelis, G. V. der Auwera, and P. Cristea, “Wavelet image compression - the quadtree coding approach,” IEEE Transactions on Information Technology in Biomedicine, vol. 3, no. 3, pp. 176–185, Sept. 1999.
[11] S. G. Mallat, “A theory for multiresolution signal decomposition: the wavelet representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674–693, July 1989.
[10] S.-F. Hsiao, Y.-C. Tai, and K.-H. Chang, “VLSI design of an efficient embedded zerotree wavelet coder with function of digital watermarking,” IEEE Transactions on Consumer Electronics, vol. 46, no. 7, pp. 628–636, Aug. 2000.
[11] B. Vanhoof, M. Peon, G. Lafruit, J. Bormans, M. Engels, and I. Bolsens, “A scalable architecture for MPEG-4 embedded zero tree coding,” Custom Integrated Circuit Conference, pp. 65–68, 1999.
[12] Z. Liu and L. J. Karam, “An efficient embedded zerotree wavelet image codec based on intraband partitioning,” IEEE International Conference on Image Processing, vol. 3, pp. 162–165, Sept. 2000.
Reed Solomon Encoder and Decoder using
Concurrent Error Detection Schemes
Abstract: Reed-Solomon (RS) codes are widely used to identify and correct errors in transmission and storage systems. When RS codes are used in highly reliable systems, the designer should also take into account the occurrence of faults in the encoder and decoder subsystems. In this paper, self-checking RS encoder and decoder architectures are presented. The presented architecture exploits some properties of the arithmetic operations on the Galois Field GF(2^m), related to the parity of the binary representation of the elements of the field. In the RS decoder, this allows Concurrent Error Detection (CED) schemes to be implemented that are useful for a wide range of different decoding algorithms, with no intervention on the decoder architecture.

Index Terms: Concurrent Error Detection, Error Correction Coding, Galois Field, Reed Solomon Codes.

I. INTRODUCTION

Reed-Solomon codes are block-based Error Correcting Codes with a wide range of applications in digital communications and storage. Reed-Solomon codes are used to correct errors in many systems, including storage devices, wireless or mobile communications, satellite communications, digital television and high-speed modems.

A. Error Correction Codes

Highly reliable data transmission and storage systems frequently use Error Correction Codes (ECC) to protect data. By adding a certain amount of redundancy, these codes are able to detect and correct errors in the coded information. Error-control coding techniques detect and possibly correct errors that occur when messages are transmitted in a digital communication system. To accomplish this, the encoder transmits not only the information symbols but also extra redundant symbols. The decoder interprets what it receives, using the redundant symbols to detect and possibly correct whatever errors occurred during transmission. You might use error-control coding if your transmission channel is very noisy or if your data is very sensitive to noise.

B. Block Coding

Depending on the nature of the data or noise, you might choose a specific type of error-control coding. Block coding is a special case of error-control coding. Block-coding techniques map a fixed number of message symbols to a fixed number of code symbols. A block coder treats each block of data independently and is memoryless. Reed-Solomon codes are based on this concept: they are block codes, which means that a fixed block of input data is processed into a fixed block of output data.

The Reed-Solomon encoder takes a block of digital data and adds extra "redundant" bits. Errors occur during transmission or storage for a number of reasons. The Reed-Solomon decoder processes each block and attempts to correct errors and recover the original data. The number and type of errors that can be corrected depend on the characteristics of the Reed-Solomon code. The typical system is shown in Fig. 1.

Fig. 1. Typical System

In the design of highly reliable electronic systems, both the Reed-Solomon (RS) encoder and decoder should be self-checking in order to avoid faults in these blocks that compromise the reliability of the whole system. In fact, a fault in the encoder can produce an incorrect codeword, while a fault in the decoder can give a wrong data word even if no errors occur in the codeword transmission. Therefore, great attention must be paid to detecting and recovering from faults in the encoding and decoding circuitry.
C. Properties of Reed Solomon Codes

Nowadays, the most widely used Error Correcting Codes are the RS codes, based on the properties of finite field arithmetic. In particular, finite fields with $2^m$ elements are suitable for digital implementations due to the isomorphism between the addition, performed modulo 2, and the XOR operation between the bits representing the elements of the field. The use of the XOR operation in addition and multiplication allows parity check-based strategies to be used to check for the presence of faults in the RS encoder, while the implicit redundancy in the codeword is used both to correct erroneous data and to detect faults inside the decoder block.

II. REED-SOLOMON CODES

Reed-Solomon codes provide very powerful error correction capabilities, have high channel efficiency, and are very versatile. They are a "block code" coding technique requiring the addition of redundant parity symbols to the data to enable error correction. The data is partitioned into blocks and each block is processed as a single unit by both the encoder and decoder. The number of parity check symbols per block is determined by the amount of error correction required. These additional check symbols must contain enough information to locate the position and determine the value of the erroneous information symbols.

A. Finite Field Arithmetic

The finite fields used in digital implementations are of the form GF(2^m), where m represents the number of bits of a symbol to be coded. An element $a(x) \in GF(2^m)$ is a polynomial with coefficients $a_i \in \{0, 1\}$ and can be seen as a symbol of m bits $a = a_{m-1} \ldots a_1 a_0$. The addition of two elements $a(x)$ and $b(x) \in GF(2^m)$ is the sum modulo 2 of the coefficients $a_i$ and $b_i$, i.e., the bitwise XOR of the two symbols a and b. The multiplication of two elements $a(x)$ and $b(x) \in GF(2^m)$ requires the multiplication of the two polynomials followed by the reduction modulo $i(x)$, where $i(x)$ is an irreducible polynomial of degree m. Multiplication can be implemented as an AND-XOR network.

The RS(n, k) code is defined by representing the data word symbols as elements of the field GF(2^m), and the overall data word is treated as a polynomial $d(x)$ of degree $k - 1$ with coefficients in GF(2^m). The RS codeword is then generated by using the generator polynomial $g(x)$. All valid codewords are exactly divisible by $g(x)$.

The general form of $g(x)$ is

$g(x) = (x + \alpha^{i})(x + \alpha^{i+1}) \cdots (x + \alpha^{i+2t-1})$

where $2t = n - k$ and $\alpha$ is a primitive element of the field, i.e.,

$\forall \beta \in GF(2^m) \setminus \{0\} \;\; \exists\, i \in \mathbb{N} \mid \alpha^{i} = \beta$.

The codewords of a separable RS(n, k) code correspond to the polynomials $c(x)$ of degree $n - 1$ that can be generated by using the following formulas:

$c(x) = d(x) \cdot x^{n-k} + p(x)$

$p(x) = d(x) \cdot x^{n-k} \bmod g(x)$

where $p(x)$ is a polynomial of degree less than $n - k$ representing the parity symbols. In practice, the encoder takes k data symbols and adds 2t parity symbols, obtaining an n-symbol codeword. The 2t parity symbols allow the correction of up to t symbols containing errors in a codeword.

Defining the Hamming distance of two polynomials $a(x)$ and $b(x)$ of degree n as the number of coefficients of the same degree that are different, i.e., $H(a(x), b(x)) = \#\{i \le n \mid a_i \ne b_i\}$, and the Hamming weight $W(a(x))$ as the number of nonzero coefficients of $a(x)$, i.e., $W(a(x)) = \#\{i \le n \mid a_i \ne 0\}$, it is easy to prove that $H(a(x), b(x)) = W(a(x) - b(x))$. In an RS(n, k) code the minimum Hamming distance between two distinct codewords is $n - k + 1 = 2t + 1$. After the transmission of the coded data on a noisy channel, the decoder receives as input a polynomial $\tilde{c}(x) = c(x) + e(x)$, where $e(x)$ is the error polynomial. The RS decoder identifies the position and magnitude of up to t errors and is able to correct them. In other words, the decoder is able to identify the polynomial $e(x)$ if the Hamming weight $W(e(x))$ is not greater than t. The decoding algorithm provides as output the only codeword having a Hamming distance not greater than t from the received polynomial $\tilde{c}(x)$.

B. Proposed Implementations
In this section, the motivations of the design methodology used for the proposed design are described.

A radiation-tolerant RS encoder hardened against space radiation effects through circuit and layout techniques has been presented, as have single and multiple parity bit schemes to check the correctness of addition and multiplication in the polynomial basis representation of finite fields. These techniques were then extended to detect faults occurring in the RS encoder, achieving the self-checking property for the RS encoder implementation. Moreover, a method to obtain CED circuits for finite field multipliers and inverters has been proposed.

Since both the RS encoder and decoder are based on GF(2^m) addition, multiplication, and inversion, their self-checking design can be obtained by using a CED design of these basic arithmetic operations. Moreover, a self-checking algorithm for solving the key equation has been introduced. By exploiting this algorithm and substituting the elementary operations with the corresponding CED implementations for the other parts of the decoding algorithm, a self-checking decoder can be implemented. This approach can be used for the encoder, which uses only addition and constant multiplication, and is illustrated in the following subsection. It is unusable for the decoder, as described later in this paper, and a specific technique is explained in the subsequent section.

III. REED-SOLOMON ENCODER

The Reed-Solomon encoder is used in many Forward Error Correction (FEC) applications and in systems where data are transmitted and subject to errors before reception, for example, communications systems, disk drives, and so on.

A. Characteristics of the Reed-Solomon Encoder

In order to design a self-checking RS encoder by using the multipliers, each fault inside these blocks should be correctly detected. This detection is not ensured for the entire set of stuck-at faults because no details on the logical netlist implementing the multipliers were given previously. In fact, the estimated probability of undetected faults is different from zero. To overcome this limitation and obtain total fault coverage for single stuck-at faults, the solution proposed in earlier work is used. First of all, the characteristics of the arithmetic operations in GF(2^m) used in the RS encoder are analyzed with respect to the parity of the binary representation of the operands. The following two operations are considered:

• Parity of the addition in GF(2^m);

• Parity of the constant multiplication in GF(2^m).

Defining the parity $P(a(x))$ of a symbol as the XOR of the coefficients $a_i$, and taking into account that in GF(2^m) the addition operation is realized by the XOR of the bits having the same index, the following property can be easily demonstrated:

$P(a(x) + b(x)) = P(a(x)) \oplus P(b(x))$

Taking into account that in the RS encoder the polynomial used to encode the data is constant, the polynomial multiplication is implemented by multiplication by the constants $g_i$, where $g_i$ are the coefficients of the generator polynomial $g(x)$. The constant multiplier is implemented by using a suitable network of XOR gates. The parity $P(c(x))$ of the result can be evaluated as

$P(c(x)) = \bigoplus_{i \in A} a_i$

where A is the set of inputs that are evaluated an odd number of times. For the input bits evaluated an even number of times, additional outputs are added.

B. Self-Checking RS Encoder

The implementation of RS encoders is usually based on an LFSR, which implements the polynomial division over the finite field. In Fig. 2, the implementation of an RS encoder is shown. The additions and multiplications are performed on GF(2^m), and $g_i$ are the coefficients of the generator polynomial $g(x)$.

The RS encoder architecture is composed of slice blocks, each containing a constant multiplier, an adder, and a register.

Fig. 2. RS Encoder.
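The parity properties and the encoding formulas $c(x) = d(x) \cdot x^{n-k} + p(x)$, $p(x) = d(x) \cdot x^{n-k} \bmod g(x)$ can be checked numerically with a minimal Python sketch. It assumes GF(2^4) with the irreducible polynomial $x^4 + x + 1$, an RS(15, 11) code, and $i = 0$ as the first root exponent; these choices and all function names are illustrative assumptions, not taken from the paper. The encoder is modelled as a polynomial division in software, not as the gate-level LFSR of Fig. 2.

```python
# Assumed field: GF(2^4) with irreducible polynomial x^4 + x + 1.
M = 4
IRRED = 0b10011

def gf_mul(a, b):
    """Multiply two field elements (shift-and-add with modular reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= IRRED
    return r

def parity(a):
    """P(a(x)): XOR of the bits of the symbol."""
    return bin(a).count("1") & 1

def parity_mask(const):
    """Set A: input bits that reach the outputs of the constant
    multiplier an odd number of times."""
    mask = 0
    for j in range(M):
        if parity(gf_mul(const, 1 << j)):
            mask |= 1 << j
    return mask

def predicted_parity(const, a):
    """Parity of const * a computed from the input bits only."""
    return parity(a & parity_mask(const))

def poly_mul(p, q):
    """Polynomial product; coefficients listed highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            out[i + j] ^= gf_mul(pc, qc)
    return out

def generator_poly(two_t, alpha=2):
    """g(x) = (x + alpha^0)(x + alpha^1)...(x + alpha^(2t-1))."""
    g, root = [1], 1
    for _ in range(two_t):
        g = poly_mul(g, [1, root])
        root = gf_mul(root, alpha)
    return g

def poly_mod(num, g):
    """Remainder of num(x) / g(x) for a monic g(x)."""
    rem = list(num)
    for i in range(len(num) - len(g) + 1):
        coef = rem[i]
        if coef:
            for j, gc in enumerate(g):
                rem[i + j] ^= gf_mul(coef, gc)
    return rem[-(len(g) - 1):]

def rs_encode(data, g):
    """Systematic codeword: data symbols followed by parity symbols."""
    shifted = list(data) + [0] * (len(g) - 1)   # d(x) * x^(n-k)
    return list(data) + poly_mod(shifted, g)    # + p(x)

if __name__ == "__main__":
    g = generator_poly(4)                       # 2t = n - k = 4
    codeword = rs_encode(list(range(1, 12)), g)
    print("codeword:", codeword)
    print("remainder mod g(x):", poly_mod(codeword, g))
```

Because multiplication by a constant is linear over GF(2), `predicted_parity` agrees with the parity of the actual product for every constant and every input symbol, which is the property a parity prediction block relies on.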
The number of slices to design for an RS(n, k) code is n - k. The self-checking implementation requires the insertion of some parity prediction blocks and a parity checker.

• Aout is the result of the multiplication and addition operation.

• Pout is the predicted parity of the result.

The parity prediction block is implemented. It must be noticed that some constraints on the implementation of the constant multiplier must be added in order to avoid interference between different outputs when a fault occurs.

These interferences are due to the sharing of intermediate results between different outputs and, therefore, can be avoided by using networks with fan-out equal to one. The parity checker block checks if the parity of the inputs is even or odd.

These considerations guarantee the self-checking property of the checker. It can be noticed that, due to the LFSR-based structure of the RS encoder, there are no control state machines to be protected against faults.

1) The internal structure of the decoder must be modified by substituting the elementary operations with the corresponding CED ones. Therefore, the decoder performance in terms of maximum operating frequency, area occupation, and power consumption can be very different with respect to the non-self-checking implementation.

2) The self-checking implementation is strongly dependent on the chosen decoder architecture.

3) A good knowledge of finite field arithmetic is essential for the implementation of GF(2^m) arithmetic blocks.

In the solution presented in this paper, differently from the previously discussed approaches, the implementation of the self-checking RS decoder is based on the use of a standard RS decoder completed by adding suitable hardware blocks to check its functionality. In this way, the proposed method can be directly used for a wide range of
different decoder algorithms, enabling the use of important design concepts such as reusability. The proposed technique starts from the following two main properties of the fault-free decoder.

Property 1: The decoder output is always a codeword.

Property 2: The Hamming weight of the error polynomial is not greater than t.

If a fault occurs inside the decoder, checking these two properties is able to detect the occurrence of the fault. When the fault is activated, i.e., the output is different from the correct one due to the presence of the fault, the following two cases occur.

• In the first case the decoder gives as output a non-codeword, and this case can be detected by Property 1. This is the most probable case because the decoder computes the error polynomial and obtains the output codeword by calculating $c(x) = \tilde{c}(x) + e(x)$, where $\tilde{c}(x)$ is the received polynomial.

• If the output of the faulty decoder is a wrong codeword, the detection of this fault is easily performed by evaluating the Hamming weight of the error polynomial $e(x)$. Since distinct codewords differ in at least $n - k + 1 = 2t + 1$ symbols, a wrong codeword lies at a distance greater than t from a received polynomial affected by at most t errors. The error polynomial can be provided by the decoder as an additional output or can be evaluated by comparing the received polynomial $\tilde{c}(x)$ and the provided output $c(x)$.

If one of the two properties is not respected, a fault inside the decoder is detected, while if both are satisfied we can conclude that no faults are activated inside the decoder.

This approach is completely independent of the assumed fault set and is based only on the assumption that the fault-free decoder always provides a codeword as output. This assumption is valid for a wide range of decoder architectures. For decoders that are able to perform miscorrection detection for some received polynomials with more than t errors, a suitable modification of the proposed method could be made.

B. Concurrent Error Detection for the RS Decoder

In Fig. 5, the CED implementation of the RS decoder is shown. Its main blocks are as follows.

• RS decoder, i.e., the block to be checked.

• An optional error polynomial recovery block. This block is needed if the RS decoder does not provide the error polynomial coefficients at its output.

• Hamming weight counter, which checks the number of nonzero coefficients of the error polynomial.

• Codeword checker, which checks if the output data of the RS decoder form a correct codeword.

• Error detection block, which takes as inputs the outputs of the Hamming weight counter and of the codeword checker and provides an error detection signal if a fault in the RS decoder has been detected.

The RS decoder can be considered as a black box performing an algorithm for the error
detection and correction of the input data (the coefficients of the received data forming the polynomial $\tilde{c}(x)$).

The error polynomial recovery block is composed of a shift register of length L (the latency of the decoder) and a GF(2^m) adder having as operands the coefficients of $c(x)$ and $\tilde{c}(x)$.

The Hamming weight counter is composed of the following:

1) a comparator indicating (at each clock cycle) if the $e(x)$ coefficients are zero;

2) a counter that takes into account the number of nonzero coefficients;

3) a comparator between the counter output and t, the maximum allowed number of nonzero elements.

The codeword checker block checks if the reconstructed $c(x)$ is a codeword, i.e., if it is exactly divisible by the generator polynomial $g(x)$. Two types of this block are proposed.

The error detection block takes as inputs the outputs of the Hamming weight counter and the outputs of the codeword checker. The additional blocks used to detect faults inside the decoder are themselves susceptible to faults. For the codeword checker and the error polynomial generator blocks, only registers and GF(2^m) addition and constant multiplication are used and, therefore, the same considerations as for the RS encoder can be used to obtain the self-checking property of these blocks. For the counters and the comparator used in the Hamming weight counter and error detection blocks, many efficient techniques are available.

V. CONCLUSION

In this paper, self-checking architectures for an RS encoder and decoder are described. For the self-checking RS decoder, two main properties of the fault-free decoder have been identified and used to detect faults inside the decoder. This method can be used for a wide range of algorithms implementing the decoder function.
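The decoder check based on Property 1 and Property 2 can be sketched in software as follows. This is a minimal Python sketch under stated assumptions: GF(2^4) with the irreducible polynomial $x^4 + x + 1$, an RS(15, 11) code with t = 2, and a decoder treated as a black box whose output is simply supplied by the caller. All names (`decoder_fault_detected`, `hamming_weight`, and so on) are hypothetical, and the hardware blocks of Fig. 5 are replaced by plain functions.

```python
M, IRRED = 4, 0b10011   # assumed field: GF(2^4), x^4 + x + 1

def gf_mul(a, b):
    """Multiply two field elements (shift-and-add with modular reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= IRRED
    return r

def generator_poly(two_t, alpha=2):
    """g(x) with roots alpha^0 .. alpha^(2t-1); highest degree first."""
    g, root = [1], 1
    for _ in range(two_t):
        nxt = g + [0]                        # g(x) * x
        for k, gc in enumerate(g):
            nxt[k + 1] ^= gf_mul(gc, root)   # + root * g(x)
        g, root = nxt, gf_mul(root, alpha)
    return g

def poly_mod(num, g):
    """Remainder of num(x) / g(x) for a monic g(x)."""
    rem = list(num)
    for i in range(len(num) - len(g) + 1):
        coef = rem[i]
        if coef:
            for j, gc in enumerate(g):
                rem[i + j] ^= gf_mul(coef, gc)
    return rem[-(len(g) - 1):]

def encode(data, g):
    """Systematic RS codeword: data followed by parity symbols."""
    return list(data) + poly_mod(list(data) + [0] * (len(g) - 1), g)

def hamming_weight(poly):
    """Number of nonzero coefficients."""
    return sum(1 for c in poly if c)

def decoder_fault_detected(received, decoded, g, t):
    """Return True if the decoder output violates Property 1 or 2."""
    # Error polynomial recovery: e(x) = received(x) + decoded(x).
    error = [r ^ c for r, c in zip(received, decoded)]
    is_codeword = not any(poly_mod(decoded, g))   # Property 1
    weight_ok = hamming_weight(error) <= t        # Property 2
    return not (is_codeword and weight_ok)

if __name__ == "__main__":
    g, t = generator_poly(4), 2
    sent = encode(list(range(1, 12)), g)
    received = list(sent)
    received[0] ^= 5                              # two channel errors
    received[7] ^= 9
    print(decoder_fault_detected(received, sent, g, t))   # fault-free output
```

In the usage example the decoder output equals the transmitted codeword, so both properties hold and no fault is flagged. Supplying a non-codeword, or a different valid codeword, as the decoder output makes the check fail.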
Design of High Speed Architectures for MAP Turbo Decoders
Lakshmi S. Kumar (1), D. Jackuline Moni (2)
(1) II M.E, Applied Electronics, (2) Associate Professor,
(1, 2) Karunya University, Coimbatore
(1) Email id: lakshmiskumarr@gmail.com
is reported for a signal to noise ratio of 0.7 dB. Turbo coding is a forward error correction (FEC) scheme. Turbo codes consist of a concatenation of two convolutional codes and give better performance at low SNRs.

The turbo encoder transmits the encoded bits, which form the inputs to the turbo decoder. The turbo decoder decodes the information iteratively. Turbo codes can be concatenated in series, in parallel, or in a hybrid manner. Concatenated codes can be classified as parallel concatenated convolutional codes (PCCC) or serial concatenated convolutional codes (SCCC). In PCCC, two encoders operate on the same information bits. In SCCC, one encoder encodes the output of another encoder. The hybrid concatenation scheme combines both parallel and serial concatenated convolutional codes. The turbo decoder has two decoders that perform iterative decoding.

• Mobile radio
• Digital video
• Long-haul terrestrial wireless
• Satellite communications and
• Deep space communication

Fig. 1. A Turbo encoder

This section describes the iterative decoding process of the turbo decoder. The maximum a posteriori (MAP) algorithm is used in the turbo decoder. Three types of algorithms are used in turbo decoders, namely MAP, Max-Log-MAP, and Log-MAP. The MAP algorithm is a forward-backward recursion algorithm that minimizes the probability of bit error but has high computational complexity and suffers from numerical instability. The solution to these problems is to operate in the log domain. One advantage of operating in the log domain is that multiplication becomes addition. Addition, however, is not straightforward: in the log domain it becomes a maximization function plus a correction term. The Max-Log-MAP algorithm approximates addition solely as maximization.
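The log-domain addition described above, a maximization plus a correction term, can be written out as a short Python sketch. The function names `max_star` and `max_log` are illustrative assumptions; the identity used is $\ln(e^{a} + e^{b}) = \max(a, b) + \ln(1 + e^{-|a - b|})$.

```python
import math

def max_star(a, b):
    """Exact log-domain addition (Log-MAP):
    ln(e^a + e^b) = max(a, b) + ln(1 + e^(-|a - b|))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_log(a, b):
    """Max-Log-MAP approximation: the correction term is dropped."""
    return max(a, b)

if __name__ == "__main__":
    a, b = 1.0, 2.5
    print("Log-MAP     :", max_star(a, b))
    print("Max-Log-MAP :", max_log(a, b))
    print("direct      :", math.log(math.exp(a) + math.exp(b)))
```

The correction term lies between 0 and ln 2 and depends only on |a - b|, which is why hardware implementations can store it in a small look-up table.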
and thus called Log-MAP algorithm. MAP-based Turbo decoders normally adopt a sliding window approach [11] in order to reduce computation latency and the memory needed for storing state metrics. As explained in [4], three recursive computation units, the α, β, and pre-β units, are needed for a Log-MAP decoder. This paper focuses on the design of high-speed recursive computation units, as they form the bottleneck in high-speed circuit design. It is known from the Log-MAP algorithm that all three recursion units have similar architectures, so we focus our discussion on the design of α units. The traditional design for α computation is illustrated in Fig. 3, where the ABS block is used to compute the absolute value of the input and the LUT block is used to implement the nonlinear function $\log(1 + e^{-x})$, where x > 0. For simplicity, only one branch (i.e., one state) is drawn. The overflow approach [14] is assumed for normalization of state metrics, as used in conventional Viterbi decoders.

Here, an advanced Radix-2 recursion architecture

metric is first introduced for each competing pair of state metrics (e.g., $\alpha_0$ and $\alpha_1$ in Fig. 4) so that the front-end addition and the subtraction operations can be performed simultaneously in order to reduce the computation delay of the loop. Second, a generalized LUT (see GLUT in Fig. 4) is employed that can efficiently avoid the computation of the absolute value instead of introducing another subtraction operation. Third, the final addition is moved to the input side, as in the OACS architecture, and one stage of carry-save structure is then used to convert a three-number addition to a two-number addition. Finally, an intelligent approximation is made in order to further reduce the critical path.

$\alpha_2[k + 1] = \max{}^{*}(\alpha_0[k] + \gamma_3[k],\; \alpha_1[k] + \gamma_0[k])$
where the max* function is defined in (2).

In addition, we split each state metric into two terms as follows:

$\alpha_0[k] = \alpha_{0A}[k] + \alpha_{0B}[k]$

$\alpha_1[k] = \alpha_{1A}[k] + \alpha_{1B}[k]$

$\alpha_2[k] = \alpha_{2A}[k] + \alpha_{2B}[k]$    (4)

Similarly, the corresponding difference metric is also split into the following two terms:

$\delta_{01}[k] = \delta_{01A}[k] + \delta_{01B}[k]$

$\delta_{01A}[k] = \alpha_{0A}[k] - \alpha_{1A}[k]$

numbers to an addition of two numbers, where FA and HA represent full-adder and half-adder, respectively, XOR stands for exclusive OR gate, and d0 and d1 correspond to the 2-bit output of the GLUT.

The state metrics and branch metrics are represented with 9 and 6 bits, respectively, in this example. The sign extension is only applied to the branch metrics. It should be noted that an extra addition operation might be required to integrate each state metric before storing it into the memory. The GLUT structure is shown in Fig. 6, where the computation of the absolute value is eliminated by including the sign bit in two logic blocks, i.e., Ls2 and ELUT. The Ls2 function block is used to detect if the absolute value of the input is less than 2.0, and the ELUT block is a small LUT with 3-bit inputs and 2-bit outputs. It can be derived that Z = S b7,. . . ,+b1 + b + S(b7, . . . , b1b0). It was reported in [13] that using two output values for the LUT only caused a performance loss of 0.03 dB from the floating point simulation for a four-state Turbo code. The approximation is described as follows:
which is generally the case. In our design, both the inputs and outputs of the LUT are quantized in four levels.

operation, and 1-bit addition operation, which saves nearly two multibit adder delays compared to the traditional ACSO architecture.
Custom Integr. Circuits Conf. (CICC), 2000, pp. 39–42.
[5] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Inf. Theory, vol. IT-20, no. 2, pp. 284–287, Mar. 1974.
[6] S.-J. Lee, N. Shanbhag, and A. Singer, “A 285-MHz pipelined MAP decoder in 0.18 µm CMOS,” IEEE J. Solid-State Circuits, vol. 40, no. 8, pp. 1718–1725, Aug. 2005.
[7] P. Urard et al., “A generic 350 Mb/s Turbo codec based on a 16-state Turbo decoder,” in IEEE ISSCC Dig. Tech. Papers, 2004, pp. 424–433.
[8] E. Boutillon, W. Gross, and P. Gulak, “VLSI architectures for the MAP algorithm,” IEEE Trans. Commun., vol. 51, no. 2, pp. 175–185, Feb. 2003.
[9] T. Miyauchi, K. Yamamoto, and T. Yokokawa, “High-performance programmable SISO decoder VLSI implementation for decoding Turbo codes,” in Proc. IEEE Global Telecommun. Conf., 2001, pp. 305–309.
[10] M. Bickerstaff, L. Davis, C. Thomas, D. Garrett, and C. Nicol, “A 24 Mb/s radix-4 LogMAP Turbo decoder for 3GPP-HSDPA mobile wireless,” in IEEE ISSCC Dig. Tech. Papers, 2003, pp. 150–151.
[11] A. J. Viterbi, “An intuitive justification of the MAP decoder for convolutional codes,” IEEE J. Sel. Areas Commun., vol. 16, pp. 260–264, Feb. 1998.
[12] T. C. Denk and K. K. Parhi, “Exhaustive scheduling and retiming of digital signal processing systems,” IEEE Trans. Circuits Syst., Part II: Analog Dig. Signal Process., vol. 45, no. 7, pp. 821–838, Jul. 1998.
[13] W. Gross and P. G. Gulak, “Simplified MAP algorithm suitable for implementation of turbo decoders,” Electron. Lett., vol. 34, no. 16, pp. 1577–1578, Aug. 1998.
[14] Y. Wu, B. D. Woerner, and T. K. Blankenship, “Data width requirement in SISO decoding with modulo normalization,” IEEE Trans. Commun., vol. 49, no. 11, pp. 1861–1868, Nov. 2001.
Technology Mapping Using Ant Colony Optimization
Jackuline Moni, S. Arumugam, M. Sajan Deepak
1,3 ECE Dept., Karunya University
Abstract The ant colony optimization [2] meta-heuristic is adopted from the natural foraging behavior of real ants and has been used to find good solutions to a wide spectrum of combinatorial optimization problems. Ant colonies [2][3] are capable of finding the shortest path between nest and food. In the ACO [2] algorithm, ants construct solutions with the help of local decisions. This approach is used here for optimizing wire length and minimizing area [4]. Performance-wise, disadvantages of other optimization algorithms [4], such as time consumption, are reduced, and the ACO algorithm quickly converges to an optimum. It is also used in other applications such as the traveling salesman problem and the quadratic assignment problem. Field programmable gate arrays [1] are becoming increasingly important implementation platforms for digital circuits. One of the necessary requirements to effectively utilize the fixed resources of field programmable gate arrays [1] is an efficient placement and routing mechanism. Here we use the ant colony optimization algorithm [4] for the placement and routing problem.

Keywords: FPGA, ACO, Probabilistic rule.

I INTRODUCTION

Natural evolution has yielded biological systems in which complex collective behavior emerges from the local interaction of simple components. One example where this phenomenon can be observed is the foraging behavior of ant colonies. Ant colonies [2][3][5] are capable of finding shortest paths between their nest and food sources. This complex behavior of the colony is possible because the ants communicate indirectly by depositing traces of pheromone [2] as they walk along a chosen path. Following ants most likely prefer those paths possessing the strongest pheromone information, thereby refreshing or further increasing the respective amounts of pheromone. Since ants on short paths are quicker, pheromone traces [5][6] on these paths are increased very frequently. On the other hand, pheromone information is permanently reduced by evaporation [3], which diminishes the influence of formerly chosen unfavorable paths. This combination focuses the search process on short, favorable paths. In ACO [2][3], a set of artificial ants searches for good solutions to the optimization problem under consideration. Each ant constructs a solution by making a sequence of local decisions. Its decisions are guided by pheromone information and some additional heuristic information. After a number of ants have constructed solutions, the best ants are allowed to update the pheromone information along their path through the decision graph. Evaporation is accomplished by globally reducing the pheromone information by a certain percentage. This process is repeated iteratively until a stopping criterion is met. ACO [2] has shown good performance on several combinatorial optimization problems [5][7], including scheduling, vehicle routing, constraint satisfaction, and the quadratic assignment problem [5]. In this paper, we adapt an ACO algorithm to field programmable gate arrays (FPGAs). FPGAs [1] are used for a wide range of applications, e.g., network communication, video communication and processing, and cryptographic applications. We show that ACO can also be implemented on FPGAs [1], leading to significant speedups in runtime compared to implementations in software on sequential machines. The standard ACO algorithm is not very well suited to implementation on the resources provided by current commercial FPGA architectures. Instead we suggest using the Population-based ACO, in which pheromone
information is replaced by a small set (population) of good solutions discovered during the preceding iterations. Accordingly, the combination of pheromone updates and evaporation has been replaced by inserting a new good solution into the population, replacing the oldest solution in the population.

II METHODOLOGY

The objective is to find the minimal length connecting two components. For example, consider a component at 'i', which is the source and has to be connected to the component at 'j'. Here the probability of choosing the connection between the two components is given by the basic equation [1][2],

$p_{ij}^{k}(t) = \dfrac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{l} [\tau_{il}(t)]^{\alpha}\,[\eta_{il}]^{\beta}}$

and the evaporation of the pheromone is given by the equation

α = 1
β = 5
ς = 0.5
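The probabilistic rule above can be illustrated with a short Python sketch using α = 1 and β = 5. Several points are assumptions and not taken from the paper: the heuristic value η is taken as the inverse of the distance, ς = 0.5 is interpreted as the evaporation rate, and evaporation is modelled in the commonly used form τ ← (1 − ς)·τ, since the paper's own update and evaporation equations are not reproduced in the text. All function names are hypothetical.

```python
ALPHA = 1.0   # weight of the pheromone information
BETA = 5.0    # weight of the heuristic information
RHO = 0.5     # assumed evaporation rate (the value given for the third parameter)

def transition_probabilities(tau, eta, candidates):
    """Probability of moving from the current component to each candidate j.

    tau[j]: pheromone on the connection to j
    eta[j]: heuristic value of the connection to j (assumed 1 / distance)
    """
    weights = {j: (tau[j] ** ALPHA) * (eta[j] ** BETA) for j in candidates}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def evaporate(tau, rho=RHO):
    """Assumed evaporation step: every pheromone value is scaled by (1 - rho)."""
    return {j: (1.0 - rho) * value for j, value in tau.items()}

if __name__ == "__main__":
    distance = {1: 2.0, 2: 4.0}                  # two candidate components
    tau = {1: 1.0, 2: 1.0}                       # equal initial pheromone
    eta = {j: 1.0 / d for j, d in distance.items()}
    print(transition_probabilities(tau, eta, [1, 2]))
    print(evaporate(tau))
```

With equal pheromone, the candidate at half the distance is favored by a factor of 2^5 = 32, which shows how strongly β = 5 biases the choice toward short connections.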
Simultaneous deposition and evaporation of pheromone [2][3][4] take place, so that paths with a shorter distance have a high concentration of pheromone and paths with a longer distance have a lower concentration of pheromone. The pheromone updating is given by the equation

IV RESULTS

The results here show the comparison of ant colony optimization with simulated annealing.

Device utilization by simulated annealing for B01
Number of Slices: 8 out of 192 (4%)
Number of Slice Flip Flops: 13 out of 384 (3%)
Number of 4 input LUTs: 14 out of 384 (3%)
Number of bonded IOBs: 44 out of 90 (48%)
Number of GCLKs: 2 out of 4 (50%)

Device utilization by ACO for B01

Number of LOCed External IOBs: 0 out of 28 (0%)
Selected Device: 2s15cs144-6
Number of Slices: 7 out of 192 (3%)
Number of Slice Flip Flops: 12 out of 384 (3%)
Number of 4 input LUTs: 13 out of 384 (3%)
Number of bonded IOBs: 40 out of 90 (44%)
Number of GCLKs: 1 out of 4 (25%)
Number of Flip Flops: 12 out of 384 (3%)
VI FUTURE WORK
that it even takes fewer resources for accomplishing the task

REFERENCES

[1] B. Scheuermann, K. So, M. Guntsch, M. Middendorf, O. Diessel, H. ElGindy, H. Schmeck, “FPGA placement and routing ant colony optimization”, 26 January 2004.
[2] J. L. Deneubourg, J. M. Pasteels, J. C. Verhaeghe, “Probabilistic behaviour in ants: a strategy of error?”, 105 (1983) 259-271.
[3] M. Dorigo, “Optimization, learning and natural algorithms”, Elettronica, Politecnico di Milano, Italy, 1992.
[4] C. Solnon, “Ants can solve constraint satisfaction problems”, IEEE Trans. Evolut. Comput. 6(4) (2002) 347-357.
[5] L. M. Gambardella, E. Taillard, M. Dorigo, “Ant colonies for the quadratic assignment problem”, J. Operat. Res. Soc. 50 (1999) 167-176.
[6] M. Dorigo, “Parallel ant system: an experimental study”, Manuscript, 1993.
[7] E.-G. Talbi, O. Roux, C. Fonlupt, D. Robillard, “Parallel ant colonies for combinatorial optimization problems”, Parallel and Distributed Processing, 11 IPPS/SPDP'99 Workshops, no. 1586 in LNCS, Springer-Verlag, 1999, pp. 239-247.
[8] M. Rahoual, R. Hadji, V. Bachelet, “Parallel ant system for the set covering problem”, in Ant Algorithms, Proceedings of the Third International Workshop ANTS 2002, LNCS 2463, Springer-Verlag, Brussels, Belgium, 2002, pp. 262-267.
[9] M. Middendorf, F. Reischle, H. Schmeck, “Multi colony optimization”, J. Parallel Distrib. Comput. 62(9) (2002) 1421-1432.
[10] R. Miller, V. K. Prasanna Kumar, D. I. Reisis, Q. F. Stout, “Parallel computation on reconfigurable meshes”, IEEE Trans. Comput. 42(6) (1993) 678-692; Conference on Advanced Research in VLSI, 1998.
[12] S. Bade, B. Hutchings, “FPGA based stochastic neural network implementation”, Proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines, 1994, pp. 180-198.
[13] P. Lysaght, J. Stockwood, J. Law, D. Grima, “Artificial neural network implementation on a fine grained FPGA”, in Field Programmable Logic, 1994, pp. 421-431.
[14] M. Guntsch, M. Middendorf, “A population based approach for ACO”, in: S. Cagnoni et al., Applications of Evolutionary Computing, EvoWorkshops 2002: EvoCOP, pp. 72-81.
[15] P. Albuquerque and A. Dupuis, “A parallel cellular ant colony algorithm for clustering and sorting”, Proc. of ACRI 2002, LNCS 2493, Springer, 220-230 (2002).
[16] O. Cordon, F. Herrera and T. Stützle, “A review on the ant colony optimization metaheuristic: basis, models and new trends”, Mathware and Soft Computing 9 (2002).
[17] P. Delisle, M. Krajecki, M. Gravel and C. Gagne, “Parallel implementation of an ant colony optimization metaheuristic with OpenMP”, Proceedings of the 3rd European Workshop on OpenMP (2001).
[18] M. Dorigo, V. Maniezzo and A. Colorni, “The ant system: optimization by a colony of cooperating agents”, IEEE Trans. Syst., Man, Cybernetics B 26, 29-41 (1996).
[19] M. Fiorenzo Catalano and F. Malucelli, “Parallel randomized heuristics for the set covering problem”, International Journal of Practical Parallel Computing 10(4): 113-132 (2001).
[21] H. Kawamura, M. Yamamoto, K. Suzuki, and A. Ohuchi, “Multiple ant colony algorithms based on colony level interactions”, IEICE Transactions on Fundamentals, E83-A(2): 371-379 (2000).
[22] A. E. Langham and P. W. Grant, “Using competing ant colonies to solve k-way partitioning problems with foraging and raiding strategies”, in Proc. 5th European Conference on Artificial Life, ECAL'99, Springer, LNCS 1674, 621-625 (1999).
[11] O. Cheung, P. Leong, “Implementation of an
FPGA based accelator for virtual private
networks”in IEEE international conference on field
programmable technology HongKong 2002 .pp.34-
43
OUR SPONSORS
Chennai.
Scientronics
Bangalore.
Hi-Tech Electronics
Trichy.