LDPC Encoding and Decoding For High Memory and DSP Applications
ABSTRACT:
Low-density parity-check (LDPC) codes are among the most widely used error correcting codes
(ECC). Because of their popularity they are used in several applications, such as digital
satellite broadcasting (DVB-S2), Wireless Local Area Networks (IEEE 802.11n)
and Metropolitan Area Networks (IEEE 802.16e). These codes are used to transmit a message
over a noisy transmission channel. LDPC codes are constructed using sparse parity check
matrices and admit very fast encoding and decoding algorithms. In this paper, a
low-density parity-check decoder is implemented in Verilog. A partially
parallel decoder is designed using the belief propagation (BP) algorithm. Modelsim is used
for simulation, and Mentor Graphics Leonardo Spectrum with Virtex-IV technology is used
for synthesis.
CHAPTER 1
INTRODUCTION:
Low-Density Parity-Check (LDPC) [1] codes have become one of the most attractive error
correction codes due to their excellent performance [2] and suitability for high data rate
applications such as WiMAX, DVB-S2 and so on [3]. The inherent structure of the LDPC
code allows the decoder to achieve a high degree of parallelism in practical implementations
[4]. LDPC decoding algorithms are primarily iterative and are based on the belief propagation
message passing algorithm. The complexity of the decoding algorithm is highly critical for
the overall performance of the LDPC decoder. Various algorithms have been proposed
in the past to trade off complexity against performance [5, 6]. The Sum-Product Algorithm
(SPA) [7], a soft decision based message passing algorithm, achieves the best performance,
but with high decoding complexity. In contrast, Bit-Flipping is a hard decision based
algorithm with the least decoding complexity, but it suffers from poor performance [6].
The Min-Sum Algorithm (MSA) is a simplified version of the SPA that has reduced
implementation complexity with a slight degradation in performance [7]. The MSA performs
simple arithmetic and logical operations, which makes it suitable for hardware
implementation. However, the performance of the algorithm is significantly affected by the
quantization of the soft input messages [8]. Reducing the quantization width of the messages
is important to reduce the implementation complexity and hardware resources of the decoder,
but this advantage comes with a degradation in decoding performance. Performance issues
and hardware implementation of such low complexity algorithms, especially the 2-bit MSA,
have received limited attention in the literature.
PURPOSE OF CODING:
To design efficient and reliable data transmission methods. This typically involves the
removal of redundancy and the correction (or detection) of errors in the transmitted data.
OUTLINE:
The rest of the paper is organized as follows. Chapter 2 surveys the relevant background on
coding theory and error correction. Chapter 3 describes the ASIC and FPGA design flow and
the Verilog language used for the implementation. Chapter 4 presents the modified
sum-product decoding algorithm. Chapter 5 discusses LDPC codes and their decoding in
detail, reports the simulation and synthesis results, and concludes the paper.
CHAPTER 2
LITERATURE SURVEY:
Coding theory is the study of the properties of codes and their fitness for a specific
application. Codes are used for data compression, cryptography, error-correction and
more recently also for network coding. Codes are studied by various scientific
disciplines—such as information theory, electrical engineering, mathematics, and
computer science—for the purpose of designing efficient and reliable data transmission
methods. This typically involves the removal of redundancy and the correction (or
detection) of errors in the transmitted data.
There are four types of coding:
1. Data compression (source coding)
2. Error control (channel coding)
3. Cryptographic coding
4. Line coding
Source encoding attempts to compress the data from a source in order to transmit it more
efficiently. This practice is found every day on the Internet where the common Zip data
compression is used to reduce the network load and make files smaller.
The second, channel encoding, adds extra data bits to make the transmission of data
more robust to disturbances present on the transmission channel. The ordinary user may
not be aware of many applications using channel coding. A typical music CD uses the
Reed-Solomon code to correct for scratches and dust. In this application the transmission
channel is the CD itself. Cell phones also use coding techniques to correct for the fading
and noise of high frequency radio transmission. Data modems, telephone transmissions,
and NASA all employ channel coding techniques to get the bits through, for example the
turbo code and LDPC codes.
The binary Golay code was developed in 1949. More specifically, it is an error-correcting
code capable of correcting up to three errors in each 24-bit word, and detecting a fourth.
(Figure: a two-dimensional visualisation of the Hamming distance.)
Richard Hamming won the Turing Award in 1968 for his work at Bell Labs in numerical
methods, automatic coding systems, and error-detecting and error-correcting codes. He
invented the concepts known as Hamming codes, Hamming windows, Hamming
numbers, and Hamming distance.
Source coding
The aim of source coding is to take the source data and make it smaller.
Definition
Data is modeled as a random variable X that takes the value x with probability P[X = x].
Data are encoded by strings (words) over an alphabet Σ; a code is a function C that maps
each source symbol x to a code word C(x). The code word of the empty string is the empty
string itself: C(ε) = ε.
Properties
1. C is non-singular if it is injective, i.e. distinct source symbols map to distinct code words.
Principle
Entropy of a source is the measure of information. Basically, source codes try to reduce
the redundancy present in the source, and represent the source with fewer bits that carry
more information.
Data compression which explicitly tries to minimize the average length of messages
according to a particular assumed probability model is called entropy encoding.
Various techniques used by source coding schemes try to achieve the limit of Entropy of
the source. C(x) ≥ H(x), where H(x) is entropy of source (bitrate), and C(x) is the bitrate
after compression. In particular, no source coding scheme can be better than the entropy
of the source.
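In symbols, the limit just described comes from the source entropy: for a source emitting symbol x with probability p(x),
\[
H(X) = -\sum_{x} p(x)\,\log_2 p(x) \quad \text{bits per symbol},
\]
and no lossless source code can achieve an average rate below H(X).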
Example
Facsimile transmission uses a simple run length code. Source coding removes all data
superfluous to the need of the transmitter, decreasing the bandwidth required for
transmission.
Channel coding
The purpose of channel coding theory is to find codes which transmit quickly, contain
many valid code words and can correct or at least detect many errors. While not mutually
exclusive, performance in these areas is a trade off. So, different codes are optimal for
different applications. The needed properties of this code mainly depend on the
probability of errors happening during transmission. In a typical CD, the impairment is
mainly dust or scratches. Thus codes are used in an interleaved manner. The
data is spread out over the disk.
Although not a very good code, a simple repeat code can serve as an understandable
example. Suppose we take a block of data bits (representing sound) and send it three
times. At the receiver we will examine the three repetitions bit by bit and take a majority
vote. The twist on this is that we don't merely send the bits in order. We interleave them.
The block of data bits is first divided into 4 smaller blocks. Then we cycle through the
block and send one bit from the first, then the second, etc. This is done three times to
spread the data out over the surface of the disk. In the context of the simple repeat code,
this may not appear effective. However, there are more powerful codes known which are
very effective at correcting the "burst" error of a scratch or a dust spot when this
interleaving technique is used.
Other codes are more appropriate for different applications. Deep space communications
are limited by the thermal noise of the receiver which is more of a continuous nature than
a bursty nature. Likewise, narrowband modems are limited by the noise, present in the
telephone network and also modeled better as a continuous disturbance. Cell
phones are subject to rapid fading. The high frequencies used can cause rapid fading of
the signal even if the receiver is moved a few inches. Again there are a class of channel
codes that are designed to combat fading.
Linear codes
The term algebraic coding theory denotes the sub-field of coding theory where the
properties of codes are expressed in algebraic terms and then further researched.
Algebraic coding theory is basically divided into two major types of codes:
1. Linear block codes
2. Convolutional codes.
It analyzes three main properties of a code: the code word length, the total number of
valid code words, and
the minimum distance between two valid code words, using mainly the Hamming
distance, sometimes also other distances like the Lee distance.
Linear block codes have the property of linearity, i.e. the sum of any two codewords is
also a code word, and they are applied to the source bits in blocks, hence the name linear
block codes. There are block codes that are not linear, but it is difficult to prove that a
code is a good one without this property.[2]
Linear block codes are summarized by their symbol alphabets (e.g., binary or ternary)
and parameters (n, m, dmin)[3] where
1. n is the length of the codeword, in symbols,
2. m is the number of source symbols that will be used for encoding at once,
3. dmin is the minimum Hamming distance for the code.
There are many types of linear block codes, such as
1. Cyclic codes (e.g., Hamming codes)
2. Repetition codes
3. Parity codes
4. Polynomial codes (e.g., BCH codes)
5. Reed–Solomon codes
6. Algebraic geometric codes
7. Reed–Muller codes
8. Perfect codes.
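For example, the classic binary Hamming code has parameters (n, m, dmin) = (7, 4, 3); the number of errors t it can correct per block follows directly from dmin:
\[
t = \left\lfloor \frac{d_{\min} - 1}{2} \right\rfloor = \left\lfloor \frac{3 - 1}{2} \right\rfloor = 1 .
\]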
Block codes are tied to the sphere packing problem, which has received some attention
over the years. In two dimensions, it is easy to visualize. Take a bunch of pennies flat on
the table and push them together. The result is a hexagon pattern like a bee's nest. But
block codes rely on more dimensions which cannot easily be visualized. The powerful
(24,12) Golay code used in deep space communications uses 24 dimensions. If used as
a binary code (which it usually is) the dimensions refer to the length of the codeword as
defined above.
The theory of coding uses the N-dimensional sphere model. For example, how many
pennies can be packed into a circle on a tabletop, or in 3 dimensions, how many marbles
can be packed into a globe. Other considerations enter the choice of a code. For example,
hexagon packing into the constraint of a rectangular box will leave empty space at the
corners. As the dimensions get larger, the percentage of empty space grows smaller. But
at certain dimensions, the packing uses all the space and these codes are the so-called
"perfect" codes. The only nontrivial and useful perfect codes are the distance-3 Hamming
codes with parameters satisfying (2^r − 1, 2^r − 1 − r, 3), and the [23,12,7] binary and [11,6,5]
ternary Golay codes.[2][3]
Another code property is the number of neighbors that a single codeword may have. [4]
Again, consider pennies as an example. First we pack the pennies in a rectangular grid.
Each penny will have 4 near neighbors (and 4 at the corners which are farther away). In
a hexagon, each penny will have 6 near neighbors. When we increase the dimensions,
the number of near neighbors increases very rapidly. The result is the number of ways
for noise to make the receiver choose a neighbor (hence an error) grows as well. This is
a fundamental limitation of block codes, and indeed all codes. It may be harder to cause
an error to a single neighbor, but the number of neighbors can be large enough so the
total error probability actually suffers.[4]
Properties of linear block codes are used in many applications. For example, the
syndrome-coset uniqueness property of linear block codes is used in trellis shaping, [5]
one of the best known shaping codes. This same property is used in sensor networks for
distributed source coding.
Convolutional codes
The idea behind a convolutional code is to make every codeword symbol be the weighted
sum of the various input message symbols. This is like convolution used in LTI systems
to find the output of a system, when you know the input and impulse response.
The output of the convolutional encoder is thus the convolution of the input bits with the
states of the encoder's shift registers.
Fundamentally, convolutional codes do not offer more protection against noise than an
equivalent block code. In many cases, they generally offer greater simplicity of
implementation over a block code of equal power. The encoder is usually a simple circuit
which has state memory and some feedback logic, normally XOR gates. The decoder
can be implemented in software or firmware.
The Viterbi algorithm is the optimum algorithm used to decode convolutional codes. There
are simplifications to reduce the computational load. They rely on searching only the most
likely paths. Although not optimum, they have generally been found to give good results
in the lower noise environments.
Convolutional codes are used in voiceband modems (V.32, V.17, V.34) and in GSM
mobile phones, as well as satellite and military communication devices.
Cryptographical coding
Cryptography or cryptographic coding is the practice and study of techniques for secure
communication in the presence of third parties (called adversaries).[6] More generally, it
is about constructing and analyzing protocols that block adversaries;[7] various aspects in
information security such as data confidentiality, data integrity, authentication, and non-
repudiation[8] are central to modern cryptography. Modern cryptography exists at the
intersection of the disciplines of mathematics, computer science, and electrical
engineering. Applications of cryptography include ATM cards, computer passwords, and
electronic commerce.
Cryptography prior to the modern age was effectively synonymous with encryption, the
conversion of information from a readable state to apparent nonsense. The originator of
an encrypted message shared the decoding technique needed to recover the original
information only with intended recipients, thereby precluding unwanted persons from
doing the same. Since World War I and the advent of the computer, the methods used to
carry out cryptology have become increasingly complex and its application more
widespread.
Line coding
A line code (also called digital baseband modulation or digital baseband transmission
method) is a code chosen for use within a communications system for baseband
transmission purposes. Line coding is often used for digital data transport.
Another concern of coding theory is designing codes that help synchronization. A code
may be designed so that a phase shift can be easily detected and corrected and that
multiple signals can be sent on the same channel.
Another application of codes, used in some mobile phone systems, is code-division
multiple access (CDMA). Each phone is assigned a code sequence that is approximately
uncorrelated with the codes of other phones. When transmitting, the code word
is used to modulate the data bits representing the voice message. At the receiver, a
demodulation process is performed to recover the data. The properties of this class of
codes allow many users (with different codes) to use the same radio channel at the same
time. To the receiver, the signals of other users will appear to the demodulator only as a
low-level noise.
Another general class of codes are the automatic repeat-request (ARQ) codes. In these
codes the sender adds redundancy to each message for error checking, usually by adding
check bits. If the check bits are not consistent with the rest of the message when it arrives,
the receiver will ask the sender to retransmit the message. All but the simplest wide area
network protocols use ARQ. Common protocols include SDLC (IBM), TCP (Internet), X.25
(International) and many others. There is an extensive field of research on this topic
because of the problem of matching a rejected packet against a new packet. Is it a new
one or is it a retransmission? Typically numbering schemes are used, as in TCP (RFC 793,
Internet Engineering Task Force, September 1981).
Group testing
Group testing uses codes in a different way. Consider a large group of items in which a
very few are different in a particular way (e.g., defective products or infected test
subjects). The idea of group testing is to determine which items are "different" by using
as few tests as possible. The origin of the problem has its roots in the Second World War
when the United States Army Air Forces needed to test its soldiers for syphilis. It
originated from a ground-breaking paper by Robert Dorfman.
Analog coding
Neural coding is a neuroscience-related field concerned with how sensory and other
information is represented in the brain by networks of neurons. The main goal of studying
neural coding is to characterize the relationship between the stimulus and the individual
or ensemble neuronal responses and the relationship among electrical activity of the
neurons in the ensemble.[12] It is thought that neurons can encode both digital and analog
information,[13] and that neurons follow the principles of information theory and compress
information,[14] and detect and correct[15] errors in the signals that are sent throughout the
brain and wider nervous system.
Forward error correction
In telecommunication, forward error correction (FEC) is a technique used for controlling
errors in data transmission over unreliable or noisy communication channels. The central
idea is that the sender encodes the message in a redundant way by using an error-correcting
code (ECC). The redundancy allows the receiver to detect a limited number of errors that may
occur anywhere in the message, and often to correct these errors without retransmission. FEC
gives the receiver the ability to correct errors without needing a reverse channel to request
retransmission of data, but at the cost of a fixed, higher forward channel bandwidth. FEC
is therefore applied in situations where retransmissions are costly or impossible, such as
one-way communication links and when transmitting to multiple receivers in multicast.
FEC information is usually added to mass storage devices to enable recovery of corrupted
data, and is widely used in modems.
FEC processing in a receiver may be applied to a digital bit stream or in the demodulation
of a digitally modulated carrier. For the latter, FEC is an integral part of the initial analog-
to-digital conversion in the receiver. The Viterbi decoder implements a soft-decision
algorithm to demodulate digital data from an analog signal corrupted by noise. Many FEC
coders can also generate a bit-error rate (BER) signal which can be used as feedback to
fine-tune the analog receiving electronics.
The maximum fraction of errors or of missing bits that can be corrected is determined by
the design of the FEC code, so different forward error correcting codes are suitable for
different conditions.
How it works
A simplistic example of FEC is to transmit each data bit 3 times, which is known as a (3,1)
repetition code. Through a noisy channel, a receiver might see 8 versions of the output,
see table below.
Triplet received    Interpreted as
000                 0 (error free)
001                 0
010                 0
100                 0
111                 1 (error free)
110                 1
101                 1
011                 1
This allows an error in any one of the three samples to be corrected by "majority vote" or
"democratic voting". The correcting ability of this FEC is: up to 1 bit of the triplet in
error, or up to 2 bits of the triplet omitted (cases not shown in the table).
Though simple to implement and widely used, this triple modular redundancy is a
relatively inefficient FEC. Better FEC codes typically examine the last several dozen, or
even the last several hundred, previously received bits to determine how to decode the
current small handful of bits (typically in groups of 2 to 8 bits).
FEC could be said to work by "averaging noise"; since each data bit affects many
transmitted symbols, the corruption of some symbols by noise usually allows the original
user data to be extracted from the other, uncorrupted received symbols that also depend
on the same user data.
Because of this "risk-pooling" effect, digital communication systems that use FEC
tend to work well above a certain minimum signal-to-noise ratio and not at all below
it.
Interleaving FEC coded data can reduce the all or nothing properties of transmitted
FEC codes when the channel errors tend to occur in bursts. However, this method
has limits; it is best used on narrowband data.
Most telecommunication systems use a fixed channel code designed to tolerate the
expected worst-case bit error rate, and then fail to work at all if the bit error rate is ever
worse. However, some systems adapt to the given channel error conditions: some
instances of hybrid automatic repeat-request use a fixed FEC method as long as the FEC
can handle the error rate, then switch to ARQ when the error rate gets too high; adaptive
modulation and coding uses a variety of FEC rates, adding more error-correction bits per
packet when there are higher error rates in the channel, or taking them out when they are
not needed.
Types of FEC
The two main categories of FEC codes are block codes and convolutional codes. Block codes
work on fixed-size blocks (packets) of bits or symbols of predetermined size, and practical
block codes can generally be hard-decision decoded in time polynomial in their block length.
Convolutional codes work on bit or symbol streams of arbitrary length. They are
most often soft decoded with the Viterbi algorithm, though other algorithms are
sometimes used. Viterbi decoding allows asymptotically optimal decoding
efficiency with increasing constraint length of the convolutional code, but at the
expense of exponentially increasing complexity. A convolutional code that is
terminated is also a 'block code' in that it encodes a block of input data, but the
block size of a convolutional code is generally arbitrary, while block codes have a
fixed size dictated by their algebraic characteristics. Types of termination for
convolutional codes include "tail-biting" and "bit-flushing".
There are many types of block codes, but among the classical ones the most notable is
Reed-Solomon coding because of its widespread use on the Compact disc, the DVD, and
in hard disk drives. Other examples of classical block codes include Golay, BCH,
Multidimensional parity, and Hamming codes.
Hamming ECC is commonly used to correct NAND flash memory errors.[3] This provides
single-bit error correction and 2-bit error detection. Hamming codes are only suitable for
more reliable single level cell (SLC) NAND. Denser multi level cell (MLC) NAND requires
stronger multi-bit correcting ECC such as BCH or Reed–Solomon.[4] NOR
Flash typically does not use any error correction.[4]
Classical block codes are usually decoded using hard-decision algorithms,[5] which
means that for every input and output signal a hard decision is made whether it
corresponds to a one or a zero bit. In contrast, convolutional codes are typically decoded
using soft-decision algorithms like the Viterbi, MAP or BCJR algorithms, which process
(discretized) analog signals, and which allow for much higher error-correction
performance than hard-decision decoding.
Nearly all classical block codes apply the algebraic properties of finite fields. Hence
classical block codes are often referred to as algebraic codes.
Most forward error correction correct only bit-flips, but not bit-insertions or bit-deletions.
In this setting, the Hamming distance is the appropriate way to measure the bit error rate.
A few forward error correction codes are designed to correct bit-insertions and bit-
deletions, such as Marker Codes and Watermark Codes. The Levenshtein distance is a
more appropriate way to measure the bit error rate when using such codes.[6]
Classical (algebraic) block codes and convolutional codes are frequently combined in
concatenated coding schemes in which a short constraint-length Viterbi-decoded
convolutional code does most of the work and a block code (usually Reed-Solomon) with
larger symbol size and block length "mops up" any errors made by the convolutional
decoder. Single pass decoding with this family of error correction codes can yield very
low error rates, but for long range transmission conditions (like deep space) iterative
decoding is recommended.
Concatenated codes have been standard practice in satellite and deep space
communications since Voyager 2 first used the technique in its 1986 encounter with
Uranus. The Galileo craft used iterative concatenated codes to compensate for the very
high error rate conditions caused by having a failed antenna.
LDPC codes were first introduced by Robert G. Gallager in his PhD thesis in 1960, but
due to the computational effort in implementing encoder and decoder and the introduction
of Reed–Solomon codes, they were mostly ignored until recently.
LDPC codes are now used in many recent high-speed communication standards, such
as DVB-S2 (Digital video broadcasting), WiMAX (IEEE 802.16e standard for microwave
communications), High-Speed Wireless LAN (IEEE 802.11n), 10GBase-T
Ethernet (802.3an) and G.hn/G.9960 (ITU-T Standard for networking over power lines,
phone lines and coaxial cable). Other LDPC codes are standardized for wireless
communication standards within 3GPP MBMS (see fountain codes).
Turbo codes
Turbo coding is an iterated soft-decoding scheme that combines two or more relatively
simple convolutional codes and an interleaver to produce a block code that can perform
to within a fraction of a decibel of the Shannon limit. Predating LDPC codes in terms of
practical application, they now provide similar performance.
One of the earliest commercial applications of turbo coding was the CDMA2000 1x (TIA
IS-2000) digital cellular technology developed by Qualcomm and sold by Verizon
Wireless, Sprint, and other carriers. It is also used for the evolution of CDMA2000 1x
specifically for Internet access, 1xEV-DO (TIA IS-856). Like 1x, EV-DO was developed
by Qualcomm, and is sold by Verizon Wireless, Sprint, and other carriers (Verizon's
marketing name for 1xEV-DO is Broadband Access, Sprint's consumer and business
marketing names for 1xEV-DO are Power Vision and Mobile Broadband, respectively).
Sometimes it is only necessary to decode single bits of the message, or to check whether
a given signal is a codeword, and do so without looking at the entire signal. This can make
sense in a streaming setting, where codewords are too large to be classically decoded
fast enough and where only a few bits of the message are of interest for now. Also such
codes have become an important tool in computational complexity theory, e.g., for the
design of probabilistically checkable proofs.
Locally decodable codes are error-correcting codes for which single bits of the message
can be probabilistically recovered by only looking at a small (say constant) number of
positions of a codeword, even after the codeword has been corrupted at some constant
fraction of positions. Locally testable codes are error-correcting codes for which it can be
checked probabilistically whether a signal is close to a codeword by only looking at a
small number of positions of the signal.
Interleaving
The analysis of modern iterated codes, like turbo codes and LDPC codes, typically
assumes an independent distribution of errors.[8] Systems using LDPC codes therefore
typically employ additional interleaving across the symbols within a code word.[9]
For turbo codes, an interleaver is an integral component and its proper design is crucial
for good performance.[7][10] The iterative decoding algorithm works best when there are
not short cycles in the factor graph that represents the decoder; the interleaver is chosen
to avoid short cycles.
Interleaver designs include:
rectangular (or uniform) interleavers (similar to the method using skip factors
described above)
convolutional interleavers
Example
Transmission without interleaving:
Error-free message:                        aaaabbbbccccddddeeeeffffgggg
Transmission with a burst error:           aaaabbbbccc____deeeeffffgggg
Here, each group of the same letter represents a 4-bit one-bit error-correcting codeword.
The codeword cccc is altered in one bit and can be corrected, but the codeword dddd is
altered in three bits, so either it cannot be decoded at all or it might be decoded incorrectly.
With interleaving:
Error-free code words:                     aaaabbbbccccddddeeeeffffgggg
Interleaved:                               abcdefgabcdefgabcdefgabcdefg
Transmission with a burst error:           abcdefgabcd____bcdefgabcdefg
Received code words after deinterleaving:  aa_abbbbccccdddde_eef_ffg_gg
In each of the codewords aaaa, eeee, ffff, gggg, only one bit is altered, so a one-bit error-
correcting code will decode everything correctly.
Disadvantages of interleaving
Use of interleaving techniques increases latency. This is because the entire interleaved
block must be received before the packets can be decoded.[15] Also interleavers hide the
structure of errors; without an interleaver, more advanced decoding algorithms can take
advantage of the error structure and achieve more reliable communication than a simpler
decoder combined with an interleaver.
Some well-known error-correcting codes include:
AN codes
BCH code, which can be designed to correct any arbitrary number of errors per
code block.
Berger code
Constant-weight code
Convolutional code
Expander codes
Group codes
Hadamard code
Hagelbarger code
Hamming code
Latin square based code for non-white noise (prevalent for example in broadband
over powerlines)
Lexicographic code
Long code
Low-density parity-check code, also known as Gallager code, as the archetype for
sparse graph codes
m of n codes
Reed–Muller code
Repeat-accumulate code
Turbo code
Walsh–Hadamard code
CHAPTER 3:
ASIC DESIGN FLOW:
1. Specification: This is the first and most important step towards designing a
chip, as the features and functionalities of the chip are defined here. The design is
considered at both the macro and micro level, derived from the required features and
functionalities. Speed, size and power consumption are among the considerations for
which accepted ranges of values are specified. Other performance criteria are also set
at this point and their viability deliberated; some form of simulation might be possible
to check this.
2. RTL Coding: The design is described at the register-transfer level in a hardware
description language such as Verilog or VHDL. This RTL code captures the intended
logical behaviour of the design and is the input to the subsequent simulation and
synthesis steps.
3. Simulation and Testbench: The RTL code and testbench are simulated using HDL
simulators to check the functionality of the design. If Verilog is the language used,
a Verilog simulator is required, while a VHDL simulator is needed for VHDL code. Some
of the tools available at CEDEC include Cadence's Verilog-XL, Synopsys's VCS,
and Mentor Graphics' Modelsim. If the simulation results do not agree with the
intended function, either the testbench file or the RTL code could be the cause.
The design has to be debugged if the RTL code is the source of error, and the
simulation repeated once either one of the two causes, or both, have been corrected.
This process may loop until the RTL code correctly describes the required logical
behaviour of the design.
4. Synthesis: This process is conducted on the RTL code, whereby the RTL code is
converted into logic gates. The logic gates produced are the functional equivalent
of the RTL code as intended in the design. The synthesis process requires two input
files: firstly, the standard cell technology files, and secondly the constraints file.
A synthesised database of the design is created in the system.
5. Pre-Layout Timing Analysis: The synthesized netlist is analyzed for timing using
estimated interconnect delays, so that setup problems can be caught and fixed before
the layout stage.
6. APR: This is the Automatic Place and Route process whereby the layout is
produced. In this process, the synthesized database together with timing
information from synthesis is used to place the logic gates. Most designs have
critical paths whose timing requirements demand that they be routed first. The process
of placement and routing normally has some degree of flexibility.
7. Back Annotation: This is the process in which RC parasitics are extracted
from the layout. The path delay is calculated from these RC parasitics. Long
routing lines can significantly increase the interconnect delay of a path, and for
sub-micron designs these parasitics cause a significant increase in delay. Back annotation
is the step that bridges synthesis and physical layout.
8. Post-Layout Timing Analysis: This step in the ASIC flow allows real timing violations,
such as hold and setup violations, to be detected. In this step, the net interconnect delay
information is fed into the timing analysis; any setup violation is fixed by optimizing
the failing paths, while a hold violation is fixed by introducing buffers into the path
to increase its delay. The process moves back and forth between APR, back annotation and
post-layout timing analysis until the design is cleared of any violation. Then it is
ready for logic verification.
9. Logic Verification: This step acts as the final check to ensure the design is
functionally correct after the additional timing information from layout has been
included. If logic verification fails, changes have to be made to the RTL code or to
the post-layout netlist.
10. Tapeout: When the design has passed the logic verification check, it is ready for
fabrication. The tapeout design is in the form of a GDSII file, which will be accepted
by the foundry.
FPGA ARCHITECTURE:
Logic blocks
The most common FPGA architecture[1] consists of an array of logic blocks (called
configurable logic block, CLB, or logic array block, LAB, depending on vendor), I/O pads,
and routing channels. Generally, all the routing channels have the same width (number
of wires). Multiple I/O pads may fit into the height of one row or the width of one column
in the array.
An application circuit must be mapped into an FPGA with adequate resources. While the
number of CLBs/LABs and I/Os required is easily determined from the design, the number
of routing tracks needed may vary considerably even among designs with the same
amount of logic. For example, a crossbar switch requires much more routing than
a systolic array with the same gate count. Since unused routing tracks increase the cost
(and decrease the performance) of the part without providing any benefit, FPGA
manufacturers try to provide just enough tracks so that most designs that will fit in terms
of lookup tables (LUTs) and I/Os can be routed. This is determined by estimates such as
those derived from Rent's rule or by experiments with existing designs.
In general, a logic block (CLB or LAB) consists of a few logical cells (called ALM, LE, slice
etc.). A typical cell consists of a 4-input LUT, a full adder (FA) and a D-type flip-
flop, as shown below. The LUTs are in this figure split into two 3-input LUTs. In normal
mode those are combined into a 4-input LUT through the left mux. In arithmetic mode,
their outputs are fed to the FA. The selection of mode is programmed into the middle
multiplexer. The output can be either synchronous or asynchronous, depending on the
programming of the mux to the right, in the figure example. In practice, entire or parts of
the FA are put as functions into the LUTs in order to save space.[33][34][35]
Hard blocks
Modern FPGA families expand upon the above capabilities to include higher level
functionality fixed into the silicon. Having these common functions embedded into the
silicon reduces the area required and gives those functions increased speed compared
to building them from primitives. Examples of these include multipliers, generic DSP
blocks, embedded processors, high speed I/O logic and embedded memories.
Higher-end FPGAs can contain high speed multi-gigabit transceivers and hard IP
cores such as processor cores, Ethernet MACs, PCI/PCI Express controllers, and
external memory controllers. These cores exist alongside the programmable fabric, but
they are built out of transistors instead of LUTs so they have ASIC level performance and
power consumption while not consuming a significant amount of fabric resources, leaving
more of the fabric free for the application-specific logic. The multi-gigabit transceivers also
contain high performance analog input and output circuitry along with high-speed
serializers and deserializers, components which cannot be built out of LUTs. Higher-level
PHY layer functionality such as line coding may or may not be implemented alongside
the serializers and deserializers in hard logic, depending on the FPGA.
Clocking
Most of the circuitry built inside of an FPGA is synchronous circuitry that requires a clock
signal. FPGAs contain dedicated global and regional routing networks for clock and reset
so they can be delivered with minimal skew. Also, FPGAs generally contain
analog PLL and/or DLL components to synthesize new clock frequencies as well as
attenuate jitter. Complex designs can use multiple clocks with different frequency and
phase relationships, each forming separate clock domains. These clock signals can be
generated locally by an oscillator or they can be recovered from a high speed serial data
stream. Care must be taken when building clock domain crossing circuitry to avoid
metastability. FPGAs generally contain block RAMs that are capable of working as dual
port RAMs with different clocks, aiding in the construction of FIFOs and dual port
buffers that connect differing clock domains.
3D architectures
Xilinx's approach stacks several (three or four) active FPGA die side-by-side on a
silicon interposer – a single piece of silicon that carries passive interconnect. [37][38] The
multi-die construction also allows different parts of the FPGA to be created with different
process technologies, as the process requirements are different between the FPGA fabric
itself and the very high speed 28 Gbit/s serial transceivers. An FPGA built in this way is
called a heterogeneous FPGA.[39]
Altera's heterogeneous approach involves using a single monolithic FPGA die and
connecting other die/technologies to the FPGA using Intel's embedded multi-die
interconnect bridge (EMIB) technology.
To define the behavior of the FPGA, the user provides a design in a hardware description
language (HDL) or as a schematic design. The HDL form is more suited to work with large
structures because it's possible to just specify them numerically rather than having to
draw every piece by hand. However, schematic entry can allow for easier visualisation of
a design.
The most common HDLs are VHDL and Verilog, although in an attempt to reduce the
complexity of designing in HDLs, which have been compared to the equivalent
of assembly languages, there are moves to raise the abstraction level through the
introduction of alternative languages. National Instruments' LabVIEW graphical
programming language (sometimes referred to as "G") has an FPGA add-in module
available to target and program FPGA hardware.
To simplify the design of complex systems in FPGAs, there exist libraries of predefined
complex functions and circuits that have been tested and optimized to speed up the
design process. These predefined circuits are commonly called IP cores, and are
available from FPGA vendors and third-party IP suppliers (rarely free, and typically
released under proprietary licenses). Other predefined circuits are available from
developer communities such as OpenCores (typically released under free and open
source licenses such as the GPL, BSD or similar license), and other sources.
In a typical design flow, an FPGA application developer will simulate the design at multiple
stages throughout the design process. Initially the RTL description in VHDL or Verilog is
simulated by creating test benches to simulate the system and observe results. Then,
after the synthesis engine has mapped the design to a netlist, the netlist is translated to
a gate level description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally the design is laid out in the FPGA at which point propagation delays
can be added and the simulation run again with these values back-annotated onto the
netlist.
Overview
The designers of Verilog wanted a language with syntax similar to the C programming
language, which was already widely used in engineering software development. Like C,
Verilog is case-sensitive and has a basic preprocessor (though less sophisticated than
that of ANSI C/C++). Its control flow keywords (if/else, for, while, case, etc.) are
equivalent, and its operator precedence is compatible with C. Syntactic differences
include: required bit-widths for variable declarations, demarcation of procedural blocks
(Verilog uses begin/end instead of curly braces {}), and many other minor differences.
Verilog requires that variables be given a definite size. In C these sizes are assumed from
the 'type' of the variable (for instance an integer type may be 8 bits).
Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating,
undefined") and signal strengths (strong, weak, etc.). This system allows abstract
modeling of shared signal lines, where multiple sources drive a common net. When a wire
has multiple drivers, the wire's (readable) value is resolved by a function of the source
drivers and their strengths.
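A minimal sketch of such a shared net, with hypothetical signal names: two continuous assignments drive one wire, each releasing the net with high-impedance (1'bz) when disabled, and the simulator resolves the final value from the active drivers and their strengths.

module shared_net_demo (
    input  wire drive_a, drive_b,  // enables for the two drivers
    input  wire val_a, val_b,      // data values from the two sources
    output wire shared_bus         // single net with multiple drivers
);
    assign shared_bus = drive_a ? val_a : 1'bz;  // driver A
    assign shared_bus = drive_b ? val_b : 1'bz;  // driver B
endmodule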
A subset of statements in the Verilog language are synthesizable. Verilog modules that
conform to a synthesizable coding style, known as RTL (register-transfer level), can be
physically realized by synthesis software. Synthesis software algorithmically transforms
the (abstract) Verilog source into a netlist, a logically equivalent description consisting
only of elementary logic primitives (AND, OR, NOT, flip-flops, etc.) that are available in a
specific FPGA or VLSI technology. Further manipulations to the netlist ultimately lead to
a circuit fabrication blueprint (such as a photo mask set for an ASIC or a bitstream file for
an FPGA).
History
Beginning
Verilog was one of the first popular hardware description languages to be
invented. It was created by Prabhu Goel, Phil Moorby, Chi-Lai Huang and
Douglas Warmke between late 1983 and early 1984.[2] Chi-Lai Huang had earlier worked
on a hardware description language, LALSD, developed by Professor S.Y.H. Su, for his
PhD work.[3] Verilog was created by Automated Integrated Design Systems
(later renamed Gateway Design Automation in 1985) as a hardware modeling
language. Gateway Design Automation was purchased by Cadence Design Systems in
1990. Cadence now has full proprietary rights to Gateway's Verilog and the Verilog-XL,
the HDL simulator that would become the de facto standard (of Verilog logic simulators)
for the next decade. Originally, Verilog was only intended to describe and allow
simulation; the automated synthesis of subsets of the language to physically realizable
structures (gates etc.) was developed after the language had achieved widespread
usage.
Verilog-95
With the increasing success of VHDL at the time, Cadence decided to make the language
available for open standardization. Cadence transferred Verilog into the public domain
under the Open Verilog International (OVI) (now known as Accellera) organization.
Verilog was later submitted to IEEE and became IEEE Standard 1364-1995, commonly
referred to as Verilog-95.
In the same time frame Cadence initiated the creation of Verilog-A to put standards
support behind its analog simulator Spectre. Verilog-A was never intended to be a
standalone language and is a subset of Verilog-AMS which encompassed Verilog-95.
Verilog 2001
Extensions to Verilog-95 were submitted back to IEEE to cover the deficiencies that users
had found in the original Verilog standard. These extensions became IEEE Standard
1364-2001 known as Verilog-2001.
Verilog-2001 is a significant upgrade from Verilog-95. First, it adds explicit support for (2's
complement) signed nets and variables. Previously, code authors had to perform signed
operations using awkward bit-level manipulations (for example, the carry-out bit of a
simple 8-bit addition required an explicit description of the Boolean algebra to determine
its correct value). The same function under Verilog-2001 can be more succinctly
described by one of the built-in operators: +, -, /, *, >>>. A generate/endgenerate construct
(similar to VHDL's generate/endgenerate) allows Verilog-2001 to control instance and
statement instantiation through normal decision operators (case/if/else). Using
generate/endgenerate, Verilog-2001 can instantiate an array of instances, with control
over the connectivity of the individual instances. File I/O has been improved by several
new system tasks. And finally, a few syntax additions were introduced to improve code
readability (e.g. always @*, named parameter override, C-style function/task/module
header declaration).
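As a sketch of the generate/endgenerate construct mentioned above (module and signal names are hypothetical), the following Verilog-2001 code instantiates one always block per bit of an N-bit register stage:

module shift_stage #(parameter N = 8) (
    input  wire         clk,
    input  wire [N-1:0] din,
    output reg  [N-1:0] dout
);
    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : bit_regs
            // one generated always block per bit, with per-instance connectivity
            always @(posedge clk)
                dout[i] <= din[i];
        end
    endgenerate
endmodule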
Verilog 2005
Verilog 2005 (IEEE Standard 1364-2005) consists of minor corrections, spec clarifications,
and a few new language features. A separate part of the Verilog standard, Verilog-AMS,
attempts to integrate analog and mixed signal modeling with traditional Verilog.
SystemVerilog
SystemVerilog is a superset of Verilog-2005, with many new features and capabilities to
aid design verification and design modeling.
Example
module toplevel(clock,reset);
 input clock;
 input reset;

 reg flop1;
 reg flop2;

 always @ (posedge reset or posedge clock)
   if (reset)
     begin
       flop1 <= 0;
       flop2 <= 1;
     end
   else
     begin
       flop1 <= flop2;
       flop2 <= flop1;
     end
endmodule
The "<=" operator in Verilog is another aspect of its being a hardware description
language as opposed to a normal procedural language. This is known as a "non-blocking"
assignment. Its action doesn't register until after the always block has executed. This
means that the order of the assignments is irrelevant and will produce the same result:
flop1 and flop2 will swap values every clock.
The other assignment operator, "=", is referred to as a blocking assignment. When "="
assignment is used, for the purposes of logic, the target variable is updated immediately.
In the above example, had the statements used the "=" blocking operator instead of "<=",
flop1 and flop2 would not have been swapped. Instead, as in traditional programming, the
compiler would understand to simply set flop1 equal to flop2 (and subsequently ignore
the redundant logic to set flop2 equal to flop1).
An example counter circuit follows:
module Div20x (rst, clk, cet, cep, count, tc);
  // Divide-by-'length' counter; cet and cep are enables, tc is the terminal count
  parameter size = 5;
  parameter length = 20;
  input rst, clk, cet, cep;
  output [size-1:0] count;
  output tc;
  reg [size-1:0] count; // signals assigned within an always (or initial) block must be of type reg
  always @ (posedge clk or posedge rst)
    if (rst)                  // reset the counter
      count <= {size{1'b0}};
    else if (cet && cep)      // count only when both enables are true
      begin
        if (count == length-1)
          count <= {size{1'b0}};
        else
          count <= count + 1'b1;
      end
  assign tc = (cet && (count == length-1)); // continuous assignment to wire tc
endmodule
An example of delays:
...
reg a, b, c, d;
wire e;
...
always @(b or e)
begin
a = b & e;
b = a | b;
#5 c = b;
d = #6 c ^ e;
end
The always clause above illustrates the other type of method of use, i.e. it executes
whenever any of the entities in the list (the b or e) changes. When one of these
changes, a is immediately assigned a new value, and due to the blocking
assignment, b is assigned a new value afterward (taking into account the new value of a).
After a delay of 5 time units, c is assigned the value of b and the value of c ^ e is tucked
away in an invisible store. Then after 6 more time units, d is assigned the value that was
tucked away.
Signals that are driven from within a process (an initial or always block) must be of
type reg. Signals that are driven from outside a process must be of type wire. The
keyword reg does not necessarily imply a hardware register.
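A minimal sketch of this rule (hypothetical module and signal names): a signal assigned inside an always block is declared reg, while a signal driven by a continuous assignment outside a process is declared wire.

module reg_vs_wire (
    input  wire clk, a, b, d,
    output reg  r_sig,   // assigned inside an always block, so it must be a reg
    output wire w_sig    // driven by a continuous assignment, so it is a wire
);
    always @(posedge clk)
        r_sig <= d;       // procedural assignment
    assign w_sig = a & b; // continuous assignment
endmodule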
Definition of constants
The definition of constants in Verilog supports the addition of a width parameter. The
basic syntax is:
<Width in bits>'<base letter><number>
Examples:
12'h123  (hexadecimal 123, using 12 bits)
20'd44   (decimal 44, using 20 bits; zero extension is automatic)
4'b1010  (binary 1010, using 4 bits)
6'o77    (octal 77, 6 bits wide)
Synthesizeable constructs
There are several statements in Verilog that have no analog in real hardware, e.g.
$display. Consequently, much of the language can not be used to describe hardware.
The examples presented here are the classic subset of the language that has a direct
mapping to real gates.
// Mux examples: three ways to do the same thing.

// The first example uses continuous assignment.
wire out;
assign out = sel ? a : b;

// The second example uses a procedure to accomplish the same thing.
reg out;
always @(a or b or sel)
begin
  case(sel)
    1'b0: out = b;
    1'b1: out = a;
  endcase
end

// Finally — you can use if/else in a
// procedural structure.
reg out;
always @(a or b or sel)
  if (sel)
    out = a;
  else
    out = b;
The next interesting structure is a transparent latch; it will pass the input to the output
when the gate signal is set for "pass-through", and captures the input and stores it upon
transition of the gate signal to "hold". The output will remain stable regardless of the input
signal while the gate is set to "hold". In the example below the "pass-through" level of the
gate would be when the value of the if clause is true, i.e. gate = 1. This is read "if gate is
true, the din is fed to latch_out continuously." Once the if clause is false, the last value at
latch_out will remain and is independent of the value of din.
reg latch_out;
always @(gate or din)
  if (gate)
    latch_out = din; // no else needed: latch_out holds its value while gate is low
The flip-flop is the next significant template; in Verilog, the D-flop is the simplest, and it
can be modeled as:
reg q;
always @(posedge clk)
  q <= d;
The significant thing to notice in the example is the use of the non-blocking assignment.
A basic rule of thumb is to use <= when there is a posedge or negedge statement within
the always clause.
A variant of the D-flop is one with an asynchronous reset; there is a convention that the
reset state will be the first if clause within the statement.
reg q;
always @(posedge clk or posedge reset)
  if (reset)
    q <= 0;
  else
    q <= d;
The next variant is including both an asynchronous reset and asynchronous set condition;
again the convention comes into play, i.e. the reset term is followed by the set term.
reg q;
always @(posedge clk or posedge set or posedge reset)
  if (reset)
    q <= 0;
  else
  if (set)
    q <= 1;
  else
    q <= d;
Note: If this model is used to model a Set/Reset flip flop then simulation errors can result.
Consider the following test sequence of events. 1) reset goes high 2) clk goes high 3) set
goes high 4) clk goes high again 5) reset goes low followed by 6) set going low. Assume
no setup and hold violations.
In this example the always @ statement would first execute when the rising edge of reset
occurs which would place q to a value of 0. The next time the always block executes
would be the rising edge of clk which again would keep q at a value of 0. The always
block then executes when set goes high which because reset is high forces q to remain
at 0. This condition may or may not be correct depending on the actual flip flop. However,
this is not the main problem with this model. Notice that when reset goes low, that set is
still high. In a real flip flop this will cause the output to go to a 1. However, in this model it
will not occur because the always block is triggered by rising edges of set and reset —
not levels. A different approach may be necessary for set/reset flip flops.
The final basic variant is one that implements a D-flop with a mux feeding its input. The
mux has a d-input and feedback from the flop itself. This allows a gated load function.
reg q;
always @(posedge clk)
  if (gate)
    q <= d;
  else
    q <= q; // explicit feedback path (often omitted; the feedback mux is then implied)
Note that there are no "initial" blocks mentioned in this description. There is a split
between FPGA and ASIC synthesis tools on this structure. FPGA tools allow initial blocks
where reg values are established instead of using a "reset" signal. ASIC synthesis tools
don't support such a statement. The reason is that an FPGA's initial state is something
that is downloaded into the memory tables of the FPGA. An ASIC is an actual hardware
implementation.
There are two separate ways of declaring a Verilog process. These are the always and
the initial keywords. The always keyword indicates a free-running process.
The initial keyword indicates a process executes exactly once. Both constructs begin
execution at simulator time 0, and both execute until the end of the block. Once
an always block has reached its end, it is rescheduled (again). It is a common
misconception to believe that an initial block will execute before an always block. In fact,
it is better to think of the initial-block as a special-case of the always-block, one which
terminates after it completes for the first time.
//Examples:
initial
  begin
    a = 1; // Assign a value to reg a at time 0
    #1;    // Wait 1 time unit
    b = a; // Assign the value of reg a to reg b
  end

always @(a or b) // Run whenever reg a or b changes
  begin
    if (a)
      c = b;
    else
      d = ~b;
  end // Done with this block, now return to the top (i.e. the @ event-control)

always @(posedge a) // Run whenever reg a has a low to high change
  a <= b;
These are the classic uses for these two keywords, but there are two significant additional
uses. The most common of these is an always keyword without the @(...) sensitivity list.
It is possible to use always without a sensitivity list, as in the sketch below.
The always keyword acts similar to the C language construct while(1) {..} in the sense
that it will execute forever.
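A typical use of such a free-running always block is a testbench clock generator; a minimal sketch (assuming clk is declared as a reg in the enclosing testbench module):

reg clk;
always
  begin        // starts at time 0 and never stops
    clk = 0;   // drive the clock low
    #1;        // wait one time unit
    clk = 1;   // drive the clock high
    #1;        // wait one time unit
  end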
The other interesting exception is the use of the initial keyword with the addition of
the forever keyword, which achieves the same free-running behaviour:
initial forever
  begin
    // e.g. the same clock-toggling body as in the always example above
  end
Fork/join
The fork/join pair are used by Verilog to create parallel processes. All statements (or
blocks) between a fork/join pair begin execution simultaneously upon execution flow
hitting the fork. Execution continues after the join upon completion of the longest running
statement or block between the fork and join.
initial
  fork
    $write("A"); // Print char A
    $write("B"); // Print char B
    begin
      #1; $write("C"); // Wait 1 time unit, then print char C
    end
  join
The way the above is written, it is possible to have either the sequences "ABC" or "BAC"
print out. The order of simulation between the first $write and the second $write depends
on the simulator implementation, and may purposefully be randomized by the simulator.
This allows the simulation to contain both accidental race conditions as well as intentional
non-deterministic behavior.
Notice that VHDL cannot dynamically spawn multiple processes like Verilog.[5]
Race conditions
The order of execution isn't always guaranteed within Verilog. This can best be illustrated
by a classic example. Consider the code snippet below:
initial
  a = 0;
initial
  b = a;
initial
  begin
    #1;
    $display("Value a=%d Value of b=%d", a, b);
  end
What will be printed out for the values of a and b? Depending on the order of execution
of the initial blocks, it could be zero and zero, or alternately zero and some other arbitrary
uninitialized value. The $display statement will always execute after both assignment
blocks have completed, due to the #1 delay.
Operators
Operator type   Operator symbols   Operation performed
Bitwise         ~                  Bitwise NOT (1's complement)
                &                  Bitwise AND
                |                  Bitwise OR
                ^                  Bitwise XOR
                ~^ or ^~           Bitwise XNOR
Logical         !                  NOT
                &&                 AND
                ||                 OR
Reduction       &                  Reduction AND
                ~&                 Reduction NAND
                |                  Reduction OR
                ~|                 Reduction NOR
                ^                  Reduction XOR
                ~^ or ^~           Reduction XNOR
Arithmetic      +                  Addition
                -                  Subtraction
                -                  2's complement
                *                  Multiplication
                /                  Division
                **                 Exponentiation (*Verilog-2001)
Concatenation   {, }               Concatenation
Conditional     ?:                 Conditional
CHAPTER 4:
The decoder is implemented using the modified sum-product algorithm, an approximation of
the normal SPA that is easier to implement in hardware [6]. When the encoder output is
transmitted through the AWGN channel, the received values are applied to the decoder's
variable nodes. Let M(n) denote the set of check nodes connected to symbol node n and
N(m) the set of symbol nodes participating in the m-th parity-check equation. The bit
(variable) messages are updated for each bit node and for every check node associated with
it, and the iterations stop when the maximum number of iterations Imax is reached or a
valid codeword has been found.
Step 1: Initialization. Assuming the AWGN channel with noise variance 𝜎2, the reliability
value is Lc = 2/𝜎2. The initialization is done in every position (m, n) of the parity check
matrix H where Hm,n = 1.
Step 2: Check-node update. Update the check-node LLR for each m and for each n ∈ N(m).
Note that both the tanh and tanh−1 functions are increasing and have odd symmetry; thus,
the sign and the magnitude of the incoming messages can be processed separately in a
simplified version.
Step 3: Variable-node update. Update the variable-node LLR for each n and for each
m ∈ M(n).
Step 4: Decision. If 𝜆𝑛(𝑢𝑛) ≥ 0 then 𝑢𝑛 = 0, and if 𝜆𝑛(𝑢𝑛) < 0 then 𝑢𝑛 = 1. If the syndrome
satisfies uHT = 0, then u is the final codeword; otherwise the iteration takes place till a
valid codeword is obtained.
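For reference, a sketch of the update equations for Steps 1 to 4 in their standard textbook form (standard notation is assumed: y_n is the received channel value, L(q_{mn}) the variable-to-check message, and L(r_{mn}) the check-to-variable message):
\begin{align}
\text{Initialization:} \quad & L(q_{mn}) = L_c\, y_n = \frac{2}{\sigma^2}\, y_n \\
\text{Check-node update:} \quad & L(r_{mn}) = 2\tanh^{-1}\!\left( \prod_{n' \in N(m)\setminus n} \tanh\!\left(\tfrac{1}{2} L(q_{mn'})\right) \right) \\
\text{Variable-node update:} \quad & L(q_{mn}) = L_c\, y_n + \sum_{m' \in M(n)\setminus m} L(r_{m'n}) \\
\text{Decision:} \quad & \lambda_n(u_n) = L_c\, y_n + \sum_{m \in M(n)} L(r_{mn}), \qquad
\hat{u}_n = \begin{cases} 0, & \lambda_n(u_n) \ge 0 \\ 1, & \text{otherwise.} \end{cases}
\end{align}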
CHAPTER 5:
INTRODUCTION:
Low density parity check (LDPC) codes are error correcting codes used over noisy
communication channels to reduce the probability of loss of information. LDPC codes are
capacity-approaching codes, which means that practical constructions exist that allow the
noise threshold to be set very close to the theoretical maximum (the Shannon limit) for a
symmetric memoryless channel. The noise threshold defines an upper bound on the
channel noise, up to which the probability of lost information can be reduced. Low density
parity check (LDPC) codes are also known as Gallager codes because these codes were
proposed by R.G. Gallager in 1962 [1]. With the increased capacity of computers and the
development of relevant theories such as the belief propagation algorithm, LDPC codes were
rediscovered by MacKay and Neal in 1996 [2]. LDPC codes are linear block codes defined
by a sparse M × N parity check matrix H, where N is the LDPC code length [3]. The
Tanner graph has been introduced to represent LDPC codes [4]. Tanner graphs are
bipartite graphs with two types of nodes: variable nodes (v-nodes) and check nodes
(c-nodes). The n coordinates of the codewords are associated with the n message nodes.
The codewords are those vectors such that, for all check nodes, the sum of the
neighbouring positions among the message nodes is zero.
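Equivalently, in matrix form, a length-N vector c is a codeword if and only if it satisfies all M parity checks:
\[
H\,c^{T} = 0 \pmod{2} .
\]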
Here we consider H to be the parity check matrix of an irregular (10, 5) LDPC code; its
Tanner graph is shown in Fig. 1. For this LDPC code, the path (c1 → v8 → c3 → v10
→ p1) is marked with black bold lines. In recent studies, decoding has been performed with
various algorithms, and different types of decoders have been designed, such as partially
parallel decoders and memory-efficient decoders. Among these schemes, belief propagation
decoding leads to a good approximate decoder; belief propagation decoding of LDPC codes on
a memoryless channel is the best practical decoding algorithm.
In an example using the DVB-S2 rate 2/3 code the encoded block size is 64800 symbols
(N=64800) with 43200 data bits (K=43200) and 21600 parity bits (M=21600). Each
constituent code (check node) encodes 16 data bits except for the first parity bit which
encodes 8 data bits. The first 4680 data bits are repeated 13 times (used in 13 parity
codes), while the remaining data bits are used in 3 parity codes (irregular LDPC code).
For comparison, classic turbo codes typically use two constituent codes configured in
parallel, each of which encodes the entire input block (K) of data bits. These constituent
encoders are recursive convolutional codes (RSC) of moderate depth (8 or 16 states) that
are separated by a code interleaver which interleaves one copy of the frame.
The LDPC code, in contrast, uses many low depth constituent codes (accumulators) in
parallel, each of which encode only a small portion of the input frame. The many
constituent codes can be viewed as many low depth (2 state) 'convolutional codes' that
are connected via the repeat and distribute operations. The repeat and distribute
operations perform the function of the interleaver in the turbo code.
The ability to more precisely manage the connections of the various constituent codes
and the level of redundancy for each input bit give more flexibility in the design of LDPC
codes, which can lead to better performance than turbo codes in some instances. Turbo
codes still seem to perform better than LDPCs at low code rates, or at least the design of
well performing low rate codes is easier for Turbo Codes.
As a practical matter, the hardware that forms the accumulators is reused during the
encoding process. That is, once a first set of parity bits are generated and the parity bits
stored, the same accumulator hardware is used to generate a next set of parity bits.
As with other codes, the maximum likelihood decoding of an LDPC code on the binary
symmetric channel is an NP-complete problem. Performing optimal decoding for a NP-
complete code of any useful size is not practical.
The decoding of the SPC codes is often referred to as the "check node" processing, and
the cross-checking of the variables is often referred to as the "variable-node" processing.
In a practical LDPC decoder implementation, sets of SPC codes are decoded in parallel
to increase throughput.
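As an illustrative sketch of check-node processing in hardware (using the min-sum approximation of the check-node rule rather than the full tanh rule; the module, parameter and signal names are hypothetical and not taken from the decoder reported here), a degree-3 check node can be written as:

module checknode3 #(parameter W = 4) (
    input  wire         s0, s1, s2,     // sign bits of the three incoming messages
    input  wire [W-1:0] m0, m1, m2,     // magnitudes of the three incoming messages
    output wire         so0, so1, so2,  // signs of the outgoing (extrinsic) messages
    output wire [W-1:0] mo0, mo1, mo2   // magnitudes of the outgoing messages
);
    // Each outgoing sign is the parity (XOR) of the other incoming signs.
    assign so0 = s1 ^ s2;
    assign so1 = s0 ^ s2;
    assign so2 = s0 ^ s1;
    // Each outgoing magnitude is the minimum of the other incoming magnitudes.
    assign mo0 = (m1 < m2) ? m1 : m2;
    assign mo1 = (m0 < m2) ? m0 : m2;
    assign mo2 = (m0 < m1) ? m0 : m1;
endmodule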
In contrast, belief propagation on the binary erasure channel is particularly simple where
it consists of iterative constraint satisfaction.
For example, consider that the valid codeword, 101011, from the example above, is
transmitted across a binary erasure channel and received with the first and fourth bit
erased to yield ?01?11. Since the transmitted message must have satisfied the code
constraints, the message can be represented by writing the received message on the top
of the factor graph.
In this example, the first bit cannot yet be recovered, because all of the constraints
connected to it have more than one unknown bit. In order to proceed with decoding the
message, constraints connecting to only one of the erased bits must be identified. In this
example, only the second constraint suffices. Examining the second constraint, the fourth
bit must have been zero, since only a zero in that position would satisfy the constraint.
This procedure is then iterated. The new value for the fourth bit can now be used in
conjunction with the first constraint to recover the first bit as seen below. This means that
the first bit must be a one to satisfy the leftmost constraint.
Thus, the message can be decoded iteratively. For other channel models, the messages
passed between the variable nodes and check nodes are real numbers, which express
probabilities and likelihoods of belief.
This result can be validated by multiplying the corrected code word r by the parity-check
matrix H:
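A sketch of this validation, assuming the standard 6-bit textbook example that the discussion above appears to follow (codeword r = 101011 and a 3 × 6 parity-check matrix H):
\[
H = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 1 & 0 \end{pmatrix},
\qquad
r = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 1 \end{pmatrix}^{T},
\qquad
H\,r = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \pmod 2 .
\]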
Code construction
For large block sizes, LDPC codes are commonly constructed by first studying the
behaviour of decoders. As the block size tends to infinity, LDPC decoders can be shown
to have a noise threshold below which decoding is reliably achieved, and above which
decoding is not achieved,[17] colloquially referred to as the cliff effect. This threshold
can be optimised by finding the best proportion of arcs from check nodes and arcs from
variable nodes. An approximate graphical approach to visualising this threshold is an
EXIT chart. The construction of a specific LDPC code after this optimization falls into
two main types of techniques:
Pseudorandom approaches
Combinatorial approaches
(a) Simulation results: Simulation is done to check the correctness of the code. In this
paper the Verilog coding technique is used. The Verilog code is checked for errors such as
syntax errors and logical errors. Simulation is done using Modelsim. Decoding is
implemented in Verilog in Xilinx, and the simulation results are given below.
(b) Synthesis
Synthesis is done using Mentor Graphics Leonardo Spectrum, and the target technology is
Virtex IV. From synthesis, results such as chip area and delay are observed: the chip area
is measured by the number of look-up tables (LUTs), and the delay is the propagation delay.
The table shows the results obtained during synthesis of the code implementing the LDPC
decoder.
CONCLUSION:
The decoder for the LDPC codes is implemented with the use of a bipartite graph. The code
is implemented in Verilog in Xilinx, and simulation is done using Modelsim. The modified
sum product algorithm was found to be effective for decoding. We observed that
high-throughput LDPC decoding architectures should exploit the benefits of parallel
decoding algorithms.
REFERENCES:
[1]. Robert G. Gallager, “Low Density Parity Check Codes”, IRE Trans. Inf. Theory, Vol.
IT-8, No.1, pp. 21–28, Jan 1962.
[2]. MacKay, D.J.C., “Good error-correcting codes based on very sparse matrices”,
IEEE Trans. Inform. Theory, Vol. 45, No. 3, pp. 399–431, March 1999.
[3]. Lei Yang, Hui Liu, and C.-J. Richard Shi, “Code Construction and FPGA
Implementation of a Low-Error-Floor Multi-Rate Low-Density Parity-Check Code
Decoder”, Department of Electrical Engineering University of Washington, Seattle, WA
98195
[5]. Jinghu Chen, and Marc P. C. Fossorier, Senior Member, “Near Optimum Universal
Belief Propagation Based Decoding of Low-Density Parity Check Codes”, IEEE
Transactions on Communications, Vol. 50, No. 3, March 2002.
[6]. S. Papaharalabos, P. Sweeney, B.G. Evans, P.T. Mathiopoulos, G. Albertazzi, A.
Vanelli-Coralli and G.E. Corazza, “Modified sum-product algorithms for decoding low-
density parity-check codes”, IET Communications, Vol. 1, No. 3, pp. 294–300, 2007.
[7]. Guido Masera, Federico Quaglio, and Fabrizio Vacca, “Implementation of a Flexible
LDPC Decoder”, IEEE Transactions on circuits and systems, Vol. 54, No. 6, June 2007.
[8]. T. Richardson, “Error floors of LDPC codes”, in Proc. Annual Allerton Conference on
Communication, Control, and Computing, Monticello, IL, pp. 1426-1435, 2003
[9]. Tuan Ta, “A Tutorial on Low Density Parity-Check Codes”, The University of Texas
at Austin.
[10]. Edward Liao, Engling Yeo, Borivoje Nikolic, “Low-Density Parity-Check Code
Constructions for Hardware Implementation”, IEEE Communications Society, 2004.
[11]. Jin Sha, Minglun Gao, Zhongjin Zhang, Li Li, Zhongfeng Wang, “High-Throughput
and Memory Efficient LDPC Decoder Architecture”, Proceedings of the 5th WSEAS Int.
Conf. on Instrumentation, Measurement, Circuits and Systems, Hangzhou, China, pp.
224-229, April 2006