Icisc47916 2020 9171146

Proceedings of the Fourth International Conference on Inventive Systems and Control (ICISC 2020)
IEEE Xplore Part Number: CFP20J06-ART; ISBN: 978-1-7281-2813-9
Low Latency and Area Efficient Very Large Scale Integration Architecture of 2-Dimensional Bicubic Interpolation using Carry Save Adder Based Fast Multiplier
Sayantan Dutta
Electronics and Telecommunication Engineering
IIEST, Shibpur
Howrah-711103
sayantan.dutta30@gmail.com

Dr. Ayan Banerjee
Electronics and Telecommunication Engineering
IIEST, Shibpur
Howrah-711103
ayanb12@gmail.com
Abstract—Presently, real-time image processing is gaining increasing popularity, specifically in the fields of satellite imaging and medical imaging. Naturally, researchers are increasingly inclined to design dedicated hardware for imaging methods to aid real-time processing in a cost-effective way. Likewise, several VLSI architectures have already been designed for a variety of imaging methods. But, in spite of being a significant and popular imaging method, image interpolation has no such notable dedicated VLSI architecture so far. Image interpolation has vast applicability, ranging from image inpainting to image registration. In all such applications, the image interpolation part has been performed by software-based implementations. To alleviate the issue, a dedicated VLSI architecture for 2D bicubic interpolation has been designed. The proposed bicubic architecture is based on state-of-the-art VLSI architectures for performing square and cube operations following Vedic mathematics. Furthermore, all the adders used in the proposed design are judiciously developed to increase the speed while retaining an acceptable area and power consumption. The proposed architecture is realized in the Xilinx Vivado 18.2 tool. The simulation results exhibit satisfactory performance. The quality of the proposed bicubic architecture has also been successfully tested by on-board testing using the Xilinx ZCU104 board to verify the viability of the design for real-time imaging applications.

Keywords- High-resolution image, Low-resolution image, Bicubic interpolation, Cube module, Square module, Bicubic interpolation function, Hard shifter, Multiplier, Adder

I. INTRODUCTION

Bicubic interpolation is one of the efficient methods to find unknown pixels in digital image processing. In recent days, many researchers have tried to improve the algorithm to improve image quality. They have come up with new ideas that have helped a lot in finding unknown pixel values. But in one respect the existing ideas are lagging. Till now, researchers have tried to improve image quality only at the software level using a local machine. There is no harm in doing it on a local machine with the help of software, but the time and process complexity also increase. So, in this paper, our main interest is to overcome this deficiency by implementing the process of image reconstruction in hardware, which leads to an on-chip image reconstruction architecture based on the bicubic interpolation technique. The bicubic technique is used whenever there is a necessity to transform an LRI (Low-Resolution Image) to an HRI (High-Resolution Image), to store old images, or to scale up or scale down the image pixels. The proposed architecture is successfully verified using on-board testing and can be fabricated on-chip for real-time image processing. This architecture will compete with the software-intensive processes that are present in the market. In the subsequent sections, it can be observed that our architecture does the same task in an efficient manner in terms of latency, throughput, and the number of clock cycles.

The paper is organized as follows. Existing works are discussed in Section II. The proposed bicubic interpolation architecture is illustrated in Section III. Section IV contains results and discussions. Finally, the conclusion is given in Section V.

II. EXISTING WORK

Bicubic interpolation is one of the hot topics among the interpolation methods available for digital image processing. Several researchers have successfully developed algorithms to improve the performance of the system. But few of them have tried to implement it in hardware. Some of the significant and notable works are mentioned here.

One of the definitions of image interpolation is that it is a technique to transform a low-resolution image into a high-resolution image. It is very useful in many image processing tasks. An edge-directed bicubic interpolation has been proposed by Zhou Dengwen. The main objective of that paper is to adapt to the varying edge structure of an image. He has successfully reduced common artifacts like blocking, ringing, and blurring [1].
978-1-7281-2813-9/20/$31.00 ©2020 IEEE 686
When we scan old images for restoration, the scanned image is not as good as the original image. There are some missing objects and scratches in the scanned image. From that point of view, Mehram Motmaen and his co-workers tried to build an improved image inpainting algorithm that uses 1-D bicubic and 2-D hyperbolic interpolations. For better preservation of corner pixels, they have used a hyperbolic formation in neighboring pixels [2].

Another definition of image interpolation is that it is a technique to scale up and scale down the image pixels. On that basis, K. Sekar and his co-workers presented a paper that performs bicubic interpolation based on the Discrete Wavelet Transform (DWT). They have done DWT interpolation on a greyscale image. Then they performed bicubic interpolation [3].

In one of his papers, Zhang Xiang-guang has designed an algorithm that can perform super-resolution reconstruction efficiently [4]. Watchara Ruangsang and his co-worker have done the same type of work. But their main motivation is to increase the resolution of CCTV cameras. Most of the time, CCTV cameras produce a low-resolution image due to the camera field of view or lighting. From that degraded image, it is impossible to extract valuable information. So, they have tried to build a super-resolution algorithm based on overlapping bicubic interpolation [5].

TV programs use a 2x2 scale factor to enlarge the image. Auangkun Rangsikunpum and his co-workers tried to describe how this real-time expansion of sign language images works on any TV. Here they have used bicubic interpolation, and the 2x2 scale factor helps them to simplify the bicubic interpolation formula [6].

Yunshan Zhang and co-workers have tried to improve the bicubic interpolation algorithm in hardware. They have used a scratch table method to avoid floating-point multiplication and cubic operations. Based on the parallel processing capability of the hardware, they have performed bicubic interpolation [7].

III. PROPOSED WORK

Bicubic interpolation is a complex and time-consuming process. It takes more time than bilinear interpolation because it requires complex computation. As of now, there is no hardware architecture to perform bicubic interpolation [8]. So, whatever is done for bicubic interpolation is done at the software level. First, an image is captured. Then it is transferred to the computer and, after that, the bicubic interpolation is performed. Lastly, after post-processing, the image is sent to an output device [9].

Fig. 1: Support 8 interpolation [9]

The above-mentioned process introduces delay and process complexity. The introduction of hardware can allow on-chip real-time image interpolation with reduced delay. This also makes the system faster, less complex, and more robust [10].

The bicubic interpolation method is used to find unknown pixel values. In constant interpolation, any unknown pixel value is substituted by the value of the nearest adjacent pixel. But it introduces blocking artifacts [11]. In bilinear interpolation, the unknown pixel value is computed from the four nearest neighbours. But both of them fail to give a precise result. So, in this context, bicubic interpolation has been chosen [12]. The function for obtaining an unknown pixel value using the bicubic interpolation function is

$$F(p', q') = \sum_{m=-1}^{2} \sum_{n=-1}^{2} F(p+m, q+n)\, R_c\{(m-a)\}\, R_c\{-(n-b)\} \tag{1}$$

where a = p' - p, b = q' - q, and R_c(x) is the bicubic interpolation function. The general form of the bicubic interpolation function is [12]

$$R_c(x) = \begin{cases} A_1|x|^3 + B_1|x|^2 + C_1|x| + D_1, & 0 \le |x| \le 1 \\ A_2|x|^3 + B_2|x|^2 + C_2|x| + D_2, & 1 \le |x| \le 2 \end{cases}$$

where A_i, B_i, C_i, D_i denote the weighting factors.

The process flow of bicubic interpolation is described in Fig. 2.
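As an illustration of equation (1), the following software sketch evaluates one interpolated pixel from its 4x4 neighbourhood. It is only a reference model, not the proposed hardware. The paper leaves the weighting factors A_i, B_i, C_i, D_i unspecified, so the sketch assumes the widely used Keys kernel with parameter -0.5; the function names `cubic_kernel` and `bicubic_pixel`, and the border clamping, are likewise assumptions made for this example.

```python
import math


def cubic_kernel(x, a=-0.5):
    """Piecewise-cubic weight R_c(x).

    Assumption: Keys kernel with a = -0.5, i.e. A1 = a + 2, B1 = -(a + 3),
    C1 = 0, D1 = 1 and A2 = a, B2 = -5a, C2 = 8a, D2 = -4a.
    """
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def bicubic_pixel(img, p_prime, q_prime):
    """Evaluate equation (1) at the fractional position (p', q')."""
    p, q = math.floor(p_prime), math.floor(q_prime)
    a, b = p_prime - p, q_prime - q
    rows, cols = len(img), len(img[0])
    value = 0.0
    for m in range(-1, 3):
        for n in range(-1, 3):
            # Border handling by clamping is an assumption of this sketch.
            r = min(max(p + m, 0), rows - 1)
            c = min(max(q + n, 0), cols - 1)
            value += img[r][c] * cubic_kernel(m - a) * cubic_kernel(-(n - b))
    return value


# A 4x4 ramp image whose value equals the column index.
ramp = [[float(c) for c in range(4)] for _ in range(4)]
print(bicubic_pixel(ramp, 1.5, 1.5))
```

With this kernel the weights of the sixteen neighbours sum to one, so a constant image is reproduced and a known pixel position returns the stored pixel value. The cubes and squares of the fractional offsets are the operations that the square and cube circuits have to provide in hardware.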
Fig. 2: Process flow of 2D bicubic interpolation

From the pixel function represented in (1), it is evident that the overall architecture needs an adder, a subtractor, a square circuit, a cube circuit, and delay elements [13]. For the implementation of the bicubic interpolation function (R_c), the components required are an adder, counter, SAM, memory, and delay elements.

As the first step of the design procedure, the Carry-Save Adder (CSA) is modified by using a Carry Look-ahead Adder (CLA) instead of a Ripple Carry Adder (RCA). A normal CSA has two blocks. One is the carry-save stage built from full adders, and the second is an RCA. This RCA block introduces a rippling effect in the circuit, which is the main cause of delay in the CSA module [14]. The VLSI architecture of the CSA using an RCA is described in Fig. 3.

Fig. 3: VLSI architecture of CSA using RCA in the final stage.

Using the modified CSA as in Fig. 4, the dynamic power, stage delay, junction temperature, total on-chip power, logic power, I/O power, and signal power are improved significantly.

CLA:

$$P_i = A_i + B_i \tag{2}$$

$$G_i = A_i * B_i \tag{3}$$

$$Sum = P_i \wedge C_i \tag{4}$$

$$C_{out} = G_i + P_i * C_i \tag{5}$$

Fig. 4: Proposed modified CSA using CLA.

Using this kind of adder, a shift-and-add based arbitrary multiplier has been developed. 'Hard Shifters' (HS), which can be realized by mere bus cross-connection, have been used in the multiplier. A sample of the hard shifter is depicted in Fig. 5.

Fig. 5: VLSI architecture of prototype HS1 (1-bit shifter)

So, practically, our proposed multiplier consumes the area of some adders and logic gates. The prototype 16-bit architecture of our proposed multiplier is shown in Fig. 6 below. It is clearly visible from Fig. 6 that the latency of the proposed multiplier is limited by the single logic gate delay and the propagation delay of the adder tree. As stated before, the shifters are realized by bus cross-connection. Naturally, the shifters offer no propagation delay. The delay of the single logic gate can also be neglected. The delay of any adder tree is $\log_2 N$ times a single adder delay, where $N$ denotes the bit length. So, our proposed multiplier has a latency of $\log_2 N$ times the propagation delay of the CSA. In Fig. 5, HS1 denotes a hard shifter performing a 1-bit left shift, HS2 denotes a hard shifter performing a 2-bit left shift, and so on. P and Q, in Fig. 6, represent the input 16-bit numbers to be multiplied.
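The adder and multiplier described above can be modelled behaviourally as follows. This is a minimal software sketch under stated assumptions, not the RTL of the proposed design: the names `cla_add`, `carry_save`, and `shift_add_multiply` are hypothetical, the grouping of partial products into 3:2 carry-save stages is an assumed organisation of the adder tree, and the propagate signal is taken as the XOR of the operand bits so that the Sum equation (4) holds for every input (the carry of equation (5) is the same with either an OR or an XOR propagate).

```python
def cla_add(a, b, width):
    """Add two unsigned numbers with the generate/propagate equations (2)-(5).

    The carry recurrence is evaluated bit by bit here; a hardware CLA
    unrolls the same recurrence so that all carries are produced in parallel.
    """
    carry, total = 0, 0
    for i in range(width):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        p_i = a_i ^ b_i                     # propagate (XOR form, assumed)
        g_i = a_i & b_i                     # generate, equation (3)
        total |= (p_i ^ carry) << i         # sum bit, equation (4)
        carry = g_i | (p_i & carry)         # carry out, equation (5)
    return total | (carry << width)


def carry_save(x, y, z):
    """3:2 compression: returns (s, c) such that x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def shift_add_multiply(p, q, width=16):
    """Multiply two unsigned width-bit numbers by shift-and-add."""
    # Hard shifter HSk is modelled as p << k (pure rewiring in hardware);
    # bit k of q gates the shifted copy, which is the single gate level.
    partials = [(p << k) if (q >> k) & 1 else 0 for k in range(width)]
    while len(partials) < 2:
        partials.append(0)
    # Carry-save stages compress the operands three at a time.
    while len(partials) > 2:
        reduced = []
        for i in range(0, len(partials) - 2, 3):
            reduced.extend(carry_save(*partials[i:i + 3]))
        reduced.extend(partials[len(partials) - len(partials) % 3:])
        partials = reduced
    # One carry-propagating addition (the CLA) finishes the product.
    return cla_add(partials[0], partials[1], 2 * width)


print(shift_add_multiply(1234, 5678) == 1234 * 5678)
```

The sketch only checks functional behaviour. It says nothing about delay, area, or power, which depend on how the adder tree is actually arranged in the hardware.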
$P_0, P_1, \ldots, P_{15}$ indicate the respective binary bits of the number $P$.

Fig. 6: CSA based 16-bit multiplier

The second big challenge in the interpolation operation is the square and cube circuits. Square and cube circuits are complex in nature and also produce large delays, which are another cause of increased delay in the bicubic interpolation architecture. Therefore, remodeling of the square and cube circuits is required. For that purpose, the Vedic multiplication technique has been used. That not only makes the circuits simpler but also reduces the delay of the components, and so helps to reduce the overall delay. With the basic functional blocks simplified and remodeled, the bicubic interpolation architecture can be built. Before the pixel interpolation architecture is constructed, the bicubic interpolation function is targeted first, as in Fig. 7.

Fig. 7: Proposed architecture for bicubic interpolation function

The cube and square modules designed using the Vedic method are shown in Fig. 8 and Fig. 9, respectively. Normal cube and square circuits fail to give the desired output in terms of latency. For cubing a number using Vedic mathematics, firstly, the HCF for each individual digit is computed. At the same time, the cubes from 0 to 9 are stored in the memory block. Secondly, a row of four numbers is made such that the first number of the row represents the first cell of the memory array. This cell consists of the digits obtained from the memory. The remaining three numbers are calculated by using a geometric progression, where the common ratio is the HCF of each individual digit. A different memory is used to hold twice the second and third numbers. A serial adder is used to add the numbers from the two memory arrays in order to get the cube.

Fig. 8: Proposed cube module for 2D bicubic interpolation function

For implementing the square function using the Vedic method, a mod-10 counter, subtractor, memory unit, duplex circuit, and an adder circuit are used.

Fig. 9: Proposed square module for 2D bicubic interpolation function

After making the bicubic interpolation function architecture, cube architecture, and square architecture, it becomes easy for us to construct the bicubic interpolation architecture as in Fig. 10.

Fig. 10: Proposed architecture for 2D bicubic interpolation
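The cube and square procedures described for Fig. 8 and Fig. 9 can be sketched in software as below. The sketch assumes that the modules follow the standard Vedic rules: the Anurupyena layout for the cube of a two-digit number (a first row of four terms in geometric progression, plus a second row holding twice the two middle terms) and the duplex rule for the square. The names `CUBES`, `vedic_cube`, `duplex`, and `vedic_square` are hypothetical, and the four terms are computed directly rather than through a common ratio, which is a simplification of this example.

```python
CUBES = [d ** 3 for d in range(10)]  # look-up memory for the cubes of 0..9


def vedic_cube(n):
    """Cube of a number in 0..99 using the two-row Anurupyena layout."""
    a, b = divmod(n, 10)                       # tens digit, units digit
    # First row: a^3, a^2*b, a*b^2, b^3 (ratio b/a between neighbours).
    first_row = [CUBES[a], a * a * b, a * b * b, CUBES[b]]
    # Second row: twice the second and third numbers of the first row.
    second_row = [0, 2 * first_row[1], 2 * first_row[2], 0]
    columns = [x + y for x, y in zip(first_row, second_row)]
    result = 0
    for col in columns:                        # place values 1000, 100, 10, 1
        result = result * 10 + col
    return result


def duplex(digits):
    """Duplex: twice the mirrored cross products, plus the middle digit squared."""
    n = len(digits)
    total = sum(2 * digits[i] * digits[n - 1 - i] for i in range(n // 2))
    if n % 2:
        total += digits[n // 2] ** 2
    return total


def vedic_square(n):
    """Square of a non-negative integer as a sum of weighted duplexes."""
    digits = [int(ch) for ch in str(n)]
    k = len(digits)
    result = 0
    for j in range(2 * k - 1):
        lo, hi = max(0, j - k + 1), min(j, k - 1)
        # Multiplying by 10 before adding the next duplex handles the carries.
        result = result * 10 + duplex(digits[lo:hi + 1])
    return result


print(vedic_cube(12), vedic_square(47))
```

The column sums and duplexes are small partial results, so the work reduces to table look-ups, doublings, and additions, which is the property the hardware modules rely on.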
IV. RESULTS AND DISCUSSIONS

The bicubic architecture has been designed from the base level. Firstly, a CLA-based CSA has been designed. After designing the CLA-based CSA, a rigorous analysis of the quality of the proposed design has been done. The design has been synthesized, followed by post-routing simulation using ModelSim. The simulation results of the proposed CSA are compared with those of the conventional CSA. The simulation results are shown in Fig. 11, Fig. 12, and Fig. 13, respectively. The synthesized RTL design of the proposed CSA using CLA is shown in Fig. 6. The performance of our CLA-based CSA is compared with that of the conventional CSA for varying bit widths. The comparative analysis is demonstrated in TABLE II, TABLE III, and TABLE IV below. From these tables, it can be inferred that the proposed CSA can outperform conventionally used adders in terms of speed, area, and power consumption. So, our proposed CSA-based bicubic interpolation function architecture can be effectively utilized in high-speed applications.

A. Simulation Results:

Fig. 11: Simulation result of CSA using Ripple Carry Adder (RCA) at last stage in Xilinx Vivado 18.2

Fig. 12: Simulation result of CSA using Carry Look Ahead Adder (CLA) at last stage in Xilinx Vivado 18.2

Fig. 13: Simulation result of CSA using Carry Look Ahead Adder (CLA) in all the stages in Xilinx Vivado 18.2

B. RTL Synthesis Result:

Fig. 14: CSA using RCA after RTL Synthesis in Xilinx Vivado 18.2

C. Implemented Design:

Fig. 15: Implemented Design on Part-xc7k70tbg676-
TABLE II. COMPARISON BETWEEN CONVENTIONAL AND PROPOSED CSA REGARDING SPEED OF OPERATION

Bit Width    Delay, CSA using CLA at the last stage    Delay, Proposed CSA
8            0.119 ns                                  0.102 ns
16           3.344 ns                                  3.335 ns

TABLE III. COMPARISON BETWEEN CONVENTIONAL AND PROPOSED CSA REGARDING POWER CONSUMPTION

Parameter               Proposed CSA (8 bit)    Conventional CSA (8 bit)    Proposed CSA (16 bit)    Conventional CSA (16 bit)
Total Power             8.67 W                  8.885 W                     16.541 W                 16.71 W
Junction Temperature    41.3 °C                 41.7 °C                     56.1 °C                  56.5 °C
Thermal Margin          43.7 °C (23.0 W)        43.3 °C (22.8 W)            28.9 °C (15.2 W)         28.5 °C (15.0 W)
On-Chip Power           99%                     99%                         99%                      99%
Dynamic                 8.566 W (97%)           8.780 W (97%)               16.398 W (97%)           16.566 W (96%)
Signals                 0.141 W (2%)            0.153 W (2%)                0.366 W (2%)             0.456 W (3%)
Logic                   0.085 W (1%)            0.084 W (1%)                0.174 W (1%)             0.211 W (1%)
I/O                     8.340 W (97%)           8.543 W (97%)               15.858 W (97%)           15.899 W (96%)
Device Static           0.104 W (1%)            0.104 W (1%)                0.143 W (1%)             0.144 W (1%)

TABLE IV. FPGA UTILIZATION DETAILS OF THE PROPOSED CSA

        8-bit adder                              16-bit adder
Item    Used    Available    Utilization (%)     Used    Available    Utilization (%)
LUT     30      41K          1                   62      41K          1
I/O     42      342          12                  82      342          24

The comparative analysis of our proposed VLSI architecture for 2D bicubic interpolation is demonstrated in TABLE V. First, we have described the latency. Then we have described the throughput rate of the 2D bicubic interpolation architecture. Last, we have described the clock frequency of the 2D bicubic interpolation.

TABLE V. PERFORMANCE PARAMETERS OF THE VLSI DESIGN FOR 2D BICUBIC INTERPOLATION

Latency                      3.782 ns
Throughput rate              1 output/clock cycle
Maximum clock frequency      302.748 MHz

V. CONCLUSION

In this article, a novel VLSI architecture for 2D bicubic interpolation has been presented. The proposed architecture contains several computation-efficient modules performing computationally intensive operations. The proposed 2D bicubic architecture includes digital circuits for performing square and cube operations. The digital circuits for squaring and cubing are designed following the time-efficient approach of Vedic mathematics. The area and power consumption of the proposed architecture have been lowered by a judicious design strategy. The whole architecture is pipelined to increase the throughput rate, thus lowering the computation time. The quality of the proposed architecture is successfully assessed by simulation using ModelSim and also by on-board testing for real-time applications.

References:

[1] Dengwen, Z. (2010, October). An edge-directed bicubic interpolation algorithm. In 2010 3rd International Congress on Image and Signal Processing (Vol. 3, pp. 1186-1189). IEEE.

[2] Motmaen, M., Mohrekesh, M., Akbari, M., Karimi, N., & Samavi, S. (2018, May). Image Inpainting by Hyperbolic Selection of Pixels for Two-Dimensional Bicubic Interpolations. In Electrical Engineering (ICEE), Iranian Conference on (pp. 665-669). IEEE.

[3] Sekar, K., Duraisamy, V., & Remimol, A. M. (2014, March). An approach of image scaling using DWT and bicubic interpolation. In 2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE) (pp. 1-5). IEEE.

[4] Zhang, X. G. (2008, December). A new kind of super-resolution reconstruction algorithm based on the ICM and the bicubic interpolation. In 2008 International Symposium on Intelligent Information Technology Application Workshops (pp. 817-820). IEEE.

[5] Ruangsang, W., & Aramvith, S. (2017, October). Efficient super-resolution algorithm using overlapping bicubic interpolation. In 2017 IEEE 6th Global Conference on Consumer Electronics (GCCE) (pp. 1-2). IEEE.

[6] Rangsikunpum, A., Leelarasmee, E., & Pumrin, S. (2017, June). A design of sign video image expander for HDMI source using bicubic interpolation. In 2017 14th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON) (pp. 171-174). IEEE.

[7] Zhang, Y., Li, Y., Zhen, J., Li, J., & Xie, R. (2010, October). The hardware realization of the bicubic interpolation enlargement algorithm based on FPGA. In 2010 Third International Symposium on Information Processing (pp. 277-281). IEEE.

[8] Koljonen, J., Bochko, V. A., Lauronen, S. J., & Alander, J. T. (2019, October). Fast Fixed-point Bicubic Interpolation Algorithm on FPGA. In 2019 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC) (pp. 1-.

[9] Pratt, W. K. (1991). Digital Image Processing. John Wiley & Sons, Inc., New York.

[10] Gonzalez, R. C., & Wintz, P. (1977). Digital Image Processing (Book). Reading, Mass., Addison-Wesley Publishing Co., Inc. (Applied Mathematics and Computation, (13), 451).

[11] Muthukrishnan, R., & Radha, M. (2011). Edge detection techniques for image segmentation. International Journal of Computer Science & Information Technology, 3(6), 259.
[12] Duan, H., Deng, Y., Wang, X., & Liu, F. (2013). Biological eagle-eye-based visual imaging guidance simulation platform for unmanned flying vehicles. IEEE Aerospace and Electronic Systems Magazine, 28(12), 36-45.

[13] Chakraborty, A., & Banerjee, A. (2018, December). A Multiplierless VLSI Architecture of QR Decomposition Based 2D Wiener Filter for 1D/2D Signal Processing With High Accuracy. In 2018 4th International Conference on Computing Communication and Automation (ICCCA) (pp. 1-6). IEEE.

[14] Javali, R. A., Nayak, R. J., Mhetar, A. M., & Lakkannavar, M. C. (2014, November). Design of high speed carry save adder using carry lookahead adder. In International Conference on Circuits, Communication, Control and Computing (pp. 33-36). IEEE.