Approximate hybrid high radix encoding for energy efficient inexact multipliers
NXFEE INNOVATION
(SEMICONDUCTOR IP & PRODUCT DEVELOPMENT)
(ISO 9001:2015 Certified Company),
# 45, Vivekanandar Street, Dhevan kandappa Mudaliar nagar, Nainarmandapam,
Pondicherry – 605004, India.
Buy Project Online: www.nxfee.com | contact: +91 9789443203 |
email: nxfee.innovation@gmail.com
_________________________________________________________________
Approximate Hybrid High Radix Encoding for Energy-Efficient Inexact
Multipliers
Abstract:
Approximate computing forms a design alternative that exploits the intrinsic error
resilience of various applications and produces energy-efficient circuits with small
accuracy loss. In this paper, we propose an approximate hybrid high radix encoding for
generating the partial products in signed multiplications that encodes the most significant
bits with the accurate radix-4 encoding and the least significant bits with an approximate
higher radix encoding. The approximations are performed by rounding the high radix
values to their nearest power of two. The proposed technique can be configured to
achieve the desired energy–accuracy tradeoffs. Compared with the accurate radix-4
multiplier, the proposed multipliers deliver up to 56% energy and 55% area savings,
when operating at the same frequency, while the imposed error is bounded by a Gaussian
distribution with near-zero average. Moreover, the proposed multipliers are compared
with state-of-the-art inexact multipliers, outperforming them by up to 40% in energy
consumption, for similar error values. Finally, we demonstrate the scalability of our
technique.
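As a rough illustration of the idea described in the abstract, the following
Python sketch models the arithmetic behavior of a hybrid encoding. It is not
the authors' hardware design. The function names, the choice of treating the k
least significant bits as one signed radix-2^k digit, and the tie-breaking rule
(ties round to the larger power of two) are assumptions made for illustration.
The most significant part is kept exact, as an accurate radix-4 encoding would
keep it.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def round_to_power_of_two(digit):
    """Round a signed digit to the nearest signed power of two (0 stays 0).

    Assumption: a tie between two powers goes to the larger magnitude.
    """
    if digit == 0:
        return 0
    sign = -1 if digit < 0 else 1
    mag = abs(digit)
    lower = 1 << (mag.bit_length() - 1)  # largest power of two <= mag
    upper = lower << 1                   # next power of two above it
    return sign * (lower if mag - lower < upper - mag else upper)


def approx_multiply(a, b, k):
    """Approximate a * b by rounding the signed k-bit low digit of b."""
    low = to_signed(b, k)        # signed radix-2^k digit formed by the k LSBs
    high = (b - low) >> k        # remaining MSBs, kept exact
    return a * ((high << k) + round_to_power_of_two(low))


if __name__ == "__main__":
    a, b, k = 3, 107, 4
    exact = a * b
    approx = approx_multiply(a, b, k)
    print(exact, approx, approx - exact)
```

In this sketch a larger k moves more bits into the rounded digit, which is one
way to picture the configurable energy-accuracy tradeoff: the approximate part
contributes a single power-of-two partial product instead of several exact ones.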
Software Implementation:
ModelSim
Xilinx 14.2
Existing System:
In modern embedded systems and data centers, energy efficiency is a mandatory design
concern. Considering that a large number of application domains exhibit an intrinsic
error tolerance, e.g., digital signal processing (DSP), image processing, data analytics,
and data mining, approximate computing appears as an effective solution to reduce their
power dissipation. In approximate computing, error is viewed as a commodity that
can be traded for significant gains in cost (e.g., power, energy, and performance). As
a result, it is a promising design paradigm for energy-efficient systems, since it
substantially decreases the power consumption of inherently error-resilient applications.
Specifically, approximate computing exploits the innate error tolerance of the respective
applications and deliberately relaxes the correctness of some computations in order to
decrease their power consumption and/or accelerate their execution. Recently, to take
advantage of these benefits, extensive research has been conducted in the field of
approximate hardware circuits. The main targets are arithmetic units, e.g., adders and
multipliers, which are the core components of many embedded devices and hardware
accelerators. Extensive research is reported on approximate adders, providing significant
gains in terms of delay and power dissipation. However, research on approximate
multipliers is less comprehensive than that on approximate adders. In inexact multipliers,
approximations can be applied to the partial product generation as well as to the partial
product accumulation. The two kinds of approximation are synergistic and can be applied
together in order to achieve higher power reduction.
Although significant research has been conducted on the partial product accumulation,
research on the approximation of the partial product generation is still limited.
Finally, another limitation of the existing approximate multipliers is that the majority of
them do not examine signed multiplication. In this paper, targeting the design of
inexact multipliers by applying approximations to the partial product generation, we
propose a novel approximate hybrid high radix encoding. In the proposed technique, the
most significant bits (MSBs) of the multiplicand are encoded using the radix-4 encoding,
whereas the k least significant bits (LSBs) are encoded using a radix-2^k encoding (with k ≥ 4). To
simplify the increased complexity induced by the proposed hybrid encoding, the circuit
for generating the partial products is approximated by altering its truth table accordingly.
Hence, the number of partial products decreases significantly and simpler tree
architectures are used for their accumulation, reducing the multiplier’s energy
consumption, area, and critical path delay. The major contributions of this paper are
summarized as follows.
1) We propose and enable the application of hybrid high radix encodings for the generation
of energy-efficient approximate multipliers, overcoming the increased hardware complexity
of very high radix encodings.
2) The proposed technique can be applied to any multiplier architecture and is
reconfigurable, enabling the user to select the optimal energy–error tradeoff per application.
3) An analytical error analysis is conducted, showing that the output error of the proposed
technique is bounded and predictable. Such a rigorous error analysis leads to precise and
a priori error estimation for any input distribution, without the need for time-consuming
simulations.
4) We show that the proposed multipliers outperform state-of-the-art inexact multipliers in
terms of hardware and accuracy, achieving up to 40% less energy dissipation for
comparable error values.
More specifically, the proposed technique is applied to a 16×16
bit multiplier and is evaluated using industrial-strength tools, i.e., Synopsys Design
Compiler, PrimeTime, and Mentor Graphics ModelSim. Compared with the accurate
multiplier, the proposed technique delivers up to 55% energy and area reduction, for
mean relative error up to 0.93%. Moreover, compared with related state-of-the-art
approximate signed multipliers, our technique outperforms them significantly
in terms of energy consumption and error. Finally, we examine the scalability of the
proposed technique and show that, for the same error value, the delivered energy savings
increase as the multiplier size increases.
Disadvantages:
Accuracy is lower
Energy efficiency is low
Area is large
Errors are not reduced
Proposed System:
Approximate hybrid high radix multipliers
High radix encodings reduce the number of partial products, and as a result, their accumulation
requires smaller trees, leading to energy, area, and/or delay savings. However, high radix
encodings require complex encoding and partial product generation circuits, thus negating
the benefits of the partial product reduction. In this section, the proposed hybrid
high radix encoding and the approximations performed to simplify its circuit
complexity are presented. In the proposed technique, the multiplicand B is encoded using
the approximate high radix encoding, generating B̃, and the approximate multiplication
A · B̃ is performed. Finally, the adaptation of the encoding to inexact 16-bit hardware
multipliers is described, and a qualitative analysis is conducted to estimate the potential area
gains.
Hybrid High Radix Encoding
In the proposed hybrid high radix encoding, B is divided into two groups: the MSB part of
n − k bits and the LSB part of k bits. The configuration parameter k ≥ 4 is an even
number, namely, k = 2m, m ∈ Z, with m ≥ 2. The MSB part is encoded using the radix-4
(modified Booth) encoding, while the LSB part is encoded with the high radix-2^k
encoding:
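\[ B = \sum_{j=k/2}^{n/2-1} 4^{j}\, y_j^{R4} + y_0^{R2^k} \]
where
\[ y_j^{R4} = -2b_{2j+1} + b_{2j} + b_{2j-1} \]
and
\[ y_0^{R2^k} = -2^{k-1} b_{k-1} + \sum_{i=0}^{k-2} 2^{i} b_i \]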
The radix-4 encoding includes (n − k)/2 digits y_j^{R4} ∈ {0, ±1, ±2}, while the digit
y_0^{R2^k} ∈ {0, ±1, ±2, ±3, ..., ±(2^{k−1} − 1), −2^{k−1}} corresponds to the radix-2^k
encoding. Overall, B is encoded with (n − k)/2 + 1 digits. The above hybrid high radix
encoding is characterized by increased logic complexity, due to the high radix values of
y_0^{R2^k} that are not powers of two, and thus, an approximate version is proposed. However,
in order to retain high accuracy, the radix-4 encoding of the MSBs is performed accurately.
In particular, in the approximate encoding, all the values that are not powers of two, as
well as the k − 4 smallest powers of two, are rounded to the nearest of the four largest
powers of two or to 0, so that the sum of all the values of the approximate digit
ŷ_0^{R2^k} is 0. We choose to keep only the four largest powers of two, so that the radix-2^k
encoding circuit requires only about twice the area of the accurate radix-4 encoder.
Therefore, B is approximately encoded as follows:
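\[ \tilde{B} = \sum_{j=k/2}^{n/2-1} 4^{j}\, y_j^{R4} + \hat{y}_0^{R2^k} \]
where
\[ \hat{y}_0^{R2^k} \in \{0, \pm 2^{k-4}, \pm 2^{k-3}, \pm 2^{k-2}, \pm 2^{k-1}\} \]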
Table I. Accurate radix-4 encoding table
Table II. Approximate radix-2^k encoding table
Table I presents the accurate radix-4 encoding. The output signals sign_j, ×1_j, and ×2_j
define the radix-4 digit y_j^{R4}. Their logic equations are the following:
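One common gate-level form of these signals, assumed here for illustration and not necessarily identical to the equations of the source, is sign_j = b_{2j+1}, ×1_j = b_{2j} ⊕ b_{2j−1}, and ×2_j = (b_{2j+1} ⊕ b_{2j}) · NOT(×1_j). The short Python sketch below checks, for all eight input combinations, that the digit (−1)^sign · (×1 + 2 · ×2) built from these signals equals −2·b_{2j+1} + b_{2j} + b_{2j−1}. The function names are hypothetical.

```python
from itertools import product


def booth_radix4_signals(b_hi, b_mid, b_lo):
    """Return (sign, x1, x2) for the bit triplet (b_{2j+1}, b_{2j}, b_{2j-1}).

    Assumed gate-level form (a common modified Booth formulation):
      sign = b_{2j+1}
      x1   = b_{2j} XOR b_{2j-1}
      x2   = (b_{2j+1} XOR b_{2j}) AND NOT x1
    """
    sign = b_hi
    x1 = b_mid ^ b_lo
    x2 = (b_hi ^ b_mid) & (x1 ^ 1)
    return sign, x1, x2


def digit_from_signals(sign, x1, x2):
    """Rebuild the radix-4 digit from its encoding signals."""
    magnitude = x1 + 2 * x2
    return -magnitude if sign else magnitude


# Exhaustive check against the arithmetic definition of the digit.
for b_hi, b_mid, b_lo in product((0, 1), repeat=3):
    expected = -2 * b_hi + b_mid + b_lo
    signals = booth_radix4_signals(b_hi, b_mid, b_lo)
    assert digit_from_signals(*signals) == expected
    print((b_hi, b_mid, b_lo), signals, expected)
```

Note that the triplet 111 yields sign = 1 with both select signals at 0, i.e., a "negative zero" that still evaluates to the digit 0.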
Table II presents the approximate radix-2^k encoding. The logic equations of the encoding
signals that define the radix-2^k digit ŷ_0^{R2^k} are the following:
The effectiveness of the approximate hybrid radix encoding technique is explored with its
application to 16-bit signed numbers, for k = 6, 8, 10, namely, the LSBs are encoded
using the radix-64, radix-256, and radix-1024 encoding, respectively.
Fig. 1. i-bit partial product generator based on (a) accurate radix-4 encoding and the approximate (b)
radix-64, (c) radix-256, and (d) radix-1024 encoding. a_i: i-th bit of operand A, conditionally inverted as a_i ⊕ sign.
Fig. 2. Partial product tree based on the hybrid encoding of accurate radix-4 and approximate (a) radix-64,
(b) radix-256, and (c) radix-1024 encoding. The legend distinguishes partial product bits from the
approximate high radix encoding, partial product bits from the accurate radix-4 encoding (•), inverted
MSBs of the partial products, and sign factors.
In the radix-64 encoding, the bits of B are grouped as follows: the six LSBs b5, ..., b0 form
the radix-64 digit, and the ten MSBs b15, ..., b6 form the radix-4 digits.
The values of the digit y_0^{R64} that are not powers of two, i.e., ±3, ±5, ±6, ±7, ±9, ..., ±15,
±17, ..., ±31, are rounded to their nearest power of two among ±4, ±8, ±16, and ±32, while the
smallest powers of two, i.e., ±1 and ±2, are rounded to 0 or ±4.
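In the radix-1024 encoding, the bits of B are grouped as follows: the ten LSBs b9, ..., b0 form
the radix-1024 digit, and the six MSBs b15, ..., b10 form the radix-4 digits.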
Similarly, the values that are not powers of two are rounded to ±64, ±128, ±256, or ±512, and the
smallest powers of two (±1, ±2, ±4, ±8, ±16, ±32) are rounded to 0 or ±64. The encoder’s
inputs are the bits b9, b8, ..., b0, the approximate radix-1024 digit is ŷ_0^{R1024} ∈ {0,
±64, ±128, ±256, ±512}, and the output signals that define ŷ_0^{R1024} are sign, ×64, ×128, ×256,
and ×512.
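The rounding rule described above can be sketched in Python for any even k ≥ 4. This is a minimal illustration under a stated assumption: it uses plain nearest-value rounding onto {0, ±2^{k−4}, ±2^{k−3}, ±2^{k−2}, ±2^{k−1}} and resolves ties toward the larger magnitude. The source only says that tied values such as ±2 (radix-64) or ±32 (radix-1024) go to "0 or" the smallest retained power of two, and that the mapping is arranged so that the approximate digit values sum to 0; this simple tie rule is not claimed to reproduce that property or the exact contents of Table II. The name `approx_digit` is hypothetical.

```python
def approx_digit(y, k):
    """Round an exact radix-2^k digit y to 0 or one of the 4 largest powers of two.

    Assumption: nearest-value rounding on the magnitude, ties resolved
    toward the larger magnitude. The source may break ties differently.
    """
    if k < 4 or k % 2:
        raise ValueError("k must be an even number >= 4")
    if not -(1 << (k - 1)) <= y < (1 << (k - 1)):
        raise ValueError("y is outside the radix-2^k digit range")
    # Retained magnitudes: 0 and 2^(k-4) ... 2^(k-1).
    levels = [0] + [1 << e for e in range(k - 4, k)]
    magnitude = abs(y)
    # Sort key: smallest distance first, then larger level on ties.
    best = min(levels, key=lambda v: (abs(v - magnitude), -v))
    return -best if y < 0 else best


for k in (6, 8, 10):
    half = 1 << (k - 1)
    values = sorted({approx_digit(y, k) for y in range(-half, half)})
    print(f"radix-{1 << k}: approximate digit values {values}")
```

For k = 6 the reachable values are 0, ±4, ±8, ±16, and ±32, and for k = 10 they are 0, ±64, ±128, ±256, and ±512, matching the digit sets listed in the text.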
Partial Product Generation
In the proposed hybrid encoding, the n − k MSBs of B are encoded with the accurate
radix-4 encoding, while the k LSBs are encoded with an approximate radix-2^k encoding.
The accurate radix-4 encoder produces the signals sign_j, ×1_j, and ×2_j, whereas the
approximate high radix encoder produces a sign signal and one select signal for each
retained power of two (e.g., sign, ×64, ×128, ×256, and ×512 for radix-1024). Overall,
k/2 − 1 fewer partial products are generated in the multiplication A · B̃ than with the
accurate radix-4 encoding of all n bits.
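The partial product count follows directly from the digit counts given earlier: an accurate radix-4 encoding of all n bits needs n/2 digits, while the hybrid encoding needs (n − k)/2 + 1. The small Python sketch below tabulates this for the 16-bit configurations discussed in the text. It counts only the digit-generated partial products and ignores the correction term; the function name is hypothetical.

```python
def partial_product_counts(n, k):
    """Return (accurate_radix4_count, hybrid_count) for an n-bit operand B.

    Counts digit-generated partial products only (no correction term).
    """
    if n % 2 or k % 2 or k < 4 or k > n:
        raise ValueError("n and k must be even, with 4 <= k <= n")
    accurate = n // 2            # one partial product per radix-4 digit
    hybrid = (n - k) // 2 + 1    # radix-4 digits of the MSBs + one radix-2^k digit
    return accurate, hybrid


for k in (6, 8, 10):
    accurate, hybrid = partial_product_counts(16, k)
    print(f"RAD{1 << k}: {hybrid} partial products "
          f"(accurate radix-4: {accurate}, reduction: {accurate - hybrid})")
```

For n = 16 this gives 6, 5, and 4 partial products for k = 6, 8, and 10, against 8 for the accurate radix-4 multiplier, i.e., reductions of k/2 − 1 = 2, 3, and 4.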
Table III. Partial products per radix encoding
In Fig. 1, four partial product generators are presented, i.e., the circuit of the accurate
radix-4 encoding and those of the three approximate high radix encodings discussed.
The partial products created by each encoding are shown in Table III. In addition, the
three hybrid high radix encodings create the partial product trees shown in Fig. 2. The
trees also include the encoding’s correction term (constant terms and sign factors). The
implementation of the partial product accumulation can be chosen by the designer. In this
paper, an accurate Wallace tree is used to sum the partial products, and
the two outputs produced by the Wallace tree are added using a prefix (fast) adder.
Overall, the multiplication circuit consists of four stages: operand hybrid radix encoding,
partial product generation, partial product accumulation, and final addition. The proposed
approximate multipliers are named RAD2^k, indicating the selected approximate high radix
encoding, e.g., RAD64, RAD256, and RAD1024.
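A behavioral (bit-accurate in value, not gate-level) model can help to experiment with the RAD2^k idea in software. The Python sketch below is such a model under stated assumptions: B is split into exact radix-4 digits for the MSBs and one radix-2^k digit for the k LSBs, the radix-2^k digit is rounded to 0 or one of the four largest powers of two with ties going to the larger magnitude (an assumed tie rule, not taken from Table II), and the product is the plain sum of the partial products, with no model of the Wallace tree or the prefix adder. All function names are hypothetical, and the error figures it prints are not expected to match the synthesis-based results reported in the source.

```python
import random


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def hybrid_digits(b, n, k):
    """Exact hybrid encoding of an n-bit signed b.

    Returns ([(j, radix-4 digit with weight 4^j)], radix-2^k digit).
    """
    def bit(i):
        # >> on Python ints is arithmetic, so this yields two's-complement bits.
        return (b >> i) & 1

    y0 = to_signed(b, k)
    r4 = [(j, -2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1))
          for j in range(k // 2, n // 2)]
    return r4, y0


def approx_digit(y, k):
    """Nearest of {0, 2^(k-4), ..., 2^(k-1)}; ties to larger magnitude (assumed)."""
    levels = [0] + [1 << e for e in range(k - 4, k)]
    best = min(levels, key=lambda v: (abs(v - abs(y)), -v))
    return -best if y < 0 else best


def rad2k_multiply(a, b, n=16, k=6):
    """Approximate product A * B~ as a sum of partial products."""
    r4, y0 = hybrid_digits(b, n, k)
    # The hybrid encoding itself is exact before the digit is rounded.
    assert sum(d * 4 ** j for j, d in r4) + y0 == b
    partial_products = [a * d * 4 ** j for j, d in r4]
    partial_products.append(a * approx_digit(y0, k))
    return sum(partial_products)


def mean_relative_error(k, n=16, samples=10_000, seed=0):
    """Monte Carlo estimate over uniform signed inputs (zero products skipped)."""
    rng = random.Random(seed)
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    total, count = 0.0, 0
    for _ in range(samples):
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        if a * b == 0:
            continue
        total += abs(rad2k_multiply(a, b, n, k) - a * b) / abs(a * b)
        count += 1
    return total / count


if __name__ == "__main__":
    for k in (6, 8, 10):
        print(f"RAD{1 << k}: mean relative error ~ {mean_relative_error(k):.4%}")
```

As a worked case, for A = 1000, B = 1234, and k = 6, the six LSBs of B encode the digit 18, which is rounded to 16, so B̃ = 1232 and the model returns 1,232,000 instead of 1,234,000.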
Advantages:
Accuracy is high
Energy efficiency is high
Area is small
Errors are reduced