
    Delong Shang

    In asynchronous circuit synthesis, the direct translation method, whereby circuits are derived directly from Petri net (PN) (1) specifications, has not yet produced any automatic tools. This paper describes a design method based on direct translation techniques, incorporating refinement, optimization and an automatic tool.
    Several designs of self-timed latch circuits (some of which are speed independent) are presented. These are used in the speed independent (SI) implementation of two consecutive binary assignment statements. Issues such as logic reduction, utilisation of well-known, simple components, and response speed improvement are dealt with in detail. The techniques employed can be used in designing other asynchronous circuits where
    DC/DC conversion has been an integral part of the power delivery chain in energy harvesting systems because the conventionally targeted synchronous computation load demands stable Vdd, which cannot in general be supplied by power harvesters directly. However, asynchronous computation loads, in addition to their potential power-saving capabilities, can be made tolerant to a much wider range of Vdd variance. This may open up opportunities for much more energy-efficient methods of power delivery to be adopted. This paper presents, for the first time, in-depth investigations into the behavior and performance of different power delivery methods driving both asynchronous and synchronous loads. A novel power delivery method, which employs a capacitor bank for adaptively storing the energy from power harvesters depending on load and source conditions, is developed. Its advantages, especially when driving asynchronous loads, are demonstrated through comprehensive comparative analyses.
    In energy-aware design, especially for systems with uncertain power sources, asynchronous computation loads which can function under variable power supply have many potential advantages. Fully safe asynchronous loads based on delay insensitivity (DI), however, tend to suffer power and size penalties. As a compromise, delay bundling has been widely used in asynchronous computation, but traditional delay elements have been shown to be unsuitable for bundling memory components, whose latency behaviour varies differently under variable Vdd from other types of logic. This paper proposes an intelligent delay bundling method for SRAM working under Vdd which unpredictably varies over a wide range (e.g. 200 mV to 1 V for 90 nm technology), based on the principle of using matching delay bundling elements. A fully speed independent (SI) SRAM design is investigated in depth and a self-timed SRAM architecture using such SI SRAM cells as delay bundling elements is demonstrated through comprehensiv...
    Energy harvesting systems suffer from variable power supply from time to time. An ultra-low power supply, or even a very short period without power supply, may cause very serious consequences. In the case of system memories, important data may be lost. A gated diode based DRAM can be a potential memory solution as a dynamic backup storage to meet these challenges. However, before exploring this solution, such DRAMs need to be studied under low and variable Vdd situations, and this paper investigates these problems. Gated diodes can be used in DRAM cells and sense amplifiers on the read bit-lines. These give the DRAM a significantly long retention time and improve memory reading speed. This paper investigates the performance of two existing DRAM structures, 2T1D and 3T1D, working under low and variable Vdd. In addition, a novel sense amplifying method which works correctly under very low Vdd is developed. The performances of DRAMs with the novel sense amplifiers, with typical sense amplifiers, an...
    In order to increase the power efficiency of IP cores in an SoC, a Self-timed Event Processor (STEP) is designed in this paper to provide power management and event handling for each IP core within the framework of a Virtual Self-timed Block (VSB). Following the Model Based Design (MBD) method, this paper presents the specification, analysis and verification of a VSB design in detail.
    In order to increase the power efficiency of IP cores in an SoC, a Self-timed Event Processor (STEP) was modelled and designed in our previous research to provide power management and event handling for each IP core within the framework of a Virtual Self-timed Block (VSB). This paper presents the construction of an example SoC with four VSBs in MATLAB Simulink, in which a test bench named “ball game” was implemented. The benchmark results obtained from the MATLAB simulation verified the correctness and efficiency of our STEP design.
    Multiple-Rail Phase-Encoding for NoC. Crescenzo D'Alessandro (1), Delong Shang (1), Alex Bystrov (1), Alex Yakovlev (1), Oleg Maevsky (2). (1) University of Newcastle upon Tyne, UK; (2) Intel Labs, Moscow, RU. Abstract ...
    Delay-insensitivity is a theoretically attractive design principle which helps circuits to be resistant to process variations, which exhibit themselves at the system level particularly as delay variations. Unfortunately, delay insensitive (DI) design is impractical for most real systems. Speed independent (SI) design is often used in practice as a next best approach. With the scaling of wires becoming more and more difficult compared with logic gates at current and future technology nodes, SI systems are becoming less acceptable as “approximates” for DI systems. This paper proposes an approach based on decomposing complex systems into simple, manageable blocks which can be safely rendered in an SI manner. These blocks are then connected using interconnects which satisfy DI requirements to obtain “virtual DI” behaviour at system level. We demonstrate this approach with a tile-based implementation of a multi-access arbiter.
    A modified 4-slot asynchronous communication mechanism (ACM) using entirely self-timed circuits to implement the algorithm is presented here. Mutual exclusion elements are used to concentrate potential metastability to a couple of discrete points so that it can be resolved entirely within the mechanism itself, while the self-timed circuits allow the interface between the reader and writer processes and the mechanism to be minimised. Initial analyses show that this solution is more robust with regard to steering logic metastability, and can potentially run faster, than the original 4-slot solution. 2 Introduction. [Figure 1: Asynchronous data communication mechanisms using shared memory and control variables.] Data communication between concurrent processes often employs shared memory, which may have access conflicts when the processes are not synchronised. The most obvious way to protect shared memory is to put it into a critical sectio...
    ABSTRACT In future systems with relatively unreliable and unpredictable energy sources such as harvesters, the system Vdd may become non-deterministic. Reliable and accurate on-chip voltage sensors are therefore indispensable for the power and computation management of such systems. Stable and known references are also difficult to obtain in this environment. This paper describes a reference-free voltage sensor implemented using a speed independent (SI) SRAM cell and an inverter chain. It can work under a wide range of Vdd, and provides accurate measurements of Vdd over this operating range with a precision range from 50mV to 10mV. Unlike existing methods, the voltage information is directly generated as a digital code without any analog circuits. This is realized by exploiting the inherently different latency behaviors of different types of circuits under different Vdd.
    ABSTRACT Multi-resource multi-client arbiters are becoming more important in on-chip systems because of the increasing significance of dynamic, run-time allocation of various system performance resources such as power and computation and communication facilities. Arbiters, for example, can be used to limit the amount of concurrency for regulating voltage droops, and for balancing load and traffic. This paper describes the design of multi-resource arbiters with high degrees of concurrency. By using freezing logic, this design method guarantees correct computation whilst simplifying the implementation. Quick release mechanisms and the implementation of the multi-token concept through the duplication of the client requests help improve the efficiency.
    ABSTRACT Asynchronous techniques have become more significant with continued scaling of VLSI technologies. This paper proposes an asynchronous FPGA architecture. Unlike previous methods of introducing asynchrony into FPGAs, our method seeks to preserve the current FPGA cell structure as much as possible, whilst achieving delay insensitivity in the inter-cell interconnects. By using David Cells as the central technique in the delay insensitive clock replacement, this method is conducive to the establishment of an automatic design and synthesis flow. It also particularly caters for low power designs, where current FPGA solutions are not yet effective.
    ABSTRACT An heuristic approach towards the scheduling, binding and allocation problem for the high-level synthesis of data-paths is presented. The approach makes use of closeness tables to group operations with similar closeness properties, based on their inputs and outputs, into clusters. A tight packing scheduling and binding algorithm is then used to schedule and bind operations from individual clusters to individual functional units. A low interconnect solution is subsequently generated as a result of binding similar groups of operations with common sources and sinks to the same functional units. The approach simultaneously generates efficient schedules. It is shown that this is achieved in fast execution time and that the problem can be solved in reasonably low time complexity. Comparisons are made against other nonpipelined and pipelined approaches.
    ABSTRACT The authors present a novel circuit implementation of the advanced encryption standard using self-timed dual-rail technology. The design reduces leakage of internal information through balanced power consumption, which is achieved by avoidance of glitches and by data-independent switching behaviour. The design utilises a pipeline structure with built-in controllers and novel, highly balanced security latches.
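    The data-independent switching behaviour mentioned above can be illustrated with a minimal Python sketch. It assumes the common dual-rail convention in which a 0 is sent as the rail pair (0, 1), a 1 as (1, 0), and every code word is separated by an all-zero spacer; the function names and the single-rail comparison are hypothetical illustrations and are not taken from the paper's circuit.

```python
# Illustrative assumption: rails are ordered (rail_1, rail_0) and the spacer is all-zero.
SPACER = (0, 0)


def encode_bit(bit):
    """Return the dual-rail code word for one bit."""
    return (1, 0) if bit else (0, 1)


def transitions(prev, curr):
    """Count how many wires change value between two rail states."""
    return sum(a != b for a, b in zip(prev, curr))


def dual_rail_toggles(bits):
    """Count wire transitions when each bit is sent as spacer -> code word -> spacer."""
    total = 0
    for bit in bits:
        word = encode_bit(bit)
        total += transitions(SPACER, word)  # exactly one rail rises
        total += transitions(word, SPACER)  # the same rail falls again
    return total


def single_rail_toggles(bits, initial=0):
    """Count wire transitions on an ordinary single wire, for comparison."""
    total, prev = 0, initial
    for bit in bits:
        total += int(bit != prev)
        prev = bit
    return total


if __name__ == "__main__":
    for data in ([0, 0, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]):
        print(data, dual_rail_toggles(data), single_rail_toggles(data))
```

    In this model every bit costs exactly two wire transitions on the dual-rail link regardless of its value, whereas the single-wire count depends on the data pattern. Real power balancing also depends on matched rail capacitances and gate-level symmetry, which this sketch does not model.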
    ABSTRACT The most efficient power saving method in digital systems is to scale Vdd, owing to the quadratic dependence of dynamic power consumption on Vdd. This requires memory that works under a wide range of Vdds in terms of performance and power saving requirements. A self-timed 6T SRAM was previously proposed, which adapts to the variable Vdd automatically. However, due to leakage, the size of memory is restricted by process variations. This paper reports a new self-timed 10T SRAM cell with bit line keepers developed to improve robustness in order to work in a wide range of Vdds down to 0.3V under PVT variations. In addition, this paper briefly discusses the potential benefits of the self-timed SRAM for designing highly reliable systems and detecting the data retention voltage (DRV).
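    The quadratic dependence referred to above is usually written with the standard first-order CMOS switching power model. The symbols below (activity factor \alpha, switched load capacitance C_L, switching frequency f) and the 1 V reference point are illustrative assumptions and do not appear in the abstract; only the 0.3 V figure is taken from it.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C_L \, V_{dd}^{2} \, f
\qquad\Longrightarrow\qquad
\left.\frac{P_{\mathrm{dyn}}(0.3\,\mathrm{V})}{P_{\mathrm{dyn}}(1\,\mathrm{V})}\right|_{\alpha,\,C_L,\,f\ \text{fixed}}
= \left(\frac{0.3}{1}\right)^{2} = 0.09
```

    With the other factors held fixed, the model gives roughly a 91% reduction in dynamic power at 0.3 V relative to 1 V. In practice the achievable frequency also falls as Vdd is lowered, so this ratio is only a first-order guide.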
    Multi-resource arbiter decomposition
    ABSTRACT Checker designs for on-line testing of asynchronous handshake interfaces are proposed here. The checker monitors the interface signals that follow a protocol. The checker produces a code word at its output when the interface signals abide by the protocol, whereas, when the protocol is violated, a noncode word is generated at the output. Checkers are designed to directly implement sets of forbidden transitions, otherwise known as refusals. A “busy” approach is used to design the checker. In this approach, self-test of the checker is performed during normal operation, where the output signals are constantly switching.
    ABSTRACT This paper focusses on variability analysis for analyzing the robustness of self-timed SRAM to random process variations. The paper augments our previously proposed approaches at the circuit level which provide robustness against signals that are susceptible to deadlock with analysis techniques at the transistor level to analyze the effect of the process parameters for the transistors inside the SRAM memory cells. This has been accomplished by employing a variability analysis tool, VARMA, which facilitates the job of analyzing the robustness to variation of process parameters. We have augmented the VARMA tool to use efficient multi-partitioned surface response with back-end Monte Carlo simulation to analyse the problem. The results provide a faster insight than other approaches into the effect of variation processes on circuits.
    Two asynchronous data communication mechanisms (ACMs) using self-timed circuits are presented. Mutual exclusion elements are used to concentrate potential metastability to discrete points so that it can be resolved entirely within the ACMs themselves. Self-timed circuits allow the minimisation of the interface between the reader and writer processes and the ACMs. Initial analysis shows that these
    This paper describes the synthesis and hardware implementation of a signal-type asynchronous data communication mechanism (ACM). Such an ACM can be used in systems where a data-driven ("lazy") logic must be interfaced with a time-driven ("busy") environment. A new classification system for ACMs is introduced. The conceptual definition of the signal ACM (called simply "Signal") is refined using Petri
    ABSTRACT This paper describes the design of an asynchronous implementation of a sensor network processor. The main purpose of this work is the reduction of power consumption in sensor network node processors and the research presented here tries to explore the suitability of asynchronous circuits for this purpose. The Handshake Solutions toolkit is used to implement an asynchronous version of a sensor processor. The design is made compact, trading area and leakage power savings with dynamic power costs, targeting the typical sparse operating characteristics of sensor node processors. It is then compared with a synchronous version of the same processor based on a reasonable power metric to guarantee accurate comparison. Apart from that, we also compare the design effort between synchronous and asynchronous implementations.
    In this paper we present the architecture for virtual self-timed blocks. Being globally asynchronous locally synchronous (GALS) and lazy reactive processing units, such blocks target multi-processing on-chip systems where power consumption is an important factor. The architecture provides a hardware foundation which transparently supports the systematic organization of application-level activities (processes) and the efficient use of system resources. It further
    As important communication components of asynchronous systems, ACMs have been studied for many years. A well-known Pool using 4 data slots was proposed by H. R. Simpson. However, under certain assumptions, the number of slots in shared memory can be reduced to 3. Mutex, David Cells and SYNCs are used here to implement the 3-slot Signal. The design performed well, maintaining all the required asynchronous properties. It is also a potential building block for the design of low-power heterogeneous systems.
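    For orientation, the 4-slot Pool attributed to Simpson above can be sketched in Python as the algorithm is commonly presented in the literature. This is a sequential illustration only: the class and attribute names are hypothetical, it does not model concurrency or metastability, and it is not the 3-slot Signal or the hardware described in this paper.

```python
class FourSlotPool:
    """Sequential sketch of a four-slot ACM: two pairs of two data slots each."""

    def __init__(self, initial=None):
        self.data = [[initial, initial], [initial, initial]]
        self.slot = [0, 0]   # index last written within each pair
        self.latest = 0      # pair that holds the most recent item
        self.reading = 0     # pair the reader has announced it is using

    def write(self, item):
        pair = 1 - self.reading      # avoid the pair the reader announced
        index = 1 - self.slot[pair]  # avoid the slot last written in that pair
        self.data[pair][index] = item
        self.slot[pair] = index      # publish the freshly written slot
        self.latest = pair           # publish the freshly written pair

    def read(self):
        pair = self.latest           # follow the writer to the newest pair
        self.reading = pair          # announce it so the writer steers away
        index = self.slot[pair]
        return self.data[pair][index]


if __name__ == "__main__":
    pool = FourSlotPool()
    pool.write("a")
    print(pool.read())
    pool.write("b")
    pool.write("c")       # overwrites without waiting for a read
    print(pool.read())
    print(pool.read())    # re-reading returns the same latest item
```

    The control variables steer the writer towards a slot the reader has not announced, which is why neither side ever has to wait for the other. Because this is a Pool, writes may overwrite unread items and reads may return the same item more than once.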
    A novel self-timed communication protocol is based upon phase-modulation of a reference signal. The reference and the data are sent on the same transmission lines and the data can be recovered by observing the sequence of events on the same lines. The sender block consists of a reference generator and variable-delay elements, while the receiver includes a delay-locked loop for synchronization and a mutual exclusion element with additional logic (validity bit and FIFO) for data recovery. This protocol exhibits high robustness with respect to transient errors caused by narrow pulse interference, usually associated with crosstalk and radiation.
    The hardware implementation of the AES algorithm as an asynchronous circuit has a reduced leakage of information through side-channels and enjoys high performance and low power. Dual-rail data encoding and a return-to-spacer protocol are used to avoid hazards, including data-dependent glitches, and to make switching activity data-independent (constant). The implementation uses a coarse pipeline architecture which is different from traditional
    ABSTRACT Dynamic power management (DPM) is one of the main system-level low-power techniques for portable devices. This study presents a fine-grain Markov modelling approach that enables accurate analysis of system power and latency characteristics with full consideration of mode switching overheads in both processor and power controller. The new approach also makes it possible to incorporate latency analysis in terms of deadline satisfaction.
    ABSTRACT A dynamic global security-aware synthesis flow using the SystemC language is presented. SystemC security models are first specified at the system or behavioural level using a library of SystemC behavioural descriptions which provide for the reuse and extension of security modules. At the core of the system is incorporated a global security-aware scheduling algorithm which allows for scheduling to a mixture of components of varying security level. The output from the scheduler is translated into annotated nets which are subsequently passed to allocation, optimisation and mapping tools for mapping into circuits. The synthesised circuits incorporate asynchronous secure power-balanced and fault-protected components. Results show that the approach offers robust implementations and efficient security/area trade-offs leading to significant improvements in turnover.
