Today the most commonly used system architectures in data processing can be divided into three categories: general purpose processors, application specific architectures, and reconfigurable architectures. Application specific architectures are efficient and give good performance, but are inflexible. Recently, reconfigurable systems have drawn increasing attention due to their combination of flexibility and efficiency. Reconfigurable architectures limit their flexibility to a particular algorithm. This paper introduces approaches to mapping applications onto coarse-grained reconfigurable architectures (CGRAs) supporting both integer and floating point arithmetic, starting from an optimal formulation of the problem. High-level design entry tools are essential for reconfigurable systems, especially CGRAs. CGRAs have drawn increasing attention due to their performance and flexibility. However, their applications have been restricted to domains based on integer arithmetic, since typical CGRAs support only integer arithmetic or logical operations. In this project we introduce an approach to map applications onto CGRAs supporting floating point addition. With the increasing requirements for more flexibility and higher performance in embedded systems design, reconfigurable computing is becoming more and more popular.

Keywords: High-level synthesis, CGRA, FPGA, Floating point unit, Reconfigurable architecture.

I. INTRODUCTION

Various coarse-grained reconfigurable architectures (CGRAs) have been proposed in recent years [1]–[3], with different target application domains and different tradeoffs between flexibility and performance. Typically, they consist of a reconfigurable array of processing elements (PEs) and a host processor, mostly a reduced instruction set computing (RISC) processor. The computation-intensive kernels of the applications (typically loops) are mapped to the reconfigurable array while the remaining code is executed by the processor.

However, it is not easy to map an application onto the reconfigurable array because of the high complexity of the problem, which requires compiling the application onto a dynamically reconfigurable parallel architecture, with the additional complexity of dealing with complex routing resources. The problem of mapping an application onto a CGRA so as to minimize the number of resources while giving the best performance has been shown to be NP-complete [16].

Few automatic mapping/compiling/synthesis tools have been developed to exploit the parallelism found in the applications and the extensive computation resources of CGRAs. Some researchers [1], [4] have used structure-based or graphical user interface based design tools to generate a mapping manually, which makes large designs difficult to handle. Some researchers [5], [6] have focused only on instruction-level parallelism, failing to fully utilize the resources in CGRAs, which is possible by exploiting loop-level parallelism. Some researchers [7], [8], [17] have introduced a compiler to exploit the parallelism that the abundant resources of the CGRA provide. However, their approaches use shared registers to solve the mapping problem. While these shared registers simplify the mapping process, they can increase the critical path delay or latency. We show in this paper that the shared registers can be eliminated if routing resources are considered explicitly during the mapping process.

More recently, routing-aware mapping algorithms have been introduced [9]–[11]. However, they rely on step-by-step approaches, where scheduling, binding, and routing algorithms are performed sequentially. Thus, such an approach tends to fall into a local optimum. Our unified approach considers scheduling, binding, and routing at the same time, thereby generating better optimized solutions. It also explicitly considers incorporating Steiner points for more efficient routing (some previous approaches use a kind of maze routing algorithm that can incorporate Steiner points, although it is not clear whether they are actually incorporated). A unified approach based on the simulated annealing algorithm has also been presented in [7]. However, it takes too much time to get a solution: the approach in [7] takes hundreds of minutes whereas ours takes a few minutes. Furthermore, all previous mapping/compiling/synthesis tools have been restricted to integer type application domains, whereas ours extends the coverage to floating point type application domains.
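The routing benefit of Steiner points mentioned above can be illustrated with a small sketch. It is not the mapping algorithm of this paper. It assumes, purely for illustration, that the PEs sit on a 2D mesh with nearest-neighbour links, so that wire length is measured in Manhattan hops, and that every hop is free for routing. All function names are hypothetical. For three terminals on such a grid, the coordinate-wise median of the terminals is the optimal branching (Steiner) point, and the sketch compares the resulting tree with routing each sink separately from the source.

```python
from typing import List, Tuple

# (column, row) of a PE on a hypothetical 2D mesh; an assumption of this sketch.
Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    """Hop count between two PEs, assuming nearest-neighbour links only."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def separate_routes_length(source: Point, sinks: List[Point]) -> int:
    """Total hops when every sink gets its own path from the source."""
    return sum(manhattan(source, sink) for sink in sinks)


def steiner_point(a: Point, b: Point, c: Point) -> Point:
    """Coordinate-wise median of three terminals.

    For three terminals on a rectilinear grid this is the optimal
    branching point of the routing tree.
    """
    xs = sorted(p[0] for p in (a, b, c))
    ys = sorted(p[1] for p in (a, b, c))
    return (xs[1], ys[1])


def steiner_tree_length(a: Point, b: Point, c: Point) -> int:
    """Total hops of the tree that joins all three terminals at the Steiner point."""
    s = steiner_point(a, b, c)
    return sum(manhattan(s, p) for p in (a, b, c))


if __name__ == "__main__":
    source = (0, 1)           # PE producing a value
    sinks = [(3, 0), (3, 2)]  # two PEs consuming that value
    print(separate_routes_length(source, sinks))             # 8 hops
    print(steiner_point(source, sinks[0], sinks[1]))         # (3, 1)
    print(steiner_tree_length(source, sinks[0], sinks[1]))   # 5 hops
```

In this example the two sinks share the three-hop trunk from the source to the branching point at (3, 1), so the tree uses 5 hops instead of 8. A real mapper would also have to check that the shared hops are actually available in the cycle in which the value is transferred, which this sketch ignores.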