International Journal of Parallel Programming, 2008
The problem of allocating and scheduling precedence-constrained tasks on the processors of a distributed real-time system is NP-hard. As such, it has traditionally been tackled by means of heuristics, which provide only approximate or near-optimal solutions. This paper proposes a complete allocation and scheduling framework, and deploys an MPSoC virtual platform to validate the accuracy of the modelling assumptions. The optimizer implements an efficient and exact approach to the mapping problem based on a decomposition strategy. The allocation subproblem is solved through Integer Programming (IP), while the scheduling subproblem is solved through Constraint Programming (CP). The two solvers interact by means of an iterative procedure that has been proven to converge to the optimal solution. Experimental results show significant speed-ups with respect to pure IP and pure CP exact solution strategies, as well as high accuracy with respect to cycle-accurate functional simulation. Two case studies further demonstrate the practical viability of the framework for real-life applications.
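The iterative interaction between an allocation master problem and a scheduling subproblem can be illustrated on a toy instance. The following is a minimal sketch under stated assumptions, not the paper's implementation. Brute-force enumeration stands in for both the IP solver and the CP solver. The instance data (DURATIONS, EDGES, COMM_COST, N_PROCS, DEADLINE) is hypothetical. Each allocation whose schedule misses the deadline is excluded with the simplest possible "no-good" cut; the cuts used in the paper are not described in the abstract. The master always returns the cheapest allocation that has not been cut, and cuts only remove infeasible allocations. As a result, the first allocation that passes the scheduling check is optimal, which mirrors the convergence argument in spirit.

```python
from itertools import product

# Hypothetical toy instance (not from the paper): task durations, precedence
# edges, and a communication cost paid when an edge's two endpoints are
# mapped to different processors. Communication delay is ignored in timing.
DURATIONS = {"a": 2, "b": 3, "c": 2, "d": 1}
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
COMM_COST = {("a", "b"): 4, ("a", "c"): 4, ("b", "d"): 2, ("c", "d"): 2}
TASKS = list(DURATIONS)
N_PROCS = 2
DEADLINE = 6


def comm_cost(assign):
    """Objective of the allocation (master) problem."""
    return sum(c for (u, v), c in COMM_COST.items() if assign[u] != assign[v])


def solve_master(cuts):
    """Stand-in for the IP master: cheapest allocation not excluded by a cut."""
    best = None
    for procs in product(range(N_PROCS), repeat=len(TASKS)):
        if procs in cuts:
            continue
        assign = dict(zip(TASKS, procs))
        if best is None or comm_cost(assign) < comm_cost(best):
            best = assign
    return best


def topo_orders(done, remaining):
    """Yield every precedence-respecting order of the remaining tasks."""
    if not remaining:
        yield []
        return
    for t in remaining:
        if all(u in done for (u, v) in EDGES if v == t):
            for rest in topo_orders(done | {t}, remaining - {t}):
                yield [t] + rest


def min_makespan(assign):
    """Stand-in for the CP subproblem: exact minimum makespan for a fixed
    allocation, found by list-scheduling every topological order."""
    best = None
    for order in topo_orders(frozenset(), frozenset(TASKS)):
        finish, free = {}, [0] * N_PROCS
        for t in order:
            preds = [finish[u] for (u, v) in EDGES if v == t]
            start = max([free[assign[t]]] + preds)
            finish[t] = start + DURATIONS[t]
            free[assign[t]] = finish[t]
        span = max(finish.values())
        best = span if best is None else min(best, span)
    return best


def decompose(max_iters=100):
    """Alternate master and subproblem, adding a no-good cut for every
    allocation whose best schedule misses the deadline."""
    cuts = set()
    for it in range(1, max_iters + 1):
        assign = solve_master(cuts)
        if assign is None:
            return None  # every allocation has been cut: problem infeasible
        span = min_makespan(assign)
        if span <= DEADLINE:
            return assign, comm_cost(assign), span, it
        cuts.add(tuple(assign[t] for t in TASKS))
    return None
```

On this toy instance, the loop cuts four allocations that miss the deadline (all tasks on one processor, or a, b, c together with d alone). It then accepts an allocation with communication cost 6 and makespan 6.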
The growing complexity of customizable embedded multiprocessor architectures for digital media processing will soon require highly scalable network-on-chip based communication infrastructures. Here, we propose xpipes, a scalable and high-performance NoC ...
We present a variation-tolerant tasking technique for tightly coupled shared-memory processor clusters that relies upon modeling advances across the hardware/software interface. This is implemented as an extension to the OpenMP 3.0 tasking programming model. Using the notion of Task-Level Vulnerability (TLV) proposed here, we capture dynamic variations caused by circuit-level variability as high-level software knowledge. This is accomplished through a variation-aware hardware/software codesign where: (i) ...
Proceedings of the 2010 international …, Oct 24, 2010
In this paper we address the issue of efficient doall workload distribution on an embedded 3D MPSoC. 3D stacking technology enables low-latency, high-bandwidth access to multiple large memory banks in close spatial proximity. In our implementation, one silicon layer contains multiple processors, whereas one or more DRAM layers on top host a NUMA memory subsystem. To obtain high locality and a balanced workload, we consider a two-step approach. First, a compiler pass analyzes memory references in a loop and ...
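The tension between locality and balance in distributing doall iterations over a NUMA memory can be shown with a small example. The abstract is cut off before it describes the second step, so the following is a generic locality-aware partitioning heuristic, not the paper's algorithm. It is a minimal sketch under several hypothetical assumptions:
- the loop body touches only a[i];
- the array is block-distributed across DRAM banks;
- processor p sits directly under bank p % n_banks.

The helper names home_bank, bank_of, distribute and locality are illustrative only. bank_of plays the role of the reference-analysis step. distribute then gives each iteration to the least-loaded processor near its bank, spilling to a remote processor only when all local ones have reached the balanced share.

```python
def home_bank(proc, n_banks):
    """Hypothetical topology: processor `proc` sits under bank proc % n_banks."""
    return proc % n_banks


def bank_of(index, n_elems, n_banks):
    """Stand-in for reference analysis: the bank holding a[index], assuming a
    block distribution of the array and a loop body that touches only a[i]."""
    block = -(-n_elems // n_banks)  # ceiling division
    return index // block


def distribute(n_iters, n_banks, n_procs):
    """Assign each iteration to the least-loaded processor near its bank;
    spill to any processor below the balanced share when no local one is."""
    target = -(-n_iters // n_procs)  # balanced share per processor
    load = [0] * n_procs
    owner = []
    for i in range(n_iters):
        b = bank_of(i, n_iters, n_banks)
        local = [p for p in range(n_procs)
                 if home_bank(p, n_banks) == b and load[p] < target]
        spill = [p for p in range(n_procs) if load[p] < target]
        p = min(local or spill, key=lambda q: load[q])  # never empty: capacity >= n_iters
        owner.append(p)
        load[p] += 1
    return owner, load


def locality(owner, n_iters, n_banks):
    """Fraction of iterations whose data sits in the owner's home bank."""
    hits = sum(home_bank(p, n_banks) == bank_of(i, n_iters, n_banks)
               for i, p in enumerate(owner))
    return hits / n_iters
```

With 16 iterations, 2 banks and 4 processors, every iteration stays local and each processor gets 4. With 3 processors, the balanced share of 6 forces 2 of the 16 iterations onto a remote processor, trading some locality for balance.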
Papers by Luca Benini