Multicore embedded systems are rapidly emerging, and hardware designers are packing more and more features into their designs. Introducing heterogeneity into these systems, i.e., adding cores of varying types, provides opportunities to solve different kinds of problems. However, it also presents several challenges to embedded system programmers, since software is still not mature enough to efficiently exploit emerging hardware that is rich with cores of varying types. Programmers still rely on understanding and using low-level, hardware-specific APIs. This approach is not only time-consuming but also tedious and error-prone. Moreover, the resulting solutions are closely tied to particular hardware, raising significant concerns about software portability. What is needed is an industry standard that enables better programming practices for both current and future embedded systems. To that end, in our project, we have explored the possibility of using existing standards such as OpenMP, which provides portable high-level programming constructs, along with another industry-driven standard for multicore systems, MCA. For our work, we have considered the GNU compiler, since it is the compiler most widely used in the embedded system domain and it facilitates open source development. We target a platform consisting of twelve PowerPC e6500 64-bit dual-threaded cores. We create a portable software solution by studying the GNU OpenMP runtime library and extending it to incorporate MCA libraries. The solution abstracts the low-level details of the target platform, and the results show that the additional MCA layer does not incur any overhead. The results are competitive when compared with a proprietary toolchain.
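As a rough illustration of the "portable high-level programming constructs" mentioned above, the following minimal sketch shows an OpenMP work-sharing loop in C. It is not taken from the paper: the function name `dot` and its parameters are hypothetical, and the sketch says nothing about how the extended GNU OpenMP runtime maps the loop onto MCA libraries. It only shows the kind of source code a programmer would keep unchanged across platforms.

```c
#include <stddef.h>

/* Hypothetical example: dot product using an OpenMP work-sharing loop.
 * When the file is compiled without OpenMP support, the pragma is
 * ignored and the loop simply runs serially with the same result. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    long i;

    /* Each thread accumulates a private partial sum; the partial sums
     * are combined into `sum` when the loop finishes. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The source contains no thread creation, core affinity, or hardware-specific calls; those details are left to the compiler and the runtime library underneath.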
Heterogeneous multicore embedded systems, with cores of varying types and capacities, are growing rapidly. Programming these devices and exploiting the hardware has been a real challenge. Existing programming models and their execution are typically meant for general-purpose computation; they are mostly too heavyweight to be adopted for resource-constrained embedded systems. Embedded programmers are still expected to use low-level and proprietary APIs, making the resulting software less and less portable. These challenges motivated us to explore how OpenMP, a high-level directive-based model, could be used for embedded platforms. In this paper, we translate OpenMP to the Multicore Association Task Management API (MTAPI), which is a standard API for leveraging task parallelism on embedded platforms. Results demonstrate that the performance of our OpenMP runtime library is comparable to state-of-the-art task-parallel solutions. We believe this approach will provide a portable solution, since it abstracts the low-level details of the hardware and no longer depends on vendor-specific APIs.
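To make the notion of OpenMP task parallelism concrete, here is a minimal sketch of task-based code at the source level. It is an assumption-laden illustration rather than code from the paper: the names `fib` and `fib_task` are hypothetical, and the translation of these tasks to MTAPI happens inside the runtime library, which this sketch does not show.

```c
/* Hypothetical example: recursive Fibonacci expressed with OpenMP tasks.
 * Each recursive call becomes a task that the runtime may schedule on
 * any available worker. Without OpenMP support the pragmas are ignored
 * and the function runs as ordinary serial recursion. */
static long fib_task(int n)
{
    long x, y;

    if (n < 2)
        return n;

    #pragma omp task shared(x) firstprivate(n)
    x = fib_task(n - 1);

    #pragma omp task shared(y) firstprivate(n)
    y = fib_task(n - 2);

    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return x + y;
}

long fib(int n)
{
    long result = 0;

    /* One thread creates the root task; the rest of the team helps
     * execute the tasks it spawns. */
    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

Fine-grained recursion like this is a common stress test for a tasking runtime, because task creation and scheduling overhead dominates the actual arithmetic.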
Programming emerging complex embedded systems is a challenge. Embedded applications are complicated enough that they demand code reuse and easy adoption. Unfortunately, existing software solutions expect programmers to handle most of the low-level details, giving rise to a plethora of non-portable, proprietary commercial solutions. The need for industry standards is becoming more and more critical. The Multicore Association (MCA) offers industry-driven, standards-based approaches that provide portable and scalable solutions. In this paper, we use the Multicore Communication API (MCAPI), one of the popular APIs used in the embedded industry, which enables inter-core communication and synchronization. We have extended the reference MCAPI implementation for a Freescale QorIQ P4080 multicore platform consisting of eight e500mc Power Architecture™ cores and specialized accelerators such as the Pattern Match Engine (PME) and the Security Engine (SEC), integrated with the Data Path Acceleration Architecture (DPAA). We establish communication with the PME from the Power cores using MCAPI, thus abstracting all low-level configurations and function calls.
Multicore embedded systems are being widely used in telecommunication systems, robotics, medical applications, and more. While they offer a high-performance, low-power solution, programming them efficiently is still a challenge. In order to exploit the capabilities that the hardware offers, software developers are expected to handle many of the low-level details of programming, including utilizing DMA, ensuring cache coherency, and inserting synchronization primitives explicitly. The state of the art involves solutions in which the software toolchain is too vendor-specific, tying the software to particular hardware and leaving no room for portability. In this paper, we present a runtime system that explores mapping a high-level programming model, OpenMP, onto multicore embedded systems. A key feature of our scheme is that, unlike existing approaches that largely rely on POSIX threads, our approach leverages the Multicore Association (MCA) APIs as an OpenMP translation layer. The MCA APIs are a set of low-level APIs that handle resource management, inter-process communication, and task scheduling for multicore embedded systems. By deploying the MCA APIs, our runtime is able to capture the characteristics of multicore embedded systems more effectively than with POSIX threads. Furthermore, the MCA layer enables our runtime implementation to be portable across various architectures. Thus, programmers only need to maintain a single OpenMP code base that is compatible with various compilers, while the code remains portable across different types of platforms. We have evaluated our runtime system using several embedded benchmarks. The experiments demonstrate promising and competitive performance compared to the native approach for the platform.
Papers by Peng Sun