Abstract
Diagnostic ultrasound is a rapidly developing imaging technology that is widely used in clinical practice. A typical ultrasound imaging pipeline includes the following algorithms: beamforming, envelope detection, log compression, and scan conversion [1]. Traditionally, ultrasound imaging has been implemented on application-specific integrated circuits (ASICs) and FPGAs because of its high throughput and massive data-processing requirements. With the development of GPGPUs and their programming environments (e.g., CUDA), researchers have begun to implement ultrasound imaging algorithms in software [2], [3].
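As a rough illustration of two of the pipeline stages named above, the following C++ sketch applies envelope detection and log compression to demodulated I/Q samples. It is not taken from SUPRA or any other cited implementation; the function names, the 60 dB default dynamic range, and the 8-bit output mapping are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Envelope detection (illustrative): magnitude of each complex I/Q sample.
std::vector<float> envelope(const std::vector<float>& i,
                            const std::vector<float>& q) {
    assert(i.size() == q.size());
    std::vector<float> env(i.size());
    for (std::size_t n = 0; n < i.size(); ++n) {
        env[n] = std::hypot(i[n], q[n]);
    }
    return env;
}

// Log compression (illustrative): convert the envelope to dB relative to its
// peak, keep the top `dynamic_range_db` decibels, and map them to 0..255.
// The 60 dB default is an assumed value, not one taken from the source.
std::vector<std::uint8_t> log_compress(const std::vector<float>& env,
                                       float dynamic_range_db = 60.0f) {
    std::vector<std::uint8_t> out(env.size(), 0);
    if (env.empty()) return out;

    const float peak = *std::max_element(env.begin(), env.end());
    if (peak <= 0.0f) return out;  // all-zero input stays black

    for (std::size_t n = 0; n < env.size(); ++n) {
        // Floor the ratio to avoid log10(0).
        const float ratio = std::max(env[n] / peak, 1e-12f);
        const float db = 20.0f * std::log10(ratio);  // <= 0 dB
        // 1.0 at the peak, 0.0 at -dynamic_range_db and below.
        float scaled = (db + dynamic_range_db) / dynamic_range_db;
        scaled = std::min(1.0f, std::max(0.0f, scaled));
        out[n] = static_cast<std::uint8_t>(std::lround(scaled * 255.0f));
    }
    return out;
}
```

Each output sample depends only on its own input sample and one global peak value, so both stages are data-parallel; that property is what makes them natural candidates for GPU or FPGA kernels.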
At present, two factors limit the development of ultrasound imaging. First, implementing ultrasound imaging algorithms through hardware development is complex, time-consuming, and inflexible. Second, existing CUDA-based ultrasound imaging implementations are limited to Nvidia hardware, which restricts their use on other architectures.
oneAPI is a cross-platform, unified programming environment developed by Intel. It enables heterogeneous computing across multiple hardware architectures using Data Parallel C++ (DPC++), which is based on SYCL. This programming suite can be used to address both problems mentioned above. Specifically, programming FPGAs in a high-level language such as DPC++ can accelerate ultrasound imaging application development, and SYCL-based ultrasound imaging applications can be easily migrated to other vendors' hardware.
To implement an ultrasound imaging application across multiple architectures (e.g., GPU, FPGA, and CPU) in a unified programming environment, we migrated SUPRA [4], a CUDA-based open-source ultrasound imaging project. The migration was performed using the oneAPI compatibility tool (dpct). After migration, the code was tuned to run on GPU, FPGA, and CPU.
In this talk, we will discuss our experience with the complete process of migrating CUDA code to oneAPI. First, we will present the whole process of migrating a CUDA code base with dpct, including usage, code modification, API comparison, and build instructions. Second, we will analyze the computational characteristics of the ultrasound imaging algorithms and show how to optimize the application on Intel GPUs, including the use of ESIMD. Third, we will highlight our early experience of tuning the migrated code to target FPGAs, including rewriting the device code for FPGA and programming techniques that improve FPGA performance. We will also compare the device code for GPU and FPGA. Last, we will compare the performance and computation results of the ultrasound imaging algorithms on different hardware, including Intel GPUs (integrated and discrete), an Intel Arria 10 FPGA, an Intel CPU, an Nvidia GTX 1080 GPU, and a GTX 960M GPU.