research-article

HIPA<sup>cc</sup>: A Domain-Specific Language and Compiler for Image Processing

Authors:

Richard Membarth,

Wieland Eckert

IEEE Transactions on Parallel and Distributed Systems, Volume 27, Issue 1

Pages 210 - 224

https://doi.org/10.1109/TPDS.2015.2394802

Published: 01 January 2016

Abstract

Domain-specific languages (DSLs) provide high-level, domain-specific abstractions that allow expressive and concise algorithm descriptions. Because a DSL description also hides the properties of the target hardware, DSLs are a promising way to target different parallel and heterogeneous hardware from the same algorithm description. In theory, the DSL description can capture all characteristics of the algorithm that are required to generate highly efficient parallel implementations. However, most frameworks do not make use of this knowledge, and their performance cannot reach that of optimized library implementations. In this article, we present the HIPA<sup>cc</sup> framework, a DSL and source-to-source compiler for image processing. We show that domain knowledge can be captured in the language and that this knowledge enables us to generate tailored implementations for a given target architecture. Back ends for CUDA, OpenCL, and Renderscript allow us to target discrete graphics processing units (GPUs) as well as mobile, embedded GPUs. Exploiting the captured domain knowledge, we can generate specialized algorithm variants that reach the maximum performance achievable under the peak memory bandwidth. These implementations significantly outperform state-of-the-art domain-specific languages and libraries.
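The notion of performance bounded by peak memory bandwidth can be illustrated with a short calculation. The sketch below is not taken from the article: it assumes a simple operator that reads and writes each pixel a fixed number of times, and the function names, image size, and bandwidth figure are hypothetical values chosen only for illustration.

```python
def min_runtime_ms(width, height, bytes_per_pixel, peak_bandwidth_gbs,
                   reads_per_pixel=1, writes_per_pixel=1):
    """Lower bound on kernel runtime if memory traffic is the only limit.

    Assumption (hypothetical model): every pixel is transferred
    reads_per_pixel + writes_per_pixel times at full peak bandwidth.
    """
    bytes_moved = (width * height * bytes_per_pixel
                   * (reads_per_pixel + writes_per_pixel))
    seconds = bytes_moved / (peak_bandwidth_gbs * 1e9)
    return seconds * 1e3


def bandwidth_efficiency(measured_ms, bound_ms):
    """Fraction of the bandwidth bound that a measured runtime achieves."""
    return bound_ms / measured_ms


if __name__ == "__main__":
    # Hypothetical numbers: 1000x1000 image, 4-byte pixels, 8 GB/s peak.
    bound = min_runtime_ms(1000, 1000, 4, 8.0)
    print(f"bandwidth-bound runtime: {bound:.3f} ms")
    # A hypothetical measured runtime of 2 ms compared against the bound.
    print(f"efficiency: {bandwidth_efficiency(2.0, bound):.2f}")
```

An implementation whose measured runtime approaches this bound cannot be sped up further without reducing memory traffic, which is the sense in which a memory-bound kernel reaches the maximum achievable performance.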

Index Terms

HIPA^cc: A Domain-Specific Language and Compiler for Image Processing
1. Computing methodologies
  1. Computer graphics
  2. Parallel computing methodologies
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. General programming languages
      1. Language types
        Parallel programming languages

Index terms have been assigned to the content through auto-classification.

Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 27, Issue 1

Jan. 2016

304 pages

ISSN: 1045-9219

Copyright © 2015.

Publisher: IEEE Press
