research-article

A configurable algorithm for parallel image-compositing applications

Authors:

Rajeev Thakur

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Article No.: 4, Pages 1 - 10

https://doi.org/10.1145/1654059.1654064

Published: 14 November 2009

Abstract

Collective communication operations can dominate the cost of large-scale parallel algorithms. Image compositing in parallel scientific visualization is one such reduction operation. We present a new algorithm called Radix-k that in many cases performs better than existing compositing algorithms. It does so through a set of configurable parameters, the radices, that determine the number of communication partners in each message round. The algorithm embodies and unifies binary swap and direct-send, two of the best-known compositing methods, and enables numerous other configurations through appropriate choices of radices. While the algorithm is not tied to a particular computing architecture or network topology, the selection of radices allows Radix-k to take advantage of new supercomputer interconnect features such as multiporting. We show scalability across image size and system size, including both power-of-two and non-power-of-two process counts.
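The abstract does not spell out how the radices map to communication partners, so the following Python sketch is only an illustration under stated assumptions: the process count equals the product of the radices, and a process's position within its group in each round is the corresponding digit of its rank in a mixed-radix numbering. The function name `radix_k_groups` is hypothetical and is not taken from the paper. Under these assumptions, radices of all 2 give binary-swap-style pairs, and a single radix equal to the process count gives one direct-send-style group.

```python
from math import prod


def radix_k_groups(rank, radices):
    """Return, for each round, the ranks that `rank` exchanges data with.

    Assumption (not from the abstract): the number of processes is
    prod(radices), and round i groups together the ranks that differ
    only in digit i of their mixed-radix representation.
    """
    num_procs = prod(radices)
    if not 0 <= rank < num_procs:
        raise ValueError("rank must be in [0, prod(radices))")

    groups = []
    stride = 1  # distance between neighbouring members of a group
    for k in radices:
        digit = (rank // stride) % k      # position of `rank` in its group
        base = rank - digit * stride      # group member whose digit is 0
        groups.append([base + j * stride for j in range(k)])
        stride *= k
    return groups


if __name__ == "__main__":
    # Eight processes, radices [2, 2, 2]: pairs in every round.
    print(radix_k_groups(5, [2, 2, 2]))   # [[4, 5], [5, 7], [1, 5]]
    # Eight processes, one radix of 8: a single all-to-all group.
    print(radix_k_groups(3, [8]))         # [[0, 1, 2, 3, 4, 5, 6, 7]]
    # A mixed configuration: a round of 4 partners, then a round of 2.
    print(radix_k_groups(5, [4, 2]))      # [[4, 5, 6, 7], [1, 5]]
```

The sketch covers only partner selection; how image regions are split and composited within each group, and how process counts that do not factor conveniently are handled, is left to the paper itself.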


Cited By

Mathai M, Larsen M, Childs H (2023) A Distributed-Memory Parallel Approach for Volume Rendering with Shadows. 2023 IEEE 13th Symposium on Large Data Analysis and Visualization (LDAV), pages 22-31. Online publication date: 23-Oct-2023.
https://doi.org/10.1109/LDAV60332.2023.00010
Bicer T, Yu X, Ching D, Chard R, Cherukara M, Nicolae B, Kettimuthu R, Foster I (2022) High-Performance Ptychographic Reconstruction with Federated Facilities. Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation, pages 173-189. Online publication date: 10-Mar-2022.
https://doi.org/10.1007/978-3-030-96498-6_10
Usher W, Amstutz J, Günther J, Knoll A, Johnson G, Brownlee C, Hota A, Cherniak B, Rowley T, Jeffers J, Pascucci V (2022) Scalable CPU Ray Tracing for In Situ Visualization Using OSPRay. In Situ Visualization for Computational Science, pages 353-374. Online publication date: 5-May-2022.
https://doi.org/10.1007/978-3-030-81627-8_16

Index Terms

A configurable algorithm for parallel image-compositing applications
1. Information systems
  1. Information systems applications


Information & Contributors

Information

Published In

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

November 2009

778 pages

ISBN: 9781605587448

DOI: 10.1145/1654059

Conference Chair:
Wilfred Pinfold

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article

Conference

SC '09

SC '09: International Conference for High Performance Computing, Networking, Storage and Analysis

November 14 - 20, 2009

Portland, Oregon

Acceptance Rates

SC '09 Paper Acceptance Rate: 59 of 261 submissions, 23%

Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%


Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 57
Total Downloads: 63
Downloads (Last 12 months): 14
Downloads (Last 6 weeks): 0

Reflects downloads up to 27 Jul 2024


Citations

Shudler S, Petruzza S, Pascucci V, Bremer P (2021) Portable and Composable Flow Graphs for In Situ Analytics. 2021 IEEE 11th Symposium on Large Data Analysis and Visualization (LDAV), pages 63-72. Online publication date: Oct-2021.
https://doi.org/10.1109/LDAV53230.2021.00014
Lipinksi R, Moreland K, Papka M, Marrinan T (2021) GPU-based Image Compression for Efficient Compositing in Distributed Rendering Applications. 2021 IEEE 11th Symposium on Large Data Analysis and Visualization (LDAV), pages 43-52. Online publication date: Oct-2021.
https://doi.org/10.1109/LDAV53230.2021.00012
Ren X, Lis M (2021) CHOPIN: Scalable Graphics Rendering in Multi-GPU Systems via Parallel Image Composition. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 709-722. Online publication date: Feb-2021.
https://doi.org/10.1109/HPCA51647.2021.00065
Chu D, Wu C (2021) Generalizing the over operator for parallelization and order-independency. Journal of Parallel and Distributed Computing, 151, pages 52-60. Online publication date: May-2021.
https://doi.org/10.1016/j.jpdc.2021.02.001
Atzori M, Köpp W, Chien S, Massaro D, Mallor F, Peplinski A, Rezaei M, Jansson N, Markidis S, Vinuesa R, Laure E, Schlatter P, Weinkauf T (2021) In situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst. The Journal of Supercomputing. Online publication date: 2-Aug-2021.
https://doi.org/10.1007/s11227-021-03990-3
Proficz J, Ocetkiewicz K (2020) Improving Clairvoyant: reduction algorithm resilient to imbalanced process arrival patterns. The Journal of Supercomputing. Online publication date: 20-Nov-2020.
https://doi.org/10.1007/s11227-020-03499-1
Usher W, Wald I, Amstutz J, Günther J, Brownlee C, Pascucci V (2019) Scalable Ray Tracing Using the Distributed FrameBuffer. Computer Graphics Forum, 38:3, pages 455-466. Online publication date: 10-Jul-2019.
https://doi.org/10.1111/cgf.13702