A configurable algorithm for parallel image-compositing applications

Published: 14 November 2009

    Abstract

    Collective communication operations can dominate the cost of large-scale parallel algorithms. Image compositing in parallel scientific visualization is one such reduction operation. We present a new algorithm called Radix-k that in many cases performs better than existing compositing algorithms. It does so through a set of configurable parameters, the radices, that determine the number of communication partners in each message round. The algorithm embodies and unifies binary swap and direct-send, two of the best-known compositing methods, and enables numerous other configurations through appropriate choices of radices. While the algorithm is not tied to a particular computing architecture or network topology, the selection of radices allows Radix-k to take advantage of new supercomputer interconnect features such as multiporting. We show scalability across image size and system size, including both power-of-two and non-power-of-two process counts.
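
    The role of the radices can be illustrated with a small sketch. It is not the authors' implementation: the function name `radix_k_groups` and the mixed-radix rank layout are assumptions made here for illustration, and the sketch only covers process counts equal to the product of the radices. It lists, for one process, the group it would exchange image pieces with in each round. Under this layout, choosing every radix as 2 gives the pairwise pattern of binary swap, and a single radix equal to the process count gives the one-round, all-to-all pattern of direct-send.

```python
from math import prod


def radix_k_groups(rank, radices):
    """List the communication group of `rank` for each round.

    Hypothetical helper: ranks are read as mixed-radix numbers whose
    digits are bounded by `radices`. In round i, a process exchanges
    data with the ranks that differ from it only in digit i.
    """
    if not 0 <= rank < prod(radices):
        raise ValueError("rank must be below the product of the radices")

    groups = []
    stride = 1  # distance between neighbouring group members this round
    for k in radices:
        digit = (rank // stride) % k      # this rank's position in its group
        base = rank - digit * stride      # group member whose digit is 0
        groups.append([base + j * stride for j in range(k)])
        stride *= k
    return groups


if __name__ == "__main__":
    # Eight processes, three ways of factoring 8 into radices.
    for radices in ([2, 2, 2], [4, 2], [8]):
        print(radices, radix_k_groups(5, radices))
```

    Each round's group size equals that round's radix, so the radix list directly trades the number of rounds against the number of simultaneous partners per round.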

      Published In

      SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
      November 2009
      778 pages
      ISBN:9781605587448
      DOI:10.1145/1654059

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. communication
      2. image compositing
      3. parallel scientific visualization

      Qualifiers

      • Research-article

      Conference

      SC '09

      Acceptance Rates

      SC '09 Paper Acceptance Rate 59 of 261 submissions, 23%;
      Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
