Topology-aware optimizations for multi-GPU ptychographic image reconstruction

Authors:

Rajkumar Kettimuthu,

Ian Foster

ICS '21: Proceedings of the 35th ACM International Conference on Supercomputing

Pages 354 - 366

https://doi.org/10.1145/3447818.3460380

Published: 04 June 2021

Abstract

Ptychography is an advanced high-resolution X-ray imaging technique that can generate extremely large datasets. Ptychographic reconstruction transforms reciprocal-space experimental data into high-resolution 2D real-space images. GPUs have been used extensively to meet the computational requirements of the reconstruction. Generic multi-GPU reconstruction solutions establish inter-GPU communication through common communication topologies, such as the P2P graph and the ring, that are provided by the MPI and NCCL libraries. However, these common topologies assume homogeneous physical links between GPUs, which results in sub-optimal performance on heterogeneous configurations composed of both high-speed (e.g., NVLink) and low-speed (e.g., PCIe) interconnects. This mismatch between the application-level communication topology and the physical interconnect can cause data transfer congestion, inefficient memory access, and under-utilization of network resources. Here we present topology-aware designs and optimizations that address this mismatch and boost end-to-end application performance. We introduce topology-aware data splitting, propose a novel communication topology, and incorporate asynchronous data movement and computation. We evaluate our design and optimizations using real and artificial datasets and compare their performance with that of the direct P2P and NCCL-based approaches. The results show that our optimizations always outperform these counterparts and achieve up to 5.13× communication speedup and up to 1.63× end-to-end application speedup.
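To make the mismatch described in the abstract concrete, the following is a minimal sketch, not the authors' method. It assumes a hypothetical 4-GPU node whose link bandwidths are invented for illustration, and it uses a brute-force search (an assumption of this sketch) to compare a naive ring ordering with one chosen by looking at the physical links. The names `BANDWIDTH`, `ring_bottleneck`, and `best_ring` are hypothetical.

```python
from itertools import permutations

# Hypothetical link bandwidths in GB/s (illustrative values, not measurements).
NVLINK, PCIE = 50.0, 16.0

# BANDWIDTH[i][j] is the assumed bandwidth of the direct link between GPU i and GPU j.
# In this made-up node, GPU pairs (0,1) and (2,3) only share a PCIe path.
BANDWIDTH = [
    [0.0,    PCIE,   NVLINK, NVLINK],
    [PCIE,   0.0,    NVLINK, NVLINK],
    [NVLINK, NVLINK, 0.0,    PCIE],
    [NVLINK, NVLINK, PCIE,   0.0],
]


def ring_bottleneck(order, bandwidth):
    """Return the slowest link used when GPUs exchange data around a ring."""
    n = len(order)
    return min(bandwidth[order[i]][order[(i + 1) % n]] for i in range(n))


def best_ring(bandwidth):
    """Brute-force the ring ordering whose slowest link is fastest.

    GPU 0 is pinned to the first slot because rotations of a ring are equivalent.
    Exhaustive search is only practical for the handful of GPUs in a single node.
    """
    rest = range(1, len(bandwidth))
    candidates = ((0,) + perm for perm in permutations(rest))
    return max(candidates, key=lambda order: ring_bottleneck(order, bandwidth))


naive = (0, 1, 2, 3)          # ordering by device index, ignoring the links
aware = best_ring(BANDWIDTH)  # ordering chosen from the link matrix

print("naive ring", naive, "bottleneck", ring_bottleneck(naive, BANDWIDTH))
print("aware ring", aware, "bottleneck", ring_bottleneck(aware, BANDWIDTH))
```

In this made-up configuration the index-ordered ring crosses two PCIe links, while the link-aware ordering stays on NVLink throughout. This only illustrates why a topology that assumes homogeneous links can be limited by its slowest hop. It says nothing about the specific topology or data-splitting scheme proposed in the paper.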



Index Terms

Topology-aware optimizations for multi-GPU ptychographic image reconstruction
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Interconnection architectures
      2. Single instruction, multiple data
2. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms


Published In

ICS '21: Proceedings of the 35th ACM International Conference on Supercomputing

June 2021

506 pages

ISBN:9781450383356

DOI:10.1145/3447818

General Chairs: Huiyang Zhou (North Carolina State University), Jose Moreira (IBM Research)

Program Chairs: Frank Mueller (North Carolina State University), Yoav Etsion (Technion)

Copyright © 2021.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

U.S. Department of Energy

Conference

ICS '21

Sponsor:

SIGARCH

ICS '21: 2021 International Conference on Supercomputing

June 14 - 17, 2021

Virtual Event, USA

Acceptance Rates

ICS '21 Paper Acceptance Rate 39 of 157 submissions, 25%;

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics

Total Citations: 9

Total Downloads: 607

Downloads (Last 12 months): 248

Downloads (Last 6 weeks): 28

Reflects downloads up to 12 Aug 2024

Cited By

Jamil H, Chung J, Bicer T, Kosar T, Kettimuthu R. (2023). Throughput Optimization with a NUMA-Aware Runtime System for Efficient Scientific Data Streaming. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 795-805. DOI: 10.1145/3624062.3624593. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3624593
Babu A, Zhou T, Kandel S, Bicer T, Liu Z, Judge W, Ching D, Jiang Y, Veseli S, Henke S, Chard R, Yao Y, Sirazitdinova E, Gupta G, Holt M, Foster I, Miceli A, Cherukara M. (2023). Deep learning at the edge enables real-time streaming ptychographic imaging. Nature Communications 14:1. DOI: 10.1038/s41467-023-41496-z. Online publication date: 3-Nov-2023.
https://doi.org/10.1038/s41467-023-41496-z
Wang X, Tsaris A, Mukherjee D, Wahib M, Chen P, Oxley M, Ovchinnikova O, Hinkle J, Wolf F, Shende S, Culhane C, Alam S, Jagode H. (2022). Image gradient decomposition for parallel and memory-efficient ptychographic reconstruction. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, 1-13. DOI: 10.5555/3571885.3571895. Online publication date: 13-Nov-2022.
https://dl.acm.org/doi/10.5555/3571885.3571895
Yu X, Di S, Zhao K, Tian J, Tao D, Liang X, Cappello F, Weissman J, Chandra A, Gavrilovska A, Tiwari D. (2022). Ultrafast Error-bounded Lossy Compression for Scientific Datasets. Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing, 159-171. DOI: 10.1145/3502181.3531473. Online publication date: 27-Jun-2022.
https://dl.acm.org/doi/10.1145/3502181.3531473
Wang X, Tsaris A, Mukherjee D, Wahib M, Chen P, Oxley M, Ovchinnikova O, Hinkle J. (2022). Image Gradient Decomposition for Parallel and Memory-Efficient Ptychographic Reconstruction. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, 1-13. DOI: 10.1109/SC41404.2022.00013. Online publication date: Nov-2022.
https://doi.org/10.1109/SC41404.2022.00013
Yu X, Nikitin V, Ching D, Aslan S, Gürsoy D, Biçer T. (2022). Scalable and accurate multi-GPU-based image reconstruction of large-scale ptychography data. Scientific Reports 12:1. DOI: 10.1038/s41598-022-09430-3. Online publication date: 29-Mar-2022.
https://doi.org/10.1038/s41598-022-09430-3
Barreiros W, Melo A, Kong J, Ferreira R, Kurc T, Saltz J, Teodoro G. (2022). Efficient microscopy image analysis on CPU-GPU systems with cost-aware irregular data partitioning. Journal of Parallel and Distributed Computing 164:C, 40-54. DOI: 10.1016/j.jpdc.2022.02.004. Online publication date: 1-Jun-2022.
https://dl.acm.org/doi/10.1016/j.jpdc.2022.02.004
Bicer T, Yu X, Ching D, Chard R, Cherukara M, Nicolae B, Kettimuthu R, Foster I. (2022). High-Performance Ptychographic Reconstruction with Federated Facilities. Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation, 173-189. DOI: 10.1007/978-3-030-96498-6_10. Online publication date: 10-Mar-2022.
https://doi.org/10.1007/978-3-030-96498-6_10
Yu X, Di S, Gok A, Tao D, Cappello F. (2021). cuZ-Checker: A GPU-Based Ultra-Fast Assessment System for Lossy Compressions. 2021 IEEE International Conference on Cluster Computing (CLUSTER), 307-319. DOI: 10.1109/Cluster48925.2021.00065. Online publication date: Sep-2021.
https://doi.org/10.1109/Cluster48925.2021.00065
