The Zoltan and Isorropia parallel toolkits for combinatorial scientific computing: Partitioning, ordering and coloring

Published: 01 April 2012

Abstract

Partitioning and load balancing are important problems in scientific computing that can be modeled as combinatorial problems on graphs or hypergraphs. The Zoltan toolkit was developed primarily for partitioning and load balancing in dynamic parallel applications, but has since expanded to support other problems in combinatorial scientific computing, including matrix ordering and graph coloring. Zoltan is built on abstract user interfaces: applications describe their data through callback functions rather than passing in a particular data structure. To simplify the use and integration of Zoltan with matrix-based frameworks, such as those in Trilinos, we developed Isorropia, a Trilinos package that supports most of Zoltan's features through a matrix-based interface. In addition to providing an easy-to-use matrix-based interface to Zoltan, Isorropia serves as a platform for additional matrix algorithms. In this paper, we give an overview of the Zoltan and Isorropia toolkits, their design, capabilities and use. We also show how Zoltan and Isorropia enable large-scale, parallel scientific simulations, and describe current and future development in the next-generation package Zoltan2.
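The callback-based design can be illustrated with a small sketch. The partitioner below never touches the application's data structures; it only calls functions the application supplies to report how many objects it owns, their IDs, and their coordinates. The names `Callbacks`, `num_obj`, `obj_list`, `geom` and `rcb_partition` are hypothetical and do not reproduce Zoltan's actual C interface. The algorithm is a serial, unweighted recursive coordinate bisection (RCB), a simplification of the parallel, weight-aware geometric partitioners a toolkit like Zoltan provides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, ...]


@dataclass
class Callbacks:
    """Hypothetical query callbacks supplied by the application.

    The partitioner learns about the application's objects only through
    these functions, never through its internal data structures.
    """
    num_obj: Callable[[], int]          # number of locally owned objects
    obj_list: Callable[[], List[int]]   # global IDs of those objects
    geom: Callable[[int], Point]        # coordinates of one object


def rcb_partition(cb: Callbacks, num_parts: int) -> Dict[int, int]:
    """Serial, unweighted recursive coordinate bisection (illustrative sketch).

    Returns a mapping from object ID to part number in [0, num_parts).
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    ids = cb.obj_list()
    if len(ids) != cb.num_obj():
        raise ValueError("num_obj and obj_list callbacks disagree")
    coords = {gid: cb.geom(gid) for gid in ids}
    parts: Dict[int, int] = {}

    def bisect(objs: List[int], first_part: int, nparts: int) -> None:
        if not objs:
            return  # more parts than objects: some parts stay empty
        if nparts == 1:
            for gid in objs:
                parts[gid] = first_part
            return
        ndim = len(coords[objs[0]])

        def extent(d: int) -> float:
            values = [coords[gid][d] for gid in objs]
            return max(values) - min(values)

        # Cut perpendicular to the coordinate direction of largest extent.
        dim = max(range(ndim), key=extent)
        objs = sorted(objs, key=lambda gid: coords[gid][dim])
        # Split the object count in proportion to the parts on each side.
        left_parts = nparts // 2
        cut = len(objs) * left_parts // nparts
        bisect(objs[:cut], first_part, left_parts)
        bisect(objs[cut:], first_part + left_parts, nparts - left_parts)

    bisect(list(ids), 0, num_parts)
    return parts


# The application keeps its own data structure: here, element centroids.
mesh = {10: (0.0, 0.0), 11: (1.0, 0.0), 12: (2.0, 0.0), 13: (3.0, 0.0)}
cb = Callbacks(num_obj=lambda: len(mesh),
               obj_list=lambda: list(mesh),
               geom=lambda gid: mesh[gid])
print(rcb_partition(cb, 2))  # {10: 0, 11: 0, 12: 1, 13: 1}
```

Replacing the dictionary with any other mesh representation only requires new callbacks; `rcb_partition` itself is unchanged. That decoupling is the point of a callback design, and it is also why a matrix-based interface such as Isorropia's is more convenient when the data already lives in a Trilinos matrix: no callbacks need to be written.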

Cited By

  • (2024) An accurate, adaptive and scalable parallel finite element framework for the part-scale thermo-mechanical analysis in metal additive manufacturing processes. Computational Mechanics 73(5), 983-1011. DOI: 10.1007/s00466-023-02397-6. Online publication date: 1-May-2024.
  • (2023) A Survey of Graph Comparison Methods with Applications to Nondeterminism in High-Performance Computing. International Journal of High Performance Computing Applications 37(3-4), 306-327. DOI: 10.1177/10943420231166610. Online publication date: 1-Jul-2023.
  • (2023) Performance Implication of Tensor Irregularity and Optimization for Distributed Tensor Decomposition. ACM Transactions on Parallel Computing 10(2), 1-27. DOI: 10.1145/3580315. Online publication date: 20-Jun-2023.

Published In

Scientific Programming, Volume 20, Issue 2
A New Overview of the Trilinos Project, Part 1
April 2012
138 pages

Publisher

IOS Press

Netherlands

Author Tags

  1. Combinatorial Scientific Computing
  2. Fill-Reducing Ordering
  3. Graph Coloring
  4. Load Balancing
  5. Matrix Ordering
  6. Parallel Computing
  7. Partitioning

Qualifiers

  • Article


