DOI: 10.5555/2388996.2389072
research-article

Heuristic static load-balancing algorithm applied to the fragment molecular orbital method

Published: 10 November 2012

Abstract

In the era of petascale supercomputing, load balancing is crucial. Although dynamic load balancing is widespread, it becomes increasingly difficult to implement effectively on thousands of processors or more. This prompts a second look at static load-balancing techniques, even though the optimal allocation of tasks to processors is an NP-hard problem. We propose a heuristic static load-balancing algorithm that employs fitted benchmarking data as an alternative to dynamic load balancing. The problem of allocating CPU cores to tasks is formulated as a mixed-integer nonlinear optimization problem, which is solved with an optimization solver. On 163,840 cores of Blue Gene/P, we achieved a parallel efficiency of 80% for an execution of the fragment molecular orbital method applied to quantum-mechanical modeling of protein-ligand complexes. The obtained allocation is shown to outperform dynamic load balancing by at least a factor of 2, motivating the use of this approach in other coarse-grained applications.
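To make the idea of static allocation from fitted timings concrete, the following is a minimal sketch and not the authors' formulation or solver. It assumes a hypothetical two-parameter timing model, t(n) = a/n + b, where a and b stand in for coefficients fitted to benchmark runs, and it replaces the mixed-integer nonlinear solver with a simple greedy rule: give each task one core, then hand every remaining core to whichever task is currently predicted to finish last. All function names and the sample coefficients are illustrative assumptions.

```python
import heapq


def predicted_time(a, b, n):
    """Hypothetical fitted model: scalable work a/n plus a fixed cost b."""
    return a / n + b


def allocate_cores(tasks, total_cores):
    """Greedily assign cores so the slowest predicted task gets the next core.

    tasks: list of (a, b) coefficient pairs, one pair per task (assumed
           to come from fitting benchmark timings).
    Returns a list with the number of cores assigned to each task.
    """
    if total_cores < len(tasks):
        raise ValueError("need at least one core per task")
    cores = [1] * len(tasks)
    # Max-heap on predicted time (negated, because heapq is a min-heap).
    heap = [(-predicted_time(a, b, 1), i) for i, (a, b) in enumerate(tasks)]
    heapq.heapify(heap)
    for _ in range(total_cores - len(tasks)):
        _, i = heapq.heappop(heap)
        cores[i] += 1
        a, b = tasks[i]
        heapq.heappush(heap, (-predicted_time(a, b, cores[i]), i))
    return cores


def makespan(tasks, cores):
    """Predicted wall time: the slowest task determines when all are done."""
    return max(predicted_time(a, b, n) for (a, b), n in zip(tasks, cores))


# Illustrative coefficients only; not taken from the paper.
tasks = [(120.0, 1.0), (60.0, 1.0), (20.0, 1.0)]
cores = allocate_cores(tasks, 20)
print(cores, makespan(tasks, cores))
```

Under this assumed model the larger tasks receive proportionally more cores, so all tasks are predicted to finish at about the same time. A realistic timing model would also include terms that grow with the core count, which is one reason a general optimization solver may be preferred over a greedy rule.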

Cited By

  • (2017) An efficient MPI/openMP parallelization of the Hartree-Fock method for the second generation of Intel® Xeon Phi™ processor. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.1145/3126908.3126956. Online publication date: 12-Nov-2017
  • (2016) Load balancing for data centre. International Journal of Wireless and Mobile Computing, 11:1, pp. 47-53. DOI: 10.1504/IJWMC.2016.079464. Online publication date: 1-Jan-2016
  • (2013) Inspector/executor load balancing algorithms for block-sparse tensor contractions. Proceedings of the 27th international ACM conference on International conference on supercomputing, pp. 483-484. DOI: 10.1145/2464996.2467282. Online publication date: 10-Jun-2013

      Published In

      SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
      November 2012
      1161 pages
      ISBN:9781467308045

      Publisher

      IEEE Computer Society Press

      Washington, DC, United States

      Author Tags

      1. FMO
      2. GAMESS
      3. MINLP
      4. dynamic load balancing
      5. fragment molecular orbitals
      6. heuristic algorithm
      7. optimization
      8. protein-ligand complex
      9. quantum chemistry
      10. static load balancing

      Acceptance Rates

      SC '12 Paper Acceptance Rate 100 of 461 submissions, 22%;
      Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
