research-article

Regular Paper: Load-Balanced Drift-Diffusion Model Simulation: Cluster Software Performance Evaluation

Authors:

Stylianos Bounanos,

Sebastien Nicolas,

Anthony Vickers

International Journal of High Performance Computing Applications, Volume 21, Issue 2

Pages 222 - 245

https://doi.org/10.1177/1094342007078226

Published: 01 May 2007

Abstract

One research application of the drift-diffusion model is the design of an avalanche photodiode with high gain and low noise that can achieve single-photon counting. Combining system-level load balancing with application-level load balancing is shown to improve the performance of the simulation code on a Linux cluster supercomputer, and both forms of load balancing are required to approach a smooth increase in performance with scaling. Whether the adaptive simulation code was organized centrally or in a distributed manner followed from the choice of system software. Marked performance differences were observed when two contrasting cluster parallelization systems, MOSIX and Charm++, were applied. The paper compares these two dynamically load-balanced systems with two statically load-balanced MPI implementations, LAM and MPICH. It also considers AMPI, which is based on Charm++ and includes system-level load balancing but additionally implements all MPI calls.
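The abstract contrasts statically load-balanced MPI implementations with dynamically load-balanced systems. The Python sketch below illustrates why that distinction matters when work is uneven across the simulation domain. It is an illustration under stated assumptions, not the authors' code: the per-task costs are hypothetical, the static scheme is assumed to be an equal-count contiguous block decomposition, and the dynamic scheme is a greedy longest-processing-time heuristic standing in for a runtime load balancer. It reports the makespan, i.e. the load on the busiest processor, which in a synchronized iteration determines how long the other processors wait.

```python
import heapq
from typing import List


def static_block_partition(costs: List[float], nprocs: int) -> List[float]:
    """Assign contiguous blocks with equal task counts to processors.

    Assumption: this models a fixed decomposition chosen before the run,
    which ignores how expensive each task actually is.
    Returns the total load on each processor.
    """
    n = len(costs)
    loads = [0.0] * nprocs
    for i, c in enumerate(costs):
        # Maps task index i to processor 0..nprocs-1 in contiguous blocks.
        loads[i * nprocs // n] += c
    return loads


def greedy_rebalance(costs: List[float], nprocs: int) -> List[float]:
    """Greedy longest-processing-time heuristic (a stand-in for a dynamic
    load balancer): give the most expensive remaining task to the
    currently least-loaded processor."""
    heap = [(0.0, p) for p in range(nprocs)]  # (current load, processor id)
    loads = [0.0] * nprocs
    for c in sorted(costs, reverse=True):
        load, p = heapq.heappop(heap)  # least-loaded processor
        loads[p] = load + c
        heapq.heappush(heap, (loads[p], p))
    return loads


def makespan(loads: List[float]) -> float:
    """Load on the busiest processor."""
    return max(loads)


if __name__ == "__main__":
    # Hypothetical costs: the two expensive regions sit at one end of the
    # domain, so an equal-count block split puts both on one processor.
    costs = [1, 1, 1, 1, 1, 1, 4, 4]
    print(makespan(static_block_partition(costs, 4)))  # 8.0
    print(makespan(greedy_rebalance(costs, 4)))        # 4.0
```

With the expensive tasks clustered together, the static decomposition leaves one processor with 8 units of work while the greedy rebalance finishes at 4. Neither MOSIX process migration nor the Charm++ load-balancing framework is claimed to use this particular heuristic; the sketch only shows why redistributing work according to measured cost can shorten the slowest processor's run.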

Published In

International Journal of High Performance Computing Applications Volume 21, Issue 2

May 2007

116 pages

ISSN: 1094-3420

Publisher

Sage Publications, Inc.

United States
