research-article

Regular Paper: Load-Balanced Drift-Diffusion Model Simulation: Cluster Software Performance Evaluation

Published: 01 May 2007

Abstract

Design of an avalanche photodiode with high gain and low noise that can achieve single-photon counting is a research application of the drift-diffusion model. System-level load balancing, when combined with application-level load balancing, is shown to improve the performance of simulation code on a Linux cluster supercomputer. Both forms of load balancing are required to approach a smooth increase in performance with scaling. Centralized and distributed organization of the adaptive simulation code reflected the choice of system software. Marked performance differences were observed when two contrasting cluster software parallelization systems, MOSIX and Charm++, were applied. The paper compares these two dynamically load-balanced systems with MPI implementations (LAM and MPICH), which are statically load balanced. Also considered is AMPI, which is based on Charm++ and includes system-level load balancing, and which additionally implements all MPI calls.
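
The distinction the abstract draws between statically and dynamically load-balanced runs can be illustrated with a toy model. The sketch below is not the paper's simulation code and does not reflect how MOSIX, Charm++, LAM, or MPICH work internally. The function names, the workload, and the greedy "next free worker" policy are all assumptions chosen only to show why uneven task costs penalize a fixed up-front partition.

```python
import heapq

def static_makespan(costs, workers):
    """Static balancing: split tasks into equal-count contiguous blocks up front."""
    block = -(-len(costs) // workers)  # ceiling division
    loads = [sum(costs[i:i + block]) for i in range(0, len(costs), block)]
    return max(loads)

def dynamic_makespan(costs, workers):
    """Dynamic balancing (greedy sketch): each task goes to the worker that frees up first."""
    finish = [0.0] * workers  # min-heap of per-worker finish times
    heapq.heapify(finish)
    for cost in costs:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + cost)
    return max(finish)

# Hypothetical workload: the last quarter of the tasks is 8x costlier,
# loosely mimicking a region where an adaptive simulation has refined its grid.
costs = [1.0] * 12 + [8.0] * 4
print("static :", static_makespan(costs, 4))
print("dynamic:", dynamic_makespan(costs, 4))
```

In this made-up workload the static partition leaves one worker holding all the expensive tasks, while the greedy assignment spreads them out. Real systems also pay migration and communication costs that this model ignores.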


Published In

International Journal of High Performance Computing Applications, Volume 21, Issue 2
May 2007
116 pages

Publisher

Sage Publications, Inc.

United States

Author Tags

  1. Charm++
  2. MOSIX
  3. adaptive simulation
  4. cluster computing
  5. optoelectronics

Qualifiers

  • Research-article
