Beowulf Cluster Computing with Linux
Publisher: MIT Press, 55 Hayward St., Cambridge, MA, United States
ISBN: 978-0-262-69292-2
Published: 01 December 2003
Pages: 504
Abstract

From the Publisher: Use of Beowulf clusters (collections of off-the-shelf commodity computers programmed to act in concert, resulting in supercomputer performance at a fraction of the cost) has spread far and wide in the computational science community. Many application groups are assembling and operating their own private supercomputers rather than relying on centralized computing centers. Such clusters are used in climate modeling, computational biology, astrophysics, and materials science, as well as non-traditional areas such as financial modeling and entertainment. Much of this new popularity can be attributed to the growth of the open-source movement.

The second edition of Beowulf Cluster Computing with Linux has been completely updated; all three stand-alone sections have important new material. The introductory material in the first part now includes a new chapter giving an overview of the book and background on cluster-specific issues, including why and how to choose a cluster, as well as new chapters on cluster initialization systems (including ROCKS and OSCAR) and on network setup and tuning. The information on parallel programming in the second part now includes chapters on basic parallel programming and available libraries and programs for clusters. The third and largest part of the book, which describes software infrastructure and tools for managing cluster resources, has new material on cluster management and on the Scyld system.

Cited By

  1. Ang J, Carini G, Chen Y, Chuang I, Demarco M, Economou S, Eickbusch A, Faraon A, Fu K, Girvin S, Hatridge M, Houck A, Hilaire P, Krsulich K, Li A, Liu C, Liu Y, Martonosi M, McKay D, Misewich J, Ritter M, Schoelkopf R, Stein S, Sussman S, Tang H, Tang W, Tomesh T, Tubman N, Wang C, Wiebe N, Yao Y, Yost D and Zhou Y (2024). ARQUIN: Architectures for Multinode Superconducting Quantum Computers, ACM Transactions on Quantum Computing, 5:3, (1-59), Online publication date: 30-Sep-2024.
  2. Cayton P, Aguilar M and Pinto C Sunfish: An Open Centralized Composable HPC Management Framework Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, (1507-1511)
  3. Fotache M, Greavu-Şerban V, Hrubaru I and Tică A Big Data Technologies on Commodity Workstations Proceedings of the 19th International Conference on Computer Systems and Technologies, (110-115)
  4. Luttgau J, Kuhn M, Duwe K, Alforov Y, Betke E, Kunkel J and Ludwig T (2021). Survey of Storage Systems for High-Performance Computing, Supercomputing Frontiers and Innovations: an International Journal, 5:1, (31-58), Online publication date: 15-Mar-2018.
  5. Zerpa L The Message-Passing Interface and Parallel SAT-Solvers Proceedings of the International Conference on Future Networks and Distributed Systems, (1-7)
  6. (2017). Scheduling of online compute-intensive synchronized jobs on high performance virtual clusters, Journal of Computer and System Sciences, 85:C, (1-17), Online publication date: 1-May-2017.
  7. Rahman M, Islam N, Lu X and Panda D (2017). A Comprehensive Study of MapReduce Over Lustre for Intermediate Data Placement and Shuffle Strategies on HPC Clusters, IEEE Transactions on Parallel and Distributed Systems, 28:3, (633-646), Online publication date: 1-Mar-2017.
  8. Reza H, Aguilar M and Jalal S Regression testing of GPU/MIC systems for HPCC Proceedings of the 2015 International Workshop on Software Engineering for High Performance Computing in Science, (30-37)
  9. Islam N, Lu X, Wasi-ur-Rahman M, Shankar D and Panda D Triple-H Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, (101-110)
  10. Simons J and Buell J (2010). Virtualizing high performance computing, ACM SIGOPS Operating Systems Review, 44:4, (136-145), Online publication date: 13-Dec-2010.
  11. Yang C and Cheng L Implementation of a Performance-Based Loop Scheduling on Heterogeneous Clusters Proceedings of the 9th International Conference on Algorithms and Architectures for Parallel Processing, (44-54)
  12. Dongarra J, Sterling T, Simon H and Strohmaier E (2009). High-Performance Computing, Computing in Science and Engineering, 7:2, (51-59), Online publication date: 1-Mar-2009.
  13. He Y, Al-Azzoni I and Down D MARO - MinDrift affinity routing for resource management in heterogeneous computing systems Proceedings of the 2007 conference of the center for advanced studies on Collaborative research, (71-85)
  14. Bounanos S, Fleury M, Nicolas S and Vickers A (2007). Regular Paper, International Journal of High Performance Computing Applications, 21:2, (222-245), Online publication date: 1-May-2007.
  15. Hung S and Hsu Y DPCT Proceedings of the Second international conference on High Performance Computing and Communications, (320-329)
  16. Dongarra J, Bosilca G, Chen Z, Eijkhout V, Fagg G, Fuentes E, Langou J, Luszczek P, Pjesivac-Grbovic J, Seymour K, You H and Vadhiyar S (2006). Self-adapting numerical software (SANS) effort, IBM Journal of Research and Development, 50:2/3, (223-238), Online publication date: 1-Mar-2006.
  17. Chaisiri S, Pichitlamken J, Uthayopas P, Rojanapanpat T, Phakhawirotkul S and Vorakosit T Applying Web Service and Windows Clustering for High Volume Risk Analysis Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
  18. Wilson B (2005). Introduction to parallel programming using message-passing, Journal of Computing Sciences in Colleges, 21:1, (207-211), Online publication date: 1-Oct-2005.
  19. Gunawi H, Agrawal N, Arpaci-Dusseau A, Arpaci-Dusseau R and Schindler J Deconstructing Commodity Storage Clusters Proceedings of the 32nd annual international symposium on Computer Architecture, (60-71)
  20. Javadi B, Khorsandi S and Akbari M Study of a cluster-based parallel system through analytical modeling and simulation Proceedings of the 2005 international conference on Computational Science and Its Applications - Volume Part IV, (1262-1271)
  21. Gunawi H, Agrawal N, Arpaci-Dusseau A, Arpaci-Dusseau R and Schindler J (2005). Deconstructing Commodity Storage Clusters, ACM SIGARCH Computer Architecture News, 33:2, (60-71), Online publication date: 1-May-2005.
  22. Haili X, Hong W, Xuebin C, Sungen D and Honghai Z An implementation of interactive jobs submission for grid computing portals Proceedings of the 2005 Australasian workshop on Grid computing and e-research - Volume 44, (67-70)
  23. Bilbao J and Garate G First step in a PC cluster development with openMosix Proceedings of the 4th WSEAS International Conference on Applied Informatics and Communications, (1-5)
  24. Lee L, Li K, Yang C, Tseng C, Liu K and Hung C On implementation of a scalable wallet-size cluster computing system for multimedia applications Proceedings of the 5th Pacific Rim conference on Advances in Multimedia Information Processing - Volume Part III, (697-704)
  25. DeBardeleben N, Ligon III W, Pandit S and Stanzione Jr. D Coven - A Framework for High Performance Problem Solving Environments Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing
  26. Chien A Architecture of the Entropia Distributed Computing System Proceedings of the 16th International Parallel and Distributed Processing Symposium
Contributors
  • Luddy School of Informatics, Computing, and Engineering
  • Argonne National Laboratory

Reviews

Balaraman Subbanaidu

Lately, there has been a lot of interest in parallel and distributed computing for scientific and commercial applications. Of all the relevant technologies, cluster computing with Linux has been gaining prominence. I did not have an opportunity to go through the first edition of this book [1], but, after reading this well-formatted and well-compiled second edition, I do not regret having missed the first. This edition delivers very comprehensively on what its title promises, and it is a good book for people working on, or planning to work on, cluster computing. The book's chapters are a collection of essays from different authors. Each chapter has a summary, and a discussion of future trends or a transition to the next chapter. There are many relevant and useful references, including Web links, and some chapters, in particular those covering programming, present coding examples. The overall plan and presentation of the book is admirable.

Many chapters of the book discuss hardware, software, installation, and constraints. The initial chapters present a very good introduction, and are suitable for beginners in the area of cluster computing. Reading these first chapters feels like going back to relearn the hard details of hardware, because the authors cover the central processing unit (CPU), memory, bus, basic input/output system (BIOS), storage, and so on. There is a brief chapter on Linux as well, which Linux experts can perhaps skip. The chapters on message passing interface (MPI) programming discuss parallel input/output (I/O), fault tolerance, and related topics, along with some discussion of improving the performance of such programs; how C++ and Fortran are used is also explained. Several useful tools are mentioned that will help readers working in cluster computing environments. Chapter 5 is quite promising, discussing the implementation of clusters; it thoroughly explains the basic backbone of the network infrastructure. Fault tolerance is described, and since this is an important concept, one hopes the next edition will cover it in greater depth. The two kinds of message passing for parallel programming are also discussed. Three chapters are dedicated to workload management tools, such as PBS and Condor, with useful hints for tuning them. The introduction to writing parallel programs for clusters in the seventh chapter is informative.

The book presents the technology coherently, which is a great achievement for both the editors and the authors; nowhere does one feel lost in the material. The material is not exhaustive, however; in a book of this kind, the coverage can only be broad rather than deep.
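The MPI material this review refers to is easiest to picture with a small example. The following is a minimal sketch of the point-to-point message-passing style in C, written for this page rather than taken from the book; the token-ring pattern and the values used are illustrative assumptions.

    /*
     * Minimal sketch of MPI point-to-point message passing (illustrative
     * only, not from the book). Build with an MPI wrapper compiler, e.g.
     * "mpicc ring.c -o ring", and run with "mpirun -np 4 ./ring".
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, token;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id   */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */

        if (size < 2) {                         /* the ring needs 2+ ranks */
            MPI_Finalize();
            return 0;
        }

        if (rank == 0) {
            /* Rank 0 injects a token and waits for it to come back. */
            token = 0;
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("token returned to rank 0 with value %d\n", token);
        } else {
            /* Other ranks receive from the left, increment, pass right. */
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            token++;
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

Run with four processes, the token returns to rank 0 with the value 3, having been incremented once on each of the other ranks; this is the same send/receive style on which the book's MPI chapters build.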

John P. Dougherty

The stated purpose of this book is to help the reader understand the Beowulf approach to parallel computing. This task is neither simple nor small; the 600-page volume only scratches the conceptual surface of cluster computing in the Beowulf world. The book is part of the Scientific and Engineering Computation Series from MIT Press, which many in scientific parallel computing have come to rely on for concise and practical information.

The book is a collection of threaded essays on topics in cluster computing. After an overview by one of the editors, the chapters are partitioned into those covering enabling technologies, parallel programming, and managing clusters. A short concluding chapter is provided for balance, and to address considerations for future changes in cluster computing. Appendices include a reading list and relevant links from the World Wide Web. Each chapter provides a starting point for some aspect of Beowulf cluster computing, often with small but useful examples. Many of the chapters provide code examples, as well as recipes for the access, installation, and execution of hardware and software components. This approach generally works under the constraints imposed; it is hard to balance clear conceptual treatment with the dense, detailed considerations encountered in actual cluster computing.

The overview chapter is appropriate for novices, as well as for more seasoned professionals who want a quick refresher. I appreciated the glossary, and the brief definitions of terms that seem to appear and mutate throughout this field. Chapters on hardware, Linux, and networks follow the initial overview; they describe the various components and their associated issues. The remainder of the first part, on enabling technologies, outlines the installation, configuration, and tuning of a Beowulf. I found the fifth chapter most promising in its walkthrough of the steps needed to configure a simple eight-node cluster with Red Hat Linux 9, especially the networking setup.

The second part, on parallel programming, is a concise treatment of how to design and implement the most basic and most popular parallel algorithms. Many platforms are considered, including C and sockets, Python, Perl, the message passing interface (MPI), and the parallel virtual machine (PVM). There is an advanced chapter describing how to improve the performance of MPI programs, and another on how to improve PVM fault tolerance and adaptability. These chapters are fairly complete, but readers may want to have other MPI or PVM references available.

The final part of the book covers cluster management. This part does not flow as well as the others, and is more a set (rather than a sequence) of articles on helpful management facilities and topics. The first two chapters provide background on cluster and workload management, reviewing such issues as monitoring, recovery from failure, and software upgrades. A collection of management tools is then discussed, including Condor, Maui, the Portable Batch System (PBS), Scyld, and the Parallel Virtual File System (PVFS). There is an interesting chapter, just before the conclusion, comparing two Beowulfs maintained at Argonne National Laboratory that are about three years apart in age. By comparing the experiences associated with these Beowulfs, the reader can glean some information about the expected usage of a cluster and its eventual path toward replacement. I would suggest the editors consider similar reports from other laboratories, perhaps where the scale is smaller (and budgets are a stronger driving issue), and/or the applications are not scientific (for example, economic computation or transaction-based processing for business).

There are a few typographical errors and other minor distractions in the text, and I have not had the time or resources to verify the code examples. I would also read this book one article at a time, when a specific answer or example is needed. Reading the articles helps get you ready to implement a Beowulf cluster, but, as with many projects, the actual construction experience cannot be captured in prose alone. This book is appropriate for people with a reasonable technical background who are involved in applications or projects where a Beowulf is advantageous. In other words, you really need to bring a few things to reading this book: experience with programming, Linux/Unix, and basic networking. The reader should be aware of a strongly related book from the same series [1]. That second book is the result of a workshop on Beowulf setup conducted by its authors, and reads more as a case study than this book does. Both books have merit, and, ideally, both are useful when managing a cluster. If the ideal is not possible, then my suggestion is to consider the alternate book first, see whether it contains what is needed for your application project, and refer to the current book as specific issues arise.
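The chapter on improving MPI performance that this review mentions is commonly illustrated with one idea: overlapping communication with computation using non-blocking calls. The sketch below is my own illustration of that technique, not code from the book; the buffer size N and the do_local_work helper are hypothetical placeholders.

    /*
     * Illustrative sketch (not from the book): overlap a neighbour exchange
     * with independent computation using non-blocking MPI calls.
     * The buffer size N and do_local_work() are hypothetical placeholders.
     */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1024

    static void do_local_work(double *data, int n)
    {
        /* Stand-in for computation that does not need the incoming data. */
        for (int i = 0; i < n; i++)
            data[i] *= 2.0;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        double sendbuf[N], recvbuf[N], local[N];
        MPI_Request reqs[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int i = 0; i < N; i++) {
            sendbuf[i] = (double) rank;
            local[i]   = (double) i;
        }

        int right = (rank + 1) % size;
        int left  = (rank - 1 + size) % size;

        /* Post the exchange first ...                                   */
        MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ... compute on data that does not depend on the message ...   */
        do_local_work(local, N);

        /* ... and only then wait for the transfers to complete.         */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        if (rank == 0)
            printf("exchange finished on %d rank(s)\n", size);

        MPI_Finalize();
        return 0;
    }

The design point is simply ordering: post MPI_Irecv/MPI_Isend before the independent computation, and call MPI_Waitall only when the received data is actually needed, so the network transfer can proceed while the CPU is busy.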
