Abstract
Despite the exponential growth in demand for advanced computational skills driven by big data, machine learning, and artificial intelligence, higher education institutions still face a significant shortage of dedicated course offerings in High Performance Computing (HPC). This educational deficiency not only hampers the preparedness of undergraduate students for cutting-edge postgraduate programs but also impairs their readiness to enter a workforce increasingly reliant on sophisticated computational capabilities. Integrating comprehensive HPC courses at the undergraduate level is critical both for equipping students to use modern computing technologies effectively and for bridging the growing gap between academic preparation and industry demands. At Wake Forest University (WFU), we, members of the HPC Team, are actively working to address this educational gap by integrating the WFU HPC Facility[4] into higher-level elective courses across various disciplines. Recognizing the foundational importance of these skills, we have developed an introductory course specifically designed to equip students to excel in advanced courses, in graduate and research programs, and in the modern workforce. By integrating the WFU HPC Facility into our curriculum, the University is committed to pioneering a comprehensive educational pathway that empowers students to leverage the full potential of computing technologies in their future careers.
WFU is an R2 liberal arts institution with around 9,000 students[6] that actively supports undergraduate research through a multitude of departments and programs. Undergraduate research is so central to the University’s mission that WFU has a dedicated center, the Undergraduate Research and Creative Activities (URECA) Center, devoted to this purpose. Many students engage in research projects that leverage the resources of the WFU HPC Facility. The facility’s main asset, the Distributed Environment for Academic Computing (DEAC) Cluster, contains approximately 4,000 CPU cores and 20 TB of RAM, and is a true interdisciplinary tool: in 2023, 500 active users across 15 departments submitted over 1.5 million computational tasks spanning a vast array of research topics.
The interdisciplinary nature of the DEAC Cluster played an instrumental role in developing an introductory HPC course that caters to students from a diverse range of majors. Having supported a wide range of researchers, we designed the course to connect concepts and applications from Computer Science, Engineering, Data Science, and the Natural Sciences to students’ own academic domains. By enabling students from multiple disciplines to acquire foundational HPC skills, we foster a versatile educational environment where collaborative and interdisciplinary learning thrives. To keep the introductory course accessible to all students, we require no prerequisite classes, and students are not expected to have any prior programming experience. We have chosen Python as the primary programming language for the course: it is one of the most versatile and widely used languages in data science and machine learning, and it interfaces readily with parallel frameworks such as MPI and libraries built on OpenMP. Students gain hands-on experience by developing asynchronous workflows, which are then executed on the DEAC Cluster. This practical focus not only demystifies complex computational concepts but also empowers students to apply their learning in real-world scenarios. HPC serves as a cornerstone for two distinct user groups, each integral to its advancement and application. The first encompasses those who enable and optimize HPC systems, including Computer Scientists, Computer Engineers, Systems Administrators, and Cyberinfrastructure Professionals, who enhance computational efficiency and build the underlying hardware infrastructure.
The second group comprises scientists and researchers across diverse fields such as Statistics, Chemistry, Biology, Physics, and Engineering, who leverage HPC as a powerful tool for simulating complex phenomena, analyzing large datasets, and investigating novel problems in their respective domains. While course offerings at other institutions tend to prioritize the first group, teaching students how to build and operate an HPC cluster, we have chosen to prioritize curriculum for the second: the skills students gain through our course will serve them as they continue their academic careers in higher-level electives and independent research projects with faculty advisors.
We offer the course during the Spring semester in order to prepare students who may want to pursue research during the summer session. The first half of the course serves as foundational cluster training, familiarizing students with the essential skills to work within an HPC environment. In this segment, students delve into the Linux command line interface (CLI) using Bashcrawl[3] and explore the intricacies of the Linux filesystem and environment modules. A significant focus is placed on understanding and utilizing job schedulers, such as the Slurm resource manager[2]. Another unique feature of this segment is a guided tour of the Wake Forest datacenter, which gives students a tangible understanding of how the concepts discussed in class are implemented in a real-world HPC cluster. To connect the resources they request through Slurm with the physical hardware that provides them, the tour concludes with students disassembling retired compute nodes to learn about the components that comprise modern servers. The midterm assessment challenges students to submit multiple jobs and analyze how varying input sizes and CPU core counts affect calculation speed. Upon completion of this initial phase, students are fully equipped to engage in research activities under an advisor and to utilize an HPC cluster effectively outside the confines of the classroom. Many apply for summer grants through the aforementioned URECA program with a faculty advisor.
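The scaling behavior examined in such midterm experiments can be framed with Amdahl’s law, which bounds the speedup attainable by adding processors. The Python sketch below is an illustrative example rather than course material, and the 95% parallel fraction is a hypothetical value chosen for the demonstration:

```python
# Amdahl's law: speedup S(n) = 1 / ((1 - p) + p / n),
# where p is the fraction of the work that parallelizes
# and n is the number of CPU cores applied to it.

def amdahl_speedup(p: float, n: int) -> float:
    """Ideal speedup on n cores for a workload with parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n)

if __name__ == "__main__":
    p = 0.95  # hypothetical: 95% of the runtime parallelizes
    for cores in (1, 2, 4, 8, 16, 32):
        print(f"{cores:>2} cores -> {amdahl_speedup(p, cores):5.2f}x speedup")
```

Even a workload that is 95% parallel can never exceed a 20x speedup, a useful reference point when comparing measured job timings against ideal scaling.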
In the latter half of the course, the curriculum shifts toward more advanced topics, focusing on parallel computing frameworks and technologies. Students are introduced to MPI and OpenMP, which are essential for developing parallel applications that run efficiently on today’s multi-processor systems. The course also delves into high-speed interconnects, which are crucial for optimizing communication between different parts of an HPC cluster. One of our final topics covers GPU computing, with particular emphasis on NVIDIA GPUs and the CUDA programming platform, enabling students to harness the power of graphical processing for computational tasks. As an example from our Spring 2024 semester, students built a “chatbot” using Meta’s Llama 2 model[5] on both CPU and GPU using LLaMA C++[1], interacted with it freely, and compared its performance to ChatGPT.
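Many introductory MPI programs follow a scatter-compute-reduce shape. As a minimal sketch of that pattern, the example below uses Python threads to stand in for MPI ranks purely for illustration; the function names and the four-way split are our own assumptions, and real cluster code would instead use an MPI binding such as mpi4py running across nodes:

```python
# A toy scatter -> compute -> reduce pattern, the shape of many MPI programs.
# Threads stand in for MPI ranks here for illustration only; actual MPI code
# would distribute the chunks to separate processes across cluster nodes.
from concurrent.futures import ThreadPoolExecutor

def partial_sum_of_squares(chunk: range) -> int:
    """Each 'rank' computes a partial result over its slice of the data."""
    return sum(i * i for i in chunk)

def parallel_sum_of_squares(n: int, ranks: int = 4) -> int:
    # "Scatter": divide the index range into one strided chunk per rank.
    chunks = [range(r, n, ranks) for r in range(ranks)]
    # Compute partial results concurrently, then "reduce" by summing them.
    with ThreadPoolExecutor(max_workers=ranks) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(1000))  # sum of squares of 0..999
```

The design point for students is that the decomposition (who owns which data) and the reduction (how partial results combine) are the same whether the workers are threads, processes, or ranks spread across a cluster.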
Our course is designed to complement specialized courses in Computer Science and Computer Engineering programs, such as Parallel Algorithms, Computer Vision, or Deep Learning. It introduces these critical computational concepts and provides a solid foundation that prepares students for these more advanced electives. By the end of the course, students are not only familiar with the basic principles of HPC but are also primed to tackle more specialized studies and research in their future academic and professional pursuits.
Courses often require students to use a specific programming language or software package. While tools such as Google Colab and zyBooks provide a browser-based interface to computing resources, the resources they can provide are quite limited. A faculty member might instead ask students to install software locally, but this is challenging when students bring their own devices to the classroom: different operating systems and hardware platforms can cause the software to behave differently, or the software may not be available at all on a given platform. Courses with significant computational demands are better served by a unified computing environment, and an HPC facility is ideally equipped to provide a consistent learning environment where each student has access to the same software and computing resources. A primary challenge in integrating HPC resources into coursework is teaching students to use schedulers for asynchronous workloads. Our introductory HPC course bridges this gap by providing the necessary training and context, enabling students to engage with advanced topics more efficiently and without the steep initial learning curve typically associated with these environments.
Our HPC facility has proven instrumental in enhancing educational experiences across a variety of disciplines, with demonstrated benefits in classes such as Statistics, Natural Language Processing, Parallel Algorithms, Computer Vision, Physics Laboratory, Cancer Biology, Environmental Physics, Computational Modeling, and more. Students in fields such as Finance and Business and Enterprise Management have also successfully leveraged our HPC resources, performing analysis on client data protected under a nondisclosure agreement that prevented them from storing the data locally on their laptops or with commercial cloud providers. This integration not only facilitates sophisticated computational tasks but also allows students and faculty to easily share and store large datasets without consuming space on their local devices.
One of our primary goals is to promote diversity and interdisciplinary collaboration within this course, and this semester’s offering attracted a notably diverse group of students, with majors ranging from Biology and Statistics to Applied Mathematics, Economics, and Computer Science. Although the course is currently catalogued under the Computer Science department, we recognize that associating it with any single discipline could limit its appeal and accessibility. The diverse enrollment underscores the interdisciplinary relevance of HPC skills across fields of study. We are leveraging the course’s current success and broad interest as a foundation for a new academic program dedicated to High Performance Computing. This program will serve as a hub for integrating computational skills across disciplines, fostering a broader understanding and application of HPC in various scientific and economic sectors.
The HPC team’s commitment to High Performance Computing education extends beyond traditional academic structures. While we are not developing a new major, minor, certificate track, or concentration in HPC, our objective is to make HPC education accessible and applicable across disciplines without the constraints of a single departmental bias. This approach allows students from any field to engage with HPC skills as an integral part of their academic and professional development. To achieve this, we are actively collaborating with multiple academic departments to ensure that our HPC course offerings are recognized as fulfilling degree requirements across a range of programs. One way we collaborate with these departments is by adapting activities and projects to use different languages and software, such as R and MATLAB for the Statistics and Engineering departments, while maintaining the same learning goals we achieve with Python. This strategy not only enhances the versatility of our course but also promotes a more comprehensive integration of the university’s HPC facilities into the curriculum, allowing faculty in different departments to incorporate our projects into their courses and utilize our HPC facility, even if only for one or two projects during a semester.
Our efforts are focused on fostering a collaborative academic environment where the HPC facility is not just an isolated resource used for research but a central part of our educational infrastructure. By working across disciplines, we hope to catalyze a deeper engagement with HPC technologies throughout the university, enhancing both teaching and research capacities across departments.
In conclusion, the escalating computational demands of big data, machine learning, and artificial intelligence are not only transforming industries but also reshaping educational requirements. As these fields expand, the need for substantial computational resources becomes increasingly critical. The HPC facility at Wake Forest University is exceptionally well equipped to meet these demands by providing a unified computing environment that supports a wide array of academic endeavors. Our initiative to develop introductory HPC courses is a strategic response to this need, preparing students to proficiently utilize HPC resources in higher-level electives and beyond. These courses are pivotal in bridging the gap between conventional academic programs and the rigorous computational needs of modern disciplines. Looking forward, the necessity for such educational offerings will only intensify as reliance on advanced computational technologies grows. By anticipating and responding to these educational demands, Wake Forest University’s HPC academic program not only enhances student readiness for future challenges but also positions the university at the forefront of academic innovation in the computational sciences.