MPI for Python
Abstract—The ability to design effective solutions using parallel processing should be a required competency for every computing student. However, teaching parallel concepts is sometimes challenging and costly, especially at the early stages of a computer science degree. For these reasons, we present a set of modules to teach parallel computing paradigms using example problems that are computationally intensive but easy to understand, and that can be easily implemented using the Python parallelization libraries MPI for Python and Disco.

Keywords—parallel computing, mpi, mapreduce, master worker

I. INTRODUCTION

Hands-on programming exercises and code demonstrations/experiments are an essential educational resource for teaching introductory parallel and distributed computing (PDC) concepts. The design of effective and applicable exercises can be challenging for the instructor, especially since many libraries or frameworks for parallel programming require intricate system setups and training students in low-level details to get even the most basic programs to work. In our teaching experience, low-level details in parallel computation exercises lead, among other things, to student (and instructor) frustration and deter the students from focusing on the essential concepts [13].

... computer clusters, and especially in the LittleFe [15] educational cluster [14]. The educational modules can be used to introduce students to PDC concepts such as the map-reduce and master-slave models, load distribution, fault tolerance, scaling, and security and privacy. The programming exercises use the Python libraries MPI for Python and Disco (map-reduce) to de-emphasize low-level details and facilitate solutions to more elaborate problems. Each module is constructed around a real-world or research problem to showcase the importance of parallel computing in practice and to motivate the students. An extended abstract about the modules was presented in [14]. Here we extend the work and share our experiences deploying them in our CS department.

Our modules include:

• One module with instructions to install MPI for Python and Disco on the LittleFe cluster.
• Modules to introduce MPI and MapReduce.
• Hands-on modules with computational problems to be solved in parallel with MPI or MapReduce.

For those interested in testing/using our modules, source code can be found at [1].
978-1-5090-3682-0/16 $31.00 © 2016 IEEE
DOI 10.1109/IPDPSW.2016.204
Fig. 1. Hello world listings in Python and C++ MPI.
Fig. 7. Map/Reduce diagram of the password cracking problem. Node 0 broadcasts the password file to all nodes. The dictionary is divided into parts of approximately equal size, one per node. Cracked passwords are sent back to the master.
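The flow described in the Fig. 7 caption can be sketched in plain Python as follows. This is only an illustrative sketch, not the code from the modules: it runs the "nodes" one after another in a single process instead of using MPI for Python, and it assumes the password file holds unsalted SHA-256 hashes. The helper names (`split_dictionary`, `crack_chunk`, `crack_passwords`) and the sample data are hypothetical.

```python
import hashlib


def sha256_hex(word):
    """Return the hex SHA-256 digest of a word (assumed hash scheme)."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def split_dictionary(words, num_nodes):
    """Split words into num_nodes contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(words), num_nodes)
    chunks, start = [], 0
    for rank in range(num_nodes):
        size = base + (1 if rank < extra else 0)
        chunks.append(words[start:start + size])
        start += size
    return chunks


def crack_chunk(chunk, password_hashes):
    """Work done by one node: hash each candidate and keep the matches."""
    found = {}
    for word in chunk:
        digest = sha256_hex(word)
        if digest in password_hashes:
            found[digest] = word
    return found


def crack_passwords(words, password_hashes, num_nodes):
    # "Broadcast": every node sees the full set of password hashes.
    # "Divide": each node receives its own slice of the dictionary.
    chunks = split_dictionary(words, num_nodes)
    # Nodes are simulated sequentially here; with MPI they would run in parallel.
    partial_results = [crack_chunk(chunk, password_hashes) for chunk in chunks]
    # "Send back to the master": merge the per-node results.
    cracked = {}
    for partial in partial_results:
        cracked.update(partial)
    return cracked


dictionary = ["apple", "banana", "cherry", "dragon", "letmein", "monkey", "password"]
password_file = {
    sha256_hex("letmein"),
    sha256_hex("dragon"),
    sha256_hex("not-in-dictionary"),
}

cracked = crack_passwords(dictionary, password_file, num_nodes=3)
print(sorted(cracked.values()))  # ['dragon', 'letmein']
```

With seven words and three nodes, the chunks have sizes 3, 2, and 2, so no node holds more than one extra word. In an MPI version, the broadcast, the per-node slices, and the final merge would typically map onto collective operations rather than a Python loop.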
... of time (flow). NetFlow files consist of large quantities of connection information such as IP addresses of the source and destination hosts, and the traffic in each connection (e.g., 5 minutes of UPR network traffic can be as high as 6.5 MB, or 363,149 lines of flows).

Fig. 8. Map/Reduce diagram of the NetFlow data processing example.

... libraries widely used in high performance computing applications. The modules use real-world examples to engage students in the hands-on experience. We also present PDC concepts that can be introduced in each course of the CS curriculum.

The presented modules are easy for the instructor to set up, even on low-cost parallel platforms such as the LittleFe. Their focus on real-world examples makes them a great resource for providing hands-on experience with PDC concepts to students, starting at the introductory courses. The modules make use of libraries used in real high performance computing applications, setting the stage for more advanced exercises in later courses and/or industrial or academic research experiences for students.

ACKNOWLEDGMENT

The authors would like to thank the NSF-SFS award DUE-1245744 for supporting this work.