1 Introduction

The number of students exposed to parallel programming is limited, and not even all computer science students acquire the necessary knowledge and skills. Traditional parallel and distributed programming courses often focus on very basic ideas and problems, such as monitors and semaphores, and do not address novel high-level solutions.

The limited mathematical background of computer science students makes teaching parallel algorithms a difficult task. The difficulties introduced by programming tools, libraries and job execution through a queueing system make this task considerably more complex. Parallel programming using the message-passing model is difficult; the shared-memory model is easier to learn, but writing code that scales well is not easy. PGAS languages have considerable potential, but they are not widely used [12]. Developing scalable parallel programs requires access to large computational systems, usually located in HPC centers, which introduces additional formal and technical barriers. To make teaching parallel programming easier, we have to remove as many of these barriers as possible. Students should concentrate on the essence of the problem and should not spend time fighting with unimportant peculiarities of the parallel computer, its operating system or its queueing system. To make teaching more efficient, we should focus on parallel algorithms and programming paradigms rather than on programming with a particular library or language. At some point students will have to face technical problems, but by then they will have better knowledge, be better motivated and be better prepared.

For this purpose, and to minimize unnecessary effort by students and teachers, we have created a web solution for on-line validation of programs submitted by students, based on the ZawodyWeb system [9] that we developed earlier. The system has been extended to support parallel programs written in different programming paradigms. With the help of the UNICORE middleware, ZawodyWeb can run students' programs on the large-scale facilities available in a supercomputer centre. The added value is a simple web interface which hides the complexity of large multiprocessor computers. The system has been verified during a parallel programming course for undergraduate computer science students at the University of Warsaw and Cardinal Stefan Wyszyński University in Warsaw.

The paper is organized as follows: Sect. 2 provides the motivation and briefly describes available solutions. The next sections describe the ZawodyWeb on-line contest system (Sect. 3) and the UNICORE middleware used to access large computational resources (Sect. 4). Section 5 describes the extensions of ZawodyWeb that support parallel computing, and Sect. 6 describes the parallel languages and paradigms supported by the system, with an extended description of the PCJ library for parallel programming in Java. The last two sections present results (Sect. 7) and conclusions (Sect. 8).

2 Motivation

At the beginning of the traditional learning loop, a teacher prepares suitable learning material. Then, during the lesson, he/she passes this knowledge on to the students. Students can acquire the knowledge in full, in part or not at all. Programming is a practical activity, and teaching should involve a number of assignments carried out as supervised work in the laboratory or as individual work by the student. All participants would like to know whether the knowledge has been well acquired. The traditional way of verifying this is through tests and exams. Unfortunately, this approach checks only the students' familiarity with the material, i.e. their knowledge, and not their practical skills. Another way to measure students' skills is through assignments to be solved at home. Unfortunately, checking such assignments requires significant effort, especially in computer science, where one task can be solved in many different ways.

Over the years, different kinds of programming contests have become very popular. The most popular include the ACM International Collegiate Programming Contest (http://cm2prod.baylor.edu), Imagine Cup (http://imaginecup.com) and Top Coder (http://www.topcoder.com). Many contest hosting services have emerged for this purpose, enabling remote validation of implemented solutions. Some of them are more algorithm-oriented and check whether the solution uses an optimal algorithm. The vast majority of on-line systems address the validation of single-threaded problems. In a few cases, such as Potyczki Algorytmiczne (http://potyczki.mimuw.edu.pl/l/36/), the on-line system has been adapted to parallel problems, but only for shared-memory programming with a small number of cores. The Mooshak system used for the Spanish Parallel Programming Contests [3] has similar limitations. None of the known systems addresses execution on large parallel infrastructures or provides estimates of the scalability of the user's solution.

3 ZawodyWeb System

To address the problem of automatic validation of computer programs in student education, we have developed ZawodyWeb, a web contest system [9]. The main aim of the system is to make teaching programming more efficient and to extend it to work performed by the students themselves in the classroom and at home. ZawodyWeb is open-source software and is available on GitHub [2].

3.1 Overview

In the ZawodyWeb system, all teacher and student activities are performed on-line using a web browser. A teacher creates a problem by entering its name, its description, the programming languages that can be used to solve it, the test data (inputs and outputs) and accessibility information. A student has access to the problem description and tries to solve it on a local machine using one of the listed programming languages. Students can check a solution themselves by preparing suitable input data and checking whether their program generates the expected output. Next, the student submits the solution to ZawodyWeb, which validates it automatically. Validation consists of compiling the program, executing it and checking its results on the given input data sets. For a correct solution, the student receives points depending on the difficulty of the task and the correctness of the solution. When the output of one or more tests differs from the one prepared by the teacher, or program execution exceeds the time limit, the student receives this information and can improve the solution, then check the correctness of the new version. The details of the system can be found elsewhere [9]; here we provide a brief description.
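The validation step described above (run each test, compare outputs, respect the time limit, award points) can be illustrated with a small Java sketch. It is not ZawodyWeb code. The class name `SubmissionScorer`, the whitespace-tolerant comparison rule and the all-or-nothing scoring per test are assumptions chosen for illustration.

```java
import java.util.List;

/** Hypothetical sketch of per-test output checking and scoring (not ZawodyWeb's actual judge). */
final class SubmissionScorer {

    /** Outcome of running the student's program on one test. */
    record TestRun(String actualOutput, String expectedOutput, long timeMillis, long limitMillis) {}

    /** Compares outputs line by line, ignoring trailing whitespace and trailing empty lines (assumed rule). */
    static boolean outputsMatch(String actual, String expected) {
        return normalize(actual).equals(normalize(expected));
    }

    private static String normalize(String s) {
        StringBuilder sb = new StringBuilder();
        for (String line : s.split("\n")) {      // split() drops trailing empty strings
            sb.append(line.stripTrailing()).append('\n');
        }
        return sb.toString().stripTrailing();
    }

    /** Awards pointsPerTest for every test that is correct and finished within its time limit. */
    static int score(List<TestRun> runs, int pointsPerTest) {
        int total = 0;
        for (TestRun run : runs) {
            boolean inTime = run.timeMillis() <= run.limitMillis();
            if (inTime && outputsMatch(run.actualOutput(), run.expectedOutput())) {
                total += pointsPerTest;
            }
        }
        return total;
    }
}
```

A real judge may use stricter or problem-specific comparison rules. The tolerant rule here only shows where such a policy would live.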

3.2 Technical Details

The ZawodyWeb system is written in Java using the Spring Framework and JavaServer Faces libraries (Facelets, Richfaces, Restfaces, etc.). The system is hosted on the Apache Tomcat web server. A PostgreSQL database is used to store data, and Hibernate maps Java objects to the database. The JudgeManager and Judges are also written in Java. The architecture of the system is presented in Fig. 1.

Fig. 1. The architecture of the ZawodyWeb system.

3.3 Functionality

Most of the functionality of ZawodyWeb relates to organizing programming contests and supporting on-line learning through assignments to be solved in the classroom or at home. The system allows users to define contests, programming tasks, scores and rankings. It offers different roles for students, teachers (task and contest creators) and administrators. The teacher can use the system to check the correctness of solutions automatically. The teacher can view each student's submission on-line and can therefore respond quickly to problems as they arise. If the problem description requires advanced text or drawings that cannot be expressed in HTML, the system can display the contents of a PDF file attached to the problem. ZawodyWeb allows students to ask the teacher questions related to the problems. Questions are sent to the teacher via e-mail, and he/she can answer directly or post comments visible to all students in the appropriate section of the service. The teacher can hide the ranking from students. It is also possible to freeze the ranking for the last few minutes, which motivates students to compete and try to solve more problems than their colleagues.

ZawodyWeb allows the teacher to download the students' solutions that are visible in the ranking. This helps to detect solutions that were copied from other sources or from one another. Another useful feature is the ability to block submissions or hide problems for computers outside a defined IP range, e.g. outside the computer lab. The system also contains a dedicated class that allows operations which are normally not permitted in programming competitions. Examples of such operations, which are usually not tested, are parsing parameters passed to the program, program error codes, custom headers, and opening, reading or writing files. ZawodyWeb can also pass additional parameters to the compiler. By setting such parameters, the teacher can force the compiler to check that the source code complies with a selected language standard. New languages and new functionality can be added easily by creating a Java class that implements an appropriate interface.
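The following sketch shows roughly how a language plug-in could look, including the teacher-defined compiler parameters. The interface name `LanguageSupport`, its methods and the `CLanguage` class are hypothetical and do not come from the ZawodyWeb source. Only the GCC options used (`-O2`, `-std=c99`, `-pedantic-errors`, `-o`) are real compiler flags.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plug-in interface: one implementation per supported language. */
interface LanguageSupport {
    /** Command line that compiles the source file into the given executable. */
    List<String> compileCommand(String sourceFile, String executable, List<String> extraParams);

    /** Command line that runs the compiled program. */
    List<String> runCommand(String executable);
}

/** Hypothetical C plug-in that forwards teacher-defined compiler parameters to gcc. */
final class CLanguage implements LanguageSupport {
    @Override
    public List<String> compileCommand(String sourceFile, String executable, List<String> extraParams) {
        List<String> cmd = new ArrayList<>(List.of("gcc", "-O2"));
        cmd.addAll(extraParams);                      // e.g. -std=c99 -pedantic-errors set by the teacher
        cmd.addAll(List.of("-o", executable, sourceFile));
        return cmd;
    }

    @Override
    public List<String> runCommand(String executable) {
        return List.of("./" + executable);
    }
}
```

For example, `new CLanguage().compileCommand("sol.c", "sol", List.of("-std=c99", "-pedantic-errors"))` yields `gcc -O2 -std=c99 -pedantic-errors -o sol sol.c`. With `-pedantic-errors`, GCC reports an error for code that relies on constructs outside the selected ISO standard.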

4 UNICORE

UNICORE is grid middleware that provides secure and seamless access to distributed resources. It has been developed since 1997 and is now maintained by a dedicated community of developers. It has been used successfully in many scientific projects and has contributed significantly to the popularity and application of distributed computing.

The UNICORE middleware [1] builds upon a number of concepts, such as Service Oriented Architecture (SOA) and messaging [10]. The capabilities of a distributed system are organized into well-defined services. UNICORE has a typical 3-layer architecture covering the target system infrastructure, the middleware and the user interfaces. Several types of interfaces are available to the end user: the UNICORE Rich Client (URC), the UNICORE Commandline Client (UCC) and the High Level API (HiLA). UCC is a full-featured client for the UNICORE Grid middleware. It provides client commands for all basic UNICORE services. In particular, the user can submit and manage jobs and transfer input and output files.

UNICORE 7 provides a generic web services hosting environment. Services deployed into this environment benefit from its general features, such as persistence and the security infrastructure. Since all services are hosted in the same environment, a common web-service-level security infrastructure is achieved.

The UNICORE security infrastructure [4] offers access control, centralized user and role management, and basic transport-level security. Communication is secured with the Secure Sockets Layer and Transport Layer Security protocols (SSL and TLS). Users and server components are identified by X.509 certificates issued by a trusted certification authority.

5 ZawodyWeb Support for Parallel Computing

To support on-line evaluation of parallel tasks, we had to extend ZawodyWeb with new features. In general, parallel tasks can be divided into two groups: (i) problems that use a single computer equipped with multiple cores and (ii) problems that run on clusters containing multiple nodes equipped with multicore processors.

Executing parallel tasks on a single computer (case (i)) is straightforward. Compilation and execution work the same way as for serial tasks. For example, for a solution using OpenMP, it is enough to add the -fopenmp flag when compiling with the GCC compiler and to run the program with the proper environment variable set. OpenMP problems can be evaluated on relatively small resources, which can be used exclusively by ZawodyWeb.
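For case (i), a judge only needs to add the OpenMP flag at compile time and set `OMP_NUM_THREADS` at run time. The Java sketch below prepares the two processes with `ProcessBuilder` but does not start them. The class and method names are hypothetical. `-fopenmp` and `OMP_NUM_THREADS` are the real GCC flag and OpenMP environment variable.

```java
import java.util.List;

/** Hypothetical helper preparing OpenMP compilation and execution for a single-node test. */
final class OpenMpJudge {

    /** gcc with -fopenmp enables OpenMP directives and links the OpenMP runtime. */
    static ProcessBuilder compile(String source, String executable) {
        return new ProcessBuilder(List.of("gcc", "-O2", "-fopenmp", "-o", executable, source));
    }

    /** The OpenMP runtime reads OMP_NUM_THREADS to choose the default number of threads. */
    static ProcessBuilder run(String executable, int threads) {
        ProcessBuilder pb = new ProcessBuilder(List.of("./" + executable));
        pb.environment().put("OMP_NUM_THREADS", Integer.toString(threads));
        pb.redirectErrorStream(true); // merge stderr into stdout for the judge's log
        return pb;
    }
}
```

A judge would then call `start()` on each builder, wait for the process with a time limit (e.g. `Process.waitFor(long, TimeUnit)`) and compare the captured output with the expected one.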

The second case, i.e. executing parallel tasks on clusters, especially production ones, is more difficult. Here ZawodyWeb cannot run jobs directly but has to use the queueing system. Administrative restrictions and various technical reasons prevent ZawodyWeb from being installed directly on the submit host of the cluster queueing system. Even if this were possible, ZawodyWeb would be limited to a single cluster, and it would not be easy to change dynamically the resources used to verify submitted solutions. Therefore we decided to use UNICORE as a layer between ZawodyWeb and the queueing systems. With UNICORE, submitted programs can run on different clusters regardless of the underlying queueing system. This also makes it possible to install and configure ZawodyWeb only once and to use the computing resources of various UNICORE sites registered within one registry. UNICORE also significantly simplifies the authentication and authorization required to access large computational resources.

From the user's point of view, submitting a solution to a parallel task works the same way as for single-threaded problems evaluated on resources dedicated to ZawodyWeb, even though it passes through UNICORE and a queueing system. When a user submits a solution to ZawodyWeb, the web interface informs the JudgeManager. The JudgeManager then instructs one of the Judges to prepare and submit an appropriate job using the UNICORE middleware. The created job contains a script that compiles the submitted solution and executes it on the defined test data. Once the job has been submitted to UNICORE, the UNICORE job identifier (its End Point Reference, EPR) is stored in the ZawodyWeb database and the solution state is set to external-check.

To support this new functionality, we created an External-Checker judge in addition to the JudgeManager and Judges. It periodically checks whether any solutions are in the external-check state. If so, it checks the UNICORE status of the corresponding job. A job with the FAILED status is examined further, and the failure reason is stored in the ZawodyWeb database so that the user can learn the likely cause of the failure. In most cases the failure is caused by exceeding the memory limit or the time limit. When the job finishes with the SUCCESS status, the output produced by the user's program is compared with the output prepared by the problem's author, and the submission is awarded an appropriate number of points. This number is 0 if the outputs differ.
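The decision logic of one External-Checker pass can be sketched as follows. The `JobStatusService` interface is a hypothetical stand-in for a UNICORE client, and its methods and the `Submission` fields are assumptions. The status enum is a simplified set that mirrors the FAILED and SUCCESS statuses named in the text rather than the complete UNICORE status list.

```java
import java.util.List;

/** Hypothetical sketch of one External-Checker polling pass (not the ZawodyWeb source). */
final class ExternalChecker {

    enum JobStatus { QUEUED, RUNNING, FAILED, SUCCESS }   // simplified status set
    enum State { EXTERNAL_CHECK, FAILED, JUDGED }

    /** A stored submission waiting for its UNICORE job, identified by its EPR. */
    static final class Submission {
        final String epr;
        final String expectedOutput;
        State state = State.EXTERNAL_CHECK;
        String failureReason;
        int points;

        Submission(String epr, String expectedOutput) {
            this.epr = epr;
            this.expectedOutput = expectedOutput;
        }
    }

    /** Stand-in for a UNICORE client; every method here is an assumption. */
    interface JobStatusService {
        JobStatus status(String epr);
        String output(String epr);
        String failureReason(String epr);
    }

    /** Examines all submissions still in the external-check state; meant to be called periodically. */
    static void poll(List<Submission> submissions, JobStatusService unicore, int maxPoints) {
        for (Submission s : submissions) {
            if (s.state != State.EXTERNAL_CHECK) continue;
            switch (unicore.status(s.epr)) {
                case FAILED -> {
                    s.state = State.FAILED;
                    s.failureReason = unicore.failureReason(s.epr); // e.g. time or memory limit
                }
                case SUCCESS -> {
                    s.state = State.JUDGED;
                    boolean ok = unicore.output(s.epr).strip().equals(s.expectedOutput.strip());
                    s.points = ok ? maxPoints : 0;                 // 0 when the outputs differ
                }
                default -> { /* still queued or running: check again on the next pass */ }
            }
        }
    }
}
```

In plain Java, one way to repeat this pass is a `ScheduledExecutorService` with `scheduleWithFixedDelay`. The actual scheduling used by ZawodyWeb is not described here.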

An integral part of ZawodyWeb is the definition of the tests used to validate the solutions submitted by users. To support parallel execution, we added the possibility to configure additional parameters both for the problem (i.e. for all tests) and for each test separately. The parameters are passed to the Judge, which uses them to construct the job description submitted through UNICORE. With this functionality, the user can define the number of nodes used to run a test and the number of threads (or program copies) started on each node. Tests can therefore run with different numbers of cores per node.
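Problem-level and test-level parameters can be combined by letting test values override the problem defaults. The sketch below assumes that both levels are stored as key/value maps and uses the hypothetical keys `NODES` and `THREADS_PER_NODE`. These keys are not ZawodyWeb's actual configuration names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical merge of problem-wide and per-test execution parameters. */
final class TestParameters {

    /** Test-specific values override the problem-wide defaults. */
    static Map<String, String> effective(Map<String, String> problem, Map<String, String> test) {
        Map<String, String> merged = new HashMap<>(problem);
        merged.putAll(test);
        return merged;
    }

    /** Total number of cores requested: nodes multiplied by threads per node (defaults of 1 assumed). */
    static int totalCores(Map<String, String> params) {
        int nodes = Integer.parseInt(params.getOrDefault("NODES", "1"));
        int perNode = Integer.parseInt(params.getOrDefault("THREADS_PER_NODE", "1"));
        return nodes * perNode;
    }
}
```

Suppose the problem defaults are `NODES=1, THREADS_PER_NODE=12` and one test sets only `NODES=2`. That test then runs on 2 × 12 = 24 cores, which is the 2-node configuration described in Sect. 7.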

ZawodyWeb has also been extended to report the execution time of each test, which is displayed together with the test result. The time measurement is approximate, especially for Java applications, but it lets the user estimate the scalability of the submitted solution.
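The displayed times can be turned into the standard strong-scaling measures, speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p, where T(p) is the time on p cores. The helper below is only an illustration. Because the reported times are approximate, the derived values are rough estimates.

```java
/** Standard strong-scaling metrics computed from measured wall-clock times. */
final class Scalability {

    /** Speedup S(p) = T(1) / T(p). */
    static double speedup(double t1, double tp) {
        if (t1 <= 0 || tp <= 0) throw new IllegalArgumentException("times must be positive");
        return t1 / tp;
    }

    /** Parallel efficiency E(p) = S(p) / p; 1.0 means perfect linear scaling. */
    static double efficiency(double t1, double tp, int p) {
        if (p <= 0) throw new IllegalArgumentException("p must be positive");
        return speedup(t1, tp) / p;
    }
}
```

Take hypothetical timings of 48 s on 1 core and 3 s on 24 cores. They give a speedup of 16 and an efficiency of about 0.67.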

6 Supported Languages

ZawodyWeb has been configured to support the most popular parallel programming models, OpenMP and MPI, as well as selected PGAS languages. For OpenMP and MPI, students can submit solutions written in C/C++. For PGAS we have chosen the PCJ library, which allows parallel applications to be developed in Java.

6.1 OpenMP

OpenMP (Open Multi-Processing) is an API that supports multi-platform shared memory multiprocessing programming [5]. It uses a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications. The runtime environment assigns the number of threads based on environment variables, in particular OMP_NUM_THREADS.

6.2 MPI

Over the years, MPI has become the de facto standard for communication among processes that model a parallel program running on a distributed-memory system. It is a language-independent communication protocol that supports both point-to-point and collective communication. MPI is a message-passing application programmer interface, together with protocol and semantic specifications [7]. A parallel program is usually executed with the mpirun (or mpiexec) command, which starts parallel execution on a given set of processors. The exact number of processors and their configuration are passed to mpirun through files and command-line parameters.

6.3 PCJ

PCJ is a Java library for parallel programming in the PGAS model. In PCJ, each task has its own local memory and stores and accesses variables only locally. Some variables can be shared between tasks. Other tasks can access, read and modify shared variables.

Each task can access other tasks' variables that are stored in shared memory. Such shareable variables have to be marked with the special annotation @Shared. The library provides methods for basic operations such as synchronizing tasks and getting and putting values in an asynchronous, one-sided way. Additionally, the library offers methods for creating groups of tasks, broadcasting and monitoring variables. PCJ fully complies with Java standards, so the programmer does not need any additional libraries that are not part of the standard Java distribution. In particular, PCJ can use the Sockets Direct Protocol (SDP) implemented in Java SE 7, which can increase network performance over InfiniBand connections.

In PCJ, one JVM instance is understood as a node. PCJ can run on a single multicore node. One node can hold many tasks, i.e. separate instances of PCJ threads that run calculations. This design is aligned with novel computer architectures containing hundreds or thousands of nodes, each built of several or even more cores. Such architectures require different communication mechanisms for inter-node and intra-node communication.

In PCJ there is one node called the Manager. It assigns unique identifiers to the tasks, sends messages to other tasks to start calculations, creates groups and synchronizes all tasks in the calculation. The Manager node also runs its own tasks and can execute parallel programs.

In PCJ, tasks can be assigned to groups. Groups can be used to simplify collective operations such as broadcast or synchronization [8].

Each node has its own identifier, unique within the whole calculation, called the physical node id (or node id for short). All nodes are connected to each other, and these connections are established before the calculation starts. At this stage, the nodes exchange their physical node ids.

An application using the PCJ library runs as a Java application on the Java Virtual Machine (JVM). In a multinode environment, one (or more) JVMs have to be started on each node. The PCJ library takes care of this process and allows the user to start execution on multiple nodes, with multiple threads on each node. The number of nodes and threads is easy to configure. The most reasonable choice, however, is to limit the number of threads on each node to the number of available cores. Typically, a single JVM runs on each physical node.
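The placement rule above (at most one thread per available core) and one simple way to number tasks globally can be sketched in plain Java. The node-major, contiguous numbering is an assumption made for illustration. PCJ's Manager may assign identifiers differently, and `TaskLayout` is not part of the PCJ API.

```java
/** Hypothetical node/thread layout for a multinode run (node-major global numbering). */
final class TaskLayout {
    final int nodes;
    final int threadsPerNode;

    TaskLayout(int nodes, int requestedThreadsPerNode, int coresPerNode) {
        if (nodes <= 0 || requestedThreadsPerNode <= 0 || coresPerNode <= 0) {
            throw new IllegalArgumentException("all counts must be positive");
        }
        this.nodes = nodes;
        // Oversubscribing cores rarely pays off, so cap the threads at the core count.
        this.threadsPerNode = Math.min(requestedThreadsPerNode, coresPerNode);
    }

    int totalTasks() { return nodes * threadsPerNode; }

    /** Global task id of local thread 'local' running on node 'node'. */
    int globalId(int node, int local) { return node * threadsPerNode + local; }

    /** Node hosting a given global task id. */
    int nodeOf(int globalId) { return globalId / threadsPerNode; }

    /** True when two tasks share a node (JVM) and can communicate without the network. */
    boolean sameNode(int a, int b) { return nodeOf(a) == nodeOf(b); }
}
```

The core count per node could be taken from `Runtime.getRuntime().availableProcessors()`, which reports the number of processors available to the JVM.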

PCJ threads communicate in different ways depending on their location. If the communicating PCJ threads run within the same JVM, Java concurrency mechanisms can be used to synchronize and exchange information. If data has to be exchanged between different JVMs, network communication is used, for example via sockets.
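Within a single JVM, a put followed by a barrier can be built from standard `java.util.concurrent` classes. The sketch below is not PCJ's implementation. The per-task `storage` maps and the `exchange` method are hypothetical. They only show how threads sharing memory can exchange values without network traffic: each task writes into its right neighbour's storage, waits at a barrier, then reads what its left neighbour wrote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;

/** Hypothetical intra-JVM exchange: each task puts a value into its right neighbour's storage. */
final class LocalExchange {

    static int[] exchange(int tasks) {
        List<ConcurrentHashMap<String, Integer>> storage = new ArrayList<>();
        for (int i = 0; i < tasks; i++) storage.add(new ConcurrentHashMap<>());
        CyclicBarrier barrier = new CyclicBarrier(tasks);
        int[] received = new int[tasks];
        Thread[] threads = new Thread[tasks];

        for (int id = 0; id < tasks; id++) {
            final int me = id;
            threads[id] = new Thread(() -> {
                try {
                    storage.get((me + 1) % tasks).put("x", me * 10); // "put" into the neighbour's storage
                    barrier.await();                                // wait until every put has happened
                    received[me] = storage.get(me).get("x");        // value left by the left neighbour
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads[id].start();
        }
        for (Thread t : threads) {
            try {
                t.join();                                           // join makes 'received' visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return received;
    }
}
```

With four tasks, task 0 receives 30 (from task 3) and task 2 receives 10 (from task 1).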

Tasks can exchange data asynchronously. A value is sent to another task's storage with the put method, and the get method retrieves a value from another task's storage. In both methods, the remote task is not blocked while the value is being put or fetched, but the task that initiated the exchange blocks. The getFutureObject method works in a fully nonblocking manner: the initiating task can check whether the response has arrived and perform other calculations in the meantime. The broadcast method also works asynchronously. In a broadcast, all nodes put the broadcast value into their storage. The broadcast message is sent along a tree structure of nodes.
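A tree-based broadcast can be illustrated with a binary tree over node ids, rooted at the broadcasting node. The text does not specify the shape of PCJ's tree. The binary layout used here (children at positions 2i+1 and 2i+2 relative to the root) is therefore an assumption. The sketch only computes which node forwards to which and how many forwarding rounds are needed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical binary-tree broadcast schedule over n nodes, rooted at 'root'. */
final class TreeBroadcast {

    /** Nodes to which 'node' forwards the broadcast value. */
    static List<Integer> children(int node, int root, int n) {
        int rel = Math.floorMod(node - root, n);        // position in the tree relative to the root
        List<Integer> result = new ArrayList<>();
        for (int c = 2 * rel + 1; c <= 2 * rel + 2 && c < n; c++) {
            result.add((c + root) % n);                  // map back to a physical node id
        }
        return result;
    }

    /** Number of forwarding rounds until every node holds the value (depth of the tree). */
    static int rounds(int n) {
        int depth = 0;
        for (int reached = 1, level = 1; reached < n; depth++) {
            level *= 2;                                  // each round doubles the width of the frontier
            reached += level;
        }
        return depth;
    }
}
```

Compared with the root sending n-1 separate messages, the tree spreads the sending work over the nodes. The number of rounds then grows only logarithmically with n.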

7 Results

The ZawodyWeb system, extended with checking of parallel solutions, has been installed at ICM, University of Warsaw, to support training of the HPC centre users as well as students taking a parallel computing course. The web interface allows solutions to be verified on the production infrastructure, in particular on hydra, a heterogeneous PC cluster with Intel and AMD processors. The cluster partition consisting of nodes with Intel(R) Xeon(R) X5660 (Westmere-EP) processors is connected with InfiniBand QDR and 1 Gb Ethernet. Each of these nodes is equipped with two 2.8 GHz Intel Xeon processors with 6 cores each and 24 GB of memory. The cluster also has a number of 4-processor nodes with AMD Opteron(TM) 6272 (Interlagos) processors (16 cores each) connected with 10 Gb Ethernet. Job submission to the cluster is managed by the SLURM queueing system.

Most of the jobs submitted through ZawodyWeb are relatively short. To minimize turnaround time, we therefore created a reservation of resources dedicated to the on-line system. The reservation was possible thanks to our recent developments that improve access to the infrastructure [6].

The reservation is limited to 2 nodes (24 cores in total), so the tested parallel problems can be executed on up to 24 cores. ZawodyWeb still allows larger problems to be submitted to the rest of the system using the queues available to all users. This setup can be changed dynamically: the reservation can be extended in size, or limited to a particular period of time, if required.

As described in Sect. 5, the system allows the execution to be configured for each test or for all tests of a particular problem. This is done by setting a number of environment variables, which are passed to the UNICORE job description and then used to modify the execution. Listing 1 presents an example set of variables that modifies the JVM execution, uses the reservation and executes the test on 2 nodes with 12 threads on each node.

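Listing 1. Example set of environment variables configuring the JVM, the reservation and a run on 2 nodes with 12 threads per node.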

7.1 Practical Evaluation

The system has been used to support a parallel programming course for undergraduate computer science students at the University of Warsaw (selected students who had finished the 2nd year of study) and Cardinal Stefan Wyszyński University in Warsaw (3rd-year students). Students registered in the system in the computer lab during lectures, which also included supervised solving of a training set of problems. Students had no problems accessing and using the system. Moreover, because they did not need ssh access, the system specifics were hidden and they did not have to deal with the queueing system, they were able to submit solutions and execute them on the system within a short time. The goal was to obtain the maximal number of points for two simple projects (printing the “Hello world” string from different parallel threads). Students finished this task within at most 7 attempts. Some of them obtained a correct result with a single submission, and the average was 3.2 submissions. PCJ was used as the parallel programming paradigm.

In the next session, students had to implement 3 simple problems: broadcast, reduction and loop parallelization. All examples had been explained during the lectures, and code outlines using Java with PCJ had been provided to the students. In this case the number of attempts was smaller (at most 4). However, half of the students dropped out after the first unsuccessful attempt and did not finish the task.
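In a PGAS setting, loop parallelization usually means that each task executes a contiguous block of iterations. The helper below computes such a block for task `myId` out of `threads` tasks. The names are illustrative, and the block distribution is one common choice, not necessarily the one in the code outline given to the students.

```java
/** Block distribution of loop iterations [0, n) among 'threads' tasks. */
final class LoopPartition {

    /** First iteration owned by task 'myId'; the first n % threads tasks get one extra iteration. */
    static int start(int myId, int threads, int n) {
        int base = n / threads, extra = n % threads;
        return myId * base + Math.min(myId, extra);
    }

    /** One past the last iteration owned by task 'myId'. */
    static int end(int myId, int threads, int n) {
        return start(myId + 1, threads, n);
    }

    /** Example: each task sums its own block; a reduction would then combine the partial sums. */
    static long partialSum(long[] data, int myId, int threads) {
        long sum = 0;
        for (int i = start(myId, threads, data.length); i < end(myId, threads, data.length); i++) {
            sum += data[i];
        }
        return sum;
    }
}
```

For n = 10 and 3 tasks, the blocks are [0, 4), [4, 7) and [7, 10). Because the block sizes differ by at most one iteration, the load stays balanced.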

The last, more advanced session required solving 4 problems: reduction using communication on a ring, finding the smallest element in a dataset, computing a dot product, and a two-dimensional stencil calculation. Students solved the first 3 problems within a few attempts (at most 6). The time between the first and the last submission was usually a few hours and never more than 2 days. Students implemented their solutions in Java with PCJ. This problem set was solved only by students who had already solved the simple problems.
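The ring reduction from the last session passes a running partial sum from each task to its right neighbour, and the last task closes the ring by returning the total to task 0. In the sketch below, plain Java threads in one JVM stand in for PCJ tasks, and a one-slot blocking queue plays the role of each task's shared variable. This is not the PCJ API, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical ring reduction: task i adds its value to the sum received from task i-1. */
final class RingReduction {

    static long reduce(long[] values) {
        int n = values.length;
        List<BlockingQueue<Long>> inbox = new ArrayList<>();
        for (int i = 0; i < n; i++) inbox.add(new ArrayBlockingQueue<>(1));
        long[] result = new long[1];
        Thread[] tasks = new Thread[n];

        for (int id = 0; id < n; id++) {
            final int me = id;
            tasks[id] = new Thread(() -> {
                try {
                    long partial = values[me];
                    if (me > 0) partial += inbox.get(me).take();   // wait for the left neighbour
                    inbox.get((me + 1) % n).put(partial);           // pass right; last task wraps to 0
                    if (me == 0) result[0] = inbox.get(0).take();  // total arrives back at task 0
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            tasks[id].start();
        }
        for (Thread t : tasks) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return result[0];
    }
}
```

The reduction needs n point-to-point messages and n sequential steps. This is why a ring is a good teaching exercise but scales worse than the tree-based collectives mentioned in Sect. 6.3.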

Nobody solved the two-dimensional stencil problem. This was expected, since the students received only a short introduction to parallel computing (a 90 min lecture) and to solving example problems (60 min), followed by a practical introduction to the ZawodyWeb system (30 min).

Despite the short introduction provided during the traditional lectures, ZawodyWeb allows students to learn parallel programming in practice. This part of the course can be done on-line, at a time chosen by the student. The observed student drop-off is characteristic of any on-line activity and stems from the students' self-motivation [11].

8 Conclusions

In this paper we have presented ZawodyWeb, a web solution for on-line validation of programs submitted by students. The system has been extended to support parallel programs written in different programming paradigms. As a result, ZawodyWeb can run students' programs on the large-scale facilities available in a supercomputer centre.

The system has been used to support the teaching of undergraduate computer science students. ZawodyWeb allowed us to give students the possibility of running their programs on production systems with thousands of cores. At the same time, we were able to hide all the complexity of large multiprocessor computers. Detailed feedback from the students is being gathered and analyzed.

We plan to extend the system further. First of all, we would like to reduce the turnaround time, i.e. the time required to deliver the evaluation of a submission to the student. Currently we observe considerable overhead caused by both UNICORE and the queueing system.

We also plan to change the authentication and authorization mechanism. Currently ZawodyWeb uses a dedicated account to submit jobs, and we would like to use the students' own accounts for this purpose. This scenario is possible now, but a user would then be able to access the directory where the input and output files are stored and could therefore see the test inputs as well as the correct answers. Preventing this is currently not possible. It would require modifications to the UNICORE framework that allow tracking where a job was submitted from.