
MUST#

This guide describes how to install MUST, an MPI correctness checker, and covers its basic usage. MUST has to be built for the combination of compiler and MPI library your application uses.

Installation#

This will install MUST v1.9.0 into $WORK/apps/must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0. For the latest versions check the MUST website and adjust the version numbers accordingly.

Run the following commands on the cluster frontend where you intend to use MUST.

  1. Loading all relevant modules:

    1. Remove all loaded modules and make the modules installed via Spack available (needed for libxml2):
      module purge
      module add 000-all-spack-pkgs
      
    2. Load modules required for building MUST:
      module add cmake libxml2
      
    3. Load the compiler and MPI library you want to build MUST for. This example uses Open MPI 4.1.2 with GCC 11.2.0:
      module add openmpi/4.1.2-gcc11.2.0
      
  2. Download MUST:

    wget https://hpc.rwth-aachen.de/must/files/MUST-v1.9.0.tar.gz
    

  3. Optional: Verify SHA512 hash.

    wget https://hpc.rwth-aachen.de/must/files/SHA512SUM
    sha512sum -c --ignore-missing SHA512SUM
    # should print: MUST-v1.9.0.tar.gz: OK
    

  4. Unpack the archive, create a directory for building (build), and change into that directory:

    tar xzf MUST-v1.9.0.tar.gz
    mkdir build && cd build
    

  5. Configure the MUST build with CMake:

    cmake ../MUST-v1.9.0 \
      -DCMAKE_INSTALL_PREFIX="$WORK/apps/must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0" \
      -DCMAKE_BUILD_TYPE=Release
    

    To adjust the installation path, change the path specified with -DCMAKE_INSTALL_PREFIX, e.g. -DCMAKE_INSTALL_PREFIX="$WORK/some/other/paths/must"

  6. Build and install:

    make -j install install-prebuilds
    

The following shows the installation procedure as one code snippet.

  • Adjust the variable MUST_VERSION= to the version you want to install.
  • Adjust the compiler and MPI modules to the ones your application uses.
  • If needed you can adjust the installation path of MUST by changing the path in -DCMAKE_INSTALL_PREFIX=....
MUST_VERSION=v1.9.0

module purge
module add 000-all-spack-pkgs

module add cmake libxml2

# Load compiler and MPI you want to build MUST for.
# As an example, Open MPI 4.1.2 and GCC 11.2.0 are used:
module add openmpi/4.1.2-gcc11.2.0

wget https://hpc.rwth-aachen.de/must/files/MUST-${MUST_VERSION}.tar.gz

wget https://hpc.rwth-aachen.de/must/files/SHA512SUM
sha512sum -c --ignore-missing SHA512SUM
# should print: MUST-v1.9.0.tar.gz: OK

tar xzf MUST-${MUST_VERSION}.tar.gz
mkdir build && cd build

# If needed, adjust the installation path specified after -DCMAKE_INSTALL_PREFIX=.
cmake ../MUST-${MUST_VERSION} \
  -DCMAKE_INSTALL_PREFIX="$WORK/apps/must-${MUST_VERSION}-fritz-openmpi-4.1.2-gcc11.2.0" \
  -DCMAKE_BUILD_TYPE=Release

make -j install install-prebuilds

Optional: Create module file#

Optionally, you can create a module file that makes MUST easier to load and handles dependencies such as the compiler and MPI library.

Create a file, e.g. $HOME/modules/must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0, with the following content.

Adapt the content below the lines marked with TO CHANGE.

Module file that loads the required compiler/MPI and sets up the environment for MUST
#%Module

proc ModulesHelp { } {
        puts stderr "\tSets up the environment for MUST v1.9.0\n"
}

module-whatis   "sets up the environment for MUST v1.9.0"

# TO CHANGE: Load the compiler and MPI that MUST was built for.
#            Use the modules from installation step 1.3.
module load openmpi/4.1.2-gcc11.2.0

# TO CHANGE: specify the path where you installed MUST
set          pkghome    "$::env(WORK)/apps/must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0"

setenv       MUST_ROOT ${pkghome}
prepend-path PATH  $pkghome/bin
prepend-path INCLUDE  $pkghome/include
prepend-path LD_LIBRARY_PATH  $pkghome/lib
prepend-path LD_LIBRARY_PATH  $pkghome/lib64
prepend-path CMAKE_PREFIX_PATH  $pkghome

Basic usage#

Instead of mpirun or srun, the MUST-specific mustrun binary is used to launch your application. The following steps describe how to make mustrun accessible and how to use it for launching your application.

To allow MUST to show stack traces for detected issues you have to compile your application with debug symbols.
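
With GCC and an MPI compiler wrapper such as mpicc, debug symbols are typically enabled with the -g flag. The following is a minimal sketch, assuming a hypothetical source file mpi.c that builds the ./mpi binary used in the launch example of this guide. The -O0 flag is an optional assumption that keeps stack traces close to the source; MUST does not require it.

```shell
# Hypothetical source file name; adjust to your application.
SRC=mpi.c
# -g emits debug symbols; -O0 (optional, an assumption) disables optimization.
CFLAGS="-g -O0"

if command -v mpicc >/dev/null 2>&1 && [ -f "$SRC" ]; then
  # $CFLAGS is intentionally unquoted so that it splits into two flags.
  mpicc $CFLAGS -o mpi "$SRC"
else
  # Compiler wrapper or source file not found: only show the command.
  echo "would run: mpicc $CFLAGS -o mpi $SRC"
fi
```

If your application uses a build system, add -g to the compiler flags there instead.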

Preparation#

Depending on whether you created a module file for MUST, do one of the following on the compute node:

  • With module file: add MUST to your environment by running:

    module add <path to the module file>
    

    For example, this will add the example module that loads MUST, Open MPI, and GCC:

    module add $HOME/modules/must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0
    

  • Without a module file: add the bin directory under the MUST installation path to the PATH environment variable by:

    export PATH=<MUST installation path>/bin:$PATH
    

Launching your application#

To start your application under MUST, replace the launcher you normally use, e.g. srun or mpirun, with mustrun.
In addition, mustrun needs the name of the original launcher and the name of the flag that specifies how many processes are launched. Both are passed to mustrun via --must:... flags.

MUST has different execution modes that trade overhead against robustness in case the application crashes.

MUST also starts at least one additional process.

We start with the default execution mode that is only useful for small short-running applications. Assume we have a simple command line and rework it to use mustrun:

Original command line:

srun -n 4 ./mpi arg1 arg2

Command line with mustrun as launcher:

mustrun --must:mpiexec srun --must:np -n -n 4 ./mpi arg1 arg2

  • Original launcher srun was replaced with mustrun.
  • The flag --must:mpiexec followed by the name of the original launcher informs MUST of the launcher to use.
  • The flag --must:np -n tells MUST that the flag for specifying the number of processes to launch is named -n.
  • The second -n flag is then passed to the original launcher.

This instruments the application and launches it. After the run, the issues MUST detected are listed in MUST_Output.html.
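
The rewrite of the command line can be sketched as a small shell helper. The function name must_cmd is hypothetical and not part of MUST. It only prints the resulting command line, built from the --must:mpiexec and --must:np flags described above, so you can inspect it before running it. The mpirun variant with -np is an assumption that follows the same pattern as the srun example.

```shell
# must_cmd LAUNCHER NPFLAG NPROCS APP [ARGS...]
# Hypothetical helper (not part of MUST): prints the mustrun command line
# that corresponds to "LAUNCHER NPFLAG NPROCS APP ARGS...".
must_cmd() {
  launcher=$1
  npflag=$2
  nprocs=$3
  shift 3
  # The process flag appears twice: once as the value of --must:np,
  # once as the actual flag passed on to the original launcher.
  printf '%s\n' "mustrun --must:mpiexec $launcher --must:np $npflag $npflag $nprocs $*"
}

must_cmd srun -n 4 ./mpi arg1 arg2
must_cmd mpirun -np 4 ./mpi arg1 arg2
```

The first call prints the same command line as shown above for srun.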

Adjusting the number of processes to launch#

Remember that MUST will start at least one additional process. These processes are created by increasing the number of processes to launch that is passed to the original launcher, e.g. srun or mpirun.

If there is a limit on the number of processes that can be started per node, you have to reduce the number specified via the -n or -np flag to account for the additional process(es). The number of processes to account for depends on the execution mode of MUST.
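
As a minimal sketch of this adjustment for the default execution mode, which adds one process: the limit of 72 processes is an assumed value for illustration only, so substitute the limit that applies to your node or job.

```shell
# Assumed limit of processes per node (illustrative value; check your system).
MAX_PROCS=72
# Additional processes started by MUST; 1 in the default execution mode.
MUST_EXTRA=1
# Number to pass via -n so that application and MUST processes fit the limit.
APP_PROCS=$((MAX_PROCS - MUST_EXTRA))

# Print the resulting command line instead of running it.
echo "mustrun --must:mpiexec srun --must:np -n -n $APP_PROCS ./mpi arg1 arg2"
```

With these values the application is launched with 71 processes, leaving one slot for MUST.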

If you hit the limit of processes per node/job, the message you will typically see looks like:

  • for Open MPI's mpirun:
    There are not enough slots available in the system to satisfy the ABC
    slots that were requested by the application:
    ...
    
  • for Slurm's srun:
    srun: error: Unable to create step for job ABCDEFG: More processors requested than permitted
    
  • for Intel MPI's mpiexec/mpiexec.hydra: TODO: insert error message text

Options#

| Flag | Description |
|---|---|
| --must:mpiexec <launcher> | Specifies that the launcher <launcher> is used. |
| --must:np <flag> | Tells MUST that flag <flag> is used to specify the no. of processes to launch. |
| --must:verbose | Show more details. |
| --must:nodesize <np> | Number of processes per node(?) |

Advanced usage#

MUST provides different execution modes that differ in

  • the overhead they introduce
  • the flags required
  • the number of processes additionally created
  • whether they can deal with an application crash
  • the number of application processes they target

In the following table N denotes the number of processes launched by specifying -n N or -np N.

| name | flags | additional MUST processes | app. can crash | targeted no. of app. processes | note |
|---|---|---|---|---|---|
| A | | 1 | yes | <32 | |
| B | --must:nocrash | 1 | no | <100 | |
| C | --must:nodesize Y | 1+floor(N/(Y-1)) | yes | <100 | 1 |
| D | --must:distributed [--must:fanin Z] | > N/Z | no | tested with 16384 | 2 |
| D | --must:distributed --must:nodesize Y [--must:fanin Z] | > N/Y | yes | tested with 4096 | 1+2 |

Notes:

  1. Requires shared memory communication.
  2. Add the flag --must:nodl to disable distributed deadlock detection and reduce overhead.
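
To estimate how many additional processes the mode with --must:nodesize Y creates, the expression 1+floor(N/(Y-1)) from the table can be evaluated with shell arithmetic. The function name must_extra_procs is hypothetical and the sample values are chosen only for illustration. Integer division in the shell truncates, which equals floor for the positive values used here.

```shell
# must_extra_procs N Y
# Hypothetical helper: evaluates 1 + floor(N / (Y - 1)) for --must:nodesize Y.
must_extra_procs() {
  n=$1
  y=$2
  # Y - 1 is the divisor, so Y has to be at least 2.
  if [ "$y" -lt 2 ]; then
    echo "Y must be at least 2" >&2
    return 1
  fi
  echo $((1 + n / (y - 1)))
}

# Example: N=30 application processes with --must:nodesize 16
must_extra_procs 30 16
```

For N=30 and Y=16 this yields 3, so 33 processes are launched in total.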

For more details, consult chapter 5 of the MUST documentation, available on the MUST website.