# MUST
This guide describes how to install MUST, an MPI correctness checker, and covers its basic usage. MUST has to be built for the combination of compiler and MPI library your application uses.
## Installation
This will install MUST v1.9.0 into `$WORK/apps/must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0`. For the latest version, check the MUST website and adjust the version numbers accordingly.
Run the following commands on the cluster frontend where you intend to use MUST.
- Load all relevant modules:
  - Remove all loaded modules and make the modules that came via Spack available (needed for `libxml2`): `module purge` and `module add 000-all-spack-pkgs`
  - Load the modules required for building MUST: `module add cmake libxml2`
  - Load the compiler and MPI you want to build MUST for. As an example, Open MPI 4.1.2 with GCC 11.2.0 is chosen here: `module add openmpi/4.1.2-gcc11.2.0`
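- Download MUST: `wget https://hpc.rwth-aachen.de/must/files/MUST-v1.9.0.tar.gz`
- Optional: Verify the SHA512 hash. Download the checksum file with `wget https://hpc.rwth-aachen.de/must/files/SHA512SUM` and run `sha512sum -c --ignore-missing SHA512SUM`, which should print `MUST-v1.9.0.tar.gz: OK`.
- Unpack the archive, create a directory for building (`build`), and change into that directory: `tar xzf MUST-v1.9.0.tar.gz` and `mkdir build && cd build`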
- Configure the build with CMake:
  ```bash
  cmake ../MUST-v1.9.0 \
    -DCMAKE_INSTALL_PREFIX="$WORK/apps/must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0" \
    -DCMAKE_BUILD_TYPE=Release
  ```
  To adjust the installation path, change the path specified with `-DCMAKE_INSTALL_PREFIX`, e.g. `-DCMAKE_INSTALL_PREFIX="$WORK/some/other/paths/must"`.
- Compile and install MUST: `make -j install install-prebuilds`
The following snippet shows the whole installation procedure at once:
- Adjust the variable `MUST_VERSION=` to the version you want to install.
- Adjust the compiler and MPI modules to the ones your application uses.
- If needed, adjust the installation path of MUST by changing the path in `-DCMAKE_INSTALL_PREFIX=...`.
```bash
MUST_VERSION=v1.9.0

module purge
module add 000-all-spack-pkgs
module add cmake libxml2
# Load compiler and MPI you want to build MUST for.
# As an example, Open MPI 4.1.2 and GCC 11.2.0 are used:
module add openmpi/4.1.2-gcc11.2.0

wget https://hpc.rwth-aachen.de/must/files/MUST-${MUST_VERSION}.tar.gz
wget https://hpc.rwth-aachen.de/must/files/SHA512SUM
sha512sum -c --ignore-missing SHA512SUM
# should print: MUST-v1.9.0.tar.gz: OK

tar xzf MUST-${MUST_VERSION}.tar.gz
mkdir build && cd build
# If needed, adjust the installation path specified after -DCMAKE_INSTALL_PREFIX=.
cmake ../MUST-${MUST_VERSION} \
  -DCMAKE_INSTALL_PREFIX="$WORK/apps/must-${MUST_VERSION}-fritz-openmpi-4.1.2-gcc11.2.0" \
  -DCMAKE_BUILD_TYPE=Release
make -j install install-prebuilds
```
### Optional: Create a module file
Optionally, you can create a module file to make loading MUST and handling its dependencies, such as the compiler and MPI, easier.
Create a file, e.g. `$HOME/modules/must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0`, with the following content.
Adapt the lines below the comments marked with `TO CHANGE`.
```tcl
#%Module
proc ModulesHelp { } {
    puts stderr "\tSets up the environment for MUST v1.9.0\n"
}
module-whatis "sets up the environment for MUST v1.9.0"

# TO CHANGE: Load the compiler and MPI that MUST was built for.
# Use the same modules as during the installation.
module load openmpi/4.1.2-gcc11.2.0

# TO CHANGE: Specify the path where you installed MUST.
set pkghome "$::env(WORK)/apps/must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0"

setenv MUST_ROOT ${pkghome}
prepend-path PATH $pkghome/bin
prepend-path INCLUDE $pkghome/include
prepend-path LD_LIBRARY_PATH $pkghome/lib
prepend-path LD_LIBRARY_PATH $pkghome/lib64
prepend-path CMAKE_PREFIX_PATH $pkghome
```
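If you build MUST for several compiler/MPI combinations, writing each module file by hand is error-prone. The following is a minimal sketch of a shell function that writes a module file with the same directives as the example above. The function name `write_must_modulefile` and its argument order are hypothetical and not part of MUST. Unlike the example, the sketch stores the installation prefix as an absolute path instead of deriving it from `$::env(WORK)`.

```bash
# Hypothetical helper (not part of MUST): writes a module file with the same
# directives as the example above for a given MUST version, installation
# prefix, and compiler/MPI module. The file is named after the prefix directory.
# Usage: write_must_modulefile <module dir> <MUST version> <install prefix> <MPI module>
write_must_modulefile() {
    local moddir="$1" version="$2" prefix="$3" mpimod="$4" file
    file="$moddir/$(basename "$prefix")"
    mkdir -p "$moddir" || return 1
    # Unquoted EOF: shell variables are expanded, while \$ keeps the Tcl
    # variable $pkghome literal in the generated file.
    cat > "$file" <<EOF
#%Module
proc ModulesHelp { } {
    puts stderr "\tSets up the environment for MUST ${version}\n"
}
module-whatis "sets up the environment for MUST ${version}"
module load ${mpimod}
set pkghome "${prefix}"
setenv MUST_ROOT \$pkghome
prepend-path PATH \$pkghome/bin
prepend-path INCLUDE \$pkghome/include
prepend-path LD_LIBRARY_PATH \$pkghome/lib
prepend-path LD_LIBRARY_PATH \$pkghome/lib64
prepend-path CMAKE_PREFIX_PATH \$pkghome
EOF
    # Print the path of the written module file.
    echo "$file"
}
```

For the installation above, `write_must_modulefile "$HOME/modules" v1.9.0 "$WORK/apps/must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0" openmpi/4.1.2-gcc11.2.0` would create `$HOME/modules/must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0`.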
## Basic usage
Instead of `mpirun` or `srun`, the MUST-specific `mustrun` binary is used to launch your application. The following steps describe how to make `mustrun` accessible and how to use it for launching your application.
To allow MUST to show stack traces for detected issues, you have to compile your application with debug symbols (e.g. the `-g` compiler flag).
### Preparation
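Depending on whether you created a module file for MUST, do one of the following on the compute node:
- With a module file: add MUST to your environment by making the directory containing the module file known to the module system and loading the module. For the example module file, `module use $HOME/modules` followed by `module add must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0` loads MUST, Open MPI, and GCC.
- Without a module file: load the compiler and MPI modules MUST was built for and add the `bin` directory under the MUST installation path to the `PATH` environment variable, e.g. `export PATH="$WORK/apps/must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0/bin:$PATH"`.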
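For the variant without a module file, checking that `mustrun` is actually found after changing `PATH` can save a failed job. The following is a minimal sketch. The function name `must_add_to_path` is hypothetical, and the module-file variant appears only as comments because it depends on the module system of the cluster.

```bash
# With a module file instead (assuming it is stored in $HOME/modules):
#   module use "$HOME/modules"
#   module add must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0

# Hypothetical helper: prepends <prefix>/bin to PATH and prints where
# mustrun is found. Fails if <prefix>/bin/mustrun is missing or not executable.
# Usage: must_add_to_path <MUST installation prefix>
must_add_to_path() {
    local prefix="$1"
    if [ ! -x "$prefix/bin/mustrun" ]; then
        echo "error: no executable mustrun in $prefix/bin" >&2
        return 1
    fi
    PATH="$prefix/bin:$PATH"
    export PATH
    # Print the full path of the mustrun that will be used.
    command -v mustrun
}
```

A call such as `must_add_to_path "$WORK/apps/must-v1.9.0-fritz-openmpi-4.1.2-gcc11.2.0"` only changes `PATH`. The compiler and MPI modules still have to be loaded separately.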
### Launching your application
To start your application under MUST, replace the launcher you normally use, e.g. `srun` or `mpirun`, with `mustrun`.
Furthermore, `mustrun` must be given the name of the original launcher and the flag that specifies how many processes are launched.
This information is passed to `mustrun` via `--must:...` flags.
MUST has different execution modes, which introduce more or less overhead at the price of being more or less robust in case the application crashes.
MUST also starts at least one additional process.
We start with the default execution mode, which is only useful for small, short-running applications. As an example, take a simple command line and rework it to use `mustrun`:
Original command line, e.g.:

`srun -n 4 ./app`

Command line with `mustrun` as launcher:

`mustrun --must:mpiexec srun --must:np -n -n 4 ./app`

- The original launcher `srun` was replaced with `mustrun`.
- The flag `--must:mpiexec` followed by the name of the original launcher informs MUST of the launcher to use.
- The flag `--must:np -n` tells MUST that the flag for specifying the number of processes to launch is named `-n`.
- The second `-n` flag is then passed to the original launcher.
This instruments the application and launches it. After the run, the issues MUST found are reported in `MUST_Output.html`.
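The rewrite from a plain launcher command line to a `mustrun` command line is mechanical. The launcher name follows `--must:mpiexec`, and the process-count flag appears twice. The following sketch captures this rule in a shell function. `run_under_must` and the `DRY_RUN` variable are hypothetical names, and the function assumes that its first two arguments are the launcher and that launcher's process-count flag.

```bash
# Hypothetical helper: runs "<launcher> <np-flag> <N> <app> [args...]" under MUST.
# Usage: run_under_must srun -n 4 ./app
# With DRY_RUN=1 the resulting mustrun command line is only printed.
run_under_must() {
    local launcher="$1" npflag="$2"
    shift 2
    # Rebuild the argument list: "--must:np <flag>" for MUST, then <flag>
    # again followed by the remaining arguments for the original launcher.
    set -- mustrun --must:mpiexec "$launcher" --must:np "$npflag" "$npflag" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For Open MPI, the equivalent call would be `run_under_must mpirun -np 4 ./app`. Further `--must:...` options, such as those for other execution modes, are not handled by this sketch.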
### Adjusting the number of processes to launch
Remember that MUST starts at least one additional process.
These processes are created by increasing the number of processes that is passed to the original launcher, e.g. `srun` or `mpirun`.
If there is a limit on the number of processes that can be started per node, you have to reduce the number specified via the `-n` or `-np` flag accordingly to account for the additional process(es). The number of additional processes depends on the execution mode of MUST.
If you hit such a limit of processes per node or job, the message you typically see looks like:
- for Open MPI's `mpirun`:
- for Slurm's `srun`:
- for Intel MPI's `mpiexec`/`mpiexec.hydra`: TODO: insert error message text
### Options
| Flag | Description |
|---|---|
| `--must:mpiexec <launcher>` | Specifies that the launcher `<launcher>` is used. |
| `--must:np <flag>` | Tells MUST that the flag `<flag>` is used to specify the number of processes to launch. |
| `--must:verbose` | Show more details. |
| `--must:nodesize <np>` | Number of processes per node (?) |
## Advanced usage
MUST provides different execution modes that differ in
- the overhead they introduce
- the flags required
- the number of processes additionally created
- whether they can deal with an application crash
- the number of application processes they target
In the following table, `N` denotes the number of application processes launched by specifying `-n N` or `-np N`.
| Name | Flags | Additional MUST processes | Application can crash | Targeted no. of processes | Note |
|---|---|---|---|---|---|
| A | | 1 | yes | < 32 | |
| B | `--must:nocrash` | 1 | no | < 100 | |
| C | `--must:nodesize Y` | 1 + floor(N/(Y-1)) | yes | < 100 | 1 |
| D | `--must:distributed [--must:fanin Z]` | > N/Z | no | tested with 16384 | 2 |
| D | `--must:distributed --must:nodesize Y [--must:fanin Z]` | > N/Y | yes | tested with 4096 | 1, 2 |
Notes:
1. Requires shared memory communication.
2. Add the flag `--must:nodl` to disable distributed deadlock detection and reduce overhead.
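To choose `-n` under a limit on processes per node or job, you have to subtract the additional MUST processes. The sketch below encodes the column with the number of MUST processes for modes A, B, and C. The table gives only a lower bound for the mode D rows, so the sketch rejects them. Both function names are hypothetical, and `max_app_procs` simply tries decreasing values of N until application and MUST processes fit into the limit.

```bash
# Hypothetical helper: number of additional MUST processes for N application
# processes, following the table above (modes A, B, C only).
# Usage: must_extra_procs <mode> <N> [Y]
must_extra_procs() {
    local mode="$1" n="$2" y="${3:-0}"
    case "$mode" in
        A|B) echo 1 ;;
        C)
            if [ "$y" -lt 2 ]; then
                echo "error: mode C needs --must:nodesize Y with Y >= 2" >&2
                return 1
            fi
            # Shell integer division rounds down for non-negative operands (floor).
            echo $(( 1 + n / (y - 1) )) ;;
        *)
            echo "error: mode $mode: the table only gives a lower bound" >&2
            return 1 ;;
    esac
}

# Hypothetical helper: largest N such that N plus the MUST processes
# does not exceed <limit>.
# Usage: max_app_procs <mode> <limit> [Y]
max_app_procs() {
    local mode="$1" limit="$2" y="${3:-0}" n="$2" extra
    while [ "$n" -gt 0 ]; do
        extra=$(must_extra_procs "$mode" "$n" "$y") || return 1
        if [ $(( n + extra )) -le "$limit" ]; then
            echo "$n"
            return 0
        fi
        n=$(( n - 1 ))
    done
    return 1
}
```

For example, `max_app_procs A 72` prints 71. `max_app_procs C 16 4` prints 11, because 11 application processes plus 1 + floor(11/3) = 4 MUST processes use 15 of 16 slots, while N = 12 would need 17.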
For more details, consult chapter 5 of the MUST documentation, available on the MUST website.