
Compiling Programs


COMPILING, BUILDING, AND INSTALLING PROGRAMS ON THE CLUSTER
BUILDING COMPUTER PROGRAMS

•  The process of converting a human-readable file to a machine-readable file.

C program (simple text file written in the C programming language):

    #include <stdio.h>

    int main()
    {
        printf("Hello World!\n");
        return 0;
    }

Binary executable file (a set of CPU instructions encoded in 0's and 1's), which prints "Hello World!" when run:

    0101010100001010101001
    0101110101010010100000
    0101110101001011000111
    0110101010101010101010
    1010001010101010111111
    0010111101110000111010
    1010100111101010111101
    0101010000110101010101

Sophisticated programs (e.g. a compiler) are used to perform this multi-step conversion.
THE BUILD PROCESS
Higher-level language (human-readable)

    Source code (e.g. C program)
        |  Preprocessor
    Expanded source code
        |  Compiler
    Assembly code
        |  Assembler
    Object code
        |  Linker (also pulls in external libraries)
    Binary executable

Lower-level language (machine-readable, i.e. can be executed by CPU)

Not all languages are compiled languages! The process above applies to programs written in C, C++, and Fortran.
PREPROCESSOR

•  Expands or removes special lines of code prior to compilation.

In C, preprocessor directives begin with the # symbol and are NOT considered C code.

Include statements:

    #include <stdio.h>

•  Copies contents of stdio.h into file.

Define statements:

    #define PI 3.1415

•  Replaces all instances of PI within file with 3.1415.

Header guards:

    #ifndef FOO_H
    #define FOO_H
    #include "myHeader.h"
    void myFunc(int);
    #endif

•  Prevents expanding multiple copies of the same header file by defining a unique "macro" for each header file.
COMPILER

•  Converts expanded source code to assembly code.

C source:

    #include <stdio.h>

    int main()
    {
        printf("Hello World!\n");
        return 0;
    }

Assembly code (excerpt):

    ...
    main:
        .cfi_startproc
        pushq %rbp
        .cfi_def_cfa_offset 16
        movq %rsp, %rbp
    ...

Portability is an issue with compiled languages since assembly language contains instructions that are specific to a CPU's architecture.

•  Assembly-level instructions are specific to a processor's Instruction Set Architecture (ISA).

•  Example ISAs are x86, x86_64, and ARM. Most machines in HPC today support x86_64.
ASSEMBLER AND LINKER

•  Assembler: converts assembly code to object code.

•  Object code is in a binary format but cannot be executed by a computer's OS.

•  External libraries are often distributed as shared object files that are object code.
•  Hides specific implementation since these files are not human readable.
•  No need to be recompiled for each application that uses the library.
•  Stored efficiently in binary format.

•  Linker: stitches together all object files (including any external libraries) into the final binary executable file.

•  Many applications contain multiple source files, each of which needs to be included in the final executable binary.
•  The job of the linker is to combine all these object files together into a final executable binary (a.k.a. "executable" or "binary") that can be run.

    Object File 1 + Object File 2 + Ext Lib A  ->  Executable
USING COMPILERS ON THE CLUSTER (1/3)
IMPORTANT NOTE: In practice, the steps performed by the preprocessor, compiler, assembler, and linker are
generally hidden from the user behind a single command (in Linux). In the next several slides, we
will refer to this single command as a compiler, but note that we're actually talking about a tool that is a
preprocessor + compiler + assembler + linker.

•  GCC: GNU Compiler Collection

•  Free and open source
•  Most widely used set of compilers in Linux
•  C compiler: gcc
•  C++ compiler: g++
•  Fortran compiler: gfortran

•  Intel Compiler Suite

•  Licensed and closed source, but ACCRE purchases a license
•  Often produces faster binaries than GCC
•  Occasionally more difficult to build code due to lack of community testing
•  C compiler: icc
•  C++ compiler: icpc
•  Fortran compiler: ifort
USING COMPILERS ON THE CLUSTER (2/3)
gcc hello.c
•  Builds C program with the GCC C compiler.
•  Produces a binary called a.out that can be run by typing ./a.out

gcc -o hello hello.c
•  Produces a binary called hello that can be run by typing ./hello

Error messages result when the build process fails. The compiler should provide details about why the build failed.

Warning messages occur when a program's syntax is not 100% clear to the compiler, but it makes an assumption and continues the build process.
USING COMPILERS ON THE CLUSTER (3/3)
gcc -o hello -Wall hello.c
•  -Wall enables most commonly useful warning messages (despite its name, not every warning)

gcc -o hello -g hello.c
•  -g will build the binary with debug symbols

gcc -o hello -O3 hello.c
•  -O3 will build the binary with level 3 optimizations
•  Levels 0 to 3 (most aggressive) available
•  Can lead to faster execution times
•  Default is -O0 in GCC and -O2 in Intel suite
•  Vectorized loop execution is enabled with -O3 for GCC and -O2 for Intel.

gcc -E hello.c
•  Show expanded source code

gcc -S hello.c
•  Create assembly file called hello.s

gcc -c hello.c
•  Create object file called hello.o

icc -o hello -xHost hello.c
•  Use Intel's C compiler to aggressively optimize for the specific CPU microarchitecture
•  Using the -xHost option leads to poor binary portability. Only use this option if you are sure the binary will always be executed on a specific processor type.
EXTERNAL LIBRARIES (1/2)

•  Statically Linked Library: naming convention: liblibraryname.a (e.g. libcurl.a is a static curl library)

•  Linker copies all library routines into the final executable.
•  Requires more memory and disk space than dynamic linking.
•  More portable because the library does not need to be available at runtime.

•  Dynamically Linked Library: naming convention: liblibraryname.so (e.g. libcurl.so is a dynamic curl library)

•  Only the name of the library is copied into the final executable, not any actual code.
•  At runtime, the dynamic loader searches the directories in LD_LIBRARY_PATH and the standard library paths for the library.
•  Requires less memory and disk space; multiple binaries can share the same dynamically linked library at once.
•  By default, a linker looks for a dynamic library rather than a static one.

•  Do NOT need to specify the location of a library at build time if it's in a standard location (/lib64, /usr/lib64, /lib, /usr/lib). For example, libc.so lives in /lib64.
EXTERNAL LIBRARIES (2/2)

•  Linking to libraries in non-standard locations requires the following information at build time:

•  Name of library (specified with -llibraryname flag)
•  Location of library (specified with -L/path/to/non/standard/location/lib)
•  Location of header files (specified with -I/path/to/non/standard/location/include)

    gcc -L/usr/local/gsl/latest/x86_64/gcc46/nonet/lib \
        -I/usr/local/gsl/latest/x86_64/gcc46/nonet/include \
        bessel.c -lgsl -lgslcblas -Wall -O3 -o calc_bessel

•  In this example, two libraries (gsl and gslcblas) are linked to the final executable.
•  The -l flags are placed after bessel.c because the linker resolves symbols from left to right.
•  Alternatively, use LIBRARY_PATH and C_INCLUDE_PATH to specify locations of libraries and headers.

•  Check the LD_LIBRARY_PATH and output of the ldd command before running the program:

•  LD_LIBRARY_PATH holds the list of directories that the dynamic linker searches at runtime for dynamically linked libraries
•  Run ldd ./my_prog to see the dynamically linked libraries needed by an executable and the current path to each library
PORTABILITY

Can I build an executable on computer A and run it on computer B?

It depends! Are the platforms the same?

Platform:
•  CPU instruction set architecture (e.g. x86_64)
•  Operating system
•  External libraries

Support for specific vectorization extensions is also required for portability. For example, you cannot build a program with AVX2 on platform A and run it on platform B if AVX2 is not supported by platform B!

•  This is why you often see different installers for different operating systems: the installer is simply copying a pre-built binary to your machine!
•  Different CPU architectures are present on the cluster, so be sure to compile without overly aggressive optimizations or specify the target CPU architecture/family in your SLURM script (e.g. #SBATCH --constraint=haswell)
OTHER COMPILER FUN FACTS

•  Many different compilers exist but not all compilers are created equal!

•  GCC, Intel, Absoft, Portland Group (PGI), Microsoft Visual Studio (MSVS), to name a few.
•  Some are free, others are not!
•  It is not unusual (especially with large projects) for compiler A to build a program while compiler B fails.
•  Error messages and levels of verbosity can also vary widely.

•  Performance of a program can be very compiler-dependent!

•  This is especially true in scientific and high-performance computing involving a lot of numerical processing.
•  Compiler optimizations are especially tricky; sometimes the compiler needs help from the programmer (e.g. refactoring code so the compiler can make easier/safer decisions about when to optimize code).
•  Some compilers (especially Intel's) tend to outperform their counterparts because they have more intimate/nuanced information about a CPU's architecture (which is often Intel-based!).
AUTOMATING THE PROCESS: MAKEFILES (1/3)
•  The Make tool allows a programmer to define the dependencies between sets of files in a programming project, and sets of rules for (most often) building the project.
•  Provides a way to compile source files separately.
•  Default file is called Makefile or makefile.
•  Allows build process to be broken up into discrete steps, if desired. For example, separate rules can be defined for (i) compiling+assembling, (ii) linking, (iii) testing, and (iv) installing code.
•  Make analyzes the timestamps of a target and that target's dependencies to decide whether to execute the command(s) defined for that target's rule.

    project1.c + common.h  -> (compiler, assembler) ->  project1.o
    project2.c + common.h  -> (compiler, assembler) ->  project2.o
    project1.o + project2.o  -> (linker) ->  executable

By defining dependencies, you can avoid unnecessarily rebuilding certain files. For example, in the diagram above, project2.c does not need to be re-compiled if changes have been made only to project1.c.
AUTOMATING THE PROCESS: MAKEFILES (2/3)

•  Make analyzes the timestamp of a target's last modification and compares it to that of the target's dependencies to decide whether to execute the command(s) defined for that target's rule.

Makefile Template

target: dependencies   # rule
<tab> command1         # shell command
<tab> command2         # shell command
...

•  A "target" is a label/identifier for a rule
•  Often the target is either the name of a file or a conventional rule (e.g. "install")
•  Dependencies are files that the target depends on
•  Commands must be preceded by a tab

Example Makefile (see previous slide)

executable: project1.o project2.o
	gcc -o executable project1.o project2.o

project1.o: project1.c common.h
	gcc -c project1.c   # generates project1.o

project2.o: project2.c common.h
	gcc -c project2.c   # generates project2.o

•  There are often multiple rules defined per Makefile
•  By just typing "make", the first rule in the file will be executed
AUTOMATING THE PROCESS: MAKEFILES (3/3)

•  Notice that Make is smart enough to not rebuild the program if no files have been modified since our last build.
•  Make is also smart enough to only re-compile project2.c when it has been changed but project1.c has not.

To learn more about Makefiles, check out the following tutorial:
https://swcarpentry.github.io/make-novice/

make
•  Generally builds the entire project.

make clean
•  Deletes intermediate build files to start the build process from scratch.

make test
•  Generally runs unit tests.

make install
•  Generally installs the software.
•  "make install" generally fails with "permission denied" errors if you do not have administrative privileges or have not configured the build to install into a local directory.
AUTOMATING THE PROCESS: CONFIGURE SCRIPTS (1/2)
•  A configure script is an executable file responsible for building a Makefile for a project.

•  The dependencies available on a given system are difficult to predict and subject to constant change; writing a Makefile by hand for each system (or even a subset of representative systems) would be an enormous challenge and an administrative hassle.
•  Instead, a configure script can be used to scan a system in search of all the needed dependencies (including versions of software, locations of external libraries), and build a Makefile that is specific to that system.
•  Configure scripts are indispensable for large projects, especially where the number of dependencies is large and difficult to manage/track.
•  Alternatives to the configure script exist (cmake being the most common).

Installing into a system directory:

    ./configure
    make
    make test
    make install

•  Building projects on Linux is at times this simple.
•  Run only if you have administrative rights on the system.

Installing into a local directory:

    ./configure --prefix=/my/local/dir
    make
    make test
    make install

•  --prefix option needed if installing in home directory on the cluster.
AUTOMATING THE PROCESS: CONFIGURE SCRIPTS (2/2)
•  Many configure scripts support a number of different options for configuring your build.

./configure --help
•  Show command line options.
MAKE AND CONFIGURE MACROS
•  There are a number of "macros" (think of them as variables) that have standard meanings in Make and configure scripts. These macros can generally be exported as environment variables to customize your build.

CC
•  C compiler command (e.g. gcc)

CFLAGS
•  C compiler flags (e.g. -Wall -O3)

CPP
•  C preprocessor command (e.g. gcc -E)

CXX
•  C++ compiler command (e.g. g++)

CXXFLAGS
•  C++ compiler flags (e.g. -Wall -O3)

LDFLAGS
•  Linker flags (e.g. -L/path/to/lib)

LIBS
•  Library names (e.g. -lcurl)

FC
•  Fortran compiler command (e.g. gfortran)

FFLAGS
•  Fortran compiler flags (e.g. -O3)

MPICC
•  MPI C compiler wrapper command (e.g. mpicc)
COMPILED VS. INTERPRETED LANGUAGES

What about interpreted languages?

Compiled Language
•  Faster execution time
•  Slower development time
•  Less portable
•  C, C++, Fortran

Interpreted Language
•  Slower execution time
•  Faster development time
•  More portable
•  Python, Matlab, R, Ruby, Julia

The tradeoffs listed above are not universally true but in general apply.

Many popular modules/packages (e.g. NumPy, SciPy) loaded from interpreted languages are compiled shared object files and offer comparable performance to pure compiled languages.
