Shake Manual
Shake is a Haskell library for writing build systems – designed as a replacement for make
. This document describes how to get started with Shake, assuming no prior Haskell knowledge. First, let's take a look at a Shake build system:
import Development.Shake import Development.Shake.Command import Development.Shake.FilePath import Development.Shake.Util main :: IO () main = shakeArgs shakeOptions{shakeFiles="_build"} $ do want ["_build/run" <.> exe] phony "clean" $ do putInfo "Cleaning files in _build" removeFilesAfter "_build" ["//*"] "_build/run" <.> exe %> \out -> do cs <- getDirectoryFiles "" ["//*.c"] let os = ["_build" </> c -<.> "o" | c <- cs] need os cmd_ "gcc -o" [out] os "_build//*.o" %> \out -> do let c = dropDirectory1 $ out -<.> "c" let m = out -<.> "m" cmd_ "gcc -c" [c] "-o" [out] "-MMD -MF" [m] neededMakefileDependencies m
This build system builds the executable _build/run
from all C source files in the current directory. It will rebuild if you add/remove any C files to the directory, if the C files themselves change, or if any headers used by the C files change. All generated files are placed in _build
, and a clean
command is provided that will wipe all the generated files. In the rest of this manual we'll explain how the above code works and how to extend it.
Running this example
To run the example above:
- Install the Haskell Stack, which provides a Haskell compiler and package manager.
- Type
stack install shake
, to build and install Shake and all its dependencies. - Type
stack exec -- shake --demo
, which will create a directory containing a sample project, the above Shake script (namedShakefile.hs
), and execute it (which can be done byrunhaskell Shakefile.hs
). For more details see a trace ofshake --demo
.
Basic syntax
This section explains enough syntax to write a basic Shake build script.
Boilerplate
The build system above starts with the following boilerplate:
import Development.Shake import Development.Shake.Command import Development.Shake.FilePath import Development.Shake.Util main :: IO () main = shakeArgs shakeOptions{shakeFiles="_build"} $ do build rules
All the interesting build-specific code is placed under build rules. Many build systems will be able to reuse that boilerplate unmodified.
Defining targets
A target is a file we want the build system to produce (typically executable files). For example, if we want to produce the file manual/examples.txt
we can write:
want ["manual/examples.txt"]
The want
function takes a list of strings. In Shake lists are written [item1,item2,item2]
and strings are written "contents of a string"
. Special characters in strings can be escaped using \
(e.g. "\n"
for newline) and directory separators are always written /
, even on Windows.
Most files have the same name on all platforms, but executable files on Windows usually have the .exe
extension, while on POSIX they have no extension. When writing cross-platform build systems (like the initial example), we can write:
want ["_build/run" <.> exe]
The <.>
function adds an extension to a file path, and the built-in exe
variable evaluates to "exe"
on Windows and ""
otherwise.
Defining rules
A rule describes the steps required to build a file. A rule has two components, a pattern and some actions:
pattern %> \out -> do actions
The pattern is a string saying which files this rule can build. It may be a specific file (e.g. "manual/examples.txt" %> ...
) or may use wildcards:
- The
*
wildcard matches anything apart from a directory separator. For example"manual/*.txt"
would define a rule for any.txt
file in themanual
directory, includingmanual/examples.txt
, but would not matchmanual/examples.zip
,examples.txt
ormanual/docs/examples.txt
. - The
//
wildcard matches any number of complete path components. For example//*.txt
would define a rule for any.txt
file, includingmanual/examples.txt
. As another example,manual//examples.txt
would match any file namedexamples.txt
insidemanual
, including bothmanual/examples.txt
andmanual/docs/examples.txt
.
It is an error for multiple patterns to match a file being built, so you should keep patterns minimal. Looking at the two rules in the initial example:
"_build/run" <.> exe %> ... "_build//*.o" %> ...
The first matches only the run
executable, using <.> exe
to ensure the executable is correctly named on all platforms. The second matches any .o
file anywhere under _build
. As examples, _build/main.o
and _build/foo/bar.o
both match while main.o
and _build/main.txt
do not.
Lots of compilers produce .o
files, so if you are combining two different languages, say C and Haskell, use the extension .c.o
and .hs.o
to avoid overlapping rules.
The actions are a list of steps to perform and are listed one per line, indented beneath the rule. Actions both express dependencies (say what this rule uses) and run commands (actually generate the file). During the action the out
variable is bound to the file that is being produced.
A simple rule
Let's look at a simple example of a rule:
"*.rot13" %> \out -> do let src = out -<.> "txt" need [src] cmd_ "rot13" src "-o" out
This rule can build any .rot13
file. Imagine we are building "file.rot13"
, it proceeds by:
- Using
let
to define a local variablesrc
, using the-<.>
extension replacement method, which removes the extension from a file and adds a new extension. Whenout
is"file.rot13"
the variablesrc
will becomefile.txt
. - Using
need
to introduce a dependency on thesrc
file, ensuring that ifsrc
changes thenout
will be rebuilt and thatsrc
will be up-to-date before any further commands are run. - Using
cmd_
to run the command linerot13 file.txt -o file.rot13
, which should readfile.txt
and write outfile.rot13
being the ROT13 encoding of the file.
Many rules follow this pattern – calculate some local variables, need
some dependencies, then use cmd_
to perform some actions. We now discuss each of the three statements.
Local variables
Local variables can be defined as:
let variable = expression
Where variable is a name consisting of letters, numbers and underscores (a-z, A-Z, 0-9 and _). All variables must start with a lower-case letter.
An expression is any combination of variables and function calls, for example out -<.> "txt"
. A list of some common functions is discussed later.
Variables are evaluated by substituting the expression everywhere the variable is used. In the simple example we could have equivalently written:
"*.rot13" %> \out -> do need [out -<.> "txt"] cmd_ "rot13" (out -<.> "txt") "-o" out
Variables are local to the rule they are defined in, cannot be modified, and should not be defined multiple times within a single rule.
File dependencies
You can express a dependency on a file with:
need ["file.src"]
To depend on multiple files you can write:
need ["file.1","file.2"]
Or alternatively:
need ["file.1"] need ["file.2"]
It is preferable to use fewer calls to need
, if possible, as multiple files required by a need
can be built in parallel.
Running external commands
The cmd_
function allows you to call system commands, e.g. gcc
. Taking the initial example, we see:
cmd_ "gcc -o" [out] os
After substituting out
(a string variable) and os
(a list of strings variable) we might get:
cmd_ "gcc -o" ["_make/run"] ["_build/main.o","_build/constants.o"]
The cmd_
function takes any number of space-separated expressions. Each expression can be either a string (which is treated as a space-separated list of arguments) or a list of strings (which is treated as a direct list of arguments). Therefore the above command line is equivalent to either of:
cmd_ "gcc -o _make/run _build/main.o _build/constants.o" cmd_ ["gcc","-o","_make/run","_build/main.o","_build/constants.o"]
To properly handle unknown string variables it is recommended to enclose them in a list, e.g. [out]
, so that even if out
contains a space it will be treated as a single argument.
The cmd_
function as presented here will fail if the system command returns a non-zero exit code, but see later for how to treat failing commands differently.
Filepath manipulation functions
Shake provides a complete library of filepath manipulation functions (see the docs for Development.Shake.FilePath
), but the most common are:
str1 </> str2
– add the path components together with a slash, e.g."_build" </> "main.o"
equals"_build/main.o"
.str1 <.> str2
– add an extension, e.g."main" <.> "o"
equals"main.o"
.str1 ++ str2
– append two strings together, e.g."hello" ++ "world"
equals"helloworld"
.str1 -<.> str2
– replace an extension, e.g."main.c" -<.> "o"
equals"main.o"
.dropExtension str
– drop the final extension of a filepath if it has one, e.g.dropExtension "main.o"
equals"main"
, whiledropExtension "main"
equals"main"
.takeFileName str
– drop the path component, e.g.takeFileName "_build/src/main.o"
equals"main.o"
.dropDirectory1 str
– drop the first path component, e.g.dropDirectory1 "_build/src/main.o"
equals"src/main.o"
.
Advanced Syntax
The following section covers more advanced operations that are necessary for moderately complex build systems, but not simple ones.
Directory listing dependencies
The function getDirectoryFiles
can retrieve a list of files within a directory:
files <- getDirectoryFiles "" ["//*.c"]
After this operation files
will be a variable containing all the files matching the pattern "//*.c"
(those with the extension .c
) starting at the directory ""
(the current directory). To obtain all .c
and .cpp
files in the src directory we can write:
files <- getDirectoryFiles "src" ["//*.c","//*.cpp"]
The getDirectoryFiles
operation is tracked by the build system, so if the files in a directory change the rule will rebuild in the next run. You should only use getDirectoryFiles
on source files, not files that are generated by the build system, otherwise the results will change while you are running the build and the build may be inconsistent.
List manipulations
Many functions work with lists of values. The simplest operation on lists is to join two lists together, which we do with ++
. For example, ["main.c"] ++ ["constants.c"]
equals ["main.c", "constants.c"]
.
Using a list comprehension we can produce new lists, apply functions to the elements and filtering them. As an example:
["_build" </> x -<.> "o" | x <- inputs]
This expression grabs each element from inputs
and names it x
(the x <- inputs
, pronounced "x
is drawn from inputs
"), then applies the expression "_build" </> x -<.> "o"
to each element. If we start with the list ["main.c","constants.c"]
, we would end up with ["_build/main.o", "_build/constants.o"]
.
List expressions also allow us to filter the list, for example we could know that the file "evil.c"
is in the directory, but should not be compiled. We can extend that to:
["_build" </> x -<.> "o" | x <- inputs, x /= "evil.c"]
The /=
operator checks for inequality, and any predicate after the drawn from is used to first restrict which elements of the list are available.
Using gcc
to collect headers
One common problem when building .c
files is tracking down which headers they transitively import, and thus must be added as a dependency. We can solve this problem by asking gcc
to create a file while building that contains a list of all the imports. If we run:
gcc -c main.c -o main.o -MMD -MF main.m
That will compile main.c
to main.o
, and also produce a file main.m
containing the dependencies. To add these dependencies as dependencies of this rule we can call:
neededMakefileDependencies "main.m"
Now, if either main.c
or any headers transitively imported by main.c
change, the file will be rebuilt. In the initial example the complete rule is:
"_build//*.o" %> \out -> do let c = dropDirectory1 $ out -<.> "c" let m = out -<.> "m" cmd_ "gcc -c" [c] "-o" [out] "-MMD -MF" [m] neededMakefileDependencies m
We first compute the source file c
(e.g. "main.c"
) that is associated with the out
file (e.g. "_build/main.o"
). We then compute a temporary file m
to write the dependencies to (e.g. "_build/main.m"
). We then call gcc
using the -MMD -MF
flags and then finally call neededMakefileDependencies
.
Top-level variables
Variables local to a rule are defined using let
, but you can also define top-level variables. Top-level variables are defined before the main
call, for example:
buildDir = "_build"
You can now use buildDir
in place of "_build"
throughout. You can also define parametrised variables (functions) by adding argument names:
buildDir x = "_build" </> x
We can now write:
buildDir ("run" <.> exe) %> \out -> do ...
All top-level variables and functions can be thought of as being expanded wherever they are used, although in practice may have their evaluation shared.
A clean command
A standard clean command is defined as:
phony "clean" $ do putInfo "Cleaning files in _build" removeFilesAfter "_build" ["//*"]
Running the build system with the clean
argument, e.g. runhaskell Shakefile.hs clean
will remove all files under the _build
directory. This clean command is formed from two separate pieces. Firstly, we can define phony
commands as:
phony "name" $ do actions
Where name is the name used on the command line to invoke the actions, and actions are the list of things to do in response. These names are not dependency tracked and are run afresh each time they are requested.
The actions can be any standard build actions, although for a clean
rule, removeFilesAfter
is typical. This function waits until after any files have finished building (which will be none, if you do runhaskell Shakefile.hs clean
) then deletes all files matching //*
in the _build
directory. The putInfo
function writes out a message to the console, as long as --quiet
was not passed.
Running
This section covers how to run the build system you have written.
Compiling the build system
As shown before, we can use runhaskell Shakefile.hs
to execute our build system, but doing so causes the build script to be compiled afresh each time. A more common approach is to add a shell script that compiles the build system and runs it. In the example directory you will find build.sh
(Linux) and build.bat
(Windows), both of which execute the same interesting commands. Looking at build.sh
:
#!/bin/sh mkdir -p _shake ghc --make Shakefile.hs -rtsopts -threaded -with-rtsopts=-I0 -outputdir=_shake -o _shake/build && _shake/build "$@"
This script creates a folder named _shake
for the build system objects to live in, then runs ghc --make Shakefile.hs
to produce _shake/build
, then executes _shake/build
with all arguments it was given. The -with-rtsopts
flag instructs the Haskell compiler to disable "idle garbage collection", making more CPU available for the commands you are running, as explained here.
Now you can run a build by typing stack exec ./build.sh
on Linux, or stack exec build.bat
on Windows. On Linux you may want to alias build
to stack exec ./build.sh
. For the rest of this document we will assume build
runs the build system.
Warning: You should not use the -threaded
for GHC 7.6 or below because of a GHC bug. If you do turn on -threaded
, you should include -qg
in -with-rtsopts
.
Command line flags
The initial example build system supports a number of command line flags, including:
build
will compile all files required bywant
.build _build/main.o
will compile enough to create_build/main.o
, ignoring allwant
requirements.build clean
will delete the contents of_build
, because of ourphony
command.build --help
will list out all flags supported by the build system, currently 36 flags. Most flags supported bymake
are also supported by Shake based build systems.build -j8
will compile up to 8 rules simultaneously, by default Shake uses 1 processor.
Most flags can also be set within the program by modifying the shakeOptions
value. As an example, build --metadata=_metadata
causes all Shake metadata files to be stored with names such as _metadata/.shake.database
. Alternatively we can write shakeOptions{shakeFiles="_metadata"}
instead of our existing shakeFiles="_build"
. Values passed on the command line take preference over those given by shakeOptions
. Multiple overrides can be given to shakeOptions
by separating them with a comma, for example shakeOptions{shakeFiles="_build", shakeThreads=8}
.
Progress prediction
One useful feature of Shake is that it can predict the remaining build time, based on how long previous builds have taken. The number is only a prediction, but it does take account of which files require rebuilding, how fast your machine is currently running, parallelism settings etc. You can display progress messages in the titlebar of a Window by either:
- Running
build --progress
- Setting
shakeOptions{shakeProgress = progressSimple}
The progress message will be displayed in the titlebar of the window, for example 3m12s (82%)
to indicate that the build is 82% complete and is predicted to take a further 3 minutes and 12 seconds. If you are running Windows 7 or higher and place the shake-progress
utility somewhere on your %PATH%
then the progress will also be displayed in the taskbar progress indicator:
Progress prediction is likely to be relatively poor during the first build and after running build clean
, as then Shake has no information about the predicted execution time for each rule. To rebuild from scratch without running clean (because you really want to see the progress bar!) you can use the argument --always-make
, which assumes all rules need rerunning.
Lint
Shake features a built in "lint" feature to check the build system is well formed. To run use build --lint
. You are likely to catch more lint violations if you first build clean
. Sadly, lint does not catch missing dependencies. However, it does catch:
- Changing the current directory, typically with
setCurrentDirectory
. You should never change the current directory within the build system as multiple rules running at the same time share the current directory. You can still runcmd_
calls in different directories using theCwd
argument. - Outputs that change after Shake has built them. The usual cause of this error is if the rule for
foo
also writes to the filebar
, despitebar
having a different rule producing it.
There is a performance penalty for building with --lint
, but it is typically small. For more details see this page on all the Lint options.
Profiling and optimisation
Shake features an advanced profiling feature. To build with profiling run build --report
, which will generate an interactive HTML profile named report.html
. This report lets you examine what happened in that run, what takes most time to run, what rules depend on what etc. For a full explanation of how to profile and optimise a build system, including getting accurate timings and using Haskell profiling, see the profiling and optimisation page.
Tracing and debugging
To debug a build system there are a variety of techniques that can be used:
- Run with lint checking enabled (
--lint
), which may spot and describe the problem for you. - Run in single-threaded mode (
-j1
) to make any output clearer by not interleaving commands. - By default a Shake build system prints out a message every time it runs a command. Use verbose mode (
--verbose
) to print more information to the screen, such as which rule is being run. Additional--verbose
flags increase the verbosity. Three verbosity flags produce output intended for someone debugging the Shake library itself, rather than a build system based on it. - To raise a build error call
error "error message"
. Shake will abort, showing the error message. - To output additional information use
putInfo "output message"
. This message will be printed to the console when it is reached. - To show additional information with either
error
orputInfo
, useerror $ show ("message", myVariable)
. This allows you to show any local variables.
Extensions
This section details a number of build system features that are useful in some build systems, but not the initial example, and not most average build systems.
Advanced cmd
usage
The cmd_
has a related function cmd
that can also obtain the stdout and stderr streams, along with the exit code. As an example:
(Exit code, Stdout out, Stderr err) <- cmd "gcc --version"
Now the variable code
is bound to the exit code, while out
and err
are bound to the stdout and stderr streams. If ExitCode
is not requested then any non-zero return value will raise an error.
Both cmd_
and cmd
also take additional parameters to control how the command is run. As an example:
cmd_ Shell (Cwd "temp") "pwd"
This runs the pwd
command through the system shell, after first changing to the temp
directory.
Dependencies on environment variables
You can use tracked dependencies on environment variables using the getEnv
function. As an example:
link <- getEnv "C_LINK_FLAGS" let linkFlags = fromMaybe "" link cmd_ "gcc -o" [output] inputs linkFlags
This example gets the $C_LINK_FLAGS
environment variable (which is Maybe String
, namely a String
that might be missing), then using fromMaybe
defines a local variable linkFlags
that is the empty string when $C_LINK_FLAGS
is not set. It then passes these flags to gcc
.
If the $C_LINK_FLAGS
environment variable changes then this rule will rebuild.
Dependencies on extra information
Using Shake we can depend on arbitrary extra information, such as the version of gcc
, allowing us to automatically rebuild all C files when a different compiler is placed on the path. To track the version, we can define a rule for the file gcc.version
which changes only when gcc --version
changes:
"gcc.version" %> \out -> do alwaysRerun Stdout stdout <- cmd "gcc --version" writeFileChanged out stdout
This rule has the action alwaysRerun
meaning it will be run in every execution that requires it, so the gcc --version
is always checked. This rule defines no dependencies (no need
actions), so if it lacked alwaysRerun
, this rule would only be run when gcc.version
was missing. The function then runs gcc --version
storing the output in stdout
. Finally, it calls writeFileChanged
which writes stdout
to out
, but only if the contents have changed. The use of writeFileChanged
is important otherwise gcc.version
would change in every run. To use this rule, we need ["gcc.version"]
in every rule that calls gcc
.
Shake also contains a feature called "oracles", which lets you do the same thing without the use of a file, which is sometimes more convenient. Interested readers should look at the function documentation list for addOracle
.
Resources
Resources allow us to limit the number of simultaneous operations more precisely than just the number of simultaneous jobs (the -j
flag). For example, calls to compilers are usually CPU bound but calls to linkers are usually disk bound. Running 8 linkers will often cause an 8 CPU system to grid to a halt. We can limit ourselves to 4 linkers with:
disk <- newResource "Disk" 4 want [show i <.> "exe" | i <- [1..100]] "*.exe" %> \out -> do withResource disk 1 $ do cmd_ "ld -o" [out] ... "*.o" %> \out -> do cmd_ "cl -o" [out] ...
Assuming -j8
, this allows up to 8 compilers, but only a maximum of 4 linkers.
Multiple outputs
Some tools, for example bison, can generate multiple outputs from one execution. We can track these in Shake using the &%>
operator to define rules:
["//*.bison.h","//*.bison.c"] &%> \[outh, outc] -> do let src = outc -<.> "y" cmd_ "bison -d -o" [outc] [src]
Now we define a list of patterns that are matched, and get a list of output files. If any output file is required, then all output files will be built, with proper dependencies.
Changing build rules
Shake build systems are set up to rebuild files when the dependencies change, but mostly assume that the build rules themselves do not change (including both the code and the shell commands contained within). To minimise the impact of build rule changes there are three approaches:
Use configuration files: Most build information, such as which files a C file includes, can be computed from source files. Where such information is not available, such as which C files should be linked together to form an executable, use configuration files to provide the information. The rule for linking can use these configuration files, which can be properly tracked. Moving any regularly changing configuration into separate files will significantly reduce the number of build system changes.
Depend on the build source: One approach is to depend on the build system source in each of the rules, then if any rules change, everything will rebuild. While this option is safe, it may cause a significant number of redundant rebuilds. As a restricted version of this technique, for a generated file you can include a dependency on the generator source and use writeFileChanged
. If the generator changes it will rerun, but typically only a few generated files will change, so little is rebuilt.
Use a version stamp: There is a field named shakeVersion
in the ShakeOptions
record. If the build system changes in a significant and incompatible way, you can change this field to force a full rebuild. If you want all rules to depend on all rules, you can put a hash of the build system source in the version field, as described here.
The Haskell Zone
From now on, this manual assumes some moderate knowledge of Haskell. Most of the things in this section are either impossible to do with other build systems or can be faked by shell script. None of the Haskell is particularly advanced.
Haskell Expressions
You can use any Haskell function at any point. For example, to only link files without numbers in them, we can import Data.Char
and then write:
let os = ["_build" </> c -<.> "o" | c <- inputs, not $ any isDigit c]
For defining non-overlapping rules it is sometimes useful to use a more advanced predicate. For example, to define a rule that only builds results which have a numeric extension, we can use the ?>
rule definition function:
(\x -> all isDigit $ drop 1 $ takeExtension x) ?> \out -> do ...
We first get the extension with takeExtension
, then use drop 1
to remove the leading .
that takeExtension
includes, then test that all the characters are numeric.
The standard %>
operator is actually defined as:
pattern %> actions = (pattern ?==) ?> actions
Where ?==
is a function for matching file patterns.
Haskell Actions
You can run any Haskell IO
action by using liftIO
. As an example:
liftIO $ launchMissiles True
Most common IO operations to run as actions are already wrapped and available in the Shake library, including readFile'
, writeFile'
and copyFile'
. Other useful functions can be found in System.Directory
.