
Julia For Beginners Sample



Preface

I began programming as a teenager with fun books containing comic book strips
with wizards and turtles. I read magazines that showed me how to make my own
simple games or make silly stuff happen on the screen. I had fun.

But when I went to university, books started talking about bank accounts,
balances, sales departments, employees and employers. I wondered if my life as
a programmer would mean putting on a gray suit and writing code handling
payroll systems. Oh, the horror!

At least half of my class hated programming with a passion. I could not blame
them. Why did programming books have to be so boring, functional and sensible?

Where was the sense of adventure and fun? Fun is underrated. Who cares if a
book is silly and has stupid jokes if it makes you learn and enjoy learning?

That is one of the reasons I wrote this book. I wanted the reader to enjoy
learning programming. Not through cracking jokes, but by working through
programming examples that are interesting or fun to do.

I promise you, there will be no examples modeling a sales department. Instead,
we will simulate rocket launches, pretend to be Caesar sending a secret message
to his army commanders using old Roman encryption techniques, simulate a
beautiful old handheld mechanical calculator, the Curta, and do many other
things.

The second important reason I wanted to write this book is that people keep
telling me: “Julia? Isn’t that a language only for science and scientists?”

Julia has had major success in this area, which is why the Julia community today
is full of brainy people working on hard problems such as developing new drugs
and modeling the spread of infectious diseases, climate change or the economy.

But no, you don’t need to be a genius or a scientist to use Julia. Julia is a
wonderful general-purpose programming language for everyone! I am not a
scientist and I have enjoyed using it for over 7 years now. With Julia you will
find that you can solve problems more quickly and elegantly than you have done
in the past. And as a cherry on top, computationally intensive code will run
blisteringly fast.

Introduction

Software is everywhere around us. Every program you see on your computer,
smartphone or tablet has been made by someone who wrote code in a
programming language.

But those aren’t the only places you’ll find software. You may think of a
computer as a box that sits on your desk, or as a laptop, but there are tiny
computers called microcontrollers inside almost every kind of technical device
we use. In cars, for instance, these little computers figure out how much
gasoline needs to be injected into the engine cylinders.

Have you ever seen the Falcon 9 rocket land at sea on a barge by firing its rocket
engines right before it smashes into the deck? There are numerous computers
inside this rocket keeping track of how fast the rocket is going, how far it is from
the ground and exactly how much thrust it has to apply, and for how long, to
avoid crashing.

All of these computers run programs that somebody wrote.

Programs are written in many different programming languages. People sometimes
ask me what the best programming language is. There is no best language! It is
like asking what the best car is. Some cars are better for certain things than
others. Ferraris are good at going really fast, but not good at transporting your
IKEA furniture. But most of the time, car choices come down to personal
preferences and personality. Programming languages are the same. It is about
both the job and how you like to work.

The Julia Programming Language


I have chosen to teach you the Julia programming language. Why did I pick
this language, when there are hundreds of others to choose from, some of which
are much better known?

1. It is a fun language! The language plays on your team instead of against
you.
2. Easy to learn. Some languages require you to learn a myriad of details
before you get to do anything at all. Julia lets you learn one small thing at a
time.
3. Powerful. You can get a lot done with very small amounts of code.


Figure 1: A Falcon 9 booster coming in for landing.



4. Batteries included. Don’t you hate it when you unpack a cool new thing
and it doesn’t work, because batteries are sold separately? A lot of pro-
gramming languages are like that, but not Julia.
5. Fast. You can write code that runs slowly in any language, but Julia gives
you the ability to write very high-performance code.
Overview

The chapters in this book are meant to be read in sequence, and build upon each
other. However, not every single chapter needs to be read.

This book tries to balance the needs of three different kinds of readers:

1. The beginner, with minimal programming experience. For this reason the
first chapters try to keep things simple and assume limited prior knowledge of
common programming concepts.
2. The experienced developer who wants to move faster and have more
in-depth coverage of material specific to Julia. For this reason, later
chapters will assume more prior knowledge of various programming concepts.
3. The curious technology enthusiast. Some chapters go into more technical
and historical details, which some readers may find fascinating. However,
these chapters cover material which may not be strictly necessary to master
Julia.

The inspiration for this organization is my own experience reading technology
and science books for fun. Sometimes you want to learn something fast and
get to the meat quickly. Other times you enjoy immersing yourself in geeky
details.

1. [Installing and Setting up Tools]. We cover how to install Julia, set up
tools to work with Julia and give a basic introduction to the Julia programming
environment.
2. Working with Numbers. What makes a computer different from a mere
calculator? Exploring the ability to automate complicated calculations using a
programming language.
3. [Working with Text]. We make a simple program to allow somebody to
practice multiplication. This ties together reading input from the user and
writing out messages.
4. Storing Data in Dictionaries. We work through an example of how to
convert Roman numerals to decimal numbers by using an important data
structure called a dictionary.
5. [How Does a Computer Work?] An introduction to the binary number
system, binary operations and how to work with integers in different
representations. We go through how a microprocessor does arithmetic.
6. [More on Types]. We explain the Julia type system and how it ties in with
multiple dispatch.
7. [Defining Your Own Types] by building a space rocket in code.


8. [Static vs Dynamic Typing]. Julia has type annotations, so what makes
Julia a dynamic language, and how is it different from a statically typed
language such as C/C++, Java or C#?
9. [Conversion and Promotion] of different number types.
10. [Different Kinds of Nothing]. How to represent objects which are not
found, missing or undefined in Julia.
11. [Strings]. Working with text strings. What is Unicode and UTF-8
encoding?
12. [Object Collections]. Shared abilities for different types of object
collections and how you make your own.
13. [Working with Sets]. Creating sets, set operations and what makes a set
different from an array.
14. [Your Own Spreadsheet]. Use Julia like a spreadsheet, or how to work
with multi-dimensional arrays. A simple introduction to linear algebra.
15. [Moving a Rocket] using affine transformation matrices (2D arrays) to
simulate movement and rotation of a space rocket. A geometric look at
matrices, vectors and points.
16. [Functional Programming] concepts such as closures and higher-order
functions. A deeper look at using functions in Julia and how to think
functionally.
17. [Object-Oriented Programming]. How do you apply your object-oriented
thinking to a language which isn’t object-oriented? What does a design
pattern look like in Julia?
18. [Code Organization] in files, modules and packages. Creating
dependencies between packages.
19. [Input and Output] to files and network sockets. Asynchronous network
socket communication. Representing objects as strings.
20. Shell Scripting in Julia. Instead of Bash, Julia can be used to do common
scripting operations.
21. [Parametric Types] allow you to write safer and higher-performance
code.
22. [Testing] focuses on how to write unit tests in Julia.
23. [Logging] covers the standard logging framework in Julia: the purpose
of logging, logging levels and making your own loggers.
24. [Debugging] is centered on how to use an interactive debugger in Julia.

Working with Numbers

• Variables. Getting Julia to remember long numbers and strings of text for
you.
• Functions. Store how a calculation is done for later reuse.
• Control flow. Using loops and if-statements to decide what code to run
and how many times to do it.
• Types. The kind of objects we can work with in Julia.
What is programming and why is it useful? To explain that, I am going to tell
you a story about what one of the first programs in history was made to do.
Perhaps you have wondered what those first computers did, back when computers
had no screens, but instead had large walls of blinking lights?

Our story starts in World War II. Mathematics became very important to winning
the war. The Germans sent secret messages through radio telegraphs. These
messages were encrypted and decrypted by the infamous Enigma machines.
Figuring out what a secret message said was like solving a mathematical puzzle.

Figure 2: Distance traveled by a cannon ball depends on the elevation of the
cannon

Another problem was to figure out the correct angle to orient the artillery
cannons to hit their targets. Say you want to hit an enemy bunker 8 km away. At
what angle should you fire your cannon?

This is a mathematical problem, and the poor grunts fighting the war could not
bust out a pen and paper and start doing complicated math calculations each
time they wanted to fire a cannon. How do you think they did it?

Calculating Artillery Trajectories


They used books with lots of tables. Before handheld calculators, mathematicians
used scientific tables. There were tables for everything: trigonometric
functions (sine, cosine), logarithms, normal distributions, etc.

Figure 3: Example of a table used to decide the correct elevation of a cannon to
fire at a given range.

The soldiers manning the artillery cannons had books with numerous tables. The
tables would tell them at what elevation (angle) to set the cannon in order to
fire the artillery projectile (cannon ball) the desired distance. But these
tables could get very complicated, because so many things affect how far the
cannon ball would go:

• wind
• amount of gunpowder
• the particular kind of cannon (artillery) used.

This meant they needed countless tables, which is why during WWII the Allies
had huge rooms filled with people calculating these tables. People doing these
calculations were called computers. That was what a computer was before
electronic computers: a person doing lots of calculations.

The first computers were made to replace thousands of human computers, so
that all these tables could be quickly calculated by a machine. Let us look at
how these calculations were done.

Angle of reach
Say you have a cannon and you want to hit an enemy. Your enemy is at a distance
d from you. The cannon ball you fire exits the cannon with a velocity v. At what
angle θ does your cannon need to be elevated?

That angle is called the “angle of reach” and is calculated as follows:

$$\theta = \frac{1}{2}\arcsin\left(\frac{g\,d}{v^2}\right)$$

In this equation, g is the acceleration gravity gives a falling object. On Earth
that is $g \approx 9.81\ \text{m/s}^2$. That means if you fired your cannon on
Mars or the Moon the result would be different, because objects don’t fall as
fast there due to lower gravity.

Figure 4: A triangle with sides of length a, b and h. The longest side h is called
the hypotenuse.

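The arcsin function is the inverse of the sin function:

$$\sin(\theta) = x \quad\Longleftrightarrow\quad \arcsin(x) = \theta, \qquad -\tfrac{\pi}{2} \le \theta \le \tfrac{\pi}{2}$$

Normally you give an angle θ to the sin function. The result is the length of
the side of the triangle opposite the angle θ, divided by the length of the
hypotenuse h. The arcsin function goes the other way: give it that ratio and it
gives you back the angle.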
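To see the inverse relationship in action, here is a small sketch using Julia's built-in sin and asin functions. The angle 0.3 radians is an arbitrary choice; we compute its sine and check that asin gives the angle back. Because of floating-point rounding we compare with ≈ (isapprox) rather than ==.

```julia
θ = 0.3              # an arbitrary angle in radians
ratio = sin(θ)       # opposite side divided by the hypotenuse
back = asin(ratio)   # asin undoes sin for angles between -π/2 and π/2
println(back ≈ θ)    # ≈ (isapprox) tolerates tiny rounding errors; prints true
```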
Thus if we want to calculate multiple trajectories for our artillery book we could
use Julia as a calculator.

NOTE Muzzle velocity and range
Artillery cannons typically have muzzle velocities of 700 to 900
m/s, and shoot 20 to 30 km. In our simplified case we get a maximum
range of almost 60 km, since we ignore things like air resistance,
which would significantly reduce artillery range.
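Where does “almost 60 km” come from? asin only accepts inputs between -1 and 1, so the largest reachable distance is the one where g·d/v² equals 1, that is d = v²/g, fired at π/4 radians (45°). A minimal sketch, assuming the 762.425 m/s muzzle velocity used in the REPL examples; max_range is a hypothetical helper name.

```julia
# Maximum range in the simplified model (no air resistance).
# asin only accepts inputs between -1 and 1, so g*d/v^2 can be at most 1,
# which gives d = v^2/g.
max_range(v, g) = v^2 / g

v = 762.425   # muzzle velocity in m/s, as in the REPL examples
g = 9.81      # gravitational acceleration on Earth in m/s^2

d_max = max_range(v, g)
println(round(d_max / 1000, digits=1), " km")   # a little over 59 km

# At that distance the angle is 0.5 * asin(1), which is π/4 radians.
println(rad2deg(0.5 * asin(1.0)))               # about 45 degrees
```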

Imagine a starting velocity v = 762.425 m/s, and that we want to know the
angles for the distances 8, 12 and 25 km. We could write in the Julia REPL:
julia> 0.5*asin(9.81*8000/762.425^2)
0.06771158922454301

julia> 0.5*asin(9.81*12000/762.425^2)
0.10196244313304187

julia> 0.5*asin(9.81*25000/762.425^2)
0.217772776595389

The angles here are in radians, a unit in which a full turn is 2π (about 6.28)
rather than 360 degrees.
Imagine doing thousands of these calculations. No wonder they needed rooms
full of people calculating! This is going to get boring and tedious even with a cal-
culator. And they did not have electronic calculators but mechanical calculators
which could basically only add and subtract.
This poses several problems. We have to keep writing the same long numbers
over and over again. It is boring and time consuming to write 762.425 again
and again. Sooner or later we will get the wrong answer, mixing up one digit or
forgetting another one.

Is there perhaps some way we can get Julia to remember 762.425 for us?

That is what we do in natural language. Remember how I talked about our
ability to talk about context? Humans talking to each other don’t need to keep
repeating 762.425 when talking about the velocity. They can just say “initial
velocity”.

Variables and Constants


That is what we do in the math equations as well. We use the letter v as a sort
of placeholder. Remember I talked about variables as one of the important
concepts in programming?

In this case it will actually be a constant as our initial velocity will be fixed to
the same value every time we do the calculations. But in programming we still
usually refer to it as a variable.

So what would be a good way of telling Julia that we want to use a letter or a
word to refer to a number?

There are many equally valid ways of doing this. It is the designers of the
programming language who decide how to do it. What matters most is that it is
done in a manner which is easy for programmers to remember.

Some languages will write it like this:

v <- 762.425

An early programming language called Pascal made you write it like:

v := 762.425

But in Julia we write:

v = 762.425

NOTE
This is potentially confusing, because it is exactly the same symbol
used in mathematics to state that two numbers are equal. Keep in
mind that in Julia (and most other programming languages) it is
used for assignment and not for comparing values. The mathematical
expression x = y is written x == y in Julia.
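A tiny sketch contrasting the two: a single = stores a value under a name, while == asks a yes-or-no question about two values. The name x is just an example.

```julia
x = 5               # assignment: the label x now refers to the value 5
println(x == 5)     # comparison: prints true
println(x == 6)     # prints false
x = 6               # assigning again moves the label to a new value
println(x == 6)     # prints true
```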

Thus on the Julia REPL (command line) we can input the values for velocity v
and the acceleration g caused by gravity:

julia> v = 762.425
762.425

julia> g = 9.81
9.81

Once written, Julia will remember the values of v and g and we can keep writing:

julia> 0.5*asin(g*12000/v^2)
0.10196244313304187

However, it is still tedious to write this whole equation. Imagine writing this a
thousand times when the only thing you really change is the distance. Everything
else stays the same each time, yet we have to keep writing it.

Is there perhaps a way in which we can get Julia to not just give a name to a
number, but to give a name to a whole equation or calculation?

Functions
You guessed it, functions. Functions allow you to give names to whole
calculations. Thus instead of having to remember the details of how something is
calculated, you can simply refer to a calculation by name.
Let’s make a naive attempt at writing such a function:
julia> angle = 0.5*asin(g*12000/v^2)

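Nope, that won’t work. All we did was calculate an angle and stick it in a
variable named angle. We can write
julia> angle

over and over again, but the problem is that we get the same result each time.
We haven’t told Julia yet which variable we want to keep changing the value of.
We want the distance to vary: 8, 12 and 25 km.

If you typed that line into the REPL, restart Julia before following along:
once the name angle refers to a number, Julia will refuse to also define a
function named angle.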
Thus somehow, when we define a function with a name, we have to tell Julia
which variable will keep changing. Having used variables already, we can use
a variable name to refer to this distance. But how can we tell Julia that the
variable named distance should keep changing but not the other variables?

Function arguments
When we define the function we specify function arguments. That is a list of the
variables which we want to change each time we use our function.
julia> angle(distance) = 0.5*asin(g*distance/v^2)
angle (generic function with 1 method)

This defines a function angle, with one single argument named distance. Now
I can write:
julia> angle(8000)
0.06771158922454301

julia> angle(12000)
0.10196244313304187

julia> angle(25000)
0.217772776595389

That is a lot better. Now we can calculate angles much faster and we don’t have
to memorize the gravitational acceleration, initial velocity and the equation
anymore.
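Functions can take more than one argument. As a sketch, here is a hypothetical reach_angle that also takes the muzzle velocity as an argument, so the same calculation works for cannons with different muzzle velocities. The name reach_angle is made up for this example; it also avoids clashing with Julia's built-in angle function, which computes the angle of a complex number.

```julia
g = 9.81   # gravitational acceleration on Earth in m/s^2

# Hypothetical two-argument version of angle: the angle of reach in radians
# for a target `distance` meters away and a muzzle velocity `v` in m/s.
# Like the formula above, it ignores air resistance.
reach_angle(distance, v) = 0.5 * asin(g * distance / v^2)

println(reach_angle(12000, 762.425))   # same calculation as angle(12000)
println(reach_angle(12000, 900))       # a faster cannon needs a lower angle
```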

Arrays
But it is still tedious to do the calculations. Say we have multiple distances in a
list: 8, 12, 25, 31 and 42 km, and we want to calculate the angle for each of
these distances. Is there a simpler way than writing angle() five times? Or what
if we had a hundred or a thousand numbers? Manually calling angle for each of
these numbers would be very time consuming.
You may already be familiar with spreadsheet applications such as Microsoft
Excel or Apple’s Numbers. These programs excel at performing a calculation
on multiple numbers stored in tables. Below is an example from a spreadsheet
for calculating angles for different distances.

Figure 5: Spreadsheet application Numbers, used to calculate angles

At the bottom of the image you can spot the formula used to calculate angles
given distances.
Julia also offers a way of working with tables of numbers. In Julia you create
the equivalent of tables with the Array data type. Julia arrays allow you to work
with numbers in rows and columns. When an array is just a row or column, we
call it one-dimensional. When an array is made up of several rows and columns
we call it a two-dimensional array or a matrix.

Below are examples of one-dimensional and two-dimensional arrays. We will
cover more details of how to work with arrays in the [Object Collections]
chapter.

Figure 6: 1D and 2D arrays. A 1D array v holding 5, 7, 8 and 9, and a 2D array
A with rows 11 12 13 14, 21 22 23 24 and 31 32 33 34. The 2nd element of v,
v[2], contains the value 7. The 3rd column of A, A[:, 3], contains the values 13,
23 and 33.
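The arrays in Figure 6 can be typed straight into Julia. In this sketch the names v and A simply follow the figure (this v has nothing to do with the velocity v used earlier); spaces separate columns and semicolons separate rows.

```julia
v = [5, 7, 8, 9]                               # 1D array (vector)
A = [11 12 13 14; 21 22 23 24; 31 32 33 34]    # 2D array (matrix): 3 rows, 4 columns

println(v[2])       # 2nd element of v: 7
println(A[:, 3])    # 3rd column of A: [13, 23, 33]
println(A[2, 4])    # row 2, column 4: 24
println(size(A))    # number of rows and columns: (3, 4)
```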

We can store an array of numbers in a variable just like single numbers (scalars).
Notice how we can use _ to separate digits in long numbers.
julia> distances = [8_000, 12_000, 25_000, 31_000, 42_000]
5-element Array{Int64,1}:
8000
12000
25000
31000
42000

NOTE Assigning values to variables
Remember, assignment means putting a value into a variable. Another
way of looking at it is that you are fixing a label to a value. You
may have all sorts of long and complicated numbers and arrays you
need to remember. Sticking a label on a number helps you remember
it, just like we can make it easier to remember a calculation by
giving it a name. Thus function names are labels stuck on
calculations and variables are labels stuck on numbers, arrays or
other data types.

The benefit of having numbers in arrays is that you can work with the numbers
collectively. For instance, you can tell Julia to find the average of all the
numbers in an array, or their sum.
julia> sum(distances)
118000

julia> sum([8_000, 12_000, 25_000, 31_000, 42_000])
118000

julia> 8_000 + 12_000 + 25_000 + 31_000 + 42_000
118000

All three expressions are equivalent.

To get the median or mean we need the Statistics package, which is part of
Julia’s standard library.

julia> using Statistics

julia> mean(distances)
23600.0

julia> median(distances)
25000.0
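Besides sum, mean and median, a few other built-in functions summarize an array without needing any package. A short sketch; the distances array is redefined so the snippet runs on its own.

```julia
distances = [8_000, 12_000, 25_000, 31_000, 42_000]

println(length(distances))    # number of elements: 5
println(minimum(distances))   # shortest distance: 8000
println(maximum(distances))   # longest distance: 42000
println(extrema(distances))   # both at once: (8000, 42000)

# The mean is the sum divided by the number of elements, no package needed.
println(sum(distances) / length(distances))   # 23600.0
```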

Accessing Elements
You can get hold of one or more values stored in an array in different ways. If
we deal with a one-dimensional array, the position of an element is given by an
index. If we have a two-dimensional array, the position is given by a row and a
column.

julia> distances[1]
8000

julia> distances[2]
12000

julia> distances[5]
42000

julia> distances[end]
42000

julia> two_dim = [2 4 8; 10 12 14]
2×3 Array{Int64,2}:
  2   4   8
 10  12  14

julia> two_dim[1, 2]
4

julia> two_dim[2, 2]
12
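An index can also be a range such as 2:4 (meaning 2, 3 and 4) or a lone colon, which picks out several elements at once. A sketch using the same two arrays, redefined so the snippet stands alone:

```julia
distances = [8_000, 12_000, 25_000, 31_000, 42_000]
two_dim = [2 4 8; 10 12 14]

println(distances[2:4])     # elements 2, 3 and 4: [12000, 25000, 31000]
println(distances[end-1])   # second to last element: 31000
println(two_dim[2, :])      # the whole 2nd row: [10, 12, 14]
println(two_dim[:, 3])      # the whole 3rd column: [8, 14]
```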

Higher Order Functions


You can also use functions which don’t produce single numbers as result (single
values are called scalars). Some functions take arrays as input and produce
arrays as output.
julia> angles_rad = map(angle, distances)
5-element Array{Float64,1}:
0.06771158922454301
0.10196244313304187
0.217772776595389
0.27527867645775067
0.3938981904086532

map is a function which takes two arguments. The first argument is a function
and the second is an array. When you run map(f, xs) it will give you a new
array, where each number is the result of applying the function f given as first
argument to each successive number in the array xs given as second argument.
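Any one-argument function can be handed to map, including ones you define yourself. A sketch with a hypothetical to_km helper that converts the distances from meters to kilometers:

```julia
distances = [8_000, 12_000, 25_000, 31_000, 42_000]

# Hypothetical helper: convert a distance in meters to kilometers.
to_km(meters) = meters / 1000

distances_km = map(to_km, distances)
println(distances_km)   # [8.0, 12.0, 25.0, 31.0, 42.0]
```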

NOTE
Higher-order functions are functions which take other functions
as arguments. So map is a higher-order function, while sum as we
used it above is not (although sum also has a form that accepts a
function). Higher-order functions will be covered in more detail in
[Functional Programming].

Let us apply map one more time. We got our angles in radians. How about
turning all our angles into degrees? You can use the Julia function rad2deg
for this purpose.
julia> rad2deg(π)
180.0

julia> rad2deg(π/2)
90.0

julia> angles_deg = map(rad2deg, angles_rad)
5-element Array{Float64,1}:
3.8795882866898173
5.842017660365961
12.477460991761141
15.772306350976407
22.56870386825631

Functions such as sum and map allow us to do what early computers did when
they made artillery trajectory tables: perform the same calculation repeatedly.
This is what makes a computer different from a calculator, which must be
manually operated for every repeated calculation.

These functions may seem magical. Somehow they are able to look at each
individual element in an array and do something with it. How do they do that?
Can you build functions like this yourself? Yes, you can!

Loops
Most programming languages today have what we call loop constructs or
statements. The common ones are the for-loop and the while-loop.

Now we will do something we haven’t done thus far. We will write functions
spanning multiple lines. When writing statements or functions which span
multiple lines we need to inform Julia of where they begin and end. All the code
between the function and end keywords is part of the function. Here is the angle
function written over multiple lines:
function angle(distance)
    0.5*asin(g*distance/v^2)
end

This is more typical Julia syntax. We usually start a programming statement
with a keyword such as for or function telling Julia what sort of statement it
is. A statement is roughly the same thing as a sentence in a natural language.
Statements can be nested. A Julia program is basically a long list of statements
in which each of the statements in the list may contain other statements.

Here we write our own sum function called addup using a for-loop:
function addup(numbers)
    total = 0
    for num in numbers
        total = total + num
    end
    total
end

The first statement is a function statement, which Julia can figure out by
looking at the first keyword. The very last end keyword indicates the end of the
function statement.

The for-loop starts with the for keyword and ends with end. The statements
between these two lines are performed once for every element in numbers. One
way to think about how a function is executed (performed) is to imagine
repeatedly substituting values for variables and function calls.

A Step by Step Evaluation of a Function


Let us look at how a function call like this is evaluated:
nums = [2, 4, 6]
total = addup(nums)

Step 1

We substitute the value of nums, the array [2, 4, 6], for the variable nums.

total = addup([2, 4, 6])

Step 2
When calling the function we’ve got to imagine substituting the arguments of
the function with the passed value:
function addup([2, 4, 6])
    total = 0
    for num in [2, 4, 6]
        total = total + num
    end
    total
end

Step 3
Let’s focus on the for-loop alone. We will successively assign the variable num
each value in the array [2, 4, 6].
for 2 in [2, 4, 6]
    total = 0 + 2
end

Step 4
On the next iteration total has the value 2, and num takes the value 4.

for 4 in [2, 4, 6]
    total = 2 + 4
end

Step 5
total is now 6, and num is also 6, since that is the last value in the numbers
array.

for 6 in [2, 4, 6]
    total = 6 + 6
end

Step 6
The value of the last expression in a function becomes the value of the whole
function call. The last expression in addup is total, which is now 12, so
total = addup(nums) becomes:

total = 12
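You can also let Julia show you these steps. A sketch of a hypothetical addup_traced, identical to addup except that it prints num and total on every iteration of the for-loop:

```julia
function addup_traced(numbers)
    total = 0
    for num in numbers
        total = total + num
        println("num = ", num, ", total = ", total)   # show each step
    end
    total   # the last expression is the value the function returns
end

result = addup_traced([2, 4, 6])
# Prints:
# num = 2, total = 2
# num = 4, total = 6
# num = 6, total = 12
```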

Different Ways of Looping Over an Array


If you write e.g. for x in [2, 4, 6] then x will successively take on the values
2, 4 and 6. That is, it will take on the values of the elements in the given array.
However, it is also possible to iterate over a range of numbers rather than the
values of an array. In Julia we describe a range as i:j, where i is the first
number in the range and j is the last. 2:6 is an example of a range.
total = 0
for x = 2:6
    total = total + x
end

total will get the value 2 + 3 + 4 + 5 + 6 = 20, as those are the values x will
successively assume.

You can use ranges in all sorts of circumstances. You can use them instead of
specifying an array.
julia> sum(2:6)
20

julia> sum([2, 3, 4, 5, 6])
20
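Ranges can also have a step size, written start:step:stop. A short sketch; collect turns a range into an ordinary array so you can see every value it contains.

```julia
println(collect(2:6))       # [2, 3, 4, 5, 6]
println(collect(2:2:10))    # every 2nd number: [2, 4, 6, 8, 10]
println(collect(10:-3:1))   # counting down: [10, 7, 4, 1]
println(sum(1:100))         # 5050
```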

A more basic but equivalent way of writing the loop above is to use a
while-loop:
total = 0
x = 2
while x <= 6
    total = total + x
    x = x + 1
end

In this case we will keep performing the lines between while and end until the
condition x <= 6 is no longer true. So when x turns into 7, it will no longer
be true as 7 is not less than or equal to 6. We can compare numbers with these
operators:
• < less than
• > greater than
• <= less than or equal
• >= greater than or equal
• == equal to
• != not equal to
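Combining a while-loop with array indexing gives yet another way to write addup. This sketch (addup_while is a hypothetical name) walks through the array by index and stops once the index is larger than the number of elements reported by length:

```julia
function addup_while(numbers)
    total = 0
    i = 1                        # Julia arrays start at index 1
    while i <= length(numbers)   # stop after the last element
        total = total + numbers[i]
        i = i + 1
    end
    total
end

println(addup_while([2, 4, 6]))   # 12
```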

You might wonder where I am going with all this. This is prerequisite knowledge
needed to explain how you can write a map function yourself. Still, we have only
looked very briefly at how you access individual values in an array.

Make Your Own Map Function


We now have the building blocks to create our own map function. Let me take
you through some of the individual parts we need to make.
We get some input array, and we need to create an array for the results. Let us
make an empty array:

julia> angles = []
0-element Array{Any,1}

That doesn’t look right. It says we made an empty array that can hold values of
Any type. That means we could put a Bool, an AbstractString or whatever there.
Since we didn’t put any numbers into it, Julia isn’t able to figure out what we
intend to use it for. So we need to help Julia out by writing the type of the
items as a prefix:
julia> angles = Float64[]
0-element Array{Float64,1}

We can use a for loop to iterate over distances and calculate angles.
julia> for dist in distances
           push!(angles, angle(dist))
       end

push! is a function which pushes values at the end of an array. You can see the
previously empty angles array has been filled with the same results as we got
previously with map.
julia> angles
5-element Array{Float64,1}:
0.06771158922454301
0.10196244313304187
0.217772776595389
0.27527867645775067
0.3938981904086532
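The element type written in front of [] decides what the array accepts. A sketch comparing an untyped and a typed empty array; eltype reports the element type, and the variable names are only examples.

```julia
anything = []                 # element type Any: accepts values of any type
push!(anything, 1)
push!(anything, "two")
println(eltype(anything))     # Any

numbers = Float64[]           # only accepts values that convert to Float64
push!(numbers, 3)             # the integer 3 is converted and stored as 3.0
println(numbers)              # [3.0]
println(eltype(numbers))      # Float64
```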

When we put all these parts together we get our own map function. Let us call
the function transform. This map function is of course not as flexible as the
one bundled with Julia. For instance, ours assumes the output is always
floating point numbers.
We cannot solve that problem until we have covered more about types. The only
types you know of thus far are different types of numbers.
function transform(fun, xs)
    ys = Float64[]
    for x in xs
        push!(ys, fun(x))
    end
    ys
end
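A quick check of transform against Julia’s own map. The definition is repeated so the snippet runs on its own, and sqrt is just an arbitrary example function.

```julia
function transform(fun, xs)
    ys = Float64[]           # results are assumed to be floating point numbers
    for x in xs
        push!(ys, fun(x))    # apply fun and append the result
    end
    ys
end

println(transform(sqrt, [1, 4, 9]))                          # [1.0, 2.0, 3.0]
println(transform(sqrt, [1, 4, 9]) == map(sqrt, [1, 4, 9]))  # true
```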

Picking Array Elements


With our transform function we took every element in the array and
transformed it to another value, resulting in a new array of the same size.
However, another useful task is to go over all the elements of an array and only
pick particular values that we would like to keep. Usually that means we get an
array back which is smaller than the one we started with.

When might this be useful?


The firing tables for artillery cannons are usually calculated from more
complicated equations than the ones we have used here. When creating these
equations, engineers and scientists have to make sure that they get accurate
results.
One can do that by firing the cannons repeatedly and recording the range for
different angles. One way to estimate the range is to calculate the average
distance of several cannon balls fired at the same angle.
When writing down lots of measurements it is common that people make
mistakes such as forgetting the decimal point, writing what looks like a 7 where
there should have been a 1 and so on.
Say these were the actual measured distances in kilometers, which should
have been recorded:

11.5, 10.8, 11.4, 12.1, 12.2, 10.9

However, due to sloppy writing, these are the recordings the scientists got:

11.5, 10.8, 71.4, 12.1, 122, 10.9

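We can write our own mean function to calculate the average.

mean(xs) = sum(xs)/length(xs)

If you loaded the Statistics package earlier in the same REPL session and called
its mean, Julia will most likely refuse to define a second function named mean.
Starting a fresh REPL session avoids the clash.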

Then we assign the accurate measured distances and poorly recorded distances
to two separate variables.
measured = [11.5, 10.8, 11.4, 12.1, 12.2, 10.9]
recorded = [11.5, 10.8, 71.4, 12.1, 122, 10.9]

We can then calculate the average in the REPL.


julia> mean(measured)
11.483333333333334

julia> mean(recorded)
39.78333333333334

We get a result which is roughly 3.5 times too large. Imagine we had hundreds
of numbers in a table. It may not be easy to spot the numbers that are off. This
is when Julia’s filter function is handy.

Remember how the map(f, xs) function applies a function f on every element in
xs? Function f takes a value x and returns another value y, that is y = f(x).

filter is similar, but it expects a function that returns true or false instead
of a number. We call this a boolean value. Where do these values come from?

julia> x = 4
4

julia> x > 5
false

julia> x < 5
true

julia> x == 4
true

julia> x != 4
false

You can see that expressions using comparison operators such as >, < and ==
give true or false as a result. Functions which give a boolean result instead of
a number are called predicates. Let us define a predicate to use with our filter
function.
isvalid(x) = x < 14

This is an example of using it.


julia> isvalid(11)
true

julia> isvalid(15)
false

We can use it with filter to get only valid distances and calculate the average.
julia> filter(isvalid, recorded)
4-element Array{Float64,1}:
11.5
10.8
12.1
10.9

julia> mean(filter(isvalid, recorded))
11.325

For the actual measured values this filtering will have no impact on the result.
julia> filter(isvalid, measured)
6-element Array{Float64,1}:
11.5
10.8
11.4
12.1
12.2
10.9

julia> mean(filter(isvalid, measured))
11.483333333333334

julia> mean(filter(isvalid, measured)) == mean(measured)
true

julia> mean(filter(isvalid, recorded)) == mean(recorded)
false
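filter can just as easily show the suspicious values themselves. In this sketch !isvalid is a function that returns the opposite of isvalid, and count tells how many elements satisfy a predicate. The arrays and isvalid are redefined so the snippet runs on its own.

```julia
isvalid(x) = x < 14
recorded = [11.5, 10.8, 71.4, 12.1, 122, 10.9]

println(filter(!isvalid, recorded))   # the outliers: [71.4, 122.0]
println(count(isvalid, recorded))     # 4 values look plausible
println(count(!isvalid, recorded))    # 2 values look wrong
```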

Make Your Own Filter Function


Based on what we have learned making the transform function, we could begin
by writing the function below. We are no longer transforming the input values,
but the problem with this version is that it adds every value x in the array xs.
function pick(pred, xs)
ys = Float64[]
for x in xs
push!(ys, x)
end
ys
end

What we need is a way to decide whether x should be added or not. To decide which code to run, we use what is called an if-statement. In the code segment below, x is only added to the end of the ys array if it is below 14.
if x < 14
    push!(ys, x)
end

The code between if and end is only run if the expression x < 14 evaluates to
the boolean value true. This if-statement is equivalent to:
if isvalid(x)
    push!(ys, x)
end

It does not matter what isvalid does with x as long as it evaluates to (returns)
a boolean value (true or false).
With this knowledge we can modify the pick function to make it work.
function pick(pred, xs)
    ys = Float64[]
    for x in xs
        if pred(x)
            push!(ys, x)
        end
    end
    ys
end

In this version we use the predicate function pred passed in as first argument
with the if-statement to decide whether an element should be added or not.
Let us test out our function and see if it works.
julia> pick(isvalid, [4, 14, 18, 3, 1])
3-element Array{Float64,1}:
4.0
3.0
1.0

julia> pick(isvalid, [16, 18])
0-element Array{Float64,1}

We can compare it with Julia’s built-in filter function to see if it gives the same result:
julia> xs = [4, 14, 18, 3, 1]
5-element Array{Int64,1}:
4
14
18
3
1

julia> pick(isvalid, xs) == filter(isvalid, xs)
true
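pick always returns Float64 values, even when it is given integers, because ys starts out as Float64[]. The comparison above is true only because == compares the numbers by value. One possible variation is sketched below under the hypothetical name pick_typed. It creates the result array with eltype(xs), so the output keeps the element type of the input.

```julia
# pick_typed: a hypothetical variant of pick whose result array has the
# same element type as the input instead of always being Float64[]
function pick_typed(pred, xs)
    ys = eltype(xs)[]        # e.g. Int64[] for an array of integers
    for x in xs
        if pred(x)
            push!(ys, x)
        end
    end
    ys
end

result = pick_typed(x -> x < 14, [4, 14, 18, 3, 1])
println(result)          # [4, 3, 1]
println(eltype(result))  # Int64 on a 64-bit system
```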
Storing Data in Dictionaries

• Dictionary for storing key-value pairs.
• Pairs. A datatype for storing two related values.
• Tuples. Collection of one or more values, possibly of different types.
We will look at a new data type called a dictionary by working through a code example for converting Roman numerals to decimal values and back. We will use a dictionary to keep track of which decimal value a letter such as I, V or X corresponds to.

Roman Numerals
While Roman numerals are not very practical to use today, they are useful to learn about in order to understand number systems. In particular, when programming you will encounter various number systems.
Both Roman numerals and the binary system used by computers may seem very cumbersome to use. However, that is often because we don’t use the numbers as they were intended.
It is hard to make calculations using Roman numerals with pen and paper compared to Arabic numerals (which is what we use). However, the Romans did not use pen and paper to perform calculations. Rather, they performed their calculations using a Roman abacus.
The abacus is divided into multiple columns. You can see the I, X and C columns:
• In the I column, every pebble is a 1.
• In X, every pebble represents 10.
• In C, every pebble represents 100.
Above each of these columns we have the V, L and D columns, which represent the values 5, 50 and 500.

NOTE
The beauty of the Roman system is that you can quickly write down exactly what the pebbles on the abacus say. Likewise, it is quick to arrange pebbles on a Roman abacus to match a Roman numeral you have read. For this reason Roman numerals were used all the way into the 1500s in Europe, long after Arabic numerals had been introduced.

Figure 7: Roman abacus

Let us look at how we can use this knowledge to parse Roman numerals and turn them into Arabic numerals. Put the code below into a text file and save it.
roman_numerals =
    Dict('I' => 1, 'X' => 10, 'C' => 100,
         'V' => 5, 'L' => 50, 'D' => 500,
         'M' => 1000)

function parse_roman(s)
    s = reverse(uppercase(s))
    vals = [roman_numerals[ch] for ch in s]
    result = 0
    for (i, val) in enumerate(vals)
        if i > 1 && val < vals[i - 1]
            result -= val
        else
            result += val
        end
    end
    result
end

Load this file into the Julia REPL environment (for instance with include) and test parse_roman with different Roman numerals as input.
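For instance, a session might look like the following sketch. The dictionary and function are repeated from the file above so the snippet runs on its own. The numerals tried are just examples.

```julia
roman_numerals =
    Dict('I' => 1, 'X' => 10, 'C' => 100,
         'V' => 5, 'L' => 50, 'D' => 500,
         'M' => 1000)

function parse_roman(s)
    s = reverse(uppercase(s))
    vals = [roman_numerals[ch] for ch in s]
    result = 0
    for (i, val) in enumerate(vals)
        if i > 1 && val < vals[i - 1]
            result -= val
        else
            result += val
        end
    end
    result
end

println(parse_roman("XIV"))      # 14
println(parse_roman("xvi"))      # 16, lowercase works thanks to uppercase
println(parse_roman("XLII"))     # 42
println(parse_roman("MCMXCIV"))  # 1994
```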

Let us go through how the code works.

The Dict Type


We map, or translate, the Roman letters I, V, X, etc. to numbers using what is called a dictionary. A dictionary is made up of multiple pairs.
julia> 'X' => 10 # 1
'X' => 10

julia> pair = 'X' => 10 # 2
'X' => 10

julia> dump(pair) # 3
Pair{Char,Int64}
first: Char 'X'
second: Int64 10

julia> pair.first # 4
'X': ASCII/Unicode U+0058 (category Lu: Letter, uppercase)

julia> pair.second
10

1. A pair of the letter X and 10.
2. Pairs can be stored in a variable and examined later.
3. dump allows us to look at the fields of any value: in this case, the fields of a pair value.
4. Extracting the first value in the pair.
We provide a list of these pairs to create a dictionary.
The code below shows how we create a dictionary to map letters used by Roman
numerals to their corresponding decimal value.
julia> roman_numerals =
           Dict('I' => 1, 'X' => 10, 'C' => 100,
                'V' => 5, 'L' => 50, 'D' => 500,
                'M' => 1000)
Dict{Char,Int64} with 7 entries:
'M' => 1000
'D' => 500
'I' => 1
'L' => 50
'V' => 5
'X' => 10
'C' => 100

When pairs are used in a dictionary, we refer to the first value in each pair as a key of the dictionary. The second value in each pair is the value stored under that key. For example, I, X and C are keys, while 1, 10 and 100 are values.
We can ask a dictionary for the value corresponding to a key. This takes a Roman letter and returns the corresponding value.
julia> roman_numerals['C']
100

julia> roman_numerals['M']
1000

Looping over Characters


We can use this dictionary to help us convert Roman letters to their corresponding values. In the parse_roman function we do this conversion with [roman_numerals[ch] for ch in s]. This is called an array comprehension.
We will first look at a regular for-loop doing exactly the same thing. This makes it easier to understand what the array comprehension does.
In this example we start with the Roman numeral “XIV”, which we want to convert.
julia> s = "XIV"
"XIV"

julia> vals = Int[]
0-element Array{Int64,1}

julia> for ch in s
           push!(vals, roman_numerals[ch])
       end

julia> vals
3-element Array{Int64,1}:
10
1
5

“XIV” is turned into the array of values [10, 1, 5] named vals. However, the job is not quite done. Later we need to combine these values into one number.
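You can compare the two forms directly to confirm that the array comprehension in parse_roman builds the same array as the loop above. This sketch uses a dictionary with only the three letters needed for “XIV”.

```julia
roman_numerals = Dict('I' => 1, 'V' => 5, 'X' => 10)  # subset for brevity

s = "XIV"

# The explicit loop from above
vals_loop = Int[]
for ch in s
    push!(vals_loop, roman_numerals[ch])
end

# The array comprehension used in parse_roman
vals_comp = [roman_numerals[ch] for ch in s]

println(vals_loop)               # [10, 1, 5]
println(vals_comp == vals_loop)  # true
```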

Before converting input strings, our code turns every letter into uppercase. Otherwise “xiv” would not get processed correctly, because all the keys in our dictionary are uppercase.

We also reverse the input, so we can process the letters from right to left, starting with the lowest-order values.

julia> s = "xiv"
"xiv"

julia> s = reverse(uppercase(s))
"VIX"

Enumerate
In our for-loop we need to keep track of the index of the value val in each loop iteration. To get the index we use the enumerate function. That is what you see used in the line for (i, val) in enumerate(vals). Here is a simple demonstration of how it works:

julia> collect(2:3:11)
4-element Array{Int64,1}:
2
5
8
11

julia> collect(enumerate(2:3:11))
4-element Array{Tuple{Int64,Int64},1}:
(1, 2)
(2, 5)
(3, 8)
(4, 11)

The collect function loops over a collection just like a for-loop does. However, it gathers all the values encountered into an array, which it returns. So you can see that with enumerate you get a tuple of two values on each iteration: an integer index and the value at that index.

Conversion
We cannot simply add up the individual Roman letters converted to their corresponding values. Consider the Roman numeral XVI. It turns into [10, 5, 1]. We could add that and get the correct result 16. However, XIV is supposed to mean 14. With Roman numerals, when you have a smaller value in front of a larger one, such as IV, you subtract the smaller value from the larger.
So we cannot just sum up the corresponding array [10, 1, 5]. Instead we reverse it and work our way from the rightmost letter to the leftmost. At every index we ask if the current value is lower than the previous one. If it is, we subtract it from the result. Otherwise we add it.
if i > 1 && val < vals[i - 1]
    result -= val
else
    result += val
end

That is what val < vals[i - 1] does. It compares the current value val to the previous value vals[i - 1]. The check i > 1 is needed because the first element has no previous value. result is used to accumulate the value of all the individual Roman letters.
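To see the subtraction rule at work, you can add a println to each branch of the loop. trace_roman below is a hypothetical debugging variant of parse_roman, not part of the program above.

```julia
roman_numerals = Dict('I' => 1, 'V' => 5, 'X' => 10, 'L' => 50,
                      'C' => 100, 'D' => 500, 'M' => 1000)

# trace_roman: a hypothetical debugging copy of parse_roman that reports
# whether each value is added to or subtracted from the running result
function trace_roman(s)
    vals = [roman_numerals[ch] for ch in reverse(uppercase(s))]
    result = 0
    for (i, val) in enumerate(vals)
        if i > 1 && val < vals[i - 1]
            result -= val
            println("$val is smaller than $(vals[i - 1]): subtract, result = $result")
        else
            result += val
            println("add $val, result = $result")
        end
    end
    result
end

trace_roman("XIV")
# add 5, result = 5
# 1 is smaller than 5: subtract, result = 4
# add 10, result = 14
```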

Using Dictionaries
Now that we have looked at a practical code example utilizing the dictionary type Dict in Julia, let us explore some more ways of interacting with a dictionary.

Creating Dictionaries
There are a multitude of ways to create a dictionary. Here are some examples.
Multiple arguments, where each argument is a pair object:
julia> Dict("two" => 2, "four" => 4)
Dict{String,Int64} with 2 entries:
"two" => 2
"four" => 4

Pass an array of pairs to the dictionary constructor (a function named the same
as the type it makes instances of).
julia> pairs = ["two" => 2, "four" => 4]
2-element Array{Pair{String,Int64},1}:
"two" => 2
"four" => 4

julia> Dict(pairs)
Dict{String,Int64} with 2 entries:
"two" => 2
"four" => 4

Pass an array of tuples to the dictionary constructor. Unlike pairs, tuples may
contain more than two values. For dictionaries they must only contain a key
and a value though.
julia> tuples = [("two", 2), ("four", 4)]
2-element Array{Tuple{String,Int64},1}:
("two", 2)
("four", 4)

julia> Dict(tuples)
Dict{String,Int64} with 2 entries:
"two" => 2
"four" => 4

Create an empty dictionary and fill it up later with key-value pairs.


julia> d = Dict()
Dict{Any,Any} with 0 entries

Notice the {Any, Any} part. This describes what Julia has inferred is the type
of the key and value in the dictionary. Compare this with the other examples
where you see {String, Int64}. When you provide some keys and values upon
creation of the dictionary, Julia is able to guess the type of the key and value.
When you create an empty dictionary, Julia cannot guess the types anymore
and assumes the key and value could be Any type.
You can however explicitly state the type of the key and value:
julia> d = Dict{String, Int64}()
Dict{String,Int64} with 0 entries

julia> d["five"] = 5
5

Because the key and value types are now fixed, you will get an error (an exception is thrown) if you try to use values of the wrong type. In this case we are trying to use the integer 5 as a key when a string key is expected.
julia> d[5] = "five"
ERROR: MethodError: Cannot `convert` an object of type Int64 to an object of type String
Closest candidates are:
convert(::Type{T}, !Matched::T) where T<:AbstractString at strings/basic.jl:209
convert(::Type{T}, !Matched::AbstractString) where T<:AbstractString at strings/basic.jl:210
convert(::Type{T}, !Matched::T) where T at essentials.jl:171

Types will be covered in the More on Types chapter.


Sometimes you get keys and values in separate arrays. You can still combine them into key-value tuples with the zip function and pass those to Dict to create a dictionary.
julia> words = ["one", "two"]
2-element Array{String,1}:
"one"
"two"

julia> nums = [1, 2]
2-element Array{Int64,1}:
1
2

julia> collect(zip(words, nums))
2-element Array{Tuple{String,Int64},1}:
("one", 1)
("two", 2)

julia> Dict(zip(words, nums))
Dict{String,Int64} with 2 entries:
"two" => 2
"one" => 1

Element Access
We have already looked at one way of getting and setting dictionary elements.
But what happens if we try to retrieve a value for a key that does not exist?
julia> d["seven"]
ERROR: KeyError: key "seven" not found

We get an error. We can of course simply add it:


julia> d["seven"] = 7;

julia> d["seven"]
7

But how do we avoid producing an error when we are not sure if a key exists? One solution is the get function. If the key does not exist, the sentinel value you pass as the third argument is returned instead. The sentinel can be anything. The example below uses -1.
julia> get(d, "eight", -1)
-1

Or we could simply ask the dictionary if it has the key.


julia> haskey(d, "eight")
false

julia> d["eight"] = 8
8

julia> haskey(d, "eight")
true
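A few more dictionary functions from Base are handy for element access. get! inserts a default when a key is missing, delete! removes a key, and keys and values return the two halves of the pairs. The sketch below uses a small dictionary similar to d above.

```julia
d = Dict("five" => 5, "seven" => 7, "eight" => 8)

# get! returns the value for "nine", inserting 9 first since the key is missing
get!(d, "nine", 9)

# delete! removes the key together with its value
delete!(d, "five")

# Dictionaries are unordered, so sort the keys for a predictable listing
for key in sort(collect(keys(d)))
    println(key, " => ", d[key])
end
# eight => 8
# nine => 9
# seven => 7

println(sum(values(d)))  # 24
```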
Shell Scripting

• Directory Navigation using Julia functions.
• Filesystem Operations such as copying, moving or finding files.
• Navigate inside files using seek, mark and reset.
• Unix Pipes. Working with external processes in a similar fashion to Bash¹ and other shell environments.
In the previous chapter we dealt with the IO system in general and focused on
functions which work on any IO device.
However, in this chapter the emphasis is on file and directory operations you commonly perform from a Unix shell script. Julia happens to be a good language for shell scripting: tasks done in a Unix shell script can equally well be done with Julia.
To replicate shell functionality, we need to learn more about how we navigate
the filesystem, pipe data between running programs (processes) and search in-
side files with Julia.

Working with Files and Directories


Before looking into the Julia APIs, it can be useful to have a short introduction to Unix shell tools. We will use this directory hierarchy to practice filesystem operations:
animals/
├── invertebrates
│   ├── arthropods
│   │   ├── crustaceans
│   │   └── insects
│   ├── flatworms
│   └── molluscs
└── vertebrates
    ├── amphibians
    ├── birds
    ├── fish
    └── mamals

¹ The Bourne Again Shell, which is a play on the name Bourne Shell.

It shows a taxonomy of animals, grouped into various subgroups. You can create this hierarchy yourself, either using a graphical file manager or the command line.
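If you prefer to stay in Julia, you can also create the hierarchy with a few lines of code. The sketch below uses a hypothetical helper called make_animals. It relies on mkpath, which creates a directory along with any missing parent directories, and on joinpath to build the paths. It runs in a temporary directory so it does not touch your files; pass "." instead to build animals where you are. The directory names, including mamals, match the tree above.

```julia
# Leaf directories of the animal taxonomy, spelled as in the tree above
animal_dirs = [
    "animals/invertebrates/arthropods/crustaceans",
    "animals/invertebrates/arthropods/insects",
    "animals/invertebrates/flatworms",
    "animals/invertebrates/molluscs",
    "animals/vertebrates/amphibians",
    "animals/vertebrates/birds",
    "animals/vertebrates/fish",
    "animals/vertebrates/mamals",
]

# make_animals: hypothetical helper that creates every directory under root.
# mkpath also creates missing parents, so "animals" and "vertebrates" appear too.
function make_animals(root)
    for dir in animal_dirs
        mkpath(joinpath(root, dir))
    end
    joinpath(root, "animals")
end

animals = make_animals(mktempdir())   # or make_animals(".") to build it here
println(readdir(animals))             # ["invertebrates", "vertebrates"]
```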
Once done, you can use the Unix command line tools to go into the amphibians
directory and create an empty file called frog:
$ cd animals/vertebrates/amphibians
$ touch frog
$ cd ..
$ cd ..
$ cd ..

We can do exactly the same from the Julia REPL:


julia> cd("animals/vertebrates/amphibians")
julia> touch("salamander")
"salamander"

julia> cd("..")
julia> cd("..")
julia> cd("..")

After this the animals directory will look like this:
animals/
├── invertebrates
│   ├── arthropods
│   │   ├── crustaceans
│   │   └── insects
│   ├── flatworms
│   └── molluscs
└── vertebrates
    ├── amphibians
    │   ├── frog
    │   └── salamander
    ├── birds
    ├── fish
    └── mamals
However it is cumbersome to use cd("..") every time to return to our starting
directory. Fortunately there is a variant of cd which takes a function f as first
argument. Function f is called after you have switched directory. After f has
completed, then cd will jump back to the original location in the filesystem.
We can demonstrate how this works with the pwd function, which returns the current working directory. That is the directory against which relative paths in your file and directory commands are resolved.
julia> pwd()
"~"

julia> cd(pwd, "animals/vertebrates/amphibians")
"~/animals/vertebrates/amphibians"

julia> pwd()
"~"

You can see in this example that when cd calls pwd we are in the animals/vertebrates/amphibians location, but afterwards we are back in our home directory ~. Please note I have edited the output of pwd for clarity. You will likely see a full path and not ~.
This example is not very useful, so let us pair cd with a more useful function,
such as readdir. This is the Julia equivalent of the Unix shell command ls:
julia> readdir("animals")
2-element Array{String,1}:
"invertebrates"
"vertebrates"

julia> readdir("animals/vertebrates/amphibians")
2-element Array{String,1}:
"frog"
"salamander"

If we combine it with cd you can get a better sense of how useful it is to take a
function as an argument.
julia> cd(readdir, "animals/vertebrates/amphibians")
2-element Array{String,1}:
"frog"
"salamander"

Remember, whenever the first argument is a function, we can use the do-end form instead. This makes it easy for us to add more files using the touch function.
cd("animals/vertebrates/mamals") do
    touch("cow")
    touch("human")
end

This can be done more succinctly by using the foreach function, which applies a function to every element in a collection.
cd("animals/vertebrates/birds") do
    foreach(touch, ["crow", "seagul", "mockingjay"])
end

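We don’t have any insects yet. They belong under arthropods. If we don’t know whether that directory already exists, we can use mkpath. It creates the directory along with any missing parent directories and does not complain if the directory already exists:
mkpath("animals/invertebrates/arthropods/insects")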

To add crabs, for example, we need the crustaceans group. To avoid writing the same paths out multiple times, we can store a path in a variable and use joinpath to construct new paths.


julia> arthropods = "animals/invertebrates/arthropods/"
"animals/invertebrates/arthropods/"

julia> crustaceans = joinpath(arthropods, "crustaceans")
"animals/invertebrates/arthropods/crustaceans"

By keeping the path in a variable, we can easily reuse it in different circumstances.
mkpath(crustaceans)
cd(crustaceans) do
    touch("bever")
end

Oops, beavers are not crustaceans, so they are not supposed to be there. How do we delete the file? With rm:
cd(crustaceans) do
    rm("bever")
end

After all these file and directory manipulations, we should have a hierarchy of
files and directories looking like this:
animals
├── invertebrates
│   ├── arthropods
│   │   ├── crustaceans
│   │   └── insects
│   ├── flatworms
│   └── molluscs
└── vertebrates
    ├── amphibians
    │   ├── frog
    │   └── salamander
    ├── birds
    │   ├── crow
    │   ├── mockingjay
    │   └── seagul
    ├── fish
    └── mamals
        ├── cow
        └── human

Working with Paths


This hierarchy is useful to demonstrate how various functions for working with
directory paths behave in Julia. basename gives the last entry in a path:

julia> basename("animals/vertebrates/mamals/human")
"human"

With dirname we can get the directory part of the path. For instance, what
directory the file human is inside:
julia> mamals = dirname("animals/vertebrates/mamals/human")
"animals/vertebrates/mamals"

As seen before we can join a directory path with a file to create a file path:
julia> joinpath(mamals, "human")
"animals/vertebrates/mamals/human"

Julia has various functions to get the absolute path, relative path and home directory:
julia> abspath("animals")
"/Users/erikengheim/animals"

julia> relpath("animals/vertebrates/../invertebrates")
"animals/invertebrates"

julia> abspath(homedir())
"/Users/erikengheim"

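A few related functions from Base are useful for picking paths apart. splitdir returns the directory and the last entry together, splitext separates a file extension, and normpath collapses .. components without consulting the filesystem. The outputs in the comments assume a Unix-like system with / as the path separator.

```julia
path = "animals/vertebrates/mamals/human"

println(splitdir(path))         # ("animals/vertebrates/mamals", "human")
println(splitext("looks.jpg"))  # ("looks", ".jpg")
println(normpath("animals/vertebrates/../invertebrates"))  # animals/invertebrates
println(isabspath(path))        # false
```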
Reorganizing Assets Example


Say we want to make an application where users can click through lists of animals and look at pictures of the animals with a description below. Our current storage structure isn’t set up for this. We just have a single file to describe each animal, but what we really want is an image file and a description file.
But what if we already have a large library containing data about animals that is not structured this way? How do we reorganize it?
We could do it by hand, moving directories and files around using a file manager. But we are programmers, and we can get the computer to do all the repetitive work of reorganizing files for us. So let us look at a program that can do this.
We want to create a directory structure similar to the one seen below:
animals
└── vertebrates
    ├── amphibians
    │   ├── frog
    │   │   ├── description.txt
    │   │   └── looks.jpg
    │   └── salamander
    │       ├── description.txt
    │       └── looks.jpg
    ├── birds
    │   ├── crow
    │   │   ├── description.txt
    │   │   └── looks.jpg
    │   ├── mockingjay
    │   │   ├── description.txt
    │   │   └── looks.jpg
    │   └── seagul
    │       ├── description.txt
    │       └── looks.jpg
    ├── fish
    └── mamals
        ├── cow
        │   ├── description.txt
        │   └── looks.jpg
        └── human
            ├── description.txt
            └── looks.jpg
Our solution is to split this into two separate problems:
• replace_animal(animal) function replaces an animal file with a direc-
tory containing a description.txt file, describing the animal and a
looks.jpg image file, showing the animal.
• visitfiles(fn, root) which traverses the directory hierarchy looking
for files. Each time a file is found the fn function is applied to that file.
Thus visitfiles will find the files to change, and replace_animal will per-
form the actual transformation.
function visitfiles(fn, root::AbstractString)
    if isfile(root)
        fn(root)
        return
    end

    cd(root) do
        for file in readdir()
            visitfiles(fn, file)
        end
    end
end

This is a recursive function, meaning it calls itself. Recursive functions tend to greatly simplify processing of hierarchical data structures. You will find recursion used for binary trees, linked lists, graphs and many other data structures. This is how the function works:
1. Check if root is a file. If it is, perform our animal replacement. To keep things flexible we avoid hardcoding that. Instead a function fn is taken as an argument, allowing the caller to specify what should happen to each file.
2. If root is not a file, we assume it is a directory and we enter that directory with cd(root).
3. With readdir() we get a list of directory entries, which we process in turn.
4. For each entry we want to check if it is a file or a directory to step into. However, visitfiles already does that, so we can just call it again. Hence we get a recursion.
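Julia’s Base library also provides a walkdir function that does the recursion for you. It yields a (root, dirs, files) tuple for the starting directory and every directory below it. The sketch below uses it to build a variant of visitfiles under the hypothetical name visitfiles_walk. One design difference is that it never changes the working directory, so fn receives joined paths rather than bare file names. It is tried on a small throwaway tree in a temporary directory.

```julia
# visitfiles_walk: a hypothetical variant of visitfiles built on Base.walkdir.
# No explicit recursion or cd is needed; fn receives joined paths.
function visitfiles_walk(fn, start::AbstractString)
    for (root, dirs, files) in walkdir(start)
        for file in files
            fn(joinpath(root, file))
        end
    end
end

# Build a small throwaway tree in a temporary directory
tmp = mktempdir()
birds = joinpath(tmp, "animals", "vertebrates", "birds")
mkpath(birds)
touch(joinpath(birds, "crow"))
touch(joinpath(tmp, "animals", "vertebrates", "frog"))

found = String[]
visitfiles_walk(joinpath(tmp, "animals")) do path
    push!(found, relpath(path, tmp))
end
println(sort(found))  # ["animals/vertebrates/birds/crow", "animals/vertebrates/frog"] on Unix-like systems
```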
We can run this in the REPL to check whether it finds the correct files. If you program on a Mac, you will likely run into the same problem I did: a bunch of .DS_Store files popping up.
julia> visitfiles("animals") do file
           println(joinpath(pwd(), file))
       end
~/animals/.DS_Store
~/animals/invertebrates/.DS_Store
~/animals/invertebrates/arthropods/.DS_Store
~/animals/vertebrates/.DS_Store
~/animals/vertebrates/amphibians/frog
~/animals/vertebrates/amphibians/salamander
~/animals/vertebrates/birds/crow
~/animals/vertebrates/birds/mockingjay
~/animals/vertebrates/birds/seagul
~/animals/vertebrates/mamals/cow
~/animals/vertebrates/mamals/human

Fortunately visitfiles provides a convenient way to get rid of these files:
visitfiles("animals") do file
    if file == ".DS_Store"
        rm(file)
    end
end

Next we create a function for replacing animal files.
function replace_animal(animal::AbstractString)
    rm(animal)
    mkdir(animal)
    cd(animal) do
        foreach(touch, ["description.txt", "looks.jpg"])
    end
end

With any function manipulating the filesystem, it is useful to back up your files first, or simply print out the actions which would have been performed rather than actually performing them. In this case the latter is not practical, since we actually need to create a directory to enter. Here is a walkthrough:
1. We use rm(animal) to remove each animal file.
2. mkdir(animal) is used to create a directory with a name identical to the
file removed.
3. With cd(animal) we enter this directory and use touch to create the de-
scription and image files.
In a more realistic implementation we would likely have copied this image file
from somewhere else.
We can do a small-scale test of the function first, to see if it works as expected:
julia> touch("foobar")
"foobar"

julia> replace_animal("foobar")

shell> ls foobar
description.txt looks.jpg

Once you are confident it works, you can visit all the files and perform the replacement:
julia> visitfiles(replace_animal, "animals")

Navigate Inside Files


Files, unlike many other IO devices, have the notion of a position inside the file. You can move to specific positions within the file and record the current position. Later you can return to a previously recorded position. This is done with the functions:
• seekstart(io) moves to the beginning of the file.
• seekend(io) moves to the end of the file.
• seek(io, pos) moves to position pos in the file.
• mark(io) records the current position in the file.
• reset(io) returns to the previously marked position.
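Here is a small sketch of these functions used on an IOBuffer. An IOBuffer supports the same positioning calls as a file, so nothing needs to be written to disk. Positions are byte offsets counted from 0, and the sample text is arbitrary.

```julia
io = IOBuffer("Caesar sends a secret")

seek(io, 7)                     # move to byte offset 7 (offsets start at 0)
println(position(io))           # 7
mark(io)                        # remember the current position
word = readuntil(io, ' ')       # reads "sends", stops at the space
reset(io)                       # jump back to the marked position
println(position(io))           # 7
println(readuntil(io, ' '))     # sends  (read again from the mark)

seekend(io)                     # move to the end of the data
println(eof(io))                # true
seekstart(io)                   # and back to the beginning
println(read(io, String))       # Caesar sends a secret
```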

When are these functions useful? Remember how we created rocket engines and tanks by reading CSV files? In this case we processed every line, and every line produced an object. In such cases, seeking through a file and marking positions has little value.
However, in other cases you work with larger files where only particular parts interest you, or the data isn’t clearly structured by lines. For instance, when parsing a source code file, a statement isn’t necessarily limited to a single line.
Let us use the construction of the book you are reading as an example. It was originally written in markdown, but there are many flavors of markdown, and you may have to switch from one to another. In this case most of the text can be preserved, but there are particular syntactic structures you want to change.
For instance, in Pandoc or GitHub style markdown, inline math equations are written as:

$y = 10x + b$

While in Markua style markdown, inline math equations would be written as:
`y = 10x + b`$

Converting this kind of text can be tricky, because you have to distinguish inline math, which uses a single $, from math blocks, which use double dollar signs $$, like this:
$$y = 10x + b$$

The function below takes the name of a file, opens that file, searches for inline math equations, replaces them and writes the result back to the file.
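function relace_inline_math(filename)
    out = IOBuffer()
    open(filename) do io
        while !eof(io)
            before = position(io)
            s = readuntil(io, '$')
            write(out, s)
            # No '$' was consumed: we reached the end of the file
            # and the remaining text has already been copied
            position(io) - before == sizeof(s) && break

            # Already converted Markua inline math such as `y = 10x + b`$
            if !isempty(s) && s[end] == '`'
                write(out, '$')
                continue
            end

            mark(io)
            s = readuntil(io, '$')

            # A '$' right after the first one: copy the $$ math block unchanged
            if isempty(s)
                reset(io)
                s = readuntil(io, raw"$$")
                write(out, '$')
                write(out, s)
                write(out, raw"$$")
                continue
            end

            # Inline math $...$ becomes `...`$
            write(out, '`')
            write(out, s)
            write(out, raw"`$")
        end
    end
    seekstart(out)
    s = read(out, String)
    open(filename, "w") do io
        write(io, s)
    end
end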

Generally I would recommend against writing functions of this length. However, one has to consider complexity: most lines here have very low complexity. The size is also partially an outcome of the imperative nature of the code, where state is repeatedly mutated. More functionally oriented code tends to be easier and more natural to write as short functions.
Anyway, let us talk through this function. The bulk of the code is made up of the
while !eof(io) loop which keeps reading from the file. The loop ends when
we have reached EOF (End Of File).

Simplify Loops with Continue


One trick, which is common in big loops like this, is to use the continue statement. By using it we can avoid deeply nested if-else-statements inside loops. When Julia hits a continue statement inside a loop, it skips the rest of the loop body and jumps to the beginning of the next iteration.
The first case where we use this strategy is when we try to figure out if we have encountered Markua-styled inline math:
if s[end] == '`'
    write(out, '$')
    continue
end

Say we read this line of text in the markdown document:


`y = 10x + b`$

In this case we would have:


s = "`y = 10x + b`"

As you can see, checking the last character of s verifies whether this is the case. Here we are lucky, because we don’t need any further processing. There is no need to run the rest of the code, so we can just skip to the beginning of the loop again, hence the use of continue.
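The same pattern works in any loop over an IO stream. The sketch below defines a hypothetical count_code_lines function that counts the lines of a small Julia-like text which are neither blank nor comments. It uses continue to skip the uninteresting lines instead of nesting the counting inside an if-else.

```julia
# count_code_lines: a hypothetical example that counts lines which are
# neither blank nor comments, reading the stream one line at a time
function count_code_lines(io::IO)
    n = 0
    while !eof(io)
        line = strip(readline(io))
        isempty(line) && continue            # blank line: next iteration
        startswith(line, "#") && continue    # comment line: next iteration
        n += 1
    end
    n
end

src = IOBuffer("""
    # compute a value
    x = 4

    y = x + 1  # a trailing comment does not make this a comment line
    """)
println(count_code_lines(src))  # 2
```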

Readuntil
Most of the code is built around skipping through the text until the next inter-
esting part using the readuntil function.
It will be tricky to understand this code without a clarification of how exactly this function works. We will go through some simple examples to demonstrate how it works. The IOBuffer type gives a practical way of simulating interactions with a file: it allows us to treat a simple text string as if it were the contents of a file.
julia> buf = IOBuffer("two + four = six");

julia> readuntil(buf, '+')
"two "

julia> readuntil(buf, '+')
" four = six"

julia> eof(buf)
true

julia> readuntil(buf, '+')
""

Notice that after readuntil has reached the end of the IO stream object (EOF), it just keeps returning empty strings.
Also observe that the character +, which we read until, gets consumed but is not included in the string returned. We can include it if we want to.
We use seekstart to move to the start of the stream, so we can repeat the reading, this time with keep=true, to retain the character we are reading until.
julia> seekstart(buf);

julia> readuntil(buf, '+', keep=true)
"two +"

julia> readuntil(buf, '+', keep=true)
" four = six"

In our relace_inline_math function you can see that we read until the dollar
sign without keeping it in the returned string s.
s = readuntil(io, '$')

The reason for this is that we are replacing expressions enclosed with dollar
symbols with backticks. Thus we don’t need to save the dollar symbols.

Mark and Reset


After checking if we have encountered an inline math expression which has already been converted, we want to make sure we have not hit a math block enclosed with double dollar signs $$.
To deal with this, we save the current position in the IO stream before reading until the next dollar sign. If the string returned is empty, a dollar sign must have immediately followed, and we are dealing with a math block.
mark(io)
s = readuntil(io, '$')

if isempty(s)
    reset(io)
    s = readuntil(io, raw"$$")
    write(out, '$')
    write(out, s)
    write(out, raw"$$")
    continue
end

We deal with math blocks by locating the end of the block using readuntil(io, raw"$$"). Other than that, we write to our output stream exactly what we read. Afterwards we are done and can jump to the beginning of the loop with continue.
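You can try the math block detection on its own with an IOBuffer. This sketch replays the steps the loop performs when it meets $$ and shows what each readuntil returns. The sample text is an assumption chosen for illustration.

```julia
io = IOBuffer(raw"see $$y = 10x + b$$ here")

before = readuntil(io, '$')     # "see ", the first '$' is consumed
mark(io)                        # remember where we are
s = readuntil(io, '$')          # "", because another '$' follows at once
reset(io)                       # back to just after the first '$'
block = readuntil(io, raw"$$")  # "$y = 10x + b", stops at the closing $$
rest = read(io, String)         # " here"

println(isempty(s))                    # true
println(string('$', block, raw"$$"))   # $$y = 10x + b$$
```

Writing the consumed '$' back in front of block restores the original math block unchanged, which is what the loop does.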

Seekstart
Our output is an IOBuffer named out, to which we keep writing our transformed text. When done processing, we want to write the contents of this IOBuffer to the same file as the one we read from. That is accomplished with this code segment:
seekstart(out)
s = read(out, String)
open(filename, "w") do io
    write(io, s)
end

Notice that we use seekstart(out) before calling read. That is because at this
point we are at the end of the IO stream object. Any attempt at reading from it
would produce an empty string. There is nothing at the end. We need to move
to the beginning and read from there.

Reading and Writing From External Processes


IO objects are not limited to files or sockets; they can also be the stdin and stdout of external processes. Sometimes a Unix command offers important functionality not present in Julia. Thus we need a way of sending information back and forth between Julia and external processes.
Say we want to search for particular files among our animal directories, for example a list of all the .jpg files. This could be done with the Unix find command:
$ find animals -type f -name "*.jpg"
animals/vertebrates/mamals/human/looks.jpg
animals/vertebrates/mamals/cow/looks.jpg
animals/vertebrates/amphibians/frog/looks.jpg
animals/vertebrates/amphibians/salamander/looks.jpg
animals/vertebrates/birds/mockingjay/looks.jpg
animals/vertebrates/birds/seagul/looks.jpg
animals/vertebrates/birds/crow/looks.jpg

But how can this functionality be utilized from Julia instead of reimplementing it from scratch? Let me begin by showing you a code example.
function findfiles(start, glob)
    readlines(`find $start -type f -name $glob`)
end

This shows the function in action:


julia> files = findfiles("animals", "*.jpg");

julia> files[1:3]
3-element Array{String,1}:
"animals/vertebrates/mamals/human/looks.jpg"
"animals/vertebrates/mamals/cow/looks.jpg"
"animals/vertebrates/amphibians/frog/looks.jpg"

julia> files = findfiles("animals", "*.txt");

julia> files[end-3:end]
4-element Array{String,1}:
"animals/vertebrates/amphibians/salamander/description.txt"
"animals/vertebrates/birds/mockingjay/description.txt"
"animals/vertebrates/birds/seagul/description.txt"
"animals/vertebrates/birds/crow/description.txt"

Notice we are able to read the output from the process as if it were a regular file. Using readlines we even get an array of strings, which we can easily slice and dice.
Let us look at a simple example to better understand how this works:
julia> dir = "animals"
"animals"

julia> cmd = `ls $dir`
`ls animals`

julia> typeof(cmd)
Cmd

Notice this kind of looks like string interpolation. The value of the variable dir
gets interpolated with $dir. However the backticks cause a Cmd object rather
than a String object to be created.
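Julia parses the command itself, so an interpolated value stays a single argument even if it contains spaces or characters a shell would treat specially. One way to see this is to peek at the exec field of the Cmd object, which holds the program name and its arguments. exec is an internal field rather than a documented API, so treat this sketch as a look under the hood. None of the commands are actually run.

```julia
dir = "my animals"              # a path containing a space
cmd = `ls $dir`
println(cmd.exec)               # ["ls", "my animals"]

sneaky = "animals; echo oops"   # shell syntax hidden inside a variable
cmd2 = `ls $sneaky`
println(cmd2.exec)              # ["ls", "animals; echo oops"]
```

In both cases ls would receive exactly one argument. The semicolon is never interpreted as a command separator.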
A Cmd object can be opened and read from just like a regular file:
julia> io = open(cmd)
Process(`ls animals`, ProcessRunning)

julia> typeof(io)
Base.Process

julia> readline(io)
"invertebrates"

julia> read(io, Char)
'v'

julia> read(io, Char)
'e'

julia> read(io, Char)
'r'

julia> read(io, String)
"tebrates\n"

julia> close(io)
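It is easy to forget close(io). As with files, you can instead pass a function to open, typically with a do block, and the process stream is closed for you when the block finishes. Here is a minimal sketch. It uses echo rather than ls so that it does not depend on the animals directory. Julia may complain if the block leaves part of the output unread, so it is safest to consume all of it inside the block.

```julia
# open(f, cmd) starts cmd, passes its output stream to f,
# then closes the stream and waits for the process to finish.
nwords = open(`echo invertebrates vertebrates`) do io
    line = readline(io)      # echo prints a single line
    length(split(line))      # value of the do block, returned by open
end
println(nwords)   # 2
```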

The io object returned when we open a Cmd object is of type Process. Remember the IO type hierarchy we showed before: Process is a concrete type at the bottom of this hierarchy.

Figure 8: Type hierarchy for IO types. (Diagram: the abstract type IO provides functions such as print, show, read and write, which let code stay agnostic to the specific IO device. Its subtypes are AbstractPipe, IOStream for regular filesystem files, and IOBuffer for treating a string like a file. Below AbstractPipe sit IOContext, a wrapper around an IO object that adds metadata, and Process, for reading from and writing to an external process.)
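The payoff of this hierarchy is that a function written against the abstract IO type works for every concrete type below it. Here is a small sketch. Note that count_nonempty_lines is a hypothetical helper, not part of Base.

```julia
# Process sits below AbstractPipe, which in turn is a subtype of IO.
println(Base.Process <: Base.AbstractPipe)   # true
println(Base.AbstractPipe <: IO)             # true
println(IOBuffer <: IO)                      # true

# Hypothetical helper written against the abstract IO type.
function count_nonempty_lines(io::IO)
    count(line -> !isempty(line), eachline(io))
end

# Works on an IOBuffer (a string treated like a file)...
buf = IOBuffer("invertebrates\n\nvertebrates\n")
println(count_nonempty_lines(buf))   # 2

# ...and just as well on the output of a running process.
n = open(`echo hello`) do io
    count_nonempty_lines(io)
end
println(n)   # 1
```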

Safety of Calling External Commands


If you come from other programming languages such as Python, Ruby or Perl, you may have learned that running shell commands through backticks or similar shell-based calls is a big no-no. In Julia this is not the case. Let me explain why.
In Perl and Ruby, for example, using backticks spawns a shell process, which interprets the text inside the backticks as if you had typed it into the shell. Hackers can exploit this by slipping in special characters that the shell interprets in ways the author of the original code never intended.
In Julia, however, what you write between the backticks is never passed to a Unix shell. Instead Julia has its own parser, which parses this code and creates a Cmd object. This makes Julia nicer for shell scripting: you never have to quote variables, and you can easily use ranges and arrays. Let us look at some examples:
julia> elements = [3, 8, 10, "hello", false];

julia> run(`echo $elements`);
3 8 10 hello false

julia> range = 1:2:12
1:2:11

julia> run(`echo $range`);
1 3 5 7 9 11

Here we use run instead of open and readline. run executes the command and sends its output straight to stdout (your terminal window), just as if you had run it in the shell.
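To see why this is safe, here is a sketch that interpolates a string containing shell syntax. No shell is involved, so the whole string arrives at echo as one argument and the ; is never treated as a command separator. The sketch also shows read and readchomp, which capture a command's output as a string instead of printing it the way run does. Finally, success only reports whether a command exited successfully. The exec field used for inspection is internal to Cmd, not documented API.

```julia
# Pretend this string came from an untrusted user.
evil = "animals; rm -rf animals"

# No shell parses this: the whole string becomes ONE argument to echo.
cmd = `echo $evil`
println(cmd.exec)                  # ["echo", "animals; rm -rf animals"]

# read captures the output as a String instead of printing it.
out = read(cmd, String)
println(repr(out))                 # "animals; rm -rf animals\n"

# readchomp does the same but drops the trailing newline.
println(readchomp(`echo $evil`))   # animals; rm -rf animals

# success reports whether a command exited with status 0.
println(success(`false`))          # false
```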

Pipes
In a Unix shell we have an awesome concept called pipes. These allow you to
pipe the output of one command into the input of another command. Here is a
demonstration of this concept:
$ ls animals/vertebrates
amphibians birds fish mamals

$ ls animals/vertebrates | sort -r
mamals
fish
birds
amphibians

The ls command sends a list of filesystem entries to stdout. The sort command takes everything you type on the keyboard (stdin) and writes it, sorted, to stdout (the console).
However by using the pipe symbol | we connect the stdout of ls to the stdin of
sort -r. The sort command has no idea that its input is coming from another
command.
Pipes gave a lot of flexibility to early Unix systems. Small programs doing a
single thing could be chained together using pipes to create new functionality.
We can create the same kind of pipes between Julia Cmd objects:
julia> dir = "animals/vertebrates"
"animals/vertebrates"

julia> pipe = pipeline(`ls $dir`, `sort -r`)
pipeline(`ls animals/vertebrates`, stdout=`sort -r`)

julia> io = open(pipe);

julia> readline(io)
"mamals"

julia> readline(io)
"fish"

julia> readline(io)
"birds"

julia> close(io)
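You do not have to read one line at a time. Just like a single Cmd, a pipeline can be handed directly to readlines, which returns all the lines at once. To keep the sketch self-contained, it creates a temporary directory with the same four subdirectory names instead of relying on animals/vertebrates.

```julia
# Build a throwaway directory so the example does not need "animals".
dir = mktempdir()
for name in ["amphibians", "birds", "fish", "mamals"]
    mkdir(joinpath(dir, name))
end

# ls prints one name per line when writing to a pipe;
# readlines collects every line of the pipeline's output.
sorted = readlines(pipeline(`ls $dir`, `sort -r`))
println(sorted)
```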

In fact we can chain together any number of commands:

julia> ls = `ls $dir`
`ls animals/vertebrates`

julia> rsort = `sort -r`;

julia> upper = `tr a-z A-Z`;

julia> run(pipeline(ls, rsort, upper));
MAMALS
FISH
BIRDS
AMPHIBIANS
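A pipeline does not have to end in the terminal. If the last argument to pipeline is a string, Julia treats it as a file name and redirects the final output there. Here is a sketch that uses a temporary input file in place of the ls output.

```julia
# Create a small input file to work on.
infile = tempname()
write(infile, "fish\nbirds\nmamals\namphibians\n")

# Chain two commands and send the final output to outfile, not the terminal.
outfile = tempname()
run(pipeline(`sort -r $infile`, `tr a-z A-Z`, outfile))

print(read(outfile, String))
```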

Environment Variables
Another important part of working with the Unix shell is environment variables.
These are accessible through a special global dictionary called ENV.
julia> ENV["JULIA_EDITOR"]
"mate"

julia> ENV["SHELL"]
"/usr/local/bin/fish"

julia> ENV["TERM"]
"xterm-256color"

julia> ENV["LANG"]
"en_US.UTF-8"
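Indexing ENV with a name that is not set throws a KeyError. That is not what you want in a script that should also work outside TextMate or your own shell. Here is a sketch of the safer alternatives: get with a default value, and haskey. The default editor "vim" is just an arbitrary example.

```julia
# get returns the default when the variable is not set.
editor = get(ENV, "JULIA_EDITOR", "vim")
println("Editor: ", editor)

if haskey(ENV, "TM_LINE_NUMBER")
    println("Caret is on line ", ENV["TM_LINE_NUMBER"])
else
    println("TM_LINE_NUMBER is not set")
end

# Values in ENV are always strings, so convert them when you need a number.
line = parse(Int, get(ENV, "TM_LINE_NUMBER", "1"))
println(line + 1)
```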

Environment variables can be useful in many contexts, not just when working with the shell. For instance, the text editor TextMate, which I typically use for programming, has a plugin system based around:
• Stdin and stdout redirection.
• Environment variables.
A plugin script reads input from stdin and writes output to stdout. In addition, TextMate can pass information to the plugin script through environment variables. Scripts launched by TextMate will see these environment variables, but you won't see them in your regular shell.
This relies on the Unix behavior whereby running programs (processes) inherit their environment from their parent process (the process that spawned them). Here are some of the environment variables used by TextMate:

• TM_CURRENT_LINE Current line text. The line the caret is on.
• TM_CURRENT_WORD Word at location of caret.
• TM_SELECTED_TEXT Currently selected text.
• TM_LINE_INDEX Position of caret on line.
• TM_LINE_NUMBER Line number at caret position.
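You can watch this inheritance happen from Julia itself. withenv temporarily sets environment variables in the running Julia process, and any command started inside the block inherits them. The sketch uses the Unix printenv command to print a variable from the child process. The value "where" is just an example.

```julia
out = withenv("TM_CURRENT_WORD" => "where") do
    # printenv runs as a child process and sees the inherited variable
    readchomp(`printenv TM_CURRENT_WORD`)
end
println(out)   # where

# After the block ENV is restored, so the variable is gone again
# (assuming it was not set before we started).
println(haskey(ENV, "TM_CURRENT_WORD"))
```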

To use Julia code in your plugin, you have to turn the source code file into an executable script.

Turning Scripts Into Executables


Unix-like operating systems, such as Linux and macOS, allow you to make a file containing source code executable. You do this by putting a line at the beginning of the source code that tells the operating system which interpreter should run the script:

#!/usr/local/bin/julia

row = ENV["TM_LINE_NUMBER"]
col = ENV["TM_LINE_INDEX"]
println(row, ", ", col)

The first line has to start with a hashbang #!, followed by the location of the interpreter that should run the script. Because the hash symbol # marks the beginning of a comment in most scripting languages, including Julia, the interpreter ignores this line when executing the code.

NOTE Interpreters and Julia

While we refer to Julia as an interpreter here, that is technically incorrect, since Julia is just-in-time compiled. However, running source code directly has traditionally been done by interpreters such as Sh, Bash and Perl, so the hashbang #! was made for interpreters.

Another common variant, used when you don’t want to hardcode the location of the interpreter, is to use the env command:

#!/usr/bin/env julia

In this case env runs the first julia executable it finds in the directories listed in the PATH environment variable.

Our example script is not very useful, other than to demonstrate how a plugin system can work. The picture below shows the plugin editor for TextMate, where the code for the plugin has been added.

In TextMate a plugin is referred to as a bundle. As you can see, we have made a plugin called “Erik’s Bundle,” which is made up of various commands you can execute.
A command called where has been added. The drawer sticking out on the left describes how the Julia script implementing where is called. We could have specified a key combination, but in this case we are using a “Tab trigger” set to “where.” This means you type “where” inside an editor window and hit tab to trigger the command.
This causes our script to run. TextMate puts the line number the caret is on into TM_LINE_NUMBER and the caret's column on that line into TM_LINE_INDEX. Then it executes the script.
The script writes this info to stdout. We have configured the TextMate command to insert whatever the script writes to stdout into the editor window at the caret.
The beauty of this plugin system is that you can write your plugins in any language you please. Julia did not exist when TextMate was created, but that doesn't matter. Another benefit is that you don't need TextMate to test the script, because recreating TextMate's environment is easy.
Let us do that. Put the code into a file called where.jl, or whatever you prefer. Next you need to allow this file to be executed by turning on its execute permission:
$ chmod +x where.jl

Before running the script, we want to simulate TextMate by setting TM_LINE_NUMBER and TM_LINE_INDEX. Let us pretend the caret is on line 8 and column 3:
$ export TM_LINE_NUMBER=8
$ export TM_LINE_INDEX=3

How you set environment variables depends on the shell you use. The example above uses the Bash² shell, since it is widely used. If you use the Fish³ shell instead, it would be:
$ set -x TM_LINE_NUMBER 8
$ set -x TM_LINE_INDEX 3

We can then run the script from the command line and see what output it gives.
$ ./where.jl
8, 3
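You can also recreate TextMate's environment from inside Julia rather than from the shell, which is handy for automated tests of a plugin script. The sketch below writes the body of where.jl to a temporary file. It then runs the file with Base.julia_cmd(), which gives a Cmd for the Julia executable running the current session, so neither the hashbang nor chmod is needed here.

```julia
# Write the where.jl script from this section to a temporary directory.
script = joinpath(mktempdir(), "where.jl")
write(script, """
    row = ENV["TM_LINE_NUMBER"]
    col = ENV["TM_LINE_INDEX"]
    println(row, ", ", col)
    """)

# Pretend to be TextMate: set the variables, then run the script as a child process.
out = withenv("TM_LINE_NUMBER" => "8", "TM_LINE_INDEX" => "3") do
    read(`$(Base.julia_cmd()) $script`, String)
end
print(out)   # 8, 3
```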

Now you may wonder why I picked TextMate as an example, given that it is not a widely used text editor and only works on macOS. The reason is simple: most other text editors tie their plugin system to specific programming languages.

Command Line Arguments


Shell commands can usually take a number of arguments:
find animals -type f -name "*.jpg"

In this example animals, -type, f, -name and "*.jpg" are the command line
arguments. If you want to create a shell command by writing a Julia script you
need a way of obtaining these arguments.
This is done in a very similar way to how we obtained environment variables. Instead of a dictionary, we have a global variable named ARGS: an array of strings containing all the arguments.
Here is a simple replica of the Unix cat command:
#!/usr/bin/env julia
# Print the contents of every file named on the command line.
for file in ARGS
    s = read(file, String)   # read the whole file into a string
    print(s)
end

We loop over each element of the ARGS array, each of which should be a file name. For each one we read the file's contents into a string and print it.
Say we put this code inside a file called cat.jl. We then have to give it execute permission:
$ chmod +x cat.jl

To test the command, I made two files, foo.txt and bar.txt, with a single line in each. You can test with whatever files you like.
² The Bourne Again Shell, whose name is a play on the Bourne Shell.
³ Fish is a lesser-known, user-friendly shell.

$ ./cat.jl foo.txt
foo text

$ ./cat.jl bar.txt
bar text

$ ./cat.jl foo.txt bar.txt
foo text
bar text
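Wrapping the logic in a function makes it easier to test without touching the command line. The sketch below is a hypothetical variant called catfiles. Like the real cat, it also reads from stdin when no file names are given. The script itself would then simply call catfiles(ARGS).

```julia
# Hypothetical function version of cat.jl.
# Writes every file's contents to out; falls back to stdin when files is empty.
function catfiles(files, out::IO = stdout)
    if isempty(files)
        write(out, read(stdin, String))
    else
        for file in files
            write(out, read(file, String))
        end
    end
    return nothing
end

# In the script: catfiles(ARGS)
```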

You need to put ./ in front of the file name to execute it because we have not placed it in a directory listed in the PATH environment variable. If you put cat.jl in, for instance, /usr/local/bin or another directory the OS searches for executable files, you would not need the ./ prefix.
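If you are unsure whether the OS will find a command without the ./ prefix, Julia can tell you. Sys.which searches the directories in PATH, much like the shell does. It returns the full path of the executable, or nothing if there is no match. The command name my-missing-command is made up to show the not-found case.

```julia
println(Sys.which("ls"))                  # e.g. /bin/ls or /usr/bin/ls
println(Sys.which("my-missing-command"))  # nothing

# On Unix, PATH itself is a colon-separated list of directories.
for dir in split(ENV["PATH"], ':')
    println(dir)
end
```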

Why Use Julia Instead of Bash?


Shell environments such as Bash, Zsh, Fish and Sh were made for interacting with files and processes. So why would you use Julia instead?
These tools are fine for very short scripts, but as soon as you go much beyond 5-6 lines, it will almost always be easier to use a proper programming language such as Julia.
For instance, in this Bash example of string manipulation it is not immediately obvious what is going on:
s="Hello World"
echo ${s/World/Mars}
echo ${s:3}

If we write the same in Julia it is far more obvious what is being done:
s = "Hello World"
println(replace(s, "World" => "Mars"))
println(s[4:end])

With Julia you get superior handling of arrays, set operations and the ability to write proper functions.
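As a small taste of the set operations, here is a sketch that finds animal directories with a picture but no description. The two lists are made-up data. In practice they could come from the findfiles function shown earlier.

```julia
jpgs = ["animals/vertebrates/birds/crow/looks.jpg",
        "animals/vertebrates/birds/seagul/looks.jpg",
        "animals/vertebrates/mamals/cow/looks.jpg"]
txts = ["animals/vertebrates/birds/crow/description.txt",
        "animals/vertebrates/mamals/cow/description.txt"]

# dirname. strips the file name from every path; Set removes duplicates.
with_picture     = Set(dirname.(jpgs))
with_description = Set(dirname.(txts))

# Directories that have a picture but lack a description.
missing_description = setdiff(with_picture, with_description)
println(missing_description)

# Directories that have both.
println(intersect(with_picture, with_description))
```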
The major downside is that Julia is not preinstalled on most operating systems. However, Julia programs can be ahead-of-time compiled for easier distribution. We will not cover that in this book, as it is a more advanced topic. Anyone interested should explore the PackageCompiler package.
