Introduction To Linux
Wits Bioinformatics
Introduction to Linux
Scott Hazelhurst
University of the Witwatersrand
www.bioinf.wits.ac.za/courses/linux
February 2015
Contents
1 Introduction
5 Process Control
1 Introduction
Material can be found at www.bioinf.wits.ac.za/courses/linux
Operating system function
Acts as a software layer to provide access to the underlying hardware
• Higher-level of abstraction
• Sharing of resources
• Memory
• Files
• Processing power
• Communication
• Protection
Examples
Unix-like
• Unix, AIX, Solaris
• FreeBSD
• MacOS X
Windows
• XP
• Windows 7,8
Lots of others
Historical
• IBM 370; OS/2
Mobile
• OSE Symbian
Safety-critical systems:
• Integrity
• OSEK
• Good quality
• Free
Distributions
Many distributions (flavours) of Linux
Examples
Ubuntu, RedHat, Fedora, Suse, Debian, Scientific Linux, Centos, Mandrake
Can download from www.mirror.ac.za
• Dual boot
• Multi-boot
• Virtualisation
– VirtualBox
– Parallels/VMWare
Many different CLI shells – allow interaction with the OS.
• bash/sh
• csh/tcsh
• many others
On the whole very similar. Usually a program like Terminal or xterm provides access.
• Many shell languages – can be used as a programming language
GUI
• Easier to learn
• More intuitive
• Easier mental models
• Memory cues
• Quicker for many things
CLI
• you may not have an option
• More powerful, control
• Great for repetitive tasks
Example 1: run water
CLI Equivalent
Example 2
Move file Documents/january.txt to Data/feb.txt
• mv Documents/january.txt Data/feb.txt
Example 3
Suppose you have directories /opt/data/exp/YY/text/local/control
Copy the files xxxx-YY-mar-ddd.eXXX from all these directories into a directory
/tmp/exp_data/march
• GUI?
• CLI:
cp /opt/data/exp/*/text/local/control/*-mar*.e* \
/tmp/exp_data/march
Example 4
You have new data in a file myseq.fa and a directory db containing 1875 files.
• Run the water program 1875 times to compare your myseq.fa file against each of the files in db in
turn.
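One way to approach Example 4 is a shell loop. The sketch below is a dry run under stated assumptions: it builds a small stand-in db directory (three empty files rather than 1875) and only prints the water commands instead of running them. The option names are assumed to be those of the EMBOSS water program and the .water output names are hypothetical, so check the program's own help before relying on them.

```shell
# Stand-in data so the sketch is self-contained (3 files, not 1875).
work=$(mktemp -d)
mkdir "$work/db"
touch "$work/myseq.fa" "$work/db/a.fa" "$work/db/b.fa" "$work/db/c.fa"

# Dry run: print one water command per file in db.
# Remove the leading "echo" to actually run the program.
# Option names and output file names are assumptions.
for f in "$work"/db/*; do
    echo water -asequence "$work/myseq.fa" -bsequence "$f" \
        -gapopen 10 -gapextend 0.5 -outfile "$f.water"
done > "$work/commands.txt"

# One command per database file.
ncommands=$(wc -l < "$work/commands.txt" | tr -d ' ')
echo "$ncommands commands generated"
```

The same loop handles 3 files or 1875 files without change, which is the point of the example.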
The command options and ordering of command parameters may differ slightly from version to version
of Unix so it is best to use the on-line help provided by Unix to determine the exact available options and
parameter order.
A small note on syntax convention. In Unix, the fullstop character does not have any significant
purpose within file names and may appear several times in a single file name (and need not occur at all).
However,
• To make life easier for ourselves we adopt conventions for naming files – so .py is used for Python
programs, .c is used for C programs, .pdf is used for PDF files, .tex is used for LaTeX files, and so
on. Binary executables often do not have a suffix.
• Many programs use these conventions to guess the contents of files. So a program like Firefox may
use the suffix to guess which external program to use to display a file. The LaTeX program expects
its main input files to have .tex suffixes. But these are conventions that are not enforced by the
operating system. Using sensible conventions is as much for your benefit as for the computer’s.
• File names beginning with a fullstop have significance to Unix because they usually contain environment
and configuration information useful to both the system and user. Normally files that start with
a “.” are not shown when you list the directory, and sometimes not in the GUI file browser.
Command options are preceded by minus signs to distinguish them from command parameters. For
example, wc data.txt says count the number of lines, words and characters in the file data.txt. But if we
include the option -l then we only count the number of lines: wc -l data.txt. If a file name is required
as a parameter to a command and is not provided, the command will usually default to the standard
input and output, i.e. the terminal.
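A minimal sketch of the wc example, assuming a small stand-in data.txt created in a temporary directory (the file contents are made up for illustration):

```shell
work=$(mktemp -d)
printf 'one two\nthree\nfour five six\n' > "$work/data.txt"

wc "$work/data.txt"       # line, word and byte counts, then the file name
wc -l "$work/data.txt"    # the option -l restricts the output to the line count

# No file name given: wc reads the standard input instead.
lines=$(wc -l < "$work/data.txt" | tr -d ' ')
echo "$lines"
```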
We’ll look at which commands you can enter and how you interact with the shell in this section.
Getting started
Normally you log-in
• Enter commands
• Case-sensitive
Changing password
The standard Un*x command for changing passwords is passwd, though there are variants for
networked systems.
Command history
To see a list of the most recent commands issued to the shell:
• history
• !number
Command completion
If you press the tab key, the shell tries to complete as much of the command or file name as possible.
• If you type ls /usr/l followed by TAB, the shell is not able to complete further as there are several
options. But if you press TAB twice all the options are shown.
Command editing
• The left and right cursor keys, and the delete key allow you to edit the current command;
• The down and up cursor keys allow you to move forward and backward in history.
On-line help
• apropos
e.g., apropos music
• man
e.g., man passwd
• info
e.g., info ls
In the GNU/Linux file system tree each user has a unique location to work called their HOME directory.
Users are automatically placed in their home directory when they login.
[Figure 1: Linux directory tree, with the labels ROOT = /, home/, usr/, httpd/, students/,
your-home-directory/, Work/ and Play/]
Many commands by default assume you mean the current working directory.
The system administrator can set up their system so that home directories are placed wherever is
convenient for the organisation
• On MacOS X, /Users/bob
• But often variations – safest is to refer to ~bob or the environment variable $HOME
• ls -l : (small L) lists files in the current directory with a number of details for each file;
There are many other options for ls – you can use the man page to find out.
Paths
Each file has a path – where the file is in the file system.
• JOKES
No path given – current working directory is implied
• funny/JOKES
The file JOKES in the directory funny that is in the current directory
• funny/very/JOKES
and so on...
Special paths
• ~
Tilde: home directory of current user
• ~scott
Home directory of user scott
• ..
Parent directory (relative)
• .
Current directory (relative)
• The name
• other stuff
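The path forms above can be tried out safely in a scratch directory. This is a small sketch; the funny/very/JOKES layout is recreated under a temporary directory purely for illustration.

```shell
work=$(mktemp -d)
mkdir -p "$work/funny/very"
touch "$work/funny/very/JOKES"

cd "$work/funny/very"
ls JOKES           # no path given: the current working directory is implied
ls ./JOKES         # . is the current directory
cd ..              # .. is the parent directory, so we are now in funny
ls very/JOKES      # a relative path from the new working directory
here=$(basename "$(pwd)")
echo "$here"
echo ~             # the home directory of the current user
```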
4.4 Manipulating files
Examining files
1. cat
2. more
4. head -n 25 fname
5. tail
You can use cat filename to see the contents of an entire text file on the standard output (terminal).
On the other hand, more filename allows you to view the entire text file on the standard output one
screen at a time (at the more prompt, a space character will scroll a full screen down while a carriage
return character will scroll one line down).
• cp source dest
• cp a.txt b.txt
• cp a.txt /data/dir
• cp -r data backup
The destination can either be a file (with a path name) or a directory. If it is a directory, a copy of the file is
made and put in the destination, giving the copy the original name. If the destination is a file then a copy
is made of the source and the copy is given the name of the destination.
Note that by default cp does not copy directories. You must use the -r option (or another recursive
option). Like many commands, cp has many options. Running info coreutils 'cp invocation' will
show you these.
mv source dest
Deleting files
rm
• rm data.dat
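The following sketch exercises cp, mv and rm in a temporary directory. All file and directory names are stand-ins chosen for the illustration.

```shell
work=$(mktemp -d)
cd "$work"
mkdir data backup_dir
echo "hello" > a.txt
echo "x" > data/one.dat

cp a.txt b.txt          # destination is a file: the copy is named b.txt
cp a.txt backup_dir     # destination is a directory: the copy keeps the name a.txt
cp -r data backup       # -r copies the directory data and its contents
mv b.txt c.txt          # move (rename) b.txt to c.txt
rm a.txt                # delete a.txt
ls -R                   # list what is left, recursively
```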
4.6 Manipulating directories
Creating, deleting subdirectories
mkdir
• mkdir newdir
• mkdir /usr/local/other
• mkdir newdir/subdir
• mkdir -p newdir/subdir/subsub/other
To delete a directory and all its contents, you must first delete the files in the directory using rm and then
use rmdir. An alternative approach is to use rm -r for recursive delete. This is a very powerful and very
dangerous option. There is hardly a system administrator in the world
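A short sketch of creating and deleting directories, run inside a temporary directory so that rm -r cannot touch anything important. The directory names follow the mkdir examples above; file.txt is a stand-in.

```shell
work=$(mktemp -d)
cd "$work"
mkdir newdir
mkdir -p newdir/subdir/subsub/other    # -p creates any missing parent directories
touch newdir/subdir/file.txt

rmdir newdir/subdir/subsub/other       # works: the directory is empty
rmdir newdir/subdir 2>/dev/null || echo "rmdir refuses: directory not empty"
rm -r newdir                           # removes newdir and everything below it
```

Before running rm -r on real data it is worth checking the target with ls first.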
Changing directory
To change the current working directory, do: cd newdirpath
• cd ../gnumeric
• head -n -3 snps.txt shows everything except the last 3 lines (this doesn’t work on standard MacOS
X);
• tail -f snps.txt shows the last 10 lines of the file and then waits for further input. This is useful if
you have a program that is writing to a file, and in another shell you want to monitor the output.
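A quick sketch of head and tail, assuming a stand-in snps.txt that simply contains the numbers 1 to 30, one per line. tail -f is left out because it waits for input until interrupted.

```shell
work=$(mktemp -d)
seq 1 30 > "$work/snps.txt"        # stand-in file with 30 numbered lines

head -n 25 "$work/snps.txt"        # the first 25 lines
tail -n 3 "$work/snps.txt"         # the last 3 lines

# Combining the two picks out a single line: here line 25.
line25=$(head -n 25 "$work/snps.txt" | tail -n 1)
last=$(tail -n 1 "$work/snps.txt")
echo "$line25 $last"
```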
Extracting columns and rows
Being able to extract the interesting parts of a file is important.
• cut: extract columns from a file. There are two basic modes (you can read about others in the man
page). Columns can be extracted based upon horizontal position (numbering columns by byte,
using the -b option, or by character, using the -c option). Or, columns can be extracted by field,
where the different columns are assumed to be separated by a field delimiter (by default a tab).
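The two modes of cut can be sketched as follows. The file snps.txt and its three tab-separated columns are made-up stand-in data.

```shell
work=$(mktemp -d)
printf 'rs1\t1\t100\nrs2\t1\t250\nrs3\t2\t75\n' > "$work/snps.txt"

cut -c 1-3 "$work/snps.txt"     # by position: characters 1 to 3 of every line
cut -f 1,3 "$work/snps.txt"     # by field: fields 1 and 3, tab-delimited by default

# A different delimiter is given with -d.
second=$(printf 'a,b,c\n' | cut -d ',' -f 2)
third=$(cut -f 3 "$work/snps.txt" | head -n 1)
echo "$second $third"
```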
The examples above allow us to extract and manipulate files with data. We can do the same thing very
simply using a program such as Excel, and with small files this is probably easier because the GUI allows more
intuitive interaction. But consider a realistic example where there are 1000 rows and 10000 columns.
Such a large file would be very slow and clumsy to manipulate through Excel or a similar program.
The grep command can be used to extract rows based upon what they contain.
grep – extracting lines that match
• Show context
grep -C 2 rs837812 *map
• Which file
grep -H NA317813 *fam
• grep -f patterns.txt *
Use the lines in the file patterns.txt as the things to grep for.
Instead of just searching for plain text, grep also allows you to search for regular expressions, but this is an
advanced topic we are not covering now.
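The grep options listed above can be tried on small stand-in files. The .map files and their contents below are invented for the illustration; only the options come from the examples above.

```shell
work=$(mktemp -d)
cd "$work"
printf 'rs100 1\nrs837812 2\nrs200 3\n' > a.map
printf 'rs300 4\nrs400 5\n' > b.map
printf 'rs100\nrs400\n' > patterns.txt

grep -C 1 rs837812 *map       # the matching line plus 1 line of context each side
grep -H rs400 *map            # -H prefixes each match with the file name
grep -f patterns.txt *map     # every line of patterns.txt is used as a pattern

hits=$(grep -f patterns.txt *map | wc -l | tr -d ' ')
echo "$hits"
```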
There are other powerful tools that can be used too – awk and sed are very powerful tools that allow
extracting and manipulating files.
There are many useful ways of combining files, including paste and join.
4.8 Permissions
File access is determined by the file’s protection status. You may control access to your files and directories
by granting and denying access privileges to either the user (you), the group the user belongs to (the pg
group for example) or all other users. These privileges are read, write and execute.
Permissions
Privileges specified for
• user (owner)
• group
• other
Privileges are:
• read
• write
• execute
Part of a listing of files in a directory may look like this (using ls -l):
Here we have three files: theory.tex, Personal and a.out. In each case the owner is the user jayesh and the
group is pg.
The 10 characters on the left indicate file protection and are interpreted in the following manner. An r
indicates that the user can read the file and w means the user can change the file. The x stands for execute.
For ordinary files, an x indicates that this file stores a program that is ready to run and can be run. For
directories, x indicates permission to enter the directory.
1. character 1 – if there is a dash the file is an ordinary file; if it is a d, this is a directory; if it is an l this
is a link.
In the example above, the user jayesh can read and change the file theory.tex, while members of the pg group
and indeed all other users can just read it. The file Personal is actually a directory. jayesh can read and
write to this directory, and also enter the directory. No one else can access the directory. The file a.out can
be read, written and executed by jayesh. Members of the pg group can read and execute the file, while all
other users can just read it.
Changing permissions
chmod changes permissions on any files.
• chmod g+rwx file1 will give members of the group associated with the file the ability to read, write
and execute it.
• Or chmod o-rx file1 will remove read and execute privileges for other users.
• chmod ug+x file1 gives the owner (user) and group permission to execute.
• chmod a+rwx file1 will give all users permission to read, write and execute the file.
• chmod o=r file gives others the ability to read the file and removes any other permissions that others
may have.
• chmod g=rx,o=r file gives the group the ability to read and execute, others the ability to read the file, and
removes any other permissions they previously had
Directories:
• For directories, x means that the permission holder can enter the directory.
Numeric permission:
• 4: read permission. 2: write permission. 1: execute permission.
• other: execute
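As a sketch of how the symbolic and numeric forms relate, the example below reaches the same permissions both ways on a stand-in file1 in a temporary directory. The target (owner rwx, group r-x, others r--) is the same as described for a.out above.

```shell
work=$(mktemp -d)
f="$work/file1"
touch "$f"

# Symbolic form: start from a known state, then adjust step by step.
chmod u=rw,g=,o= "$f"     # rw- for the owner, nothing for group and others
chmod g+rx "$f"           # add read and execute for the group
chmod o=r "$f"            # others get exactly read
chmod u+x "$f"            # the owner may now execute too
symbolic=$(ls -l "$f" | cut -c 1-10)

# Numeric form: 7 = 4+2+1 (rwx), 5 = 4+1 (r-x), 4 = r--
chmod 600 "$f"            # reset first so the next line does real work
chmod 754 "$f"
numeric=$(ls -l "$f" | cut -c 1-10)

echo "$symbolic $numeric"
```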
For each directory in your home directory set the permissions so that you can read, write and execute
but that no-one else has any permissions. The only exception is public_html which you should give read
and execute privileges to all users.
There are more complex permissions possible too in the standard permission model, together with
access control lists. But this is beyond our course.
5 Process Control
Many jobs can execute at the same time (same, different users)
• foreground
• background
• Each job has a job number (specific to that shell). Use % to refer to it.
Viewing jobs
Jobs in current shell
• ps shows PID
• ps -u scott
• ps -a
• Control-C kills
• Control-Z suspends
• fg puts it in the foreground
• bg puts it in the background
kill pid
kill 3617
kill -9 3617
Running a job
To run a job in the foreground give the command:
emacs
python hello.py
xcal
xeyes
To run a job in the background, append & to the command:
xeyes &
emacs newprog.py &
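The sketch below shows the background and kill commands working together in a script. It uses sleep as a harmless stand-in for a long-running program, and uses the process ID rather than a % job number because job numbers are a feature of interactive shells.

```shell
sleep 30 &                   # & runs the command in the background
pid=$!                       # $! holds the PID of the most recent background job

# kill -0 sends no signal; it only checks that the process exists.
kill -0 "$pid" && echo "job $pid is running"

kill "$pid"                  # ask the process to terminate
status=0
wait "$pid" || status=$?     # collect the exit status of the terminated job
echo "job $pid ended with status $status"
```

If a process ignores the plain kill, kill -9 forces it to stop, at the cost of giving it no chance to clean up.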
Monitoring Load
• top
The top command shows the processes that are using most of the CPU. There’s a whole lot of other
info you can get too.
• The w command shows you who’s logged on, as well as current load.
top also shows you memory usage. This can often be critical for system performance. The most
important columns in this respect are %MEM and RES.
• /proc/cpuinfo
• /proc/meminfo
screen -S name
Listing terminal sessions
The command is screen -ls
Note that each terminal has the name you gave it and a unique number.
Reattaching terminal sessions
You can reattach a session to any terminal.
screen -r name
One slightly annoying feature of screen is that it uses C-a as the default escape key sequence (e.g., C-a d
detaches the current session). This is annoying because C-a is a commonly used key sequence in editing
commands (go to the beginning of the current line). You can change the behaviour by using the -e option.
For example,
screen -e^Mm -S update
will create a new terminal session called update, and instead of C-a being the escape sequence, C-m will be.
Gaining super-powers
Actions allowed determined by permissions
• Some actions only allowed for root user
• Some users can be authorised to act for root
sudo
ls /root
sudo ls /root
su bob
It is strongly recommended to use sudo rather than su. This limits inadvertent damage you can do to your
system.
The PATH environment variable tells the system where to look for executables. The directories it lists are
searched in order, and the first directory found to contain the named executable will be used.
Typically the current working directory is not on the PATH. That is why, when you want to execute a
script in the current working directory, you have to say ./myscript rather than just myscript. This
helps (a little) in preventing inadvertent execution of scripts.
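The effect of PATH can be seen with a small experiment. This is a sketch that creates a stand-in myscript in a temporary directory; it assumes no other program called myscript is installed on the system.

```shell
work=$(mktemp -d)
cd "$work"
printf '#!/bin/sh\necho "hello from myscript"\n' > myscript
chmod u+x myscript

./myscript                           # an explicit path always works
command -v myscript || echo "myscript is not on the PATH"

PATH="$work:$PATH"                   # put this directory first on the PATH
found=$(command -v myscript)         # now the shell can find it by name
myscript
echo "$found"
```

The change to PATH lasts only for the current shell; start-up scripts are the usual place to make it permanent.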
• PATH: binaries
• PYTHONPATH
• R_LIBS
• PERL5LIB
• HOME
• HOSTNAME / HOST
• USER
Initialisation
Environment variables (and other initialisation) can be set by start up scripts
• /etc/profile
Redirection
command >filename
command < filename
The symbol > means “put the output in the following file, rather than to the terminal” and the symbol
< means “get the input to this command from the following file, rather than from the terminal”.
>> used instead of > appends output to the file rather than overwriting contents of the file.
ls > /tmp/wdirfiles
wc -l < /tmp/wdirfiles
ls | wc -l
It is common practice to put the output of one program into the input of another via a temporary file as in
the example above. This can be achieved by first using a command with an output redirection followed by
one with an input redirection. However, by doing so we incur a storage overhead cost for a temporary file
to store the input to the second program. Furthermore, it would be more efficient to run both programs
in parallel so that a continual output from the first program can be fed into the second program. This
observation leads to one of the fundamental contributions of the Unix system, namely the pipe.
A pipe (denoted by a vertical bar, i.e. |) is a way to connect the output of one program to the input
of another without any temporary file; a pipeline is a connection of two or more programs through pipes.
All programs in the pipeline execute in parallel to achieve good performance. Only data dependencies
restrict the flow of data between these programs. Any program that reads from the terminal can read
from a pipe instead and any program that writes on the terminal can write to a pipe. This is where the
convention of reading the standard input when no files are named pays off: any program that adheres to
the convention can be used in pipelines. grep and sort are two examples often used in pipelines.
For example, the command: who | grep mary | wc -l counts the number of times user mary is
logged on.
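The two approaches, a temporary file and a pipe, can be compared directly. The sketch below counts the files in a scratch directory both ways; the three file names are stand-ins.

```shell
work=$(mktemp -d)
cd "$work"
touch a.txt b.txt c.txt
listing=$(mktemp)                    # temporary file kept outside the directory

# Via a temporary file: output redirection, then input redirection.
ls > "$listing"
via_file=$(wc -l < "$listing" | tr -d ' ')

# Via a pipe: no temporary file is needed.
via_pipe=$(ls | wc -l | tr -d ' ')
echo "$via_file $via_pipe"

# >> appends to the file instead of overwriting it.
echo "one more line" >> "$listing"
appended=$(wc -l < "$listing" | tr -d ' ')
```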
Example
List in alphabetic order the capitals of the countries which use the West Africa CFA franc.
Get the data file and inspect it
Now extract column data and sort
7.3 xargs
Converts standard input to arguments.
Suppose we have a file that contains status of various files
january.dat GOOD
feb.dat BAD
march.dat BAD
april.dat GOOD
may.dat BAD
...
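One possible use of xargs with such a file is sketched below. It rests on assumptions: the status file is called status.txt (a hypothetical name), the two columns are separated by a single space, and the goal is to delete the files marked BAD. Only the five entries shown above are used.

```shell
work=$(mktemp -d)
cd "$work"
printf 'january.dat GOOD\nfeb.dat BAD\nmarch.dat BAD\napril.dat GOOD\nmay.dat BAD\n' > status.txt
touch january.dat feb.dat march.dat april.dat may.dat

# grep picks the BAD lines, cut keeps the file name,
# and xargs turns those names into arguments for rm.
grep BAD status.txt | cut -d ' ' -f 1 | xargs rm

ls *.dat                             # only the GOOD files remain
```

Replacing rm with echo rm is a cheap way to preview what xargs would run.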
Programming in bash
Can type commands in shell, or save in file and run
• e.g., doexps.sh
• hash-bang
• make executable
• run like so ./doexps.sh
Normally, an executable program is expected to be machine code that can run directly on the machine. If
the program is a script in bash or some other language, you need to tell the system. One way of doing it
is to explicitly tell it:
• bash doexps.sh
But it is useful both to save typing and more importantly to help users of your scripts to have a way of
telling the system how to interpret your file. Recall that the name of the file does not have this information.
The hash-bang is a special sequence on the first line that specifies how the script should be executed. For
example, a bash script would have
• #! /bin/bash
and if it were a Python program
• #! /usr/bin/python
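The steps above (write the script, add the hash-bang, make it executable, run it) can be sketched as follows. The body of doexps.sh is a one-line stand-in, and the sketch assumes bash is installed at /bin/bash.

```shell
work=$(mktemp -d)
cd "$work"

# Write a tiny script whose first line is the hash-bang.
cat > doexps.sh <<'EOF'
#! /bin/bash
echo "running experiments"
EOF

out1=$(bash doexps.sh)       # explicit: tell the system to use bash
chmod u+x doexps.sh          # make the script executable
out2=$(./doexps.sh)          # the hash-bang now selects the interpreter
echo "$out1 / $out2"
```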
Example script
#! /bin/bash
N=10
BASE="gwas14"
BED=${BASE}.bed
BIM=${BASE}.bim
FAM=${BASE}.fam
plink --bfile sample --bmerge $BED $BIM $FAM --make-bed --out xxx