Basic Linux Introduction
Curitiba - 2019
Main objectives
1. Connection
To open a terminal on a remote machine we use the ssh protocol.
If you are already on a Linux machine, open a terminal and run the ssh command with your user
name and the address of the server.
Example:
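A minimal sketch of such a connection. The user name student and the host server.example.org are placeholders assumed for illustration, not the real course account or server; use the values given to you for the course.

```shell
# Placeholder values (assumptions): replace them with your own user and server.
REMOTE_USER="student"
REMOTE_HOST="server.example.org"

# Build the command and print it so it can be checked before use.
SSH_COMMAND="ssh ${REMOTE_USER}@${REMOTE_HOST}"
echo "$SSH_COMMAND"

# To actually connect, type the printed command at your prompt, e.g.:
# ssh student@server.example.org
```

After connecting, ssh normally asks for your password and then shows the prompt of the remote machine.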
In Windows you should use an ssh application, such as SSHclient or PuTTY, or the Ubuntu
version available for Windows 10.
2. Files, File System, Moving across the directory tree (using the TERMINAL)
In Unix, files are organized in directories (equivalent to Windows folders). This organization,
called the "file system", has a tree structure.
In Linux, every file has permissions, an owner and a name. We access this information with the
command ls (list) and its -l option.
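As an illustration of ls -l, the sketch below creates a small file in a temporary directory and lists it in long format. The file name hello.txt and the temporary directory are only stand-ins chosen for the example.

```shell
# Work in a throw-away directory so nothing in your home is touched.
WORKDIR="$(mktemp -d)"
cd "$WORKDIR"

# Create a small file and list it in long format.
echo "hello" > hello.txt
ls -l hello.txt
# The output line shows, among other fields:
#   permissions (e.g. -rw-r--r--), owner, group, size, date and name

# Keep a copy of the line in a variable, in case we want to inspect it later.
LONG_LISTING="$(ls -l hello.txt)"
```

The exact permissions, owner and date you see depend on your account and system settings.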
File NAMES
There are two ways to refer to a file by name: the short (or relative) form and the "absolute"
form. The first is simply the name of the file. The second gives, in addition to the name, its
location.
For example, the file named hello.txt located in the rnaseq directory, which in turn is inside the
/home directory, has the absolute name:
/home/rnaseq/hello.txt
Note that there may be several files named hello.txt in different directories. For example:
/home/falvarez/hello.txt
These are two different files, although both have the same short name. We can only
refer to a file by its short name when we are in the same directory where the file is located. Also
keep in mind that the names (and the commands) are "case sensitive", that is to say, uppercase
is different from lowercase (Hello.Txt is NOT equal to hello.txt).
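The difference between short and absolute names can be tried out safely in a temporary directory. This is only a sketch: the directories rnaseq and falvarez are recreated under a throw-away location instead of /home, and the file contents are made up.

```shell
WORKDIR="$(mktemp -d)"
mkdir "$WORKDIR/rnaseq" "$WORKDIR/falvarez"
echo "from rnaseq"   > "$WORKDIR/rnaseq/hello.txt"
echo "from falvarez" > "$WORKDIR/falvarez/hello.txt"

# Same short name, different full names, different contents.
cat "$WORKDIR/rnaseq/hello.txt"
cat "$WORKDIR/falvarez/hello.txt"

# The short name only works from inside the directory that holds the file.
cd "$WORKDIR/rnaseq"
cat hello.txt

# Names are case sensitive: on a typical Linux file system this creates
# a second file next to hello.txt instead of replacing it.
echo "capitalised" > Hello.Txt
ls
```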
The permissions are read (r), write (w) and execute (x). Among regular files, only programs and
scripts should have execute permission (for a directory, x means permission to enter it).
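Permissions are changed with the chmod command, which is not introduced above. The sketch below, using a hypothetical script called greet.sh, shows how execute permission is added and what it allows.

```shell
WORKDIR="$(mktemp -d)"
cd "$WORKDIR"

# A tiny script (hypothetical name greet.sh).
printf '#!/bin/sh\necho "hello from a script"\n' > greet.sh

ls -l greet.sh        # typically -rw-r--r-- : no x yet
chmod u+x greet.sh    # give the owner (u) execute permission (x)
ls -l greet.sh        # typically -rwxr--r--

# Now the script can be run by giving its name with a path.
SCRIPT_OUTPUT="$(./greet.sh)"
echo "$SCRIPT_OUTPUT"
```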
The dot ".", double dot ".." and the forward slash "/"
(.) The dot means the current directory.
(..) The double dot means the parent of the current directory (the directory directly above the
one we are in).
For example, if we are in /media2/course, the "double dot" refers to /media2.
(/) The slash is used to separate directories in an absolute name. On its own, it indicates the
root directory.
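These three symbols can be tested with cd and pwd. The sketch below rebuilds the media2/course example inside a temporary directory, so the paths printed on your machine will start with that temporary location rather than with /media2.

```shell
WORKDIR="$(mktemp -d)"
mkdir -p "$WORKDIR/media2/course"
cd "$WORKDIR/media2/course"

pwd            # .../media2/course
cd ..          # ".." is the parent directory
pwd            # .../media2
PARENT_NAME="$(basename "$(pwd)")"

cd .           # "." is the current directory, so nothing changes
ls ./course    # same as: ls course (the directory is empty, so nothing is printed)

cd /           # "/" alone is the root directory
ROOT_DIR="$(pwd)"
pwd            # /
```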
4. First steps
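To copy the file hello.txt into the current directory using its absolute name, type:
cp /home/rnaseq/hello.txt ./
or, using a name relative to the current directory (this works when the current directory sits
next to rnaseq, for example your own directory inside /home):
cp ../rnaseq/hello.txt ./
To list the contents of the current directory, type:
ls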
To list all files in your home directory including those whose names begin with a dot, type:
ls -a
As you can see, ls -a lists files that are normally hidden.
To create a directory called rawdata, type:
mkdir rawdata
The command cd directory means change the current working directory to ’directory’. The
current working directory may be thought of as the directory you are in, i.e. your current position
in the file-system tree. To change to the directory you have just made, type:
cd rawdata
Typing
cd ..
will take you one directory up the hierarchy (back to your home directory).
To print the absolute name of the current working directory, type:
pwd
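The commands of this section can be practised together. The following sketch uses a temporary directory as a stand-in for your home directory; the hidden file .hidden_settings is a hypothetical name added only so that ls -a has something extra to show.

```shell
# Use a throw-away directory as a stand-in for your home directory.
FAKE_HOME="$(mktemp -d)"
cd "$FAKE_HOME"

touch .hidden_settings   # hidden file: its name starts with a dot
mkdir rawdata            # create a new directory

ls                       # shows: rawdata
ls -a                    # also shows: .  ..  .hidden_settings

cd rawdata               # move into the new directory
INSIDE="$(basename "$(pwd)")"
pwd                      # ends in /rawdata

cd ..                    # back up one level
pwd                      # the stand-in home directory again
```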
Go to this address and search for Saccharomyces cerevisiae. Then, in the genome results, click
on the Genome link. The genome web page will appear, from which you can download the files
to your own computer. However, since we want to download the sequences directly to the
server, we need the link and the command wget. So, copy the link (with the right mouse button)
and paste it at the prompt after the command:
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.fna.gz
Download the genome file and the gff file using wget and the link to the file.
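A possible way to prepare both downloads is sketched below. It only prints the wget commands, so it can be run without network access. The genome link is the one given above; the name of the gff file (same prefix, ending in .gff.gz) is an assumption based on the usual layout of that directory, so check it against the link you copied from the web page.

```shell
# Directory and file prefix taken from the genome link above.
BASE_URL="ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64"
PREFIX="GCF_000146045.2_R64_genomic"

# Assumption: the annotation sits next to the genome as <prefix>.gff.gz.
for SUFFIX in fna.gz gff.gz; do
    URL="${BASE_URL}/${PREFIX}.${SUFFIX}"
    echo "wget ${URL}"    # print the command so it can be checked
    # wget "$URL"         # uncomment to download (needs network access)
done
```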
Exploring data
1. Uncompress the file using gunzip
gunzip GCF_000146045.2_R64_genomic.fna.gz
ls
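To see what gunzip does without touching the real download, the sketch below compresses and uncompresses a tiny stand-in file. The file name example.fna and its content are made up for the illustration.

```shell
WORKDIR="$(mktemp -d)"
cd "$WORKDIR"

# Stand-in for the downloaded file (hypothetical name and content).
printf '>seq1 example description\nACGT\n' > example.fna
gzip example.fna          # produces example.fna.gz and removes example.fna
ls                        # example.fna.gz

gunzip example.fna.gz     # restores example.fna and removes the .gz file
ls                        # example.fna
head -n 2 example.fna     # look at the first two lines of the file
```

Note that gunzip replaces the compressed file with the uncompressed one, which is why ls shows a different name afterwards.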
6. Keep only the ID of the chromosomes in the FASTA genome file. The header line of each
FASTA record includes the ID and the description.
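One possible approach, not necessarily the one intended in the course, uses awk. It assumes that on each header line the ID is the first word after ">" and is separated from the description by a space. The file genome.fna and its two records are invented stand-ins for the real genome file.

```shell
WORKDIR="$(mktemp -d)"
cd "$WORKDIR"

# Tiny stand-in FASTA file: headers have the form ">ID description".
cat > genome.fna <<'EOF'
>chrA first example chromosome
ACGTACGT
>chrB second example chromosome
GGCCTTAA
EOF

# On header lines (starting with ">") print only the first field, i.e. the ID.
# All other lines (the sequence) are printed unchanged.
awk '/^>/ { print $1; next } { print }' genome.fna > genome.ids.fna

cat genome.ids.fna
```

The result is written to a new file, so the original genome file is left untouched.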
A command takes its input from the standard input device, called stdin. By default this is the
keyboard, but we can change it to be a file. The standard output device, called stdout, is where
the output of the command is sent. By default stdout is the screen, but it can also be redirected
to a file with the > operator.
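A short sketch of redirection, using a hypothetical file notes.txt. Besides the > operator described above, it also uses >> (append to a file) and < (read the input from a file), which are standard shell operators not covered in the text.

```shell
WORKDIR="$(mktemp -d)"
cd "$WORKDIR"

echo "first line"  > notes.txt    # ">" sends stdout to a file (overwriting it)
echo "second line" >> notes.txt   # ">>" appends instead of overwriting

wc -l < notes.txt                 # "<" makes the file the standard input
LINE_COUNT="$(wc -l < notes.txt)"
```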
The pipe symbol "|" allows us to chain commands, so that the output of the first command
becomes the input of the second. As an example, we can count the lines of a file using a pipe
and redirect this information to a new file:
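A minimal sketch of this, assuming a hypothetical three-line file sample.txt and an output file line_count.txt (both names are made up):

```shell
WORKDIR="$(mktemp -d)"
cd "$WORKDIR"

printf 'line one\nline two\nline three\n' > sample.txt

# cat writes the file to stdout, the pipe feeds it to wc -l (count lines),
# and ">" stores the resulting number in a new file.
cat sample.txt | wc -l > line_count.txt

cat line_count.txt     # the number of lines in sample.txt
```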
7. Exercise
Determine how many RNA-seq reads belong to each gene of S. cerevisiae.