This document contains information about programming in R, including practical examples. It discusses accessing and subsetting data, using regular expressions for text search, creating functions, and using loops. Examples are provided to demonstrate creating vectors, accessing subsets of vectors, using regular expressions to find patterns in text, creating functions to convert between units or estimate values, and using for loops to repeat operations over multiple elements. The document suggests R is useful for working with big data in biology and other fields due to its ability to automate tasks, integrate with other tools, and handle large datasets through programming.
19. Animal biomass (Brazilian rainforest), from Fittkau & Klinge 1973
[Figure: animal biomass by group. Labels: ants & termites; soil fauna excluding earthworms, ants & termites; earthworms; spiders; other insects; amphibians; reptiles; birds; mammals]
20. We use modern technologies to
understand insect societies.
• evolution of social behaviour
• molecules involved in social behaviour
• consequences of environmental change
31. Practicals
• Aim: get relevant data handling skills
• Doing things by hand:
• impossible?
• slow,
• error-prone,
• Automate!
• Basic programming
• in R
• no stats!
33. Practicals: contents
• Done:
• data accessing/subsetting
• New:
• search/replace
• regular expressions: text search on steroids
• New:
• functions: reusable pieces of work
• loops: repeating the same thing many times
• Friday: Introduction to Unix & high-performance computing
35. • create a variable that contains the number 35
• create a variable that contains the string “I love tofu”
• give me a vector containing the sequence of numbers
from 5 to 11
• access the second number
• replace the second number with 42
• add 5 to the second number
• now add 5 to all numbers
• now add an extra number: 1999
• can you sum all the numbers?
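One possible set of answers to the warm-up above, as a sketch. The variable names (my_number, my_string, numbers, second, total) are arbitrary choices and do not come from the slides.

```r
# Warm-up sketch: scalars, a vector, and element-wise updates.
my_number <- 35                 # a variable containing the number 35
my_string <- "I love tofu"      # a variable containing a string

numbers <- 5:11                 # the sequence 5, 6, ..., 11
second <- numbers[2]            # access the second number

numbers[2] <- 42                # replace the second number with 42
numbers[2] <- numbers[2] + 5    # add 5 to the second number only
numbers <- numbers + 5          # add 5 to every number at once
numbers <- c(numbers, 1999)     # append an extra number

total <- sum(numbers)           # sum all the numbers
print(total)
```

Note that `numbers + 5` works on the whole vector at once, so no loop is needed for that step.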
42. Regular expressions (regex):
Text search on steroids.
Regular expression                        Finds
David                                     David
Dav(e|(id))                               David, Dave
Dav(e|(id)|(ide)|o)                       David, Dave, Davide, Davo
At{1,2}enborough                          Attenborough, Atenborough
Atte[nm]borough                           Attenborough, Attemborough
At{1,2}[ei][nm]bo{0,1}ro((ugh)|w){0,1}    Atimbro, Attenbrough, Atenborow
Easy counting, replacing all with “Sir David Attenborough”
Yes: "HATSOMIKTIP"
Yes: "HAVSONYYIKTIP"
Not: "HAVSQMIKTIP"
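A minimal sketch of the "easy counting, replacing" idea using grepl() and sub(). The vector names_seen is a made-up set of spellings for illustration. Matching is case-sensitive by default, and [ei] stands for exactly one character, so the last, most permissive pattern from the table is tested against spellings that respect both points.

```r
# Hypothetical spellings; the last two should NOT match as written.
names_seen <- c("Attenborough", "Atenborough", "Attemborough",
                "Atimbro", "Attenbrough", "Atenborow",
                "attenbrough", "Darwin")

pattern <- "At{1,2}[ei][nm]bo{0,1}ro((ugh)|w){0,1}"

# Count how many entries contain a match.
hits <- grepl(pattern, names_seen)
n_hits <- sum(hits)

# ignore.case = TRUE also picks up the lowercase spelling.
n_hits_any_case <- sum(grepl(pattern, names_seen, ignore.case = TRUE))

# Replace each matched spelling with the canonical name.
fixed <- sub(pattern, "Sir David Attenborough", names_seen)
print(fixed)
```

Entries without a match, such as "Darwin", pass through sub() unchanged.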
43. Regex special symbols
Regular expression   Finds                           Example
[aeiou]              any single vowel                “e”
[aeiou]*             between 0 and infinity vowels   “eeooouuu”
[aeoiu]{1,3}         between 1 and 3 vowels          “oui”
a|i                  one of the 2 characters         “a”
((win)|(fail))       one of the two words in ()      “fail”
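The symbols in this table can be tried out with grepl(). The sketch below anchors each pattern with ^ and $ so that the whole string has to fit; the vector words is invented for illustration.

```r
# Hypothetical test strings.
words <- c("e", "eeooouuu", "oui", "win", "fail", "xyz")

# Exactly one vowel and nothing else.
one_vowel <- grepl("^[aeiou]$", words)

# One to three vowels and nothing else.
short_vowels <- grepl("^[aeiou]{1,3}$", words)

# Either of the two words in parentheses.
win_or_fail <- grepl("^((win)|(fail))$", words)

# Without anchors, * can match zero vowels, so even "xyz" matches.
star_matches_empty <- grepl("[aeiou]*", "xyz")

print(words[short_vowels])
```

The last line is a common surprise: a pattern made only of optional parts matches every string, because an empty match counts.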
44. More Regex Special symbols
• Google “Regular expression cheat sheet”
• ?regexp
Pattern      Synonymous with / meaning
[:digit:]    [0-9]
[A-z]        letters; safer written as [A-Za-z], because in ASCII order [A-z] also matches [ \ ] ^ _ `
\s           whitespace
.            any single character
.+           one to many of anything
b*           between 0 and infinity occurrences of the letter ‘b’
[^abc]       any character other than a, b or c
\(           a literal (
[:punct:]    any of these: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
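A short sketch of how these symbols are typed in R. Two details are easy to miss: classes such as [:digit:] go inside a bracket expression, giving [[:digit:]], and a backslash has to be doubled inside an R string, giving "\\s" and "\\(". The vector x is invented for illustration.

```r
# Hypothetical strings to search.
x <- c("ant 42", "bee", "wasp (queen)", "fly!")

has_digit <- grepl("[[:digit:]]", x)   # any digit
has_space <- grepl("\\s", x)           # any whitespace
has_paren <- grepl("\\(", x)           # a literal opening parenthesis
has_punct <- grepl("[[:punct:]]", x)   # any punctuation character

# [^abc] matches everything except a, b or c, so deleting those
# matches keeps only the letters a, b and c.
only_abc <- gsub("[^abc]", "", "abracadabra")

print(data.frame(x, has_digit, has_space, has_paren, has_punct))
```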
46. You want to scan a protein sequence database for a
particular binding site. Type a single regular expression that
will match the first two of the following peptide sequences,
but NOT the last one:
"HATSOMIKTIP"
"HAVSONYYIKTIP"
"HAVSQMIKTIP"
48. Variants of a microsatellite sequence are responsible for
differential expression of vasopressin receptor, and in turn for
differences in social behaviour in voles & others. Create a regular
expression that finds AGAGAGAGAGAGAGAG dinucleotide
microsatellite repeats with lengths of 5 to 500
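One possible answer, as a sketch. It assumes "length" counts AG repeat units rather than nucleotides. The test sequences are invented, and perl = TRUE is used as a precaution because the default regex engine may reject large repetition counts.

```r
# (AG){5,500}: the unit AG repeated between 5 and 500 times.
microsatellite <- "(AG){5,500}"

# Hypothetical sequences with 3, 8 and 300 AG repeats.
sequences <- c("TTAGAGAGCC",
               "TTAGAGAGAGAGAGAGAGCC",
               paste0("TT", strrep("AG", 300), "CC"))

found <- grepl(microsatellite, sequences, perl = TRUE)

# Extract the matched stretch and count its repeat units.
matched <- regmatches(sequences,
                      regexpr(microsatellite, sequences, perl = TRUE))
repeat_counts <- unname(nchar(matched) / 2)

print(found)
print(repeat_counts)
```

A run longer than 500 repeats would still be reported, because the pattern is not anchored and simply matches the first 500 units of it.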
49. Again
Make a regular expression
• matching “LMTSOMIKTIP” and “LMVSONYYIKTIP” but not
“LMVSQMIKTIP”
• matching all variants of “ok” (e.g., “O.K.”,“Okay”…)
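One possible pair of answers, as a sketch rather than the only solution. The list of "ok" spellings is an assumption about which variants matter; the non-matching examples ("oak", "joke") are invented checks.

```r
# Peptides: T or V at position 3, then "SO" (never "SQ"),
# M or N, and zero to two Y before the shared ending.
peptides <- c("LMTSOMIKTIP", "LMVSONYYIKTIP", "LMVSQMIKTIP")
binding_site <- "^LM[TV]SO[MN]Y{0,2}IKTIP$"
peptide_match <- grepl(binding_site, peptides)

# "ok" variants: either case, optional dots, optional "ay".
ok_pattern <- "^[Oo]\\.?[Kk]\\.?(ay)?$"
candidates <- c("ok", "OK", "O.K.", "Okay", "okay", "oak", "joke")
ok_match <- grepl(ok_pattern, candidates)

print(peptide_match)
print(candidates[ok_match])
```

The ^ and $ anchors force the whole string to fit. When scanning longer database sequences for the binding site, the anchors would be dropped.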
52. Which species names include ‘y’?
Create a vector with only species names, but replace all ‘y’
with ‘Y’.
ants <- read.table("https://goo.gl/3Ek1dL")
colnames(ants) <- c("genus", "species")
Remove all vowels
Replace all vowels with ‘o’
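A sketch of these search-and-replace tasks with grep() and gsub(). To avoid depending on the download, it uses a small stand-in vector: six species names as printed by head(ants) elsewhere in these slides, plus "polyctena", which is added purely for illustration and may not be in the real file.

```r
# Stand-in for ants$species (last entry is an illustrative addition).
species <- c("atratulus", "sp.", "scutellaris", "aquilonia",
             "cunicularia", "exsecta", "polyctena")

# Which species names include 'y'?
with_y <- grep("y", species, value = TRUE)

# Replace every 'y' with 'Y'.
shouting_y <- gsub("y", "Y", species)

# Remove all vowels, then replace all vowels with 'o'.
no_vowels <- gsub("[aeiou]", "", species)
all_o <- gsub("[aeiou]", "o", species)

print(with_y)
print(all_o)
```

With the real data, species would be ants$species. gsub() replaces every match in each string, whereas sub() replaces only the first.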
58. “for” loop
> possible_colours <- c('blue', 'cyan', 'sky-blue', 'navy blue',
+   'steel blue', 'royal blue', 'slate blue', 'light blue',
+   'dark blue', 'prussian blue', 'indigo', 'baby blue', 'electric blue')
> possible_colours
[1] "blue" "cyan" "sky-blue" "navy blue"
[5] "steel blue" "royal blue" "slate blue" "light blue"
[9] "dark blue" "prussian blue" "indigo" "baby blue"
[13] "electric blue"
> for (colour in possible_colours) {
+   print(paste("The sky is so, oh so", colour))
+ }
[1] "The sky is so, oh so blue"
[1] "The sky is so, oh so cyan"
[1] "The sky is so, oh so sky-blue"
[1] "The sky is so, oh so navy blue"
[1] "The sky is so, oh so steel blue"
[1] "The sky is so, oh so royal blue"
[1] "The sky is so, oh so slate blue"
[1] "The sky is so, oh so light blue"
[1] "The sky is so, oh so dark blue"
[1] "The sky is so, oh so prussian blue"
[1] "The sky is so, oh so indigo"
[1] "The sky is so, oh so baby blue"
[1] "The sky is so, oh so electric blue"
59. What does this loop do?
for (index in 10:1) {
  print(paste(index, "mins before lunch"))
}
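A sketch to check the answer: 10:1 is the descending sequence 10, 9, ..., 1, so the loop is a countdown that prints one message per value. Because paste() is vectorised, the same messages can be built without a loop; the names countdown and messages are arbitrary.

```r
# 10:1 counts down from 10 to 1.
countdown <- 10:1

# paste() works on the whole vector, giving one message per element.
messages <- paste(countdown, "mins before lunch")

print(messages)
```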
60. Again
• What does the following code do? (Decompose it on pen and paper.)
61.
for (letter in LETTERS) {
  begins_with <- paste("^", letter, sep="")
  matches <- grep(pattern = begins_with,
                  x = ants$genus)
  print(paste(length(matches), "begin with", letter))
}
> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
> ants <- read.table("https://goo.gl/3Ek1dL")
> colnames(ants) <- c("genus", "species")
> head(ants)
genus species
1 Anergates atratulus
2 Camponotus sp.
3 Crematogaster scutellaris
4 Formica aquilonia
5 Formica cunicularia
6 Formica exsecta
What does this loop do?
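A sketch for checking the pen-and-paper answer without downloading the file. It rebuilds only the six rows printed by head(ants), so the counts apply to that stand-in and not to the full dataset. Storing the counts in a named vector (counts, an arbitrary name) instead of printing them makes the result easy to inspect.

```r
# Stand-in for the downloaded table: the six rows shown by head(ants).
ants <- data.frame(
  genus = c("Anergates", "Camponotus", "Crematogaster",
            "Formica", "Formica", "Formica"),
  species = c("atratulus", "sp.", "scutellaris",
              "aquilonia", "cunicularia", "exsecta"),
  stringsAsFactors = FALSE
)

counts <- integer(0)
for (letter in LETTERS) {
  begins_with <- paste0("^", letter)           # e.g. "^A": starts with A
  matches <- grep(pattern = begins_with, x = ants$genus)
  counts[letter] <- length(matches)            # number of genera per letter
}

# Show only the letters that occur at least once.
print(counts[counts > 0])
```

For each capital letter, the loop counts how many genus names begin with it.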