This document provides an overview of phylogenetic analysis tools and techniques available in R. It discusses how to get sequence data from GenBank, align sequences, perform phylogenetic inference using various methods like neighbor joining and maximum likelihood, visualize and analyze trees, model trait evolution, reconstruct ancestral states, simulate trees, and access phylogenetic data from online repositories. Examples are given for many of the tasks using popular R packages like ape, phangorn, picante, and phytools.
3. The run down
- Get sequence data
- Align sequence data
- Phylogenetic inference: NJ, maxlik, parsimony, Bayesian, UPGMA
- Visualize phylogenies
- Traits on trees
    - Phylogenetic signal
    - Trait evolution
    - Ancestral state character reconstruction
- Tree simulations
- Get trees
- Phylogenetic community structure
- Bonus stuff: polytomy resolver
4. Basic trees in R
Example:
    require(ape)
    tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")
    tr1                 # print tree summary
    write.tree(tr1)     # print tree in Newick format
    # "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);"
    tr1$tip.label       # tip labels
    # "B" "C" "D" "A"
    tr1$edge.length     # edge lengths
    # 0.04 0.01 0.05 0.05 0.06 0.10
    tr1$node.label      # node labels
    # NULL (meaning: no node labels)

    # Assign properties to trees
    tr1$tip.label <- c("sleepy", "happy", "grumpy", "frumpy")   # label tips
    tr1$tip.label       # did it work?
    # "sleepy" "happy" "grumpy" "frumpy"
And so on for other tree properties.
5. Get sequence data
    # install and load ape
    install.packages("ape")
    require(ape)

    # make a vector of GenBank accession numbers for the ITS 1 and 2 regions
    # of Gossypium (cotton) species
    cotton_acc <- c("U56806", "U12712", "U56810", "U12732", "U12725", "U56786",
                    "U12715", "AF057758", "U56790", "U12716", "U12729", "U56798",
                    "U12727", "U12713", "U12719", "U56811", "U12728", "U12730",
                    "U12731", "U12722", "U56796", "U12714", "U56789", "U56797",
                    "U56801", "U56802", "U12718", "U12710", "U56804", "U12734",
                    "U56809", "U56812", "AF057753", "U12711", "U12717", "U12723",
                    "U12726")

    # get data from GenBank
    cotton <- read.GenBank(cotton_acc, species.names = TRUE)

    # keep a lookup table of species names and accession numbers, then
    # name the sequences with species names instead of accession numbers
    names_accs <- data.frame(species = attr(cotton, "species"), accs = names(cotton))
    names(cotton) <- attr(cotton, "species")
6. Align sequence data
Run an external aligner: clustal, mafft
    # multiple sequence alignment
    # Get ClustalW here, and install: http://www.clustal.org/
    # set your working directory to the ClustalW install location
    setwd("/path on your computer to/ClustalW2")
    # write fasta file to directory
    write.dna(cotton, "cotton.fas", format = "fasta")
    # run clustal multiple alignment, prints clustal output to console
    system('"./clustalw2" cotton.fas')   # should work on OSX or Windows
    # read the alignment back in to R
    cotton_clustalaligned <- read.dna("cotton.aln", format = "clustal")
Manual alignment may have to be done, dare I say it, outside of R.
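As an alternative to calling the executable by hand, ape also provides a wrapper function, clustal(), that writes the sequences out, runs ClustalW, and reads the alignment back in. The following is a minimal sketch, not a tested recipe for every install: it assumes the executable is named clustalw2 and is on your PATH (the name differs between installs), and it uses the woodmouse data set shipped with ape as a stand-in for the cotton sequences so that it runs without a GenBank download.

```r
library(ape)

# Stand-in data: 15 mouse cytochrome b sequences shipped with ape
data(woodmouse)

# Assumption: the ClustalW executable is named "clustalw2" and is on the PATH
exe <- unname(Sys.which("clustalw2"))

if (nzchar(exe)) {
  # ape::clustal() runs ClustalW and returns the alignment as a DNAbin object
  aligned <- clustal(woodmouse, exec = exe)
} else {
  # ClustalW not found: fall back to the data, which are already aligned
  message("clustalw2 not found on PATH; using the pre-aligned data as is")
  aligned <- woodmouse
}

dim(aligned)   # rows = sequences, columns = alignment positions
```

With your own data, replace woodmouse with the DNAbin object returned by read.GenBank().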
7. Get and align sequences DIY
- Get together with a few other people, or not
- Choose some species to investigate
- Get their accession numbers on GenBank
- Download sequence data from GenBank
- If you are really adventurous, also align sequences
8. Phylogenetic inference: tools
- R packages: ape, phangorn, phyclust, phytools, scaleboot
- ape has the most functionality for phylogenetic inference
- You should be able to call MrBayes from R, but I don't know how. Package phyloch?
10. Phylogenetic inference, continued
Bayesian
- You can do this (maybe) with the package phyloch (get it here: http://www.christophheibl.de/Rpackages.html), by calling MrBayes from R
- However, MrBayes is giving way to RevBayes (here: http://sourceforge.net/projects/revbayes/), FYI
11. Phylogenetic inference DIY
- With your partners, or not
- Use the sequence data you got from GenBank earlier (if you didn't align the sequences, don't worry about it, OR use a data set provided with ape or another package)
- Do some phylogenetic inference a couple of different ways (e.g., NJ and parsimony)
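One possible way to approach this exercise is sketched below. It is an illustration only: it assumes the ape and phangorn packages are installed, and it uses the woodmouse alignment shipped with ape in place of your own sequences. It builds a neighbor-joining tree and a UPGMA tree from a distance matrix, then uses the NJ tree as the starting point for a parsimony search.

```r
library(ape)
library(phangorn)

# Example alignment shipped with ape (15 sequences, already aligned)
data(woodmouse)

# Distance-based methods: pairwise distances, then NJ and UPGMA
d <- dist.dna(woodmouse, model = "K80")
tree_nj <- nj(d)            # neighbor joining (ape)
tree_upgma <- upgma(d)      # UPGMA (phangorn)

# Parsimony: convert to phangorn's phyDat format, then search from the NJ tree
wm <- as.phyDat(woodmouse)
score_nj <- parsimony(tree_nj, wm)        # parsimony score of the NJ tree
tree_mp <- optim.parsimony(tree_nj, wm)   # tree rearrangements to lower the score
score_mp <- parsimony(tree_mp, wm)

c(nj = score_nj, optimized = score_mp)

# Compare the trees side by side
par(mfrow = c(1, 3))
plot(tree_nj, main = "NJ")
plot(tree_upgma, main = "UPGMA")
plot(tree_mp, main = "Parsimony")
```

The optimized tree should never score worse than the starting tree, which makes the two scores a quick sanity check.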
12. Visualize phylogenies
R packages: ape, ade4, phytools, phylobase, ouch, paleoPhylo
    # visualize phylogenies
    install.packages("ape")
    require(ape)
    tree <- rcoal(10)   # random coalescent tree with 10 tips
    tree
    plot(tree)
    plot(tree, type = "cladogram")
    plot(tree, type = "unrooted")
    plot(tree, type = "radial")
    plot(tree, type = "fan")
13. Visualize phylogenies DIY
- Get together with a few other people, or not
- Use the tree you made, or use one provided with ape or other packages
- Do basic plotting, e.g.: plot(mytree)
- Then see if you can color the branches, label the branches with the edge lengths, change the tip labels, etc.
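If you get stuck, here is one way the steps above could look, using only ape. The tree, the tip names, and the colors are made up for illustration; substitute your own tree for mytree.

```r
library(ape)

set.seed(1)
mytree <- rcoal(6)   # random tree standing in for your own

# Change the tip labels
mytree$tip.label <- paste0("sp", seq_len(Ntip(mytree)))

# Color the branches: terminal branches (those ending in a tip) in blue,
# internal branches in grey. Rows of mytree$edge are (parent, child) pairs,
# and tips are numbered 1..Ntip.
is_terminal <- mytree$edge[, 2] <= Ntip(mytree)
edge_cols <- ifelse(is_terminal, "steelblue", "grey40")

plot(mytree, edge.color = edge_cols, edge.width = 2)

# Label each branch with its (rounded) edge length
edgelabels(round(mytree$edge.length, 2), frame = "none",
           adj = c(0.5, -0.3), cex = 0.7)
```

edge.color takes one color per row of mytree$edge, so any rule that produces a vector of that length can be used to color the branches.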
14. Traits on trees: phylogenetic signal
R packages: ape, picante, caper, phytools
Examples from picante and phytools:
    # phylogenetic signal
    install.packages("picante")
    require(picante)
    randtree <- rcoal(20)
    randtraits <- rTraitCont(randtree)
    Kcalc(randtraits[randtree$tip.label], randtree)   # Blomberg's K

    install.packages("phytools")
    require(phytools)
    tree <- rbdtree(1, 0, Tmax = 4)   # make a tree (rbdtree is from ape, loaded with phytools)
    x <- fastBM(tree)                 # simulate traits
    phylosig(tree, x, method = "lambda", test = TRUE)   # calculate phylogenetic signal, lambda
    phylosig(tree, x, method = "K", test = TRUE)        # calculate phylogenetic signal, K
15. Traits on trees: modeling trait evolution
R packages: ape, picante, caper, geiger, PHYLOGR, phytools, ade4, motmot
- The packages above can model the evolution of discrete and continuous traits, with Brownian motion or Ornstein-Uhlenbeck (OU) models
- See also: Rbrownie
- Various evolutionary modeling frameworks in development, to be included in geiger soon: auteur, mecca, medusa, and fossilmedusa, here: http://www.webpages.uidaho.edu/~lukeh/software/index.html
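As a rough illustration of comparing Brownian motion and OU models, the sketch below uses fitContinuous() from geiger on simulated data. The tree and trait are simulated here and are not from the slides, and the reduced number of optimizer iterations (niter = 10) is only there to keep the example quick.

```r
library(ape)
library(geiger)

set.seed(1)
tree <- rcoal(30)                          # random ultrametric tree
trait <- rTraitCont(tree, model = "BM")    # trait simulated under Brownian motion
                                           # (named by tip label)

# Fit two models of continuous trait evolution to the same data
fit_bm <- fitContinuous(tree, trait, model = "BM", control = list(niter = 10))
fit_ou <- fitContinuous(tree, trait, model = "OU", control = list(niter = 10))

# Compare models by AICc (lower is better)
aicc <- c(BM = fit_bm$opt$aicc, OU = fit_ou$opt$aicc)
aicc
```

Because the trait was simulated under Brownian motion, the extra OU parameter would usually not be expected to improve the fit much, but that depends on the random draw.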
16. Ancestral state reconstruction
R packages: ape, ouch, phytools
- Function 'ace' in the ape package works nicely
- But it is very sensitive to parameters
Example:
    require(ape)
    data(bird.orders)
    x <- rnorm(23)               # random trait values, one per tip
    out <- ace(x, bird.orders)
out$ace will have the ancestral character values (which you'll have to match to the nodes of your tree).
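One way to do that matching is sketched below. It relies on ape's node numbering, where tips are numbered 1 to Ntip and internal nodes Ntip + 1 to Ntip + Nnode, and it assumes the estimates in out$ace are returned in node-number order. The trait values are random, as in the example above.

```r
library(ape)

set.seed(1)
data(bird.orders)

# Random trait values, named by tip so they are matched to the tree explicitly
x <- rnorm(Ntip(bird.orders))
names(x) <- bird.orders$tip.label

out <- ace(x, bird.orders)

# Pair each estimate with its internal node number
# (assumption: estimates are ordered by node number)
anc <- data.frame(
  node = Ntip(bird.orders) + seq_len(bird.orders$Nnode),
  estimate = unname(out$ace)
)
head(anc)

# Show the estimates on the tree at the matching nodes
plot(bird.orders, cex = 0.7)
nodelabels(round(anc$estimate, 2), node = anc$node, cex = 0.6, frame = "none")
```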
17. Tree simulations
R packages: TreeSim, geiger, ape, phybase
Example:
    require(ape)
    tree <- rcoal(10)                          # make a random tree
    trait <- rTraitCont(tree, model = "BM")    # simulate a trait on that tree

    # Write a function to make a tree, simulate a BM trait, and take the mean of that trait
    myfunc <- function(n) {
      tree <- rcoal(n)
      trait <- rTraitCont(tree, model = "BM")
      mean(trait)
    }

    # do it 100 times and make the data.frame required for ggplot2 plotting
    dat <- replicate(100, myfunc(10))
    dat2 <- data.frame(dat)

    # plot results
    require(ggplot2)
    ggplot(dat2, aes(dat)) + geom_histogram()
18. Get trees
rOpenSci's treebase package on CRAN: http://cran.r-project.org/web/packages/treebase/
    install.packages("treebase")   # install
    require(treebase)              # load
    tree <- search_treebase("Derryberry", "author")[[1]]   # search
    metadata(tree$S.id)            # metadata for tree
    plot(tree)                     # plot the tree
19. Phylogenetic community structure
R packages: picante (includes phylocom functionality)
- Although not bladj, for some reason; talk to me if you want to run bladj from R
Example: the function 'comdistnt' calculates the inter-community mean nearest taxon distance
    require(picante)
    data(phylocom)
    comdistnt(phylocom$sample, cophenetic(phylocom$phylo), abundance.weighted = FALSE)
Also, a new approach to phylogenetic community structure in R from Matt Helmus, code here: http://r-ecology.blogspot.com/2011/10/phylogenetic-community-structure-pglmms.html
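picante also covers within-community metrics from phylocom. As a small sketch, the code below computes the standardized effect size of mean pairwise distance (MPD) for each community in the same example data, using ses.mpd(). The number of randomizations (runs = 99) is kept low here only so the example is fast.

```r
library(picante)

data(phylocom)
set.seed(1)

# Pairwise phylogenetic distances between all taxa in the tree
dis <- cophenetic(phylocom$phylo)

# Observed MPD per community versus a null model that shuffles taxon labels
res <- ses.mpd(phylocom$sample, dis,
               null.model = "taxa.labels",
               abundance.weighted = FALSE,
               runs = 99)

# mpd.obs.z is the standardized effect size (equal to -1 times NRI):
# negative values suggest phylogenetic clustering, positive values evenness
res[, c("ntaxa", "mpd.obs", "mpd.obs.z", "mpd.obs.p")]
```

ses.mntd() has the same interface and gives the nearest-taxon counterpart.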
20. Bonus: Polytomy resolver
- Methods in Ecology and Evolution (MEE) paper: "A simple polytomy resolver for dated phylogenies" by Kuhn, Mooers, and Thomas
- Paper: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/abstract
- Supp info has R scripts: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/suppinfo
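For a quick look at what a polytomy is and what resolving one means, the sketch below uses multi2di() from ape. Note that this is not the method from the paper above: multi2di() simply breaks each polytomy at random and inserts zero-length branches, so it does not produce sensible node dates. The toy tree is made up for illustration.

```r
library(ape)

set.seed(42)

# Toy tree with a three-way polytomy (A, B, C)
poly <- read.tree(text = "((A:1,B:1,C:1):1,(D:1.5,E:1.5):0.5);")
is.binary(poly)        # FALSE: the tree contains a polytomy

# Resolve the polytomy at random; new branches get length zero
resolved <- multi2di(poly, random = TRUE)
is.binary(resolved)    # TRUE: fully dichotomous now

resolved$edge.length   # includes the zero-length branch that was added

par(mfrow = c(1, 2))
plot(poly, main = "With polytomy")
plot(resolved, main = "Randomly resolved")
```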