Phylogenetics in R
Scott Chamberlain
November 18, 2011
What sorts of phylogenetics things can I do in R?

The run down
  Get sequence data
  Align sequence data
  Phylogenetic inference
    NJ, maximum likelihood, parsimony, Bayesian, UPGMA
  Visualize phylogenies
  Traits on trees
    Phylogenetic signal
    Trait evolution
    Ancestral state character reconstruction
  Tree simulations
  Get trees
  Phylogenetic community structure
  Bonus stuff: polytomy resolver

Basic trees in R
Example

    require(ape)
    tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")
    tr1  # print tree summary
    write.tree(tr1)  # print tree in Newick format
    # "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);"
    tr1$tip.label  # tip labels
    # "B" "C" "D" "A"
    tr1$edge.length  # edge lengths
    # 0.04 0.01 0.05 0.05 0.06 0.10
    tr1$node.label  # node labels
    # NULL (meaning: no node labels)

    # Assign properties to trees
    tr1$tip.label <- c('sleepy', 'happy', 'grumpy', 'frumpy')  # label tips
    tr1$tip.label  # did it work?
    # "sleepy" "happy" "grumpy" "frumpy"

Etcetera for other tree properties.
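
A few more accessors are handy once a tree is loaded. The sketch below reuses the Newick string from the example above and relies only on functions from ape (Ntip, Nnode, is.rooted, is.binary). The object names (tr, tree_length, and so on) are made up for illustration.

```r
library(ape)

# Same four-tip tree as in the example above
tr <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")

n_tips  <- Ntip(tr)        # number of tips
n_nodes <- Nnode(tr)       # number of internal nodes
rooted  <- is.rooted(tr)   # does the tree have a root?
binary  <- is.binary(tr)   # is every split a bifurcation?

# The edge matrix: one row per branch, columns are parent and child node numbers
tr$edge

# Total tree length is the sum of all branch lengths
tree_length <- sum(tr$edge.length)
```

The rows of tr$edge line up with the entries of tr$edge.length, which is why per-branch properties are usually stored as vectors in that same order.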

Get sequence data

    # install and load ape
    install.packages("ape"); require(ape)

    # make vector of accession numbers, for ITS 1 and 2 region for Gossypium (cotton) species
    cotton_acc <- c("U56806", "U12712", "U56810", "U12732", "U12725", "U56786", "U12715",
                    "AF057758", "U56790", "U12716", "U12729", "U56798", "U12727", "U12713",
                    "U12719", "U56811", "U12728", "U12730", "U12731", "U12722", "U56796",
                    "U12714", "U56789", "U56797", "U56801", "U56802", "U12718", "U12710",
                    "U56804", "U12734", "U56809", "U56812", "AF057753", "U12711", "U12717",
                    "U12723", "U12726")

    # get data from GenBank
    cotton <- read.GenBank(cotton_acc, species.names = TRUE)

    # name the sequences with species names instead of accession numbers
    names_accs <- data.frame(species = attr(cotton, "species"), accs = names(cotton))
    names(cotton) <- attr(cotton, "species")

Align sequence data
Run an external program: clustal, mafft

    # multiple sequence alignment
    # Get ClustalW here, and install: http://www.clustal.org/
    # set the working directory to your ClustalW2 folder
    setwd("/path on your computer to/ClustalW2")
    # write fasta file to directory
    write.dna(cotton, "cotton.fas", format = "fasta")
    # run clustal multiple alignment, prints clustal output to console
    system('"./clustalw2" cotton.fas')  # should work on OSX or Windows
    # read the alignment back in to R
    cotton_clustalaligned <- read.dna("cotton.aln", format = "clustal")

Manual alignment may have to be done, dare I say it, not in R.
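
The write-then-read half of this workflow can be tried without ClustalW installed. The sketch below swaps in the woodmouse data set that ships with ape (already aligned) and a temporary file; it only shows the file round trip, not an alignment. The names fasta_file and wm are made up for illustration.

```r
library(ape)

data(woodmouse)  # aligned mouse sequences shipped with ape

# Write to a temporary FASTA file, then read it back in
fasta_file <- tempfile(fileext = ".fas")
write.dna(woodmouse, fasta_file, format = "fasta")
wm <- read.dna(fasta_file, format = "fasta")

dim(wm)       # number of sequences x number of sites
rownames(wm)  # sequence names survive the round trip
```

With real data, the file passed to the external aligner takes the place of the temporary file, and the aligner's output file is what gets read back in.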

Get and align sequences
DIY
  Get together with a few other people…or not
  Choose some species to investigate
  Get their accession numbers on GenBank
  Download sequence data from GenBank
  If you are really adventurous, also align sequences

Phylogenetic inference
Tools
  R Packages: ape, phangorn, phyclust, phytools, scaleboot
  ape has the most functionality for phylogenetic inference
  You should be able to call MrBayes from R, but I don’t know how (package phyloch?)

Phylogenetic inference
Fitting evolutionary models: see function modelTest in package phangorn

NJ

    install.packages("ape"); require(ape)
    data(woodmouse)
    trw <- nj(dist.dna(woodmouse))
    plot(trw)

Maximum likelihood

    install.packages("phangorn"); require(phangorn)
    data(Laurasiatherian)
    dm <- dist.logDet(Laurasiatherian)
    njtree <- NJ(dm)
    MLfit <- pml(njtree, Laurasiatherian)
    # optimize edge length parameter
    MLfit_ <- optim.pml(MLfit, model = "GTR")
    MLfit_$tree
    plot(MLfit_$tree)

Parsimony

    install.packages("phangorn"); require(phangorn)
    data(Laurasiatherian)
    dm <- dist.logDet(Laurasiatherian)
    tree <- NJ(dm)
    treepars <- optim.parsimony(tree, Laurasiatherian)
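
UPGMA is on the run-down list but has no example here. One way to sketch it with ape plus base R is average-linkage clustering (hclust with method = "average", which is UPGMA) converted to a tree with as.phylo. This is a substitute route chosen for the sketch, not something shown in the slides; phangorn also offers a upgma function. Object names are made up for illustration.

```r
library(ape)

data(woodmouse)
d <- dist.dna(woodmouse)   # pairwise genetic distances

nj_tree    <- nj(d)                                    # neighbor joining, unrooted
upgma_tree <- as.phylo(hclust(d, method = "average"))  # UPGMA via average linkage

# Plot the two trees side by side
op <- par(mfrow = c(1, 2))
plot(nj_tree, main = "NJ")
plot(upgma_tree, main = "UPGMA")
par(op)
```

The UPGMA tree is rooted and ultrametric by construction, while the NJ tree is unrooted, so the two plots will look different even when the groupings largely agree.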

Phylogenetic inference, continued
Bayesian
  You can do this (maybe) with the package phyloch (get it here: http://www.christophheibl.de/Rpackages.html), by calling MrBayes from R…
  …however, MrBayes is giving way to RevBayes (here: http://sourceforge.net/projects/revbayes/), fyi

Phylogenetic inference
DIY
  With your partners…or not
  Use the sequence data from GenBank you got earlier (if you didn’t align the sequences, don’t worry about it; or use a data set provided with ape or another package)
  Do some phylogenetic inference a couple of different ways (e.g., NJ and parsimony)

Visualize phylogenies
R Packages: ape, ade4, phytools, phylobase, ouch, paleoPhylo

    # visualize phylogenies
    install.packages("ape")
    require(ape)
    tree <- rcoal(10)
    tree
    plot(tree)
    plot(tree, type = "cladogram")
    plot(tree, type = "unrooted")
    plot(tree, type = "radial")
    plot(tree, type = "fan")

Visualize phylogenies
DIY
  Get together with a few other people…or not
  Use the tree you made, or use one provided with ape or other packages
  Do basic plotting, e.g.: plot(mytree)
  Then see if you can
    color the branches,
    label the branches with the edge lengths,
    change the tip labels,
    etc.
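
One possible starting point for the exercise, as a sketch using only ape. The random tree stands in for your own, and the coloring rule (red for branches longer than the median) and the "sp_" labels are arbitrary choices made up for illustration.

```r
library(ape)

set.seed(1)          # make the random tree reproducible
mytree <- rcoal(8)   # stand-in for your own tree

# One color per branch, in the row order of mytree$edge
branch_cols <- rep("black", nrow(mytree$edge))
branch_cols[mytree$edge.length > median(mytree$edge.length)] <- "red"

# Change the tip labels
mytree$tip.label <- paste0("sp_", seq_along(mytree$tip.label))

# Plot with colored branches, then write each edge length on its branch
plot(mytree, edge.color = branch_cols, edge.width = 2)
edgelabels(round(mytree$edge.length, 2), frame = "none",
           adj = c(0.5, -0.3), cex = 0.7)
```

Per-branch arguments such as edge.color follow the row order of the edge matrix, so any rule that produces one value per row of mytree$edge will work.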

Traits on trees: phylogenetic signal
R Packages: ape, picante, caper, phytools
Examples from picante and phytools:

    # phylogenetic signal
    install.packages("picante")
    require(picante)
    randtree <- rcoal(20)
    randtraits <- rTraitCont(randtree)
    Kcalc(randtraits[randtree$tip.label], randtree)

    install.packages("phytools")
    require(phytools)
    tree <- rbdtree(1, 0, Tmax = 4)  # make a tree
    x <- fastBM(tree)  # simulate traits
    phylosig(tree, x, method = "lambda", test = TRUE)  # calculate phylogenetic signal, lambda
    phylosig(tree, x, method = "K", test = TRUE)  # calculate phylogenetic signal, K
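
The Kcalc call above indexes the traits by randtree$tip.label, which puts the trait values in the same order as the tips. The sketch below isolates that step with ape only: it shuffles a simulated trait vector to mimic data arriving in a different order, then reorders it by name. Object names are made up for illustration.

```r
library(ape)

set.seed(42)
tree <- rcoal(20)
x <- rTraitCont(tree)   # named vector; the names are the tip labels

# Shuffle to mimic trait data that arrive in a different order than the tips
x_shuffled <- sample(x)

# Reorder by tip label before passing the values to a signal function
x_ordered <- x_shuffled[tree$tip.label]
```

With real data, it is also worth checking that every tip label appears in names(x) before reordering, since indexing by a missing name silently returns NA.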

Traits on trees: modeling trait evolution
R Packages: ape, picante, caper, geiger, PHYLOGR, phytools, ade4, motmot
  These can model the evolution of discrete and continuous traits, under Brownian motion or OU models
  See also: Rbrownie
  Various evolutionary modeling frameworks in development, to be included in geiger soon: auteur, mecca, medusa, and fossilmedusa, here: http://www.webpages.uidaho.edu/~lukeh/software/index.html

Ancestral state reconstruction
R Packages: ape, ouch, phytools
  Function ‘ace’ in the ape package works nicely
  But it is very sensitive to parameters
Example

    data(bird.orders)
    x <- rnorm(23)
    out <- ace(x, bird.orders)

out$ace will have the ancestral character values (which you’ll have to match to nodes of your tree)
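
Matching the values in out$ace to nodes can be sketched as follows. This assumes the estimates come back in node-number order, with internal nodes numbered from Ntip + 1 upward as ape does; check names(out$ace) on your own output to confirm. The data frame anc and its column names are made up for illustration.

```r
library(ape)

data(bird.orders)
set.seed(10)
x <- rnorm(Ntip(bird.orders))
names(x) <- bird.orders$tip.label   # name the values so they match the tips

out <- ace(x, bird.orders)          # continuous trait by default

# Internal nodes are numbered Ntip + 1, ..., Ntip + Nnode
# (assumption: out$ace is returned in this same order)
node_ids <- Ntip(bird.orders) + seq_len(Nnode(bird.orders))
anc <- data.frame(node = node_ids, estimate = unname(out$ace))

# Show the estimates on the tree
plot(bird.orders, cex = 0.7)
nodelabels(round(anc$estimate, 2), node = anc$node, frame = "none", cex = 0.6)
```

Because the trait here is random noise, the estimates themselves mean nothing; the point is only the bookkeeping between values and node numbers.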

Tree simulations
R Packages: TreeSim, geiger, ape, phybase
Example

    require(ape)
    tree <- rcoal(10)  # Make a random tree
    trait <- rTraitCont(tree, model = "BM")  # Simulate a trait on that tree

    # Write a function to make a tree, simulate a BM trait, and take the mean of that trait
    myfunc <- function(n) {
      tree <- rcoal(n)
      trait <- rTraitCont(tree, model = "BM")
      mean(trait)
    }

    # do it 100 times and make a data.frame required for ggplot2 plotting
    dat <- replicate(100, myfunc(10))
    dat2 <- data.frame(dat)

    # plot results
    require(ggplot2)
    ggplot(dat2, aes(dat)) + geom_histogram()

Get trees
rOpenSci’s treebase package on CRAN: http://cran.r-project.org/web/packages/treebase/

    install.packages("treebase")  # install
    require(treebase)  # load
    tree <- search_treebase("Derryberry", "author")[[1]]  # search
    metadata(tree$S.id)  # metadata for tree
    plot(tree)  # plot the tree

Phylogenetic community structure
R Packages: picante (includes phylocom functionality)
  Although not bladj, for some reason; talk to me if you want to run bladj from R
Example
  Function ‘comdistnt’ calculates inter-community mean nearest taxon distance

    data(phylocom)
    comdistnt(phylocom$sample, cophenetic(phylocom$phylo), abundance.weighted = FALSE)

Also, a new approach to phylogenetic community structure in R from Matt Helmus, code here: http://r-ecology.blogspot.com/2011/10/phylogenetic-community-structure-pglmms.html
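
comdistnt compares communities with each other. For within-community metrics, picante also provides mpd (mean pairwise distance) and mntd (mean nearest taxon distance). The sketch below runs both on the same phylocom example data; the object names dis, mpd_vals, and mntd_vals are made up for illustration.

```r
library(picante)

data(phylocom)
dis <- cophenetic(phylocom$phylo)   # pairwise phylogenetic distances between taxa

# Within-community metrics: one value per community (row of the sample matrix)
mpd_vals  <- mpd(phylocom$sample, dis, abundance.weighted = FALSE)
mntd_vals <- mntd(phylocom$sample, dis, abundance.weighted = FALSE)

data.frame(community = rownames(phylocom$sample),
           mpd = mpd_vals, mntd = mntd_vals)
```

Raw values like these are usually compared against a null model before interpreting them; picante has ses.mpd and ses.mntd for that.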

Bonus: Polytomy resolver
MEE paper: “A simple polytomy resolver for dated phylogenies” by Kuhn, Mooers, and Thomas
  Paper: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/abstract
  Supp info has R scripts: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/suppinfo
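
For a feel of what a polytomy looks like in R, here is a small sketch with ape. Note that this is not the method from the paper: it uses ape's multi2di, which resolves polytomies at random by inserting zero-length branches. The example tree is made up for illustration.

```r
library(ape)

# A small tree with one polytomy: A, B, and C all descend from the same node
poly <- read.tree(text = "((A:1,B:1,C:1):1,D:2);")
is.binary(poly)              # FALSE

set.seed(3)
resolved <- multi2di(poly)   # resolve at random, adding zero-length branches
is.binary(resolved)          # TRUE

resolved$edge.length         # note the zero-length branch that was added
```

Zero-length branches are often a poor fit for dated trees, which is the gap the paper's scripts aim to fill.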

Resources
Bodega Phylogenetics Wiki:
  Home: http://bodegaphylo.wikispot.org/Front_Page
  BROWNIE tutorial: http://bodegaphylo.wikispot.org/Morphological_Diversification_and_Rates_of_Evolution
  Phylogenetic signal tutorial: http://bodegaphylo.wikispot.org/IV._Testing_Phylogenetic_Signal_in_R
R phylo-wiki (from NESCent): http://www.r-phylo.org/wiki/HowTo/Table_of_Contents
CRAN task view, Phylogenetics: http://cran.r-project.org/web/views/Phylogenetics.html
rmesquite: https://r-forge.r-project.org/R/?group_id=213
R-phylogenetics listserv: https://stat.ethz.ch/mailman/options/r-sig-phylo/
