Distance and Parsimony methods (Part 3: computer lab)
Software: R package ape: Distance-based methods
- ape is one of the most widely used phylogenetic software packages
- It is an R package and it has a huge variety of functions
- In particular, today we will use it for distance-based tree estimation methods
- Full documentation
- We will follow this great tutorial
In-class group activity
Time: 20 minutes
Instructions:
- Follow the R commands to obtain a distance-based tree (the solution below builds an NJ tree) from the sample data (or your own data!). The commands are listed in the PDF tutorial that we are using as a guideline, in our reproducible script notebook-log.md, and on the following slides.
- After the allotted time, we will compare our work together.
Solution
1) Installing the necessary packages (ape is installed as a dependency of both):
install.packages("adegenet", dep=TRUE)
install.packages("phangorn", dep=TRUE)
2) Loading the packages
library(ape)
library(adegenet)
library(phangorn)
3) Loading the sample data
dna <- fasta2DNAbin(file="http://adegenet.r-forge.r-project.org/files/usflu.fasta")
4) Computing the genetic distances. The tutorial uses the Tamura and Nei (1993) model, which allows for different rates of transitions and transversions and for heterogeneous base frequencies; between-site variation of the substitution rate can be added through the gamma argument of dist.dna (more on Models of Evolution).
D <- dist.dna(dna, model="TN93")
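To make the idea of a genetic distance concrete, here is a minimal base-R sketch on a made-up three-sequence alignment. The sequences and the helper names p_dist and jc69 are hypothetical and not part of the tutorial. It computes the raw proportion of differing sites (the quantity that dist.dna reports with model="raw") and then applies the Jukes-Cantor (JC69) correction, used here only as a simpler stand-in for TN93, whose formula is longer.

```r
# Toy alignment (hypothetical sequences, NOT the usflu data)
seqs <- c(
  s1 = "ACGTACGTAC",
  s2 = "ACGTACGTTC",
  s3 = "ACGAACGTTA"
)

# Split each sequence into a vector of single characters
chars <- strsplit(seqs, "")

# Raw p-distance: proportion of sites that differ between two sequences
p_dist <- function(a, b) mean(a != b)

# Fill a symmetric matrix of pairwise p-distances
n <- length(chars)
P <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    P[i, j] <- p_dist(chars[[i]], chars[[j]])
  }
}

# Jukes-Cantor correction: accounts for multiple substitutions at one site
# (assumption: JC69 is used for illustration instead of TN93)
jc69 <- function(p) -(3 / 4) * log(1 - (4 / 3) * p)
J <- jc69(P)

P
J
```

The corrected distances are never smaller than the raw ones, and the gap widens as sequences diverge. Richer models such as TN93 follow the same logic with more parameters.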
5) Get the NJ tree
tre <- nj(D)
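As a small sanity check of what nj does, the sketch below runs it on a hypothetical five-taxon distance matrix (the values are made up so that they fit a tree exactly; they do not come from the usflu data). When distances are additive like this, the tip-to-tip path lengths on the NJ tree should reproduce the input matrix.

```r
library(ape)

# Hypothetical additive distances among five taxa (illustration only)
taxa <- c("a", "b", "c", "d", "e")
M <- matrix(c(0,  5,  9,  9, 8,
              5,  0, 10, 10, 9,
              9, 10,  0,  8, 7,
              9, 10,  8,  0, 3,
              8,  9,  7,  3, 0),
            nrow = 5, byrow = TRUE, dimnames = list(taxa, taxa))

# Build the NJ tree from the distance object
toy_tre <- nj(as.dist(M))

# Path lengths between tips along the tree, reordered to match M
fit <- cophenetic(toy_tre)[taxa, taxa]

# Compare tree distances with the input distances
all.equal(fit, M)
```

With real data the distances are rarely additive, so the tree distances only approximate the input. Comparing the two, as above, is one way to judge how well a tree represents the distance matrix.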
6) Before plotting, we can use the ladderize function, which reorganizes the internal structure of the tree to get the ladderized effect when plotted
tre <- ladderize(tre)
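The following sketch, on a hypothetical six-taxon tree written in Newick format, suggests that ladderize only changes the order in which branches are drawn: the distances between tips along the tree are the same before and after.

```r
library(ape)

# Hypothetical six-taxon tree with unit branch lengths (illustration only)
toy <- read.tree(text = "((a:1,b:1):1,(c:1,(d:1,(e:1,f:1):1):1):1);")
toy_lad <- ladderize(toy)

# Compare tip-to-tip path lengths, using a common label order
labs <- sort(toy$tip.label)
same_dist <- all.equal(cophenetic(toy)[labs, labs],
                       cophenetic(toy_lad)[labs, labs])
same_dist
```

Plotting toy and toy_lad with plot() shows the visual difference between the two orderings.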
7) We can plot the tree
plot(tre, cex=.6)
title("A simple NJ tree")
How do we interpret the tree?
Summary of Software R package ape
Main distance functions:
- nj (ape package): the classical Neighbor-Joining algorithm.
- bionj (ape): an improved version of Neighbor-Joining (Gascuel, 1997). It uses information on the variances of evolutionary distances.
- fastme.bal and fastme.ols (ape): minimum evolution algorithms (Desper and Gascuel, 2002).
- hclust (stats): classical hierarchical clustering algorithms, including single linkage, complete linkage, UPGMA, and others.
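The functions listed above can be tried side by side. The sketch below is an illustration on a hypothetical five-taxon distance matrix (made-up values, not the usflu data): it builds one tree per method and reports the Robinson-Foulds topological distance of each tree to the NJ tree. UPGMA is obtained from hclust with method="average", and its rooted tree is unrooted before comparison.

```r
library(ape)

# Hypothetical distances among five taxa (illustration only)
taxa <- c("a", "b", "c", "d", "e")
M <- matrix(c(0,  5,  9,  9, 8,
              5,  0, 10, 10, 9,
              9, 10,  0,  8, 7,
              9, 10,  8,  0, 3,
              8,  9,  7,  3, 0),
            nrow = 5, byrow = TRUE, dimnames = list(taxa, taxa))
D_toy <- as.dist(M)

# One tree per distance method
trees <- list(
  nj         = nj(D_toy),
  bionj      = bionj(D_toy),
  fastme_bal = fastme.bal(D_toy),
  fastme_ols = fastme.ols(D_toy),
  # hclust returns a rooted dendrogram; convert and unroot it
  upgma      = unroot(as.phylo(hclust(D_toy, method = "average")))
)

# Topological distance of each tree to the NJ tree (0 = same topology)
rf_to_nj <- sapply(trees, function(tr) as.numeric(dist.topo(tr, trees$nj)))
rf_to_nj
```

On clean, tree-like toy distances the methods tend to agree. On real data they can return different topologies, which is a good reason to compare them.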
Software: R package phangorn: Parsimony-based methods
- phangorn is another widely used phylogenetic software package
- It is an R package and it has a huge variety of functions
- In particular, today we will use it for parsimony-based tree estimation methods
- Full documentation
- We will follow this great tutorial
In-class group activity
Time: 20 minutes
Instructions:
- Follow the R commands to obtain an MP tree from the sample data (or your own data!). The commands are listed in the PDF tutorial that we are using as a guideline, in our reproducible script notebook-log.md, and on the following slides.
- After the allotted time, we will compare our work together.
Solution
1) Installing necessary packages (if you have not installed them for the distance section above)
install.packages("adegenet", dep=TRUE)
install.packages("phangorn", dep=TRUE)
2) Loading the packages
library(ape)
library(adegenet)
library(phangorn)
3) Loading the sample data and converting it to a phangorn object (class phyDat):
dna <- fasta2DNAbin(file="http://adegenet.r-forge.r-project.org/files/usflu.fasta")
dna2 <- as.phyDat(dna)
4) We need a starting tree for the search in tree space. Here we build an NJ tree from raw distances and compute its parsimony score (422):
tre.ini <- nj(dist.dna(dna,model="raw"))
parsimony(tre.ini, dna2)
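To see where a parsimony score comes from, here is a minimal base-R sketch of Fitch's counting algorithm for a single site. The nested-list tree representation, the function name fitch, and the site pattern are all hypothetical choices for illustration; phangorn's parsimony uses its own implementation and sums the score over all sites.

```r
# Fitch's algorithm for ONE site on a rooted binary tree.
# A leaf is a character state; an internal node is list(left, right).
fitch <- function(node) {
  if (is.character(node)) {
    # Leaf: its state set is the observed base, no changes needed
    return(list(states = node, cost = 0))
  }
  left  <- fitch(node[[1]])
  right <- fitch(node[[2]])
  common <- intersect(left$states, right$states)
  if (length(common) > 0) {
    # Children share a state: keep the intersection, no extra change
    list(states = common, cost = left$cost + right$cost)
  } else {
    # No shared state: take the union and count one change
    list(states = union(left$states, right$states),
         cost = left$cost + right$cost + 1)
  }
}

# Same five observed bases (A, A, C, C, C) placed on two topologies
site_tree1 <- list(list("A", "C"), list("A", list("C", "C")))
site_tree2 <- list(list("A", "A"), list("C", list("C", "C")))

fitch(site_tree1)$cost  # changes required on the first topology
fitch(site_tree2)$cost  # changes required on the second topology
```

The second topology groups the two A's together and needs fewer changes, so it is the more parsimonious explanation of this site. A tree search repeats this comparison over all sites and many candidate topologies.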
5) Search for the tree with maximum parsimony:
tre.pars <- optim.parsimony(tre.ini, dna2)
# Final p-score 420 after 2 nni operations
6) Plot tree:
plot(tre.pars, cex=0.6)
Further learning:
Continue the parsimony steps in the PDF tutorial on the same sample data.
Homework
See the details of the Distance and Parsimony homework here.